TensorFlow decode JPEG: Expected image (JPEG, PNG, or GIF), got unknown format starting with '\000\000\000\000\000\000\000\000\000\00'
I'm cycling through an image folder and this keeps happening.
tensorflow.python.framework.errors_impl.InvalidArgumentError: Expected image (JPEG, PNG, or GIF), got unknown format starting with '\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000' [[{{node DecodeJpeg}}]]
There are files in this folder that aren't images, but they should have been filtered out by my previous step. Does anyone have an idea of what's going on?
import tensorflow as tf

test_files_ds = tf.data.Dataset.list_files(myFolder + '/*.jpg')
AUTOTUNE = tf.data.experimental.AUTOTUNE

def process_unlabeled_img(file_path):
    # Read the raw bytes and decode them as a 3-channel JPEG
    img = tf.io.read_file(file_path)
    img = tf.image.decode_jpeg(img, channels=3)
    # Convert to float32 and resize to 224x224
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(images=img, size=(224, 224))
    return file_path, img
It's hard to know exactly what is going on without having the files at hand, but what is probably happening here is that some files in your dataset have a .jpg, .jpeg, .png or .gif extension but are not actually JPEG, PNG or GIF images. Thus, TensorFlow isn't able to load them.
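The error message is itself a clue: the offending file starts with a run of null bytes, which matches none of the JPEG, PNG or GIF signatures. As a rough diagnostic sketch (the helper name `find_null_header_files` and the 16-byte window are my own assumptions, not part of TensorFlow), you could list the files whose header is empty or all zeros:

```python
import os

def find_null_header_files(folder, n_bytes=16):
    """Return names of files in `folder` whose first `n_bytes` bytes are
    all zero, or that are empty. Hypothetical helper for diagnosis only."""
    suspicious = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path, "rb") as f:
            header = f.read(n_bytes)
        # Stripping \x00 from an all-zero (or empty) header leaves nothing
        if header.strip(b"\x00") == b"":
            suspicious.append(name)
    return suspicious
```

Calling `find_null_header_files(myFolder)` should then point at the files that trigger this particular message, assuming they sit directly in that folder.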
One way to overcome this problem would be to check your files that are supposedly images and get rid of the ones that aren't actual JPEG, PNG or GIF images.
Checking whether a file is a valid JPEG, PNG or GIF image is more complicated than it seems, but checking the file signature / magic number (that is, the first few bytes of the file) is a good start and should solve your problem most of the time.
In practice, you could do this in many different ways, one of which is to check each picture individually with a function of this sort:
def is_image(filename, verbose=False):
    # read only the first few bytes, which hold the file signature
    with open(filename, 'rb') as f:
        data = f.read(10)
    # check if file is JPG or JPEG
    if data[:3] == b'\xff\xd8\xff':
        if verbose:
            print(filename + " is: JPG/JPEG.")
        return True
    # check if file is PNG
    if data[:8] == b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a':
        if verbose:
            print(filename + " is: PNG.")
        return True
    # check if file is GIF (GIF87a or GIF89a)
    if data[:6] in [b'\x47\x49\x46\x38\x37\x61', b'\x47\x49\x46\x38\x39\x61']:
        if verbose:
            print(filename + " is: GIF.")
        return True
    return False
You could then get rid of your invalid images by doing something like this (note that this deletes the invalid files):
import os

# go through all files in desired folder
for filename in os.listdir(folder):
    # os.listdir returns bare names, so build the full path first
    path = os.path.join(folder, filename)
    # check if file is actually an image file
    if not is_image(path, verbose=False):
        # if the file is not valid, remove it
        os.remove(path)
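If deleting files outright feels too risky, a non-destructive variant is to move the rejected files aside so you can inspect them later. This is only a sketch: `quarantine_invalid` and its parameters are hypothetical names, and the validity check is passed in as a function that receives the full path.

```python
import os
import shutil

def quarantine_invalid(folder, quarantine_dir, is_valid):
    """Move every file for which is_valid(path) is False into
    quarantine_dir and return the list of moved file names."""
    os.makedirs(quarantine_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        if not is_valid(path):
            shutil.move(path, os.path.join(quarantine_dir, name))
            moved.append(name)
    return moved
```

With the function above, this could be called as `quarantine_invalid(folder, folder + "_rejected", is_image)`.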
Now, as I said, this would probably solve your problem, but please note that the function is_image cannot tell for sure whether a file can be read as a JPG, JPEG, PNG or GIF image. It is only a quick and dirty solution that will make the vast majority of such errors go away, but not all.
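One cheap way to tighten the check a little is to look at the end of the file as well, since truncated downloads often have a correct signature but no end marker. The sketch below is an assumption-laden heuristic (`looks_complete` is a hypothetical name): it expects a JPEG to end with the EOI marker, a PNG with its IEND chunk, and a GIF with its trailer byte. Some perfectly readable files carry extra bytes after those markers and would be rejected, and passing the check still does not prove the file decodes.

```python
def looks_complete(path):
    """Heuristic: the file must start with a known image signature
    AND end with the matching end marker. Reads the whole file."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(b"\xff\xd8\xff"):
        # JPEG: End Of Image marker
        return data.endswith(b"\xff\xd9")
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        # PNG: IEND chunk type followed by its fixed CRC
        return data.endswith(b"IEND\xae\x42\x60\x82")
    if data.startswith((b"GIF87a", b"GIF89a")):
        # GIF: trailer byte
        return data.endswith(b"\x3b")
    return False
```

For a definitive answer you would still have to attempt a full decode of each file with the same library that will read it during training.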