TensorFlow: try and except doesn't handle the exception

Problem description:

I'm new to TensorFlow and I've been stuck on an annoying problem.

I'm writing a program that loads image "raw data" (taken with tf.WholeFileReader.read(image_name_queue)) from a tfrecord file, decodes it with tf.image.decode_jpeg(raw_data, channels=3), and then passes it through a function that vectorizes it.

Main code

logging.info('setting up folder')
create_image_data_folder()
save_configs()

logging.info('creating graph')
filename_queue = tf.train.string_input_producer([
                                             configs.TFRECORD_IMAGES_PATH],
                                             num_epochs=1)

image_tensor, name_tensor = read_and_decode(filename_queue)
image_batch_tensor, name_batch_tensor = tf.train.shuffle_batch(
                                        [image_tensor, name_tensor],
                                        configs.BATCH_SIZE,
                                        1000 + 3 * configs.BATCH_SIZE,
                                        min_after_dequeue=1000)
image_embedding_batch_tensor = configs.IMAGE_EMBEDDING_FUNCTION(image_batch_tensor)

init = tf.initialize_all_variables()
init_local = tf.initialize_local_variables()
logging.info('starting session')
with tf.Session().as_default() as sess:
    sess.run(init)
    sess.run(init_local)
    tf.train.start_queue_runners()

    logging.info('vectorizing')
    data_points = []
    for _ in tqdm(xrange(get_n_batches())):
        name_batch = sess.run(name_batch_tensor)
        image_embedding_batch = sess.run(image_embedding_batch_tensor)
        for vector, name in zip(list(image_embedding_batch), name_batch):
            data_points.append((vector, name))

logging.info('saving')
save_pkl_file(data_points, 'vectors.pkl')

The read and decode function

def read_and_decode(tfrecord_file_queue):
    logging.debug('reading image and decodes it from queue')
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(tfrecord_file_queue)
    features = tf.parse_single_example(serialized_example,
        features={
            'image': tf.FixedLenFeature([], tf.string),
            'name': tf.FixedLenFeature([], tf.string)
        }
    )
    image = process_image_data(features['image'])

    return image, features['name']

The code works, but eventually it comes across a bad, non-JPEG file; an error is raised and the program stops running.

Error

InvalidArgumentError (see above for traceback): Invalid JPEG data, size 556663

I want to skip these "errors". I tried to surround the code with try and except.

New code

for _ in tqdm(xrange(get_n_batches())):
    try:
        name_batch = sess.run(name_batch_tensor)
        image_embedding_batch = sess.run(image_embedding_batch_tensor)
        for vector, name in zip(list(image_embedding_batch), name_batch):
            data_points.append((vector, name))
    except Exception as e:
        logging.warning('error occurred: {}'.format(e))

When I run the program again, the same error occurs; the try and except doesn't seem to handle it.

How can I handle these exceptions? Also, if you see that I've misunderstood the TensorFlow "structure", please mention that.

I know this is not applicable to your example, but I stumbled upon a different scenario in which the TensorFlow exception didn't seem to be caught, despite wrapping the code like this:

try:
    # Run code that throws a TensorFlow error
    ...
except:
    print("This won't catch the exception...")

What made the problem so hard to solve was that the TensorFlow traceback pointed to the wrong line; it suggested that the error lay in the graph construction rather than in the graph execution.

The specific problem

I tried to restore a model from a .meta file:

try:
    saver = tf.train.import_meta_graph('my_error_generating_model.meta')  # the traceback points here
    graph = tf.get_default_graph()
except:
    print("This won't run")

with tf.Session() as sess:
    # This is where the error is actually generated
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    sess.run(...)  # Propagating through the graph triggers the problem

The solution is, of course, to put your try-except wrapper around the executing code instead!
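
To illustrate why the placement of the wrapper matters, here is a minimal, TensorFlow-free sketch of the same situation. It is only an analogy: DeferredOp and build_graph are hypothetical names made up for this example, standing in for a graph node and for graph construction respectively. Building the "graph" only records the work, so a try block around it has nothing to catch; the error only appears when the work is run.

```python
# TensorFlow-free analogy (hypothetical names): "building" only records
# work, "running" executes it. This mirrors graph construction versus
# session execution, but it is not TensorFlow code.


class DeferredOp:
    """Hypothetical stand-in for a graph node: stores a function, runs it later."""

    def __init__(self, fn):
        self.fn = fn

    def run(self):
        # The stored function is only called here.
        return self.fn()


def build_graph():
    # Nothing is executed here, so nothing can fail here.
    return DeferredOp(lambda: 1 / 0)


caught_at = []

# A try block around construction never sees the runtime error.
try:
    op = build_graph()
except ZeroDivisionError:
    caught_at.append("construction")

# The same try block around execution does catch it.
try:
    op.run()
except ZeroDivisionError:
    caught_at.append("execution")

print(caught_at)  # prints ['execution']
```

In the restore example above, the equivalent move would be to place saver.restore(...) and sess.run(...) inside the try block, ideally catching a specific exception class rather than using a bare except.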