Keras VGG16 preprocess_input modes

Question:

I'm using the Keras VGG16 model.

I've seen that there is a preprocess_input method to use in conjunction with the VGG16 model. This method appears to call the preprocess_input method in imagenet_utils.py, which (depending on the case) calls the _preprocess_numpy_input method in imagenet_utils.py.

The preprocess_input method has a mode argument, which expects "caffe", "tf", or "torch". If I'm using the model in Keras with the TensorFlow backend, should I necessarily use mode="tf"?
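For reference, here is a minimal NumPy sketch of what the three modes do to a channels-last RGB array with values in [0, 255]. It is an illustrative re-implementation based on my reading of _preprocess_numpy_input in imagenet_utils.py, not the Keras code itself: the function name preprocess_sketch is hypothetical, and it ignores the data_format argument and channels-first inputs.

```python
import numpy as np

# Constants as used for the three modes (channels-last layout assumed).
CAFFE_BGR_MEAN = np.array([103.939, 116.779, 123.68])  # subtracted after RGB -> BGR
TORCH_RGB_MEAN = np.array([0.485, 0.456, 0.406])        # applied after scaling to [0, 1]
TORCH_RGB_STD = np.array([0.229, 0.224, 0.225])


def preprocess_sketch(x, mode):
    """Approximate preprocess_input for an RGB array of shape (..., 3) in [0, 255].

    Hypothetical helper for illustration; returns a new float64 array.
    """
    x = np.asarray(x, dtype=np.float64)
    if mode == "tf":
        # Scale [0, 255] linearly onto [-1, 1].
        return x / 127.5 - 1.0
    if mode == "torch":
        # Scale to [0, 1], then normalize each RGB channel by mean and std.
        return (x / 255.0 - TORCH_RGB_MEAN) / TORCH_RGB_STD
    if mode == "caffe":
        # Reorder RGB -> BGR and subtract the per-channel mean; no rescaling.
        return x[..., ::-1] - CAFFE_BGR_MEAN
    raise ValueError(f"unknown mode: {mode!r}")


pixel = np.array([[255, 0, 0]])  # a single pure-red pixel
for mode in ("caffe", "tf", "torch"):
    print(mode, preprocess_sketch(pixel, mode))
```

Running all three modes on the same pixel makes the difference in output scale visible: "tf" stays within [-1, 1], while "caffe" keeps values on the order of a hundred.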

If yes, is this because the VGG16 model loaded by Keras was trained on images that underwent the same preprocessing (i.e. rescaling the input image's range from [0, 255] to [-1, 1])?
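As a worked sketch, the rescaling described here is the per-value affine map that mode="tf" applies, whereas mode="caffe" only reorders the channels to BGR and shifts each one by the mean pixel μ = (103.939, 116.779, 123.68):

$$
\text{tf:}\quad x' = \frac{x}{127.5} - 1, \qquad 0 \mapsto -1,\quad 127.5 \mapsto 0,\quad 255 \mapsto 1
$$

$$
\text{caffe:}\quad x'_c = x^{\mathrm{BGR}}_c - \mu_c \in [-\mu_c,\ 255 - \mu_c]
$$

The two modes therefore produce inputs whose scales differ by roughly two orders of magnitude, which is why the choice matters for pretrained weights.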

Also, should the input images used at test time undergo this preprocessing as well? I'm confident the answer to this last question is yes, but I would like some reassurance.

I would expect Francois Chollet to have done it correctly, but looking at https://github.com/fchollet/deep-learning-models/blob/master/vgg16.py, either he is wrong or I am wrong about using mode="tf".

Updated information

@FalconUA directed me to the VGG group at Oxford, whose site has a Models section with links for the 16-layer model. The information about the preprocess_input mode argument (tf scaling to -1 to 1, and caffe subtracting some mean values) is found by following the link for the 16-layer model in the Models section: information page. In the Description section it says:

"In the paper, the model is denoted as the configuration D trained with scale jittering. The input images should be zero-centered by mean pixel (rather than mean image) subtraction. Namely, the following BGR values should be subtracted: [103.939, 116.779, 123.68]."

Answer:

The mode here is not about the backend, but about which framework the model was trained in and ported from. The Keras page for VGG16 states that:

These weights are ported from the ones released by VGG at Oxford.

So the VGG16 and VGG19 models were trained in Caffe and ported to TensorFlow, hence mode == 'caffe' here: pixel values stay in the 0 to 255 range, the channels are reordered from RGB to BGR, and the mean [103.939, 116.779, 123.68] is subtracted.
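A worked example of the caffe-mode arithmetic on a single pixel, in plain Python. It shows why the BGR ordering in the quoted mean matters: the ImageNet mean pixel, written in RGB order, becomes exactly zero only if it is reordered to BGR before the mean is subtracted. The helpers caffe_center and naive_center are hypothetical, for illustration only.

```python
# Mean pixel in BGR order, as quoted from the Oxford VGG page.
BGR_MEAN = (103.939, 116.779, 123.68)


def caffe_center(rgb_pixel):
    """Reorder one RGB pixel to BGR, then subtract the mean pixel per channel.

    This is mean-pixel subtraction: one value per channel, not a
    per-position mean image.
    """
    r, g, b = rgb_pixel
    bgr = (b, g, r)
    return tuple(round(v - m, 3) for v, m in zip(bgr, BGR_MEAN))


def naive_center(rgb_pixel):
    """Same subtraction, but forgetting the RGB -> BGR swap (a common bug)."""
    return tuple(round(v - m, 3) for v, m in zip(rgb_pixel, BGR_MEAN))


mean_rgb = (123.68, 116.779, 103.939)  # the mean pixel written in RGB order
print(caffe_center(mean_rgb))   # -> (0.0, 0.0, 0.0)
print(naive_center(mean_rgb))   # -> (19.741, 0.0, -19.741)
```

Skipping the channel swap leaves the red and blue channels off by almost 20 units each, even for an "average" pixel.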

Newer networks, like MobileNet and ShuffleNet, were trained in TensorFlow, so mode is 'tf' for them and the inputs are rescaled to the zero-centered range [-1, 1].
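To make the pattern concrete, here is a small lookup of the modes used by a few keras.applications models, as I understand the Keras 2.x sources; check the preprocess_input of the version you have installed, since this can change between releases. The PREPROCESS_MODE table and the mode_for helper are hypothetical, not part of the Keras API.

```python
# preprocess_input mode used by a few keras.applications models
# (assumption: Keras 2.x sources; other models and versions may differ).
PREPROCESS_MODE = {
    "vgg16": "caffe",
    "vgg19": "caffe",
    "resnet50": "caffe",
    "inception_v3": "tf",
    "xception": "tf",
    "mobilenet": "tf",
    "mobilenet_v2": "tf",
    "densenet": "torch",
}


def mode_for(model_name):
    """Return the recorded preprocessing mode for a model name (hypothetical helper)."""
    try:
        return PREPROCESS_MODE[model_name.lower()]
    except KeyError:
        raise ValueError(f"no mode recorded for {model_name!r}") from None
```

In practice the safest choice is simply to import preprocess_input from the same module as the model (for example the vgg16 module for VGG16), so the mode always matches the weights.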