How to enable multi-GPU training with the TensorFlow Object Detection API

Question:

I am trying to run multi-GPU training with the TensorFlow Object Detection API.

nvidia-smi shows that only one GPU is actually being used. The other three GPUs have the GPU process loaded onto them, but their memory usage stays at 300MB and their utilization sits at 0% at all times.

I am using the SSD MobileNetV1 based network pretrained on COCO and then training it on my custom dataset.

I expect that when I give TensorFlow more GPUs, the framework will actually use them to speed up training.

Answer:

With the TensorFlow 2.2.0 Object Detection API, set this flag when you run model_main_tf2.py:

python model_main_tf2.py --num_workers=2

For any integer value of --num_workers greater than 1, TensorFlow uses all available GPUs. If you want to use only some of the GPUs, you have to edit model_main_tf2.py at the point where it specifies the strategy, while keeping num_workers at its default of 1. For example, this uses the first and second GPUs of the machine:

strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
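If the set of GPUs changes from run to run, the device list passed to the strategy can be built from plain GPU indices rather than typed by hand. The following is a minimal sketch of that idea, not part of model_main_tf2.py: the helper names gpu_device_names and build_strategy are hypothetical, and it assumes the requested indices correspond to GPUs that TensorFlow can actually see.

```python
from typing import List, Sequence


def gpu_device_names(indices: Sequence[int]) -> List[str]:
    """Turn GPU indices such as [0, 1] into TensorFlow device strings.

    Hypothetical helper for illustration; it only formats strings and
    does not check that the GPUs exist on the machine.
    """
    names = []
    for index in indices:
        if index < 0:
            raise ValueError(f"GPU index must be non-negative, got {index}")
        names.append(f"/gpu:{index}")
    return names


def build_strategy(indices: Sequence[int]):
    """Create a MirroredStrategy restricted to the given GPU indices.

    TensorFlow is imported inside the function so that the string helper
    above can be used (and tested) without TensorFlow installed.
    """
    import tensorflow as tf

    return tf.distribute.MirroredStrategy(devices=gpu_device_names(indices))
```

Calling build_strategy([0, 1]) is then equivalent to the line above. On a machine with at least two GPUs, the resulting strategy's num_replicas_in_sync attribute should report 2, which is a quick way to confirm that both devices were picked up before starting a long training run.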
