AWS SageMaker - train locally but deploy to AWS?
I have the following challenge with SageMaker:
- I've downloaded one of the tutorial notebooks (https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_abalone_age_predictor_using_keras/tensorflow_abalone_age_predictor_using_keras.ipynb)
I ran the training locally (successfully) by modifying the following line:
abalone_estimator = TensorFlow(entry_point='abalone.py',
                               role=role,
                               training_steps=100,
                               evaluation_steps=100,
                               hyperparameters={'learning_rate': 0.001},
                               train_instance_count=1,
                               train_instance_type='local')  # the argument I changed
abalone_estimator.fit(inputs)
I then wanted to deploy my model to AWS with the following line, but the SDK seems to deploy it locally (it doesn't fail, I just see it running on my machine):
abalone_predictor = abalone_estimator.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
Any tips on how to either fix this so that it gets deployed to AWS, or alternatively reload my trained model and deploy it to AWS from scratch?
Many thanks, Stephen
It's easier to run the training again on SageMaker. Otherwise, here are the steps you would have to follow:
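- Convert the checkpoint files generated during training into a TensorFlow Serving model.
- Compress them in the specific format and upload them to S3.
- Then create the estimator as done above and run inference.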
If you want details on any of the specific steps above, do let me know, but if your dataset is not too big, I would say just retrain on SageMaker.
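For reference, the second and third steps could look roughly like the sketch below. It assumes the checkpoints have already been exported as a SavedModel directory (step one is not shown), that the serving container expects the archive to contain export/Servo/<version>/ (an assumption to verify against your SDK version), and that you are on the 1.x SageMaker Python SDK used in the question. The helper names package_saved_model and deploy_to_aws, the key prefix, and the version string are all made up for illustration.

```python
import os
import tarfile


def package_saved_model(saved_model_dir, output_path="model.tar.gz", version="1"):
    """Pack a SavedModel directory into a tar.gz under export/Servo/<version>/.

    The export/Servo/<version>/ layout is an assumption about what the
    SageMaker TensorFlow serving container expects; check it for your setup.
    """
    if not os.path.isfile(os.path.join(saved_model_dir, "saved_model.pb")):
        raise FileNotFoundError("saved_model.pb not found in " + saved_model_dir)
    # Tar member names always use forward slashes, regardless of the OS.
    arcname = "/".join(["export", "Servo", version])
    with tarfile.open(output_path, "w:gz") as tar:
        tar.add(saved_model_dir, arcname=arcname)  # adds the directory recursively
    return output_path


def deploy_to_aws(archive_path, role, bucket=None, key_prefix="abalone/model"):
    """Upload the archive to S3 and deploy it to a hosted endpoint.

    Requires the sagemaker package and valid AWS credentials. The constructor
    arguments follow the 1.x SDK; newer SDK versions may need extra arguments
    such as a framework version.
    """
    import sagemaker
    from sagemaker.tensorflow.model import TensorFlowModel

    # A regular Session (not a local-mode one), so the endpoint is created on AWS.
    session = sagemaker.Session()
    # bucket=None makes upload_data fall back to the session's default bucket.
    model_data = session.upload_data(path=archive_path, bucket=bucket,
                                     key_prefix=key_prefix)
    model = TensorFlowModel(model_data=model_data,
                            role=role,
                            entry_point="abalone.py",
                            sagemaker_session=session)
    return model.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")
```

You would call package_saved_model on your exported model directory first, then pass the resulting archive and your IAM role to deploy_to_aws, which returns a predictor like the one abalone_estimator.deploy gives you.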