How to move data from Glue to DynamoDB
We are designing a big data solution for one of our dashboard applications and seriously considering Glue for our initial ETL. Currently Glue supports JDBC and S3 as targets, but our downstream services and components would work better with DynamoDB. We are wondering what the best approach is to eventually move the records from Glue to DynamoDB.
Should we write to S3 first and then run Lambdas to insert the data into DynamoDB? Is that the best practice? Or should we use a third-party JDBC wrapper for DynamoDB and have Glue write directly to DynamoDB (not sure if this is possible; it sounds a bit scary)? Or should we do something else?
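For context, the first option could look something like the sketch below: the Glue job writes JSON Lines objects to S3, and an S3-triggered Lambda loads each new object into the table. The table name `dashboard-records` and the JSON Lines file format are assumptions for illustration, not anything prescribed by Glue.

```python
import json


def parse_jsonl(body: bytes) -> list:
    """Parse an S3 object written as JSON Lines (one record per line)."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def handler(event, context):
    """Lambda triggered by S3 ObjectCreated events: copy each new
    object's records into DynamoDB. 'dashboard-records' is a placeholder
    table name."""
    import boto3  # provided by the Lambda runtime

    s3 = boto3.client("s3")
    table = boto3.resource("dynamodb").Table("dashboard-records")
    for rec in event["Records"]:
        bucket = rec["s3"]["bucket"]["name"]
        key = rec["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # batch_writer buffers puts into 25-item BatchWriteItem calls
        # and automatically retries unprocessed items
        with table.batch_writer() as writer:
            for item in parse_jsonl(body):
                writer.put_item(Item=item)
```

One caveat with this route: the boto3 resource layer rejects Python floats, so numeric values would need to be converted to `Decimal` before `put_item`.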
Any help is greatly appreciated. Thanks!
You can add the following lines to your Glue ETL script:
from awsglue.dynamicframe import DynamicFrame

# Convert the Spark DataFrame to a DynamicFrame and write it to DynamoDB
glueContext.write_dynamic_frame.from_options(
    frame=DynamicFrame.fromDF(df, glueContext, "final_df"),
    connection_type="dynamodb",
    connection_options={"tableName": "pceg_ae_test"},
)

Here df is a Spark DataFrame; DynamicFrame.fromDF converts it into the DynamicFrame that the sink expects.