Low JDBC write speed from Spark to MySQL
Problem description:
I need to write about 1 million rows from a Spark DataFrame to MySQL, but the insert is too slow. How can I improve it?
Here is the code:
# Build the DataFrame and write it to MySQL over JDBC
df = sqlContext.createDataFrame(rdd, schema)
df.write.jdbc(url='xx', table='xx', mode='overwrite')
Answer:
The answer at https://stackoverflow.com/a/10617768/3318517 worked for me. Add rewriteBatchedStatements=true to the connection URL; with it, Connector/J rewrites batched single-row INSERTs into multi-row INSERT statements instead of sending each statement as a separate round trip. (See the Configuration Properties page for Connector/J.)
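For example, the write above can be changed like this. A minimal sketch, assuming placeholder connection details (host, port, database, table name, and credentials are all hypothetical):

# Appending rewriteBatchedStatements=true to the URL lets Connector/J
# coalesce the batched INSERTs into multi-row INSERT statements.
url = 'jdbc:mysql://host:3306/mydb?rewriteBatchedStatements=true'
df.write.jdbc(
    url=url,
    table='mytable',  # placeholder table name
    mode='overwrite',
    properties={'user': 'user', 'password': 'password'},  # placeholder credentials
)

Note that Spark's JDBC writer already groups rows into batches per partition; the URL flag is what makes MySQL actually execute them as multi-row inserts rather than one statement at a time.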
My benchmark went from 3325 seconds to 42 seconds!