Low JDBC write speed from Spark to MySQL

Problem description:

I need to write about 1 million rows from a Spark DataFrame to MySQL, but the insert is too slow. How can I improve it?

Here is the code:

# Build the DataFrame and write it straight to MySQL over JDBC
df = sqlContext.createDataFrame(rdd, schema)
df.write.jdbc(url='xx', table='xx', mode='overwrite')

The answer at https://stackoverflow.com/a/10617768/3318517 worked for me: add rewriteBatchedStatements=true to the connection URL. (See "Configuration Properties" in the MySQL Connector/J documentation.)
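
For reference, a minimal sketch of what the write looks like with the flag in the URL; the host, port, database, table, and credential values below are placeholders I made up, not details from the original post:

# rewriteBatchedStatements=true lets Connector/J rewrite the row-by-row
# INSERT statements Spark issues into multi-row batched inserts.
url = 'jdbc:mysql://host:3306/mydb?rewriteBatchedStatements=true'
df.write.jdbc(url=url, table='mytable', mode='overwrite',
              properties={'user': 'xx', 'password': 'xx'})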

My benchmark went from 3325 seconds to 42 seconds!