Performance of inserting with Python and sqlite3

Problem description:

I'm doing big batch inserts into an SQLite3 database and I'm trying to get a sense for what sort of performance I should be expecting versus what I'm actually seeing.

My table looks like this:

cursor.execute(
    """CREATE TABLE tweets(
         tweet_hash TEXT PRIMARY KEY ON CONFLICT REPLACE,
         tweet_id INTEGER,
         tweet_text TEXT)"""
)

and my inserts look like this:

cursor.executemany("INSERT INTO tweets VALUES (?, ?, ?)", to_write)

where to_write is a list of tuples.
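A minimal runnable sketch of this setup might look like the following (the sample tweets and the SHA-1 hashing scheme are assumptions for illustration, not part of the original question):

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for illustration
cursor = conn.cursor()
cursor.execute(
    """CREATE TABLE tweets(
         tweet_hash TEXT PRIMARY KEY ON CONFLICT REPLACE,
         tweet_id INTEGER,
         tweet_text TEXT)"""
)

# Build to_write as a list of (hash, id, text) tuples; the hash here is
# just a SHA-1 of the text, chosen arbitrarily for the example.
tweets = [(1, "hello world"), (2, "second tweet")]
to_write = [
    (hashlib.sha1(text.encode("utf-8")).hexdigest(), tweet_id, text)
    for tweet_id, text in tweets
]

cursor.executemany("INSERT INTO tweets VALUES (?, ?, ?)", to_write)
conn.commit()
```

Because the primary key is declared ON CONFLICT REPLACE, inserting a tuple with an existing tweet_hash silently replaces the old row rather than raising an error.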

Currently, with about 12 million rows in the database, inserting 50 000 rows is taking me around 16 minutes, running on a 2008 macbook.

Does this sound reasonable, or is there something gross happening?

As I understand it, the main cause of the poor performance is the time wasted committing many separate SQLite transactions. What to do?

Drop the indexes, then run

PRAGMA synchronous = OFF (or NORMAL)

Insert blocks of N rows (pick an N; try N=5000 to start). Before inserting each block, do

BEGIN TRANSACTION

and after inserting the block:

COMMIT

See also http://www.sqlite.org/faq.html#q19
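The steps above can be sketched in Python like this (the bulk_insert helper and its chunk size are illustrative, not a fixed API; note that the connection must be opened with isolation_level=None so that the explicit BEGIN TRANSACTION takes effect instead of the module's implicit one):

```python
import sqlite3

def bulk_insert(conn, rows, n=5000):
    """Insert rows in blocks of n, committing one transaction per block."""
    cur = conn.cursor()
    # Trade durability for speed: SQLite stops fsync'ing after every write.
    cur.execute("PRAGMA synchronous = OFF")
    for start in range(0, len(rows), n):
        cur.execute("BEGIN TRANSACTION")
        cur.executemany("INSERT INTO tweets VALUES (?, ?, ?)",
                        rows[start:start + n])
        conn.commit()

# isolation_level=None puts the connection in autocommit mode, so the
# module issues no implicit BEGINs of its own.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    """CREATE TABLE tweets(
         tweet_hash TEXT PRIMARY KEY ON CONFLICT REPLACE,
         tweet_id INTEGER,
         tweet_text TEXT)"""
)
bulk_insert(conn, [(str(i), i, "tweet %d" % i) for i in range(12000)])
```

With 12 000 rows and n=5000, this performs three commits instead of thousands, which is where most of the speedup comes from; PRAGMA synchronous = OFF adds further gains at the cost of durability if the machine loses power mid-write.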