How to upsert a pandas DataFrame into a PostgreSQL table?

Question:

I've scraped some data from web sources and stored it all in a pandas DataFrame. Now, in order to harness the powerful database tools that SQLAlchemy provides, I want to convert the DataFrame into a Table() object and eventually upsert all of the data into a PostgreSQL table. If this is practical, what is a workable way to accomplish it?

If you are using PostgreSQL 9.5 or later, you can perform the UPSERT by uploading the DataFrame to a temporary table and then merging it into the target table with an INSERT ... ON CONFLICT statement:

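import pandas as pd
import sqlalchemy as sa

# the connection URL is a placeholder (assumption); substitute your own
engine = sa.create_engine("postgresql+psycopg2://user:password@localhost/mydb")

with engine.begin() as conn: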
    # step 0.0 - create test environment
    conn.execute(sa.text("DROP TABLE IF EXISTS main_table"))
    conn.execute(
        sa.text(
            "CREATE TABLE main_table (id int primary key, txt varchar(50))"
        )
    )
    conn.execute(
        sa.text(
            "INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
        )
    )
    # step 0.1 - create DataFrame to UPSERT
    df = pd.DataFrame(
        [(2, "new row 2 text"), (1, "row 1 new text")], columns=["id", "txt"]
    )
    
    # step 1 - create temporary table and upload DataFrame
    conn.execute(
        sa.text(
            "CREATE TEMPORARY TABLE temp_table (id int primary key, txt varchar(50))"
        )
    )
    df.to_sql("temp_table", conn, index=False, if_exists="append")

    # step 2 - merge temp_table into main_table
    conn.execute(
        sa.text("""\
            INSERT INTO main_table (id, txt) 
            SELECT id, txt FROM temp_table
            ON CONFLICT (id) DO
                UPDATE SET txt = EXCLUDED.txt
            """
        )
    )

    # step 3 - confirm results
    result = conn.execute(sa.text("SELECT * FROM main_table ORDER BY id")).fetchall()
    print(result)  # [(1, 'row 1 new text'), (2, 'new row 2 text')]
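
The merge statement in step 2 is written out by hand for the two columns id and txt. For a DataFrame with more columns, the same statement could be generated from the column names. The following is only a minimal sketch: build_upsert_sql is a hypothetical helper (it is not part of pandas or SQLAlchemy), and it assumes the table and column names are trusted, since they are interpolated into the SQL text without quoting.

```python
def build_upsert_sql(main_table, temp_table, columns, key_columns):
    """Build an INSERT ... ON CONFLICT statement that merges temp_table into main_table.

    Hypothetical helper for illustration. Identifiers are inserted as-is,
    so they must come from a trusted source (assumption).
    """
    # every non-key column is overwritten with the incoming (EXCLUDED) value
    update_columns = [c for c in columns if c not in key_columns]
    column_list = ", ".join(columns)
    key_list = ", ".join(key_columns)
    if update_columns:
        assignments = ", ".join(f"{c} = EXCLUDED.{c}" for c in update_columns)
        conflict_action = f"DO UPDATE SET {assignments}"
    else:
        # nothing to update when the table consists of key columns only
        conflict_action = "DO NOTHING"
    return (
        f"INSERT INTO {main_table} ({column_list}) "
        f"SELECT {column_list} FROM {temp_table} "
        f"ON CONFLICT ({key_list}) {conflict_action}"
    )


sql = build_upsert_sql("main_table", "temp_table", ["id", "txt"], ["id"])
print(sql)
```

For the example tables above this yields the same statement as step 2, on a single line. With a real DataFrame the column list could be taken from list(df.columns), and the resulting string passed to conn.execute(sa.text(sql)).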