How to load data from a large database into pandas?
I have a Postgres database which contains time series data. The size of the database is around 1 GB. Currently, to read the data, this is what I do:
import psycopg2
import pandas as pd
import pandas.io.sql as psql

# Connect to the local Postgres instance
conn = psycopg2.connect(database="metrics", user="*******", password="*******", host="localhost", port="5432")

# Read the whole table into a single DataFrame
df = psql.read_sql("SELECT * FROM timeseries", conn)
print(df)
But this loads the entire dataset into memory. Now, I am aware of techniques where the database can be dumped to a CSV file and the CSV file can then be read in chunks, as suggested here: How to read a 6 GB csv file with pandas.
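For reference, this is roughly the chunked CSV reading I have in mind (a minimal sketch; the file name and chunk size are only illustrative):

import pandas as pd

# Read a large CSV dump in fixed-size chunks instead of all at once
# (file name and chunk size are placeholders)
for chunk in pd.read_csv("timeseries_dump.csv", chunksize=100_000):
    print(chunk.shape)  # placeholder: process each chunk here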
But for me that is not an option, since the database will be continuously changing and I need to read it on the fly. Is there any technique to read the database content in chunks, or any third-party library that can do this?
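Something along these lines is what I am hoping for. This is only a rough sketch based on the chunksize argument that pandas' read_sql accepts; I don't know whether it is the right approach for a table that keeps changing:

import psycopg2
import pandas as pd

conn = psycopg2.connect(database="metrics", user="*******", password="*******", host="localhost", port="5432")

# With chunksize set, read_sql returns an iterator of DataFrames
# instead of loading the whole table at once (chunk size is illustrative)
for chunk in pd.read_sql("SELECT * FROM timeseries", conn, chunksize=50_000):
    print(chunk.shape)  # placeholder: process each chunk here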