pandas: how to split a DataFrame by time interval on a column
I have a very large DataFrame with a datetime column called dt, and the DataFrame is already sorted by dt. I want to split it into several DataFrames based on dt, so that each one contains the rows within a 1-hour range.
Split
dt text
0 20160811 11:05 a
1 20160811 11:35 b
2 20160811 12:03 c
3 20160811 12:36 d
4 20160811 12:52 e
5 20160811 14:32 f
into
dt text
0 20160811 11:05 a
1 20160811 11:35 b
2 20160811 12:03 c
dt text
0 20160811 12:36 d
1 20160811 12:52 e
dt text
0 20160811 14:32 f
You need groupby, keyed on the number of whole hours elapsed since the first dt value.
List comprehension solution:
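import pandas as pd

# Parse the string column; the format is stated explicitly to match the sample data
S = pd.to_datetime(df['dt'], format='%Y%m%d %H:%M')
# Whole hours elapsed since the first row. Older pandas also accepted
# .astype('timedelta64[h]') here, but pandas 2.x rejects that resolution.
print((S - S[0]) // pd.Timedelta(hours=1))
0    0
1    0
2    0
3    1
4    1
5    3
Name: dt, dtype: int64
L = [g.reset_index(drop=True) for i, g in df.groupby((S - S[0]) // pd.Timedelta(hours=1))]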
print (L[0])
dt text
0 20160811 11:05 a
1 20160811 11:35 b
2 20160811 12:03 c
print (L[1])
dt text
0 20160811 12:36 d
1 20160811 12:52 e
print (L[2])
dt text
0 20160811 14:32 f
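The same idea can be packaged as a small helper. This is a minimal sketch rather than part of the original answer: the function name split_by_elapsed_hours and its parameters col, hours and fmt are hypothetical, and the format string is assumed from the sample data. It counts whole elapsed hours with floor division by pd.Timedelta, so it does not depend on the astype unit conversion.

```python
import pandas as pd


def split_by_elapsed_hours(df, col="dt", hours=1, fmt="%Y%m%d %H:%M"):
    """Split df into a list of frames, one per `hours`-wide window.

    Windows are measured from the first timestamp in `col`
    (hypothetical helper; names and defaults are illustrative).
    """
    ts = pd.to_datetime(df[col], format=fmt)
    # Integer window number for each row: 0 for the first window, 1 for the next, ...
    bins = (ts - ts.iloc[0]) // pd.Timedelta(hours=hours)
    # Windows that contain no rows simply produce no group
    return [g.reset_index(drop=True) for _, g in df.groupby(bins)]


# Sample data copied from the question
df = pd.DataFrame({
    "dt": ["20160811 11:05", "20160811 11:35", "20160811 12:03",
           "20160811 12:36", "20160811 12:52", "20160811 14:32"],
    "text": list("abcdef"),
})

parts = split_by_elapsed_hours(df)
for part in parts:
    print(part)
```

Passing hours=2 would, under the same assumptions, give two-hour windows instead.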
Old solution, which splits by clock hour:
You can use groupby by dt.hour, but first you need to convert dt with to_datetime:
for i, g in df.groupby([pd.to_datetime(df.dt).dt.hour]):
    print(g.reset_index(drop=True))
dt text
0 20160811 11:05 a
1 20160811 11:35 b
dt text
0 20160811 12:03 c
1 20160811 12:36 d
2 20160811 12:52 e
dt text
0 20160811 14:32 f
List comprehension solution:
L = [g.reset_index(drop=True) for i, g in df.groupby([pd.to_datetime(df.dt).dt.hour])]
print (L[0])
dt text
0 20160811 11:05 a
1 20160811 11:35 b
print (L[1])
dt text
0 20160811 12:03 c
1 20160811 12:36 d
2 20160811 12:52 e
print (L[2])
dt text
0 20160811 14:32 f
Or use a list comprehension after converting column dt to datetime:
df['dt'] = pd.to_datetime(df['dt'])
L = [g.reset_index(drop=True) for i, g in df.groupby([df['dt'].dt.hour])]
print (L[1])
dt text
0 2016-08-11 12:03:00 c
1 2016-08-11 12:36:00 d
2 2016-08-11 12:52:00 e
print (L[2])
dt text
0 2016-08-11 14:32:00 f
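The clock-hour grouping and the elapsed-hour grouping put the 12:03 row in different groups. The comparison below is a small sketch on the question's sample data to make that visible; the variable names and the explicit format string are assumptions for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "dt": ["20160811 11:05", "20160811 11:35", "20160811 12:03",
           "20160811 12:36", "20160811 12:52", "20160811 14:32"],
    "text": list("abcdef"),
})

ts = pd.to_datetime(df["dt"], format="%Y%m%d %H:%M")

# Key used by the old solution: the hour on the clock (11, 12, 14, ...)
clock_hour = ts.dt.hour
# Key used by the first solution: whole hours since the first row
elapsed_hour = (ts - ts.iloc[0]) // pd.Timedelta(hours=1)

comparison = pd.DataFrame({
    "text": df["text"],
    "clock_hour": clock_hour,
    "elapsed_hour": elapsed_hour,
})
print(comparison)
```

Row c (12:03) falls in clock hour 12 but is still within the first 60 minutes after 11:05, which is why the two solutions produce different splits.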
If you need to split by dates and hours:
#changed dataframe for testing
print (df)
dt text
0 20160811 11:05 a
1 20160812 11:35 b
2 20160813 12:03 c
3 20160811 12:36 d
4 20160811 12:52 e
5 20160811 14:32 f
serie = pd.to_datetime(df.dt)
for i, g in df.groupby([serie.dt.date, serie.dt.hour]):
    print(g.reset_index(drop=True))
dt text
0 20160811 11:05 a
dt text
0 20160811 12:36 d
1 20160811 12:52 e
dt text
0 20160811 14:32 f
dt text
0 20160812 11:35 b
dt text
0 20160813 12:03 c
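A possible variation on the date-and-hour split is to group on a single key, the timestamp rounded down to the start of its hour, and keep the pieces in a dict so each one can be looked up by that hour. This is a sketch under assumptions: the names hour_start and frames are illustrative, and the format string is taken from the sample data.

```python
import pandas as pd

# Test data copied from the example above (several different dates)
df = pd.DataFrame({
    "dt": ["20160811 11:05", "20160812 11:35", "20160813 12:03",
           "20160811 12:36", "20160811 12:52", "20160811 14:32"],
    "text": list("abcdef"),
})

ts = pd.to_datetime(df["dt"], format="%Y%m%d %H:%M")

# Round each timestamp down to the start of its hour; date and hour
# are then carried by one key instead of two
hour_start = ts.dt.floor("h")

# Map each hour start (a pd.Timestamp) to the rows that fall in that hour
frames = {key: g.reset_index(drop=True) for key, g in df.groupby(hour_start)}

for key, frame in frames.items():
    print(key)
    print(frame)
```

A single piece can then be fetched with, for example, frames[pd.Timestamp("2016-08-11 12:00")].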