Convert a Pandas column of string quarter-and-year arrays to a datetime column
Question:
I have the following dataframe:
Date Data
0 [Q1, 10] 8.7
1 [Q2, 10] 8.4
2 [Q3, 10] 14.1
3 [Q4, 10] 16.2
4 [Q1, 11] 18.6
5 [Q2, 11] 20.4
6 [Q3, 11] 17.1
7 [Q4, 11] 37.0
8 [Q1, 12] 35.1
9 [Q2, 12] 26.0
10 [Q3, 12] 26.9
11 [Q4, 12] 47.8
12 [Q1, 13] 37.4
13 [Q2, 13] 31.2
14 [Q3, 13] 33.8
15 [Q4, 13] 51.0
16 [Q1, 14] 43.7
17 [Q2, 14] 35.2
18 [Q3, 14] 39.3
19 [Q4, 14] 74.5
20 [Q1, 15] 61.2
21 [Q2, 15] 47.5
22 [Q3, 15] 48.0
23 [Q4, 15] 74.8
24 [Q1, 16] 51.2
25 [Q2, 16] 40.4
26 [Q3, 16] 45.5
27 [Q4, 16] 78.3
28 [Q1, 17] 50.8
29 [Q2, 17] 38.5
30 [Q3, 17] 46.7
31 [Q4, 17] 77.3
32 [Q1, 18] 52.2
33 [Q2, 18] 41.3
34 [Q3, 18] 46.9
35 [Q4, 18] 68.4
36 [Q1, 19] 36.4
37 [Q2, 19] 33.8
38 [Q3, 19] 46.6
39 [Q4, 19] 73.8
40 [Q1, 20] 36.7
41 [Q2, 20] 37.6
I want to convert the Date column to datetime objects, so that Q1, 10 becomes Q1, 2010 and then 2010-03-31.
I tried the following code:
df['Date'] = pd.to_datetime(df['Date'].str.join('20'))
but it does not work.
I also tried using
df['Date'].astype(str)[:1]
to access the second element of each entry in the series and add a 20 at the front, but it won't let me.
What is the best way to convert this series into a pandas datetime column?
Answer:
First create a quarterly PeriodIndex, then convert it to datetimes with PeriodIndex.to_timestamp and floor to days with DatetimeIndex.floor:
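# if the column holds strings such as '[Q1, 10]', convert them to lists first
# (split on ', ' so the year does not keep a leading space)
df['Date'] = df['Date'].str.strip('[]').str.split(', ')
# check that the joined strings match the expected format, e.g. 2010Q1
print('20' + df['Date'].str[::-1].str.join(''))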
0 2010Q1
1 2010Q2
2 2010Q3
3 2010Q4
4 2011Q1
5 2011Q2
Name: Date, dtype: object
df['Date'] = (pd.PeriodIndex('20' + df['Date'].str[::-1].str.join(''), freq='Q')
                .to_timestamp(how='e')
                .floor('D'))
print(df)
Date Data
0 2010-03-31 8.7
1 2010-06-30 8.4
2 2010-09-30 14.1
3 2010-12-31 16.2
4 2011-03-31 18.6
5 2011-06-30 20.4
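For reference, the steps above can be put together as a self-contained sketch. It assumes the Date column already holds lists of two strings (quarter, two-digit year) and that every year falls in the 2000s; the sample frame is just the first six rows of the question's data, and the variable name quarters is only illustrative.

```python
import pandas as pd

# Sample data: first six rows from the question, with Date stored
# as lists of strings (an assumption about the column's contents).
df = pd.DataFrame({
    "Date": [["Q1", "10"], ["Q2", "10"], ["Q3", "10"],
             ["Q4", "10"], ["Q1", "11"], ["Q2", "11"]],
    "Data": [8.7, 8.4, 14.1, 16.2, 18.6, 20.4],
})

# Build strings such as '2010Q1': century prefix + year + quarter.
quarters = "20" + df["Date"].str[1] + df["Date"].str[0]

# Parse as quarterly periods, take the end of each quarter,
# then floor to midnight to drop the time-of-day part.
df["Date"] = (pd.PeriodIndex(quarters, freq="Q")
                .to_timestamp(how="end")
                .floor("D"))

print(df)
```

Indexing the list elements explicitly with .str[1] and .str[0] does the same job as reversing with .str[::-1] and joining; it is used here only because it makes the order of the parts easier to read.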
Alternative that converts to Periods:
df['Date'] = (df['Date'].str[::-1].str.join('').apply(lambda x: pd.Period(x, freq='Q'))
.dt.to_timestamp(how='e')
.dt.floor('d'))
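To see why the floor step is there, it may help to look at a single Period. The sketch below uses one hand-picked quarter (2010Q1) purely as an illustration: the end timestamp of a period is its last instant, not midnight, so flooring to days is what produces a clean date.

```python
import pandas as pd

# One quarterly period, chosen only for illustration.
p = pd.Period("2010Q1", freq="Q")

# how="end" returns the last instant of the quarter,
# i.e. a timestamp late on 31 March 2010.
end = p.to_timestamp(how="end")
print(end)

# Flooring to days drops the time-of-day part.
print(end.floor("D"))

# For comparison, the start of the same quarter.
print(p.start_time)
```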
Or a solution from @MrFuppes, thank you:
df['Date'] = (pd.to_datetime("20"+df['Date'].str[::-1].str.join('')) +
pd.offsets.QuarterEnd(0))
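A small sketch of what QuarterEnd(0) does in that last variant, using hand-picked timestamps as assumptions: with n=0 the offset rolls a date forward to the next quarter end, and leaves a date that is already a quarter end unchanged.

```python
import pandas as pd

# First day of a quarter rolls forward to that quarter's end.
start = pd.Timestamp("2010-01-01")
rolled = start + pd.offsets.QuarterEnd(0)
print(rolled)

# A date already on a quarter end is left as it is with n=0 ...
on_end = pd.Timestamp("2010-03-31")
unchanged = on_end + pd.offsets.QuarterEnd(0)
print(unchanged)

# ... whereas n=1 would move it to the following quarter end.
moved = on_end + pd.offsets.QuarterEnd(1)
print(moved)
```

This is why the variant works when the parsed dates fall on the first day of each quarter: adding QuarterEnd(0) moves each one to the last day of the same quarter.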