Calculate a column from a datetime range and group in pandas
I want to calculate the maximum value per week per group and create a new column with these values in pandas. I posted a similar question, but the answers did not solve my problem, so I have restructured the question.
Consider a dataframe with timestamp, group and value columns:
datetime group value
2014-05-07 A 3
2014-05-07 B 4
2014-05-14 A 4
2014-05-14 B 2
2014-05-15 A 6
2014-05-15 B 4
2014-05-16 A 7
2014-05-16 B 10
I want to create a new column with the maximum value per week by group:
datetime group value maxval
2014-05-07 A 3 3
2014-05-07 B 4 4
2014-05-14 A 4 7
2014-05-14 B 2 10
2014-05-15 A 6 7
2014-05-15 B 4 10
2014-05-16 A 7 7
2014-05-16 B 10 10
In the linked question, the proposed solution was to transform a groupby and then attach the result to the dataframe; however, this creates ordering errors in the resulting series.
You can transform groups indexed on both group and the week simultaneously:
>>> week = pd.DatetimeIndex(df.datetime).week
>>> df["maxval"] = df.groupby(['group', week])["value"].transform('max')
>>> df
datetime group value maxval
0 2014-05-07 A 3 3
1 2014-05-07 B 4 4
2 2014-05-14 A 4 7
3 2014-05-14 B 2 10
4 2014-05-15 A 6 7
5 2014-05-15 B 4 10
6 2014-05-16 A 7 7
7 2014-05-16 B 10 10
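In recent pandas releases the .week attribute used above is no longer available on DatetimeIndex, so the same grouping may need to be spelled differently. The following is a minimal sketch, assuming a pandas version that provides Series.dt.isocalendar() and a dataframe rebuilt from the sample in the question; it groups on the ISO week number and is not necessarily identical to what the original snippet did in every edge case.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2014-05-07", "2014-05-07", "2014-05-14", "2014-05-14",
        "2014-05-15", "2014-05-15", "2014-05-16", "2014-05-16",
    ]),
    "group": ["A", "B", "A", "B", "A", "B", "A", "B"],
    "value": [3, 4, 4, 2, 6, 4, 7, 10],
})

# ISO week number for each row, as a Series aligned with df's index.
week = df["datetime"].dt.isocalendar().week

# transform returns one value per original row, in the original row order,
# so the result can be assigned straight back as a new column.
df["maxval"] = df.groupby(["group", week])["value"].transform("max")
print(df)
```

Because transform keeps the original row order, no separate merge or sort step is needed to line the maxima up with the rows.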
Note that if your data spans multiple years, this will combine (for example) the second week of each year into the same group. Sometimes people want that, but if you don't, you can add the year to the grouping keys in the same way.
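To illustrate the multi-year caveat, here is a small sketch with made-up data (the dates and values are hypothetical, chosen so that all rows fall in ISO week 2 of two different years). It assumes a pandas version with Series.dt.isocalendar(), and it uses the ISO year, which can differ from the calendar year for dates close to New Year.

```python
import pandas as pd

# Hypothetical data: four rows, all in ISO week 2, spread over two years.
df = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2014-01-07", "2014-01-08", "2015-01-06", "2015-01-07"]
    ),
    "group": ["A", "A", "A", "A"],
    "value": [1, 5, 2, 9],
})

iso = df["datetime"].dt.isocalendar()  # columns: year, week, day

# Grouping on the week alone pools week 2 of 2014 with week 2 of 2015.
df["max_week_only"] = (
    df.groupby(["group", iso["week"]])["value"].transform("max")
)

# Adding the ISO year as a key keeps the two years apart.
df["max_year_week"] = (
    df.groupby(["group", iso["year"], iso["week"]])["value"].transform("max")
)
print(df)
```

With the week as the only time key every row gets 9; with the year added, the 2014 rows get 5 and the 2015 rows get 9.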
If you want a rolling maximum instead, you can use (appropriately enough) rolling_max. You can either resample yourself or get rolling_max to do it, something like
def rolling_max_week(x):
    rolled = pd.rolling_max(x, 7, min_periods=1, center=True, freq='d')
    match_x = rolled.loc[x.index]
    return match_x
df["datetime"] = pd.to_datetime(df["datetime"])
df = df.set_index("datetime")
df["rolling_max"] = df.groupby("group")["value"].transform(rolling_max_week)
df["bin_max"] = df.groupby(["group", df.index.week])["value"].transform(max)
As it happens, these two produce exactly the same output on your sample:
>>> df
group value rolling_max bin_max
datetime
2014-05-07 A 3 3 3
2014-05-07 B 4 4 4
2014-05-14 A 4 7 7
2014-05-14 B 2 10 10
2014-05-15 A 6 7 7
2014-05-15 B 4 10 10
2014-05-16 A 7 7 7
2014-05-16 B 10 10 10
but that won't be true in general. You'll want to read the documentation for rolling_max and play around with some test cases to be sure that I'm interpreting what you want correctly.
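As one such test case, the sketch below uses hypothetical data in which a week boundary falls between two nearby dates, so the rolling and binned maxima differ. It also assumes a recent pandas release, where the top-level pd.rolling_max function is no longer available; the resample("D") plus .rolling(7, center=True) combination is my approximation of the freq='d' behaviour above, not a guaranteed drop-in replacement.

```python
import pandas as pd

def rolling_max_week(x):
    # x holds one group's values, indexed by datetime.
    daily = x.resample("D").max()  # daily grid, NaN on days without data
    rolled = daily.rolling(7, min_periods=1, center=True).max()
    return rolled.loc[x.index]     # keep only the dates present in x

# Hypothetical data: a Friday, the following Monday, and the Friday after.
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2014-05-09", "2014-05-12", "2014-05-16"]),
    "group": ["A", "A", "A"],
    "value": [1, 8, 3],
})

# Rolling maximum over a centered 7-day window, per group.
indexed = df.set_index("datetime")
rolled = indexed.groupby("group")["value"].transform(rolling_max_week)
df["rolling_max"] = rolled.to_numpy()  # transform preserves row order

# Binned maximum per ISO week, per group.
week = df["datetime"].dt.isocalendar().week
df["bin_max"] = df.groupby(["group", week])["value"].transform("max")
print(df)
```

Here the rolling window around 2014-05-09 reaches across the week boundary and picks up the 8 from the following Monday, while the weekly bin for that date contains only the 1. The rolling column comes back as floats because resampling introduces NaN for the missing days.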