Python Pandas Dataframe按组中的最大值选择行
我有一个通过df.pivot创建的数据框:
I have a dataframe which was created via a df.pivot:
type start end
F_Type to_date
A 20150908143000 345 316
B 20150908140300 NaN 480
20150908140600 NaN 120
20150908143000 10743 8803
C 20150908140100 NaN 1715
20150908140200 NaN 1062
20150908141000 NaN 145
20150908141500 418 NaN
20150908141800 NaN 450
20150908142900 1973 1499
20150908143000 19522 16659
D 20150908143000 433 65
E 20150908143000 7290 7375
F 20150908143000 0 0
G 20150908143000 1796 340
我想为每个"F_TYPE"过滤并返回一行,只返回最大"to_date"的行.我想返回以下数据框:
I would like to filter and return a single row for each 'F_TYPE' only returning the row with the Maximum 'to_date'. I would like to return the following dataframe:
type start end
F_Type to_date
A 20150908143000 345 316
B 20150908143000 10743 8803
C 20150908143000 19522 16659
D 20150908143000 433 65
E 20150908143000 7290 7375
F 20150908143000 0 0
G 20150908143000 1796 340
谢谢..
一种标准方法是使用groupby(keys)[column].idxmax()
.
但是,要使用 idxmax
您需要idxmax
返回唯一索引值.获取唯一索引的一种方法是调用reset_index
.
A standard approach is to use groupby(keys)[column].idxmax()
.
However, to select the desired rows using idxmax
you need idxmax
to return unique index values. One way to obtain a unique index is to call reset_index
.
一旦从groupby(keys)[column].idxmax()
获得索引值,就可以使用df.loc
选择整个行:
Once you obtain the index values from groupby(keys)[column].idxmax()
you can then select the entire row using df.loc
:
In [20]: df.loc[df.reset_index().groupby(['F_Type'])['to_date'].idxmax()]
Out[20]:
start end
F_Type to_date
A 20150908143000 345 316
B 20150908143000 10743 8803
C 20150908143000 19522 16659
D 20150908143000 433 65
E 20150908143000 7290 7375
F 20150908143000 0 0
G 20150908143000 1796 340
注意:idxmax
返回索引 labels ,不一定是普通索引.使用reset_index
后,索引标签也恰好是普通标签,但由于idxmax
返回标签(不是普通标签),最好始终始终将idxmax
与df.loc
结合使用,而不是df.iloc
(就像我最初在这篇文章中所做的那样.)
Note: idxmax
returns index labels, not necessarily ordinals. After using reset_index
the index labels happen to also be ordinals, but since idxmax
is returning labels (not ordinals) it is better to always use idxmax
in conjunction with df.loc
, not df.iloc
(as I originally did in this post.)