Filter a pandas DataFrame by the maximum value in a column
I have a DataFrame with repeating values in the index. I would like to filter this dataset down to only show me one instance of each index by selecting the row within the index with the greatest value in a different column. For example, my DataFrame looks like this:
df:
Product ID  Store  Sales
1           A         50
1           B        200
1           C         20
2           A        400
2           B         10
3           A        200
4           A         50
4           B        100
4           C        500
I would like to filter this data down to:
df2:
Product ID  Store  Sales
1           B        200
2           A        400
3           A        200
4           C        500
Any thoughts on how best to approach this issue in pandas?
Thanks very much for your time -
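You can perform a groupby on the product ID column, then apply idxmax to the 'Sales' column. This produces a Series holding, for each product, the index label of the row with the highest sales. Because idxmax returns labels rather than positions, use those values with loc to select rows from the original DataFrame (iloc gives the same result only while the DataFrame has the default 0..n-1 index, as it does here):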
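In [201]:
# The column is named Product_ID in this session, as the output below shows.
# idxmax returns index labels, so select with .loc rather than .iloc.
df.loc[df.groupby('Product_ID')['Sales'].idxmax()]
Out[201]:
   Product_ID Store  Sales
1           1     B    200
3           2     A    400
5           3     A    200
8           4     C    500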
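For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question with 'Product ID' as an ordinary column, which is an assumption, since the question describes it as the index. The second half shows one possible way to handle the index case; the names best_rows, indexed, flat and df2_indexed are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; 'Product ID' is an ordinary column here.
df = pd.DataFrame({
    "Product ID": [1, 1, 1, 2, 2, 3, 4, 4, 4],
    "Store": ["A", "B", "C", "A", "B", "A", "A", "B", "C"],
    "Sales": [50, 200, 20, 400, 10, 200, 50, 100, 500],
})

# For each product, idxmax gives the index label of the row with the largest Sales.
best_rows = df.groupby("Product ID")["Sales"].idxmax()

# Labels belong with .loc; .iloc would only agree while the index is 0..n-1.
df2 = df.loc[best_rows]

# Assumed variant: 'Product ID' is the index, as the question describes.
# The index labels then repeat, so move them into a column, select, and restore.
indexed = df.set_index("Product ID")
flat = indexed.reset_index()
df2_indexed = (
    flat.loc[flat.groupby("Product ID")["Sales"].idxmax()]
    .set_index("Product ID")
)
```

If two rows of the same product tie for the highest sales, idxmax returns the first one, so you still get exactly one row per product.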