Filtering a pandas DataFrame by the maximum value in a column

Question:

I have a DataFrame with repeating values in the index. I would like to filter it down to one row per index value, keeping the row that has the greatest value in a different column. For example, my DataFrame looks like this:

df:

Product ID     Store     Sales
    1            A         50
    1            B        200
    1            C         20
    2            A        400
    2            B         10
    3            A        200
    4            A         50
    4            B        100
    4            C        500

I would like to filter this data down to:

df2:

Product ID     Store     Sales
    1            B        200
    2            A        400
    3            A        200
    4            C        500

Any thoughts on how best to approach this in pandas?

Thanks very much for your time.

Answer:

You can group by 'Product ID' and apply idxmax to the 'Sales' column. This produces a Series holding, for each product, the index label of the row with the highest sales. Because idxmax returns index labels rather than positions, use loc (not iloc) to select those rows from the original DataFrame; the two only coincide when the DataFrame has the default integer index, as it does here.

In [201]:

df.loc[df.groupby('Product ID')['Sales'].idxmax()]
Out[201]:
   Product ID Store  Sales
1           1     B    200
3           2     A    400
5           3     A    200
8           4     C    500
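
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the example data from the question and assumes 'Product ID' is an ordinary column on a default integer index; if it is actually the DataFrame's index, calling df.reset_index() first should put it in the same shape. The variable names (best_rows, df2_alt) and the second approach using sort_values plus drop_duplicates are my own additions for illustration, not part of the original answer.

```python
import pandas as pd

# Example data from the question (assumed to be plain columns).
df = pd.DataFrame({
    "Product ID": [1, 1, 1, 2, 2, 3, 4, 4, 4],
    "Store": list("ABCABAABC"),
    "Sales": [50, 200, 20, 400, 10, 200, 50, 100, 500],
})

# idxmax gives, per product, the index label of the row with the largest Sales.
best_rows = df.groupby("Product ID")["Sales"].idxmax()

# Labels, not positions, so select with loc.
df2 = df.loc[best_rows]

# Alternative sketch: sort by Sales descending, keep the first row per product,
# then restore the Product ID order.
df2_alt = (
    df.sort_values("Sales", ascending=False)
      .drop_duplicates("Product ID")
      .sort_values("Product ID")
)

print(df2)
```

Both approaches keep a single row per product. If two rows of the same product tie for the maximum, idxmax returns the first one it encounters, so only one of the tied rows is kept.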
