Grouping data by value ranges

Question:

I have a CSV file that lists parts on order. The columns include days late, qty, and commodity.

I need to group the data by days late and commodity, summing the qty. However, days late needs to be grouped into ranges:

>56
>35 and <= 56
>14 and <= 35
>0 and <=14

I was hoping I could use a dict somehow, something like this:

{'Red': '>56', 'Amber': '>35 and <= 56', 'Yellow': '>14 and <= 35', 'White': '>0 and <=14'}

I am looking for a result like this:

        Red  Amber  Yellow  White
STRSUB  56   60     74      40
BOTDWG  20   67     87      34

I am new to pandas, so I don't know whether this is possible at all. Could anyone offer some advice?

Thanks

Answer:

Suppose you start with this data:

df = pd.DataFrame({'ID': ('STRSUB BOTDWG'.split())*4,
                   'Days Late': [60, 60, 50, 50, 20, 20, 10, 10],
                   'quantity': [56, 20, 60, 67, 74, 87, 40, 34]})
#        ID  Days Late  quantity
# 0  STRSUB         60        56
# 1  BOTDWG         60        20
# 2  STRSUB         50        60
# 3  BOTDWG         50        67
# 4  STRSUB         20        74
# 5  BOTDWG         20        87
# 6  STRSUB         10        40
# 7  BOTDWG         10        34

Then you can find the status category using pd.cut. Note that by default (right=True), pd.cut splits the Series df['Days Late'] into half-open intervals that are closed on the right: (-1, 14], (14, 35], (35, 56], (56, 365]. With labels=False it returns the integer position of each value's bin, which is then used to look up the label name:

df['status'] = pd.cut(df['Days Late'], bins=[-1, 14, 35, 56, 365], labels=False)
labels = np.array('White Yellow Amber Red'.split())
df['status'] = labels[df['status']]
del df['Days Late']
print(df)
#        ID  quantity  status
# 0  STRSUB        56     Red
# 1  BOTDWG        20     Red
# 2  STRSUB        60   Amber
# 3  BOTDWG        67   Amber
# 4  STRSUB        74  Yellow
# 5  BOTDWG        87  Yellow
# 6  STRSUB        40   White
# 7  BOTDWG        34   White
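Since the question asked about driving the ranges from a dict, here is a minimal sketch of one way to do that. The dict name upper_bounds and its layout (label mapped to the inclusive upper edge of its range, listed in ascending order) are assumptions made for illustration, not something from the original post. It also passes the label names straight to pd.cut instead of indexing a NumPy array, and uses 0 as the lowest edge so that the ranges match the question's ">0" exactly (a value of 0 would then get no label).

```python
import pandas as pd

# Hypothetical mapping: label -> inclusive upper edge, in ascending order.
upper_bounds = {"White": 14, "Yellow": 35, "Amber": 56, "Red": float("inf")}

# Bin edges are the lower limit followed by each upper edge.
bins = [0, *upper_bounds.values()]
labels = list(upper_bounds)

days_late = pd.Series([1, 14, 15, 35, 36, 56, 57])

# right=True is the default, so each interval is (lower, upper].
status = pd.cut(days_late, bins=bins, labels=labels)
print(status.tolist())
# ['White', 'White', 'Yellow', 'Yellow', 'Amber', 'Amber', 'Red']
```

The boundary values 14, 35 and 56 fall into the lower of the two neighbouring ranges, which matches the "<=" conditions in the question.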

Now use pivot to get the DataFrame in the desired form:

df = df.pivot(index='ID', columns='status', values='quantity')

and use reindex to obtain the desired order for the rows and columns:

df = df.reindex(columns=labels[::-1], index=df.index[::-1])

Thus,

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': ('STRSUB BOTDWG'.split())*4,
                   'Days Late': [60, 60, 50, 50, 20, 20, 10, 10],
                   'quantity': [56, 20, 60, 67, 74, 87, 40, 34]})
df['status'] = pd.cut(df['Days Late'], bins=[-1, 14, 35, 56, 365], labels=False)
labels = np.array('White Yellow Amber Red'.split())
df['status'] = labels[df['status']]
del df['Days Late']
df = df.pivot(index='ID', columns='status', values='quantity')
df = df.reindex(columns=labels[::-1], index=df.index[::-1])
print(df)

yields

        Red  Amber  Yellow  White
ID                               
STRSUB   56     60      74     40
BOTDWG   20     67      87     34
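One caveat: pivot only works when each (ID, status) pair occurs once, as it does in the sample data above. Real order data will likely contain several rows per pair, and the question asks for the sum of the quantities. The following is a rough sketch of how that case might be handled with groupby and unstack; the sample rows here are made up for illustration.

```python
import pandas as pd

# Made-up data with repeated (ID, status) combinations.
df = pd.DataFrame({
    "ID": ["STRSUB", "STRSUB", "BOTDWG", "BOTDWG", "STRSUB"],
    "Days Late": [60, 70, 10, 12, 20],
    "quantity": [5, 6, 1, 2, 7],
})

labels = ["White", "Yellow", "Amber", "Red"]
# Convert the categorical result to plain strings to keep the grouping simple.
df["status"] = pd.cut(df["Days Late"], bins=[-1, 14, 35, 56, 365],
                      labels=labels).astype(str)

# Sum quantity per (ID, status), then move status into the columns.
summary = df.groupby(["ID", "status"])["quantity"].sum().unstack(fill_value=0)

# Put the columns in Red..White order; statuses with no rows become 0.
summary = summary.reindex(columns=labels[::-1], fill_value=0)
print(summary)
```

Here STRSUB ends up with 11 under Red (5 + 6) and BOTDWG with 3 under White (1 + 2). The Amber column is all zeros because no row falls in that range.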
