Grouping data by value ranges
I have a CSV file that shows parts on order. The columns include days late, qty, and commodity.
I need to group the data by days late and commodity, summing the qty. However, the days late values need to be grouped into ranges:
>56
>35 and <= 56
>14 and <= 35
>0 and <=14
I was hoping I could use a dict somehow. Something like this:
{'Red': '>56', 'Amber': '>35 and <= 56', 'Yellow': '>14 and <= 35', 'White': '>0 and <=14'}
I am looking for a result like this:
        Red  Amber  Yellow  White
STRSUB   56     60      74     40
BOTDWG   20     67      87     34
I am new to pandas, so I don't know whether this is possible at all. Could anyone offer some advice?
Thanks
Suppose you start with this data:
df = pd.DataFrame({'ID': ('STRSUB BOTDWG'.split())*4,
                   'Days Late': [60, 60, 50, 50, 20, 20, 10, 10],
                   'quantity': [56, 20, 60, 67, 74, 87, 40, 34]})
#    Days Late      ID  quantity
# 0         60  STRSUB        56
# 1         60  BOTDWG        20
# 2         50  STRSUB        60
# 3         50  BOTDWG        67
# 4         20  STRSUB        74
# 5         20  BOTDWG        87
# 6         10  STRSUB        40
# 7         10  BOTDWG        34
Then you can find the status category using pd.cut. Note that by default, pd.cut splits the Series df['Days Late'] into half-open intervals that include their right edge, here (-1, 14], (14, 35], (35, 56], (56, 365]:
df['status'] = pd.cut(df['Days Late'], bins=[-1, 14, 35, 56, 365], labels=False)
labels = np.array('White Yellow Amber Red'.split())
df['status'] = labels[df['status']]
del df['Days Late']
print(df)
#        ID  quantity  status
# 0  STRSUB        56     Red
# 1  BOTDWG        20     Red
# 2  STRSUB        60   Amber
# 3  BOTDWG        67   Amber
# 4  STRSUB        74  Yellow
# 5  BOTDWG        87  Yellow
# 6  STRSUB        40   White
# 7  BOTDWG        34   White
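To check how the bin edges treat boundary values, here is a small sketch using made-up day counts (the `days` Series is illustrative, not part of the original data). It also shows that a value outside all bins comes back as NaN, which turns the codes into floats; in that case the integer indexing into `labels` used above would likely fail, so it may be worth widening the last bin or handling NaN first.

```python
import pandas as pd

# Illustrative values chosen to sit on and around the bin edges.
days = pd.Series([0, 14, 15, 35, 36, 56, 57, 400])

# right=True is the default, so each interval includes its right edge.
codes = pd.cut(days, bins=[-1, 14, 35, 56, 365], labels=False)

# 14 falls in the first bin, 15 in the second; 400 is outside every bin.
print(codes.tolist())
```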
Now use pivot to get the DataFrame in the desired form:
df = df.pivot(index='ID', columns='status', values='quantity')
and use reindex to obtain the desired order for the rows and columns:
df = df.reindex(columns=labels[::-1], index=df.index[::-1])
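One caveat: pivot expects each (ID, status) pair to appear only once and raises a ValueError otherwise. Since the question asks for a sum of the quantities, real data will probably contain repeated pairs. A minimal sketch of one way to handle that, using pivot_table with aggfunc='sum' on a small made-up frame (the rows below are assumptions for illustration only):

```python
import pandas as pd

# Made-up rows in which STRSUB has two 'Red' entries.
df = pd.DataFrame({'ID': ['STRSUB', 'STRSUB', 'BOTDWG', 'BOTDWG'],
                   'status': ['Red', 'Red', 'Red', 'White'],
                   'quantity': [56, 4, 20, 34]})

# Sum the quantities for each (ID, status) pair; fill missing pairs with 0.
table = df.pivot_table(index='ID', columns='status', values='quantity',
                       aggfunc='sum', fill_value=0)
print(table)
```

The same reindex step can then be applied to put the rows and columns in the desired order.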
Thus,
import numpy as np
import pandas as pd
df = pd.DataFrame({'ID': ('STRSUB BOTDWG'.split())*4,
                   'Days Late': [60, 60, 50, 50, 20, 20, 10, 10],
                   'quantity': [56, 20, 60, 67, 74, 87, 40, 34]})
df['status'] = pd.cut(df['Days Late'], bins=[-1, 14, 35, 56, 365], labels=False)
labels = np.array('White Yellow Amber Red'.split())
df['status'] = labels[df['status']]
del df['Days Late']
df = df.pivot(index='ID', columns='status', values='quantity')
df = df.reindex(columns=labels[::-1], index=df.index[::-1])
print(df)
yields
        Red  Amber  Yellow  White
ID
STRSUB   56     60      74     40
BOTDWG   20     67      87     34
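Since the question asked about driving this from a dict, here is one possible sketch. It assumes a dict that maps each status name to the inclusive upper edge of its range, listed in ascending order; the name `upper_edges` and this layout are my own choices, not something pandas requires. It uses 0 as the exclusive lower edge and infinity as the top edge to match the ranges in the question, and it sums with groupby so repeated (ID, status) pairs are handled.

```python
import pandas as pd

# Assumed layout: status name -> inclusive upper edge, in ascending order.
upper_edges = {'White': 14, 'Yellow': 35, 'Amber': 56, 'Red': float('inf')}
names = list(upper_edges)
bins = [0] + list(upper_edges.values())  # 0 is the exclusive lower edge

df = pd.DataFrame({'ID': ['STRSUB', 'BOTDWG'] * 4,
                   'Days Late': [60, 60, 50, 50, 20, 20, 10, 10],
                   'quantity': [56, 20, 60, 67, 74, 87, 40, 34]})

# Label each row; convert to plain objects so the grouping keys are strings.
df['status'] = pd.cut(df['Days Late'], bins=bins, labels=names).astype(object)

# Sum quantity per (ID, status), then spread the statuses into columns.
result = (df.groupby(['ID', 'status'])['quantity'].sum()
            .unstack('status')
            .reindex(columns=names[::-1]))
print(result)
```

Rows whose days late value is 0 or negative get no status and are dropped by the groupby, which matches the ">0" lower bound in the question.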