Adding a new column to a Pandas DataFrame results in NaN
I have a pandas DataFrame data with the following transaction data:
         A        date
0  M000833  2016-08-01
1  M000833  2016-08-01
2  M000833  2016-08-02
3  M000833  2016-08-02
4  M000511  2016-08-05
I want a new column with the number of visits per consumer (multiple visits on the same day should count as 1).
So I tried this:
import pandas as pd
data['noofvisits'] = data.groupby(['A'])['date'].nunique()
When I just run the statement without assigning it to the DataFrame, I get a pandas Series with the desired output. However, the above statement results in:
         A        date  noofvisits
0  M000833  2016-08-01         NaN
1  M000833  2016-08-01         NaN
2  M000833  2016-08-02         NaN
3  M000833  2016-08-02         NaN
4  M000511  2016-08-05         NaN
The expected output is:
         A        date  noofvisits
0  M000833  2016-08-01           2
1  M000833  2016-08-01           2
2  M000833  2016-08-02           2
3  M000833  2016-08-02           2
4  M000511  2016-08-05           1
What is wrong with this approach? Why does the noofvisits column contain NaN rather than the count values?
Use transform to generate a Series with its index aligned to the original df:
In[32]:
df['noofvisits'] = df.groupby(['A'])['date'].transform('nunique')
df
Out[32]:
             A        date  noofvisits
index
0      M000833  2016-08-01           2
1      M000833  2016-08-01           2
2      M000833  2016-08-02           2
3      M000833  2016-08-02           2
4      M000511  2016-08-05           1
The problem with direct assignment is that you're grouping on column 'A', so 'A' becomes the index of the groupby aggregation. You then try to assign the result to your df, but pandas aligns on index labels and the indices don't agree (customer IDs versus row numbers), hence the NaN column values.
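To make the mismatch concrete, here is a minimal sketch that rebuilds the sample data from the question (the frame construction is my own reconstruction, not part of the original post) and compares the two indices before assigning:

```python
import pandas as pd

# Reconstruction of the sample data from the question (assumed dtypes).
df = pd.DataFrame({
    "A": ["M000833", "M000833", "M000833", "M000833", "M000511"],
    "date": pd.to_datetime([
        "2016-08-01", "2016-08-01", "2016-08-02", "2016-08-02", "2016-08-05",
    ]),
})

# The aggregation is indexed by the group keys, not by the row labels.
counts = df.groupby("A")["date"].nunique()
print(counts.index.tolist())  # ['M000511', 'M000833']
print(df.index.tolist())      # [0, 1, 2, 3, 4]

# Assignment aligns on index labels; none of 0..4 exist in counts,
# so every row of the new column is filled with NaN.
broken = df.copy()
broken["noofvisits"] = counts
print(broken["noofvisits"].isna().all())  # True
```

Since no label in df's index appears in the aggregated Series, alignment finds nothing to copy over.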
Also, even if the index values did agree, the shape is different anyway:
In[33]:
df.groupby(['A'])['date'].nunique()
Out[33]:
A
M000511 1
M000833 2
Name: date, dtype: int64
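If you would rather keep the aggregated Series, one alternative (not from the answer above, just a sketch on the same reconstructed sample data) is to look each row's 'A' value up in it with Series.map, which broadcasts the two group counts back to five rows:

```python
import pandas as pd

# Reconstruction of the sample data from the question (assumed dtypes).
df = pd.DataFrame({
    "A": ["M000833", "M000833", "M000833", "M000833", "M000511"],
    "date": pd.to_datetime([
        "2016-08-01", "2016-08-01", "2016-08-02", "2016-08-02", "2016-08-05",
    ]),
})

# One count per customer, indexed by customer ID (2 rows).
counts = df.groupby("A")["date"].nunique()

# Look up each row's customer ID in counts; the result has df's index (5 rows).
df["noofvisits"] = df["A"].map(counts)
print(df["noofvisits"].tolist())  # [2, 2, 2, 2, 1]

# transform('nunique') produces the same values in one step.
via_transform = df.groupby("A")["date"].transform("nunique")
print(via_transform.tolist())  # [2, 2, 2, 2, 1]
```

Both routes give the same column here; transform is shorter, while map is handy when the per-group Series is also needed on its own.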