Count the occurrences of Series items in each row of a DataFrame
I have a pandas.DataFrame like this:
COL1 COL2 COL3
C1 None None
C1 C2 None
C1 C1 None
C1 C2 C3
For each row in this DataFrame I would like to count the occurrences of each of C1, C2 and C3 and append this information as columns to the DataFrame. For instance, the first row has 1 C1, 0 C2 and 0 C3. The final DataFrame should look like this:
COL1 COL2 COL3 C1 C2 C3
C1 None None 1 0 0
C1 C2 None 1 1 0
C1 C1 None 2 0 0
C1 C2 C3 1 1 1
So I have created a Series with C1, C2 and C3 as the values. One way to count them is to loop over the rows and columns of the DataFrame, then over this Series, incrementing a counter whenever a cell matches. But is there an apply approach that can achieve this more compactly?
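For comparison, the loop-based approach described in the question might look roughly like this. It is a hypothetical baseline, not the asker's actual code: it assumes the missing cells are Python `None` (the question's "None" could also be the literal string `"None"`), and the names `targets` and `counts` are illustrative.

```python
import pandas as pd

# Sample data from the question; missing cells are assumed to be Python None.
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": [None, "C2", "C1", "C2"],
    "COL3": [None, None, None, "C3"],
})

# The Series of values to count, as described in the question.
targets = pd.Series(["C1", "C2", "C3"])

# One list of per-row counts for each target value.
counts = {t: [] for t in targets}

# Explicit triple loop: rows, then cells, then target values.
for _, row in df.iterrows():
    row_counts = dict.fromkeys(targets, 0)
    for cell in row:
        for t in targets:
            if cell == t:  # None never equals a string, so it is skipped
                row_counts[t] += 1
    for t in targets:
        counts[t].append(row_counts[t])

# Append the counts as new columns named C1, C2, C3.
result = df.assign(**counts)
print(result)
```

This works, but it runs in pure Python per cell, which is what the question hopes to replace with something more compact.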
You can apply value_counts:
In [11]: df.apply(pd.Series.value_counts, axis=1)
Out[11]:
C1 C2 C3 None
0 1 NaN NaN 2
1 1 1 NaN 1
2 2 NaN NaN 1
3 1 1 1 NaN
So you can select just the columns you want and fill the NaN values with 0:
In [12]: df.apply(pd.Series.value_counts, axis=1)[['C1', 'C2', 'C3']].fillna(0)
Out[12]:
C1 C2 C3
0 1 0 0
1 1 1 0
2 2 0 0
3 1 1 1
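To actually append these counts to the original frame, as the question asks, one option is to join them back on the index. This is a minimal sketch under two assumptions: missing cells are Python `None`, which `value_counts` drops by default (so no `None` column appears, unlike the output above where "None" looks like a string), and `reindex` is used so that C1, C2 and C3 all exist even if one of them never occurs.

```python
import pandas as pd

# Sample data from the question; missing cells are assumed to be Python None.
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": [None, "C2", "C1", "C2"],
    "COL3": [None, None, None, "C3"],
})

# Per-row value counts; values absent from a row come back as NaN.
counts = df.apply(pd.Series.value_counts, axis=1)

# Guarantee the three target columns exist, turn NaN into 0, restore integers.
counts = (
    counts.reindex(columns=["C1", "C2", "C3"], fill_value=0)
          .fillna(0)
          .astype(int)
)

# Append the counts as new columns alongside COL1..COL3.
result = df.join(counts)
print(result)
```

`df.join` aligns on the index, so the counts land on the row they were computed from.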
Note: there is an open issue about adding a value_counts method directly to DataFrame (I think it is meant to land in pandas 0.15).
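A different technique that avoids a row-wise apply altogether is to compare the whole frame against each target value and sum the matches across each row. This is a sketch, not part of the original answer; it assumes the set of values to count (`targets`, a hypothetical name) is known in advance and that missing cells are Python `None`.

```python
import pandas as pd

# Sample data from the question; missing cells are assumed to be Python None.
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": [None, "C2", "C1", "C2"],
    "COL3": [None, None, None, "C3"],
})

targets = ["C1", "C2", "C3"]

# df.eq(t) gives a boolean frame; summing along axis=1 counts matches per row.
counts = pd.DataFrame({t: df.eq(t).sum(axis=1) for t in targets})

result = df.join(counts)
print(result)
```

Because each comparison is vectorized, this scales with the number of target values rather than with a Python-level loop over every cell, and it always produces integer columns without a `fillna` step.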