Count occurrences of items in a Series in each row of a DataFrame
I have a pandas.DataFrame that looks like this:
COL1 COL2 COL3
C1 None None
C1 C2 None
C1 C1 None
C1 C2 C3
For each row in this DataFrame, I would like to count the occurrences of each of C1, C2 and C3 and append this information as columns to the DataFrame. For instance, the first row has 1 C1, 0 C2 and 0 C3. The final DataFrame should look like this:
COL1 COL2 COL3 C1 C2 C3
C1 None None 1 0 0
C1 C2 None 1 1 0
C1 C1 None 2 0 0
C1 C2 C3 1 1 1
So, I have created a Series with C1, C2 and C3 as the values. One way to count this is to loop over the rows and columns of the DataFrame and then over this Series, incrementing a counter on each match. But is there an apply approach that can achieve this in a compact fashion?
You can apply value_counts to each row:
In [11]: df.apply(pd.Series.value_counts, axis=1)
Out[11]:
C1 C2 C3 None
0 1 NaN NaN 2
1 1 1 NaN 1
2 2 NaN NaN 1
3 1 1 1 NaN
So you can fill the NaN values and append just the base values you want:
In [12]: df.apply(pd.Series.value_counts, axis=1)[['C1', 'C2', 'C3']].fillna(0)
Out[12]:
C1 C2 C3
0 1 0 0
1 1 1 0
2 2 0 0
3 1 1 1
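Putting the pieces together, here is a minimal, self-contained sketch of the whole round trip. The names `wanted`, `counts` and `result` are made up for illustration, and the sketch assumes the missing cells are real `None` values, which `value_counts` skips by default, rather than the string "None". It uses `reindex(columns=...)` instead of plain column selection so that a value that never occurs still gets a column of zeros, and `join` to attach the counts to the original frame.

```python
import pandas as pd

# Rebuild the example frame from the question (missing cells are real None values).
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": [None, "C2", "C1", "C2"],
    "COL3": [None, None, None, "C3"],
})

# Hypothetical name: the values whose per-row occurrences we want to count.
wanted = ["C1", "C2", "C3"]

counts = (
    df.apply(pd.Series.value_counts, axis=1)  # one value_counts per row
      .reindex(columns=wanted)                # keep only the wanted values, in order
      .fillna(0)                              # a value absent from a row counts as 0
      .astype(int)                            # NaN forced floats; go back to integers
)

# Append the counts as new columns, aligned on the row index.
result = df.join(counts)
print(result)
```

For the frame above, the new C1, C2 and C3 columns should match the table in the question.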
Note: there is an open issue for providing a value_counts method directly on DataFrame (I think it should be introduced by pandas 0.15).