Count the occurrences of items in a Series in each row of a DataFrame

Problem description:

I have a pandas.DataFrame that looks like this.

COL1    COL2    COL3
C1      None    None
C1      C2      None
C1      C1      None
C1      C2      C3

For each row in this DataFrame I would like to count the occurrences of each of C1, C2 and C3 and append this information as columns to the DataFrame. For instance, the first row has 1 C1, 0 C2 and 0 C3. The final DataFrame should look like this:

COL1    COL2    COL3    C1  C2  C3
C1      None    None    1   0   0
C1      C2      None    1   1   0
C1      C1      None    2   0   0
C1      C2      C3      1   1   1

So, I have created a Series with C1, C2 and C3 as the values. One way to count this is to loop over the rows and columns of the DataFrame, then over this Series, and increment a counter whenever a value matches. But is there an apply approach that can achieve this in a compact fashion?
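For reference, the loop-based approach described in the question might look like the sketch below. It assumes the missing cells are real `None` values (the original data may instead hold the string `'None'`), and the names `targets`, `counts` and `result` are illustrative, not from the question.

```python
import pandas as pd

# Sample frame from the question; missing cells are assumed to be real None values.
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": [None, "C2", "C1", "C2"],
    "COL3": [None, None, None, "C3"],
})

# The values to count, held in a Series as described in the question.
targets = pd.Series(["C1", "C2", "C3"])

# Explicit loops: rows x columns x targets, incrementing a counter on each match.
counts = {t: [] for t in targets}
for _, row in df.iterrows():
    row_counts = {t: 0 for t in targets}
    for value in row:
        for t in targets:
            if value == t:  # None never equals a target string
                row_counts[t] += 1
    for t in targets:
        counts[t].append(row_counts[t])

# Append one new column per target value.
result = df.assign(**counts)
print(result)
```

This works, but it runs Python code for every cell and every target, which is the verbosity the question is trying to avoid.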

You can apply value_counts row by row:

In [11]: df.apply(pd.Series.value_counts, axis=1)
Out[11]: 
   C1  C2  C3  None
0   1 NaN NaN     2
1   1   1 NaN     1
2   2 NaN NaN     1
3   1   1   1   NaN
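A self-contained version of the same call is sketched below, assuming the missing cells are real `None` values rather than the string `'None'`. In that case no `None` column appears, because `Series.value_counts` drops missing values by default (`dropna=True`); the `None` column in the output above suggests the original frame held the string `'None'`.

```python
import pandas as pd

# Sample frame from the question; missing cells are assumed to be real None values.
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": [None, "C2", "C1", "C2"],
    "COL3": [None, None, None, "C3"],
})

# Each row is passed to Series.value_counts (axis=1), which returns a Series indexed
# by the values in that row. apply aligns these Series into one DataFrame, with NaN
# where a value does not occur in a row. None cells are dropped before counting.
counts = df.apply(pd.Series.value_counts, axis=1)
print(counts)
```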

So you can select just the base values you want and fill the NaNs with 0:

In [12]: df.apply(pd.Series.value_counts, axis=1)[['C1', 'C2', 'C3']].fillna(0)
Out[12]: 
   C1  C2  C3
0   1   0   0
1   1   1   0
2   2   0   0
3   1   1   1
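To finish the question's goal of appending these counts as columns, one possible sketch is shown below (again assuming real `None` cells; `targets` is an illustrative name). It swaps `[['C1', 'C2', 'C3']]` for `reindex`, so a target that never occurs becomes an all-zero column instead of raising a `KeyError`, and it converts back to integers because the NaNs make the count columns float.

```python
import pandas as pd

# Sample frame from the question; missing cells are assumed to be real None values.
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": [None, "C2", "C1", "C2"],
    "COL3": [None, None, None, "C3"],
})

targets = ["C1", "C2", "C3"]

counts = (
    df.apply(pd.Series.value_counts, axis=1)
      # Keep only the target columns; any target missing entirely is filled with 0.
      .reindex(columns=targets, fill_value=0)
      # Remaining NaNs mean "does not occur in this row".
      .fillna(0)
      # NaNs forced a float dtype; counts are integers.
      .astype(int)
)

# Append the counts to the original frame (both share the same row index).
result = df.join(counts)
print(result)
```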

Note: providing a value_counts method directly on DataFrame is an open issue (I think it should be introduced in pandas 0.15).
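For later readers: pandas 1.1 did add `DataFrame.value_counts`, but it counts unique *rows*, not values within each row, so it does not answer this question directly. A different, apply-free technique (swapped in here, not the answer's method) is to compare the whole frame against each target with `DataFrame.eq` and sum the matches across columns. The sketch below assumes real `None` cells and uses the illustrative name `targets`.

```python
import pandas as pd

# Sample frame from the question; missing cells are assumed to be real None values.
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": [None, "C2", "C1", "C2"],
    "COL3": [None, None, None, "C3"],
})

targets = ["C1", "C2", "C3"]

# df.eq(t) is a boolean frame (True where a cell equals t); summing along axis=1
# counts the matches per row. None cells never compare equal to a target.
counts = pd.DataFrame({t: df.eq(t).sum(axis=1) for t in targets})

# Append the counts as new columns.
result = pd.concat([df, counts], axis=1)
print(result)
```

This does one vectorized comparison per target instead of one Python call per row, and the sums are already integers, so no `fillna` or `astype` step is needed.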