Counting occurrences of items from a Series in each row of a DataFrame

Question:

I have a pandas.DataFrame like this:

COL1    COL2    COL3
C1      None    None
C1      C2      None
C1      C1      None
C1      C2      C3

For each row in this DataFrame, I would like to count the occurrences of each of C1, C2 and C3 and append the counts as new columns. For instance, the first row has one C1, zero C2 and zero C3. The final DataFrame should look like this:

COL1    COL2    COL3    C1  C2  C3
C1      None    None    1   0   0
C1      C2      None    1   1   0
C1      C1      None    2   0   0
C1      C2      C3      1   1   1

So, I have created a Series with C1, C2 and C3 as its values. One way to count this is to loop over the rows and columns of the DataFrame, then over this Series, and increment a counter on each match. But is there an apply-based approach that achieves this more compactly?

Answer

You can apply value_counts to each row:

In [11]: df.apply(pd.Series.value_counts, axis=1)
Out[11]: 
   C1  C2  C3  None
0   1 NaN NaN     2
1   1   1 NaN     1
2   2 NaN NaN     1
3   1   1   1   NaN

So you can select just the base values you want as columns and fill the NaN with 0:

In [12]: df.apply(pd.Series.value_counts, axis=1)[['C1', 'C2', 'C3']].fillna(0)
Out[12]: 
   C1  C2  C3
0   1   0   0
1   1   1   0
2   2   0   0
3   1   1   1
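For reference, here is a minimal self-contained sketch of the same idea that also appends the counts to the original frame, as the question asks. It assumes the missing entries are real None values rather than the string 'None'; in that case value_counts skips them by default, so no None column appears. The names `wanted`, `counts` and `result` are illustrative only, and the `reindex` step is an addition that keeps a column for every wanted value even if it never occurs.

```python
import pandas as pd

# Rebuild the example frame from the question (missing cells as real None).
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": [None, "C2", "C1", "C2"],
    "COL3": [None, None, None, "C3"],
})

# The values to count; in the question these live in a Series.
wanted = ["C1", "C2", "C3"]

counts = (
    df.apply(pd.Series.value_counts, axis=1)  # one value_counts per row
      .reindex(columns=wanted)                # keep only (and all of) the wanted values
      .fillna(0)                              # absent value -> count of 0
      .astype(int)                            # counts come back as floats because of NaN
)

# Append the count columns to the original frame.
result = df.join(counts)
print(result)
```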

Note: there is an open issue about adding a value_counts method directly on DataFrame (which I think should land by pandas 0.15).
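As an alternative to the row-wise apply, the counts can also be built by comparing the whole frame against each value and summing across columns. This is only a sketch of a different technique, not the method from the answer; the names `wanted` and `counts` are illustrative, and the frame is rebuilt here with real None values for the missing cells.

```python
import pandas as pd

# Same example frame as in the question (missing cells as real None).
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": [None, "C2", "C1", "C2"],
    "COL3": [None, None, None, "C3"],
})

wanted = ["C1", "C2", "C3"]

# For each wanted value, (df == value) is a boolean frame;
# summing along axis=1 counts the matches in each row.
counts = pd.DataFrame({value: (df == value).sum(axis=1) for value in wanted})

print(df.join(counts))
```

This does one vectorized comparison per wanted value instead of one value_counts call per row, which may suit frames with many rows and few distinct values to count.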
