Add a DataFrame column with the len() of another column's values

Question:

I'm having trouble creating a column that holds the character count of the string values in another column, and I haven't figured out how to do it efficiently.

for index in range(len(df)):
    df['char_length'][index] = len(df['string'][index])

This apparently involves first creating a column of nulls and then overwriting it, and it takes a really long time on my data set. So what's the most efficient way of getting something like this:

'string'     'char_length'
abcd          4
abcde         5

I've searched around quite a bit, but I haven't been able to figure it out.

Answer:

Pandas has a vectorised string method for this: str.len(). To create the new column, you can write:

df['char_length'] = df['string'].str.len()

For example:

>>> df
  string
0   abcd
1  abcde

>>> df['char_length'] = df['string'].str.len()
>>> df
  string  char_length
0   abcd            4
1  abcde            5

This should be considerably faster than looping over the DataFrame with a Python for loop.
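As a minimal self-contained sketch of the same approach (the sample frame below is an assumption for illustration, not the asker's data), the following builds the column in one step. The None row is added here only to show that str.len() gives NaN for a missing value instead of raising, which a plain len() loop would not do.

```python
import pandas as pd

# Hypothetical sample data; the last row is a missing value.
df = pd.DataFrame({"string": ["abcd", "abcde", None]})

# Vectorised length of each string; missing values stay missing (NaN).
df["char_length"] = df["string"].str.len()

lengths = df["char_length"].tolist()
print(df)
```

Because of the NaN, the resulting column is floating point; if the column has no missing values, the lengths come back as integers.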

Many other familiar Python string methods are also available in Pandas through the .str accessor. Examples include lower (for converting to lowercase), count (for counting occurrences of a particular substring), and replace (for swapping one substring for another).
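A short sketch of the three methods just mentioned, using an assumed two-element Series. Note that str.count treats its argument as a regular expression, and that regex=False is passed to str.replace here so the pattern is taken as a literal substring.

```python
import pandas as pd

# Hypothetical sample data for illustration.
s = pd.Series(["Abcd", "abcde"])

# Lowercase every string.
lowered = s.str.lower().tolist()

# Count occurrences of the pattern "a" in each string (case-sensitive).
a_counts = s.str.count("a").tolist()

# Replace the literal substring "cd" with "XY".
swapped = s.str.replace("cd", "XY", regex=False).tolist()

print(lowered, a_counts, swapped)
```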
