Add a DataFrame column with the len() of another column's values
I'm having a problem trying to get a character count column of the string values in another column, and haven't figured out how to do it efficiently.
for index in range(len(df)):
    df['char_length'][index] = len(df['string'][index])
This apparently involves first creating a column of nulls and then rewriting it, and it takes a really long time on my data set. So what's the most efficient way of getting something like this:
'string' 'char_length'
abcd 4
abcde 5
I've checked around quite a bit, but I haven't been able to figure it out.
Pandas has a vectorised string method for this: str.len(). To create the new column you can write:
df['char_length'] = df['string'].str.len()
For example:
>>> df
string
0 abcd
1 abcde
>>> df['char_length'] = df['string'].str.len()
>>> df
string char_length
0 abcd 4
1 abcde 5
This should be considerably faster than looping over the DataFrame with a Python for loop.
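One detail worth knowing when replacing the loop: str.len() does not raise on missing values but passes them through as NaN, which typically makes the result a float column. The sketch below assumes a small made-up DataFrame with one missing entry; the extra column name char_length_int is hypothetical and only shows one way to keep whole numbers.

```python
import pandas as pd

# Hypothetical sample data; the column name 'string' follows the question.
df = pd.DataFrame({"string": ["abcd", "abcde", None]})

# Vectorised length: the missing entry becomes NaN instead of raising an error.
df["char_length"] = df["string"].str.len()

# Optional: a nullable integer dtype keeps the lengths as whole numbers
# and represents the missing entry as <NA>.
df["char_length_int"] = df["string"].str.len().astype("Int64")

print(df)
```

If the column is known to contain no missing values, the first assignment alone is enough.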
Many other familiar string methods from Python are also available in Pandas through the .str accessor. For example, lower (for converting to lowercase letters), count (for counting occurrences of a particular substring), and replace (for swapping one substring with another).
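As a rough illustration of those three methods, here is a minimal sketch on a made-up Series (the values and variable names are assumptions, not from the question). Note that str.count treats its argument as a regular expression, and regex=False is passed to str.replace so the pattern is taken literally.

```python
import pandas as pd

# Hypothetical example values.
s = pd.Series(["Banana", "Apple pie"])

# Convert every string to lowercase.
lowered = s.str.lower()

# Count occurrences of lowercase "a" in each string (case-sensitive).
a_count = s.str.count("a")

# Replace every literal "a" with "o"; regex=False disables pattern matching.
replaced = s.str.replace("a", "o", regex=False)

print(lowered.tolist())
print(a_count.tolist())
print(replaced.tolist())
```

Each call returns a new Series, so the results can be assigned to new DataFrame columns in the same way as str.len().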