How to remove a string value from a column in a pandas dataframe

Question:

I am trying to write some code that splits the string in a dataframe column at each comma (so it becomes a list) and removes a certain string from that list if it is present. After removing the unwanted string, I want to join the list elements again with commas. My dataframe looks like this:

df:

   Column1  Column2
0      a       a,b,c
1      y       b,n,m
2      d       n,n,m
3      d       b,b,x

So basically my goal is to remove all b values from Column2 so that I get:

df:

   Column1  Column2
0      a       a,c
1      y       n,m
2      d       n,n,m
3      d       x

The code I wrote is:

df=df['Column2'].apply(lambda x: x.split(','))

def exclude_b(df):
    for index, liste in df['column2].iteritems():
        if 'b' in liste:
            liste.remove('b')
            return liste
        else:
            return liste

The first line splits each value in the column at the commas, turning it into a list. With the function, I then tried to iterate through all the lists and remove b if it is present, returning the list unchanged otherwise. If I print liste at the end, I only get the first row of Column2, not the others. What am I doing wrong? And is there a way to implement my if condition as a lambda function?

Answer:

You can simply apply the regex b,? which matches every b together with the comma that follows it, if there is one, and replace each match with an empty string:

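# regex=True is needed in pandas 2.0+, where str.replace treats the pattern literally by default
df['Column2'] = df.Column2.str.replace('b,?', '', regex=True)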

Out[238]:
   Column1  Column2
0      a       a,c
1      y       n,m
2      d       n,n,m
3      d       x
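
As for why the function in the question only returns the first row: the return statements sit inside the for loop, so the function exits during the first iteration. In addition, list.remove only deletes the first matching element, so a row like b,b,x would still keep one b. The sketch below is one possible way to do the split, filter and join per cell, either with a small helper or with the equivalent lambda. The helper name drop_token and its token parameter are made up for illustration and are not part of pandas. Unlike the regex b,? this compares whole list elements, so a value such as a,b does not end up with a trailing comma and a longer element such as ab is left alone.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Column1": ["a", "y", "d", "d"],
    "Column2": ["a,b,c", "b,n,m", "n,n,m", "b,b,x"],
})


def drop_token(cell, token="b"):
    """Split on commas, drop every element equal to token, re-join."""
    # drop_token is a hypothetical helper name used only for this sketch.
    return ",".join(part for part in cell.split(",") if part != token)


# Equivalent lambda form, as asked for in the question:
# df["Column2"].apply(lambda s: ",".join(p for p in s.split(",") if p != "b"))
df["Column2"] = df["Column2"].apply(drop_token)

print(df)
```

Because apply calls the helper once per cell, no explicit loop over the rows is needed, and the result is assigned back to Column2 rather than overwriting df itself.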
