pandas: reading a CSV file with extra commas inside a column
I'm reading a basic CSV file whose columns are separated by commas and have these names:
userid,username,body
However, the body column holds strings that may contain commas. This breaks the parsing, and pandas throws an error:
CParserError: C error: Expected 3 fields in line 3, saw 8
Is there a way to tell pandas to ignore commas in a specific column, or another way to work around this problem?
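A minimal sketch that reproduces the error, assuming a hypothetical in-memory sample in which body is the last column and is not quoted. The first data row is well-formed, so pandas expects 3 fields per row and the second row, which has an extra comma, raises `pandas.errors.ParserError` (older pandas versions reported it as `CParserError`). Since body is the last column, one possible workaround is to split each line on the first two commas only; this is not from the answer below and only holds if no other column can contain commas.

```python
import io

import pandas as pd

# Hypothetical sample: body is the last column, unquoted, and the second
# data row contains a comma inside body.
raw = "userid,username,body\n01,n1,hello\n02,n2,string1, string2\n"

# Reproduce the failure: the second data row splits into 4 fields, not 3.
try:
    pd.read_csv(io.StringIO(raw))
    failed = False
except pd.errors.ParserError as exc:
    failed = True
    print(exc)

# Workaround sketch (assumes body is always the LAST column): split each line
# on the first two commas only, so any further commas stay inside body.
lines = raw.splitlines()
header = lines[0].split(",")
rows = [line.split(",", 2) for line in lines[1:] if line]
df = pd.DataFrame(rows, columns=header)
print(df)
```

All values in this DataFrame stay strings (for example userid is "01"); convert columns afterwards if you need other types.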
Imagine we're reading your file, called comma.csv:
userid, username, body
01, n1, 'string1, string2'
One thing you can do is specify the quote character that encloses the strings in that column:
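# skipinitialspace=True drops the space after each comma, so the opening ' starts the field
df = pd.read_csv('comma.csv', quotechar="'", skipinitialspace=True)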
In this case, a string enclosed in ' is read as a single field, regardless of any commas inside it.
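A self-contained sketch of the quotechar approach, reading the question's sample from an in-memory string via `io.StringIO` instead of comma.csv. Because the sample puts a space after every comma, it passes `skipinitialspace=True` as well: without it, each field starts with a space rather than `'`, so pandas does not treat the quote as the start of a quoted field and the row still splits on the inner comma.

```python
import io

import pandas as pd

# Same content as the comma.csv sample: a space follows each comma and the
# body values are wrapped in single quotes.
raw = "userid, username, body\n01, n1, 'string1, string2'\n"

# quotechar="'" keeps the text between single quotes as one field;
# skipinitialspace=True removes the space after each delimiter, which also
# strips the leading spaces from the column names.
df = pd.read_csv(io.StringIO(raw), quotechar="'", skipinitialspace=True)
print(df.columns.tolist())
print(df.loc[0, "body"])
```

Note that userid is parsed as the integer 1 here; passing `dtype={"userid": str}` keeps the leading zero.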