pandas: reading a CSV with extra commas in a column
I'm reading a basic csv file where the columns are separated by commas with these column names:
userid, username, body
However, the body column is a string that may itself contain commas. This breaks the parsing, and pandas raises an error:
CParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 8
Is there a way to tell pandas to ignore commas in a specific column, or some other way to work around this problem?
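For reference, the error can be reproduced with a small in-memory sample. This is only a sketch: the rows below are made up for illustration, and recent pandas versions expose the exception as pd.errors.ParserError rather than CParserError.

```python
import io

import pandas as pd

# Hypothetical sample data: the third line carries unquoted commas in "body",
# so the parser sees 8 fields where the header promised 3.
raw = (
    "userid,username,body\n"
    "01,n1,hello\n"
    "02,n2,a,b,c,d,e,f\n"
)

try:
    pd.read_csv(io.StringIO(raw))
    error_message = None
except pd.errors.ParserError as exc:
    # Keep the message so it can be inspected or logged.
    error_message = str(exc)

print(error_message)
```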
Imagine we're reading your data from a file called comma.csv:
userid, username, body
01, n1, 'string1, string2'
One thing you can do is tell pandas which quote character delimits the strings in that column:
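import pandas as pd
df = pd.read_csv('comma.csv', quotechar="'", skipinitialspace=True)  # skipinitialspace: the sample has a space after each comma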
In this case, a string delimited by ' is treated as a single field, regardless of any commas inside it.
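A quick way to check this without touching the file system is to feed the same two sample lines through io.StringIO. This is a minimal sketch, not the only way to do it. Two things here are assumptions on top of the answer: skipinitialspace=True, which is needed because the sample has a space after each comma (otherwise the opening ' does not sit at the start of the field and is not treated as a quote), and dtype=str, which is only there to keep "01" as text.

```python
import io

import pandas as pd

# Same content as the comma.csv sample above, held in memory.
raw = (
    "userid, username, body\n"
    "01, n1, 'string1, string2'\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    quotechar="'",          # single quotes wrap the body field
    skipinitialspace=True,  # drop the space that follows each delimiter
    dtype=str,              # assumption: keep "01" as a string, not the integer 1
)

print(df.columns.tolist())
print(df.loc[0, "body"])
```

Note that this only helps when the body values are actually wrapped in a quote character in the file; if they are not quoted at all, the parser has no way to tell which commas are separators.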