How do I parse a CSV with Pandas when fields are comma-delimited but may contain a comma followed by a space?
Question:
I currently have the following data.csv, which uses a comma delimiter:
name,day
Chicken Sandwich,Wednesday
Pesto Pasta,Thursday
Lettuce, Tomato & Onion Sandwich,Friday
Lettuce, Tomato & Onion Pita,Friday
Soup,Saturday
The parser script is:
import pandas as pd
df = pd.read_csv('data.csv', delimiter=',', error_bad_lines=False, index_col=False)
print(df.head(5))
The output is:
Skipping line 4: expected 2 fields, saw 3
Skipping line 5: expected 2 fields, saw 3
name day
0 Chicken Sandwich Wednesday
1 Pesto Pasta Thursday
2 Soup Saturday
How do I handle a case like Lettuce, Tomato & Onion Sandwich? Fields should be separated by a comma, but an item may itself contain a comma followed by a space. The desired output is:
name day
0 Chicken Sandwich Wednesday
1 Pesto Pasta Thursday
2 Lettuce, Tomato & Onion Sandwich Friday
3 Lettuce, Tomato & Onion Pita Friday
4 Soup Saturday
Answer:
An alternative that works in other situations too. Admittedly, it's ugly.
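import pandas as pd
from io import StringIO

# In-memory buffer that will hold the rewritten CSV text.
for_pd = StringIO()
with open('data.csv') as infile:
    for line in infile:
        # Protect ', ' (a comma inside an item), turn the remaining commas
        # into a '|' delimiter, then restore the protected ', '.
        line = line.rstrip().replace(', ', '|||').replace(',', '```').replace('|||', ', ').replace('```', '|')
        print(line, file=for_pd)
for_pd.seek(0)  # rewind the buffer before handing it to pandas
df = pd.read_csv(for_pd, sep='|')
print(df)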
Result:
name day
0 Chicken Sandwich Wednesday
1 Pesto Pasta Thursday
2 Lettuce, Tomato & Onion Sandwich Friday
3 Lettuce, Tomato & Onion Pita Friday
4 Soup Saturday
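
A shorter variant, offered as a sketch rather than as part of the original answer: if a delimiter comma is never followed by a space, a regular-expression separator with a negative lookahead may avoid rewriting the file at all. The inline `raw` string below is an assumption standing in for data.csv, and `engine="python"` is set explicitly because regex separators are handled by the Python parsing engine.

```python
from io import StringIO

import pandas as pd

# Inline copy of the sample data from the question (stands in for data.csv).
raw = """name,day
Chicken Sandwich,Wednesday
Pesto Pasta,Thursday
Lettuce, Tomato & Onion Sandwich,Friday
Lettuce, Tomato & Onion Pita,Friday
Soup,Saturday
"""

# Split only on commas that are NOT followed by a space, so ', ' inside an
# item is left alone. Regex separators require the Python parsing engine.
df = pd.read_csv(StringIO(raw), sep=r",(?! )", engine="python")
print(df)
```

This relies on the same assumption as the string-replacement approach: a comma followed by a space always belongs to an item, and a bare comma always separates fields. If the file can be regenerated, quoting the fields that contain commas is the more robust fix.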