How do I remove newlines from all quoted text in a file?
Question:
I have exported a CSV file from a database. Certain fields are longer text chunks, and can contain newlines. What would be the simplest way of removing only newlines from this file that are inside double quotes, but preserving all others?
I don't care if it uses a Bash command line one liner or a simple script as long as it works.
For example:
"Value1", "Value2", "This is a longer piece
of text with
newlines in it.", "Value3"
"Value4", "Value5", "Another value", "value6"
The newlines inside of the longer piece of text should be removed, but not the newline separating the two rows.
Answer:
In Python:
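import csv

# Python 3: open in text mode with newline="" so the csv module handles line endings itself.
with open("input.csv", newline="") as infile, open("output.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    # skipinitialspace=True is needed for input like the sample, where a space follows each comma;
    # without it, the quotes after the space are not treated as field quotes.
    for record in csv.reader(infile, skipinitialspace=True):
        # str has no remove(); replace() drops the newlines embedded in each field.
        writer.writerow([field.replace("\n", "") for field in record])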
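Note that going through csv.writer rewrites the whole file, so quoting and the spaces after commas may come out differently from the input. If the rest of the file has to stay exactly as it was, one alternative is to scan the text character by character and track whether the current position is inside double quotes. The following is only a minimal sketch of that idea, not part of the original answer: the function name strip_quoted_newlines and its replacement parameter are made up for illustration, and it assumes every double quote in the file is either a field delimiter or a doubled ("") escape.

```python
def strip_quoted_newlines(text, replacement=""):
    """Remove newlines that occur between double quotes; leave all others intact."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            # A doubled quote ("") toggles the state twice, so it ends up unchanged.
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes and ch == "\r":
            # Drop the carriage return of a CRLF pair inside a quoted field.
            continue
        elif in_quotes and ch == "\n":
            out.append(replacement)
        else:
            out.append(ch)
    return "".join(out)


# The sample from the question, held in memory for demonstration.
sample = (
    '"Value1", "Value2", "This is a longer piece\n'
    'of text with\n'
    'newlines in it.", "Value3"\n'
    '"Value4", "Value5", "Another value", "value6"\n'
)
# Using a space as the replacement keeps the words from running together.
print(strip_quoted_newlines(sample, replacement=" "), end="")
```

To apply this to a file, read it with open("input.csv", newline="") so that line endings reach the function unmodified, and write the result back the same way. Passing replacement=" " is probably what is wanted for prose fields; the default of "" matches the question's literal request to remove the newlines.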