How do I parse a CSV in a Bash script?
Question:
I am trying to parse a CSV that may contain 100k+ lines. Here are the criteria I have:
- The index of the identifier
- The identifier value
I would like to retrieve all lines in the CSV that have the given value at the given index (fields are delimited by commas).
Any ideas, with special consideration for performance?
Answer:
A first prototype using plain old grep and cut:
grep "${VALUE}" inputfile.csv | cut -d, -f"${INDEX}"
If that's fast enough and gives the proper output, you're done.
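One thing to watch for with the prototype: grep matches VALUE anywhere on the line, and cut prints only field INDEX rather than the whole line. If that turns out to be too loose, one possible alternative is a single awk pass that compares only the requested field and prints the full matching lines. The following is a minimal sketch, not part of the original answer: filter_csv is a hypothetical helper name, INDEX is assumed to be 1-based, and the plain comma split assumes no quoted fields containing commas.

```shell
# filter_csv INDEX VALUE FILE   (hypothetical helper, for illustration only)
# Print every line of FILE whose comma-separated field number INDEX (1-based)
# is exactly equal to VALUE.
# Assumption: fields never contain quoted commas; awk splits on every comma.
# Note: awk -v interprets backslash escapes in VALUE.
filter_csv() {
    # Appending "" forces a string comparison, so "1" does not match "1.0".
    awk -F, -v idx="$1" -v val="$2" '($idx "") == (val "")' "$3"
}
```

It could be called as filter_csv "${INDEX}" "${VALUE}" inputfile.csv. Because awk reads the file once, this should stay reasonable for 100k+ lines, but it is worth timing against the grep and cut version on real data.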