How to remove rows from a csv file based on a list of values in another file?
I have two files:

candidates.csv:

id,value
1,123
4,1
2,5
50,5

blacklist.csv:

1
2
5
3
10
I'd like to remove all rows from candidates.csv in which the first column (id) has a value contained in blacklist.csv. id is always numeric. In this case I'd like my output to look like this:
id,value
4,1
50,5
So far, my script for identifying the duplicate lines looks like this:
cat candidates.csv | cut -d \, -f 1 | grep -f blacklist.csv -w
This gives me the output:
1
2
Now I somehow need to pipe this information back into sed/awk/gawk/... to delete the duplicates, but I don't know how. Any ideas how I can continue from here? Or is there a better solution altogether? My only restriction is that it has to run in bash.
What about:
awk -F, '(NR==FNR){a[$1];next}!($1 in a)' blacklist.csv candidates.csv
How does this work?
An awk program is a series of pattern-action pairs, written as:
condition { action }
condition { action }
...
where condition is typically an expression and action a series of commands. Here, the first condition-action pairs read:
- (NR==FNR){a[$1];next} if the total record count NR equals the record count of the current file FNR (i.e. if we are reading the first file), store all values in array a and skip to the next record (do not do anything else)
- !($1 in a) if the first field is not in the array a, then perform the default action, which is to print the line. This only takes effect on the second file, as the condition of the first condition-action pair does not hold there.
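Putting it together on the sample data from the question, as a quick sanity check (the two files are recreated here so the snippet is self-contained):

```shell
#!/usr/bin/env bash
# Recreate the sample files from the question.
printf 'id,value\n1,123\n4,1\n2,5\n50,5\n' > candidates.csv
printf '1\n2\n5\n3\n10\n' > blacklist.csv

# First pass (NR==FNR) loads blacklist ids into array a; second pass
# prints only candidate rows whose first field is not in that array.
awk -F, '(NR==FNR){a[$1];next}!($1 in a)' blacklist.csv candidates.csv
# Prints:
# id,value
# 4,1
# 50,5
```

Note that the header line is kept automatically, since the string "id" never appears in the blacklist.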