How to delete duplicate lines in a file without sorting in Unix?

Question:

Is there a way to delete duplicate lines in a file in Unix?

I can do it with the sort -u and uniq commands, but I want to use sed or awk. Is that possible?

Answer:

awk '!seen[$0]++' file.txt

seen is an associative array that Awk indexes by the full text of each input line ($0). The first time a line appears, seen[$0] has not been set yet, so it evaluates to false. The ! is the logical NOT operator and inverts that false to true. Awk prints every line for which the expression evaluates to true, because printing is the default action when a pattern has no action block. The ++ is a post-increment: it runs after the value has been read, so seen[$0] == 1 after the first time a line is found, then seen[$0] == 2, and so on.
Awk treats 0 and "" (the empty string) as false and everything else as true. When a line shows up again, seen[$0] is already non-zero, so !seen[$0] evaluates to false and the line is not written to the output. The input is never sorted, so the first occurrence of each line is kept in its original position.
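To make the logic easier to follow, here is a minimal sketch of the same idea written out longhand. The function name dedupe_keep_order is hypothetical and used only for illustration; it should behave like awk '!seen[$0]++' for ordinary text input.

```shell
# Hypothetical helper; a longhand sketch of: awk '!seen[$0]++' "$@"
dedupe_keep_order() {
    awk '{
        if (!($0 in seen)) {   # true only for the first occurrence of this exact line
            print              # emit the line unchanged
        }
        seen[$0] = 1           # remember the line so later copies are skipped
    }' "$@"
}

# Sample input with repeated lines; no sorting takes place.
printf 'b\na\nb\nc\na\n' | dedupe_keep_order
# prints b, a, c (one per line): first occurrences, in input order
```

The one-liner is shorter because it folds the test, the print, and the bookkeeping into a single pattern expression. Either form keeps one array entry per distinct line, so memory use grows with the number of unique lines in the file.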
