Remove duplicate lines only if they match a pattern

Problem description:

This question has a great answer saying you can use awk '!seen[$0]++' file.txt to delete non-consecutive duplicate lines from a file. How can I delete non-consecutive duplicate lines only if they match a pattern, e.g. only if they contain the string "#####"?

Sample input

deleteme.txt ##########
1219:                            'PCM BE PTP'
deleteme.txt ##########
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1222:                          , 'PCM BE PTP UT'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1223:                          , 'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1225:                          , 'PCM FE/MID PTP'

Desired output

deleteme.txt ##########
1219:                            'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1222:                          , 'PCM BE PTP UT'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
1223:                          , 'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
1225:                          , 'PCM FE/MID PTP'

You can use

awk '!/#####/ || !seen[$0]++'

Or, as Ed Morton suggests, an equivalent form:

awk '!(/#####/ && seen[$0]++)'
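The two forms are equivalent by De Morgan's law: !(A && B) is the same as !A || !B, and in both cases short-circuit evaluation means seen[$0]++ only runs for lines containing #####. A quick check, assuming a POSIX shell and any awk; the four-line input is made up for illustration:

```shell
# Made-up sample: one marked line repeated, one unmarked line repeated.
input=$(printf '%s\n' 'h #####' 'x' 'h #####' 'x')

# Run both forms of the condition over the same input.
a=$(printf '%s\n' "$input" | awk '!/#####/ || !seen[$0]++')
b=$(printf '%s\n' "$input" | awk '!(/#####/ && seen[$0]++)')

# Report whether the two outputs are identical.
[ "$a" = "$b" ] && echo "same output"
```

Both commands should keep the first 'h #####' line and both 'x' lines.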

Here, !seen[$0]++ works as usual: it is true only the first time a given line is seen, so on its own it would drop every repeated line. !/#####/ is true for lines that do not contain #####, so those lines are always printed and, because || short-circuits, they never enter the seen array. Combined with ||, the two conditions print every line without ##### and only the first occurrence of each line that contains it.
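If the one-liner feels too terse, the same logic can be spelled out as an explicit rule. This is only a sketch: the function name dedupe_marked and the marker variable are hypothetical names introduced here, and index() is used so the marker is matched as a literal string rather than a regex.

```shell
# Hypothetical helper: drop repeated lines only when they contain the marker.
dedupe_marked() {
  awk -v marker='#####' '
    index($0, marker) {          # line contains the marker (literal match)
      if (seen[$0]++) next       # already printed once, so skip it
    }
    { print }                    # all remaining lines pass through unchanged
  ' "$@"
}

# Reads stdin when no file names are given.
printf '%s\n' 'a #####' 'x' 'a #####' 'x' | dedupe_marked
```

Called with a file name (dedupe_marked file.txt) it reads that file instead of stdin.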

See the online awk demo:

s="deleteme.txt ##########
1219:                            'PCM BE PTP'
deleteme.txt ##########
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1222:                          , 'PCM BE PTP UT'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1223  #####:                          , 'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1225:                          , 'PCM FE/MID PTP'"
awk '!/#####/ || !seen[$0]++' <<< "$s"

Output:

deleteme.txt ##########
1219:                            'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
deleteme2.txt ##########
1222:                          , 'PCM BE PTP UT'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
1223  #####:                          , 'PCM BE PTP'
1221:                          , 'PCM FE/MID PTP UT','PCM IA 1 PTP'
1225:                          , 'PCM FE/MID PTP'