Perl: Finding duplicate lines in a file or array

Question:

I'm trying to print duplicate lines from the filehandle, not remove them or anything else I see asked on other questions. I don't have enough experience with perl to be able to quickly do this, so I'm asking here. What's the way to do this?

Answer:

Using the standard Perl shorthands:

my %seen;
while ( <> ) { 
    print if $seen{$_}++;
}

As a "one-liner":

perl -ne 'print if $seen{$_}++'

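More data? This prints <file name>:<line number>:<line> (the unary + stops print from treating the parenthesized expression as its entire argument list):

perl -ne 'print +( $ARGV eq "-" ? "" : "$ARGV:" ), "$.:$_" if $seen{$_}++'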

Explanation of %seen:

my %seen declares a hash. For each unique line in the input (which is coming from while(<>) in this case), $seen{$_} will have a scalar slot in the hash, named by the text of the line (this is what $_ is doing inside the hash {} braces).
Using the postfix increment operator (x++), we take the current value for our expression and increment it only afterwards. So, if we haven't "seen" the line, $seen{$_} is undefined, but when forced into a numeric "context" like this, it's taken as 0, which is false.
Then it's incremented to 1.
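
To make the post-increment behaviour concrete, here is a minimal sketch that captures the value the expression yields on the first and second sighting of a line. The variable names ($line, $first, $second) are only illustrative and are not part of the answer's code.

```perl
use strict;
use warnings;

my %seen;
my $line = "apple\n";    # illustrative sample line

# First sighting: no slot exists yet, so the old value counts as 0 (false).
my $first = $seen{$line}++;

# Second sighting: the old value is 1 (true), and the slot becomes 2.
my $second = $seen{$line}++;

print "first test value:  $first\n";
print "second test value: $second\n";
print "stored count:      $seen{$line}\n";
```

The first test value is false, so nothing would be printed for that line; the second is true, so the repeat would be printed.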

So, when the while begins to run, all lines are "zero" (if it helps, you can think of the lines as "not %seen"). Then, the first time we see a line, perl takes the undefined value, which fails the if, and increments the count at the scalar slot to 1. Thus, it is 1 or more for any future occurrence, at which point it passes the if condition and the line is printed.
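
The same idiom should carry over when the lines are already in an array rather than coming from a filehandle. The following is a sketch under that assumption; the helper name duplicates and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every repeated occurrence, in input order.
sub duplicates {
    my %seen;
    # grep keeps an element only when its old count was already non-zero.
    return grep { $seen{$_}++ } @_;
}

my @lines = ("a\n", "b\n", "a\n", "c\n", "b\n", "a\n");
my @dups  = duplicates(@lines);
print @dups;    # the second "a", the second "b", then the third "a"
```

As with the while loop, a line that occurs three times is reported twice, once for each repeat.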

Now, as I said above, my %seen declares a hash, but with strict turned off (as in the one-liners), a variable can be created on the spot simply by using it. So the first time perl sees $seen{$_}, it knows that I'm looking for %seen; it doesn't have it, so it creates it.

An added neat thing about this is that at the end, if you care to use it, %seen holds a count of how many times each line occurred.
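
One way to use those counts, sketched here with a hypothetical helper duplicate_counts and made-up sample lines, is to report each line that occurred more than once together with its total. Note that the total includes the first occurrence.

```perl
use strict;
use warnings;

# Hypothetical helper: maps each line seen more than once to its total count.
sub duplicate_counts {
    my %seen;
    $seen{$_}++ for @_;
    return map { ($_ => $seen{$_}) } grep { $seen{$_} > 1 } keys %seen;
}

my %count = duplicate_counts("a\n", "b\n", "a\n", "a\n");
for my $line (sort keys %count) {
    chomp(my $text = $line);           # strip the newline for display only
    print "$count{$line}x $text\n";    # prints "3x a"
}
```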
