How to count the number of matches for a regex?

Problem description:

Let's say I have a string which contains this:

HelloxxxHelloxxxHello

I compile a pattern to look for 'Hello'

Pattern pattern = Pattern.compile("Hello");
Matcher matcher = pattern.matcher("HelloxxxHelloxxxHello");

It should find three matches. How can I get a count of how many matches there were?

I've tried various loops and using the matcher.groupCount() but it didn't work.

matcher.find() does not find all matches, only the next match.

Solution for Java 9+

long matches = matcher.results().count();
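As a minimal, self-contained sketch of the one-liner above (Java 9 or newer), the snippet below wraps it in a helper. The class name CountWithResults and the method countMatches are hypothetical names chosen for illustration; only Pattern, Matcher and Matcher.results() come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountWithResults {
    // Hypothetical helper: counts the non-overlapping matches of regex in input.
    // Requires Java 9+, because Matcher.results() was added in that release.
    static long countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        // results() yields one MatchResult per successful find(), in order.
        return matcher.results().count();
    }

    public static void main(String[] args) {
        System.out.println(countMatches("Hello", "HelloxxxHelloxxxHello")); // prints 3
    }
}
```

Note that the result is a long, not an int, because it comes from Stream.count().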

Solution for Java 8 and older

You'll have to do the following. (From Java 9 on, matcher.results().count() is the nicer option.)

int count = 0;
while (matcher.find())
    count++;

Btw, matcher.groupCount() is something completely different: it returns the number of capturing groups in the pattern, not the number of matches found in the input.
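To make the difference concrete, here is a small sketch that compares groupCount() with a find() loop on the same input. The class GroupCountDemo and its two helper methods are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Number of capturing groups in the pattern; the input text is irrelevant here.
    static int groups(String regex, String input) {
        return Pattern.compile(regex).matcher(input).groupCount();
    }

    // Number of non-overlapping matches found in the input.
    static int matches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "HelloxxxHelloxxxHello";
        System.out.println(groups("Hello", input));      // prints 0 (no parentheses, no groups)
        System.out.println(groups("(H)(ello)", input));  // prints 2 (two capturing groups)
        System.out.println(matches("(H)(ello)", input)); // prints 3 (three matches)
    }
}
```

This is why groupCount() returned 0 for the pattern in the question: "Hello" contains no capturing groups.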

Complete example:

import java.util.regex.*;

class Test {
    public static void main(String[] args) {
        String hello = "HelloxxxHelloxxxHello";
        Pattern pattern = Pattern.compile("Hello");
        Matcher matcher = pattern.matcher(hello);

        int count = 0;
        while (matcher.find())
            count++;

        System.out.println(count);    // prints 3
    }
}

Handling overlapping matches

When counting matches of aa in aaaa the above snippet will give you 2.

aaaa
aa
  aa

To get 3 matches, i.e. this behavior:

aaaa
aa
 aa
  aa

You have to search for a match at index <start of last match> + 1 as follows:

String hello = "aaaa";
Pattern pattern = Pattern.compile("aa");
Matcher matcher = pattern.matcher(hello);

int count = 0;
int i = 0;
while (matcher.find(i)) {
    count++;
    i = matcher.start() + 1;
}

System.out.println(count);    // prints 3
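A different technique, not the one used above, is to wrap the pattern in a zero-width lookahead so that each match consumes no characters and the matcher tries every start position on its own. The sketch below assumes the pattern can safely be placed inside (?=...); the class OverlappingCount and the method countOverlapping are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingCount {
    // Hypothetical helper: counts the start positions at which regex matches,
    // so overlapping occurrences are each counted once.
    static int countOverlapping(String regex, String input) {
        // (?=...) is a zero-width positive lookahead: it matches without consuming input.
        Matcher matcher = Pattern.compile("(?=" + regex + ")").matcher(input);
        int count = 0;
        // After an empty match, find() resumes one character further on.
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOverlapping("aa", "aaaa")); // prints 3
    }
}
```

Like the find(i) loop, this counts at most one match per start index, and it also works on Java 8.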