Why does preg_match match the last subpattern instead of the first?


Problem description:

I am trying to match the first hexadecimal address in a line that can contain many hexadecimal addresses, but I get the last one instead.

My call is:

preg_match('%.*(0x[0-9a-f]{8}){1}.*%', $v, $current_match);

where $v is a string such as:

Line: 2 libdispatch.dylib 0x36eaed55 0x36eae000 + 3413

I want to get 0x36eaed55, but $current_match[1] contains 0x36eae000 instead.

According to the PHP documentation, "$matches[1] will have the text that matched the first captured parenthesized subpattern, and so on."


The problem is that the * quantifier is greedy by default, so the first .* matches as much as possible while still allowing the entire expression to match. Here, that means .* "gobbles up" every hexadecimal constant except the last one, which (0x[0-9a-f]{8}){1} still needs in order to match.
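A minimal reproduction, assuming PHP 7 or later and the sample line from the question. It shows that $current_match[1] really is the first capture group in the pattern, as the documentation says; that group simply captured the last address because the leading greedy .* had already consumed everything before it.

```php
<?php
// Sample line taken from the question.
$v = 'Line: 2 libdispatch.dylib 0x36eaed55 0x36eae000 + 3413';

// Original pattern: the leading .* first runs to the end of the string, then
// the engine backtracks one character at a time until the group can match.
// Scanning backwards, the first place that works is the LAST address.
preg_match('%.*(0x[0-9a-f]{8}){1}.*%', $v, $current_match);

echo $current_match[0], PHP_EOL; // whole match: the entire line
echo $current_match[1], PHP_EOL; // group 1: 0x36eae000
```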

One solution is to use the non-greedy (lazy) quantifier *?. With it, the first constant is found:

preg_match('%.*?(0x[0-9a-f]{8}){1}.*?%', $v, $current_match);
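A sketch of the lazy version on the same sample line. Note that {1} and the trailing .*? add nothing here: {1} means "exactly once", which is already the default, and a lazy .*? at the very end of a pattern matches the empty string, so the overall match stops right after the first address.

```php
<?php
$v = 'Line: 2 libdispatch.dylib 0x36eaed55 0x36eae000 + 3413';

// Lazy .*? consumes as little as possible, so the group is tried at each
// position from the left and succeeds at the first address.
preg_match('%.*?(0x[0-9a-f]{8}){1}.*?%', $v, $current_match);

echo $current_match[1], PHP_EOL; // 0x36eaed55
// The trailing .*? matches nothing, so the full match ends at the address.
echo $current_match[0], PHP_EOL; // Line: 2 libdispatch.dylib 0x36eaed55
```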

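However, since you know that $v contains a hexadecimal constant and you want the first one, why not simply match the pattern of the hexadecimal constant itself? preg_match() returns the leftmost match, and because this pattern has no capture group, the address ends up in $current_match[0]: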

preg_match('%0x[0-9a-f]{8}%', $v, $current_match);
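A short sketch of the direct match, again using the question's sample line. It also checks the return value of preg_match(), which is 1 when the pattern matched and 0 when it did not, so a line without any address is handled explicitly.

```php
<?php
$v = 'Line: 2 libdispatch.dylib 0x36eaed55 0x36eae000 + 3413';

// preg_match() stops at the leftmost match, so no surrounding .* is needed.
// With no capture group, the address is at index 0 and there is no index 1.
if (preg_match('%0x[0-9a-f]{8}%', $v, $current_match) === 1) {
    echo $current_match[0], PHP_EOL; // 0x36eaed55
} else {
    echo 'no address found', PHP_EOL;
}
```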

Even if you wanted the second, third, fourth, ... hexadecimal constant, you could use preg_match_all() with the same pattern:

preg_match_all('%0x[0-9a-f]{8}%', $v, $all_matches, PREG_PATTERN_ORDER);
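A sketch of picking the n-th address with preg_match_all(), assuming the same sample line. With PREG_PATTERN_ORDER, $all_matches[0] is the list of all full matches in the order they appear, and the function returns how many there were.

```php
<?php
$v = 'Line: 2 libdispatch.dylib 0x36eaed55 0x36eae000 + 3413';

// Returns the number of full matches found.
$count = preg_match_all('%0x[0-9a-f]{8}%', $v, $all_matches, PREG_PATTERN_ORDER);

echo $count, PHP_EOL;             // 2
echo $all_matches[0][0], PHP_EOL; // first address: 0x36eaed55
echo $all_matches[0][1], PHP_EOL; // second address: 0x36eae000
```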

The first .* tries to match as much as possible, so it consumes your first hex address as well. Try making it non-greedy: .*?

That's because your first .* is greedy. You can fix it by changing your regexp to:

preg_match('%(0x[0-9a-f]{8})%', $v, $current_match);

or

preg_match('%.*?(0x[0-9a-f]{8})%', $v, $current_match);

You can use the ungreedy modifier, "U", which makes quantifiers non-greedy by default:

preg_match('%.*(0x[0-9a-f]{8}){1}.*%U', $v, $m);
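A sketch of the U modifier on the sample line. Per the PHP pattern-modifier documentation, U inverts greediness: quantifiers become lazy by default and a following ? makes them greedy again. The second call below uses that inversion to reproduce the original problem. Because U affects every quantifier in the pattern, an explicit *? on a single quantifier is often easier to reason about in longer patterns.

```php
<?php
$v = 'Line: 2 libdispatch.dylib 0x36eaed55 0x36eae000 + 3413';

// With U, both .* quantifiers are lazy, so the first address is captured.
preg_match('%.*(0x[0-9a-f]{8}){1}.*%U', $v, $m);
echo $m[1], PHP_EOL; // 0x36eaed55

// Under U, appending ? flips a quantifier back to greedy,
// which brings back the original "last address" behaviour.
preg_match('%.*?(0x[0-9a-f]{8}){1}.*%U', $v, $greedy);
echo $greedy[1], PHP_EOL; // 0x36eae000
```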