Lazy evaluation for a larger regex (not just .*?)

Problem description:

Using the following regex:

\[\w* \w* \d{2} [\w:]* \d{4}\] \[error\] \[client .*?\] .*? Using HTTP not .*?<br /> 

I get the following results (where yellow boxes indicate a match):

[Screenshot: Sublime Text 2 search results]

Raw Text: http://pastebin.com/vSi0mLGv

The bottom two sections are correct. I want all sections that contain: &lt;&lt;&lt;NOTICE&gt;&gt;&gt; Non-Prod Server: Using HTTP not HTTP/S

The top section, however, contains the correct string (like the bottom two) but also pulls in a whole other chunk that I do not want:

[Thu May 10 17:43:48 2012] [error] [client ::1] Current Name: DashboardBar_projAnnualReview200, referer: http://localhost/test/pages/TestPage.php<br />

I know this comes down to the regex being greedy, but how can I make it evaluate lazily up to the <br />, if that's even the right way to go about it? I've tried (<br />)*? and other variations, to no avail.
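To make the problem reproducible without the screenshot, here is a small Python sketch. The log text is a hypothetical sample modeled on the entries quoted above (it is not the pastebin content), and it assumes the entries sit on one line, since Python's . does not match a newline by default. The regex is the one from the question, unchanged.

```python
import re

# Hypothetical sample modeled on the entries quoted above (not the real pastebin text).
# Assumption: the entries sit on one line, separated by spaces.
PREFIX = "[Thu May 10 17:43:48 2012] [error] [client ::1] "
REFERER = ", referer: http://localhost/test/pages/TestPage.php<br />"
NOTICE = "&lt;&lt;&lt;NOTICE&gt;&gt;&gt; Non-Prod Server: Using HTTP not HTTP/S"
SAMPLE = " ".join([
    PREFIX + "Current Name: DashboardBar_projAnnualReview200" + REFERER,  # unwanted entry
    PREFIX + NOTICE + REFERER,  # wanted entry
    PREFIX + NOTICE + REFERER,  # wanted entry
])

# The regex from the question, unchanged.
ORIGINAL = r"\[\w* \w* \d{2} [\w:]* \d{4}\] \[error\] \[client .*?\] .*? Using HTTP not .*?<br />"

matches = re.findall(ORIGINAL, SAMPLE)
print(len(matches))                  # 2
print("Current Name" in matches[0])  # True: the first match also swallows the unwanted entry
print("Current Name" in matches[1])  # False
```

As in the screenshot, the top match carries the extra entry while the bottom one is correct.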


Other information: I am using Sublime Text 2 and performing a regex search, in case anyone wants to recreate the screenshot.


Greediness is not the problem; eagerness is. The regex engine starts trying to match at the earliest possible position, and it doesn't give up on that position until every possibility has been exhausted. Making quantifiers non-greedy doesn't change that; it only changes the order in which the possibilities are tried.
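A toy Python example illustrates the point (the string is hypothetical). With only one possible end, the greedy and the lazy pattern return the same match, because both succeed from the first "x". Only restricting what the gap may contain moves the start.

```python
import re

text = "x1 x2 y"  # hypothetical: two possible starting points, one possible end

greedy = re.search(r"x.*y", text)
lazy = re.search(r"x.*?y", text)
print(greedy.group(), greedy.start())  # x1 x2 y 0
print(lazy.group(), lazy.start())      # x1 x2 y 0  (laziness does not move the start)

# Forbidding "x" inside the gap makes the attempt at index 0 fail,
# so the engine moves on and succeeds at the second "x".
restricted = re.search(r"x[^x]*y", text)
print(restricted.group(), restricted.start())  # x2 y 3
```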

It's not the * in .* that's causing your problem; it's the dot. It matches any character, including the ] and [ that separate one log entry from the next, so it lets the match start too early. You need something more restrictive. This regex works as desired because I've replaced the .*? with [^][]*, which matches any run of characters other than ] and [:

\[\w* \w* \d{2} [\w:]* \d{4}\] \[error\] \[client [^][]*\] [^][]* Using HTTP not .*?<br />

I don't know which regex flavor Sublime Text uses, so you may need to escape the square brackets inside the character class:

\[\w* \w* \d{2} [\w:]* \d{4}\] \[error\] \[client [^\]\[]*\] [^\]\[]* Using HTTP not .*?<br />
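A quick check of the escaped version in Python, whose re module accepts \] and \[ inside a character class. The sample is hypothetical, modeled on the entries quoted in the question with all entries on one line; it is not the original pastebin text.

```python
import re

# Hypothetical sample modeled on the entries quoted in the question (not the real
# pastebin text). Assumption: the entries sit on one line, separated by spaces.
PREFIX = "[Thu May 10 17:43:48 2012] [error] [client ::1] "
REFERER = ", referer: http://localhost/test/pages/TestPage.php<br />"
NOTICE = "&lt;&lt;&lt;NOTICE&gt;&gt;&gt; Non-Prod Server: Using HTTP not HTTP/S"
WANTED = PREFIX + NOTICE + REFERER
SAMPLE = " ".join([
    PREFIX + "Current Name: DashboardBar_projAnnualReview200" + REFERER,  # unwanted entry
    WANTED,
    WANTED,
])

# Escaped variant from the answer: [^\]\[]* cannot cross a [ or ], so an attempt
# that starts at the unwanted entry fails at the next entry's "[Thu ...".
FIXED = r"\[\w* \w* \d{2} [\w:]* \d{4}\] \[error\] \[client [^\]\[]*\] [^\]\[]* Using HTTP not .*?<br />"

matches = re.findall(FIXED, SAMPLE)
print(len(matches))                       # 2
print(all(m == WANTED for m in matches))  # True
```

Escaping both brackets costs nothing in flavors that would also accept [^][], so it is the safer form to paste into an unfamiliar engine.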

You mean "reluctant", not "lazy".

There should be no intervening <br />, right? Something like ((?!<br />).)* might work.
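A sketch of that idea in Python, using the same kind of hypothetical one-line sample (modeled on the question's entries, not the real log). It uses the non-capturing form (?:(?!<br />).)* so that re.findall returns whole matches instead of the last captured character. In this sample, tempering only the gap before "Using HTTP not" is not enough, because \[client .*?\] can still stretch past the <br /> to the next ], so the sketch applies the lookahead to both gaps.

```python
import re

# Hypothetical sample modeled on the question's entries, all on one line.
PREFIX = "[Thu May 10 17:43:48 2012] [error] [client ::1] "
REFERER = ", referer: http://localhost/test/pages/TestPage.php<br />"
NOTICE = "&lt;&lt;&lt;NOTICE&gt;&gt;&gt; Non-Prod Server: Using HTTP not HTTP/S"
WANTED = PREFIX + NOTICE + REFERER
SAMPLE = " ".join([
    PREFIX + "Current Name: DashboardBar_projAnnualReview200" + REFERER,  # unwanted entry
    WANTED,
    WANTED,
])

# "Any character, provided a <br /> does not start here" (non-capturing group).
TEMPERED = r"(?:(?!<br />).)"
STAMP = r"\[\w* \w* \d{2} [\w:]* \d{4}\] \[error\] "

# Only the second gap tempered: \[client .*?\] can still run to the next "]".
HALF = STAMP + r"\[client .*?\] " + TEMPERED + r"* Using HTTP not .*?<br />"
# Both gaps tempered: nothing before "Using HTTP not" can cross a <br />.
FULL = STAMP + r"\[client " + TEMPERED + r"*\] " + TEMPERED + r"* Using HTTP not .*?<br />"

half = re.findall(HALF, SAMPLE)
full = re.findall(FULL, SAMPLE)
print("Current Name" in half[0])  # True: still starts at the unwanted entry
print(full == [WANTED, WANTED])   # True
```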