Converting a regular expression from PHP to Java

Problem description:

I have a regular expression in PHP and I need to convert it to Java. Is it possible to do so? If so, how can I do it?

Thanks in advance

$region_pattern = "/<a href=\"#\"><img src=\"images\/ponto_[^\.]+\.gif\"[^>]*>[&nbsp;]*<strong>(?P<neighborhood>[^\(<]+)\((?P<region>[^\)]+)\)<\/strong><\/a>/i" ;

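A few problems with the original regex have to be cleared up first. To start with, there's [&nbsp;], a character class that matches a single one of the characters &, n, b, s, p or ;. To match an actual non-breaking space character, you should use \xA0. (If what you want is the literal entity text &nbsp; as it appears in the HTML source, use a group such as (?:&nbsp;)* instead.)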
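To make the difference concrete, here is a small Java sketch (the class name NbspDemo is made up for illustration). It compiles the character class, the \xA0 escape and, as a third option that assumes the page contains the entity text itself, a (?:&nbsp;)* group:

```java
import java.util.regex.Pattern;

// Sketch: three different things "&nbsp;" can mean inside a regex.
class NbspDemo {
    // Character class: any run of the single characters & n b s p ;
    static final Pattern CHAR_CLASS = Pattern.compile("[&nbsp;]*");
    // \xA0: any run of the real non-breaking space character U+00A0
    static final Pattern NBSP_CHAR = Pattern.compile("\\xA0*");
    // Non-capturing group: any run of the literal six-character entity text
    static final Pattern NBSP_ENTITY = Pattern.compile("(?:&nbsp;)*");

    public static void main(String[] args) {
        System.out.println(CHAR_CLASS.matcher("snb;p").matches());          // true
        System.out.println(CHAR_CLASS.matcher("\u00A0").matches());         // false
        System.out.println(NBSP_CHAR.matcher("\u00A0\u00A0").matches());    // true
        System.out.println(NBSP_ENTITY.matcher("&nbsp;&nbsp;").matches());  // true
        System.out.println(NBSP_ENTITY.matcher("snb;p").matches());         // false
    }
}
```

The character class happily accepts strings like snb;p that have nothing to do with spacing, which is why it is almost never what was intended.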

You also have a lot of unneeded backslashes in there. You can get rid of some by changing the regex delimiter to something other than /; others aren't needed because they're inside character classes, where most metacharacters lose their special meanings. That leaves you with this PHP regex:

"~<a href=\"#\"><img src=\"images/ponto_[^.]+\.gif\"[^>]*>\xA0*<strong>(?P<neighborhood>[^(<]+)\((?P<region>[^)]+)\)</strong></a>~i"

There are three things that make this regex incompatible with Java. One is the delimiters (/ originally, ~ in the version above) along with the trailing i modifier. Java doesn't use regex delimiters at all, so just drop those. The modifier can be moved into the regex itself by using the inline form, (?i), at the beginning of the regex. (That will work in PHP too, by the way.)
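A minimal Java sketch showing that the inline (?i) prefix and the Pattern.CASE_INSENSITIVE flag give the same result for this kind of tag matching (CaseFlagDemo is a hypothetical class name):

```java
import java.util.regex.Pattern;

// Sketch: the PHP "/.../i" modifier expressed two ways in Java.
class CaseFlagDemo {
    // Inline form: (?i) at the start of the pattern enables case-insensitive matching.
    static final Pattern INLINE = Pattern.compile("(?i)<strong>");
    // Flag form: the same behaviour passed as the second argument to compile().
    static final Pattern FLAG = Pattern.compile("<strong>", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        System.out.println(INLINE.matcher("<STRONG>").matches()); // true
        System.out.println(FLAG.matcher("<Strong>").matches());   // true
    }
}
```

The inline form has the advantage that the whole regex, modifier included, travels as a single string.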

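Next are the backslashes. The ones that escape quotation marks stay as they are, but all the others must be doubled, because Java is stricter about escape sequences in string literals: an unrecognized escape such as \. is a compile-time error in Java, whereas PHP simply leaves it in the string unchanged.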
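To illustrate what the doubling produces, this sketch (hypothetical class EscapeDemo) shows that the Java literal "\\.gif" is the five-character regex \.gif, and why dropping the backslash changes the meaning:

```java
import java.util.regex.Pattern;

// Sketch: what doubled backslashes in a Java string literal turn into.
class EscapeDemo {
    // The literal "\\.gif" is the 5-character regex \.gif: an escaped
    // (literal) dot followed by "gif". Writing "\.gif" would not compile,
    // because \. is not a valid escape in a Java string literal.
    static final String ESCAPED_DOT = "\\.gif";
    // Without the backslash the dot is a metacharacter matching any character.
    static final String BARE_DOT = ".gif";

    public static void main(String[] args) {
        System.out.println(ESCAPED_DOT.length());                 // 5
        System.out.println(Pattern.matches(ESCAPED_DOT, ".gif")); // true
        System.out.println(Pattern.matches(ESCAPED_DOT, "xgif")); // false
        System.out.println(Pattern.matches(BARE_DOT, "xgif"));    // true
        // \" is a string-literal escape, not a regex one, so it stays single.
        System.out.println(Pattern.matches("href=\"#\"", "href=\"#\"")); // true
    }
}
```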

Finally, there are the named groups. Up to and including Java 6, named groups weren't supported at all; Java 7 supports them, but with the shorter (?<name>...) syntax favored by .NET, not the Pythonesque (?P<name>...) syntax. (By the way, the shorter (?<name>...) version should work in PHP too, as should (?'name'...), also introduced by .NET.)

So the Java 7 version of your regex would be:

"(?i)<a href=\"#\"><img src=\"images/ponto_[^.]+\\.gif\"[^>]*>\\xA0*<strong>(?<neighborhood>[^(<]+)\\((?<region>[^)]+)\\)</strong></a>"

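Here is a sketch of how the Java 7 pattern might be used with named groups. The HTML fragment is invented for the example, and NamedGroupDemo and extract are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch (Java 7+): the converted regex with named groups.
class NamedGroupDemo {
    static final Pattern REGION = Pattern.compile(
        "(?i)<a href=\"#\"><img src=\"images/ponto_[^.]+\\.gif\"[^>]*>\\xA0*"
        + "<strong>(?<neighborhood>[^(<]+)\\((?<region>[^)]+)\\)</strong></a>");

    // Returns {neighborhood, region} for the first match, or null if none.
    static String[] extract(String html) {
        Matcher m = REGION.matcher(html);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group("neighborhood"), m.group("region") };
    }

    public static void main(String[] args) {
        // Made-up sample markup: one link with a non-breaking space before <strong>.
        String html = "<a href=\"#\"><img src=\"images/ponto_azul.gif\" alt=\"\">"
            + "\u00A0<strong>Centro (Zona Sul)</strong></a>";
        String[] parts = extract(html);
        System.out.println("[" + parts[0] + "]"); // [Centro ]
        System.out.println("[" + parts[1] + "]"); // [Zona Sul]
    }
}
```

Note that [^(<]+ also captures the space before the opening parenthesis, so you may want to trim() the neighborhood.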
For Java 6 or earlier you would use:

"(?i)<a href=\"#\"><img src=\"images/ponto_[^.]+\\.gif\"[^>]*>\\xA0*<strong>([^(<]+)\\(([^)]+)\\)</strong></a>"

...and you'd have to use numbers instead of names to refer to the group captures.
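For Java 6, a sketch of the same extraction that reads the captures by number: group 1 is the neighborhood and group 2 the region, counting only unescaped opening parentheses. It avoids Java 7 syntax such as the diamond operator; NumberedGroupDemo, extractAll and the sample HTML are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch (Java 6 compatible): no named groups, so captures are read by number.
class NumberedGroupDemo {
    static final Pattern REGION = Pattern.compile(
        "(?i)<a href=\"#\"><img src=\"images/ponto_[^.]+\\.gif\"[^>]*>\\xA0*"
        + "<strong>([^(<]+)\\(([^)]+)\\)</strong></a>");

    // Collects "neighborhood|region" for every match in the input.
    static List<String> extractAll(String html) {
        List<String> out = new ArrayList<String>(); // explicit type argument: Java 6
        Matcher m = REGION.matcher(html);
        while (m.find()) {
            // group(1) = neighborhood (trimmed), group(2) = region
            out.add(m.group(1).trim() + "|" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<a href=\"#\"><img src=\"images/ponto_1.gif\"><strong>Lapa (Centro)</strong></a>"
            + "<a href=\"#\"><img src=\"images/ponto_2.gif\"><strong>Tijuca (Norte)</strong></a>";
        System.out.println(extractAll(html)); // [Lapa|Centro, Tijuca|Norte]
    }
}
```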

Most regex syntax is shared across languages, and most of the pattern you posted works in both Java and PHP. You do need to make some adjustments, though, because the two languages don't accept the pattern in exactly the same form.

Points to Consider

  • Java's Pattern object takes flags as a separate argument to Pattern.compile, so you don't have to specify them in the pattern string itself.
  • Delimiters should not be included either; supply only the pattern itself.

A typical conversion of a regex to Java goes like this:

  • Exclude the pattern delimiters: remove the leading and trailing /.
  • Remove the flags (here, the trailing i); in Java these are applied to the Pattern object. Either pass them when you create the Pattern, or prepend them to the regex inline, like (?i)<regex>.
  • Replace every \ with \\. The backslash already has a meaning in Java string literals (it is the escape character), so to get a single backslash into the regex you have to write \\: \w becomes \\w, and \\ becomes \\\\. Backslashes that escape the string's own double quotes (\") stay as they are.
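The steps above can be automated for patterns you receive as strings at run time. The following is a hypothetical helper, not a library API: it assumes a non-bracket delimiter, handles only the i, m, s and x modifiers, and additionally does a naive textual rewrite of (?P<name> to (?<name> (Java 7+). Its input is the regex as PHP's engine sees it, i.e. after PHP has already processed its own string-literal escapes:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (a sketch, not a library API): turns a PHP preg_*
// pattern such as "/abc/i" into a java.util.regex.Pattern.
// Assumptions: a non-bracket delimiter, and only the i, m, s, x modifiers.
class PhpRegexConverter {

    static Pattern toJavaPattern(String php) {
        if (php.isEmpty()) {
            throw new IllegalArgumentException("empty pattern");
        }
        char delim = php.charAt(0);
        int end = php.lastIndexOf(delim);
        if (end <= 0) {
            throw new IllegalArgumentException("missing closing delimiter: " + php);
        }
        // Step 1: drop the delimiters, keeping only the pattern body.
        String body = php.substring(1, end);
        // Step 2: move the trailing modifiers onto the Pattern as flags.
        int flags = 0;
        for (char c : php.substring(end + 1).toCharArray()) {
            switch (c) {
                case 'i': flags |= Pattern.CASE_INSENSITIVE; break;
                case 'm': flags |= Pattern.MULTILINE; break;
                case 's': flags |= Pattern.DOTALL; break;
                case 'x': flags |= Pattern.COMMENTS; break;
                default: throw new IllegalArgumentException("unsupported modifier: " + c);
            }
        }
        // Extra step (Java 7+): PCRE's (?P<name>...) becomes (?<name>...).
        body = body.replace("(?P<", "(?<");
        return Pattern.compile(body, flags);
    }

    // Step 3, only for pasting into Java source: double every backslash,
    // then escape double quotes, so the result is a valid string literal.
    static String toJavaLiteral(String regex) {
        return "\"" + regex.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        // The pattern as PHP's regex engine receives it (PHP has already
        // turned the \" escapes of its string literal into plain quotes).
        String php = "/<a href=\"#\"><img src=\"images\\/ponto_[^\\.]+\\.gif\"[^>]*>[&nbsp;]*"
            + "<strong>(?P<neighborhood>[^\\(<]+)\\((?P<region>[^\\)]+)\\)<\\/strong><\\/a>/i";
        Pattern p = toJavaPattern(php);
        Matcher m = p.matcher(
            "<A HREF=\"#\"><img src=\"images/ponto_x.gif\"><strong>Lapa (Centro)</strong></a>");
        if (m.find()) {
            System.out.println(m.group("region")); // Centro
        }
        // Prints the converted body as a Java string literal, ready to paste.
        System.out.println(toJavaLiteral(p.pattern()));
    }
}
```

Escaping backslashes before quotes matters: doing it the other way round would double the backslash just added in front of each quote.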

The regex above would become:

Pattern.compile("<a href=\"#\"><img src=\"images\\/ponto_[^\\.]+\\.gif\"[^>]*>[&nbsp;]*<strong>(?P<neighborhood>[^\\(<]+)\\((?P<region>[^\\)]+)\\)<\\/strong><\\/a>", Pattern.CASE_INSENSITIVE);

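This will still fail, however: ?P is not a construct Java's regex engine recognizes, so Pattern.compile throws a PatternSyntaxException at (?P<neighborhood>. On Java 7 or later, write the groups as (?<neighborhood>...) and (?<region>...); on Java 6, drop the names and refer to the groups by number.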
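A quick way to confirm this is to try compiling both forms. This sketch (hypothetical class NamedGroupSyntaxDemo) catches the PatternSyntaxException that Pattern.compile throws for the PCRE-style group:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Sketch: Java rejects PCRE's (?P<name>...) but accepts (?<name>...) on Java 7+.
class NamedGroupSyntaxDemo {
    // Returns true if the regex compiles, false if Pattern reports a syntax error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("(?P<region>[^)]+)")); // false
        System.out.println(compiles("(?<region>[^)]+)"));  // true (Java 7+)
    }
}
```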