PHP regular expression to match shorttags

Problem description:

This is close, but is failing to match successive "attributes":

$string = "single attribute [include file=\"bob.txt\"] multiple attributes [another prop=\"val\" attr=\"one\"] no attributes [tag] etc";
preg_match_all('/\[((\w+)((\s(\w+)="([^"]+)"))*)\]/', $string, $matches, PREG_SET_ORDER);
print '<pre>' . print_r($matches, TRUE) . '</pre>';

Gives back the following:

Array
(
    [0] => Array
        (
            [0] => [include file="bob.txt"]
            [1] => include file="bob.txt"
            [2] => include
            [3] =>  file="bob.txt"
            [4] =>  file="bob.txt"
            [5] => file
            [6] => bob.txt
        )

    [1] => Array
        (
            [0] => [another prop="val" attr="one"]
            [1] => another prop="val" attr="one"
            [2] => another
            [3] =>  attr="one"
            [4] =>  attr="one"
            [5] => attr
            [6] => one
        )

    [2] => Array
        (
            [0] => [tag]
            [1] => tag
            [2] => tag
        )

)

Where [2] is the tag name, [5] is the attribute name and [6] is the attribute value.

The failure is on the second node - it catches attr="one" but not prop="val"

TYIA.

(this is only meant for limited, controlled use - not broad distribution - so I don't need to worry about single quotes or escaped double quotes)

Unfortunately, a repeated capture group only keeps its last iteration, so you can't collect every attribute that way. Personally, I would use preg_match_all to match the tags themselves (i.e. drop the capturing parentheses inside the regex), then for each match extract the tag name and the attributes separately. Something like this:

$string = "single attribute [include file=\"bob.txt\"] multiple attributes [another prop=\"val\" attr=\"one\"] no attributes [tag] etc";
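// Match each whole tag first; (?:...) groups the attributes without capturing them.
preg_match_all('/\[\w+(?:\s\w+="[^"]+")*\]/', $string, $matches);
foreach ($matches[0] as $m) {
    // $m still includes the brackets, so anchor on the opening "[" to get the tag name.
    preg_match('/^\[(\w+)/', $m, $tagname); $tagname = $tagname[1];
    // Each entry in $attrs is [full match, attribute name, attribute value].
    preg_match_all('/\s(\w+)="([^"]+)"/', $m, $attrs, PREG_SET_ORDER);
    // do something with $tagname and $attrs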
}
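
As a minimal sketch of what "do something" could look like, the two-step approach can be wrapped in a helper that returns one entry per tag with its attributes as an associative array. The function name parse_shorttags and the shape of the returned array are assumptions made for illustration, not part of the original answer:

```php
<?php
// Hypothetical helper (not from the original answer): parse every shorttag
// in $text into ['tag' => name, 'attrs' => [name => value, ...]].
function parse_shorttags(string $text): array
{
    $tags = [];
    // Step 1: find each complete [tag attr="value" ...] occurrence.
    preg_match_all('/\[\w+(?:\s\w+="[^"]+")*\]/', $text, $matches);
    foreach ($matches[0] as $raw) {
        // Step 2: the tag name is the word right after the opening bracket.
        preg_match('/^\[(\w+)/', $raw, $name);
        // Step 3: collect every name="value" pair inside this one tag.
        preg_match_all('/\s(\w+)="([^"]+)"/', $raw, $pairs, PREG_SET_ORDER);
        $attrs = [];
        foreach ($pairs as $pair) {
            $attrs[$pair[1]] = $pair[2];
        }
        $tags[] = ['tag' => $name[1], 'attrs' => $attrs];
    }
    return $tags;
}
```

Called on the string from the question, this should yield three entries, with both prop and attr present on the second one. If an attribute name is repeated within a tag, the later value overwrites the earlier one.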

Note that if you intend to replace the tag with some content, you should use preg_replace_callback like so:

$string = "single attribute [include file=\"bob.txt\"] multiple attributes [another prop=\"val\" attr=\"one\"] no attributes [tag] etc";
// Argument order is pattern, callback, subject.
$output = preg_replace_callback('/\[\w+(?:\s\w+="[^"]+")*\]/', function ($match) {
    $m = $match[0]; // the whole tag, brackets included
    preg_match('/^\[(\w+)/', $m, $tagname); $tagname = $tagname[1];
    preg_match_all('/\s(\w+)="([^"]+)"/', $m, $attrs, PREG_SET_ORDER);
    $result = ''; // do something with $tagname and $attrs
    return $result;
}, $string);
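
To make the callback version concrete, here is a minimal sketch that replaces each shorttag with an HTML-style opening tag. The function name render_shorttags and the output format are assumptions chosen purely for illustration. It also captures the tag name and the raw attribute string as two ordinary, non-repeated groups, which is a small variation on the pattern above rather than the answer's exact code:

```php
<?php
// Hypothetical renderer: turns [tag a="b"] into <tag a="b"> for illustration only.
function render_shorttags(string $text): string
{
    return preg_replace_callback(
        '/\[(\w+)((?:\s\w+="[^"]+")*)\]/',
        function (array $match): string {
            // $match[1] is the tag name, $match[2] the raw attribute string
            // (an empty string when the tag has no attributes).
            preg_match_all('/\s(\w+)="([^"]+)"/', $match[2], $attrs, PREG_SET_ORDER);
            $html = '<' . $match[1];
            foreach ($attrs as $attr) {
                // Escape the value so it is safe inside a double-quoted attribute.
                $html .= ' ' . $attr[1] . '="' . htmlspecialchars($attr[2]) . '"';
            }
            return $html . '>';
        },
        $text
    );
}
```

Because neither outer group is repeated, the whole attribute run arrives intact in $match[2], and the inner preg_match_all then splits it into individual pairs.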