PHP and regex: extracting two separate parts of a string into ONE combined variable


Problem description:

I have a PHP string consisting of HTML code as follows:

$string =
'<ul>
<li>
<a href="/nalcrom">Nalcrom</a>
        (Sodium Cromoglicate)
</li>
<li>
<a href="/alimemazine">Alimemazine</a>
</li>
<li>
<a href="/xolair">Xolair</a>
        (Omalizumab)
</li>
</ul>';

using

preg_match_all($regex,$string,$matches, PREG_PATTERN_ORDER);

for ($i = 0; $i < count($matches[0]); ++$i)
{ echo $i . "    " . $matches[0][$i]. "<br>"; }

if I use

$regex = "^(?<=>).*?(?=(\Q</a>\E))^";

I get

1 Nalcrom

2 Alimemazine

3 Xolair

whereas if I use

$regex = "^\(.*?\)^";

I get

1 (Sodium Cromoglicate)

2 (Omalizumab)

Trying

$regex = "^(?<=>).*?(?=(\Q</a>\E))(\(.*?\))^";

and variations on it, I get nothing but blank output, whereas what I need is:

1 Nalcrom (Sodium Cromoglicate)

2 Alimemazine

3 Xolair (Omalizumab)

Any ideas on how I can do this? Thanks.

Make the second regex group optional by appending ? to it, i.e.:

    $string =
    '<ul>
    <li>
    <a href="/nalcrom">Nalcrom</a>
            (Sodium Cromoglicate)
    </li>
    <li>
    <a href="/alimemazine">Alimemazine</a>
    </li>
    <li>
    <a href="/xolair">Xolair</a>
            (Omalizumab)
    </li>
    </ul>';

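    preg_match_all('%">(.*?)</a>\s+(\(.*?\))?%i', $string, $match, PREG_PATTERN_ORDER);
    // $match[1] holds the link texts; $match[2] holds the optional parenthesised text
    // (an empty string where the group did not match).
    for ($i = 0; $i < count($match[0]); $i++) {
        echo $match[1][$i] . " " . $match[2][$i] . "\n";
    }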

Output:

Nalcrom (Sodium Cromoglicate)
Alimemazine 
Xolair (Omalizumab)
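
For context on why the combined pattern in the question returns nothing: a lookahead such as (?=</a>) is zero-width, so the engine is still positioned in front of </a> when it tries to match \( and the match fails. The closing tag and the whitespace have to be consumed before the optional group. Below is a minimal sketch of that idea which also produces the numbered format asked for. The variable names ($lines, $extra) and the choice of PREG_SET_ORDER are my own assumptions, not part of the answer above.

```php
<?php
// Sample input copied from the question.
$string = '<ul>
<li>
<a href="/nalcrom">Nalcrom</a>
        (Sodium Cromoglicate)
</li>
<li>
<a href="/alimemazine">Alimemazine</a>
</li>
<li>
<a href="/xolair">Xolair</a>
        (Omalizumab)
</li>
</ul>';

// Group 1: the link text.
// Group 2 (optional): the parenthesised text that may follow </a>.
// </a> and the whitespace are consumed, not just looked at.
$regex = '%">(.*?)</a>\s*(\([^)]*\))?%';

// PREG_SET_ORDER yields one array per match, which is convenient for looping.
preg_match_all($regex, $string, $matches, PREG_SET_ORDER);

$lines = [];
foreach ($matches as $i => $m) {
    // With PREG_SET_ORDER a trailing group that did not match is absent,
    // so fall back to an empty string.
    $extra = $m[2] ?? '';
    // trim() removes the trailing space when there is no parenthesised part.
    $lines[] = trim(($i + 1) . ' ' . $m[1] . ' ' . $extra);
}

echo implode("\n", $lines), "\n";
```

With the sample input this should print the three numbered lines requested in the question. If the pages being parsed are less regular than this snippet, a regex over HTML becomes fragile quickly.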


Here's a mostly non-regex solution (the only pattern used is a trivial check for a leading parenthesis). It strips all HTML tags, then treats each non-empty line as a data point. If a line starts with a (, it is assumed to belong to the previous data point and is appended to it.

<?php
$string =
'<ul>
<li>
<a href="/nalcrom">Nalcrom</a>
        (Sodium Cromoglicate)
</li>
<li>
<a href="/alimemazine">Alimemazine</a>
</li>
<li>
<a href="/xolair">Xolair</a>
        (Omalizumab)
</li>
</ul>';
$new_string = strip_tags($string);
$newlines = explode("\n", $new_string);
$count = 0;
$output = '';
foreach($newlines as $newline) {
    $newline = trim($newline);
    if(!empty($newline)) {
        if(preg_match('~^\(~', $newline)) {
            $output .= $newline;
        } else {
            $count++;
            if(!empty($output)) {
                $output .= "\n\n";
            }
            $output .=  $count . ' ' .$newline . ' ';
        }
    }
}
echo $output;

The output is:

1 Nalcrom (Sodium Cromoglicate)

2 Alimemazine

3 Xolair (Omalizumab)

Try the following regex:

@>([^<]+)</a>([^<]*)</li>@ius

In your example, $matches[1][0] and trim($matches[2][0]) should return Nalcrom and (Sodium Cromoglicate) respectively. So you can iterate over your list using the second index.

My example needs trim to keep the regex simple, but in practice you can tweak it so that it doesn't capture the surrounding whitespace.
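
To make the iteration concrete, here is a rough sketch that applies the pattern above to the sample string and joins the two groups per list item. The second pattern is only one possible way to tighten it so that trim is not needed; it is my own assumption (as are the names $rows and $tight), not something from the answer.

```php
<?php
// Sample input copied from the question.
$string = '<ul>
<li>
<a href="/nalcrom">Nalcrom</a>
        (Sodium Cromoglicate)
</li>
<li>
<a href="/alimemazine">Alimemazine</a>
</li>
<li>
<a href="/xolair">Xolair</a>
        (Omalizumab)
</li>
</ul>';

// Pattern from the answer: group 1 is the link text,
// group 2 is everything between </a> and </li> (whitespace included).
preg_match_all('@>([^<]+)</a>([^<]*)</li>@ius', $string, $matches, PREG_PATTERN_ORDER);

$rows = [];
foreach ($matches[1] as $i => $name) {
    // Inner trim() strips the newlines and indentation captured by group 2;
    // outer trim() drops the trailing space when group 2 was only whitespace.
    $rows[] = trim($name . ' ' . trim($matches[2][$i]));
}
echo implode("\n", $rows), "\n";

// Hypothetical tightened variant: the whitespace is matched outside the
// capture group, so group 2 is either "(...)" or an empty string.
preg_match_all('@>([^<]+)</a>\s*(\([^<)]*\))?\s*</li>@iu', $string, $tight, PREG_PATTERN_ORDER);
```

With PREG_PATTERN_ORDER, $tight[2] keeps one entry per list item, and the entry is an empty string for items without a parenthesised part, so the same index-based loop works for both patterns.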