PHP and regex: extract two separate parts of a string into ONE recombined variable
I have a PHP string consisting of HTML code as follows:
$string =
'<ul>
<li>
<a href="/nalcrom">Nalcrom</a>
(Sodium Cromoglicate)
</li>
<li>
<a href="/alimemazine">Alimemazine</a>
</li>
<li>
<a href="/xolair">Xolair</a>
(Omalizumab)
</li>
</ul>';
Using
preg_match_all($regex, $string, $matches, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($matches[0]); ++$i) {
    echo ($i + 1) . " " . $matches[0][$i] . "<br>";
}
If I use
$regex = "^(?<=>).*?(?=(\Q</a>\E))^";
I get
1 Nalcrom
2 Alimemazine
3 Xolair
whereas if I use
$regex = "^\(.*?\)^";
I get
1 (Sodium Cromoglicate)
2 (Omalizumab)
Trying
$regex = "^(?<=>).*?(?=(\Q</a>\E))(\(.*?\))^";
and variations on it, I get nothing but blank output, whereas what I need is:
1 Nalcrom (Sodium Cromoglicate)
2 Alimemazine
3 Xolair (Omalizumab)
Any ideas on how I can do this? Thanks.
Make the second regex group optional with ?, i.e.:
$string =
'<ul>
<li>
<a href="/nalcrom">Nalcrom</a>
(Sodium Cromoglicate)
</li>
<li>
<a href="/alimemazine">Alimemazine</a>
</li>
<li>
<a href="/xolair">Xolair</a>
(Omalizumab)
</li>
</ul>';
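preg_match_all('%">(.*?)</a>\s+(\(.*?\))?%i', $string, $match, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($match[0]); $i++) {
    // An unmatched optional group comes back as an empty string, so trim the trailing space.
    echo trim($match[1][$i] . " " . $match[2][$i]) . "<br>";
}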
Output:
Nalcrom (Sodium Cromoglicate)
Alimemazine
Xolair (Omalizumab)
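For what it's worth, the combined pattern in the question can never match: the lookahead (?=</a>) only asserts that </a> starts at the current position without consuming it, so the \( that follows would have to match at that very same position. The sketch below consumes </a> and the whitespace after it instead, and uses PREG_SET_ORDER so that each list item comes back as one array, which makes the numbering easy to add. The pattern and the names $items and $lines are my own assumptions, not taken from the question.

```php
<?php
$string = '<ul>
<li>
<a href="/nalcrom">Nalcrom</a>
(Sodium Cromoglicate)
</li>
<li>
<a href="/alimemazine">Alimemazine</a>
</li>
<li>
<a href="/xolair">Xolair</a>
(Omalizumab)
</li>
</ul>';

// Group 1: the link text. Group 2 (optional): the parenthesised name after </a>.
$regex = '%<a\b[^>]*>(.*?)</a>\s*(\([^)]*\))?%is';

preg_match_all($regex, $string, $items, PREG_SET_ORDER);

$lines = [];
foreach ($items as $i => $item) {
    // The optional group may be missing from the array when it did not
    // match, so fall back to an empty string.
    $extra = $item[2] ?? '';
    $lines[] = trim(($i + 1) . ' ' . $item[1] . ' ' . $extra);
}

echo implode("\n", $lines), "\n";
```

Matching on the <a ...> tag itself rather than on "> is a little less likely to pick up unrelated attributes elsewhere in the markup, but it is still a regex over HTML and will break on nested or unusual markup.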
Here's a (mostly) non-regex solution. It strips all the HTML, then treats each non-empty line as a data point. If a line starts with a (, it is assumed to belong to the previous data point and is appended to it.
<?php
$string =
'<ul>
<li>
<a href="/nalcrom">Nalcrom</a>
(Sodium Cromoglicate)
</li>
<li>
<a href="/alimemazine">Alimemazine</a>
</li>
<li>
<a href="/xolair">Xolair</a>
(Omalizumab)
</li>
</ul>';
$new_string = strip_tags($string);
$newlines = explode("\n", $new_string);
$count = 0;
$output = '';
foreach ($newlines as $newline) {
    $newline = trim($newline);
    if (!empty($newline)) {
        if (preg_match('~^\(~', $newline)) {
            // Line starts with "(": append it to the previous data point.
            $output .= $newline;
        } else {
            // New data point: start a new numbered line.
            $count++;
            if (!empty($output)) {
                $output .= "\n";
            }
            $output .= $count . ' ' . $newline . ' ';
        }
    }
}
echo $output;
The output is:
1 Nalcrom (Sodium Cromoglicate)
2 Alimemazine
3 Xolair (Omalizumab)
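If PHP's DOM extension is available, another option is to let PHP parse the markup rather than stripping the tags. This is a different technique from the line-splitting approach above and is only a sketch: it assumes every <li> holds exactly one item, and $doc and $rows are names I made up.

```php
<?php
$string = '<ul>
<li>
<a href="/nalcrom">Nalcrom</a>
(Sodium Cromoglicate)
</li>
<li>
<a href="/alimemazine">Alimemazine</a>
</li>
<li>
<a href="/xolair">Xolair</a>
(Omalizumab)
</li>
</ul>';

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($string);
libxml_clear_errors();

$rows = [];
foreach ($doc->getElementsByTagName('li') as $li) {
    // textContent holds the link text plus any "(...)" text after it;
    // collapse the newlines between them into single spaces.
    $text = trim(preg_replace('/\s+/', ' ', $li->textContent));
    $rows[] = (count($rows) + 1) . ' ' . $text;
}

echo implode("\n", $rows), "\n";
```

This does not depend on where the line breaks fall in the source HTML, which the explode() approach does.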
Try the following regex:
@>([^<]+)</a>([^<]*)</li>@ius
In your example, $matches[1][0] and trim($matches[2][0]) should return Nalcrom and (Sodium Cromoglicate) respectively, so you can iterate over your list using the second index.
My example needs trim to keep the regex simple, but in practice you can tweak it so that it doesn't capture whitespace characters.
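Here is a sketch of how that pattern might be used, followed by one possible tweak that keeps the surrounding whitespace out of the capture. The tweaked pattern and the names $names and $m are my own assumptions, not part of the original suggestion.

```php
<?php
$string = '<ul>
<li>
<a href="/nalcrom">Nalcrom</a>
(Sodium Cromoglicate)
</li>
<li>
<a href="/alimemazine">Alimemazine</a>
</li>
<li>
<a href="/xolair">Xolair</a>
(Omalizumab)
</li>
</ul>';

// Pattern as given: group 2 is everything between </a> and </li>,
// newlines included, so it has to be trimmed.
preg_match_all('@>([^<]+)</a>([^<]*)</li>@ius', $string, $matches);

$names = [];
foreach ($matches[1] as $i => $name) {
    $names[] = trim($name . ' ' . trim($matches[2][$i]));
}
echo implode("\n", $names), "\n";

// Possible tweak: match the whitespace outside the group and make the
// parenthesised part optional, so no trim() is needed on group 2.
preg_match_all('@>([^<]+)</a>\s*(\([^)<]*\))?\s*</li>@iu', $string, $m);
```

With the tweak, $m[2] holds either the parenthesised text or an empty string for each item.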