How do I capture a link with optional whitespace in PHP? [duplicate]

Question:

This question already has an answer here:

How do you parse and process HTML/XML in PHP? 30 answers

I get the HTML code of a URL with file_get_contents:

$html = file_get_contents($url);

Now I would like to capture the href link.

The HTML code is:

<li class="four-column mosaicElement">
<a href="https://example.com" title="Lorem ipsum">
...
</a>
</li>
<li class="four-column mosaicElement">
<a href="https://example.org" title="Lorem ipsum">
...
</a>
</li>

So I'm using this:

preg_match_all('/class=\"four-column mosaicElement\"><a href=\"(.+?)\" title=\"(.+?)"/m', $html, $urls, PREG_SET_ORDER, 0);

foreach ($urls as $key => $url) {
    echo $url[1];
}

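The pattern matches nothing, presumably because of the line break between the <li> tag and the <a> tag. How do I solve this problem?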

Answer

Here, we can also use an expression with a positive lookahead that tolerates optional spaces around the URL, just in case:

(?=class="four-column mosaicElement")[\s\S]*?href="\s*(https?[^\s]+)\s*"

and our desired URLs are in this group:

(https?[^\s]+)

TEST

$re = '/(?=class="four-column mosaicElement")[\s\S]*?href="\s*(https?[^\s]+)\s*"/m';
$str = '<li class="four-column mosaicElement">
<a href="https://example.com" title="Lorem ipsum">
...
</a>
</li>
<li class="four-column mosaicElement">
<a href="https://example.org" title="Lorem ipsum">

<li class="four-column mosaicElement">
<a href="   https://example.org   " title="Lorem ipsum">

<li class="four-column mosaicElement">
<a href="   https://example.org                " title="Lorem ipsum">
...
</a>
</li>
';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

foreach ($matches as $key => $url) {
    echo $url[1] . "\n";
}

Output

https://example.com
https://example.org
https://example.org
https://example.org
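
The capture group above, [^\s]+, first swallows the closing quote and only gives it back through backtracking. A minimal sketch of a slightly tighter variant, assuming the same markup as in the test: the character class excludes the quote as well, and the pattern is wrapped in a helper. The name extractMosaicLinks is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): returns every href that
// follows a class="four-column mosaicElement" attribute.
function extractMosaicLinks(string $html): array
{
    // [^\s"]+ stops at whitespace or at the closing quote, so padding around
    // the URL is consumed by the surrounding \s* and never captured.
    $re = '/(?=class="four-column mosaicElement")[\s\S]*?href="\s*(https?[^\s"]+)\s*"/';
    preg_match_all($re, $html, $matches, PREG_SET_ORDER);

    // With PREG_SET_ORDER each entry is [full match, group 1]; keep group 1.
    return array_column($matches, 1);
}

$sample = '<li class="four-column mosaicElement">
<a href="   https://example.org   " title="Lorem ipsum">
</a>
</li>';

print_r(extractMosaicLinks($sample));
```

The /m flag is left out here because the pattern has no ^ or $ anchors for it to affect.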

Answer

I was able to get your code working just by modifying the regex pattern to the following:

class="four-column mosaicElement">\s*<a href="(.+?)" title="(.+?)"
                                 ^^^^^

Note carefully that I allow for any amount of whitespace between the class attribute from the outer tag (<li>) and the inner anchor.

Here is your updated script:

$html = "<li class=\"four-column mosaicElement\">
<a href=\"https://example.com\" title=\"Lorem ipsum\">
</a>
</li>
<li class=\"four-column mosaicElement\">
<a href=\"https://example.org\" title=\"Lorem ipsum\">
</a>
</li>";
preg_match_all('/class="four-column mosaicElement">\s*<a href="(.+?)" title="(.+?)"/m', $html, $urls, PREG_SET_ORDER, 0);

foreach ($urls as $key => $url) {
    echo $url[1] . "\n";
}

This prints:

https://example.com
https://example.org
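
If whitespace can also show up in other places (before the closing >, around the = sign, or inside the quotes), the same idea can be extended. This is only a sketch: where the extra \s* and \s+ tokens go is an assumption about the markup, not something the question states.

```php
<?php
// Hypothetical variant of the pattern above; the extra \s* and \s+ tokens are
// assumptions about where whitespace may appear in the markup.
$pattern = '/class="four-column mosaicElement"\s*>\s*<a\s+href\s*=\s*"\s*(.+?)\s*"\s+title\s*=\s*"(.+?)"/';

$html = "<li class=\"four-column mosaicElement\" >\n\n"
      . "   <a  href = \" https://example.com \"   title=\"Lorem ipsum\">\n"
      . "</a>\n</li>";

preg_match_all($pattern, $html, $urls, PREG_SET_ORDER);

foreach ($urls as $url) {
    // $url[1] is the href without the surrounding spaces, $url[2] the title.
    echo $url[1] . " (" . $url[2] . ")\n";
}
```

Every additional case handled this way makes the pattern harder to read, which is one reason the linked duplicate recommends a DOM parser instead.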

Answer

Another option is to use DOMXPath with an xpath expression that finds all list items with both class names and then gets the anchors:

//li[contains(@class, 'four-column') and contains(@class, 'mosaicElement')]/a

For example:

$string = <<<DATA
<li class="four-column mosaicElement">
<a href="https://example.com" title="Lorem ipsum">
</a>
</li>
<li class="four-column mosaicElement">
<a href="https://example.org" title="Lorem ipsum">
</a>
</li>
DATA;

$dom = new DOMDocument();
$dom->loadHTML($string);
$xpath = new DOMXPath($dom);

foreach($xpath->query("//li[contains(@class, 'four-column') and contains(@class, 'mosaicElement')]/a") as $v) {
    echo $v->getAttribute("href") . PHP_EOL;
}

Result

https://example.com
https://example.org
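
Note that contains(@class, 'four-column') is a substring test, so it would also match a class such as four-column-wide. A minimal sketch of a stricter version, assuming the goal is to match whole class names and to ignore optional spaces inside the href value; the helper name findMosaicLinks and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper; the name and the token-based class test are not from
// the answer above.
function findMosaicLinks(string $html): array
{
    // Real pages often contain markup libxml warns about; collect those
    // warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Padding the class list with spaces makes contains() match whole tokens.
    $query = "//li[contains(concat(' ', normalize-space(@class), ' '), ' four-column ')"
           . " and contains(concat(' ', normalize-space(@class), ' '), ' mosaicElement ')]/a";

    $links = [];
    foreach ($xpath->query($query) as $anchor) {
        // trim() drops optional spaces around the URL inside the attribute.
        $links[] = trim($anchor->getAttribute('href'));
    }
    return $links;
}

$sample = '<ul>
<li class="four-column mosaicElement"><a href="  https://example.com  ">a</a></li>
<li class="four-column-wide mosaicElement"><a href="https://example.net">b</a></li>
</ul>';

print_r(findMosaicLinks($sample));
```

Only the first list item carries both class names as whole tokens, so the second anchor is skipped.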
