PHP preg_replace: replace all anchor tags in a text with their href value using a regular expression

Problem description:

I want to replace all anchor tags within a text with their href value, but my pattern does not work right.

$str = 'This is a text with multiple anchor tags. This is the first one: <a href="https://www.link1.com/" title="Link 1">Link 1</a> and this one the second: <a href="https://www.link2.com/" title="Link 2">Link 2</a> after that a lot of other text. And here the 3rd one: <a href="https://www.link3.com/" title="Link 3">Link 3</a> Some other text.';
$test = preg_replace("/<a\s.+href=['|\"]([^\"\']*)['|\"].*>[^<]*<\/a>/i",'\1', $str);
echo $test;

At the end the text should look like this:

This is a text with multiple anchor tags. This is the first one: https://www.link1.com/ and this one the second: https://www.link2.com/ after that a lot of other text. And here the 3rd one: https://www.link3.com/ Some other text.

Thank you very much!

Just don't.

Use a parser instead.

$dom = new DOMDocument();
// since you have a fragment, wrap it in a <body>
$dom->loadHTML("<body>".$str."</body>");
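$links = $dom->getElementsByTagName("a");
// the list is live: once a link is removed, the next one moves to index 0
while($link = $links[0]) {
    // insert the href value as plain text, then drop the <a> element
    $link->parentNode->insertBefore(new DOMText($link->getAttribute("href")),$link);
    $link->parentNode->removeChild($link);
}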
$result = $dom->saveHTML($dom->getElementsByTagName("body")[0]);
// remove <body>..</body> wrapper
$output = substr($result, strlen("<body>"), -strlen("</body>"));

Demo on 3v4l
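The same parser approach can be packaged as a reusable function. This is a minimal sketch rather than the answer's exact code: the name replaceAnchorsWithHref is hypothetical, it takes a snapshot of the anchors with iterator_to_array() instead of relying on the live list, and it serializes only the body's child nodes, so no substr() unwrapping is needed. It assumes ASCII input (or input you have already dealt with encoding for), because DOMDocument::loadHTML() treats markup without a declared charset as ISO-8859-1.

```php
<?php
// Sketch only: replaceAnchorsWithHref is a hypothetical helper name.
// Replaces every <a> element in an HTML fragment with the text of its href attribute.
function replaceAnchorsWithHref(string $html): string
{
    $dom = new DOMDocument();
    // Wrap the fragment in <body> so its nodes end up as direct children of <body>.
    $dom->loadHTML('<body>' . $html . '</body>');
    $body = $dom->getElementsByTagName('body')->item(0);

    // Snapshot the live node list first so replacing nodes does not shift the iteration.
    foreach (iterator_to_array($body->getElementsByTagName('a')) as $link) {
        $text = $dom->createTextNode($link->getAttribute('href'));
        $link->parentNode->replaceChild($text, $link);
    }

    // Serialize only the body's children, which leaves out the <body> wrapper itself.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Calling replaceAnchorsWithHref($str) on the question's $str should give the expected text shown in the question.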

Perhaps not simpler, but safer: loop over the string with strpos, cutting out the HTML around each link.

$str = 'This is a text with multiple anchor tags. This is the first one: <a class="funky-style" href="https://www.link1.com/" title="Link 1">Link 1</a> and this one the second: <a href="https://www.link2.com/" title="Link 2">Link 2</a> after that a lot of other text. And here the 3rd one: <a href="https://www.link3.com/" title="Link 3">Link 3</a> Some other text.';

$pos = strpos($str, '<a');

while($pos !== false){
    // Cut from the start of the tag up to and including href=" (e.g. <a class="funky-style" href=")
    $str = substr($str, 0, $pos) . substr($str, strpos($str, 'href="', $pos)+6);
    // Cut from the closing quote of the URL through </a> (e.g. " title="Link 1">Link 1</a>)
    $str = substr($str, 0, strpos($str,'"', $pos)) . substr($str, strpos($str, '</a>', $pos)+4);
    // Find next link if possible
    $pos = strpos($str, '<a');
}
echo $str;

https://3v4l.org/vdN7E

Edited to handle a different attribute order in the a tag.
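The strpos loop can also be written as a single left-to-right pass that builds a new string, instead of rewriting $str in place and restarting the search from the beginning each time. This is a sketch under stated assumptions: the helper name replaceAnchorsWithStrpos is hypothetical, a tag must start with "<a " (followed by a space, not a tab or newline), href must be quoted with " or ', and each anchor must be closed by </a>. Tags that do not fit, such as <a name="x"> with no href, are copied through unchanged instead of being cut apart.

```php
<?php
// Sketch only: replaceAnchorsWithStrpos is a hypothetical helper name.
// Assumes "<a " tags with a quoted href (" or ') that are closed by </a>.
function replaceAnchorsWithStrpos(string $str): string
{
    $out = '';
    $offset = 0;
    while (($start = stripos($str, '<a ', $offset)) !== false) {
        // Copy the plain text in front of the tag.
        $out .= substr($str, $offset, $start - $offset);
        $tagEnd = strpos($str, '>', $start);
        $href = stripos($str, 'href=', $start);
        $close = stripos($str, '</a>', $start);
        $urlEnd = false;
        // Only accept an href that lies inside this opening tag.
        if ($tagEnd !== false && $close !== false && $href !== false && $href < $tagEnd) {
            $quote = $str[$href + 5]; // character right after "href="
            if ($quote === '"' || $quote === "'") {
                $urlEnd = strpos($str, $quote, $href + 6);
            }
        }
        if ($urlEnd === false || $urlEnd > $tagEnd) {
            // Not an anchor this sketch understands: keep "<a " and scan on.
            $out .= substr($str, $start, 3);
            $offset = $start + 3;
            continue;
        }
        // Keep only the URL, then skip past the closing </a>.
        $out .= substr($str, $href + 6, $urlEnd - $href - 6);
        $offset = $close + 4;
    }
    return $out . substr($str, $offset);
}
```

Because the offset only moves forward, the loop always terminates, even when it meets an "<a " it cannot handle.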

In case you're still set on regex, this should work:

preg_replace("/<a\s+href=['\"]([^'\"]+)['\"][^\>]*>[^<]+<\/a>/i",'$1', $str);

But you're probably better off with a solution like the one Andreas posted.
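Note that the pattern above requires href to be the first attribute, so a tag like <a class="funky-style" href="..."> from the strpos example is left untouched. A slightly more tolerant pattern is sketched below; it is still a regex, and it still assumes well-formed, non-nested anchors with a quoted href. The helper name replaceAnchorsWithRegex is hypothetical. The backreference \1 requires the closing quote to match the opening one, and the lazy .*? keeps each match inside a single anchor.

```php
<?php
// Sketch only: replaceAnchorsWithRegex is a hypothetical helper name.
// - [^>]*? allows other attributes (e.g. class="...") before href
// - (["\']) ... \1 accepts either quote style, as long as both sides match
// - the s flag lets the link text span lines; .*? stops at the first </a>
function replaceAnchorsWithRegex(string $str): string
{
    $pattern = '/<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1[^>]*>.*?<\/a>/is';
    // Group 2 holds the href value.
    return preg_replace($pattern, '$2', $str);
}
```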

FYI: the reason your previous regex didn't work was this little number:

.*>

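Because . matches any character and * is greedy, .*> kept consuming past the end of the tag, and the match only ended at the last </a> in the string. (The .+ before href is greedy in the same way, so the URL that gets captured is the last one.) A single match therefore swallowed everything from the first anchor tag to the last, which is why it appeared to replace only the first anchor tag and cut off the rest.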

Changing that to

[^\>]*

Ensures that this part of the match cannot run past a >, so it stays between the URL's closing quote and the bracket that closes the opening a tag.
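To see the difference, the sketch below runs both patterns on a shortened two-link string (made up for this demo, not the question's full text). With the question's pattern, one greedy match covers both anchors and only the last URL survives; the answer's pattern replaces each anchor separately.

```php
<?php
// Hypothetical two-link input for the demo.
$str = 'first: <a href="https://www.link1.com/" title="Link 1">Link 1</a> mid '
     . '<a href="https://www.link2.com/" title="Link 2">Link 2</a> end.';

// Question's pattern: the greedy .+ and .*> let one match run from the first <a to the last </a>.
$greedy = preg_replace("/<a\s.+href=['|\"]([^\"\']*)['|\"].*>[^<]*<\/a>/i", '\1', $str);

// Answer's pattern: [^\>]*> cannot cross a >, so each anchor is matched on its own.
$fixed = preg_replace("/<a\s+href=['\"]([^'\"]+)['\"][^\>]*>[^<]+<\/a>/i", '$1', $str);

echo $greedy, PHP_EOL; // first: https://www.link2.com/ end.
echo $fixed, PHP_EOL;  // first: https://www.link1.com/ mid https://www.link2.com/ end.
```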