PHP: remove a tag from HTML based on getAttribute

Question:

How can I limit which link tags are removed to those where $tag->getAttribute('rel') is "icon"? I tried adding a simple if statement around the $remove[] = $tag; line inside the foreach ($tags as $tag) loop. The code ran without errors, but the link with rel="icon" was not removed.

So in this example the whole link tag should be removed from the html:

<link rel="icon" type="image/png" href="/images/favicon.ico" />


$html = file_get_contents($url);
$dom = new DOMDocument();
$dom->loadHTML($html);

$tags = $dom->getElementsByTagName('link');

$remove = [];
foreach($tags as $tag) {
    $remove[] = $tag;
}

foreach ($remove as $tag) {
    $tag->parentNode->removeChild($tag); 
}

UPDATE: @prodigitalson provided the following answer, which did not work at first:

$html = file_get_contents($url);
$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXpath($dom);
$tags = $finder->query('//link[@rel="icon"]');

foreach ($tags as $tag) {
    $tag->parentNode->removeChild($tag);
}

It worked perfectly after adding the following as the last line of the code:

$html = $dom->saveHTML();
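
The reason the extra line matters is that removeChild only changes the in-memory DOMDocument; the original $html string stays as it was until the tree is serialized again. Below is a minimal, self-contained sketch of the whole flow. The function name removeIconLinks and the inline sample markup are assumptions made for illustration (the original code reads the page with file_get_contents($url)), and the libxml error suppression is an optional extra for messy real-world pages.

```php
<?php
// Hypothetical helper name; not part of the original post.
function removeIconLinks(string $html): string
{
    $dom = new DOMDocument();

    // Optional: collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $finder = new DOMXPath($dom);
    // Copy the matching nodes into a plain array before removing anything.
    $tags = iterator_to_array($finder->query('//link[@rel="icon"]'));

    foreach ($tags as $tag) {
        $tag->parentNode->removeChild($tag);
    }

    // Serialize the modified tree; the input string itself is never changed.
    return $dom->saveHTML();
}

// Made-up sample input standing in for file_get_contents($url).
$html = '<html><head>'
      . '<link rel="icon" type="image/png" href="/images/favicon.ico" />'
      . '<link rel="stylesheet" href="/css/site.css" />'
      . '</head><body><p>Hello</p></body></html>';

$cleaned = removeIconLinks($html);
echo $cleaned;
```

Only the value returned by saveHTML() reflects the removal, so any later code should use $cleaned (or reassign $html) rather than the string that was originally loaded.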

You can select all of these with an XPath query:

$html = file_get_contents($url);
$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXpath($dom);
$tags = $finder->query('//link[@rel="icon"]');
$toRemove = array();

foreach ($tags as $tag)
{
  $toRemove[] = $tag;
}

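// Remove the nodes using ONE of the two variants below. Running both would
// fail, because a node that has been removed no longer has a parentNode.

// with array_walk (the array is the first argument, the callback the second)
array_walk($toRemove, function ($elem) { $elem->parentNode->removeChild($elem); });

// or with foreach
foreach ($toRemove as $tag) {
  $tag->parentNode->removeChild($tag);
}

// write the modified document back to the string
$html = $dom->saveHTML();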
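
Since the question asked about getAttribute specifically, here is a rough sketch of the same removal done with an if check inside the loop instead of XPath. It treats rel as a space-separated, case-insensitive list, so a value such as "shortcut icon" is matched too, which the exact comparison @rel="icon" would miss. The helper name hasIconRel and the sample markup are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical helper: true when the space-separated rel value contains "icon".
function hasIconRel(DOMElement $link): bool
{
    // getAttribute returns an empty string when the attribute is missing.
    $rel = strtolower(trim($link->getAttribute('rel')));
    $tokens = preg_split('/\s+/', $rel);
    return in_array('icon', $tokens, true);
}

// Made-up sample input standing in for file_get_contents($url).
$html = '<html><head>'
      . '<link rel="shortcut icon" href="/favicon.ico" />'
      . '<link rel="icon" type="image/png" href="/images/favicon.ico" />'
      . '<link rel="stylesheet" href="/css/site.css" />'
      . '</head><body><p>Hello</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementsByTagName returns a live list, so take a snapshot before
// removing nodes; otherwise the list shifts while it is being iterated.
$links = iterator_to_array($dom->getElementsByTagName('link'));

foreach ($links as $link) {
    if (hasIconRel($link)) {
        $link->parentNode->removeChild($link);
    }
}

// Serialize the modified tree back into the string.
$html = $dom->saveHTML();
echo $html;
```

The snapshot step plays the same role as the $remove array in the question: nodes are collected first and removed afterwards.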

A simpler way is to use str_replace. Note that this only empties the rel attribute; it does not remove the tag itself:

<?php

//$html = file_get_contents($url);

$html = '<a rel="icon" href="#">link</a>';
$html = str_replace('rel="icon"', 'rel=""', $html);

echo $html;
?>