How can I exclude a specific HTML block inside the body tag using DOMDocument?

Problem description:

I'm using DOMDocument to fetch the HTML of a website. I can already extract the content inside <body></body>, but the body contains a <nav>...</nav> block. How can I exclude only the <nav></nav> block, using DOMDocument alone?

Here is my code:

<!DOCTYPE html>
<html>
<head>
    <title>Title Here</title>
</head>
<?php
  $d = new DOMDocument;
  $mock = new DOMDocument;
  $internalErrors = libxml_use_internal_errors(true);
  $d->loadHTML(file_get_contents('http://www.example.com'));
  $body = $d->getElementsByTagName('body')->item(0);
  foreach ($body->childNodes as $child){
      $mock->appendChild($mock->importNode($child, true));
  }
  libxml_use_internal_errors($internalErrors);
  echo $mock->saveHTML(); // prints the children of <body>, still including <nav>
?>
</html>

Please look at the accepted answer to this related question: "PHP DOM: Get NodeValue excluding the child nodes".

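You can remove the 'nav' node right after gathering the child nodes of the body, for example by calling removeChild() on its parent, or by skipping it in the foreach loop before importing each child.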
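
A minimal sketch of that idea, assuming the page has already been fetched into a string. The helper name bodyHtmlWithout() and the sample markup are hypothetical and only serve as illustration; the DOM calls themselves (loadHTML, getElementsByTagName, removeChild, saveHTML) are the standard DOMDocument API. It removes every matching element inside the body, then serializes what is left:

```php
<?php
// Hypothetical helper: name and signature are illustrative, not from the original post.
function bodyHtmlWithout(string $html, string $tagName = 'nav'): string
{
    // Collect libxml warnings internally instead of printing them,
    // since HTML5 tags such as <nav> may be reported as invalid.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // getElementsByTagName() returns a live list, so copy it to an array
    // before removing nodes; otherwise some elements can be skipped.
    foreach (iterator_to_array($body->getElementsByTagName($tagName)) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize the remaining children of <body> one by one.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Sample input (made up for this example).
$sample = '<html><body><nav><a href="/">Home</a></nav><p>Hello</p></body></html>';
echo bodyHtmlWithout($sample), PHP_EOL;
```

In the original code, the same effect could probably be had by skipping any $child whose nodeName is 'nav' inside the foreach loop, but that only catches a <nav> that is a direct child of the body. Removing the nodes through getElementsByTagName() also handles a <nav> nested deeper in the tree.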