Regex: how to extract HTML heading tags [duplicate]
Question:
I want to extract all heading tags (h1, h2, h3, ...) and their contents. For example:
<h1 id="title">This is the title</h1>
<h2 id="subtitle">This is the subtitle</h2>
<p>And this is the paragraph</p>
should be extracted as:
<h1 id="title">This is the title</h1>
<h2 id="subtitle">This is the subtitle</h2>
I'm using PHP and, as the title says, I want to do this with a regex.
Answer:
Use the right tool for the task: parse the HTML with a DOM parser instead of a regex. In PHP, DOMDocument and DOMXPath can do this:
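// Parse the HTML into a DOM tree. loadHTML() is called on an instance,
// because calling it statically is an error on PHP 8.
$doc = new DOMDocument();
$doc->loadHTML('
<h1 id="title">This is the title</h1>
<h2 id="subtitle">This is the subtitle</h2>
<p>And this is the paragraph</p>
<p>another tag</p>
');
// Select every h1-h6 element in the document.
$xpath = new DOMXPath($doc);
$heads = $xpath->query('//h1|//h2|//h3|//h4|//h5|//h6');
foreach ($heads as $tag) {
    // Serialize each heading node back to HTML (the node argument needs PHP 5.3.6+).
    echo $doc->saveHTML($tag), "\n";
}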
Output
<h1 id="title">This is the title</h1>
<h2 id="subtitle">This is the subtitle</h2>
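If a regex is still required, as the question asks, the following is a minimal sketch using preg_match_all. The function name extract_headings_regex is hypothetical, chosen only for illustration. It assumes simple markup like the sample: headings are not nested inside each other, attribute values contain no ">" character, and no headings appear inside comments or scripts. On anything less regular, the DOM approach above is the safer choice.

```php
<?php
// Hypothetical helper (not from the original answer): returns every
// <h1>..<h6> element found in $html as a list of strings.
function extract_headings_regex(string $html): array
{
    // <h1>..<h6> with optional attributes, matched lazily up to the closing
    // tag of the same level (backreference \1).
    // Flags: i = case-insensitive, s = "." also matches newlines.
    $pattern = '~<h([1-6])\b[^>]*>.*?</h\1\s*>~is';

    preg_match_all($pattern, $html, $matches);

    // $matches[0] holds the full text of each match.
    return $matches[0];
}

// Sample input copied from the question.
$html = '
<h1 id="title">This is the title</h1>
<h2 id="subtitle">This is the subtitle</h2>
<p>And this is the paragraph</p>
';

foreach (extract_headings_regex($html) as $heading) {
    echo $heading, "\n";
}
```

For the sample input this prints the same two heading lines as the DOM version. The backreference keeps an opening h1 from being paired with a closing h2, but the pattern has no real understanding of HTML structure.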