Regex: How to extract HTML heading tags [duplicate]

Problem description:

I want to extract all heading tags (h1, h2, h3, ...) together with their contents. For example:

<h1 id="title">This is the title</h1>
<h2 id="subtitle">This is the subtitle</h2>
<p>And this is the paragraph</p>

This should be extracted as:

<h1 id="title">This is the title</h1> and <h2 id="subtitle">This is the subtitle</h2>

I'm using PHP and, as the title says, I'd like to do this with a regex.

Use the right tool for the task: an HTML parser rather than a regex. PHP's DOMDocument and DOMXPath handle this directly.

// loadHTML() is an instance method; calling it statically fails on PHP 8.
$doc = new DOMDocument();
$doc->loadHTML('
    <h1 id="title">This is the title</h1>
    <h2 id="subtitle">This is the subtitle</h2>
    <p>And this is the paragraph</p>
    <p>another tag</p>
');

$xpath = new DOMXPath($doc);  
$heads = $xpath->query('//h1|//h2|//h3|//h4|//h5|//h6');

foreach ($heads as $tag) {
    // saveHTML($node) serializes only the given node.
    echo $doc->saveHTML($tag), "\n";
}

Output

<h1 id="title">This is the title</h1>
<h2 id="subtitle">This is the subtitle</h2>
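
Since the question asks specifically for a regex, here is a minimal sketch of that route for comparison. It assumes well-formed, non-nested heading tags like the ones in the example; the helper name `extractHeadings` is hypothetical and not part of the original answer. The DOM approach above is still the more robust choice, because a pattern like this can break on comments, nested headings, or malformed markup.

```php
<?php
// Hypothetical helper (an illustration, not from the original answer):
// returns every <h1>..<h6> element found in $html, including its tags.
function extractHeadings(string $html): array
{
    // <h1>..<h6> with optional attributes, a non-greedy body, and a closing
    // tag whose level matches the opening one via the backreference \1.
    // Flags: i = case-insensitive, s = let "." match newlines.
    $pattern = '/<h([1-6])\b[^>]*>.*?<\/h\1\s*>/is';
    preg_match_all($pattern, $html, $matches);

    return $matches[0]; // full matches only
}

$html = '
    <h1 id="title">This is the title</h1>
    <h2 id="subtitle">This is the subtitle</h2>
    <p>And this is the paragraph</p>
';

foreach (extractHeadings($html) as $heading) {
    echo $heading, "\n";
}
```

The `[1-6]` after `<h` keeps elements such as `<header>` or `<hr>` from matching, and the backreference stops an `<h1>` from being closed by a later `</h2>`.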