Parse HTML and get all h3s after an h2 and before the next h2 using PHP

Problem description:

I am looking to find the first h2 in the article. Once found, look for all h3s until the next h2 is found. Rinse and repeat until all headings and subheadings have been located.

Before you immediately flag or close this as a duplicate parsing question, please note the question title: this isn't about basic node retrieval. I've got that part down.

I am using DOMDocument to parse the HTML, with DOMDocument::loadHTML(), DOMDocument::getElementsByTagName() and DOMDocument::saveHTML() to retrieve the important headings of an article.

My code is as follows:

$matches = array();
$dom = new DOMDocument;
$dom->loadHTML($content);
// Collect every h2 in the document, in order.
foreach($dom->getElementsByTagName('h2') as $node) {
    $matches['heading-two'][] = $dom->saveHTML($node);
}
// Collect every h3 into a separate flat list.
foreach($dom->getElementsByTagName('h3') as $node) {
    $matches['heading-three'][] = $dom->saveHTML($node);
}
if($matches){
    $this->key_points = $matches;
}

This gives me output like:

array(
    'heading-two' => array(
        '<h2>Here is the first heading two</h2>',
        '<h2>Here is the SECOND heading two</h2>'
    ),
    'heading-three' => array(
        '<h3>Here is the first h3</h3>',
        '<h3>Here is the second h3</h3>',
        '<h3>Here is the third h3</h3>',
        '<h3>Here is the fourth h3</h3>',
    )
);

What I'm looking for is something more like:

array(
    '<h2>Here is the first heading two</h2>' => array(
        '<h3>Here is an h3 under the first h2</h3>',
        '<h3>Here is another h3 found under first h2, but after the first h3</h3>'
    ),
    '<h2>Here is the SECOND heading two</h2>' => array(
        '<h3>Here is an h3 under the SECOND h2</h3>',
        '<h3>Here is another h3 found under SECOND h2, but after the first h3</h3>'
    )
);

I'm not necessarily looking for a complete solution (though if you feel code would help others, go ahead). I'm mainly after guidance or advice that points me in the right direction for building a nested array like the one directly above.

Answer

I assume that all headings are at the same level in the DOM, so every h3 is a sibling of an h2. Under that assumption, you can iterate over the siblings of each h2 until the next h2 is encountered:

$matches = array();
foreach($dom->getElementsByTagName('h2') as $h2) {
    $key = $dom->saveHTML($h2);
    $matches[$key] = array();
    // Walk the h2's following siblings until the next h2 (or the end of the parent).
    $node = $h2;
    while(($node = $node->nextSibling) && $node->nodeName !== 'h2') {
        if($node->nodeName === 'h3') {
            $matches[$key][] = $dom->saveHTML($node);
        }
    }
}
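The sibling walk misses h3s that are not direct siblings of the h2, for example an h3 wrapped in a div. One alternative handles that case: select both heading levels in a single XPath union and remember the most recent h2 while iterating. This technique is not part of the original answer. libxml2 returns union results in document order. The following is a minimal sketch under those assumptions. The function name groupHeadings() and the sample HTML are hypothetical.

```php
<?php
// Sketch (not from the original answer): group each h3 under the preceding h2
// by visiting all h2/h3 elements in document order, so headings nested inside
// other elements (e.g. a <div>) are still found.
// groupHeadings() and the sample HTML below are hypothetical.
function groupHeadings(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $prev = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($dom);
    $matches = [];
    $currentKey = null; // outer HTML of the most recent h2 seen

    // An XPath union; libxml2 returns the node-set in document order.
    foreach ($xpath->query('//h2 | //h3') as $node) {
        if ($node->nodeName === 'h2') {
            $currentKey = $dom->saveHTML($node);
            $matches[$currentKey] = [];
        } elseif ($currentKey !== null) {
            // h3s that appear before the first h2 are skipped.
            $matches[$currentKey][] = $dom->saveHTML($node);
        }
    }
    return $matches;
}

$html = '<div>'
    . '<h2>First h2</h2><p>text</p><h3>A</h3>'
    . '<div><h3>B (nested in a div)</h3></div>'
    . '<h2>Second h2</h2><h3>C</h3>'
    . '</div>';

$result = groupHeadings($html);
print_r($result);
```

With this sample input, heading B ends up under the first h2. The sibling walk would skip it, because B's parent is the inner div rather than the h2's parent. Like the original answer, this sketch keys each group by the h2's HTML, so two identical h2 headings would collide.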
