Scraping data with PHP Simple HTML DOM

Problem description:

I have a structure like this:

<tr>
    <td>
        <strong>Tel. nr.:</strong>
        +370 000 000
        <strong>Faksas:</strong>
        +370 5 0000
    </td>
</tr>

I am new to Simple HTML DOM. I need to extract the values +370 000 000 and +370 5 0000. As far as I can see, this library does not support XPath, so how can I write a query that extracts the content following the HTML element <strong>Tel. nr.:</strong>?

The only way I have found is to get the HTML and use a regex to capture the text between </strong> and the next <strong>, but maybe Simple HTML DOM has its own method for this?

Answer

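Try it like this:

    <?php
    // Simple HTML DOM library file (named simple_parser.php in this setup).
    require('simple_parser.php');
    $html = str_get_html('
    <tr>
        <td>
            <strong>Tel. nr.:</strong>
            +370 000 000
            <strong>Faksas:</strong>
            +370 5 0000
        </td>
    </tr>');
    // First <td> in the document.
    $td = $html->find('td', 0);
    // plaintext returns all text in the cell, including the <strong> labels.
    echo $td->plaintext;
    ?>

Post your full code to get a clearer answer.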

Answer

You could use ->find('text') in order to get the text nodes:

$sample_html = '
<table>
<tr>
    <td>
        <strong>Tel. nr.:</strong>
        +370 000 000
        <strong>Faksas:</strong>
        +370 5 0000
    </td>
</tr>
</table>
';

$html = str_get_html($sample_html);
foreach($html->find('tr') as $row) {
    $first_td = $row->find('td', 0);
    echo $first_td->find('text', 2);
    echo $first_td->find('text', 4);
}

But this solution is rather clunky and brittle: the indexes also count whitespace-only text nodes, so removing the newlines between the elements would shift them and yield a different result.

I suggest using DOMDocument with XPath instead:

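$dom = new DOMDocument();
$dom->loadHTML($sample_html);
$xpath = new DOMXPath($dom);
// Direct text children of the first cell; the predicate skips whitespace-only nodes.
$elements = $xpath->query('//tr[1]/td[1]/text()[normalize-space()]');
foreach ($elements as $e) {
    echo trim($e->textContent) . '<br/>';
}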
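
To pull only the value that follows one specific label, as the question asks, the same DOMXPath approach can be narrowed with the `following-sibling` axis. This is a minimal sketch, assuming the labels contain no double quotes and the markup matches the sample; `textAfterLabel` is a hypothetical helper name, not part of any library:

```php
<?php
// Sample markup copied from the question.
$sampleHtml = '
<table>
<tr>
    <td>
        <strong>Tel. nr.:</strong>
        +370 000 000
        <strong>Faksas:</strong>
        +370 5 0000
    </td>
</tr>
</table>';

$dom = new DOMDocument();
$dom->loadHTML($sampleHtml);
$xpath = new DOMXPath($dom);

// Hypothetical helper: returns the trimmed text node that immediately
// follows the <strong> whose text equals $label, or null if none is found.
// Assumption: $label contains no double quotes (it is embedded in the query).
function textAfterLabel(DOMXPath $xpath, string $label): ?string
{
    $query = sprintf(
        '//td/strong[normalize-space(.) = "%s"]/following-sibling::text()[1]',
        $label
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

$phone = textAfterLabel($xpath, 'Tel. nr.:');
$fax = textAfterLabel($xpath, 'Faksas:');

echo $phone, PHP_EOL; // +370 000 000
echo $fax, PHP_EOL;   // +370 5 0000
```

Matching on the label text rather than on a node index means the result does not change when whitespace between the elements is added or removed.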
