Looping through a table with Simple HTML DOM

Problem description:

I'm using Simple HTML DOM to extract data from an HTML document, and I have a couple of issues that I need some help with.


  1. On the line that begins with if ($td->find('a')), I want to extract the href and the content of the anchor node separately and place them in separate variables. However, the code doesn't work (see the echo output noted in the comments in the code below).

What is the best way to do this? Note that my purpose is to create an XML document from this information later on, so I need the information in the correct order.

The links lead to pages containing detailed information about the different cars (e.g. "Max speed", "Price", etc.) that I also want to extract and put into separate variables. How can I get hold of the data on these pages?

<?php
include 'simple_html_dom.php';

$html = new simple_html_dom();
$html = file_get_html('http://www.example.com/foo.html');

$items = array();

foreach ($html->find('table') as $table) {
    foreach ($table->find('tr') as $tr) {

        foreach ($tr->find('td') as $td) {

            if ($td->find('a')) {
                $link = $td->find('a.href');
                echo $link;  // empty

                $text = $td->find('a.text');
                echo $text; // Array
            }
            else {
                echo 'Name: ' . $td;
            }
        }
    }
}


The HTML document looks like this:

<div>
    <table>
        <tr>
            <td>
                <a href="car1.html" target="_blank">Car 1</a>
            </td>
            <td>
                Porsche
            </td>
        </tr>
        <tr>
            <td>
                <a href="car2.html" target="_blank">Car 2</a>
            </td>
            <td>
                Chrysler
            </td>
        </tr>
        ... and so on...


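Use $td->find('a', 0)->href and $td->find('a', 0)->innertext to access the element's href attribute in the first case and its contents in the second. Calling find() without an index returns an array of matches, which is why echoing the result prints "Array"; also, 'a.href' is a selector for anchors with the class "href", not a way to read the attribute. If a cell might contain several anchors, passing 0 as the index acts as a safeguard that always returns the first one.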
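To illustrate, here is a minimal sketch of the loop from the question rewritten along those lines. It assumes simple_html_dom.php is on the include path, as in the question, and it parses an inline copy of the sample table with str_get_html() instead of fetching a URL. The $rows array and its 'href', 'text' and 'name' keys are names made up for this example, not anything the library requires.

```php
<?php
// Assumes simple_html_dom.php is available, as in the question.
include 'simple_html_dom.php';

// Inline sample mirroring the table in the question; in real use this
// would be file_get_html('http://www.example.com/foo.html').
$html = str_get_html('
<table>
  <tr><td><a href="car1.html" target="_blank">Car 1</a></td><td>Porsche</td></tr>
  <tr><td><a href="car2.html" target="_blank">Car 2</a></td><td>Chrysler</td></tr>
</table>');

// One entry per table row, kept in document order (illustrative structure).
$rows = array();

foreach ($html->find('table') as $table) {
    foreach ($table->find('tr') as $tr) {
        $row = array('href' => null, 'text' => null, 'name' => null);

        foreach ($tr->find('td') as $td) {
            // With an index, find() returns a single element, or null if none matched.
            $a = $td->find('a', 0);

            if ($a) {
                $row['href'] = $a->href;             // attribute value
                $row['text'] = trim($a->innertext);  // content of the anchor
            } else {
                $row['name'] = trim($td->plaintext); // plain cell text
            }
        }

        $rows[] = $row;
    }
}

print_r($rows);
```

Because each row is collected into its own array, the values stay in document order, which should make it straightforward to write them out as XML later. For the detail pages, each href could presumably be passed to file_get_html() in the same way once it has been resolved against the listing page's URL; the structure of those pages isn't shown in the question, so the selectors needed there are left open.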