Trying to parse a webpage using PHP

Problem description:

I am trying to parse a webpage and print out a table that appears on it. I am using the PHP Simple HTML DOM Parser (simple_html_dom.php). However, when I parse the table from the page, all the JavaScript commands that output the table show up as comments in my PHP output:

<html>
<script type="text/javascript" src="jquery.js"></script>
<?php
    include 'crawling/simple_html_dom.php';
    $html = file_get_html('http://uiucfreefood.com/');


    $ret = $html->find('body', 0)->find('div', 10)->find('table', 0); // gets to the table tag
    echo $ret; // nothing is echoed, because the original page uses JavaScript commands to write the table, and those commands end up as comments for some reason
?>
</html>

When I inspect the page where I echo the parsed output, I can see that the table tag with all the info is there, but the JavaScript commands have been turned into comments. Is there a way for me to just grab the info and echo it out myself? I tried adding another ->find('tbody'); to the end of the find chain, but it doesn't do anything. Any advice is appreciated. Thanks.

EDIT: You can try this code out yourself if you download the simple_html_dom.php and include it in your php file. Source: http://sourceforge.net/projects/simplehtmldom/files/

EDIT: Just noticed something really important. The JavaScript commands are commented out in the original webpage as well. Instead, the original page uses a JavaScript function, which I do not have defined, to print out the table. Writing that function myself should fix the issue.

EDIT: yup, that worked.

Try using file_get_contents instead of file_get_html and see if that works. Honestly, depending on your needs, you may be better off coding your own parser. It is not that hard to write one that scans for the table and displays it.

You will just need the following:

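$content = file_get_contents('http://uiucfreefood.com/'); // raw HTML of the page as a string
$array = explode("<table>", $content); // split() was removed in PHP 7; explode() does a literal split
$boolPlaceHolder = false;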

You can then set the placeholder to true when you encounter the opening table tag. This way you can scan through the characters of the content and grab the table.
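As a rough illustration of the scan described above, here is a minimal sketch that pulls the first table out of a string using plain string functions. The helper name extractFirstTable and the sample markup are made up for this example and are not part of simple_html_dom. The sketch assumes the page contains no nested tables. Because it is a plain string search, it does not care whether the table markup sits inside an HTML comment.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): returns the first
// <table>...</table> block found in $content, or null if there is none.
function extractFirstTable(string $content): ?string
{
    // Search for "<table" case-insensitively so that opening tags
    // carrying attributes (id, class, ...) are matched as well.
    $start = stripos($content, '<table');
    if ($start === false) {
        return null;
    }

    $closeTag = '</table>';
    $end = stripos($content, $closeTag, $start);
    if ($end === false) {
        return null; // opening tag without a matching closing tag
    }

    // Include the closing tag itself in the returned slice.
    return substr($content, $start, $end - $start + strlen($closeTag));
}

// Sample input standing in for the downloaded page; the markup is made up.
$content = '<html><body><p>intro</p>'
    . '<table id="food"><tr><td>Pizza</td></tr></table>'
    . '<p>footer</p></body></html>';

echo extractFirstTable($content), PHP_EOL;
// prints: <table id="food"><tr><td>Pizza</td></tr></table>
```

In real use, $content would come from file_get_contents on the page URL. If the page can contain nested tables, a proper HTML parser is probably the safer choice than this kind of scan.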

Hope this helps.