Getting text with PHP Simple HTML DOM Parser

Question:

I'm using PHP Simple HTML DOM Parser to get text from a webpage. The page I need to manipulate looks something like this:

<html>
<head>
<title>title</title>
</head>
<body>
<div id="content">
<h1>HELLO</h1>
Hello, world!
</div>
</body>
</html>

I need to get the h1 element and the text that has no tags. To get the h1, I use this code:

$html = file_get_html("remote_page.html");
foreach ($html->find('#content') as $text) {
    echo "H1: " . $text->find('h1', 0)->plaintext;
}

But what about the other text? I also tried this inside the foreach, but I get the full text:

$text->plaintext;

but it also returned the text of the H1 tag...

Use strip_tags, as @Peachy pointed out. However, passing it a second argument of <br> means strip_tags will leave <br> tags in place, which is unnecessary. In your case,

<?php
    // strip_tags() returns the stripped string, so echo (or assign) the result.
    echo strip_tags($text);
?>

would work as you'd like, given that you are only selecting content inside the element with the id content.
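
To make the difference concrete, here is a minimal sketch using only built-in PHP functions. The sample markup and the variable names ($markup, $plain, $withBreaks, $noHeading) are made up for illustration and are not from the original posts. Note that strip_tags removes tags but keeps the text inside them, so the heading text is still part of the result. The last step, which drops the h1 element with preg_replace before stripping, is an assumption of mine rather than part of the answer above, and a regular expression like this is only reasonable for simple, well-formed markup.

```php
<?php
// Hypothetical sample markup, modelled on the #content div from the question.
$markup = '<div id="content"><h1>HELLO</h1>Hello, world!<br></div>';

// No second argument: every tag is removed, the text between tags is kept.
$plain = strip_tags($markup);

// Second argument '<br>': <br> tags are kept, all other tags are removed.
$withBreaks = strip_tags($markup, '<br>');

// Assumption, not from the answer: remove the whole <h1> element first,
// then strip the remaining tags so only the untagged text is left.
$noHeading = trim(strip_tags(preg_replace('#<h1\b[^>]*>.*?</h1>#is', '', $markup)));

echo $plain, PHP_EOL;      // heading text and body text, no tags
echo $withBreaks, PHP_EOL; // same, but the <br> survives
echo $noHeading, PHP_EOL;  // body text only
```

If the markup comes from Simple HTML DOM, the same calls could be applied to the string form of the #content node instead of the hard-coded $markup.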