Get text with PHP Simple HTML DOM Parser
I'm using PHP Simple HTML DOM Parser to get text from a webpage. The page I need to process looks something like this:
<html>
<head>
<title>title</title>
</head>
<body>
<div id="content">
<h1>HELLO</h1>
Hello, world!
</div>
</body>
</html>
I need to get the h1 element and, separately, the text that is not wrapped in any tag.
To get the h1, I use this code:
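include 'simple_html_dom.php'; // the Simple HTML DOM library file
$html = file_get_html("remote_page.html");
// Loop over the element with id="content" and print the text of its first <h1>.
foreach($html->find('#content') as $text){
    echo "H1: ".$text->find('h1', 0)->plaintext;
}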
But what about the other text? I also tried this inside the foreach, but I get the full text:
$text->plaintext;
But it also returned the text of the h1 tag...
Use strip_tags(), as @Peachy pointed out. However, passing it '<br>' as the second argument (the list of allowed tags) means <br> tags are kept in the result, which is unnecessary here. In your case,
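<?php
// strip_tags() removes markup but keeps the text inside it, so drop the <h1> element first.
$inner = str_replace($text->find('h1', 0)->outertext, '', $text->innertext);
echo trim(strip_tags($inner));
?>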
would work as you'd like, given that you are only selecting what is inside the element with the id content.
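If a third-party library is not a requirement, the same split can be sketched with PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM. This is a swapped-in technique, not the approach from the answer above. It assumes ext-dom is enabled, and the function name extractContent is hypothetical. The XPath //*[@id="content"]/text() selects only text nodes that are direct children of #content, so the h1's text never gets mixed in and no tag stripping is needed.

```php
<?php
// Hypothetical helper, assuming PHP's bundled DOM extension (ext-dom),
// not Simple HTML DOM: returns the <h1> text and the loose text separately.
function extractContent(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect real-world markup instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // First <h1> that is a direct child of the element with id="content".
    $h1Node = $xpath->query('//*[@id="content"]/h1')->item(0);
    $h1 = $h1Node ? trim($h1Node->textContent) : '';

    // Only text nodes directly under #content; text inside child
    // elements such as <h1> is not selected.
    $parts = [];
    foreach ($xpath->query('//*[@id="content"]/text()') as $node) {
        $t = trim($node->textContent);
        if ($t !== '') {
            $parts[] = $t;
        }
    }

    return ['h1' => $h1, 'text' => implode(' ', $parts)];
}

$html = '<html><head><title>title</title></head><body>'
      . '<div id="content"><h1>HELLO</h1>Hello, world!</div>'
      . '</body></html>';

$result = extractContent($html);
echo "H1: " . $result['h1'] . "\n";
echo "Text: " . $result['text'] . "\n";
```

For a remote page, the markup string could come from file_get_contents() on the URL, assuming allow_url_fopen is enabled.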