HTML parsing error: "Service out of order" when trying to parse a website

Problem description:

I want to parse a website, but I always get the error "Service out of order".

It happens no matter which start or end string I give. I also tried another URL, and I copied complete examples that work for other users but not for me. I also tried increasing the size to 20000, but nothing works.

Here is my PHP script:

<?php
// URL to be searched
$url = "http://cordis.europa.eu/project/rcn/85400_en.html";

// String that precedes the relevant entries
$startstring = "<div class='tech'><p>";

// Up to the next HTML tag, i.e. the string that follows the relevant entries
$endstring = "<"; 

$file = @fopen ($url,"r");

if($file)
{
    echo "URL found<br>";
}

if (trim($file) == "") {
    echo "Service out of order - File:".$file."<br>";
    } else {
    $i=0;
    while (!feof($file)) {

        // If the file is large, it may be necessary to increase the
        // number 2000 accordingly. In case of a buffer overflow,
        // PHP outputs a corresponding error message.

        $zeile[$i] = fgets($file,20000);
        $i++;
    }
    fclose($file);
}

// Data filtering

for ($j=0;$j<$i;$j++) {
    if ($resa = strstr($zeile[$j],$startstring)) {
        $resb = str_replace($startstring, "", $resa);
        $endstueck = strstr($resb, $endstring);
        $resultat .= str_replace($endstueck,"",$resb);
        $resultat .= "; ";
    }
}

// Data output

echo ("Result = ".$resultat."<br>");
return $resultat;

Any help is appreciated. Thanks in advance.

EDIT: The URL is found and $file has a value: Resource id #3
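
The EDIT above suggests that fopen() succeeded, so the "Service out of order" branch most likely comes from the check `trim($file) == ""` itself. fopen() returns a stream resource on success and false on failure, and trim() expects a string. In PHP 5 and 7, calling trim() on a resource raises a warning and returns NULL, and `NULL == ""` is true, so the error branch runs even for a valid handle; PHP 8 throws a TypeError instead. A minimal sketch of checking the handle explicitly follows. The helper name `readLines` and the temporary file are assumptions for illustration, and reading a real URL this way additionally assumes `allow_url_fopen` is enabled.

```php
<?php
// Hypothetical helper (not part of the original script): reads all lines
// from a path or URL, or returns null if the stream cannot be opened.
function readLines(string $source): ?array
{
    // fopen() returns a stream resource on success and false on failure,
    // so compare against false instead of passing the handle to trim().
    $handle = @fopen($source, 'r');
    if ($handle === false) {
        return null;
    }

    $lines = [];
    // fgets() returns false at end of file, which ends the loop cleanly.
    while (($line = fgets($handle)) !== false) {
        $lines[] = $line;
    }
    fclose($handle);

    return $lines;
}

// Offline demonstration: a temporary file stands in for the remote URL.
$tmp = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($tmp, "first line\nsecond line\n");

$lines = readLines($tmp);
echo $lines === null ? "Service out of order\n" : count($lines) . " lines read\n";
unlink($tmp);
```

With this check, the error message is printed only when the stream really could not be opened, and the filtering loop from the script can then iterate over the returned array.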

Answer

Use this; it should give the expected output.

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"http://cordis.europa.eu/project/rcn/85400_en.html");
curl_setopt($ch, CURLOPT_HTTPGET, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_ENCODING, ''); 

$headers = array();
$headers[] = 'Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8';
$headers[] = 'Accept-Encoding:gzip, deflate';
$headers[] = 'Accept-Language:en-US,en;q=0.8';
$headers[] = 'Cache-Control:max-age=0';
$headers[] = 'Connection:keep-alive';
$headers[] = 'Cookie:CORDIS=14.141.177.158.1441621012200552; PHPSESSID=jrf2e3t4vu56acdkf9np0tat06; WT_FPC=id=14.141.177.158-1441621016.978424:lv=1441605951963:ss=1441604805004';
$headers[] = 'User-Agent:Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36';
$headers[] = 'Host:cordis.europa.eu';

curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$server_output = curl_exec($ch);
curl_close($ch);

$dom = new DOMDocument;
$dom->loadHTML($server_output);
$xpath = new DomXpath($dom);

$div = $xpath->query("//*[@class='tech']")->item(0);
$data = trim($div->textContent);
echo $data;
?>


Answer

Try this:

<?php
// URL to be searched
$url = "http://cordis.europa.eu/project/rcn/85400_en.html";

$html = file_get_contents($url);

if ($html === false) {
    //Service unavailable
    echo 'Service unavailable';
    return;
}
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DomXpath($dom);

$div = $xpath->query("//*[@class='tech']")->item(0);
$output = trim($div->textContent);

// Data output

echo ("Result = " . $output. "<br>");
return $output;
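
Both answers call `->item(0)->textContent` directly, which fails if the page has no element with class `tech`, and loadHTML() tends to print warnings for real-world markup. A minimal sketch of a more defensive version of the same XPath lookup is shown here. The helper name `extractFirstByClass` and the sample markup are made up for illustration, and the sketch assumes the class name contains no single quote and that the PHP DOM extension is installed.

```php
<?php
// Hypothetical helper (not from the answers above): returns the trimmed text
// of the first element whose class attribute equals $class, or null if none.
function extractFirstByClass(string $html, string $class): ?string
{
    if (trim($html) === '') {
        return null; // an empty string is not valid input for loadHTML()
    }

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Same exact-match query as in the answers; assumes $class has no quotes.
    $nodes = $xpath->query("//*[@class='" . $class . "']");

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return trim($nodes->item(0)->textContent);
}

// Inline sample markup (made up) so the sketch runs without network access.
$sample = '<html><body><div class="tech"><p>Project objective text</p></div></body></html>';

var_dump(extractFirstByClass($sample, 'tech'));
var_dump(extractFirstByClass($sample, 'missing'));
```

In the script above, the string returned by file_get_contents() or curl_exec() would be passed in place of `$sample`, and a null result can be reported instead of triggering an error on a missing node.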
