Get specific content from an external URL - PHP

Problem description:

I am trying to get the direct download link from Google Drive, so I want to extract specific content from an external URL.

Example: visit link X and get the URL from a particular div or class. I think this is possible, but I don't know how to do it.

 $dom = new DOMDocument;
 libxml_use_internal_errors(true);
 $dom->loadHTMLFile('https://drive.google.com/uc?id=12ejMrVziFpjcEpG9A2Ks4yoNDJ9qz0B5&export=download');
 $DOMxpath = new DOMXPath($dom);
 $DivContent = $DOMxpath->query("//div[@id='uc-download-link']");
 $bigDiv = $DivContent;
 $link = $bigDiv->find('a');
 echo $link->href . '<br>';

Is that possible with PHP or JavaScript?

Sure, it can be done easily with PHP's file_get_contents, DOMDocument and DOMXPath.

The following example gets the HREF value from the 'Stack Overflow' logo's <a> tag, which has the class -logo js-gps-track:

$html = file_get_contents('http://stackoverflow.com/');
$dom = new DOMDocument();
libxml_use_internal_errors(true);

$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// find the element whose href value you want by XPath
$nodes = $xpath->query('//*[@class="-logo js-gps-track"]');

foreach ($nodes as $node) {
    // print out the href value
    echo $node->getAttribute('href');
}

Obviously you'd just need to amend the URL and the XPath for your specific use case.
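For the id-based lookup in the question, a minimal sketch might look like the following. It runs against an inline HTML string rather than the live page, so the markup, the EXAMPLE_ID value and the helper name extractLinks() are all assumptions made for illustration; the real Google Drive page may be structured differently. The key point is that query() returns a DOMNodeList, which has to be iterated instead of calling find() on it.

```php
<?php
// Hypothetical stand-in for the fetched page; the real markup may differ.
$html = <<<HTML
<html><body>
  <div id="uc-download-link">
    <a href="/uc?export=download&amp;id=EXAMPLE_ID">Download</a>
  </div>
</body></html>
HTML;

// extractLinks() is a hypothetical helper, not part of any library.
// It returns the href of every <a> that is, or sits inside, the element with the given id.
function extractLinks(string $html, string $id): array
{
    $previous = libxml_use_internal_errors(true); // silence warnings from malformed markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    // descendant-or-self covers both cases: the id on a wrapper div or on the <a> itself.
    // $id is interpolated directly, so it must not contain a single quote.
    $query = "//*[@id='$id']/descendant-or-self::a[@href]";
    foreach ($xpath->query($query) as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

print_r(extractLinks($html, 'uc-download-link'));
```

To use it on a real page, replace the inline string with the result of file_get_contents() and check that the call did not return false before parsing.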

Would it be fair to say that you're trying to scrape links from an external page? If so, there's a very popular JS package called Cheerio, which lets you access elements the same way as jQuery. It runs on Node (https://www.npmjs.com/package/cheerio), but it also seems to be available via CDN at https://www.jsdelivr.com/package/npm/cheerio

This is definitely possible with both PHP and JavaScript. The question is how you want to approach it. To get the download link from Drive by parsing the DOM, I would use these packages:

PHP: simple_html_dom package

Node.js: cheerio

Python: the requests library with bs4

You can filter with find() to locate the download link element, then read its text with the plaintext property of simple_html_dom or its URL with href.

Example

include('simple_html_dom.php');

$html = file_get_html('gdriveurl');

// find('a') alone returns an array; the index 0 selects the first <a> element
$target = $html->find('a', 0);

echo $target->href; // this is the download link

Another easy solution is XPath.
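As a rough sketch of the XPath route for the "get it from this class" case: comparing @class to an exact string only matches when the attribute is written exactly that way, so the version below matches a single class name as a whole word instead. The inline markup and the helper name hrefsByClass() are assumptions for illustration only.

```php
<?php
// Hypothetical markup for illustration only.
$html = '<div><a class="-logo js-gps-track" href="https://stackoverflow.com">Stack Overflow</a>'
      . '<a class="other" href="/ignored">Other</a></div>';

// hrefsByClass() is a hypothetical helper name, not a library function.
function hrefsByClass(string $html, string $class): array
{
    $previous = libxml_use_internal_errors(true); // silence warnings from malformed markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Pad the class attribute with spaces so that only a whole class name matches.
    // $class is interpolated directly, so it must not contain a single quote.
    $query = "//a[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
    $hrefs = [];
    foreach ($xpath->query($query) as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}

print_r(hrefsByClass($html, 'js-gps-track'));
```

The same helper keeps working if the page adds or reorders other classes on the element, which an exact @class comparison would not.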