Get specific content from an external URL - PHP
I am trying to get the direct download link from Google Drive, so I want to extract specific content from an external URL.
For example: visit URL x and get the link inside a given div or class. I think this is possible, but I don't know how to do it.
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTMLFile('https://drive.google.com/uc?id=12ejMrVziFpjcEpG9A2Ks4yoNDJ9qz0B5&export=download');
$DOMxpath = new DOMXPath($dom);
// query() returns a DOMNodeList, which has no find(); select the <a> in the XPath itself
$links = $DOMxpath->query("//div[@id='uc-download-link']//a");
foreach ($links as $link) {
    echo $link->getAttribute('href') . '<br>';
}
Is that possible with PHP or JavaScript?
Sure, it can be done easily with PHP's file_get_contents, DOMDocument and DOMXPath.
The following example gets the href value from the Stack Overflow logo's <a> tag, which has the class "-logo js-gps-track":
$html = file_get_contents('http://stackoverflow.com/');
$dom = new DOMDocument();
// suppress warnings about malformed real-world HTML
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// find the element whose href value you want by XPath
$nodes = $xpath->query('//*[@class="-logo js-gps-track"]');
foreach ($nodes as $node) {
    // print out the href value
    echo $node->getAttribute('href');
}
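One caveat with the example above: @class="-logo js-gps-track" is an exact string comparison, so it stops matching as soon as the site reorders, adds or removes a class. A common XPath 1.0 idiom matches a single class token instead. This is a rough sketch; the helper name hrefsByClass() and the sample markup are made up for illustration.

```php
<?php
// Match <a> elements by a single class token, regardless of the other
// classes they carry or their order. The exact @class="..." comparison
// breaks whenever the class list changes.
function hrefsByClass(string $html, string $class): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate non-valid real-world HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // XPath 1.0 idiom for "the class list contains this token":
    // pad the normalized class string with spaces and search for " token "
    $query = sprintf(
        '//a[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $hrefs = [];
    foreach ($xpath->query($query) as $node) {
        $hrefs[] = $node->getAttribute('href');
    }
    return $hrefs;
}

// Hypothetical markup: the logo link with its classes reordered
$html = '<html><body><a class="js-gps-track  -logo" href="/">Logo</a>'
      . '<a class="other" href="/x">Other</a></body></html>';
print_r(hrefsByClass($html, '-logo'));
```

The token is interpolated straight into the XPath string, so this assumes a plain class name without quotes.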
Obviously you'd just need to amend the URL and the XPath for your specific use case.
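Applied to the question's Google Drive case, the XPath expression can reach all the way down to the href attribute, since DOMXPath::query() returns a DOMNodeList rather than an object with a jQuery-style find(). A minimal sketch, assuming the link really sits inside <div id="uc-download-link">; the downloadHref() helper and the sample HTML are hypothetical, and Drive's actual markup may differ, so check the real page source first.

```php
<?php
// Return the href of the first <a> inside <div id="uc-download-link">,
// or null if no such link exists. The page structure is an assumption.
function downloadHref(string $html): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Select the href attribute nodes directly instead of the <div>
    $attrs = $xpath->query("//div[@id='uc-download-link']//a/@href");

    // An empty DOMNodeList means "not found"
    return ($attrs !== false && $attrs->length > 0)
        ? $attrs->item(0)->nodeValue
        : null;
}

// Hypothetical markup; &amp; in the attribute is decoded to & by the parser
$html = '<div id="uc-download-link"><a href="/uc?export=download&amp;id=abc">Download</a></div>';
var_dump(downloadHref($html));
```

In real use you would pass it the string returned by file_get_contents() for the Drive URL.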
Would it be fair to say that you're trying to scrape links from an external page? If so, there's a very popular JS package called Cheerio, which lets you access elements the same way as jQuery. It runs on Node (https://www.npmjs.com/package/cheerio), but it seems to be available via CDN at https://www.jsdelivr.com/package/npm/cheerio as well.
It's definitely possible with both PHP and JavaScript; the question is how you want to approach it. To get the download link from Google Drive by parsing the DOM, I would use these packages:
PHP: the simple_html_dom package
Node.js: cheerio
Python: the requests library with bs4 (BeautifulSoup)
You can filter with simple_html_dom's find() to locate the download link element, then read its href attribute. (Its plaintext property gives you the link's text, not the URL.)
Example
include('simple_html_dom.php');
$html = file_get_html('gdriveurl');
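// find('a', 0) returns the first matching element instead of an array;
// narrow 'a' to a more specific selector for the real page
$target = $html->find('a', 0);
echo $target->href; // this is the download link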
Another simple solution is XPath (DOMDocument + DOMXPath).
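For comparison with the simple_html_dom example, here is a rough XPath version that needs only PHP's built-in DOM extension. It locates the link by its visible text, in the spirit of the find() filtering described above, and turns a root-relative href into an absolute URL. The findLinkByText() helper, the link text "Download" and the base URL are assumptions for illustration, not something Google Drive guarantees.

```php
<?php
// Find the first <a> whose text contains $text and return its href,
// made absolute against $base when it is root-relative ("/path").
function findLinkByText(string $html, string $text, string $base): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // contains(., "...") tests the element's full text content
    $nodes = $xpath->query(sprintf('//a[contains(., "%s")]', $text));
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $href = $nodes->item(0)->getAttribute('href');
    // Prefix the host for "/path" links, but not for "//host/path" ones
    if (strpos($href, '/') === 0 && strpos($href, '//') !== 0) {
        $href = rtrim($base, '/') . $href;
    }
    return $href;
}

// Hypothetical page fragment
$html = '<div><a href="/help">Help</a>'
      . '<a href="/uc?export=download&amp;confirm=t&amp;id=abc">Download anyway</a></div>';
echo findLinkByText($html, 'Download', 'https://drive.google.com'), "\n";
```

Matching on text is handy when the link has no stable id or class, but it breaks if the page is served in another language.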