How to get more than 20 images when fetching results from Google Images?

Problem description:

The script below fetches images from Google, but it gets only the 20 images of the page specified in the $page variable.

I couldn't figure out why it returns exactly 20 results, or how to change this value to something larger, for example to display the first 100 images.

<?php


// Image sizes
define ('GIS_LARGE', 'l');
define ('GIS_MEDIUM', 'm');
define ('GIS_ICON', 'i');
define ('GIS_ANY', '');

// Image types
define ('GIS_FACE', 'face');
define ('GIS_PHOTO', 'photo');
define ('GIS_CLIPART', 'clipart');
define ('GIS_LINEART', 'lineart');

function get_data($url)
{
$ch = curl_init();
$timeout = 5;
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}


function googleImageSearch ($query, $page = 1, $size = GIS_ANY, $type = GIS_ANY)
{

$retVal = array();

// Get the search results page


$response = get_data("http://images.google.com/images?hl=en&q=" . urlencode ($query) . '&imgsz=' . $size . '&imgtype=' . $type . '&start=' . (($page - 1) * 21));

// Extract the image information. This is found inside the table with class "images_table"
preg_match('/\<table class=\"images_table\"(.*?)\>(.*?)\<\/table\>/is', $response, $match);

if (isset($match[2])) {

    // Grab all the arrays
    preg_match_all('/\<td(.*?)\>(.*?)\<\/td\>/', $match[2], $m);

    foreach ($m[2] as $item) {

        // List of expressions used to grab all our info
        $info = array(
            'resultLink' => '\<a href=\"(.*?)\"',
            'source' => 'imgurl=(.*?)&amp;',
            'title' => '\<br\/\>(.*?)\<br\/\>([\d]+)',
            'width' => '([\d]+) &times;',
            'height' => '&times; ([\d]+)',
            'type' => '&nbsp;-([\w]+)',
            'size' => ' - ([\d]+)',
            'thumbsrc' => 'src="(.*?)"',
            'thumbwidth' => 'width="([\d]+)"',
            'thumbheight' => 'height="([\d]+)"',
            'domain' => '\<cite title="(.*?)"\>'
        );

        $t = new stdClass;
        $t->thumb = new stdClass;
        foreach ($info as $prop => $expr) {
            // Use a separate variable so the outer $m (the table cells) is not overwritten
            if (preg_match('/' . $expr . '/is', $item, $pm)) {
                $value = 'title' == $prop ? str_replace(array('<b>', '</b>'), '', $pm[1]) : $pm[1];

                // Thumb properties go under the thumb object
                if (0 === strpos($prop, 'thumb')) {
                    $prop = str_replace('thumb', '', $prop);
                    $t->thumb->$prop = $value;
                } else {
                    $t->$prop = $value;
                }

                // Nicey up the google images result url
                if ('resultLink' == $prop) {
                    $t->resultLink = 'http://images.google.com' . $t->resultLink;
                }

            }
        }

        $retVal[] = $t;

    }

}

return $retVal;

}

Where is the line of code that tells the script to get 20 images?

Any help would be appreciated.

Well, you can't. The script fetches results from the standard version of Google Images, which has no option to change the number of results per page. The only thing you can do is make five requests, one per page, to get 100 images in all.
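To make the "five requests" idea concrete, here is a small sketch of the start offsets the script would send for pages 1 to 5. The helper name startOffset is hypothetical; it only mirrors the expression (($page - 1) * 21) from the question's code.

```php
<?php
// Hypothetical helper: mirrors (($page - 1) * 21) from googleImageSearch().
function startOffset($page, $step = 21)
{
    return ($page - 1) * $step;
}

// Offsets that would be requested for the first five pages.
$offsets = array_map('startOffset', range(1, 5));
print_r($offsets); // 0, 21, 42, 63, 84
```

Note that the step is 21 while each page reportedly holds 20 results, so if that count is right, one result would be skipped between consecutive pages. That is an observation about the arithmetic only, not something checked against Google's actual paging.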

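Update: To keep appending the images, pass the loop counter as the page number and combine the results with array_merge() (the '+' operator would discard entries whose numeric keys already exist). Like,

$images = array();

for ($i = 1; $i <= 5; $i++) {
    // Fetch page $i and append its results to the running list
    $images = array_merge($images, googleImageSearch($query, $i));
}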
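One detail worth checking when combining pages: for numerically indexed arrays, PHP's + (union) operator and array_merge() behave differently. A minimal sketch, using made-up file names in place of real search results:

```php
<?php
// Two stand-in "pages" of results, each numerically indexed from 0,
// like the arrays returned by googleImageSearch().
$pageOne = array('a.jpg', 'b.jpg');
$pageTwo = array('c.jpg', 'd.jpg');

// Union keeps the left-hand value for every key that already exists,
// so keys 0 and 1 from $pageTwo are dropped.
$union = $pageOne + $pageTwo;

// array_merge() renumbers numeric keys and appends the second array.
$merged = array_merge($pageOne, $pageTwo);

echo count($union), "\n";  // 2
echo count($merged), "\n"; // 4
```

So when every page comes back indexed 0 to 19, the union would still hold only the first page, while array_merge() grows by one page per request.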

Beware: if you don't disguise your requests well, or if Google suspects automated requests, you are likely to encounter this page.
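A rough sketch of how the script could notice that it has been blocked, rather than silently returning an empty array. Both function names below are hypothetical, and the check rests on an assumption: that a blocked request comes back with a non-200 HTTP status or an empty body. The real signal may differ.

```php
<?php
// Hypothetical helper: decides whether a response looks blocked.
// Assumption: a blocked request has a non-200 status or no body.
function looksBlocked($httpCode, $body)
{
    return $httpCode !== 200 || $body === false || $body === '';
}

// Variant of get_data() that also returns the HTTP status code.
function get_data_with_status($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    $body = curl_exec($ch);
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return array($code, $body);
}
```

In the paging loop, one could call get_data_with_status() for each page and stop as soon as looksBlocked() returns true, instead of sending the remaining requests.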