athlon1600/serpscraper

A PHP-powered interface for querying the most popular search engines

v4.0.1 2024-08-31 21:33 UTC

Last update: 2024-08-31 21:38:24 UTC



SerpScraper

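The purpose of this library is to provide a simple, hard-to-detect, and captcha-resistant way to extract search results from popular search engines such as Google and Bing.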

Installation

The recommended way to install this package is via Composer:

composer require athlon1600/serpscraper "^4.0"

Extract search results from Google

<?php

use SerpScraper\Engine\GoogleSearch;

$page = 1;

$google = new GoogleSearch();

// all available preferences for Google
$google->setPreference('results_per_page', 100);
//$google->setPreference('google_domain', 'google.lt');
//$google->setPreference('date_range', 'hour');

$results = array();

do {

	$response = $google->search("how to scrape google", $page);

	// a non-empty error field means the query failed
	if (!empty($response->error)) {

		// 'captcha' means Google served a captcha challenge (read below);
		// stop on any other error too, instead of requesting the same page again
		break;
	}

	$results = array_merge($results, $response->results);
	$page++;

} while ($response->has_next_page);
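
The loop above can also be wrapped in a small helper that caps the number of pages fetched. This is only a minimal sketch, not part of the library: `collectResults` and `$fetchPage` are hypothetical names, and the only assumption about the response is that it exposes the `error`, `results` and `has_next_page` fields used above.

```php
<?php

/**
 * Collects results page by page until an error occurs, the last page is
 * reached, or $maxPages pages have been fetched.
 *
 * $fetchPage is a hypothetical callable: it receives a page number and
 * returns an object with `error`, `results` and `has_next_page` fields.
 */
function collectResults(callable $fetchPage, int $maxPages = 10): array
{
    $results = [];

    for ($page = 1; $page <= $maxPages; $page++) {
        $response = $fetchPage($page);

        // stop on any error (including 'captcha') rather than retrying the page
        if (!empty($response->error)) {
            break;
        }

        $results = array_merge($results, $response->results);

        // no further pages reported, so there is nothing left to fetch
        if (empty($response->has_next_page)) {
            break;
        }
    }

    return $results;
}
```

With the objects from the example above, it could be called as `collectResults(function ($page) use ($google) { return $google->search("how to scrape google", $page); }, 5);`. Passing a callable keeps the helper independent of any particular engine class.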

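Solve Google Search captchas automatically

For this to work, you need to sign up for the 2captcha.com service and obtain an API key. Using a proxy server is also strongly recommended.
Instructions for installing a private proxy server on your own VPS are available here:
https://github.com/Athlon1600/useful#squid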

<?php

use SerpScraper\Engine\GoogleSearch;
use SerpScraper\GoogleCaptchaSolver;

$google = new GoogleSearch();

$browser = $google->getBrowser();
$browser->setProxy('PROXY:IP');

$solver = new GoogleCaptchaSolver($browser);

while(true){
    $response = $google->search('famous people born in ' . mt_rand(1500, 2020));
    
    if ($response->error == 'captcha') {

        echo "Captcha detected!" . PHP_EOL;
        
        $temp = $solver->solveUsingTwoCaptcha($response, '2CAPTCHA_API_KEY', 90);

        if ($temp->status == 200) {
            echo "Captcha solved successfully!" . PHP_EOL;
        } else {
            echo 'Solving captcha has failed...' . PHP_EOL;
        }

    } else {
        echo "OK. ";
    }
    
    sleep(2);
}
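
The example above loops forever for demonstration purposes. In a real script a bounded retry is usually preferable, so that a captcha which cannot be solved does not keep consuming 2captcha credits. The following is a minimal sketch under stated assumptions: `searchWithCaptchaRetry`, `$search` and `$solve` are hypothetical names, not part of the library, and the response is only assumed to carry the `error` field shown above.

```php
<?php

/**
 * Runs $search; whenever it reports a captcha, calls $solve and tries again,
 * performing at most $maxAttempts searches in total.
 *
 * Both callables are hypothetical stand-ins:
 *  - $search takes no arguments and returns a response object with `error`
 *  - $solve receives that response and returns true if the captcha was solved
 *
 * Returns the last response received, or null if $maxAttempts is below 1.
 */
function searchWithCaptchaRetry(callable $search, callable $solve, int $maxAttempts = 3)
{
    $response = null;

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $response = $search();

        // success or a non-captcha error: hand the response back unchanged
        if ($response->error !== 'captcha') {
            return $response;
        }

        // give up early if the captcha could not be solved
        if (!$solve($response)) {
            break;
        }
    }

    return $response;
}
```

With the objects from the example above, `$search` could wrap `$google->search(...)` and `$solve` could return `$solver->solveUsingTwoCaptcha($response, '2CAPTCHA_API_KEY', 90)->status == 200`.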

Extract search results from Bing

<?php

use SerpScraper\Engine\BingSearch;

$bing = new BingSearch();
$results = array();

for($page = 1; $page < 10; $page++){
	
	$response = $bing->search("search bing using php", $page);
	if($response->error == false){
		$results = array_merge($results, $response->results);
	}
	
	if($response->has_next_page == false){
		break;
	}
}

var_dump($results);
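
Results merged from several pages may contain repeated entries. A possible way to drop them is sketched below. `uniqueBy` and `$keyOf` are hypothetical names, not part of the library, and because the shape of each entry in `results` is not described here, the caller supplies the function that turns an entry into a comparison key.

```php
<?php

/**
 * Returns $items without duplicates, keeping the first occurrence of each key.
 *
 * $keyOf is a hypothetical callable that maps an item to a value which can be
 * cast to string, for example the URL of a search result.
 */
function uniqueBy(array $items, callable $keyOf): array
{
    $seen = [];
    $unique = [];

    foreach ($items as $item) {
        $key = (string) $keyOf($item);

        // keep the item only the first time its key is encountered
        if (!isset($seen[$key])) {
            $seen[$key] = true;
            $unique[] = $item;
        }
    }

    return $unique;
}
```

If each entry happens to be a plain URL string, `uniqueBy($results, function ($url) { return $url; })` would be enough. For structured entries, the callable would return whichever field identifies a result.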