katsana/dusk-crawler

A web crawler built on Laravel Dusk

v0.1.6 2020-10-21 13:19 UTC

This package is auto-updated.

Last update: 2024-09-15 13:04:03 UTC




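Laravel Dusk lets developers automate a browser, but it cannot steer navigation based on the response the browser receives. To handle a failure, you have to wait for the timeout to expire and then handle a generic exception.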

Dusk Crawler solves this by adding an inspectUsing() method, which lets developers check for a success or failure state using ReactPHP promises.
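The idea of inspecting for success or failure instead of waiting for a timeout can be sketched in plain PHP. This is only an illustration of the polling pattern, not Dusk Crawler's implementation: pollUntil() and InspectionAborted are hypothetical names, and the sketch returns a status array rather than a ReactPHP promise.

```php
<?php
// Hypothetical stand-in for an inspection loop; NOT the package's actual code.

final class InspectionAborted extends RuntimeException {}

/**
 * Calls $check repeatedly until it returns true, throws InspectionAborted,
 * or $timeoutSeconds elapses.
 *
 * @return array{status: string, reason: ?string}
 */
function pollUntil(float $timeoutSeconds, callable $check, int $intervalMs = 100): array
{
    $deadline = microtime(true) + $timeoutSeconds;

    do {
        try {
            if ($check() === true) {
                // Success state detected before the timeout.
                return ['status' => 'resolved', 'reason' => null];
            }
        } catch (InspectionAborted $e) {
            // Failure state detected early; no need to wait for the timeout.
            return ['status' => 'aborted', 'reason' => $e->getMessage()];
        }

        usleep($intervalMs * 1000);
    } while (microtime(true) < $deadline);

    return ['status' => 'timeout', 'reason' => null];
}

// The check succeeds on its third call.
$attempts = 0;
$result = pollUntil(2.0, function () use (&$attempts) {
    return ++$attempts >= 3;
}, 10);

// The check reports a failure immediately.
$aborted = pollUntil(1.0, function () {
    throw new InspectionAborted('No packages found!');
}, 10);
```

The point of the pattern is that a known failure is reported as soon as it is seen, so the caller can tell "aborted" apart from "timed out".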

Installation

Dusk Crawler can be installed via Composer:

composer require "katsana/dusk-crawler"

Usage

Dusk Crawler adds just two new macros to Laravel\Dusk\Browser: inspectUsing() and crawler(), both of which appear in the example below.
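For readers unfamiliar with macros, the following is a minimal sketch of how a macro attaches a method to an existing class at runtime. Laravel\Dusk\Browser relies on Laravel's Illuminate\Support\Traits\Macroable trait for this; the RegistersMacros trait, FakeBrowser class, and pageLength macro here are hypothetical stand-ins so that the sketch runs without Laravel.

```php
<?php
// Simplified, hypothetical imitation of a macroable class.

trait RegistersMacros
{
    /** @var array<string, Closure> */
    protected static array $macros = [];

    // Register a closure under a method name.
    public static function macro(string $name, Closure $macro): void
    {
        static::$macros[$name] = $macro;
    }

    // Route calls to unknown methods to the registered closure.
    public function __call(string $method, array $parameters)
    {
        if (! isset(static::$macros[$method])) {
            throw new BadMethodCallException("Method {$method} does not exist.");
        }

        // Bind the closure so that $this inside it refers to this instance.
        return static::$macros[$method]->call($this, ...$parameters);
    }
}

final class FakeBrowser
{
    use RegistersMacros;

    public string $html = '<p>hello</p>';
}

// Add a method the class never declared.
FakeBrowser::macro('pageLength', function (): int {
    return strlen($this->html);
});

$length = (new FakeBrowser())->pageLength();
```

Because a macro is registered on the class, it becomes available on every instance, which is presumably how the two macros reach each $browser object without subclassing Browser.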

Example

Suppose you want to crawl Packagist to search for packages, where the search input is dynamic.

use DuskCrawler\Dusk;
use DuskCrawler\Inspector;
use DuskCrawler\Exceptions\InspectionFailed;
use Laravel\Dusk\Browser;

function searchPackagist(string $packagist) {
    $dusk = new Dusk('search-packagist');

    $dusk->headless()->disableGpu()->noSandbox();
    $dusk->userAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36');

    $dusk->start();

    $dusk->browse(function ($browser) use ($packagist) {
        $browser->visit('https://packagist.org.cn/');

        $promise = $browser->type('search_query[query]', $packagist, '{enter}')
            ->inspectUsing(15, function (Browser $browser, Inspector $inspector) {
                $searchList = $browser->resolver->findOrFail('.search-list');

                if (! $searchList->isDisplayed() || $searchList->getText() === '') {
                    // The result is not ready yet, so return false.
                    return false;
                }

                if ($searchList->getText() === 'No packages found.') {
                    // A known failure state: abort instead of waiting for the timeout.
                    return $inspector->abort('No packages found!');
                }

                return $inspector->resolve();
            });

        $promise->then(function ($browser) {
            // Crawl the page on success.
            $packages = $browser->crawler()
                ->filter('div.package-item')
                ->each(function ($div) {
                    return $div->text();
                });

            dump($packages);
        })->otherwise(function (InspectionFailed $exception) {
            // Handle the aborted state.
            dump('No result');
        })->done();
    });

    $dusk->stop();
}

searchPackagist('dusk-crawler');

Dusk::closeAll();
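
The extraction step in the success callback (select every div.package-item and collect its text) can be tried without a browser. The sketch below uses PHP's built-in DOM extension rather than the object returned by crawler(), whose type the example does not state; the markup is hypothetical and the real Packagist page may differ.

```php
<?php
// Hypothetical markup; the real search results page may be structured differently.
$html = <<<'HTML'
<div class="search-list">
  <div class="package-item">katsana/dusk-crawler</div>
  <div class="package-item featured">laravel/dusk</div>
  <div class="other">ignored</div>
</div>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML($html);

// XPath equivalent of the CSS selector "div.package-item":
// match div elements whose class list contains the token "package-item".
$xpath = new DOMXPath($document);
$nodes = $xpath->query(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' package-item ')]"
);

$packages = [];
foreach ($nodes as $node) {
    $packages[] = trim($node->textContent);
}
```

Here $packages holds the text of the two matching elements, while the div with class "other" is skipped.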