katsana / dusk-crawler
Web crawler using Laravel Dusk
v0.1.6
2020-10-21 13:19 UTC
Requires
- php: >=7.2
- laravel/dusk: ^5.11 || ^6.0
- orchestra/canvas-core: ^4.7 || ^5.0 || ^6.0
- orchestra/dusk-updater: ^1.2
- react/promise: ^2.7
- symfony/dom-crawler: ^4.3 || ^5.0
Requires (Dev)
- orchestra/canvas: ^4.5 || ^5.0 || ^6.0
- orchestra/testbench: ^4.5 || ^5.0 || ^6.0
README
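Laravel Dusk lets developers run browser automation, but it lacks the ability to navigate based on the response the browser receives. If you need to handle a failure, you have to wait for the timeout to expire and then handle a generic exception.
Dusk Crawler solves this by adding an inspectUsing() method, which lets developers check for a success or failure state using ReactPHP promises.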
Installation
Dusk Crawler can be installed via composer:
composer require "katsana/dusk-crawler"
Usage
Dusk Crawler adds just two new macros to Laravel\Dusk\Browser. The example below calls both of them: inspectUsing() and crawler().
Example
Suppose you want to crawl Packagist to search for certain packages, and the search input is dynamic.
use DuskCrawler\Dusk;
use DuskCrawler\Inspector;
use DuskCrawler\Exceptions\InspectionFailed;
use Laravel\Dusk\Browser;

function searchPackagist(string $packagist)
{
    $dusk = new Dusk('search-packagist');

    $dusk->headless()->disableGpu()->noSandbox();
    $dusk->userAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36');

    $dusk->start();

    $dusk->browse(function ($browser) use ($packagist) {
        $browser->visit('https://packagist.org.cn/');

        $promise = $browser->type('search_query[query]', $packagist, '{enter}')
            ->inspectUsing(15, function (Browser $browser, Inspector $inspector) {
                $searchList = $browser->resolver->findOrFail('.search-list');

                if (! $searchList->isDisplayed() || $searchList->getText() == '') {
                    // result not ready, just return false.
                    return false;
                }

                if ($searchList->getText() == 'No packages found.') {
                    return $inspector->abort('No packages found!');
                }

                return $inspector->resolve();
            });

        $promise->then(function ($browser) {
            // Crawl the page on success.
            $packages = $browser->crawler()
                ->filter('div.package-item')->each(function ($div) {
                    return $div->text();
                });

            dump($packages);
        })->otherwise(function (InspectionFailed $exception) {
            // Handle abort state.
            dump("No result");
        })->done();
    });

    $dusk->stop();
}

searchPackagist('dusk-crawler');

Dusk::closeAll();
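The inspection callback in the example has three possible outcomes: return false while the result is not ready, abort with a message, or resolve. The following dependency-free sketch illustrates that polling contract in plain PHP. It is not Dusk Crawler's implementation: pollUntil(), InspectionAborted, and the timeout and sleep parameters are hypothetical names chosen for illustration, and the real library reports its outcomes through a ReactPHP promise rather than through a return value and exceptions.

```php
<?php

// Hypothetical exception standing in for an "abort" outcome.
final class InspectionAborted extends RuntimeException
{
}

/**
 * Hypothetical helper: call $check repeatedly until it returns true,
 * throws, or the time limit passes. A false return means "not ready yet".
 */
function pollUntil(float $timeoutSeconds, callable $check, int $sleepMicros = 1000): bool
{
    $deadline = microtime(true) + $timeoutSeconds;

    do {
        if ($check() === true) {
            return true; // success state reached
        }

        usleep($sleepMicros); // brief pause before the next attempt
    } while (microtime(true) < $deadline);

    throw new RuntimeException('Timed out waiting for a result.');
}

// Success path: the result becomes "ready" on the third check.
$attempts = 0;
$outcome = 'pending';

try {
    pollUntil(1.0, function () use (&$attempts) {
        $attempts++;

        return $attempts >= 3;
    });
    $outcome = 'resolved';
} catch (InspectionAborted $e) {
    $outcome = 'aborted: ' . $e->getMessage();
}

// Abort path: the check decides there is nothing to wait for.
$abortOutcome = 'pending';

try {
    pollUntil(1.0, function () {
        throw new InspectionAborted('No packages found!');
    });
    $abortOutcome = 'resolved';
} catch (InspectionAborted $e) {
    $abortOutcome = 'aborted: ' . $e->getMessage();
}
```

The point of the sketch is that an explicit abort ends the wait immediately, so a known failure does not have to wait for the time limit to expire.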