mjorgens / web-crawler
A PHP web crawler library
V1.0.3
2021-02-15 17:22 UTC
Requires
- php: ^7.2
- guzzlehttp/guzzle: ^6.0 || ^7.0
- guzzlehttp/psr7: ^1.0
- illuminate/database: ^6.20.15 || ^7.30.4 || ^8.25.0
- symfony/dom-crawler: ^4.0 || ^5.0
Requires (Dev)
- phpunit/phpunit: ^8.0 || ^9.0
- squizlabs/php_codesniffer: ^3.5
This package is auto-updated.
Last update: 2024-09-11 02:21:44 UTC
README
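This is a PHP library that takes a starting URL, parses the page's HTML, and extracts the URLs it contains. It then follows those URLs and parses the resulting pages in the same way until the maximum number of URLs has been reached.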
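The loop described above can be sketched as a breadth-first traversal. This is a minimal illustration only, not the library's actual implementation: `extractLinks`, `crawl`, and the `$fetch` callable are hypothetical names, the built-in `DOMDocument` stands in for symfony/dom-crawler, an in-memory array stands in for HTTP requests, and relative links are skipped for brevity.

```php
<?php
declare(strict_types=1);

/**
 * Extract absolute http(s) links from an HTML string.
 * Relative links are skipped to keep the sketch short.
 */
function extractLinks(string $html): array
{
    if ($html === '') {
        return [];
    }
    $dom = new DOMDocument();
    // Suppress warnings raised by imperfect real-world HTML.
    @$dom->loadHTML($html);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if (preg_match('#^https?://#i', $href) === 1) {
            $links[] = $href;
        }
    }
    return $links;
}

/**
 * Breadth-first crawl: fetch pages starting at $startUrl until
 * $maxUrls pages have been collected or no unseen URLs remain.
 *
 * $fetch is a hypothetical callable (string $url): string returning HTML.
 */
function crawl(string $startUrl, int $maxUrls, callable $fetch): array
{
    $queue = [$startUrl];
    $seen = [$startUrl => true];
    $pages = [];

    while ($queue !== [] && count($pages) < $maxUrls) {
        $url = array_shift($queue);
        $html = $fetch($url);
        $pages[$url] = $html;

        foreach (extractLinks($html) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true; // never queue the same URL twice
                $queue[] = $link;
            }
        }
    }
    return $pages;
}

// Stand-in for HTTP: a tiny in-memory "site" (illustrative data only).
$site = [
    'https://example.com/'  => '<a href="https://example.com/a">A</a><a href="https://example.com/b">B</a>',
    'https://example.com/a' => '<a href="https://example.com/c">C</a><a href="https://example.com/">Home</a>',
    'https://example.com/b' => '<p>No links here</p>',
    'https://example.com/c' => '<p>Not reached when the limit is 3</p>',
];
$fetch = function (string $url) use ($site): string {
    return $site[$url] ?? '';
};

$pages = crawl('https://example.com/', 3, $fetch);
print_r(array_keys($pages));
```

With a limit of 3, the sketch fetches the start page and the two pages it links to, then stops before following the link to the fourth page. The `$seen` map is what keeps the back-link to the start page from being crawled again.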
Requirements
Installation
The recommended way to install this library is through Composer.
composer require mjorgens/web-crawler
Usage
use GuzzleHttp\Psr7\Uri;        // Uri comes from guzzlehttp/psr7
use Mjorgens\Crawler\Crawler;   // assumed namespace, inferred from the repository class below

$repository = new \Mjorgens\Crawler\CrawledRepository\CrawledMemoryRepository(); // The collection of pages
$url = new Uri('https://example.com'); // Starting URL
$maxUrls = 5; // Maximum number of URLs to crawl

Crawler::create()
    ->setRepository($repository)
    ->setMaxCrawl($maxUrls)
    ->startCrawling($url); // Start the crawler

// Iterate over the crawled pages
foreach ($repository as $page) {
    echo $page->url;
    echo $page->html;
}