lexxtor / easy-php-crawler
The latest version of this package (dev-master) has no license information available.
A simple and flexible URL crawler.
dev-master / 0.0.x-dev
2017-02-20 10:16 UTC
Requires
- php: >=5.4
This package is not auto-updated.
Last update: 2024-09-28 20:25:27 UTC
README
This is a simple and flexible crawler for parsing URLs and loading content.
Usage example
<?php

use Lexxtor\EasyPhpCrawler\EasyPhpCrawler;

require 'EasyPhpCrawler.php';

EasyPhpCrawler::go('http://news.yandex.ru', [
    'beforeLoadUrl' => function($url, $crawler) {
        echo $crawler->currentUrlIndex . '/' . $crawler->getQueueSize() . " $url ";
    },
    'afterLoadUrlSuccess' => function($url, $content, $crawler) {
        echo 'loaded: ' . strlen($content) . "\n";
    },
    'afterLoadUrlFail' => static function($url, $errorMessage, $crawler) {
        echo 'Error: ' . $errorMessage . "\n";
    },
    'allowUrlRules' => [
        '/\/\/news.yandex.ru\//',
    ],
    'denyUrlRules' => [
        '/search/',
        '/\/$/',
        '/maps/',
        '/themes/',
        '/\?redircnt=/',
    ],
]);
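The README does not say how `allowUrlRules` and `denyUrlRules` are combined. As a rough illustration only, the sketch below shows one plausible interpretation: a URL is rejected if any deny pattern matches, and otherwise accepted if at least one allow pattern matches. The function `isUrlAllowed` is hypothetical and not part of EasyPhpCrawler; the library's actual behavior may differ, including the empty-allow-list case assumed here.

```php
<?php

// Hypothetical helper, NOT part of EasyPhpCrawler. It illustrates one
// possible way to combine allow/deny regex lists with preg_match().
function isUrlAllowed($url, array $allowUrlRules, array $denyUrlRules)
{
    // Assumption: any matching deny rule rejects the URL outright.
    foreach ($denyUrlRules as $rule) {
        if (preg_match($rule, $url) === 1) {
            return false;
        }
    }

    // Assumption: an empty allow list means "accept everything".
    if (count($allowUrlRules) === 0) {
        return true;
    }

    // Otherwise at least one allow rule has to match.
    foreach ($allowUrlRules as $rule) {
        if (preg_match($rule, $url) === 1) {
            return true;
        }
    }

    return false;
}

// Same patterns as in the usage example.
$allow = ['/\/\/news.yandex.ru\//'];
$deny  = ['/search/', '/\/$/', '/maps/', '/themes/', '/\?redircnt=/'];

var_dump(isUrlAllowed('http://news.yandex.ru/world.html', $allow, $deny)); // bool(true)
var_dump(isUrlAllowed('http://news.yandex.ru/', $allow, $deny));           // bool(false), ends with "/"
var_dump(isUrlAllowed('http://maps.yandex.ru/moscow', $allow, $deny));     // bool(false), contains "maps"
```

Note that the dots in `/\/\/news.yandex.ru\//` are unescaped, so they match any character; writing `news\.yandex\.ru` would make the pattern stricter.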