lexxtor / easy-php-crawler

The latest version of this package (dev-master) has no license information available.

A simple and flexible URL crawler.

dev-master / 0.0.x-dev 2017-02-20 10:16 UTC

This package is not auto-updated.

Last update: 2024-09-28 20:25:27 UTC


README

This is a simple and flexible crawler for parsing URLs and loading content.

Usage example

<?php

use Lexxtor\EasyPhpCrawler\EasyPhpCrawler;

require 'EasyPhpCrawler.php';

EasyPhpCrawler::go('http://news.yandex.ru', [
    'beforeLoadUrl' => function($url, $crawler) {
        echo $crawler->currentUrlIndex . '/' . $crawler->getQueueSize() . "  $url  ";
    },
    'afterLoadUrlSuccess' => function($url, $content, $crawler) {
        echo 'loaded: ' . strlen($content) . "\n";
    },
    'afterLoadUrlFail' => static function($url, $errorMessage, $crawler) {
        echo 'Error: ' . $errorMessage . "\n";
    },
    'allowUrlRules' => [
        // Dots are escaped so they match a literal "." rather than any character.
        '/\/\/news\.yandex\.ru\//',
    ],
    'denyUrlRules' => [
        '/search/',
        '/\/$/',
        '/maps/',
        '/themes/',
        '/\?redircnt=/',
    ],
]);
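
The README does not say how 'allowUrlRules' and 'denyUrlRules' are combined, so the following is only a minimal sketch of one plausible interpretation: a URL is followed if it matches at least one allow pattern and no deny pattern. The helper name isUrlAllowed is hypothetical and is not part of EasyPhpCrawler; the patterns are the ones from the example above, with the dots in the host name escaped.

```php
<?php

/**
 * Hypothetical helper (NOT part of EasyPhpCrawler): checks a URL against
 * allow/deny regex rules under the assumed semantics described above.
 */
function isUrlAllowed(string $url, array $allowRules, array $denyRules): bool
{
    // Assumption: with no allow rules, every URL is initially allowed.
    $allowed = $allowRules === [];
    foreach ($allowRules as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            $allowed = true;
            break;
        }
    }
    if (!$allowed) {
        return false;
    }

    // Assumption: a single matching deny rule rejects the URL.
    foreach ($denyRules as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return false;
        }
    }
    return true;
}

$allow = ['/\/\/news\.yandex\.ru\//'];
$deny  = ['/search/', '/\/$/', '/maps/', '/themes/', '/\?redircnt=/'];

var_dump(isUrlAllowed('http://news.yandex.ru/world.html', $allow, $deny));   // bool(true)
var_dump(isUrlAllowed('http://news.yandex.ru/search?q=php', $allow, $deny)); // bool(false): matches /search/
var_dump(isUrlAllowed('http://news.yandex.ru/', $allow, $deny));             // bool(false): ends with "/"
var_dump(isUrlAllowed('http://example.com/page', $allow, $deny));            // bool(false): no allow rule matches
```

Under these assumed semantics, the deny rule '/\/$/' excludes every URL that ends in a slash, including the start page itself, so it is worth checking patterns like this against a few sample URLs before starting a crawl.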