sukohi/search-bot

A Laravel package for crawling websites.

1.0.5 2017-02-15 09:30 UTC

This package is not auto-updated.

Last update: 2024-09-14 20:36:31 UTC


README

A Laravel package for crawling websites. (Laravel 5+)

Requirements

Installation

Run the following command.

composer require sukohi/search-bot:1.*

Register the service providers in config/app.php.

'providers' => [
    ...Others...,
    Sukohi\SearchBot\SearchBotServiceProvider::class,
    Sukohi\LaravelAbsoluteUrl\LaravelAbsoluteUrlServiceProvider::class, 
]

Also register the aliases.

'aliases' => [
    ...Others...,
    'LaravelAbsoluteUrl' => Sukohi\LaravelAbsoluteUrl\Facades\LaravelAbsoluteUrl::class,
    'SearchBot' => Sukohi\SearchBot\Facades\SearchBot::class,
]

Then run the following commands.

php artisan vendor:publish
php artisan migrate

You now have config/search_bot.php, where you can set domain restrictions per type.

Configuration

return [

    'main' => '*',
    'yahoo' => ['yahoo.com', 'www.yahoo.com'],
    'reddit' => ['www.reddit.com']

];
  • If a type needs no domain restriction, set it to *.

Usage
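
The configuration above maps each type to either * or a list of allowed hosts. The package's internal check is not shown in this README, so the following is only a minimal sketch of how such a restriction could be evaluated. The function name isUrlAllowed is hypothetical, and exact, case-insensitive host matching is an assumption (PHP 7+).

```php
<?php

/**
 * Hypothetical helper (not part of sukohi/search-bot): decides whether a URL
 * is allowed for a given type, based on a config array shaped like the one above.
 */
function isUrlAllowed(array $config, string $type, string $url): bool
{
    // Assumption: types missing from the config are treated as not allowed.
    if (!array_key_exists($type, $config)) {
        return false;
    }

    $allowed = $config[$type];

    // '*' means no domain restriction.
    if ($allowed === '*') {
        return true;
    }

    $host = parse_url($url, PHP_URL_HOST);

    // Malformed URLs and URLs without a host are rejected.
    if (!is_string($host)) {
        return false;
    }

    // Assumption: exact, case-insensitive match against the configured hosts.
    $allowedHosts = array_map('strtolower', (array) $allowed);

    return in_array(strtolower($host), $allowedHosts, true);
}

$config = [
    'main' => '*',
    'yahoo' => ['yahoo.com', 'www.yahoo.com'],
    'reddit' => ['www.reddit.com'],
];

var_dump(isUrlAllowed($config, 'yahoo', 'http://www.yahoo.com/news')); // bool(true)
var_dump(isUrlAllowed($config, 'reddit', 'https://reddit.com/r/php')); // bool(false)
```

Under this assumption, reddit.com and www.reddit.com count as different hosts, which is why the yahoo entry in the sample config lists both variants.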

$starting_url = 'http://yahoo.com';
$options = [
    'type' => 'main', // Optional (default: 'main')
    'url_deletion' => true  // Default: true
];
$result = \SearchBot::request($starting_url, $options);

if($result->exists()) {

    // Symfony\Component\BrowserKit\Response
    // See http://api.symfony.com/2.3/Symfony/Component/BrowserKit/Response.html
    $response = $result->response();

    // Symfony\Component\DomCrawler\Crawler
    // See http://api.symfony.com/2.3/Symfony/Component/DomCrawler/Crawler.html
    $crawler = $result->crawler();

    $result->links(function($url, $text){

        // Every link found on the page (URL and link text) is passed here.

    });

    $result->queues(function($crawler_queue, $url, $text){

        // Only links that are not yet in the DB are passed here.
        // $crawler_queue already has its type and url set.
        $crawler_queue->save();

    });

} else {

    $e = $result->exception();
    echo $e->getMessage();
    $type = $result->type();
    $url = $result->url();

}
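
The usage example above handles a single request. A crawler usually repeats that step for every newly discovered link. The package stores pending URLs in the database through the queues() callback. The sketch below swaps that for a plain in-memory queue so that it runs without Laravel. The function crawl, the $fetchLinks callable, and the $fakeSite data are all hypothetical and not part of the package (PHP 7+).

```php
<?php

/**
 * Hypothetical helper: breadth-first crawl driven by an in-memory queue.
 * $fetchLinks receives a URL and returns the list of URLs found on that page.
 */
function crawl(string $startingUrl, callable $fetchLinks, int $maxPages = 50): array
{
    $queue = [$startingUrl];
    $visited = [];

    while ($queue !== [] && count($visited) < $maxPages) {
        $url = array_shift($queue);

        // Skip URLs that were queued more than once.
        if (isset($visited[$url])) {
            continue;
        }

        $visited[$url] = true;

        // Queue every link that has not been visited yet.
        foreach ($fetchLinks($url) as $link) {
            if (!isset($visited[$link])) {
                $queue[] = $link;
            }
        }
    }

    return array_keys($visited);
}

// Stand-in for real HTTP requests so the sketch runs on its own.
$fakeSite = [
    'http://example.com/' => ['http://example.com/a', 'http://example.com/b'],
    'http://example.com/a' => ['http://example.com/b', 'http://example.com/'],
    'http://example.com/b' => [],
];

$pages = crawl('http://example.com/', function (string $url) use ($fakeSite): array {
    return $fakeSite[$url] ?? [];
});

print_r($pages);
```

In a Laravel application the callable could instead call \SearchBot::request($url, $options), return an empty array when $result->exists() is false, and otherwise collect the URLs passed to the $result->links() callback.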

Options

  • type

    The type is a string of your choice.
    Default: main

  • url_deletion

    If true, the visited URL is deleted from the database.
    Default: true
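
Both options are optional and fall back to the defaults listed above. How the package merges them internally is not documented here, so the following is only an illustrative sketch. The helper name withDefaultOptions is hypothetical.

```php
<?php

/**
 * Hypothetical helper: fills in the documented defaults for missing options.
 */
function withDefaultOptions(array $options): array
{
    $defaults = [
        'type' => 'main',
        'url_deletion' => true,
    ];

    // Caller-supplied values override the defaults.
    return array_merge($defaults, $options);
}

// Only the type is given, so url_deletion keeps its default (true).
print_r(withDefaultOptions(['type' => 'yahoo']));

// Keep visited URLs in the database for the default type.
print_r(withDefaultOptions(['url_deletion' => false]));
```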

License

This package is licensed under the MIT License.
Copyright 2017 Sukohi Kuhoh