uatthaphon/g-crawler

A simple PHP web crawler that wraps Guzzle and DomCrawler.

dev-master 2019-06-29 05:32 UTC


Last update: 2024-09-29 05:42:17 UTC


README


Installation

Add the package as a dependency of your project:

composer require uatthaphon/g-crawler

Usage

In your PHP project

Once GCrawler has been added to your project, you can initialize it in any class:

use GCrawler\GCrawler;


class Example {
    protected $_gCrawler;

    public function __construct()
    {
        // Initialize with the default configuration
        $this->_gCrawler = new GCrawler();
    }

    public function run()
    {
        $crawler = $this->_gCrawler->crawler('https://www.example.com/');

        // each() returns an array holding the callback's result for every matched node
        $texts = $crawler->filter('div.here')
            ->each(function ($node) {
                return $node->text();
            });

        return $texts;
    }
}

Or initialize it with a configuration array, for example to set custom request headers:

use GCrawler\GCrawler;


class Example {
    protected $_gCrawler;

    public function __construct()
    {
        // Configuration with custom request headers
        $config = [
            'headers' => [
                'User-Agent' => 'testing/1.0',
                'Accept' => 'application/json',
                'X-Foo' => ['Bar', 'Baz'],
            ],
        ];
        $this->_gCrawler = new GCrawler($config);
    }

    public function run()
    {
        $crawler = $this->_gCrawler->crawler('https://www.example.com/');

        // each() returns an array holding the callback's result for every matched node
        $texts = $crawler->filter('div.here')
            ->each(function ($node) {
                return $node->text();
            });

        return $texts;
    }
}
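Note that `$crawler->filter('div.here')->each(...)` returns a plain PHP array with one callback result per matched node, not a single string. As a rough, dependency-free illustration of that result, the sketch below uses PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`) instead of GCrawler or Symfony DomCrawler. The helper `divTextsByClass()` and the sample HTML are hypothetical, and the XPath expression is a hand-written approximation of the CSS selector `div.here`.

```php
<?php
// Illustration only: NOT part of g-crawler. Approximates what
// filter('div.here')->each(fn => $node->text()) collects, using ext-dom.

/**
 * Hypothetical helper: return the text content of every <div>
 * whose class attribute contains the given class name.
 */
function divTextsByClass(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Silence libxml warnings caused by imperfect real-world HTML.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // XPath equivalent of the CSS selector "div.<class>": pad the class list
    // with spaces so "here" matches "box here" but not "there".
    $query = sprintf(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    // Collect one entry per matched node, like DomCrawler's each().
    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = $node->textContent;
    }
    return $texts;
}

$html = '<html><body>'
    . '<div class="here">First</div>'
    . '<div class="there">Skipped</div>'
    . '<div class="box here">Second</div>'
    . '</body></html>';

$texts = divTextsByClass($html, 'here');
print_r($texts);
```

Running it prints an array containing `First` and `Second`; the `there` div is skipped because the padded class check requires `here` to be a whole class name.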

License

g-crawler is released under the MIT License.