offdev/gpp

A wrapper around Guzzle that provides middleware functionality, URL enumeration, and a crawler that makes use of both.

1.0.0 2018-10-06 15:35 UTC


Last update: 2024-09-20 21:47:39 UTC


README


Requirements

Installation

$ composer require offdev/gpp

Usage

Basic middleware usage

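A middleware processes the incoming server response so that it can be manipulated further. If it cannot handle the response directly, it can delegate it to the provided response handler instead, as DirectoryLister does below when its pattern finds no links.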
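Conceptually this is a chain of responsibility: each middleware either returns a response of its own or hands the response on. The sketch below models only that control flow, using plain closures and array "responses" so it runs without Guzzle. runMiddlewareChain() and the closure signature are hypothetical and are not the offdev/gpp API, which uses MiddlewareInterface classes and a ResponseHandlerInterface; the rule that the first array element runs first is also an assumption of this sketch.

```php
<?php
// Hypothetical sketch: runMiddlewareChain() and the closure signature are NOT part of offdev/gpp.
// Responses are plain arrays here so the example runs without Guzzle.

function runMiddlewareChain(array $middlewares, array $response): array
{
    // Terminal handler: if every middleware delegates, the response comes back unchanged.
    $handler = function (array $response): array {
        return $response;
    };

    // Wrap from the last middleware to the first, so the first array element runs first.
    foreach (array_reverse($middlewares) as $middleware) {
        $next = $handler;
        $handler = function (array $response) use ($middleware, $next): array {
            return $middleware($response, $next);
        };
    }

    return $handler($response);
}

// Acts only on HTML bodies; everything else is delegated to $next.
$htmlUppercaser = function (array $response, callable $next): array {
    if (strpos($response['body'], '<html>') === 0) {
        return ['status' => $response['status'], 'body' => strtoupper($response['body'])];
    }
    return $next($response);
};

// Marks every response that reaches it, which shows when delegation happened.
$marker = function (array $response, callable $next): array {
    $response['body'] .= ' [delegated]';
    return $next($response);
};

$chain = [$htmlUppercaser, $marker];
echo runMiddlewareChain($chain, ['status' => 200, 'body' => '<html>hi'])['body'], PHP_EOL;   // <HTML>HI
echo runMiddlewareChain($chain, ['status' => 200, 'body' => 'plain text'])['body'], PHP_EOL; // plain text [delegated]
```

The first call never reaches $marker because $htmlUppercaser answers directly; the second call is delegated down the chain.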

<?php

use GuzzleHttp\Client as GuzzleClient;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;
use Offdev\Gpp\Client;
use Offdev\Gpp\Http\MiddlewareInterface;
use Offdev\Gpp\Http\ResponseHandlerInterface;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

class DirectoryLister implements MiddlewareInterface
{
    public function process(
        RequestInterface $originalRequest,
        ResponseInterface $response,
        ResponseHandlerInterface $responseHandler
    ): ResponseInterface {
        $content = (string)$response->getBody();
        if (preg_match_all('/href="\/articles\/\d+\/([^"]+)"/m', $content, $matches, PREG_SET_ORDER)) {
            $result = [];
            foreach ($matches as $match) {
                $result[] = $match[1];
            }
            return new Response(200, [], json_encode($result, JSON_PRETTY_PRINT));
        }

        return $responseHandler->handle($response);
    }
}

$client = new Client(new GuzzleClient(), [DirectoryLister::class]);
$result = $client->send(new Request('GET', 'https://www.worldhunger.org/articles/12/'));
var_dump((string)$result->getBody());

Output

$ php middleware.php
/tmp/gpp-examples/middleware.php:34:
string(300) "[
    "editorials\/",
    "global\/",
    "images\/",
    "us\/",
    "2012_archive.htm",
    "asia.htm",
    "books.htm",
    "davidson.htm---",
    "editorials.htm",
    "editorials2.htm",
    "global.htm",
    "newtemplate.htm",
    "phn.htm",
    "us.htm",
    "vanderslice_hungry_children.htm"
]"
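The regular expression in DirectoryLister can be checked offline, without any HTTP request. The sketch below reuses the same pattern inside a hypothetical helper, extractArticleLinks(), and runs it on a made-up HTML snippet (not taken from worldhunger.org). It also shows why the output above contains "editorials\/": json_encode() escapes forward slashes unless JSON_UNESCAPED_SLASHES is passed.

```php
<?php
// Hypothetical helper: same pattern as DirectoryLister, applied to a local string.
function extractArticleLinks(string $html): array
{
    $result = [];
    // Capture everything after /articles/<digits>/ up to the closing quote (at least one character).
    if (preg_match_all('/href="\/articles\/\d+\/([^"]+)"/m', $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $result[] = $match[1];
        }
    }
    return $result;
}

// Made-up sample markup; the /about.htm link does not match the pattern.
$sample = '<a href="/articles/12/global/">Global</a>'
    . '<a href="/articles/12/asia.htm">Asia</a>'
    . '<a href="/about.htm">About</a>';

echo json_encode(extractArticleLinks($sample)), PHP_EOL; // ["global\/","asia.htm"]
```

Note that a bare link to the directory itself (href="/articles/12/") is not captured, because [^"]+ requires at least one character after the trailing slash.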

URL enumeration

Have a look at the IntegerEnumerator class included in this package: it is a very basic example that increments any number found in the given URL.

<?php

use GuzzleHttp\Psr7\Request;
use Offdev\Gpp\Utils\IntegerEnumerator;

$enumerator = new IntegerEnumerator();
$nextRequest = $enumerator->getNextRequest(
    new Request('GET', 'https://www.worldhunger.org/articles/12/')
);

var_dump((string)$nextRequest->getUri());

Output

$ php enumerator.php
/tmp/gpp-examples/enumerator.php:11:
string(40) "https://www.worldhunger.org/articles/13/"
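To illustrate the idea behind such an enumerator, here is a minimal sketch that is not the package's IntegerEnumerator. It assumes a simpler rule: only the last run of digits in the URL is incremented, and if the URL contains no digits there is no next URL. The function name incrementLastInteger() is hypothetical.

```php
<?php
// Hypothetical sketch, not offdev/gpp's IntegerEnumerator:
// increments only the LAST run of digits in the URL string.
function incrementLastInteger(string $url): ?string
{
    $next = preg_replace_callback(
        '/\d+(?=\D*$)/', // digits followed only by non-digits until the end of the string
        function (array $m): string {
            return (string)((int)$m[0] + 1); // note: leading zeros are not preserved
        },
        $url,
        1,
        $count
    );
    // No digits found means there is nothing to enumerate.
    return $count === 1 ? $next : null;
}

echo incrementLastInteger('https://www.worldhunger.org/articles/12/'), PHP_EOL;
// https://www.worldhunger.org/articles/13/
```

Restricting the change to the last number is a design choice of this sketch: it avoids touching earlier numeric path segments, but it would still hit a port number if the path contained no digits.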

Crawler usage

<?php

use GuzzleHttp\Client as GuzzleClient;
use GuzzleHttp\Psr7\Request;
use Offdev\Gpp\Client;
use Offdev\Gpp\Crawler;
use Offdev\Gpp\Utils\IntegerEnumerator;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

$client = new Client(new GuzzleClient(['http_errors' => false])); // don't throw on 4xx/5xx, so the callback sees them
$crawler = new Crawler($client, new IntegerEnumerator());
$crawler->crawl(
    new Request('GET', 'https://www.worldhunger.org/articles/15/'),
    5, // time between each request, in seconds
    function ( // callback function, to control the crawler workflow
        RequestInterface $originalRequest,
        ResponseInterface $response
    ) {
        echo $response->getStatusCode().' : '.(string)$originalRequest->getUri().PHP_EOL;
        if ($response->getStatusCode() !== 200) {
            return true; // cancel crawling
        }
        // go ahead, wait for the interval, and crawl the next result
        return false;
    }
);

Output

$ php crawler.php
200 : https://www.worldhunger.org/articles/15/
200 : https://www.worldhunger.org/articles/16/
404 : https://www.worldhunger.org/articles/17/
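The crawl loop itself is simple: send a request, pass the response to the callback, stop if the callback returns true, otherwise wait for the interval and move on to the next enumerated URL. The sketch below reproduces that control flow offline. crawlSketch() is hypothetical and not the package's Crawler class; $fetch and $nextUrl are stand-ins for the HTTP client and the enumerator, the fake status codes are made up to mirror the output above, and the $maxRequests cap is an addition of this sketch.

```php
<?php
// Hypothetical sketch of the crawl loop; not offdev/gpp's Crawler.
function crawlSketch(
    string $startUrl,
    int $intervalSeconds,
    callable $fetch,        // string $url => int status code
    callable $nextUrl,      // string $url => ?string (null stops the crawl)
    callable $callback,     // (string $url, int $status) => bool, true cancels crawling
    int $maxRequests = 100  // safety cap, not part of the original API
): array {
    $visited = [];
    $current = $startUrl;
    while ($current !== null && count($visited) < $maxRequests) {
        $status = $fetch($current);
        $visited[] = $current;
        if ($callback($current, $status)) {
            break; // callback returned true: cancel crawling
        }
        $current = $nextUrl($current);
        if ($current !== null) {
            sleep($intervalSeconds); // wait before the next request
        }
    }
    return $visited;
}

// Fake responses keyed by the article number in the URL.
$statuses = [15 => 200, 16 => 200, 17 => 404];
$fetch = function (string $url) use ($statuses): int {
    preg_match('/(\d+)\/$/', $url, $m);
    return $statuses[(int)$m[1]] ?? 404;
};
$nextUrl = function (string $url): string {
    return preg_replace_callback('/(\d+)\/$/', function (array $m): string {
        return ((int)$m[1] + 1) . '/';
    }, $url);
};
$callback = function (string $url, int $status): bool {
    echo $status . ' : ' . $url . PHP_EOL;
    return $status !== 200; // stop on the first non-200 response
};

$visited = crawlSketch('https://www.worldhunger.org/articles/15/', 0, $fetch, $nextUrl, $callback);
```

With an interval of 0 this prints the same three lines as the output above, stopping after the 404 for article 17.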

Code quality

Who needs those anyway?

First, install the dependencies by running composer install. Xdebug must also be enabled so that PHPUnit can generate code coverage.

PHP CodeSniffer

$ ./vendor/bin/phpcs --colors --standard=PSR2 -v src/ tests/
Creating file list... DONE (13 files in queue)
Changing into directory /Users/pascal/devel/gpp/src
Processing Crawler.php [PHP => 419 tokens in 71 lines]... DONE in 25ms (0 errors, 0 warnings)
Changing into directory /Users/pascal/devel/gpp/src/Utils
Processing RequestEnumeratorInterface.php [PHP => 201 tokens in 40 lines]... DONE in 16ms (0 errors, 0 warnings)
Processing IntegerEnumerator.php [PHP => 360 tokens in 51 lines]... DONE in 25ms (0 errors, 0 warnings)
Changing into directory /Users/pascal/devel/gpp/src/Http
Processing MiddlewareInterface.php [PHP => 225 tokens in 47 lines]... DONE in 16ms (0 errors, 0 warnings)
Changing into directory /Users/pascal/devel/gpp/src/Http/Exceptions
Processing PatternException.php [PHP => 91 tokens in 21 lines]... DONE in 6ms (0 errors, 0 warnings)
Processing InvalidArgumentException.php [PHP => 91 tokens in 21 lines]... DONE in 6ms (0 errors, 0 warnings)
Changing into directory /Users/pascal/devel/gpp/src/Http
Processing ResponseHandlerInterface.php [PHP => 169 tokens in 37 lines]... DONE in 13ms (0 errors, 0 warnings)
Changing into directory /Users/pascal/devel/gpp/src
Processing Client.php [PHP => 803 tokens in 129 lines]... DONE in 55ms (0 errors, 0 warnings)
Changing into directory /Users/pascal/devel/gpp/tests
Processing PassthroughModifierMiddleware.php [PHP => 395 tokens in 63 lines]... DONE in 26ms (0 errors, 0 warnings)
Changing into directory /Users/pascal/devel/gpp/tests/Utils
Processing IntegerEnumeratorTest.php [PHP => 320 tokens in 50 lines]... DONE in 14ms (0 errors, 0 warnings)
Changing into directory /Users/pascal/devel/gpp/tests
Processing CrawlerTest.php [PHP => 408 tokens in 52 lines]... DONE in 26ms (0 errors, 0 warnings)
Processing DirectResponseMiddleware.php [PHP => 385 tokens in 62 lines]... DONE in 28ms (0 errors, 0 warnings)
Processing ClientTest.php [PHP => 1159 tokens in 121 lines]... DONE in 92ms (0 errors, 0 warnings)

PHPUnit

$ ./vendor/bin/phpunit
PHPUnit 7.3.5 by Sebastian Bergmann and contributors.

.........                                                           9 / 9 (100%)

Time: 1.39 seconds, Memory: 6.00MB

OK (9 tests, 15 assertions)

Generating code coverage report in HTML format ... done


Code Coverage Report:
  2018-10-06 12:13:13

 Summary:
  Classes: 100.00% (3/3)
  Methods: 100.00% (7/7)
  Lines:   100.00% (44/44)

\Offdev\Gpp::Offdev\Gpp\Client
  Methods: 100.00% ( 4/ 4)   Lines: 100.00% ( 28/ 28)
\Offdev\Gpp::Offdev\Gpp\Crawler
  Methods: 100.00% ( 2/ 2)   Lines: 100.00% (  9/  9)
\Offdev\Gpp\Utils::Offdev\Gpp\Utils\IntegerEnumerator
  Methods: 100.00% ( 1/ 1)   Lines: 100.00% (  7/  7)

Infection

$ ./vendor/bin/infection
You are running Infection with xdebug enabled.
    ____      ____          __  _
   /  _/___  / __/__  _____/ /_(_)___  ____
   / // __ \/ /_/ _ \/ ___/ __/ / __ \/ __ \
 _/ // / / / __/  __/ /__/ /_/ / /_/ / / / /
/___/_/ /_/_/  \___/\___/\__/_/\____/_/ /_/

Running initial test suite...

PHPUnit version: 7.3.5

   14 [============================]  1 sec

Generate mutants...

Processing source code files: 8/8
Creating mutated files and processes: 13/13
.: killed, M: escaped, S: uncovered, E: fatal error, T: timed out

...........E.                                        (13 / 13)

13 mutations were generated:
      12 mutants were killed
       0 mutants were not covered by tests
       0 covered mutants were not detected
       1 errors were encountered
       0 time outs were encountered

Metrics:
         Mutation Score Indicator (MSI): 100%
         Mutation Code Coverage: 100%
         Covered Code MSI: 100%

Please note that some mutants will inevitably be harmless (i.e. false positives).
Dashboard report has not been sent: it is not a Travis CI

Time: 6s. Memory: 10.00MB

License

Apache-2.0