offdev / gpp
A wrapper around Guzzle that provides middleware functionality, URL enumeration capabilities, and a crawler that makes use of these features.
1.0.0
2018-10-06 15:35 UTC
Requires
- php: >=7.1
- guzzlehttp/guzzle: ^6.3
- psr/http-message: ^1.0
Requires (Dev)
- infection/infection: ^0.10.5
- phpunit/phpunit: ^7.3
- squizlabs/php_codesniffer: ^3.3
This package is auto-updated.
Last update: 2024-09-20 21:47:39 UTC
README
Requirements
- PHP >= 7.1
- Composer
- Guzzle
Installation
$ composer require offdev/gpp
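Usage
Basic middleware usage
A middleware processes the incoming server response so that it can be manipulated further. If it cannot act on the response itself, it can delegate to the provided response handler to do so.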
<?php

use GuzzleHttp\Client as GuzzleClient;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;
use Offdev\Gpp\Client;
use Offdev\Gpp\Http\MiddlewareInterface;
use Offdev\Gpp\Http\ResponseHandlerInterface;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

class DirectoryLister implements MiddlewareInterface
{
    public function process(
        RequestInterface $originalRequest,
        ResponseInterface $response,
        ResponseHandlerInterface $responseHandler
    ): ResponseInterface {
        $content = (string)$response->getBody();
        if (preg_match_all('/href="\/articles\/\d+\/([^"]+)"/m', $content, $matches, PREG_SET_ORDER)) {
            $result = [];
            foreach ($matches as $match) {
                $result[] = $match[1];
            }
            return new Response(200, [], json_encode($result, JSON_PRETTY_PRINT));
        }
        return $responseHandler->handle($response);
    }
}

$client = new Client(new GuzzleClient(), [DirectoryLister::class]);
$result = $client->send(new Request('GET', 'https://www.worldhunger.org/articles/12/'));
var_dump((string)$result->getBody());
Output
$ php middleware.php
/tmp/gpp-examples/middleware.php:34:
string(300) "[
"editorials\/",
"global\/",
"images\/",
"us\/",
"2012_archive.htm",
"asia.htm",
"books.htm",
"davidson.htm---",
"editorials.htm",
"editorials2.htm",
"global.htm",
"newtemplate.htm",
"phn.htm",
"us.htm",
"vanderslice_hungry_children.htm"
]"
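The delegation idea described above (a middleware either produces a result itself or passes the response on to a handler) can be sketched without any dependencies. This is only an illustration under stated assumptions, not the package's implementation: `dispatch`, `$trim` and `$upperCaseJson` are hypothetical names, and plain strings stand in for PSR-7 responses.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical dispatcher: runs $response through $middlewares in order.
 * Each middleware is fn(string $response, callable $handler): string and may
 * either return a result itself or call $handler to pass the response on.
 */
function dispatch(array $middlewares, string $response): string
{
    // Innermost handler: nothing left to run, return the response untouched.
    $handler = function (string $response): string {
        return $response;
    };

    // Wrap from the last middleware to the first, so the first one runs first.
    foreach (array_reverse($middlewares) as $middleware) {
        $next = $handler;
        $handler = function (string $response) use ($middleware, $next): string {
            return $middleware($response, $next);
        };
    }

    return $handler($response);
}

// Acts only on JSON-looking bodies and delegates everything else.
$upperCaseJson = function (string $response, callable $handler): string {
    if ($response !== '' && $response[0] === '{') {
        return strtoupper($response);
    }
    return $handler($response);
};

// Always modifies the response, then delegates.
$trim = function (string $response, callable $handler): string {
    return $handler(trim($response));
};

echo dispatch([$trim, $upperCaseJson], '  {"a":1} '), PHP_EOL;    // prints {"A":1}
echo dispatch([$trim, $upperCaseJson], '  plain text '), PHP_EOL; // prints plain text
```

Building the chain from the inside out keeps each middleware unaware of how many others exist. It only sees the response and a handler it may or may not call.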
URL enumeration
Have a look at the IntegerEnumerator class included in this package. It is a very basic example that increments any number found in the given URL.
<?php

use GuzzleHttp\Psr7\Request;
use Offdev\Gpp\Utils\IntegerEnumerator;

$enumerator = new IntegerEnumerator();
$nextRequest = $enumerator->getNextRequest(
    new Request('GET', 'https://www.worldhunger.org/articles/12/')
);

var_dump((string)$nextRequest->getUri());
Output
$ php enumerator.php
/tmp/gpp-examples/enumerator.php:11:
string(40) "https://www.worldhunger.org/articles/13/"
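To make the idea of incrementing a number in a URL concrete, here is a minimal, dependency-free sketch. It is not the IntegerEnumerator source: the function name `incrementLastInteger` is hypothetical, and bumping only the last run of digits is an assumption about the behaviour, chosen because it reproduces the output shown above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns $url with its last run of digits increased by one.
 * URLs without any digits are returned unchanged. Leading zeros are not preserved.
 */
function incrementLastInteger(string $url): string
{
    // (\d+) followed only by non-digits up to the end is the last number in the URL.
    $result = preg_replace_callback(
        '/(\d+)(\D*)$/',
        function (array $m): string {
            return ((int)$m[1] + 1) . $m[2];
        },
        $url
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $url;
}

echo incrementLastInteger('https://www.worldhunger.org/articles/12/'), PHP_EOL;
// prints https://www.worldhunger.org/articles/13/
```

A real enumerator would likely need rules for which number to change when a URL contains several, for example a year in the host name and a page index in the path.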
Crawler usage
<?php

use GuzzleHttp\Client as GuzzleClient;
use GuzzleHttp\Psr7\Request;
use Offdev\Gpp\Client;
use Offdev\Gpp\Crawler;
use Offdev\Gpp\Utils\IntegerEnumerator;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

$client = new Client(new GuzzleClient(['exceptions' => false]));
$crawler = new Crawler($client, new IntegerEnumerator());
$crawler->crawl(
    new Request('GET', 'https://www.worldhunger.org/articles/15/'),
    5, // time between each request, in seconds
    function ( // callback function, to control the crawler workflow
        RequestInterface $originalRequest,
        ResponseInterface $response
    ) {
        echo $response->getStatusCode().' : '.(string)$originalRequest->getUri().PHP_EOL;
        if ($response->getStatusCode() !== 200) {
            return true; // cancel crawling
        }

        // go ahead, wait for the interval, and crawl the next result
        return false;
    }
);
Output
$ php crawler.php
200 : https://www.worldhunger.org/articles/15/
200 : https://www.worldhunger.org/articles/16/
404 : https://www.worldhunger.org/articles/17/
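The control flow suggested by the comments in the crawler example (send a request, hand the response to the callback, stop if it returns true, otherwise wait and move on to the next URL) can be sketched without any HTTP dependency. Everything here is hypothetical and only illustrative: `crawlSketch` is not the package's Crawler, and the `$fetch` and `$next` callables are stubs standing in for the client and the enumerator. The stubbed status codes simply mimic the output shown above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical crawl loop.
 *
 * @param string   $url      first URL to visit
 * @param int      $interval seconds to wait between requests
 * @param callable $fetch    fn(string $url): int, returns an HTTP status code
 * @param callable $next     fn(string $url): string, returns the next URL
 * @param callable $callback fn(string $url, int $status): bool, true cancels
 */
function crawlSketch(string $url, int $interval, callable $fetch, callable $next, callable $callback): void
{
    while (true) {
        $status = $fetch($url);
        if ($callback($url, $status) === true) {
            return; // the callback asked to cancel
        }
        sleep($interval);
        $url = $next($url);
    }
}

// Stubbed "server": articles 15 and 16 exist, everything else is a 404.
$fetch = function (string $url): int {
    return preg_match('#/articles/(15|16)/$#', $url) === 1 ? 200 : 404;
};

// Stubbed enumerator: increase the last number in the URL by one.
$next = function (string $url): string {
    return preg_replace_callback('/(\d+)(\D*)$/', function (array $m): string {
        return ((int)$m[1] + 1) . $m[2];
    }, $url);
};

$visited = [];
crawlSketch(
    'https://www.worldhunger.org/articles/15/',
    0, // no delay needed against a stub
    $fetch,
    $next,
    function (string $url, int $status) use (&$visited): bool {
        $visited[] = $status . ' : ' . $url;
        return $status !== 200; // stop on the first non-200 response
    }
);

echo implode(PHP_EOL, $visited), PHP_EOL;
```

Letting the callback decide when to stop keeps the loop itself free of policy, so the same loop could stop on a status code, a page count, or a content match.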
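Code quality
Who needs that anyway?
First, make sure the dependencies are installed by running composer install. You also need to make sure xdebug is enabled so that PHPUnit can generate code coverage.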
PHP_CodeSniffer
$ ./vendor/bin/phpcs --colors --standard=PSR2 -v src/ tests/
Creating file list... DONE (13 files in queue)
Changing into directory /Users/pascal/devel/gpp/src
Processing Crawler.php [PHP => 419 tokens in 71 lines]... DONE in 25ms (0 errors, 0 warnings)
Changing into directory /Users/pascal/devel/gpp/src/Utils
Processing RequestEnumeratorInterface.php [PHP => 201 tokens in 40 lines]... DONE in 16ms (0 errors, 0 warnings)
Processing IntegerEnumerator.php [PHP => 360 tokens in 51 lines]... DONE in 25ms (0 errors, 0 warnings)
Changing into directory /Users/pascal/devel/gpp/src/Http
Processing MiddlewareInterface.php [PHP => 225 tokens in 47 lines]... DONE in 16ms (0 errors, 0 warnings)
Changing into directory /Users/pascal/devel/gpp/src/Http/Exceptions
Processing PatternException.php [PHP => 91 tokens in 21 lines]... DONE in 6ms (0 errors, 0 warnings)
Processing InvalidArgumentException.php [PHP => 91 tokens in 21 lines]... DONE in 6ms (0 errors, 0 warnings)
Changing into directory /Users/pascal/devel/gpp/src/Http
Processing ResponseHandlerInterface.php [PHP => 169 tokens in 37 lines]... DONE in 13ms (0 errors, 0 warnings)
Changing into directory /Users/pascal/devel/gpp/src
Processing Client.php [PHP => 803 tokens in 129 lines]... DONE in 55ms (0 errors, 0 warnings)
Changing into directory /Users/pascal/devel/gpp/tests
Processing PassthroughModifierMiddleware.php [PHP => 395 tokens in 63 lines]... DONE in 26ms (0 errors, 0 warnings)
Changing into directory /Users/pascal/devel/gpp/tests/Utils
Processing IntegerEnumeratorTest.php [PHP => 320 tokens in 50 lines]... DONE in 14ms (0 errors, 0 warnings)
Changing into directory /Users/pascal/devel/gpp/tests
Processing CrawlerTest.php [PHP => 408 tokens in 52 lines]... DONE in 26ms (0 errors, 0 warnings)
Processing DirectResponseMiddleware.php [PHP => 385 tokens in 62 lines]... DONE in 28ms (0 errors, 0 warnings)
Processing ClientTest.php [PHP => 1159 tokens in 121 lines]... DONE in 92ms (0 errors, 0 warnings)
PHPUnit
$ ./vendor/bin/phpunit
PHPUnit 7.3.5 by Sebastian Bergmann and contributors.
......... 9 / 9 (100%)
Time: 1.39 seconds, Memory: 6.00MB
OK (9 tests, 15 assertions)
Generating code coverage report in HTML format ... done
Code Coverage Report:
2018-10-06 12:13:13
Summary:
Classes: 100.00% (3/3)
Methods: 100.00% (7/7)
Lines: 100.00% (44/44)
\Offdev\Gpp::Offdev\Gpp\Client
Methods: 100.00% ( 4/ 4) Lines: 100.00% ( 28/ 28)
\Offdev\Gpp::Offdev\Gpp\Crawler
Methods: 100.00% ( 2/ 2) Lines: 100.00% ( 9/ 9)
\Offdev\Gpp\Utils::Offdev\Gpp\Utils\IntegerEnumerator
Methods: 100.00% ( 1/ 1) Lines: 100.00% ( 7/ 7)
Infection
$ ./vendor/bin/infection
You are running Infection with xdebug enabled.
     ____      ____          __  _
    /  _/___  / __/__  _____/ /_(_)___  ____
    / // __ \/ /_/ _ \/ ___/ __/ / __ \/ __ \
  _/ // / / / __/  __/ /__/ /_/ / /_/ / / / /
 /___/_/ /_/_/  \___/\___/\__/_/\____/_/ /_/
Running initial test suite...
PHPUnit version: 7.3.5
14 [============================] 1 sec
Generate mutants...
Processing source code files: 8/8
Creating mutated files and processes: 13/13
.: killed, M: escaped, S: uncovered, E: fatal error, T: timed out
...........E. (13 / 13)
13 mutations were generated:
12 mutants were killed
0 mutants were not covered by tests
0 covered mutants were not detected
1 errors were encountered
0 time outs were encountered
Metrics:
Mutation Score Indicator (MSI): 100%
Mutation Code Coverage: 100%
Covered Code MSI: 100%
Please note that some mutants will inevitably be harmless (i.e. false positives).
Dashboard report has not been sent: it is not a Travis CI
Time: 6s. Memory: 10.00MB