t1gor/robots-txt-parser

A PHP class for parsing robots.txt rules according to the Google, Yandex, W3C and The Web Robots Pages specifications.

v0.2.5 2020-09-12 12:21 UTC

The full list of supported specifications (and of those not yet supported) is available in our Wiki.

Supported directives

  • User-agent
  • Allow
  • Disallow
  • Sitemap
  • Host
  • Cache-delay
  • Clean-param
  • Crawl-delay
  • Request-rate (in progress)
  • Visit-time (in progress)
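To make the Allow/Disallow semantics concrete, here is a minimal standalone sketch of longest-match evaluation as described in Google's robots.txt specification and RFC 9309: the longest matching path pattern wins, and on a tie Allow wins. This is not the library's implementation. `parseGroup()`, `patternToRegex()` and `isPathAllowed()` are hypothetical names, and the sketch assumes a single rule group (user-agent grouping and percent-encoding are ignored).

```php
<?php

/**
 * Hypothetical helper: collect Allow/Disallow rules from robots.txt text.
 * Assumes a single group, so User-agent lines are simply skipped.
 *
 * @return array<int, array{0: string, 1: string}> list of [directive, path]
 */
function parseGroup(string $content): array
{
    $rules = [];
    foreach (preg_split('/\r\n|\r|\n/', $content) as $line) {
        // Drop comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'allow' || $field === 'disallow') {
            $rules[] = [$field, $value];
        }
    }
    return $rules;
}

/** Convert a path pattern ("*" wildcard, optional trailing "$") to a regex. */
function patternToRegex(string $pattern): string
{
    $anchored = substr($pattern, -1) === '$';
    if ($anchored) {
        $pattern = substr($pattern, 0, -1);
    }
    $regex = implode('.*', array_map(
        fn (string $part) => preg_quote($part, '#'),
        explode('*', $pattern)
    ));
    return '#^' . $regex . ($anchored ? '$' : '') . '#';
}

/** Hypothetical check: the longest matching rule wins; Allow wins ties. */
function isPathAllowed(array $rules, string $path): bool
{
    $bestLength = -1;
    $allowed = true; // no matching rule means the path is allowed
    foreach ($rules as [$directive, $pattern]) {
        if ($pattern === '') {
            continue; // an empty Disallow blocks nothing
        }
        if (preg_match(patternToRegex($pattern), $path) !== 1) {
            continue;
        }
        $length = strlen($pattern);
        if ($length > $bestLength || ($length === $bestLength && $directive === 'allow')) {
            $bestLength = $length;
            $allowed = $directive === 'allow';
        }
    }
    return $allowed;
}
```

With `Disallow: /admin` and `Allow: /admin/public`, a request for `/admin/public/page` is allowed because the Allow pattern is the longer match, while `/admin/secret` stays blocked.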

Installation

The library is available as a Composer package. To install it, add it as a requirement to your composer.json, for example by running:

composer require t1gor/robots-txt-parser

You can read more about Composer in its documentation.

Usage examples

Creating a parser instance
use t1gor\RobotsTxtParser\RobotsTxtParser;

# from a string
$parser = new RobotsTxtParser("User-agent: * \nDisallow: /");

# from a local file (fopen() requires a mode argument)
$parser = new RobotsTxtParser(fopen('some/robots.txt', 'r'));

# or from a remote one (allow_url_fopen must be enabled in your php.ini)
# even FTP should work (but this is not confirmed)
$parser = new RobotsTxtParser(fopen('http://example.com/robots.txt', 'r'));
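The examples above pass either a plain string or an open stream to the constructor. As a standalone illustration of that pattern (not the library's code; `readRobotsSource()` is a hypothetical name), a helper can normalize both kinds of input to a string with `stream_get_contents()`. An in-memory `php://memory` stream stands in for a file so the sketch needs nothing on disk.

```php
<?php

/**
 * Hypothetical helper: accept robots.txt content as a string or as a
 * stream resource and return it as a string.
 *
 * @param string|resource $source
 */
function readRobotsSource($source): string
{
    if (is_string($source)) {
        return $source;
    }
    if (is_resource($source)) {
        $content = stream_get_contents($source);
        if ($content === false) {
            throw new RuntimeException('Could not read robots.txt stream');
        }
        return $content;
    }
    throw new InvalidArgumentException('Expected a string or a stream resource');
}

// Build an in-memory stream so the example does not depend on a real file.
$stream = fopen('php://memory', 'r+');
fwrite($stream, "User-agent: *\nDisallow: /");
rewind($stream); // move back to the start before reading

$fromStream = readRobotsSource($stream);
$fromString = readRobotsSource("User-agent: *\nDisallow: /");
fclose($stream);
```

Both inputs yield identical content, which is why either form can be handed to a parser that accepts both.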
Logging the parsing process

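The parser implements `LoggerAwareInterface` from PSR-3 (`Psr\Log\LoggerAwareInterface`), so it should work seamlessly with any logger that supports that standard. See the following example, which uses Monolog with a Telegram bot handler.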

use Monolog\Handler\TelegramBotHandler;
use Monolog\Logger;
use t1gor\RobotsTxtParser\RobotsTxtParser;

$monologLogger = new Logger('robot.txt-parser');
// Send log records to a Telegram channel through a bot (replace the placeholders)
$monologLogger->pushHandler(new TelegramBotHandler('api-key', 'channel'));

$parser = new RobotsTxtParser(fopen('some/robots.txt', 'r'));
$parser->setLogger($monologLogger);

Most of our log entries are LogLevel::DEBUG, but there may also be LogLevel::WARNING entries where appropriate.

Parsing files that are not UTF-8 encoded
use t1gor\RobotsTxtParser\RobotsTxtParser;

/** @see EncodingTest for more details */
$parser = new RobotsTxtParser(fopen('market-yandex-Windows-1251.txt', 'r'), 'Windows-1251');
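The second constructor argument names the source encoding. The conversion step itself can be illustrated with `mb_convert_encoding()` from the mbstring extension, which this sketch assumes is installed. It is a standalone illustration, not necessarily how the library converts internally, and `toUtf8()` is a hypothetical name.

```php
<?php

/** Hypothetical helper: re-encode robots.txt content to UTF-8. */
function toUtf8(string $content, string $encoding): string
{
    if (strtoupper($encoding) === 'UTF-8') {
        return $content; // already UTF-8, nothing to do
    }
    // Requires the mbstring extension.
    return mb_convert_encoding($content, 'UTF-8', $encoding);
}

// "Disallow: /Привет" with the Cyrillic word stored as Windows-1251 bytes.
$raw = "Disallow: /\xCF\xF0\xE8\xE2\xE5\xF2";
$utf8 = toUtf8($raw, 'Windows-1251');
```

Converting once, before any rule matching, keeps every later string comparison in a single encoding.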

Public API

More code examples can be found in the tests folder.

Some useful links and materials

Contributing

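First of all, thank you for your interest and willingness to help! If you have found an issue and know how to fix it, please submit a pull request to the dev branch. Please keep the following in mind: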

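  • Your fix should be covered by tests (we use phpUnit)
  • Please pay attention to Code Climate's suggestions. They help keep things simpler, or at least it seems that way :)
  • Following the coding standard would also be much appreciated (4 tabs as indentation, camelCase, etc.)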

I would also appreciate it if you could share links to projects that use this library.

License

The MIT License

Copyright (c) 2013 Igor Timoshenkov

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.