bee4/robots.txt

This package is abandoned and is no longer maintained. No replacement package was suggested.

A Robots.txt parser and matcher

v2.0.3 2016-02-24 22:36 UTC

This package is auto-updated.

Last update: 2023-02-22 15:51:56 UTC


README


This library parses Robots.txt files and checks the status of a URL against the defined rules. It follows the rules defined in the RFC draft: http://www.robotstxt.org/norobots-rfc.txt
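For reference, here is an illustrative robots.txt (the paths are hypothetical) using the directives described in that draft: each group starts with one or more User-agent lines, followed by Allow/Disallow path rules, and * matches any agent.

```
# Group applying to all crawlers
User-agent: *
Allow: /
Disallow: /private/

# Group for one specific crawler; a more specific
# User-agent match takes precedence over *
User-agent: google-bot
Disallow: /forbidden-directory
```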

Installation


This project can be installed with Composer. Add the following to your composer.json:

{
    "require": {
        "bee4/robots.txt": "~2.0"
    }
}

Or run the following command:

composer require bee4/robots.txt:~2.0

Usage

<?php

use Bee4\RobotsTxt\Content;
use Bee4\RobotsTxt\ContentFactory;
use Bee4\RobotsTxt\Parser;

// Extract content from URL
$content = ContentFactory::build("https://httpbin.org/robots.txt");

// or directly from robots.txt content
$content = new Content("
User-agent: *
Allow: /

User-agent: google-bot
Disallow: /forbidden-directory
");

// Then parse the content
$rules = Parser::parse($content);

// or with a reusable Parser instance
$parser = new Parser();
$rules = $parser->analyze($content);

// Content can also be passed directly as a string
$rules = Parser::parse('User-Agent: Bing
Disallow: /downloads');

// You can use the match method to check whether a URL is allowed for a given user-agent...
$rules->match('Google-Bot v01', '/an-awesome-url');      // true
$rules->match('google-bot v01', '/forbidden-directory'); // false

// ...or get the applicable rule for a user-agent and match
$rule = $rules->get('*');
$result = $rule->match('/'); // true
$result = $rule->match('/forbidden-directory'); // true