bee4 / robots.txt
This package is abandoned and no longer maintained. No replacement package was suggested.
Robots.txt parser and matcher
v2.0.3
2016-02-24 22:36 UTC
Requires
- php: >=5.4
- ext-pcre: *
Requires (Dev)
- atoum/atoum: ~2.5
- hoa/console: ~3.0
- squizlabs/php_codesniffer: ~2.5
README
This library parses robots.txt files and checks the status of a URL against the defined rules. It follows the rules defined in the RFC draft: http://www.robotstxt.org/norobots-rfc.txt
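For reference, a minimal robots.txt file in the record format described by that draft looks like the following (the agent name and paths are illustrative, taken from the usage example below):

```
# Rules applied to all crawlers
User-agent: *
Allow: /

# A group for a specific agent takes precedence over the generic one
User-agent: google-bot
Disallow: /forbidden-directory
```

Each group starts with one or more User-agent lines, followed by Allow/Disallow path rules that apply to those agents.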
Installation
This project can be installed with Composer. Add the following to your composer.json:
{
    "require": {
        "bee4/robots.txt": "~2.0"
    }
}
or run the following command:
composer require bee4/robots.txt:~2.0
Usage
<?php
use Bee4\RobotsTxt\Content;
use Bee4\RobotsTxt\ContentFactory;
use Bee4\RobotsTxt\Parser;

// Extract content from a URL
$content = ContentFactory::build("https://httpbin.org/robots.txt");

// or build it directly from robots.txt content
$content = new Content("
User-agent: *
Allow: /

User-agent: google-bot
Disallow: /forbidden-directory
");

// Then you must parse the content
$rules = Parser::parse($content);

// or with a reusable Parser
$parser = new Parser();
$rules = $parser->analyze($content);

// Content can also be parsed directly as a string
$rules = Parser::parse('User-Agent: Bing
Disallow: /downloads');

// You can use the match method to check if a URL is allowed for a given user-agent...
$rules->match('Google-Bot v01', '/an-awesome-url');       // true
$rules->match('google-bot v01', '/forbidden-directory');  // false

// ...or get the applicable rule for a user-agent and match against it
$rule = $rules->get('*');
$result = $rule->match('/');                     // true
$result = $rule->match('/forbidden-directory');  // true