nickmoline / robots-checker
Class to check if a URL is excluded by robots, using every available robots exclusion method
v1.0.5
2018-04-03 21:27 UTC
Requires
- php: >=5.5.9
- ext-fileinfo: *
- ext-intl: *
- ext-mbstring: *
- league/uri: ^4.0 || ^5.0
- php-curl-class/php-curl-class: 3.5.*
- thesoftwarefanatics/php-html-parser: ^1.7
- tomverran/robots-txt-checker: ^1.14
Requires (Dev)
- friendsofphp/php-cs-fixer: ^1.13 || ^2.0
- phpspec/phpspec: ~2.0
- phpunit/phpunit: ~4.8
This package is auto-updated.
Last update: 2024-09-07 00:53:32 UTC
README
These classes allow you to check every method by which a URL can be excluded from indexing.
Classes
You can instantiate the following classes:
NickMoline\Robots\RobotsTxt: Checks the robots.txt file for the given URL
NickMoline\Robots\Status: Checks the HTTP status code of the URL for indexability
NickMoline\Robots\Header: Checks the X-Robots-Tag HTTP header
NickMoline\Robots\Meta: Checks the <meta name="robots"> tag (as well as bot-specific meta tags)
NickMoline\Robots\All: Wrapper class that runs all of the above checks
Example Usage
<?php

use NickMoline\Robots\RobotsTxt;
use NickMoline\Robots\Header as RobotsHeader;
use NickMoline\Robots\All as RobotsAll;

$checker = new RobotsTxt("http://www.example.com/test.html");
$allowed = $checker->verify();                          // By default it checks Googlebot
$allowed = $checker->setUserAgent("bingbot")->verify(); // Checks to see if blocked for bingbot by robots.txt file
echo $checker->getReason();                             // Get the reason the url is allowed or denied

$checker2 = new RobotsHeader("http://www.example.com/test.html");
$allowed = $checker2->verify();                         // Same as above but will test the X-Robots-Tag HTTP headers

$checkerAll = new RobotsAll("http://www.example.com/test.html");
$allowed = $checkerAll->verify();                       // This one runs all of the available tests
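The two checkers not shown above, Status and Meta, follow the same pattern. The sketch below assumes they share the constructor and the verify()/setUserAgent()/getReason() API demonstrated in the example; consult the package source to confirm.

```php
<?php

use NickMoline\Robots\Status;
use NickMoline\Robots\Meta;

// Check whether the URL's HTTP status code prevents indexing
// (e.g. a 404 or a redirect), assuming the same API as above.
$statusChecker = new Status("http://www.example.com/test.html");
$allowed = $statusChecker->verify();
echo $statusChecker->getReason();

// Check the <meta name="robots"> tag, including bot-specific
// variants such as <meta name="bingbot">, for a given user agent.
$metaChecker = new Meta("http://www.example.com/test.html");
$allowed = $metaChecker->setUserAgent("bingbot")->verify();
echo $metaChecker->getReason();
```

Because each checker exposes the same verify() interface, the All wrapper can run them in sequence and report the first method that blocks the URL.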