piedweb / url-harvester
Harvest statistics and metadata from a URL or its source code (SEO oriented).
0.0.31
2021-11-07 19:50 UTC
Requires
- php: ^7.3|^8.0
- jeremykendall/php-domain-parser: ^6.1
- league/uri: ^6.5
- neitanod/forceutf8: ^2.0.4
- piedweb/curl: ^0.0.18
- piedweb/text-analyzer: ^0.0.4
- spatie/robots-txt: ^1.0.10|^2
- symfony/css-selector: ^5.2
- symfony/dom-crawler: ^5.2
Requires (Dev)
- friendsofphp/php-cs-fixer: ^3.2
- phpunit/phpunit: ^9.5
- symfony/var-dumper: ^5.3
- vimeo/psalm: ^4.4
README
URL Meta Data Harvester
Harvest statistics and metadata from a URL or its source code (SEO oriented).
Implemented in Seo Pocket Crawler (source on GitHub).
Install
$ composer require piedweb/url-harvester
Usage
Harvest Methods
use \PiedWeb\UrlHarvester\Harvest;
use \PiedWeb\UrlHarvester\Link;

$url = 'https://piedweb.com';

Harvest::fromUrl($url)
    ->getResponse()->getInfo('total_time') // load time
    ->getResponse()->getInfo('size_download')
    ->getResponse()->getStatusCode()
    ->getResponse()->getContentType()
    ->getRes...
    ->getTag('h1') // @return first tag content (could be html)
    ->getUniqueTag('h1') // @return first tag content in utf8 (could contain html)
    ->getMeta('description') // @return string from content attribute or NULL
    ->getCanonical() // @return string|NULL
    ->isCanonicalCorrect() // @return bool
    ->getRatioTxtCode() // @return int
    ->getTextAnalysis() // @return \PiedWeb\TextAnalyzer\Analysis
    ->getKws() // @return the 10 most used words
    ->getBreadCrumb()
    ->indexable($userAgent = 'googlebot') // @return int corresponding to a const from Indexable
    ->getLinks()
    ->getLinks(Link::LINK_SELF)
    ->getLinks(Link::LINK_INTERNAL)
    ->getLinks(Link::LINK_SUB)
    ->getLinks(Link::LINK_EXTERNAL)
    ->getLinkedRessources() // @return an array with all attributes containing a href or a src property
    ->mayFollow() // check headers and meta and return bool
    ->getDomain()
    ->getBaseUrl()
    ->getRobotsTxt() // @return \Spatie\Robots\RobotsTxt or empty string
    ->setRobotsTxt($content) // @param string or RobotsTxt
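The listing above reads as a catalogue of available methods rather than one literal call chain, since most of them return a value and not the harvest object. A minimal sketch of calling a few of them separately might look like the following. The helper name `seoSummary` and the array keys are hypothetical and not part of the library. The parameter is typed as a generic `object` so the function only assumes the method names shown in the listing.

```php
<?php

declare(strict_types=1);

/**
 * Collect a few SEO-related fields into a plain array.
 *
 * Hypothetical helper (not provided by piedweb/url-harvester).
 * $harvest is expected to expose the methods named in the README listing;
 * each one is called on its own instead of being chained.
 */
function seoSummary(object $harvest): array
{
    return [
        // HTTP status code taken from the response object
        'status' => $harvest->getResponse()->getStatusCode(),
        // Content of the first <h1> (may contain HTML, per the listing)
        'h1' => $harvest->getTag('h1'),
        // Meta description, or null when the tag is absent
        'description' => $harvest->getMeta('description'),
        // Canonical URL (or null) and whether it is considered correct
        'canonical' => $harvest->getCanonical(),
        'canonical_ok' => $harvest->isCanonicalCorrect(),
        // Whether headers and meta tags allow following links
        'may_follow' => $harvest->mayFollow(),
    ];
}
```

With Composer autoloading in place, the value returned by `Harvest::fromUrl($url)` could be passed to `seoSummary()`. Checking first that it really is a `Harvest` object seems prudent, on the assumption (worth verifying against the library source) that a failed request may return something else.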
Testing
$ composer test
Contributing
Please see the contributing guide.
Credits
License
The MIT License (MIT). Please see the license file for more information.