README

URL元数据采集器

从URL或其源代码中采集统计信息和元数据（面向SEO）。

安装

$ composer require piedweb/url-harvester

使用方法

采集方法

use \PiedWeb\UrlHarvester\Harvest;
use \PiedWeb\UrlHarvester\Link;

$url = 'https://piedweb.com';

Harvest::fromUrl($url)
    ->getResponse()->getInfo('total_time') // load time
    ->getResponse()->getInfo('size_download')
    ->getResponse()->getStatusCode()
    ->getResponse()->getContentType()
    ->getRes...

    ->getTag('h1') // @return first tag content (could be html)
    ->getUniqueTag('h1') // @return first tag content in utf8 (could contain html)
    ->getMeta('description') // @return string from content attribute or NULL
    ->getCanonical() // @return string|NULL
    ->isCanonicalCorrect() // @return bool
    ->getRatioTxtCode() // @return int
    ->getTextAnalysis() // @return \PiedWeb\TextAnalyzer\Analysis
    ->getKws() // @return 10 more used words
    ->getBreadCrumb()
    ->indexable($userAgent = 'googlebot') // @return int corresponding to a const from Indexable

    ->getLinks()
    ->getLinks(Link::LINK_SELF)
    ->getLinks(Link::LINK_INTERNAL)
    ->getLinks(Link::LINK_SUB)
    ->getLinks(Link::LINK_EXTERNAL)
    ->getLinkedRessources() // Return an array with all attributes containing a href or a src property
    ->mayFollow() // check headers and meta and return bool

    ->getDomain()
    ->getBaseUrl()

    ->getRobotsTxt() // @return \Spatie\Robots\RobotsTxt or empty string
    ->setRobotsTxt($content) // @param string or RobotsTxt

测试

$ composer test

pieweb / url-harvester

维护者

详细信息

README

URL元数据采集器

安装

使用方法

测试

贡献

鸣谢

许可协议