piedweb/url-harvester

Harvest statistics and metadata from a URL or its source code (SEO oriented).

0.0.31 2021-11-07 19:50 UTC



URL Metadata Harvester



Implemented in Seo Pocket Crawler (source code on GitHub).

Installation

Via Packagist

$ composer require piedweb/url-harvester

Usage

Harvesting methods

<?php
require __DIR__ . '/vendor/autoload.php';

use PiedWeb\UrlHarvester\Harvest;
use PiedWeb\UrlHarvester\Link;

$url = 'https://piedweb.com';

$harvest = Harvest::fromUrl($url);

// HTTP response
$harvest->getResponse()->getInfo('total_time'); // load time
$harvest->getResponse()->getInfo('size_download');
$harvest->getResponse()->getStatusCode();
$harvest->getResponse()->getContentType();
// $harvest->getRes...

// Page content
$harvest->getTag('h1'); // @return first tag content (could be html)
$harvest->getUniqueTag('h1'); // @return first tag content in utf8 (could contain html)
$harvest->getMeta('description'); // @return string from content attribute or NULL
$harvest->getCanonical(); // @return string|NULL
$harvest->isCanonicalCorrect(); // @return bool
$harvest->getRatioTxtCode(); // @return int (text-to-code ratio)
$harvest->getTextAnalysis(); // @return \PiedWeb\TextAnalyzer\Analysis
$harvest->getKws(); // @return the 10 most-used words
$harvest->getBreadCrumb();
$harvest->indexable('googlebot'); // $userAgent; @return int corresponding to a const from Indexable

// Links
$harvest->getLinks();
$harvest->getLinks(Link::LINK_SELF);
$harvest->getLinks(Link::LINK_INTERNAL);
$harvest->getLinks(Link::LINK_SUB);
$harvest->getLinks(Link::LINK_EXTERNAL);
$harvest->getLinkedRessources(); // @return array of all attributes containing an href or a src property
$harvest->mayFollow(); // checks headers and meta tags, @return bool

// Domain
$harvest->getDomain();
$harvest->getBaseUrl();

// robots.txt
$harvest->getRobotsTxt(); // @return \Spatie\Robots\RobotsTxt or empty string
$harvest->setRobotsTxt("User-agent: *\nDisallow:\n"); // @param string or RobotsTxt, e.g. raw robots.txt content
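getRatioTxtCode() returns an integer text-to-code ratio, but the README does not give its formula. The sketch below assumes a commonly used definition: the length of the visible text (after dropping script and style blocks and all tags, with whitespace collapsed) as a percentage of the length of the raw HTML. textToCodeRatio() is a hypothetical helper, not part of the library, and its text-extraction and rounding rules may differ from what getRatioTxtCode() actually does.

```php
<?php

// Hypothetical helper: text-to-code ratio as the percentage of the raw HTML
// length (in bytes) that is visible text. Not the library's own formula.
function textToCodeRatio(string $html): int
{
    if ($html === '') {
        return 0; // avoid dividing by zero
    }

    // Drop <script> and <style> blocks: their contents are not visible text.
    $visible = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html) ?? $html;

    // Remove the remaining tags and collapse runs of whitespace.
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($visible)) ?? '');

    return (int) round(strlen($text) / strlen($html) * 100);
}

$html = '<html><head><style>p{color:red}</style></head>'
    . '<body><p>Hello world</p></body></html>';

echo textToCodeRatio($html), "\n"; // 13 (11 text bytes out of 84)
```

Removing script and style blocks before calling strip_tags() matters: strip_tags() alone keeps their contents, which would count CSS and JavaScript as text and inflate the ratio.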

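getLinks() accepts one of Link::LINK_SELF, Link::LINK_INTERNAL, Link::LINK_SUB and Link::LINK_EXTERNAL, but the README does not define these categories. The sketch below shows one plausible reading, assuming absolute URLs: "self" is the same page (same host, path and query, ignoring the scheme and the #fragment), "internal" is another page on the same host, "sub" is a different subdomain of the same domain, and "external" is everything else. classifyLink() and baseDomain() are hypothetical helpers, not the library's Link class, and baseDomain() naively keeps the last two host labels, which is wrong for suffixes such as co.uk.

```php
<?php

// Hypothetical helper: naive "registrable domain" made of the last two host
// labels. A real implementation would need the Public Suffix List (co.uk etc.).
function baseDomain(string $host): string
{
    $labels = explode('.', strtolower($host));

    return implode('.', array_slice($labels, -2));
}

// Hypothetical helper: classify an absolute $linkUrl relative to the page
// ($pageUrl) it was found on. Returns 'self', 'internal', 'sub' or 'external'.
function classifyLink(string $pageUrl, string $linkUrl): string
{
    $page = parse_url($pageUrl);
    $link = parse_url($linkUrl);

    $pageHost = strtolower($page['host'] ?? '');
    $linkHost = strtolower($link['host'] ?? '');

    if ($pageHost === $linkHost) {
        // Same host: it is the page itself when path and query also match.
        // The scheme and the #fragment are ignored.
        $samePage = ($page['path'] ?? '/') === ($link['path'] ?? '/')
            && ($page['query'] ?? '') === ($link['query'] ?? '');

        return $samePage ? 'self' : 'internal';
    }

    // Different host on the same base domain, e.g. blog.example.com vs example.com.
    if (baseDomain($pageHost) === baseDomain($linkHost)) {
        return 'sub';
    }

    return 'external';
}

$page = 'https://example.com/seo';

echo classifyLink($page, 'https://example.com/seo#top'), "\n"; // self
echo classifyLink($page, 'https://example.com/contact'), "\n"; // internal
echo classifyLink($page, 'https://blog.example.com/'), "\n";   // sub
echo classifyLink($page, 'https://example.org/'), "\n";        // external
```

Relative links (such as href="/contact") would first have to be resolved against the page URL, for example with something like getBaseUrl(), before being passed to a function like this.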
Testing

$ composer test

Contributing

Please see the contributing guide.

Credits

License

The MIT License (MIT). Please see the License File for more information.