chrisshennan/page-analyzer

A simple, extensible PHP page analyzer

dev-master 2022-12-21 14:19 UTC

This package is auto-updated.

Last update: 2024-09-21 18:09:38 UTC


README

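A simple, extensible PHP page analyzer.

Analyzes HTML markup and extracts common attributes, such as:

  • Meta tags, including OpenGraph tags, Twitter Cards, etc.
  • JSON-LD
  • Apple touch icons / favicons
  • Canonical references
  • Response headers
  • RSS / XML feeds (link alternate)

Easily extended by adding your own custom analyzers.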
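
To illustrate the kind of extraction listed above, here is a minimal sketch that collects meta tags (OpenGraph, Twitter Cards, and plain `name` tags) from an HTML string using PHP's built-in DOM extension. The function name `extractMetaTags` and the sample markup are assumptions made for this example. This is not the package's own implementation, which may work differently.

```php
<?php
// Illustrative only: a standalone helper, not part of chrisshennan/page-analyzer.

/**
 * Collect <meta> tags keyed by their "property" or "name" attribute.
 *
 * @return array<string, string>
 */
function extractMetaTags(string $html): array
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely strict; collect parser warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags = [];
    foreach ($dom->getElementsByTagName('meta') as $meta) {
        // OpenGraph uses "property" (og:title); Twitter Cards and plain meta tags use "name".
        $key = $meta->getAttribute('property') ?: $meta->getAttribute('name');
        if ($key !== '' && $meta->hasAttribute('content')) {
            $tags[$key] = $meta->getAttribute('content');
        }
    }

    return $tags;
}

// Hypothetical sample markup.
$html = '<html><head>'
    . '<meta property="og:title" content="Example page">'
    . '<meta name="twitter:card" content="summary">'
    . '<meta name="description" content="A sample description">'
    . '</head><body></body></html>';

$meta = extractMetaTags($html);
print_r($meta);
```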

Installation

composer require chrisshennan/page-analyzer dev-master

Basic usage

Analyzing a live page

Set the list of analyzers when creating the PageAnalyzer.

// Load Composer's autoloader.
$loader = require __DIR__.'/vendor/autoload.php';

$url = 'https://shoutabout.it';

// Register the analyzers to run, by fully qualified class name.
$analyzerFactory = new \Cas\PageAnalyzer\Factory\AnalyzerFactory();
$analyzerFactory->addAnalyzerReference('\Cas\PageAnalyzer\Analyzer\MetaData');
$analyzerFactory->addAnalyzerReference('\Cas\PageAnalyzer\Analyzer\Logo');

// Build the manager, analyze the URL, and fetch the resulting analysis collection.
$analyzerManager = $analyzerFactory->createManager();
$analysisCollection = $analyzerManager->analyze($url)->getAnalysis();

$data = [];

// Collect the data from each analysis, keyed by the collection's key.
foreach ($analysisCollection as $analyzer => $analysis) {
    $data[$analyzer] = $analysis->getData();
}

...
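
Once `$data` is populated, one option is to serialize it for logging or an API response. The sketch below uses a hypothetical stand-in array, because the real keys and nested structure depend on which analyzers are registered and are not documented here.

```php
<?php
// Hypothetical stand-in for the $data array built by the loop above;
// the real keys and nested values depend on the analyzers you register.
$data = [
    'MetaData' => ['title' => 'Example page'],
    'Logo'     => ['favicon' => '/favicon.ico'],
];

// Pretty-print as JSON and throw a JsonException if encoding fails.
$json = json_encode(
    $data,
    JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
);

echo $json, PHP_EOL;
```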

Custom analyzers

Create a new analyzer class

namespace App\Analyzer;

use Cas\PageAnalyzer\Analyzer\BaseAnalyzer;

class MyCustomAnalyzer extends BaseAnalyzer
{
    public function analyze(string $content) : array
    {
        ...
    }
}
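
As a rough idea of what the body of `analyze()` might look like, the sketch below extracts the text of every `<h1>` element. It is written as a standalone class so that it runs without the package installed. In a real project the class would extend `BaseAnalyzer` as shown above. The class name `HeadingAnalyzerSketch`, the shape of the returned array, and the assumption that `$content` holds the page's HTML are all hypothetical, and `BaseAnalyzer` may impose requirements not covered here.

```php
<?php
// Sketch only: mirrors the analyze(string $content): array signature shown above.
// In a real project this class would extend Cas\PageAnalyzer\Analyzer\BaseAnalyzer.
final class HeadingAnalyzerSketch
{
    /**
     * @return array{h1: list<string>}
     */
    public function analyze(string $content): array
    {
        $dom = new DOMDocument();

        // Collect parser warnings for non-strict HTML instead of emitting them.
        $previous = libxml_use_internal_errors(true);
        $dom->loadHTML($content);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $headings = [];
        foreach ($dom->getElementsByTagName('h1') as $node) {
            // textContent flattens nested markup; trim removes surrounding whitespace.
            $headings[] = trim($node->textContent);
        }

        return ['h1' => $headings];
    }
}

// Hypothetical sample markup.
$result = (new HeadingAnalyzerSketch())->analyze(
    '<html><body><h1> Welcome </h1><p>Text</p><h1>Second</h1></body></html>'
);
print_r($result);
```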

Add the analyzer to the list of analyzers

// Load Composer's autoloader.
$loader = require __DIR__.'/vendor/autoload.php';

$url = 'https://shoutabout.it';

// Register the built-in analyzers, then the custom one, by fully qualified class name.
$analyzerFactory = new \Cas\PageAnalyzer\Factory\AnalyzerFactory();
$analyzerFactory->addAnalyzerReference('\Cas\PageAnalyzer\Analyzer\MetaData');
$analyzerFactory->addAnalyzerReference('\Cas\PageAnalyzer\Analyzer\Logo');
$analyzerFactory->addAnalyzerReference('\App\Analyzer\MyCustomAnalyzer');

// Build the manager, analyze the URL, and fetch the resulting analysis collection.
$analyzerManager = $analyzerFactory->createManager();
$analysisCollection = $analyzerManager->analyze($url)->getAnalysis();

$data = [];

// Collect the data from each analysis, keyed by the collection's key.
foreach ($analysisCollection as $analyzer => $analysis) {
    $data[$analyzer] = $analysis->getData();
}

...