README

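PHP library to get information from any web page (using oembed, opengraph, twitter-cards, scraping the HTML, etc). It is compatible with any web service (YouTube, Vimeo, Flickr, Instagram, etc) and has adapters for some sites (archive.org, github, facebook, etc).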

Requirements

PHP 7.4+
Curl library installed
A PSR-17 implementation. By default, the supported libraries are detected automatically

If you need support for PHP 5.5-7.3, use the 3.x version

Online demo

Run php -S localhost:8888 demo/index.php

Video tutorial

Installation

This package is installable and autoloadable via Composer as embed/embed.

$ composer require embed/embed

Usage

use Embed\Embed;

$embed = new Embed();

//Load any url:
$info = $embed->get('https://www.youtube.com/watch?v=PP1xn5wHtxE');

//Get content info

$info->title; //The page title
$info->description; //The page description
$info->url; //The canonical url
$info->keywords; //The page keywords

$info->image; //The thumbnail or main image

$info->code->html; //The code to embed the image, video, etc
$info->code->width; //The exact width of the embed code (if exists)
$info->code->height; //The exact height of the embed code (if exists)
$info->code->ratio; //The percentage of height / width to emulate the aspect ratio using paddings.

$info->authorName; //The resource author
$info->authorUrl; //The author url

$info->cms; //The cms used
$info->language; //The language of the page
$info->languages; //The alternative languages

$info->providerName; //The provider name of the page (Youtube, Twitter, Instagram, etc)
$info->providerUrl; //The provider url
$info->icon; //The big icon of the site
$info->favicon; //The favicon of the site (an .ico file or a png with up to 32x32px)

$info->publishedTime; //The published time of the resource
$info->license; //The license url of the resource
$info->feeds; //The RSS/Atom feeds
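
The ratio value above is described as the percentage of height / width, used to emulate the aspect ratio with padding. As a minimal sketch of that arithmetic, the following standalone snippet computes such a percentage and wraps some embed HTML in a padded box. The helper names aspectRatioPercent and responsiveWrapper are hypothetical and not part of Embed, and the CSS needed to stretch the inner element over the box is omitted.

```php
<?php

// Hypothetical helper (not part of Embed): height / width as a percentage.
function aspectRatioPercent(int $width, int $height): float
{
    if ($width <= 0 || $height <= 0) {
        throw new InvalidArgumentException('Width and height must be positive.');
    }

    return round($height / $width * 100, 3);
}

// Hypothetical helper: wrap embed HTML in a box whose height follows its width.
function responsiveWrapper(string $embedHtml, float $ratioPercent): string
{
    return sprintf(
        '<div style="position:relative;height:0;padding-bottom:%s%%">%s</div>',
        $ratioPercent,
        $embedHtml
    );
}

// A 560x315 player is a 16:9 video.
$ratio = aspectRatioPercent(560, 315);
echo responsiveWrapper('<iframe src="https://example.com/embed"></iframe>', $ratio), PHP_EOL;
```

With a real result, the value of $info->code->ratio could presumably be passed straight to a wrapper like this instead of being recomputed.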

Parallel multiple requests

use Embed\Embed;

$embed = new Embed();

//Load multiple urls asynchronously:
$infos = $embed->getMulti(
    'https://www.youtube.com/watch?v=PP1xn5wHtxE',
    'https://twitter.com/carlosmeixidefl/status/1230894146220625933',
    'https://en.wikipedia.org/wiki/Tordoia',
);

foreach ($infos as $info) {
    echo $info->title;
}

Document

The document is the object that stores the HTML code of the page. You can use it to extract extra info from the HTML code

//Get the document object
$document = $info->getDocument();

$document->link('image_src'); //Returns the href of a <link>
$document->getDocument(); //Returns the DOMDocument instance
$html = (string) $document; //Returns the html code

$document->select('.//h1'); //Search

You can use xpath queries to select specific elements. A search always returns an Embed\QueryResult instance

//Search the A elements
$result = $document->select('.//a');

//Filter the results
$result->filter(fn ($node) => $node->getAttribute('href'));

$id = $result->str('id'); //Return the id of the first result as string
$text = $result->str(); //Return the content of the first result

$ids = $result->strAll('id'); //Return an array with the ids of all results as string
$texts = $result->strAll(); //Return an array with the content of all results as string

$tabindex = $result->int('tabindex'); //Return the tabindex attribute of the first result as integer
$number = $result->int(); //Return the content of the first result as integer

$href = $result->url('href'); //Return the href attribute of the first result as url (converts relative urls to absolutes)
$url = $result->url(); //Return the content of the first result as url

$node = $result->node(); //Return the first node found (DOMElement)
$nodes = $result->nodes(); //Return all nodes found
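
To make the select-then-filter flow more concrete, here is a rough equivalent written with PHP's built-in DOMDocument and DOMXPath only. It is a sketch of the same idea, not Embed's internal implementation, and the sample HTML is made up for illustration.

```php
<?php

$html = '<html><body><h1>Title</h1>'
    . '<a id="first" href="/about">About</a>'
    . '<a id="second">No link</a>'
    . '<a id="third" href="https://example.com/">Home</a>'
    . '</body></html>';

$dom = new DOMDocument();
// Suppress the warnings libxml emits for imperfect real-world HTML.
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// The same kind of relative query as './/a', evaluated from the root element.
$nodes = iterator_to_array($xpath->query('.//a', $dom->documentElement));

// Keep only the anchors with a non-empty href (mirrors the filter step).
$withHref = array_values(array_filter(
    $nodes,
    fn (DOMElement $node) => $node->getAttribute('href') !== ''
));

// Collect one attribute from every remaining node.
$ids = array_map(fn (DOMElement $node) => $node->getAttribute('id'), $withHref);

print_r($ids);
```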

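Metas

For convenience, the Metas object stores the values of all <meta> elements found in the HTML, so you can get them more easily. The key of each meta comes from the name, property or itemprop attribute, and the value comes from content.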

//Get the Metas object
$metas = $info->getMetas();

$metas->all(); //Return all values
$metas->get('og:title'); //Return a key value
$metas->str('og:title'); //Return the value as string (remove html tags)
$metas->html('og:description'); //Return the value as html
$metas->int('og:video:width'); //Return the value as integer
$metas->url('og:url'); //Return the value as full url (converts relative urls to absolutes)
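
The key/value rule described above can be sketched with the built-in DOM extension. This is an illustration under assumptions, not Embed's own code: the precedence among name, property and itemprop, the lowercasing of keys, and the choice to store values as lists are all choices made for this example.

```php
<?php

$html = '<html><head>'
    . '<meta name="description" content="A sample page">'
    . '<meta property="og:title" content="Sample title">'
    . '<meta itemprop="image" content="/cover.png">'
    . '<meta charset="utf-8">'
    . '</head><body></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);

$metas = [];

foreach ($dom->getElementsByTagName('meta') as $meta) {
    // Assumed precedence: the first of name, property or itemprop that is set.
    $key = $meta->getAttribute('name')
        ?: $meta->getAttribute('property')
        ?: $meta->getAttribute('itemprop');

    if ($key === '' || !$meta->hasAttribute('content')) {
        continue; // e.g. <meta charset="utf-8"> carries no key/content pair
    }

    // Store values as a list, since the same key may appear more than once.
    $metas[strtolower($key)][] = $meta->getAttribute('content');
}

print_r($metas);
```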

OEmbed

In addition to the HTML and the metas, this library uses oEmbed endpoints to get extra data. You can access this data as follows

//Get the oEmbed object
$oembed = $info->getOEmbed();

$oembed->all(); //Return all raw data
$oembed->get('title'); //Return a key value
$oembed->str('title'); //Return the value as string (remove html tags)
$oembed->html('html'); //Return the value as html
$oembed->int('width'); //Return the value as integer
$oembed->url('url'); //Return the value as full url (converts relative urls to absolutes)

Extra oEmbed parameters can also be provided (like hidecaption for Instagram)

$embed = new Embed();

$result = $embed->get('https://www.instagram.com/p/B_C0wheCa4V/');
$result->setSettings([
    'oembed:query_parameters' => ['hidecaption' => true]
]);
$oembed = $result->getOEmbed();

LinkedData

Another API available by default, used to extract info using the JsonLD schema.

//Get the linkedData object
$ld = $info->getLinkedData();

$ld->all(); //Return all data
$ld->get('name'); //Return a key value
$ld->str('name'); //Return the value as string (remove html tags)
$ld->html('description'); //Return the value as html
$ld->int('width'); //Return the value as integer
$ld->url('url'); //Return the value as full url (converts relative urls to absolutes)
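
For readers unfamiliar with JSON-LD, the data usually lives in script elements of type application/ld+json. The following standalone sketch shows one way such blocks could be read with built-in PHP functions. It does not reproduce how Embed parses them, and the sample markup is invented.

```php
<?php

$html = '<html><head><script type="application/ld+json">'
    . '{"@context":"https://schema.org","@type":"VideoObject","name":"Sample video","width":640}'
    . '</script></head><body></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);

$blocks = [];

foreach ($dom->getElementsByTagName('script') as $script) {
    if (strtolower($script->getAttribute('type')) !== 'application/ld+json') {
        continue;
    }

    // Ignore blocks that are not valid JSON objects or arrays.
    $data = json_decode($script->textContent, true);
    if (is_array($data)) {
        $blocks[] = $data;
    }
}

// Read individual keys from the first block, with simple type coercion.
$name = $blocks[0]['name'] ?? null;
$width = isset($blocks[0]['width']) ? (int) $blocks[0]['width'] : null;

var_dump($name, $width);
```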

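Other APIs

Some sites (like Wikipedia or Archive.org) provide a custom API for fetching more reliable data. You can get the API object with the getApi() method, but note that not all results have this method. The API object has the same methods as oEmbed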

//Get the API object
$api = $info->getApi();

$api->all(); //Return all raw data
$api->get('title'); //Return a key value
$api->str('title'); //Return the value as string (remove html tags)
$api->html('html'); //Return the value as html
$api->int('width'); //Return the value as integer
$api->url('url'); //Return the value as full url (converts relative urls to absolutes)

Extending Embed

Depending on your needs, you may want to extend this library with extra features or change the way it performs some operations.

PSR

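Embed uses some PSR standards to be as interoperable as possible

PSR-7 Standard interfaces to represent HTTP requests, responses and URIs
PSR-17 Standard factories to create PSR-7 objects
PSR-18 Standard interface to send an HTTP request and return a response

Embed ships with a PSR-18 compatible CURL client, but you need to install a PSR-7 / PSR-17 library. Several popular libraries are available, and Embed can automatically detect 'laminas\diactoros', 'guzzleHttp\psr7', 'slim\psr7', 'nyholm\psr7' and 'sunrise\http' (in that order). If you want to use a different PSR implementation, you can do it this way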

use Embed\Embed;
use Embed\Http\Crawler;

$client = new CustomHttpClient();
$requestFactory = new CustomRequestFactory();
$uriFactory = new CustomUriFactory();

//The Crawler is responsible for performing http queries
$crawler = new Crawler($client, $requestFactory, $uriFactory);

//Create an embed instance passing the Crawler
$embed = new Embed($crawler);

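Adapters

Some sites have special needs: either because they provide a public API that allows extracting more info (like Wikipedia or Archive.org), or because the way data is extracted needs to change for that particular site. For all these cases there are adapters, which are classes that extend the default classes to provide extra functionality.

Before creating an adapter, you need to understand how Embed works: when you execute this code, you get an Extractor object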

//Get the Extractor with all info
$info = $embed->get($url);

//The extractor has the document and oembed:
$document = $info->getDocument();
$oembed = $info->getOEmbed();

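The Extractor class has many Detectors. Each detector is responsible for detecting a specific piece of info. For example, there is a detector for the title, and others for the description, image, code, etc.

So an adapter is basically an extractor created for a specific site. It can also contain custom detectors or APIs. If you look at the src/Adapters folder, you can see all the adapters.

If you create an adapter, you also need to register it with Embed, so it knows on which site to use it. For this there is the ExtractorFactory object, which is responsible for instantiating the right extractor for each site.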

use Embed\Embed;

$embed = new Embed();

$factory = $embed->getExtractorFactory();

//Use this MySite adapter for mysite.com
$factory->addAdapter('mysite.com', MySite::class);

//Remove the adapter for pinterest.com, so it will use the default extractor
$factory->removeAdapter('pinterest.com');

//Change the default extractor
$factory->setDefault(CustomExtractor::class);
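
The registration calls above suggest a lookup from host name to extractor class with a default fallback. The following standalone sketch illustrates that idea only. AdapterRegistry and its resolve method are hypothetical names, and the matching rule (exact host after stripping a leading "www.") is an assumption for this example, not a description of how ExtractorFactory actually matches sites.

```php
<?php

// Hypothetical registry (not Embed's ExtractorFactory): host => class name.
final class AdapterRegistry
{
    private string $default;

    /** @var array<string, string> */
    private array $adapters = [];

    public function __construct(string $default)
    {
        $this->default = $default;
    }

    public function addAdapter(string $host, string $class): void
    {
        $this->adapters[strtolower($host)] = $class;
    }

    public function removeAdapter(string $host): void
    {
        unset($this->adapters[strtolower($host)]);
    }

    // Return the class registered for the url's host, or the default one.
    public function resolve(string $url): string
    {
        $host = strtolower((string) parse_url($url, PHP_URL_HOST));

        // Assumed normalisation: treat www.example.com like example.com.
        if (strpos($host, 'www.') === 0) {
            $host = substr($host, 4);
        }

        return $this->adapters[$host] ?? $this->default;
    }
}

$registry = new AdapterRegistry('DefaultExtractor');
$registry->addAdapter('mysite.com', 'MySite');

echo $registry->resolve('https://www.mysite.com/page'), PHP_EOL;
echo $registry->resolve('https://other.example/page'), PHP_EOL;
```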

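Detectors

Embed comes with several predefined detectors, but you may want to change them or add more. Just create a class extending the Embed\Detectors\Detector class and register it in the extractor factory. For example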

use Embed\Embed;
use Embed\Detectors\Detector;

class Robots extends Detector
{
    public function detect(): ?string
    {
        $response = $this->extractor->getResponse();
        $metas = $this->extractor->getMetas();

        return $response->getHeaderLine('x-robots-tag')
            ?: $metas->str('robots');
    }
}

//Register the detector
$embed = new Embed();
$embed->getExtractorFactory()->addDetector('robots', Robots::class);

//Use it
$info = $embed->get('http://example.com');
$robots = $info->robots;
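
The detector above relies on a header-first, meta-second fallback. As a small standalone illustration of that logic, the hypothetical function below takes plain values instead of the response and Metas objects. It works because PSR-7's getHeaderLine() returns an empty string when the header is absent, so the short ternary falls through to the meta value.

```php
<?php

// Hypothetical stand-in for the detector body: prefer the header, then the meta tag.
function detectRobots(string $headerLine, ?string $metaValue): ?string
{
    // An empty header line is falsy, so ?: falls back to the meta value.
    return $headerLine ?: $metaValue;
}

var_dump(detectRobots('noindex', 'all'));
var_dump(detectRobots('', 'all'));
var_dump(detectRobots('', null));
```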

Settings

If you need to pass settings to the CurlClient that performs the http queries

use Embed\Embed;
use Embed\Http\Crawler;
use Embed\Http\CurlClient;

$client = new CurlClient();
$client->setSettings([
    'cookies_path' => $cookies_path,
    'ignored_errors' => [18],
    'max_redirs' => 3,               // see CURLOPT_MAXREDIRS
    'connect_timeout' => 2,          // see CURLOPT_CONNECTTIMEOUT
    'timeout' => 2,                  // see CURLOPT_TIMEOUT
    'ssl_verify_host' => 2,          // see CURLOPT_SSL_VERIFYHOST
    'ssl_verify_peer' => 1,          // see CURLOPT_SSL_VERIFYPEER
    'follow_location' => true,       // see CURLOPT_FOLLOWLOCATION
    'user_agent' => 'Mozilla',       // see CURLOPT_USERAGENT
]);

$embed = new Embed(new Crawler($client));

If you need to pass settings to your detectors, you can add settings to the ExtractorFactory

use Embed\Embed;

$embed = new Embed();
$embed->setSettings([
    'oembed:query_parameters' => [],  //Extra parameters sent to oembed
    'twitch:parent' => 'example.com', //Required to embed twitch videos as iframe
    'facebook:token' => '1234|5678',  //Required to embed content from Facebook
    'instagram:token' => '1234|5678', //Required to embed content from Instagram
    'twitter:token' => 'asdf',        //Improve the data from twitter
]);
$info = $embed->get($url);

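Note: The built-in detectors do not require settings. This feature is provided only for convenience, in case you create a specific detector that needs settings.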
