embed / embed
PHP library to get information from any web page (using oembed, opengraph, etc.)
Requires
- php: ^7.4|^8
- ext-curl: *
- ext-dom: *
- ext-json: *
- ext-mbstring: *
- composer/ca-bundle: ^1.0
- ml/json-ld: ^1.1
- oscarotero/html-parser: ^0.1.4
- psr/http-client: ^1.0
- psr/http-factory: ^1.0
- psr/http-message: ^1.0|^2.0
Requires (Dev)
- brick/varexporter: ^0.3.1
- friendsofphp/php-cs-fixer: ^2.0
- nyholm/psr7: ^1.2
- oscarotero/php-cs-fixer-config: ^1.0
- phpunit/phpunit: ^9.0
- symfony/css-selector: ^5.0
Suggests
- symfony/css-selector: If you want to get elements using css selectors
This package is auto-updated.
Last update: 2024-09-04 10:05:14 UTC
README
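PHP library to get information from any web page (using oembed, opengraph, twitter-cards, scraping the HTML, etc.). It is compatible with any web service (YouTube, Vimeo, Flickr, Instagram, etc.) and has adapters for some sites (archive.org, GitHub, Facebook, etc.).
Requirements
- PHP 7.4+
- The Curl library installed
- A PSR-17 implementation. By default, several common libraries are detected automatically (see the PSR section)
If you need PHP 5.5-7.3 support, use the 3.x version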
Online demo
Run php -S localhost:8888 demo/index.php
Video tutorial
Installation
This package is installable and autoloadable via Composer as embed/embed.
$ composer require embed/embed
Usage
    use Embed\Embed;

    $embed = new Embed();

    //Load any url:
    $info = $embed->get('https://www.youtube.com/watch?v=PP1xn5wHtxE');

    //Get content info
    $info->title; //The page title
    $info->description; //The page description
    $info->url; //The canonical url
    $info->keywords; //The page keywords

    $info->image; //The thumbnail or main image

    $info->code->html; //The code to embed the image, video, etc
    $info->code->width; //The exact width of the embed code (if exists)
    $info->code->height; //The exact height of the embed code (if exists)
    $info->code->ratio; //The percentage of height / width to emulate the aspect ratio using paddings

    $info->authorName; //The resource author
    $info->authorUrl; //The author url

    $info->cms; //The cms used
    $info->language; //The language of the page
    $info->languages; //The alternative languages

    $info->providerName; //The provider name of the page (Youtube, Twitter, Instagram, etc)
    $info->providerUrl; //The provider url
    $info->icon; //The big icon of the site
    $info->favicon; //The favicon of the site (an .ico file or a png with up to 32x32px)

    $info->publishedTime; //The published time of the resource
    $info->license; //The license url of the resource
    $info->feeds; //The RSS/Atom feeds
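The ratio value is meant for the padding technique used by responsive embeds. The helper below is a minimal sketch of that idea and is not part of Embed: responsiveEmbed() is a hypothetical name, and the HTML and ratio passed to it stand in for $info->code->html and $info->code->ratio.

```php
<?php
/**
 * Wrap embed HTML in a container that keeps its aspect ratio.
 * $ratio is height / width expressed as a percentage (56.25 for 16:9).
 * Hypothetical helper, not provided by the library.
 */
function responsiveEmbed(string $html, float $ratio): string
{
    // Format with up to 3 decimals, then drop trailing zeros and a dangling dot.
    $padding = rtrim(rtrim(number_format($ratio, 3, '.', ''), '0'), '.');

    // Percentage paddings are resolved against the container width,
    // so padding-bottom reserves a box with the requested aspect ratio.
    return sprintf(
        '<div style="position:relative;height:0;padding-bottom:%s%%">%s</div>',
        $padding,
        $html
    );
}

// Placeholder values standing in for $info->code->html and $info->code->ratio.
echo responsiveEmbed('<iframe src="https://example.com/player"></iframe>', 56.25), PHP_EOL;
```

For the embedded element to fill the reserved box, it usually also needs to be positioned absolutely with a width and height of 100%.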
Parallel multiple requests
    use Embed\Embed;

    $embed = new Embed();

    //Load multiple urls asynchronously:
    $infos = $embed->getMulti(
        'https://www.youtube.com/watch?v=PP1xn5wHtxE',
        'https://twitter.com/carlosmeixidefl/status/1230894146220625933',
        'https://en.wikipedia.org/wiki/Tordoia',
    );

    foreach ($infos as $info) {
        echo $info->title;
    }
Document
The Document is the object that stores the HTML code of the page. You can use it to extract extra info from the HTML code:
    //Get the document object
    $document = $info->getDocument();

    $document->link('image_src'); //Returns the href of a <link>
    $document->getDocument(); //Returns the DOMDocument instance
    $html = (string) $document; //Returns the html code

    $document->select('.//h1'); //Search
You can use XPath queries to select specific elements. A search always returns an Embed\QueryResult instance:
    //Search the A elements
    $result = $document->select('.//a');

    //Filter the results
    $result->filter(fn ($node) => $node->getAttribute('href'));

    $id = $result->str('id'); //Return the id of the first result as string
    $text = $result->str(); //Return the content of the first result

    $ids = $result->strAll('id'); //Return an array with the ids of all results as string
    $texts = $result->strAll(); //Return an array with the content of all results as string

    $tabindex = $result->int('tabindex'); //Return the tabindex attribute of the first result as integer
    $number = $result->int(); //Return the content of the first result as integer

    $href = $result->url('href'); //Return the href attribute of the first result as url (converts relative urls to absolutes)
    $url = $result->url(); //Return the content of the first result as url

    $node = $result->node(); //Return the first node found (DOMElement)
    $nodes = $result->nodes(); //Return all nodes found
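To make the relative XPath syntax concrete, here is a sketch that runs a similar query with PHP's built-in DOM extension. It does not use Embed\QueryResult; the HTML fragment and variable names are invented for the illustration, and the comments only point at the QueryResult methods that return comparable values.

```php
<?php
// A small HTML fragment used only for this illustration.
$html = '<html><body><h1 id="top">Hello</h1>'
    . '<a id="first" href="/about">About</a>'
    . '<a href="https://example.com/">Home</a></body></html>';

$previous = libxml_use_internal_errors(true); // collect parser warnings silently
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

// './/a' is relative: it matches every <a> below the context node.
$links = $xpath->query('.//a', $dom->documentElement);

// Collect one attribute from all matches (comparable to strAll('href')).
$hrefs = [];
foreach ($links as $link) {
    $hrefs[] = $link->getAttribute('href');
}

$firstId = $links->item(0)->getAttribute('id'); // comparable to str('id')
$firstText = $links->item(0)->textContent;      // comparable to str()

print_r($hrefs);
```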
Metas
For convenience, the Metas object stores the values of all <meta> elements found in the HTML, so they are easier to read. The key of each meta comes from its name, property or itemprop attribute, and the value comes from content.
    //Get the Metas object
    $metas = $info->getMetas();

    $metas->all(); //Return all values
    $metas->get('og:title'); //Return a key value
    $metas->str('og:title'); //Return the value as string (remove html tags)
    $metas->html('og:description'); //Return the value as html
    $metas->int('og:video:width'); //Return the value as integer
    $metas->url('og:url'); //Return the value as full url (converts relative urls to absolutes)
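The key/value rule described above can be sketched with the built-in DOM extension. This is an illustration of the idea under stated assumptions, not the library's Metas class: collectMetas() is a hypothetical function, and details such as skipping empty content values or storing several values per key are choices made for this sketch.

```php
<?php
/**
 * Collect <meta> values keyed by their name, property or itemprop attribute.
 * Hypothetical helper written for illustration only.
 *
 * @return array<string, string[]>
 */
function collectMetas(string $html): array
{
    $previous = libxml_use_internal_errors(true); // tolerate imperfect HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $metas = [];

    foreach ($dom->getElementsByTagName('meta') as $meta) {
        $content = $meta->getAttribute('content');

        if ($content === '') {
            continue; // nothing to store (assumption of this sketch)
        }

        // Any of these three attributes can provide the key.
        foreach (['name', 'property', 'itemprop'] as $attribute) {
            $key = trim($meta->getAttribute($attribute));

            if ($key !== '') {
                $metas[$key][] = $content; // a key may appear more than once
            }
        }
    }

    return $metas;
}

$example = '<html><head>'
    . '<meta name="description" content="A page">'
    . '<meta property="og:title" content="Title">'
    . '<meta itemprop="image" content="/img.png">'
    . '</head><body></body></html>';

print_r(collectMetas($example));
```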
OEmbed
In addition to the HTML and the metas, this library uses oEmbed endpoints to get extra data. You can get this data as follows:
    //Get the oEmbed object
    $oembed = $info->getOEmbed();

    $oembed->all(); //Return all raw data
    $oembed->get('title'); //Return a key value
    $oembed->str('title'); //Return the value as string (remove html tags)
    $oembed->html('html'); //Return the value as html
    $oembed->int('width'); //Return the value as integer
    $oembed->url('url'); //Return the value as full url (converts relative urls to absolutes)
Additional oEmbed parameters (like Instagram's hidecaption) can also be provided:
    $embed = new Embed();

    $info = $embed->get('https://www.instagram.com/p/B_C0wheCa4V/');
    $info->setSettings([
        'oembed:query_parameters' => ['hidecaption' => true]
    ]);

    $oembed = $info->getOEmbed();
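As a rough picture of what extra query parameters mean for an oEmbed request, the sketch below builds a request URL by hand. The endpoint is a placeholder, and the way the parameters are merged is an assumption made for this example; it does not show how Embed builds its requests internally.

```php
<?php
// Placeholder endpoint; real oEmbed endpoints depend on the provider.
$endpoint = 'https://example.com/oembed';

// The page url plus the extra parameters (assumed merge, for illustration).
$parameters = [
    'url' => 'https://www.instagram.com/p/B_C0wheCa4V/',
    'hidecaption' => true,
];

// http_build_query() url-encodes the values and serializes true as "1".
$requestUrl = $endpoint . '?' . http_build_query($parameters, '', '&');

echo $requestUrl, PHP_EOL;
```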
LinkedData
Another API available by default, used to extract info using the JsonLD schema.
    //Get the linkedData object
    $ld = $info->getLinkedData();

    $ld->all(); //Return all data
    $ld->get('name'); //Return a key value
    $ld->str('name'); //Return the value as string (remove html tags)
    $ld->html('description'); //Return the value as html
    $ld->int('width'); //Return the value as integer
    $ld->url('url'); //Return the value as full url (converts relative urls to absolutes)
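For readers unfamiliar with JSON-LD, the data usually sits in script elements of type application/ld+json. The sketch below pulls those blocks out with the DOM extension and json_decode(); the package's requirements list ml/json-ld, so treat this only as a simplified illustration. extractJsonLd() is a hypothetical name.

```php
<?php
/**
 * Return every JSON-LD block found in the HTML as a decoded array.
 * Hypothetical helper for illustration; no JSON-LD expansion or validation.
 *
 * @return array<int, array>
 */
function extractJsonLd(string $html): array
{
    $previous = libxml_use_internal_errors(true); // tolerate imperfect HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $items = [];

    foreach ($xpath->query('//script[@type="application/ld+json"]') as $script) {
        $data = json_decode($script->textContent, true);

        // Skip blocks that are not valid JSON objects or arrays.
        if (is_array($data)) {
            $items[] = $data;
        }
    }

    return $items;
}

$example = '<html><head><script type="application/ld+json">'
    . '{"@type":"VideoObject","name":"Demo","width":640}'
    . '</script></head><body></body></html>';

print_r(extractJsonLd($example));
```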
Other APIs
Some sites (like Wikipedia or Archive.org) provide a custom API that returns more reliable data. You can get the API object with the getApi() method, but note that not all results have this method. The API object has the same methods as oEmbed:
    //Get the API object
    $api = $info->getApi();

    $api->all(); //Return all raw data
    $api->get('title'); //Return a key value
    $api->str('title'); //Return the value as string (remove html tags)
    $api->html('html'); //Return the value as html
    $api->int('width'); //Return the value as integer
    $api->url('url'); //Return the value as full url (converts relative urls to absolutes)
Extending Embed
Depending on your needs, you may want to extend this library with extra features or change the way it performs some operations.
PSR
Embed uses several PSR standards for maximum interoperability.
Embed ships with a PSR-18 compatible CURL client, but you need to install a PSR-7 / PSR-17 library. The library can automatically detect 'laminas\diactoros', 'guzzleHttp\psr7', 'slim\psr7', 'nyholm\psr7' and 'sunrise\http' (in that order). If you want to use a different PSR implementation, you can do so like this:
    use Embed\Embed;
    use Embed\Http\Crawler;

    //Placeholders for your own PSR-18 client and PSR-17 factories
    $client = new CustomHttpClient();
    $requestFactory = new CustomRequestFactory();
    $uriFactory = new CustomUriFactory();

    //The Crawler is responsible for performing http queries
    $crawler = new Crawler($client, $requestFactory, $uriFactory);

    //Create an embed instance passing the Crawler
    $embed = new Embed($crawler);
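Adapters
Some sites have special needs: either they provide a public API that allows more info to be extracted (like Wikipedia or Archive.org), or the way data is extracted needs to change for that particular site. For all these cases there are adapters, which are classes that extend the default classes to provide extra functionality.
Before creating an adapter, you need to understand how Embed works: when you execute this code, you get an instance of the Extractor class
    //Get the Extractor with all info
    $info = $embed->get($url);

    //The extractor has document and oembed:
    $document = $info->getDocument();
    $oembed = $info->getOEmbed();
The Extractor class has many Detectors. Each detector is responsible for detecting a specific piece of info. For example, there is a detector for the title, others for the description, image, code, etc.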
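The relationship between an extractor and its detectors can be pictured with a toy model. Every class below (ToyDetector, ToyTitle, ToyDescription, ToyExtractor) is hypothetical and written only for this sketch; it shows the general pattern of one object per piece of info, resolved lazily by property name, and makes no claim about how Embed implements it.

```php
<?php
// Toy classes for illustration only; none of them belong to Embed.
abstract class ToyDetector
{
    abstract public function detect(array $page): ?string;
}

class ToyTitle extends ToyDetector
{
    public function detect(array $page): ?string
    {
        // Prefer the Open Graph title, then fall back to the plain title.
        return $page['og:title'] ?? $page['title'] ?? null;
    }
}

class ToyDescription extends ToyDetector
{
    public function detect(array $page): ?string
    {
        return $page['og:description'] ?? $page['description'] ?? null;
    }
}

class ToyExtractor
{
    private array $page;
    private array $detectors;
    private array $cache = [];

    public function __construct(array $page, array $detectors)
    {
        $this->page = $page;
        $this->detectors = $detectors;
    }

    // Reading $extractor->title runs the 'title' detector once and caches it.
    public function __get(string $name)
    {
        if (!array_key_exists($name, $this->cache)) {
            $detector = $this->detectors[$name] ?? null;
            $this->cache[$name] = $detector ? $detector->detect($this->page) : null;
        }

        return $this->cache[$name];
    }
}

$info = new ToyExtractor(
    ['title' => 'Plain title', 'og:title' => 'OG title', 'description' => 'About'],
    ['title' => new ToyTitle(), 'description' => new ToyDescription()]
);

echo $info->title, PHP_EOL;
```

In this model, swapping the object registered under a key changes how that one value is detected without touching the others.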
So, an adapter is basically an extractor created specifically for a site. It can also contain custom detectors or APIs. If you look at the src/Adapters folder, you can see all the adapters.
If you create an adapter, you also need to register it with Embed, so it knows which site the adapter applies to. For that, there is the ExtractorFactory object, which is responsible for instantiating the right extractor for each site.
    use Embed\Embed;

    $embed = new Embed();

    $factory = $embed->getExtractorFactory();

    //Use this MySite adapter for mysite.com
    $factory->addAdapter('mysite.com', MySite::class);

    //Remove the adapter for pinterest.com, so it will use the default extractor
    $factory->removeAdapter('pinterest.com');

    //Change the default extractor
    $factory->setDefault(CustomExtractor::class);
Detectors
Embed comes with several predefined detectors, but you may want to change them or add more. Just create a class extending the Embed\Detectors\Detector class and register it in the extractor factory. For example:
    use Embed\Embed;
    use Embed\Detectors\Detector;

    class Robots extends Detector
    {
        public function detect(): ?string
        {
            $response = $this->extractor->getResponse();
            $metas = $this->extractor->getMetas();

            //Prefer the x-robots-tag header, fall back to the robots meta
            return $response->getHeaderLine('x-robots-tag') ?: $metas->str('robots');
        }
    }

    //Register the detector
    $embed = new Embed();
    $embed->getExtractorFactory()->addDetector('robots', Robots::class);

    //Use it
    $info = $embed->get('http://example.com');
    $robots = $info->robots;
Settings
If you need to pass settings to the CurlClient that performs the http queries:
    use Embed\Embed;
    use Embed\Http\Crawler;
    use Embed\Http\CurlClient;

    $client = new CurlClient();
    $client->setSettings([
        'cookies_path' => $cookies_path,
        'ignored_errors' => [18],
        'max_redirs' => 3,               // see CURLOPT_MAXREDIRS
        'connect_timeout' => 2,          // see CURLOPT_CONNECTTIMEOUT
        'timeout' => 2,                  // see CURLOPT_TIMEOUT
        'ssl_verify_host' => 2,          // see CURLOPT_SSL_VERIFYHOST
        'ssl_verify_peer' => 1,          // see CURLOPT_SSL_VERIFYPEER
        'follow_location' => true,       // see CURLOPT_FOLLOWLOCATION
        'user_agent' => 'Mozilla',       // see CURLOPT_USERAGENT
    ]);

    $embed = new Embed(new Crawler($client));
If you need to pass settings to your detectors, you can add them to the ExtractorFactory:
    use Embed\Embed;

    $embed = new Embed();
    $embed->setSettings([
        'oembed:query_parameters' => [],  //Extra parameters sent to oembed
        'twitch:parent' => 'example.com', //Required to embed twitch videos as iframe
        'facebook:token' => '1234|5678',  //Required to embed content from Facebook
        'instagram:token' => '1234|5678', //Required to embed content from Instagram
        'twitter:token' => 'asdf',        //Improve the data from twitter
    ]);
    $info = $embed->get($url);
Note: The built-in detectors do not need settings. This feature is provided only for convenience, in case you create a specific detector that requires settings.