jeankassio/sioner-metadata-extractor

Sioner Metadata Extractor uses Chromedriver, via Symfony, to extract metadata from websites that rely on JavaScript.

1.0.1 2023-04-29 02:30 UTC

This package is auto-updated.

Last update: 2024-09-09 16:12:06 UTC


README

Sioner Metadata Extractor uses Chromedriver, via Symfony/Panther, to extract metadata from websites that rely on JavaScript.


Installing Sioner

Install Sioner in your project using Composer:

composer require jeankassio/sioner-metadata-extractor

Dependencies

Installing ChromeDriver

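# Install wget first, then download and install Google Chrome
sudo apt update
sudo apt install wget
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
# Pull in any dependencies that dpkg could not resolve
sudo apt-get install -f
google-chrome --version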

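Note your Chrome version.

Go to: https://chromedriver.chromium.org/downloads

Click the release that matches your Chrome version.

Download the archive for your system.

Unzip it and upload the chromedriver file to your server.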

sudo mv chromedriver /usr/bin/chromedriver
sudo chown root:root /usr/bin/chromedriver
sudo chmod +x /usr/bin/chromedriver

Done.
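ChromeDriver generally needs to match the major version of the installed Chrome, so a quick pre-flight check can save debugging time. The sketch below is only an illustration: `majorVersion()` and `versionsMatch()` are hypothetical helpers, not part of Sioner, and they assume the usual `--version` output format (for example `Google Chrome 114.0.5735.198`).

```php
<?php
// Hypothetical helpers (not part of Sioner) for comparing the major
// versions reported by `google-chrome --version` and `chromedriver --version`.

// Returns the major version found in a version string, or null if none is found.
function majorVersion(string $versionOutput): ?int
{
    if (preg_match('/(\d+)\.\d+\.\d+/', $versionOutput, $m)) {
        return (int) $m[1];
    }
    return null;
}

// True only when both strings contain a version and the major numbers are equal.
function versionsMatch(string $chromeOutput, string $driverOutput): bool
{
    $chrome = majorVersion($chromeOutput);
    $driver = majorVersion($driverOutput);
    return $chrome !== null && $chrome === $driver;
}
```

On the server, the two inputs could come from `(string) shell_exec('google-chrome --version')` and `(string) shell_exec('chromedriver --version')`, assuming `shell_exec` is enabled in your PHP configuration.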

Usage

Below are some of the metadata fields that Sioner Metadata Extractor can return:

{
   "domain":"github.com",
   "canonical":"https:\/\/github.com\/jeankassio\/Sioner-Metadata-Extractor",
   "title":"GitHub - jeankassio\/Sioner-Metadata-Extractor: Sioner Metadata Extractor uses Chromedriver to extract metadata from websites with javascript, even if it is written in PHP",
   "image":"https:\/\/opengraph.githubassets.com\/b22dbba9d6ae7f1bf3f540334ce5b7c01e728daa06739db48430ca0804af9ab0\/jeankassio\/Sioner-Metadata-Extractor",
   "description":"Sioner Metadata Extractor uses Chromedriver to extract metadata from websites with javascript, even if it is written in PHP - GitHub - jeankassio\/Sioner-Metadata-Extractor: Sioner Metadata Extracto...",
   "icon":"https:\/\/github.com\/favicon.ico"
}
{
   "domain":"techland.time.com",
   "canonical":"https:\/\/techland.time.com\/2011\/04\/06\/linux-exec-competing-against-microsoft-is-like-kicking-a-puppy\/",
   "title":"Linux Exec: Competing Against Microsoft Is Like “Kicking a Puppy” | TIME.com",
   "image":"https:\/\/techland.time.com\/wp-content\/themes\/time2012\/library\/assets\/images\/time-logo-og.png",
   "description":"Depending who you ask, you'll get a different answer about who's winning the operating system wars. Of course, the Linux people think they've won, but here's the thing--they may be right.",
   "keywords":"business, news, linux, open-source, windows",
   "icon":"https:\/\/techland.time.com\/favicon.ico"
}
{
   "domain": "domain string",
   "canonical": "og:canonical link string",
   "title": "og:title/title website string",
   "image": "og:image/first image string",
   "description": "og:description/description string",
   "keywords": "keywords string",
   "icon": "apple-touch-icon/icon string",
   "author": "og:author/author string",
   "copyright": "copyright string"
}
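The sample outputs above do not all carry the same keys (the first one has no `keywords`, for instance), so consumers may want to read fields defensively. The following is a minimal sketch that works on the JSON output rather than on the library's return value, whose exact type is not documented here; `metaField()` is a hypothetical helper and the JSON string is trimmed from the first sample.

```php
<?php
// Trimmed from the first sample output above; "keywords" is absent on purpose.
$json = '{"domain":"github.com","icon":"https:\/\/github.com\/favicon.ico"}';

// Decode into an associative array; throws JsonException on malformed input.
$meta = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

// Hypothetical helper: returns the field if it is a non-empty string,
// otherwise the supplied default.
function metaField(array $meta, string $key, ?string $default = null): ?string
{
    $value = $meta[$key] ?? null;
    return is_string($value) && $value !== '' ? $value : $default;
}

$domain   = metaField($meta, 'domain');        // present in the sample
$keywords = metaField($meta, 'keywords', '');  // missing, falls back to ''
```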

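How does it work?

Before running the browser for the full number of seconds (explained below), Sioner Metadata Extractor can first run a quick lookup for the fields you specify and fetch them without executing JavaScript, which saves execution time. To enable this, pass the fields you require as parameter #4; they are checked during this first pass.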

use JeanKassio\Sioner\MetadataExtractor;

$YourLink = "https://github.com/jeankassio/Sioner-Metadata-Extractor";

$code = new MetadataExtractor($YourLink, null, null, ['website', 'title', 'image', 'description']);

$response = $code->ExtractMetadata();

echo json_encode($response, JSON_UNESCAPED_UNICODE);

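If that quick lookup does not succeed, it launches the browser with JavaScript enabled to fetch the data. The default run time is 3 seconds, but you can change it with parameter #2.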

use JeanKassio\Sioner\MetadataExtractor;

$YourLink = "https://github.com/jeankassio/Sioner-Metadata-Extractor";

$code = new MetadataExtractor($YourLink, 2.5); //2.5 seconds

$response = $code->ExtractMetadata();

echo json_encode($response, JSON_UNESCAPED_UNICODE);
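The two-stage behaviour described above (a quick pass without JavaScript, then the browser only if needed) can be pictured roughly as follows. This is a sketch of the idea under stated assumptions, not the library's internal code: `hasAllRequired()` and `extractWithFallback()` are hypothetical names, the two callables stand in for the static fetch and the browser run, and it assumes the browser is used directly when no required fields are given.

```php
<?php
// Hypothetical illustration of a "quick pass first, browser as fallback" flow.

// True when every required key is present and non-empty in $meta.
function hasAllRequired(array $meta, array $required): bool
{
    foreach ($required as $key) {
        if (!isset($meta[$key]) || $meta[$key] === '') {
            return false;
        }
    }
    return true;
}

// $staticPass and $browserPass are stand-ins that each return a metadata array.
function extractWithFallback(callable $staticPass, callable $browserPass, array $required): array
{
    if ($required !== []) {
        $meta = $staticPass();
        if (hasAllRequired($meta, $required)) {
            return $meta; // the cheap pass was enough, no browser needed
        }
    }
    return $browserPass(); // fall back to the JavaScript-enabled run
}
```

The point of the design is that the expensive browser run is skipped whenever the static HTML already contains every field the caller asked for.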

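By default we look for a 200x200 og:image; if none is found, the next image larger than that size is returned. If nothing matches, the largest image is returned. You can also set these dimensions yourself with parameter #3.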

use JeanKassio\Sioner\MetadataExtractor;

$YourLink = "https://github.com/jeankassio/Sioner-Metadata-Extractor";

$code = new MetadataExtractor($YourLink, null, [250,300]); //250 width, 300 height

$response = $code->ExtractMetadata();

echo json_encode($response, JSON_UNESCAPED_UNICODE);
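One way to read the image rule above is: prefer the smallest candidate that meets the minimum width and height, and otherwise fall back to the largest candidate. The sketch below implements that interpretation only; `pickImage()` and the candidate array shape (`url`, `width`, `height`) are assumptions made for illustration, and the library's real selection logic may differ.

```php
<?php
// Hypothetical sketch of the size-based image choice (not Sioner's own code).

/**
 * @param array<int, array{url: string, width: int, height: int}> $images
 */
function pickImage(array $images, int $minWidth = 200, int $minHeight = 200): ?string
{
    if ($images === []) {
        return null;
    }

    $area = fn(array $img): int => $img['width'] * $img['height'];

    // Candidates that are at least the requested size in both dimensions.
    $fits = array_filter(
        $images,
        fn(array $img): bool => $img['width'] >= $minWidth && $img['height'] >= $minHeight
    );

    if ($fits !== []) {
        // Smallest image that still satisfies the minimum size.
        usort($fits, fn(array $a, array $b): int => $area($a) <=> $area($b));
        return $fits[0]['url'];
    }

    // Nothing is large enough: return the largest image available.
    usort($images, fn(array $a, array $b): int => $area($b) <=> $area($a));
    return $images[0]['url'];
}
```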

This way you can pass whichever parameters you need and combine them however you like. For example:

use JeanKassio\Sioner\MetadataExtractor;

$YourLink = "https://github.com/jeankassio/Sioner-Metadata-Extractor";

$code = new MetadataExtractor($YourLink, 5, [500,100], ['website', 'title', 'image', 'description']);

$response = $code->ExtractMetadata();

echo json_encode($response, JSON_UNESCAPED_UNICODE);

Copyright and license

The code is released under the MIT License.