jeankassio / sioner-metadata-extractor
Sioner Metadata Extractor uses Chromedriver, through Symfony, to extract metadata from websites that rely on JavaScript.
1.0.1
2023-04-29 02:30 UTC
Requires
- php: >=8.0
- ext-dom: *
- ext-libxml: *
- php-webdriver/webdriver: ^1.8.2
- symfony/browser-kit: ^5.3 || ^6.0
- symfony/css-selector: ^5.3 || ^6.0
- symfony/dependency-injection: ^5.3 || ^6.0
- symfony/deprecation-contracts: ^2.4 || ^3
- symfony/dom-crawler: ^5.3 || ^6.0
- symfony/http-client: ^5.3 || ^6.0
- symfony/http-kernel: ^5.3 || ^6.0
- symfony/panther: ^2.0.1
- symfony/process: ^5.3 || ^6.0
README
Sioner Metadata Extractor uses Chromedriver, through Symfony/Panther, to extract metadata from websites that rely on JavaScript.
Installing Sioner
Use Composer to install Sioner in your project
composer require jeankassio/sioner-metadata-extractor
Dependencies
Installing ChromeDriver
sudo apt update
sudo apt install wget
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
sudo apt-get install -f
google-chrome --version
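Note the Chrome version that this command prints
Go to: https://chromedriver.chromium.org/downloads
Click the release that matches your Chrome version.
Download it to your system
Unzip it and upload the file to your server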
sudo mv chromedriver /usr/bin/chromedriver
sudo chown root:root /usr/bin/chromedriver
sudo chmod +x /usr/bin/chromedriver
Done
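Because the ChromeDriver release is chosen to match the installed Chrome version, it can help to compare the two before the first run. The following is a minimal sketch and not part of Sioner: the helper names `majorVersion` and `versionsMatch` are hypothetical, and it assumes both binaries print a dotted version number when called with `--version`.

```php
<?php
// Hypothetical helpers (not part of Sioner): compare the major versions
// reported by Chrome and ChromeDriver.

/** Returns the major version from a string such as "Google Chrome 112.0.5615.165", or null. */
function majorVersion(string $versionOutput): ?int
{
    // First run of digits followed by a dot and more digits, e.g. "112" in "112.0.5615.165".
    if (preg_match('/(\d+)\.\d+/', $versionOutput, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

/** True only when both strings contain a version and the major numbers are equal. */
function versionsMatch(string $chromeOutput, string $driverOutput): bool
{
    $chrome = majorVersion($chromeOutput);
    $driver = majorVersion($driverOutput);
    return $chrome !== null && $chrome === $driver;
}

// Usage: an empty string is used when a command cannot be run.
$chrome = (string) shell_exec('google-chrome --version 2>/dev/null');
$driver = (string) shell_exec('chromedriver --version 2>/dev/null');
echo versionsMatch($chrome, $driver)
    ? "Chrome and ChromeDriver major versions match\n"
    : "Version mismatch, or one of the binaries was not found\n";
```

Only the major version is compared here, which is an assumption of this sketch; adjust it if your setup needs a stricter match.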
Usage
Below are some examples of the metadata that Sioner Metadata Extractor can return
{ "domain":"github.com", "canonical":"https:\/\/github.com\/jeankassio\/Sioner-Metadata-Extractor", "title":"GitHub - jeankassio\/Sioner-Metadata-Extractor: Sioner Metadata Extractor uses Chromedriver to extract metadata from websites with javascript, even if it is written in PHP", "image":"https:\/\/opengraph.githubassets.com\/b22dbba9d6ae7f1bf3f540334ce5b7c01e728daa06739db48430ca0804af9ab0\/jeankassio\/Sioner-Metadata-Extractor", "description":"Sioner Metadata Extractor uses Chromedriver to extract metadata from websites with javascript, even if it is written in PHP - GitHub - jeankassio\/Sioner-Metadata-Extractor: Sioner Metadata Extracto...", "icon":"https:\/\/github.com\/favicon.ico" }
{ "domain":"techland.time.com", "canonical":"https:\/\/techland.time.com\/2011\/04\/06\/linux-exec-competing-against-microsoft-is-like-kicking-a-puppy\/", "title":"Linux Exec: Competing Against Microsoft Is Like “Kicking a Puppy” | TIME.com", "image":"https:\/\/techland.time.com\/wp-content\/themes\/time2012\/library\/assets\/images\/time-logo-og.png", "description":"Depending who you ask, you'll get a different answer about who's winning the operating system wars. Of course, the Linux people think they've won, but here's the thing--they may be right.", "keywords":"business, news, linux, open-source, windows", "icon":"https:\/\/techland.time.com\/favicon.ico" }
{ "domain": "domain string", "canonical": "og:canonical link string", "title": "og:title/title website string", "image": "og:image/first image string", "description": "og:description/description string", "keywords": "keywords string", "icon": "apple-touch-icon/icon string", "author": "og:author/author string", "copyright": "copyright string" }
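As the samples show, not every key is present for every site (the GitHub result has no `keywords`, for instance), so consuming code should not assume that a field exists. Here is a minimal sketch of reading a result defensively; `metadataField` is a hypothetical helper, not part of Sioner, and the JSON literal is a shortened copy of the GitHub sample.

```php
<?php
// Hypothetical helper (not part of Sioner): read a key from a decoded
// result, falling back to a default when the key is missing or empty.
function metadataField(array $metadata, string $key, string $default = ''): string
{
    $value = $metadata[$key] ?? null;
    return is_string($value) && $value !== '' ? $value : $default;
}

// Shortened copy of the GitHub sample result.
$json = '{"domain":"github.com","canonical":"https:\/\/github.com\/jeankassio\/Sioner-Metadata-Extractor","icon":"https:\/\/github.com\/favicon.ico"}';
$metadata = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

echo metadataField($metadata, 'domain'), "\n";                    // prints "github.com"
echo metadataField($metadata, 'keywords', '(no keywords)'), "\n"; // prints "(no keywords)"
```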
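How does it work?
Before launching the browser for the full wait time (explained below), Sioner Metadata Extractor can run a quick lookup that fetches the requested data without executing JavaScript, which saves execution time. To enable this, pass the fields you need as parameter #4; they are treated as required in this first validation pass.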
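// Load Composer's autoloader (adjust the path to your project)
require __DIR__ . '/vendor/autoload.php';

use JeanKassio\Sioner\MetadataExtractor;

$YourLink = "https://github.com/jeankassio/Sioner-Metadata-Extractor";
// Parameter #4: the fields that the quick lookup must find
$code = new MetadataExtractor($YourLink, null, null, ['website', 'title', 'image', 'description']);
$response = $code->ExtractMetadata();
echo json_encode($response, JSON_UNESCAPED_UNICODE);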
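The two-stage flow described in this section can be pictured roughly as follows. This is only an illustrative sketch under assumptions, not the library's actual code: `hasAllRequired`, `extractWithFallback`, `$quickLookup`, and `$browserLookup` are hypothetical names standing in for the JavaScript-free lookup and the Chromedriver run.

```php
<?php
// Illustrative sketch only; not Sioner's actual implementation.

/** True when every required key is present and non-empty in $found. */
function hasAllRequired(array $found, array $required): bool
{
    foreach ($required as $key) {
        if (!isset($found[$key]) || $found[$key] === '') {
            return false;
        }
    }
    return true;
}

/**
 * Runs the cheap lookup first and only calls the expensive browser
 * lookup when some required field is still missing.
 */
function extractWithFallback(callable $quickLookup, callable $browserLookup, array $required): array
{
    if ($required === []) {
        // Assumption of this sketch: with no required fields there is
        // nothing to validate against, so go straight to the browser.
        return $browserLookup();
    }
    $found = $quickLookup();
    return hasAllRequired($found, $required) ? $found : $browserLookup();
}

// Usage with stand-in closures instead of real HTTP and browser calls.
$quick   = fn (): array => ['title' => 'Example', 'image' => ''];
$browser = fn (): array => ['title' => 'Example', 'image' => 'https://example.com/a.png'];

$result = extractWithFallback($quick, $browser, ['title', 'image']);
echo json_encode($result, JSON_UNESCAPED_SLASHES), "\n";
```

In this usage the quick result lacks a non-empty `image`, so the browser stand-in supplies the final result.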
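If the quick lookup does not find the data, the browser is launched with JavaScript enabled to obtain it. The default run time is 3 seconds, but you can change this value with parameter #2.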
use JeanKassio\Sioner\MetadataExtractor;

$YourLink = "https://github.com/jeankassio/Sioner-Metadata-Extractor";
$code = new MetadataExtractor($YourLink, 2.5); // 2.5 seconds
$response = $code->ExtractMetadata();
echo json_encode($response, JSON_UNESCAPED_UNICODE);
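By default we look for an og:image of 200x200; if none is found, the next image larger than that size is returned. If nothing matches, the largest image is returned. You can also set these values yourself with parameter #3, given as [width, height].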
use JeanKassio\Sioner\MetadataExtractor;

$YourLink = "https://github.com/jeankassio/Sioner-Metadata-Extractor";
$code = new MetadataExtractor($YourLink, null, [250, 300]); // 250 width, 300 height
$response = $code->ExtractMetadata();
echo json_encode($response, JSON_UNESCAPED_UNICODE);
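One way to read the image rule is: prefer the smallest image that meets both minimum dimensions, and fall back to the largest image when none does. The sketch below implements that reading only; it is not the library's code, `pickImage` and the candidate array shape are hypothetical, and the real selection logic may differ.

```php
<?php
// Illustrative sketch only; the library's real selection logic may differ.

/**
 * Picks an image URL from candidates shaped like
 * ['url' => string, 'width' => int, 'height' => int].
 * Returns null when there are no candidates.
 */
function pickImage(array $candidates, int $minWidth = 200, int $minHeight = 200): ?string
{
    $bestFit = null; // smallest candidate that meets both minimums
    $largest = null; // largest candidate overall, used as the fallback

    foreach ($candidates as $image) {
        $area = $image['width'] * $image['height'];

        if ($largest === null || $area > $largest['width'] * $largest['height']) {
            $largest = $image;
        }

        $fits = $image['width'] >= $minWidth && $image['height'] >= $minHeight;
        if ($fits && ($bestFit === null || $area < $bestFit['width'] * $bestFit['height'])) {
            $bestFit = $image;
        }
    }

    return ($bestFit ?? $largest)['url'] ?? null;
}

// Usage with made-up candidates.
$images = [
    ['url' => 'small.png',  'width' => 100,  'height' => 100],
    ['url' => 'medium.png', 'width' => 260,  'height' => 320],
    ['url' => 'large.png',  'width' => 1200, 'height' => 630],
];
echo pickImage($images, 250, 300), "\n"; // prints "medium.png"
```

"Larger" is measured by pixel area here, which is a design choice of the sketch rather than something the text specifies.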
In this way, you can pass whichever parameters you need and combine them as you like. For example:
use JeanKassio\Sioner\MetadataExtractor;

$YourLink = "https://github.com/jeankassio/Sioner-Metadata-Extractor";
// 5 seconds, 500x100 image, and the fields required from the quick lookup
$code = new MetadataExtractor($YourLink, 5, [500, 100], ['website', 'title', 'image', 'description']);
$response = $code->ExtractMetadata();
echo json_encode($response, JSON_UNESCAPED_UNICODE);
Copyright and license
The code is released under the MIT license.