ordinary9843 / html-master
Analyze and crawl the HTML structure of static and dynamic websites
v1.0.0
2024-01-02 09:42 UTC
Requires
- php: >7.1
- guzzlehttp/guzzle: ^6.5
Requires (Dev)
- phpunit/phpunit: >7.0
This package is auto-updated.
Last update: 2024-09-23 09:13:22 UTC
README
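Introduction
Analyze and crawl the HTML structure of static and dynamic websites.
Requirements
This library has the following requirements: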
- PHP 7.1+
- Node.js 12+
- A browser (the default browser path is /usr/bin/chromium)
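The requirements above can be checked before installing. The following is a minimal sketch, not part of html-master: `checkHtmlMasterRequirements()` is a hypothetical helper, the browser path defaults to the `/usr/bin/chromium` value used in the usage example, and the Node.js check assumes a Unix-like shell with `node` on the `PATH`.

```php
<?php
// Hypothetical helper for illustration; it is not part of html-master.
// Reports which of the documented requirements appear to be met locally.
function checkHtmlMasterRequirements(string $browserPath = '/usr/bin/chromium'): array
{
    // `node --version` normally prints something like "v18.19.0".
    // $nodeMajor stays null if node cannot be run or the output is unexpected.
    $nodeMajor = null;
    if (function_exists('shell_exec')) {
        $output = @shell_exec('node --version 2>/dev/null');
        if (is_string($output) && preg_match('/^v(\d+)\./', trim($output), $matches)) {
            $nodeMajor = (int) $matches[1];
        }
    }

    return [
        'php'     => version_compare(PHP_VERSION, '7.1.0', '>='),
        'node'    => $nodeMajor !== null && $nodeMajor >= 12,
        'browser' => is_executable($browserPath),
    ];
}

// Print one line per requirement.
foreach (checkHtmlMasterRequirements() as $name => $ok) {
    echo sprintf("%-8s %s\n", $name, $ok ? 'ok' : 'missing');
}
```

If the browser is installed elsewhere (for example as `chromium-browser`), pass that path as the argument instead.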
Installation
Prerequisites
apt-get install nodejs
apt-get install chromium # or `chromium-browser`
Install the package with Composer
composer require ordinary9843/html-master
Usage
Example usage
<?php
require './vendor/autoload.php';

use Ordinary9843\HtmlMaster;

$htmlMaster = new HtmlMaster();

// When using this package for the first time, enabling debug mode is recommended.
$htmlMaster->setDebug(true);

// Set the browser path for dynamic mode.
$htmlMaster->setExecutablePath('/usr/bin/chromium');

/**
 * Set the wait time (in seconds) for dynamic mode.
 *
 * If you are unable to obtain the dynamic (SPA) HTML, try extending the wait
 * time so that the website's JavaScript elements can finish rendering.
 */
$htmlMaster->setWaitSeconds(5);

// Set the connect timeout and the timeout (in seconds) for static mode.
$htmlMaster->setConnectTimeout(5);
$htmlMaster->setTimeout(5);

/**
 * Whether the crawler runs in static or dynamic mode depends on whether the
 * browser path is set correctly.
 * Use `setExecutablePath()` to set the browser path.
 *
 * Output: [
 *     'title' => '',
 *     'description' => '',
 *     'meta' => [
 *         'keywords' => '',
 *         'description' => '',
 *         'viewport' => '',
 *         'author' => '',
 *         'copyright' => '',
 *         'robots' => '',
 *         'og' => [],
 *         'twitter' => []
 *     ],
 *     'icons' => [],
 *     'images' => [],
 *     'css' => [],
 *     'js' => []
 * ]
 */
$htmlMaster->parse('https://github.com/ordinary9843');

/**
 * Get all messages.
 *
 * Output: [
 *     '[INFO] Message.',
 *     '[ERROR] Message.'
 * ]
 */
$htmlMaster->getMessages();
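The example documents the shape of the parsed result and of the message list, but not how to consume them. The following is a minimal sketch under stated assumptions: `summarizeParseResult()` and `filterErrorMessages()` are hypothetical helpers, not part of html-master, and the sample arrays are hand-written stand-ins that only mirror the documented output shapes. What `parse()` yields on failure is not documented here, so the helpers tolerate missing keys.

```php
<?php
// Hypothetical helpers for illustration; they are not part of html-master.

// Build a one-line summary from an array shaped like the documented parse() output.
function summarizeParseResult(array $result): string
{
    $title = isset($result['title']) && $result['title'] !== ''
        ? $result['title']
        : '(no title)';

    $counts = [];
    foreach (['icons', 'images', 'css', 'js'] as $key) {
        // Missing or non-array entries are counted as empty.
        $items = isset($result[$key]) && is_array($result[$key]) ? $result[$key] : [];
        $counts[] = $key . '=' . count($items);
    }

    return $title . ' [' . implode(', ', $counts) . ']';
}

// Keep only the messages that start with the "[ERROR]" prefix shown in the README.
function filterErrorMessages(array $messages): array
{
    return array_values(array_filter($messages, function ($message) {
        return is_string($message) && strpos($message, '[ERROR]') === 0;
    }));
}

// Hand-written sample data standing in for parse() and getMessages() results.
$sampleResult = [
    'title' => 'Example page',
    'description' => '',
    'meta' => ['keywords' => '', 'og' => [], 'twitter' => []],
    'icons' => ['/favicon.ico'],
    'images' => ['/a.png', '/b.png'],
    'css' => [],
    'js' => ['/app.js'],
];
$sampleMessages = ['[INFO] Message.', '[ERROR] Message.'];

echo summarizeParseResult($sampleResult), "\n";
// prints "Example page [icons=1, images=2, css=0, js=1]"

print_r(filterErrorMessages($sampleMessages));
```

The closure syntax and `strpos()` prefix check are used instead of arrow functions and `str_starts_with()` so the sketch stays compatible with the PHP 7.1 minimum listed in the requirements.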
Testing
composer test
License
(MIT License)