mediashare / crawler
Crawls URLs from a web page and provides a DomCrawler through the Scraper library
0.2.8
2021-11-27 19:44 UTC
Requires
- league/climate: ^3.5
- mediashare/scraper: *
Requires (Dev)
- tracy/tracy: ^2.7
This package is auto-updated.
Last update: 2024-09-10 17:35:09 UTC
README
💫 Crawls URLs from a web page and provides a DomCrawler through the Scraper library.
DomCrawler
Scraper uses the DomCrawler library, a Symfony component for navigating the DOM of HTML and XML documents. See the Symfony DomCrawler documentation for details.
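As a rough illustration of the DOM navigation this component offers, here is a minimal sketch that collects link targets from an HTML string. It assumes `symfony/dom-crawler` is available through Composer's autoloader; the sample markup and the `DomCrawler` alias are made up for this example, and the snippet is not taken from mediashare/crawler itself.

```php
<?php
require 'vendor/autoload.php';

// Aliased to avoid confusion with Mediashare\Crawler\Crawler.
use Symfony\Component\DomCrawler\Crawler as DomCrawler;

// Sample markup invented for this example.
$html = <<<'HTML'
<html><body>
  <a href="/Kernel/">Kernel</a>
  <a href="https://mediashare.fr/CodeSnippet/">Snippets</a>
  <p>No link here</p>
</body></html>
HTML;

$dom = new DomCrawler($html);

// filterXPath() works without extra packages; filter() with CSS
// selectors would additionally require symfony/css-selector.
$hrefs = $dom->filterXPath('//a[@href]')->extract(['href']);

print_r($hrefs);
```

XPath is used here only to keep the dependency list short; CSS selectors are an equally valid choice once `symfony/css-selector` is installed.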
Installation
composer require mediashare/crawler
Usage
<?php
require 'vendor/autoload.php';

use Mediashare\Crawler\Crawler;

$crawler = new Crawler("https://mediashare.fr");
$crawler->run();
dump($crawler);
With configuration
<?php
require 'vendor/autoload.php';

use Mediashare\Crawler\Crawler;
use Mediashare\Crawler\Config;

$config = new Config();
$config->setWebspider(true); // Crawl the whole website
$config->setVerbose(true); // Show a progress bar
$config->setPathRequires(['/Kernel/']); // Do not crawl any other path
$config->setPathExceptions(['/CodeSnippet/']); // Do not crawl this path

$crawler = new Crawler("https://mediashare.fr", $config);
$crawler->run();
dump($crawler);
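The README does not say how the path options are evaluated. The sketch below shows one plausible reading, assuming each entry is a PCRE pattern matched against the URL path, that exceptions take precedence, and that an empty requirement list allows everything. `shouldCrawl()` is a hypothetical helper written for this illustration and is not part of mediashare/crawler.

```php
<?php
// Hypothetical helper: one possible interpretation of
// setPathRequires() / setPathExceptions(), not the library's code.
function shouldCrawl(string $url, array $requires, array $exceptions): bool
{
    // Fall back to "/" when the URL has no path component.
    $path = parse_url($url, PHP_URL_PATH) ?: '/';

    // Skip the URL when any exception pattern matches its path.
    foreach ($exceptions as $pattern) {
        if (preg_match($pattern, $path) === 1) {
            return false;
        }
    }

    // With no required patterns, every remaining URL is allowed.
    if ($requires === []) {
        return true;
    }

    // Otherwise at least one required pattern must match.
    foreach ($requires as $pattern) {
        if (preg_match($pattern, $path) === 1) {
            return true;
        }
    }

    return false;
}

$requires = ['/Kernel/'];
$exceptions = ['/CodeSnippet/'];

var_dump(shouldCrawl('https://mediashare.fr/Kernel/Install', $requires, $exceptions));
var_dump(shouldCrawl('https://mediashare.fr/Kernel/CodeSnippet/x', $requires, $exceptions));
var_dump(shouldCrawl('https://mediashare.fr/Blog', $requires, $exceptions));
```

Under these assumptions only the first URL would be crawled: the second hits an exception pattern and the third matches no required pattern. The actual library may resolve conflicts differently, so check its source before relying on this behavior.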