mediashare/crawler

Crawls URLs from a web page and provides a DomCrawler via the Scraper library

0.2.8 2021-11-27 19:44 UTC

README

💫 Crawls URLs from a web page and provides a DomCrawler via the Scraper library.

DomCrawler

Scraper uses the DomCrawler library, a Symfony component for navigating the DOM of HTML and XML documents. See the Symfony DomCrawler documentation for details.
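
To give a feel for what DOM navigation with DomCrawler looks like, here is a minimal sketch that uses the Symfony component directly rather than through this package. It assumes `symfony/dom-crawler` is available in `vendor/` (it should be pulled in as a dependency, but that is an assumption), and the sample HTML and variable names are illustrative only.

```php
<?php
require 'vendor/autoload.php';

// Aliased to avoid confusion with Mediashare\Crawler\Crawler.
use Symfony\Component\DomCrawler\Crawler as DomCrawler;

// Hypothetical sample markup; in practice this would be a fetched page body.
$html = '<html><body>'
    . '<a href="/about">About</a>'
    . '<a href="https://example.com/">Example</a>'
    . '</body></html>';

$dom = new DomCrawler($html);

// Select every <a> element with XPath and collect its href attribute.
// filterXPath() works without the optional symfony/css-selector package.
$links = $dom->filterXPath('//a')->extract(['href']);

print_r($links);
```

With the optional `symfony/css-selector` package installed, `filter('a')` could be used in place of `filterXPath('//a')`.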

Installation

composer require mediashare/crawler

Usage

<?php
require 'vendor/autoload.php';

use Mediashare\Crawler\Crawler;

$crawler = new Crawler("https://mediashare.fr");
$crawler->run();
dump($crawler);

With configuration

<?php
require 'vendor/autoload.php';

use Mediashare\Crawler\Crawler;
use Mediashare\Crawler\Config;

$config = new Config();
$config->setWebspider(true); // Crawl the whole website
$config->setVerbose(true); // Display a progress bar
$config->setPathRequires(['/Kernel/']); // Only crawl these paths
$config->setPathExceptions(['/CodeSnippet/']); // Do not crawl these paths

$crawler = new Crawler("https://mediashare.fr", $config);
$crawler->run();
dump($crawler);