mitseo/scraper

Parse documents using XPath, CSS selectors, and regular expressions.

v1.0 2019-02-25 17:52 UTC

This package is not auto-updated.

Last update: 2024-10-02 14:20:10 UTC


README

License: MIT

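This library helps you parse data using different kinds of expressions:

  • Regular expressions
  • XPath
  • CSS selectors

Several output types are available:

  • Match (match():boolean)
  • Count elements (count():int)
  • Extract the first element (extractFirst():string)
  • Extract all elements (extractAll():array)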

Author: Mitsu

Installation with Composer

Add mitseo/scraper to the require section of your composer.json by running:

composer require mitseo/scraper

Usage

Parsing with regular expressions

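use Mitseo\Scraper\Scraper;

$string = "11111 222 33333 44444";

// match(): boolean, whether the pattern occurs in the string
$regex1 = Scraper::regex("/[0-9]{5}/")->match($string);
// extractFirst(): string, the first extracted value
$regex2 = Scraper::regex("/([0-9]{5})/")->extractFirst($string);
// extractAll(): array of every extracted value
$regex3 = Scraper::regex("/([0-9]{5})/")->extractAll($string);
// count(): int, the number of matches
$regex4 = Scraper::regex("/[0-9]{5}/")->count($string);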
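
For comparison, the four output types can be sketched with PHP's built-in PCRE functions. This is only an illustration of what each operation means, not the library's source code. The variable names are hypothetical, and the library may differ on edge cases such as the value returned when nothing matches.

```php
<?php
// Illustration only: plain PHP equivalents, not the mitseo/scraper implementation.
$string  = "11111 222 33333 44444";
$pattern = "/([0-9]{5})/";

// match: does the pattern occur at least once?
$matched = preg_match($pattern, $string) === 1;

// extractAll: collect every value captured by group 1
preg_match_all($pattern, $string, $groups);
$all = $groups[1];

// extractFirst: the first captured value, or null when nothing matched (assumed behaviour)
$first = $all[0] ?? null;

// count: number of matches
$count = count($all);

echo json_encode(['match' => $matched, 'first' => $first, 'all' => $all, 'count' => $count]), PHP_EOL;
// prints {"match":true,"first":"11111","all":["11111","33333","44444"],"count":3}
```

The token "222" is skipped because the pattern requires exactly five consecutive digits.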

Parsing with XPath

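use Mitseo\Scraper\Scraper;

// Download the page to parse as an HTML string
$dom = file_get_contents('https://en.wikipedia.com/');

$xpath1 = Scraper::xpath("//a")->match($dom);
$xpath2 = Scraper::xpath("//a")->extractFirst($dom);
$xpath3 = Scraper::xpath("//a")->extractAll($dom);
$xpath4 = Scraper::xpath("//a")->count($dom);
// extractTree(): for each //a node, evaluate the named relative expressions
// (here the link text as "anchor" and the href attribute as "href")
$xpath5 = Scraper::xpath("//a",["anchor"=>".","href"=>"@href"])->extractTree($dom);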
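
The same operations can be sketched with PHP's built-in DOM extension on an inline HTML string, which avoids the network request. This is an illustration under assumptions, not the library's implementation. In particular, the README does not document what extractTree() returns, so the list of associative arrays built below is a guess at its shape. The sample HTML and variable names are hypothetical.

```php
<?php
// Illustration only: built-in DOMDocument/DOMXPath, not the mitseo/scraper implementation.
$html = '<html><body><a href="/one">One</a><a href="/two">Two</a></body></html>';

// Collect libxml parse warnings internally instead of printing them,
// since real-world HTML is rarely perfectly well-formed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//a');

$count   = $nodes->length;                                  // count
$matched = $count > 0;                                      // match
$first   = $matched ? $nodes->item(0)->textContent : null;  // extractFirst (assumed: text content)

$all  = [];  // extractAll (assumed: text content of every node)
$tree = [];  // extractTree (assumed shape: one associative array per node)
foreach ($nodes as $node) {
    $all[] = $node->textContent;
    // Evaluate each sub-expression relative to the current node
    $tree[] = [
        'anchor' => $xpath->evaluate('string(.)', $node),
        'href'   => $xpath->evaluate('string(@href)', $node),
    ];
}

echo json_encode(['count' => $count, 'first' => $first, 'tree' => $tree]), PHP_EOL;
```

Evaluating "." and "@href" with each matched node as the context is what makes the map of relative expressions useful: one query selects the nodes, and the map names the fields to pull from each of them.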

Parsing with CSS selectors

use Mitseo\Scraper\Scraper;

// Download the page to parse as an HTML string
$dom = file_get_contents('https://en.wikipedia.com/');

// "h1#truc" selects an h1 element whose id is "truc"
$css1 = Scraper::css("h1#truc")->match($dom);
$css2 = Scraper::css("h1")->extractFirst($dom);
$css3 = Scraper::css("a")->extractAll($dom);
$css4 = Scraper::css("a")->count($dom);
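
To make the selectors above concrete, the sketch below evaluates hand-written XPath equivalents of "h1#truc" and "a" with PHP's built-in DOMXPath, since the classic DOMDocument class has no CSS selector method of its own. Whether mitseo/scraper converts CSS to XPath internally is not stated in this README, so treat this as an illustration of the selectors' meaning only. The sample HTML and the $equivalents table are hypothetical.

```php
<?php
// Illustration only: CSS selectors expressed as XPath by hand,
// not the mitseo/scraper implementation.
$html = '<html><body><h1 id="truc">Title</h1>'
      . '<a href="/one">One</a><a href="/two">Two</a></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Hypothetical lookup table: CSS selector => equivalent XPath expression
$equivalents = [
    'h1#truc' => '//h1[@id="truc"]',  // element name plus id
    'a'       => '//a',               // element name only
];

$h1    = $xpath->query($equivalents['h1#truc']);
$links = $xpath->query($equivalents['a']);

$h1Matched = $h1->length > 0;                                    // like match()
$h1Text    = $h1Matched ? $h1->item(0)->textContent : null;      // like extractFirst()
$linkCount = $links->length;                                     // like count()

echo json_encode(['h1Matched' => $h1Matched, 'h1Text' => $h1Text, 'linkCount' => $linkCount]), PHP_EOL;
```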