mitseo / scraper
Parse documents with XPath, CSS selectors and regular expressions.
v1.0
2019-02-25 17:52 UTC
Requires
- atrox/matcher: ^1.1
- symfony/css-selector: ^3.4.22
This package is not auto-updated.
Last update: 2024-10-02 14:20:10 UTC
README
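This library helps you parse data with several techniques:
- Regular expressions
- XPath
- CSS selectors
Several kinds of output are available:
- Test whether there is a match (match(): boolean)
- Count the matching elements (count(): int)
- Extract the first element (extractFirst(): string)
- Extract all elements (extractAll(): array)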
Author: Mitsu
Installing with Composer
Add mitseo/scraper as a dependency in the require section of your composer.json file, for example with:
composer require mitseo/scraper
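Usage
Parsing with regular expressions
use Mitseo\Scraper\Scraper;
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader
$string = "11111 222 33333 44444";
$regex1 = Scraper::regex("/[0-9]{5}/")->match($string);         // true if the pattern matches
$regex2 = Scraper::regex("/([0-9]{5})/")->extractFirst($string); // first match
$regex3 = Scraper::regex("/([0-9]{5})/")->extractAll($string);   // all matches
$regex4 = Scraper::regex("/[0-9]{5}/")->count($string);          // number of matches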
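To make the four outputs concrete, here is a minimal sketch of what each call is expected to return, written with PHP's own preg_match() and preg_match_all() rather than with the library. The function names (regexMatch, regexCount, regexExtractFirst, regexExtractAll) are hypothetical, and returning the first capture group (falling back to the full match) or null when nothing matches are assumptions, not documented behavior of mitseo/scraper.

```php
<?php
// Plain-PHP approximation of the four regex outputs using only preg_* built-ins.
// Hypothetical helper names; this is NOT the library's implementation.

function regexMatch(string $pattern, string $subject): bool
{
    // preg_match returns 1 on a match, 0 on no match, false on error
    return preg_match($pattern, $subject) === 1;
}

function regexCount(string $pattern, string $subject): int
{
    // preg_match_all returns the number of full-pattern matches
    return (int) preg_match_all($pattern, $subject);
}

function regexExtractFirst(string $pattern, string $subject): ?string
{
    // Assumption: null on no match; first capture group, else the full match
    if (preg_match($pattern, $subject, $m) !== 1) {
        return null;
    }
    return $m[1] ?? $m[0];
}

function regexExtractAll(string $pattern, string $subject): array
{
    // First capture group of every match, or every full match if the pattern has no group
    preg_match_all($pattern, $subject, $m);
    return $m[1] ?? $m[0];
}

$string = "11111 222 33333 44444";
var_dump(regexMatch("/[0-9]{5}/", $string));          // bool(true)
var_dump(regexExtractFirst("/([0-9]{5})/", $string)); // string(5) "11111"
print_r(regexExtractAll("/([0-9]{5})/", $string));    // the three five-digit groups
var_dump(regexCount("/[0-9]{5}/", $string));          // int(3)
```

"222" is skipped in every case because it has only three digits.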
Parsing with XPath
use Mitseo\Scraper\Scraper;
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader
$dom = file_get_contents('https://en.wikipedia.com/');
$xpath1 = Scraper::xpath("//a")->match($dom);
$xpath2 = Scraper::xpath("//a")->extractFirst($dom);
$xpath3 = Scraper::xpath("//a")->extractAll($dom);
$xpath4 = Scraper::xpath("//a")->count($dom);
// Fields: the link text (".") and its href attribute ("@href")
$xpath5 = Scraper::xpath("//a", ["anchor" => ".", "href" => "@href"])->extractTree($dom);
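A comparable sketch for XPath, using PHP's DOMDocument and DOMXPath (the dom extension) on an inline HTML string instead of a live page. All function names are hypothetical; taking each node's textContent as its extracted value is an assumption, and xpathExtractTree() is only a guess at what extractTree() does, inferred from the field map in the example above ("." for the link text, "@href" for the attribute).

```php
<?php
// Plain-PHP approximation of the XPath calls with DOMDocument/DOMXPath.
// Hypothetical helper names; NOT the library's implementation.

function loadXPath(string $html): DOMXPath
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; hide libxml warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    return new DOMXPath($doc);
}

function xpathCount(string $query, string $html): int
{
    return loadXPath($html)->query($query)->length;
}

function xpathMatch(string $query, string $html): bool
{
    return xpathCount($query, $html) > 0;
}

function xpathExtractFirst(string $query, string $html): ?string
{
    // Assumption: the "value" of a node is its textContent; null when nothing matches
    $nodes = loadXPath($html)->query($query);
    return $nodes->length > 0 ? $nodes->item(0)->textContent : null;
}

function xpathExtractAll(string $query, string $html): array
{
    $out = [];
    foreach (loadXPath($html)->query($query) as $node) {
        $out[] = $node->textContent;
    }
    return $out;
}

// Hypothetical reading of extractTree(): for each node matched by $query, evaluate
// every field's XPath relative to that node and keep the first result's text.
function xpathExtractTree(string $query, array $fields, string $html): array
{
    $xpath = loadXPath($html);
    $rows = [];
    foreach ($xpath->query($query) as $node) {
        $row = [];
        foreach ($fields as $name => $relative) {
            $hits = $xpath->query($relative, $node); // $node is the context node
            $row[$name] = $hits->length > 0 ? $hits->item(0)->textContent : null;
        }
        $rows[] = $row;
    }
    return $rows;
}

$html = '<html><body><a href="/a">First</a> <a href="/b">Second</a></body></html>';
print_r(xpathExtractTree("//a", ["anchor" => ".", "href" => "@href"], $html));
```

Evaluating the field expressions with the matched node as context is what lets "." mean "this link" and "@href" mean "this link's href".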
Parsing with CSS selectors
use Mitseo\Scraper\Scraper;
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader
$dom = file_get_contents('https://en.wikipedia.com/');
$css1 = Scraper::css("h1#truc")->match($dom);
$css2 = Scraper::css("h1")->extractFirst($dom);
$css3 = Scraper::css("a")->extractAll($dom);
$css4 = Scraper::css("a")->count($dom);
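CSS selectors are commonly evaluated by translating them to XPath first. The library depends on symfony/css-selector, whose CssSelectorConverter::toXPath() performs that kind of translation, so it plausibly works this way, but that is an assumption. The sketch below skips the dependency: the XPath strings for the three selectors used above are written by hand, cssQuery() and CSS_TO_XPATH are hypothetical names, and each CSS call is mimicked with DOMXPath on an inline HTML string.

```php
<?php
// Hand-made CSS-to-XPath sketch (hypothetical names, NOT the library's code).
// Assumption: selectors are translated to XPath, as symfony/css-selector does;
// here the three translations are written manually instead.

const CSS_TO_XPATH = [
    'h1#truc' => "descendant-or-self::h1[@id = 'truc']", // type + id selector
    'h1'      => 'descendant-or-self::h1',                // type selector
    'a'       => 'descendant-or-self::a',                 // type selector
];

function cssQuery(string $selector, string $html): DOMNodeList
{
    if (!array_key_exists($selector, CSS_TO_XPATH)) {
        throw new InvalidArgumentException("No hand-written translation for '$selector'");
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    return (new DOMXPath($doc))->query(CSS_TO_XPATH[$selector]);
}

$html = '<html><body><h1 id="truc">Title</h1><a href="/x">One</a><a href="/y">Two</a></body></html>';

$css1 = cssQuery('h1#truc', $html)->length > 0;       // like match()
$first = cssQuery('h1', $html)->item(0);
$css2 = $first !== null ? $first->textContent : null; // like extractFirst()
$css3 = [];
foreach (cssQuery('a', $html) as $node) {
    $css3[] = $node->textContent;                     // like extractAll()
}
$css4 = cssQuery('a', $html)->length;                 // like count()
```

A real translator handles arbitrary selectors; the lookup table only covers the selectors from the example, which keeps the sketch free of external dependencies.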