stil / xpath-selector
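This package is abandoned and no longer maintained. No replacement package was suggested.
The latest version of this package (2.0) has no license information available.
A library for easily scraping HTML or XML pages, using XPath queries.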
2.0
2014-12-08 16:54 UTC
This package is not auto-updated.
Last update: 2021-02-19 20:31:24 UTC
README
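##XPathSelector

##Description

XPathSelector is a library for scraping HTML web pages. It is inspired by Python's Scrapy. It uses the PHP DOM extension, so make sure that extension is installed. PHP 5.4 is the minimum supported version.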
##Installation

The recommended way to install XPathSelector is through Composer. Run the following command:

```
composer require stil/xpath-selector
```
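###Introduction

The starting point for every search is the `XPathSelector\Selector` class. It lets you load HTML or XML and then process it. There are several ways to load a document: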
```php
use XPathSelector\Selector;

$xs = Selector::load($pathToXml);
$xs = Selector::loadHTMLFile($pathToHtml);
$xs = Selector::loadXML($xmlString);
$xs = Selector::loadHTML($htmlString);
```
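The four loader names mirror methods of PHP's built-in `DOMDocument`, which belongs to the DOM extension this README names as a requirement. Whether the library delegates to those methods is an assumption, not something the README states. As a point of comparison, here is a minimal sketch of loading an XML string and an HTML string with the DOM extension alone; the sample strings are made up for illustration.

```php
<?php
// Sketch using only PHP's built-in DOM extension; this is not XPathSelector's code.
$xmlString  = '<bookstore><book><title>Learning XML</title></book></bookstore>';
$htmlString = '<html><head><title>Demo</title></head><body><p>Hi</p></body></html>';

// XML parsing is strict: the string must be well-formed.
$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xmlString);

// HTML parsing is tolerant; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$htmlDoc = new DOMDocument();
$htmlDoc->loadHTML($htmlString);
libxml_clear_errors();

echo $xmlDoc->documentElement->nodeName, "\n";  // bookstore
echo $htmlDoc->documentElement->nodeName, "\n"; // html
```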
Next, decide whether you want to search for a single DOM element or for several. For a single result, use the `find($query)` method.
```php
use XPathSelector\Exception\NodeNotFoundException;

try {
    $element = $xs->find('//head'); // returns the first <head> element found
    echo $element->innerHTML();     // prints the innerHTML of the <head> tag
} catch (NodeNotFoundException $e) {
    echo $e->getMessage();          // nothing was found
}
```
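To make the "first match or exception" behaviour concrete, the following sketch reproduces it with the built-in `DOMXPath` class. The helper `firstNode()` and its use of `RuntimeException` are hypothetical and are not part of XPathSelector. The type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of XPathSelector): return the first match or throw.
function firstNode(DOMXPath $xpath, string $query): DOMNode
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        throw new RuntimeException("No node matches: $query");
    }
    return $nodes->item(0); // matches come back in document order
}

$doc = new DOMDocument();
$doc->loadXML('<html><head><title>Demo</title></head><body/></html>');
$xpath = new DOMXPath($doc);

// Serialise the matched element, including its own tag.
echo $doc->saveXML(firstNode($xpath, '//head')), "\n";

try {
    firstNode($xpath, '//footer');
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n"; // No node matches: //footer
}
```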
If you need multiple results, use `findAll($query)` instead. This method returns an `XPathSelector\NodeListInterface` instance; see the API for details.
```php
use XPathSelector\Selector;

$urls = $xs->findAll('//a/@href');
foreach ($urls as $url) {
    echo $url;
}
```
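The query `//a/@href` selects attribute nodes rather than elements, so anchors without an `href` contribute nothing to the result. The sketch below shows the same query with the built-in `DOMXPath` class on a made-up HTML string; it illustrates the XPath expression and is not the library's implementation.

```php
<?php
// Illustrative HTML: two links with href, one anchor without.
$html = '<html><body><a href="/first">One</a><a href="/second">Two</a><a>No link</a></body></html>';

libxml_use_internal_errors(true); // keep HTML parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Each result is an attribute node; nodeValue holds the attribute's text.
$urls = [];
foreach ($xpath->query('//a/@href') as $attr) {
    $urls[] = $attr->nodeValue;
}

echo implode("\n", $urls), "\n";
// /first
// /second
```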
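Need to check whether an XPath path exists? Use the `findOneOrNull($query)` method. It returns a `Node` object, or null when nothing is found. It behaves like `find($query)`, except that it returns null instead of throwing an exception.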
```php
use XPathSelector\Selector;

$doesExist = $xs->findOneOrNull('//a/@href') !== null;
```
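An existence check can also be pushed into the XPath expression itself. As a sketch with the built-in DOM extension (not XPathSelector), `DOMXPath::evaluate()` returns a PHP boolean when the expression is wrapped in the XPath `boolean()` function. The sample document is made up for illustration.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<page><a href="/home">Home</a></page>');
$xpath = new DOMXPath($doc);

// boolean() is true when the node set is non-empty.
$hasLink  = $xpath->evaluate('boolean(//a/@href)');
$hasImage = $xpath->evaluate('boolean(//img)');

var_dump($hasLink);  // bool(true)
var_dump($hasImage); // bool(false)
```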
###sample.xml
```xml
<?xml version="1.0" encoding="ISO-8859-1" ?>
<bookstore>
    <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
    </book>
    <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
    <book category="WEB">
        <title lang="en">XQuery Kick Start</title>
        <author>James McGovern</author>
        <author>Per Bothner</author>
        <author>Kurt Cagle</author>
        <author>James Linn</author>
        <author>Vaidyanathan Nagarajan</author>
        <year>2003</year>
        <price>49.99</price>
    </book>
    <book category="WEB">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
    </book>
</bookstore>
```
###Searching for a single result

```php
<?php
use XPathSelector\Selector;

$xs = Selector::load('sample.xml');
echo $xs->find('/bookstore/book[1]/title');
```

Result

```
Everyday Italian
```
###Searching for multiple results

```php
<?php
use XPathSelector\Selector;

$xs = Selector::load('sample.xml');

foreach ($xs->findAll('/bookstore/book') as $book) {
    printf(
        "[Title: %s][Price: %s]\n",
        $book->find('title')->extract(),
        $book->find('price')->extract()
    );
}
```

Result

```
[Title: Everyday Italian][Price: 30.00]
[Title: Harry Potter][Price: 29.99]
[Title: XQuery Kick Start][Price: 49.99]
[Title: Learning XML][Price: 39.95]
```
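In the loop above, the queries `title` and `price` are relative: they are resolved against each `<book>` rather than against the document root. The sketch below shows the same idea with the built-in `DOMXPath` class, where the context node is passed as the second argument. It uses a shortened, inline copy of the sample data and does not show how XPathSelector implements nested `find()` calls.

```php
<?php
// Shortened copy of sample.xml, inlined so the sketch is self-contained.
$xml = <<<XML
<bookstore>
  <book><title>Everyday Italian</title><price>30.00</price></book>
  <book><title>Harry Potter</title><price>29.99</price></book>
</bookstore>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$lines = [];
foreach ($xpath->query('/bookstore/book') as $book) {
    // The second argument is the context node, so 'title' means "title child of this book".
    $title = $xpath->evaluate('string(title)', $book);
    $price = $xpath->evaluate('string(price)', $book);
    $lines[] = sprintf('[Title: %s][Price: %s]', $title, $price);
}

echo implode("\n", $lines), "\n";
// [Title: Everyday Italian][Price: 30.00]
// [Title: Harry Potter][Price: 29.99]
```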
###Mapping a result set to an array

```php
<?php
use XPathSelector\Selector;

$xs = Selector::load('sample.xml');

$array = $xs->findAll('/bookstore/book')->map(function ($node, $index) {
    return [
        'index' => $index,
        'title' => $node->find('title')->extract(),
        'price' => (float)$node->find('price')->extract()
    ];
});

var_dump($array);
```

Result
```
array(4) {
  [0] =>
  array(3) {
    'index' =>
    int(0)
    'title' =>
    string(16) "Everyday Italian"
    'price' =>
    double(30)
  }
  [1] =>
  array(3) {
    'index' =>
    int(1)
    'title' =>
    string(12) "Harry Potter"
    'price' =>
    double(29.99)
  }
  [2] =>
  array(3) {
    'index' =>
    int(2)
    'title' =>
    string(17) "XQuery Kick Start"
    'price' =>
    double(49.99)
  }
  [3] =>
  array(3) {
    'index' =>
    int(3)
    'title' =>
    string(12) "Learning XML"
    'price' =>
    double(39.95)
  }
}
```
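The same index, title and price rows can be built with the DOM extension alone, which may help when comparing results. This is a sketch under assumptions, not the library's `map()` implementation: it relies on `foreach` over a `DOMNodeList` yielding integer positions as keys, and it uses the XPath `number()` function in place of the `(float)` cast. The data is a shortened, inline copy of the sample file.

```php
<?php
$xml = <<<XML
<bookstore>
  <book><title>Everyday Italian</title><price>30.00</price></book>
  <book><title>Harry Potter</title><price>29.99</price></book>
</bookstore>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query('/bookstore/book') as $index => $book) {
    $rows[] = [
        'index' => $index,                                    // position in the node list
        'title' => $xpath->evaluate('string(title)', $book),  // text of the <title> child
        'price' => $xpath->evaluate('number(price)', $book),  // converted to float by XPath
    ];
}

var_dump($rows);
```

The exact `var_dump` formatting depends on the PHP version and on whether Xdebug is loaded, so it may not match the listing above character for character.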