stil/xpath-selector

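This package is abandoned and no longer maintained. No replacement package was suggested.
The latest version of this package (2.0) has no license information available.

A library for easily scraping HTML or XML pages using XPath queries.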

Installs: 18,280

Dependents: 3

Suggesters: 0

Security: 0

Stars: 28

Watchers: 3

Forks: 1

Open Issues: 1

Language: HTML

2.0 2014-12-08 16:54 UTC

This package is not auto-updated.

Last update: 2021-02-19 20:31:24 UTC


README

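##XPathSelector

##Description

XPathSelector is a library created for scraping HTML web pages. It was inspired by Scrapy, the Python scraping framework. It relies on the PHP DOM extension, so make sure that extension is installed. PHP 5.4 is the minimum supported version.

##Installation

The recommended way to install XPathSelector is through Composer. Run the following command: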

composer require stil/xpath-selector

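###Introduction

The starting point for every search is the XPathSelector\Selector class. It lets you load HTML or XML and then work with it. There are several ways to load a document: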

use XPathSelector\Selector;
$xs = Selector::load($pathToXml);
$xs = Selector::loadHTMLFile($pathToHtml);
$xs = Selector::loadXML($xmlString);
$xs = Selector::loadHTML($htmlString);
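
The four loader names resemble the loading methods of PHP's built-in DOMDocument class (load, loadHTMLFile, loadXML, loadHTML). Whether Selector delegates to them directly is an assumption, and the code below does not use XPathSelector at all. It is only a minimal sketch of the file-versus-string distinction with the plain DOM extension; the sample markup and the temporary file are made up for illustration.

```php
<?php
// Plain DOM extension only; none of this is XPathSelector API.
$xml  = '<bookstore><book><title>Everyday Italian</title></book></bookstore>';
$html = '<html><body><p>Hello</p></body></html>';

// Load XML from a string.
$fromXmlString = new DOMDocument();
$fromXmlString->loadXML($xml);

// Load HTML from a string.
$fromHtmlString = new DOMDocument();
$fromHtmlString->loadHTML($html);

// Load XML from a file path (a throwaway temp file stands in for a real path).
$path = tempnam(sys_get_temp_dir(), 'xs');
file_put_contents($path, $xml);
$fromXmlFile = new DOMDocument();
$fromXmlFile->load($path);
unlink($path);

// Print the root element name of each loaded document.
echo $fromXmlString->documentElement->nodeName, "\n";
echo $fromHtmlString->documentElement->nodeName, "\n";
echo $fromXmlFile->documentElement->nodeName, "\n";
```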

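Next, decide whether you want to search for a single DOM element or for multiple elements. For a single result, use the find($query) method. It throws a NodeNotFoundException when nothing matches.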

use XPathSelector\Exception\NodeNotFoundException;

try {
	$element = $xs->find('//head'); // returns first <head> element found
	echo $element->innerHTML(); // print innerHTML of <head> tag
} catch (NodeNotFoundException $e) {
	echo $e->getMessage(); // nothing has been found
}
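
For comparison, the same find-or-throw pattern can be sketched with the DOM extension alone. The helper name findFirstOrFail and the use of RuntimeException are hypothetical choices for this illustration; they are not part of XPathSelector, and the library's own implementation may differ.

```php
<?php
// Hypothetical helper, not XPathSelector API: return the first node that
// matches an XPath query, or throw when the query matches nothing.
function findFirstOrFail(DOMXPath $xpath, $query)
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        throw new RuntimeException("No node matches: $query");
    }
    return $nodes->item(0);
}

$doc = new DOMDocument();
$doc->loadHTML('<html><head><title>Demo</title></head><body></body></html>');
$xpath = new DOMXPath($doc);

try {
    $head = findFirstOrFail($xpath, '//head');
    echo $head->nodeName, "\n"; // name of the matched element
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n"; // reached only when nothing matched
}
```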

If you need multiple results, use findAll($query) instead. This method returns an XPathSelector\NodeListInterface instance; see the API documentation for details.

use XPathSelector\Selector;

$urls = $xs->findAll('//a/@href');
foreach ($urls as $url) {
	echo $url;
}

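Do you need to check whether an XPath path exists? Use the findOneOrNull($query) method. It returns a Node object, or null when nothing was found. It behaves like find($query), except that it returns null instead of throwing an exception.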

use XPathSelector\Selector;

$doesExist = $xs->findOneOrNull('//a/@href') !== null;
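
An existence check like this can also be pushed into the XPath expression itself. The sketch below uses the plain DOM extension rather than XPathSelector: DOMXPath::evaluate() combined with the XPath boolean() function yields a PHP bool directly. The sample markup and variable names are made up for illustration.

```php
<?php
// Plain DOM extension only; none of this is XPathSelector API.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><a href="/home">Home</a><p>No image here</p></body></html>');
$xpath = new DOMXPath($doc);

// boolean() converts a node-set to true when it is non-empty, false otherwise.
$hasLink  = $xpath->evaluate('boolean(//a/@href)');
$hasImage = $xpath->evaluate('boolean(//img/@src)');

var_dump($hasLink);  // bool(true)
var_dump($hasImage); // bool(false)
```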

###sample.xml

<?xml version="1.0" encoding="ISO-8859-1" ?>
<bookstore>
	<book category="COOKING">
		<title lang="en">Everyday Italian</title>
		<author>Giada De Laurentiis</author>
		<year>2005</year>
		<price>30.00</price>
	</book>
	<book category="CHILDREN">
		<title lang="en">Harry Potter</title>
		<author>J K. Rowling</author>
		<year>2005</year>
		<price>29.99</price>
	</book>
	<book category="WEB">
		<title lang="en">XQuery Kick Start</title>
		<author>James McGovern</author>
		<author>Per Bothner</author>
		<author>Kurt Cagle</author>
		<author>James Linn</author>
		<author>Vaidyanathan Nagarajan</author>
		<year>2003</year>
		<price>49.99</price>
	</book>
	<book category="WEB">
		<title lang="en">Learning XML</title>
		<author>Erik T. Ray</author>
		<year>2003</year>
		<price>39.95</price>
	</book>
</bookstore>

###Searching for a single result

<?php
use XPathSelector\Selector;
$xs = Selector::load('sample.xml');

echo $xs->find('/bookstore/book[1]/title');

Result

Everyday Italian

###Searching for multiple results

<?php
use XPathSelector\Selector;
$xs = Selector::load('sample.xml');

foreach ($xs->findAll('/bookstore/book') as $book) {
	printf(
		"[Title: %s][Price: %s]\n",
		$book->find('title')->extract(),
		$book->find('price')->extract()
	);
}

Result

[Title: Everyday Italian][Price: 30.00]
[Title: Harry Potter][Price: 29.99]
[Title: XQuery Kick Start][Price: 49.99]
[Title: Learning XML][Price: 39.95]
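
In the loop above, the queries 'title' and 'price' are relative paths, so they are resolved against each book node rather than the document root. The sketch below shows the same idea with the plain DOM extension, where the context node is passed as the second argument to DOMXPath::query(). It does not use XPathSelector, and the inline XML is a trimmed copy of sample.xml.

```php
<?php
// Plain DOM extension only; none of this is XPathSelector API.
$xml = <<<XML
<bookstore>
    <book><title>Everyday Italian</title><price>30.00</price></book>
    <book><title>Harry Potter</title><price>29.99</price></book>
</bookstore>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$lines = [];
foreach ($xpath->query('/bookstore/book') as $book) {
    // A relative path is evaluated from the context node given as 2nd argument.
    $title = $xpath->query('title', $book)->item(0);
    $price = $xpath->query('price', $book)->item(0);
    $lines[] = sprintf('[Title: %s][Price: %s]', $title->textContent, $price->textContent);
}

echo implode("\n", $lines), "\n";
```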

###Mapping a result set to an array

<?php
use XPathSelector\Selector;
$xs = Selector::load('sample.xml');

$array = $xs->findAll('/bookstore/book')->map(function ($node, $index) {
	return [
		'index' => $index,
		'title' => $node->find('title')->extract(),
		'price' => (float)$node->find('price')->extract()
	];
});

var_dump($array);

Result

array(4) {
  [0] =>
  array(3) {
    'index' =>
    int(0)
    'title' =>
    string(16) "Everyday Italian"
    'price' =>
    double(30)
  }
  [1] =>
  array(3) {
    'index' =>
    int(1)
    'title' =>
    string(12) "Harry Potter"
    'price' =>
    double(29.99)
  }
  [2] =>
  array(3) {
    'index' =>
    int(2)
    'title' =>
    string(17) "XQuery Kick Start"
    'price' =>
    double(49.99)
  }
  [3] =>
  array(3) {
    'index' =>
    int(3)
    'title' =>
    string(12) "Learning XML"
    'price' =>
    double(39.95)
  }
}
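
A similar mapping can be sketched without the library, assuming only the DOM extension: the node list is copied into a plain array, and array_map() builds one row per book. The XPath string() and number() functions let DOMXPath::evaluate() return typed values, which replaces the explicit (float) cast. The variable names and the trimmed XML are illustrative, and this is not how XPathSelector's map() is necessarily implemented.

```php
<?php
// Plain DOM extension only; none of this is XPathSelector API.
$xml = <<<XML
<bookstore>
    <book><title>Everyday Italian</title><price>30.00</price></book>
    <book><title>Harry Potter</title><price>29.99</price></book>
</bookstore>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// DOMNodeList is traversable, so it can be copied into a zero-indexed array.
$books = iterator_to_array($xpath->query('/bookstore/book'), false);

// Passing the keys as a second array gives the callback the index as well.
$rows = array_map(function ($book, $index) use ($xpath) {
    return [
        'index' => $index,
        'title' => $xpath->evaluate('string(title)', $book), // PHP string
        'price' => $xpath->evaluate('number(price)', $book), // PHP float
    ];
}, $books, array_keys($books));

print_r($rows);
```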