prokhonenkov / yii2-xhtml-parser
HTML parser
1.0
2020-01-07 12:07 UTC
Requires
- php: >=7.1
- yiisoft/yii2: ~2.0.0
This package is auto-updated.
Last update: 2024-09-07 22:45:55 UTC
README
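This extension parses an HTML page or an XML document and returns a tree of results.
Installation
The preferred way to install this extension is through Composer.
Either run
php composer.phar require prokhonenkov/yii2-xhtml-parser
or add
"prokhonenkov/yii2-xhtml-parser": "*"
to the require section of your composer.json file.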
Configuration
Add the component declaration to your web config file:
<?php
return [
    // ... your config
    'components' => [
        'xHtmlParser' => [
            'class' => \prokhonenkov\xhtmlparser\XHtmlParser::class,
        ],
    ],
];
Usage
Example
// Pass HTML content to the parser
$result = \Yii::$app->xHtmlParser->parse($html);
// or pass XML content to the parser
$result = \Yii::$app->xHtmlParser->parse($xml, \prokhonenkov\xhtmlparser\XHtmlParser::DRIVER_XML);

/** Build the query */
/** @var \prokhonenkov\xhtmlparser\classes\interfaces\QueryInterface $query */
$query = $result->find();

/** @var \prokhonenkov\xhtmlparser\classes\interfaces\TagInterface $treeOfResults */
$treeOfResults = $query
    ->child('div')->attribute('class', 'movie-info-wrapper')->alias('container') // search by tag name and attribute value
    ->begin() // search inside the previous tag (inside the "div")
        ->child('img')->attribute('width', '1190')->alias('mainImage') // search by tag name and attribute value
        ->child('div')->attribute('id', 'left_column')->alias('mainDiv') // search by tag name and attribute value
        ->begin() // search inside the previous tag (inside the "div")
            ->child('td')->text('Some text')->alias('production') // search by tag name and text contained in the "td" tag
            ->begin() // search inside the previous tag (inside the "td")
                ->parent('tr') // search for the parent tag by name
                ->begin() // search inside the previous tag (inside the "tr")
                    ->child('td') // get the child "td" tags
                ->end()
            ->end()
        ->end()
        ->child('th')->text('Acters')->alias('acters') // search by tag name and text contained in the "th" tag
        ->begin()
            ->parent('table') // get the parent node
            ->begin()
                ->child('td') // get the child nodes
            ->end()
        ->end()
    ->end()
    ->execute();

/** Getting the search results */
// Get a search result by its tag alias. It returns an \SplFixedArray instance.
$containerList = $treeOfResults->getContainer();

/** @var \prokhonenkov\xhtmlparser\classes\interfaces\TagInterface $container */
$container = $containerList->current();

/** @var \prokhonenkov\xhtmlparser\classes\interfaces\TagInterface $mainDiv */
$mainDiv = $container
    ->getMainDiv() // returns \SplFixedArray
    ->current();

/** @var string $mainImage */
$mainImage = $container->getMainImage()->current()->getAttribute('src'); // get an attribute value

/** @var string $production */
$production = $mainDiv
    ->getProduction()->current() // get the "td" tag by its alias "production"
    ->getTr()->current() // get the "tr" tag by name
    ->getTd()->offsetGet(1) // get the second "td" tag
    ->getText(); // get the text content
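Since the getters above are documented as returning \SplFixedArray, it may help to see how such a collection can be read. The following is only a sketch: it uses a plain SplFixedArray of strings as a hypothetical stand-in for a getter result (real results would hold tag objects), and it sticks to count(), offset access and foreach.

```php
<?php
// Hypothetical stand-in for a getter result such as getTd();
// the real extension would return tag objects, not strings.
$tags = SplFixedArray::fromArray(['first td', 'second td', 'third td']);

// Number of matched items.
$count = $tags->count();

// Positional access: offsetGet(1) and $tags[1] are equivalent.
$second = $tags->offsetGet(1);

// Guard against an empty result before reading the first item.
$first = $tags->count() > 0 ? $tags[0] : null;

// Iterate over every item with its index.
$all = [];
foreach ($tags as $index => $text) {
    $all[$index] = strtoupper($text);
}
```

Checking count() before reading an item avoids an exception when a search matched nothing.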
You can also search within the current context.
Example
/** @var \prokhonenkov\xhtmlparser\classes\interfaces\TagInterface $something */
$something = $mainDiv->find()
    ->child('div')->attribute('data-attribute') // search by tag name for tags that have the "data-attribute" attribute
    ->begin()
        ->child('span')->attribute('class', 'someclass')->attribute('data-num')->alias('spanAlias')
    ->end()
    ->execute();

/** @var \SplFixedArray $div */
$div = $something->getDiv();

$texts = [];
if ($div->count()) {
    /** @var \SplFixedArray $spanAlias */
    $spanAlias = $div->current()->getSpanAlias();
    /** @var \prokhonenkov\xhtmlparser\classes\interfaces\TagInterface $item */
    foreach ($spanAlias as $item) {
        $texts[$item->getAttribute('data-num')] = $item->getText();
    }
}
If you need a more complex search, you can use XPath.
Example
/** @var \DOMXPath $xPath */
$xPath = $result->getXpath();

/** @var \DOMElement $domElement */
$domElement = $mainDiv->getDomElement();

// Search within the context of $domElement
$xPath->query('xPath query', $domElement);
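To make the context search more concrete, here is a minimal sketch that uses only PHP's bundled DOM extension, without the parser component. The markup, element ids and cell texts are invented for illustration. It shows one detail that is easy to miss: an expression starting with "//" is evaluated from the document root even when a context node is passed, so a context-bound query should start with ".//".

```php
<?php
// Invented sample markup: two blocks that both contain a "Production" row.
$html = '<div id="left_column"><table>'
      . '<tr><td>Production</td><td>Studio A</td></tr>'
      . '<tr><td>Year</td><td>2020</td></tr>'
      . '</table></div>'
      . '<div id="other"><table>'
      . '<tr><td>Production</td><td>Studio B</td></tr>'
      . '</table></div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xPath = new DOMXPath($doc);

// Pick the context node (plays the role of $domElement above).
$context = $xPath->query('//div[@id="left_column"]')->item(0);

// Relative query: the leading "." restricts the search to $context.
$relative = $xPath->query(
    './/td[normalize-space()="Production"]/following-sibling::td',
    $context
);

// Absolute query: "//" starts at the document root and ignores $context.
$absolute = $xPath->query(
    '//td[normalize-space()="Production"]/following-sibling::td',
    $context
);

$values = [];
foreach ($relative as $cell) {
    $values[] = trim($cell->textContent);
}
```

Here $relative matches only the cell inside the "left_column" block, while $absolute also matches the cell in the second block.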