README

DiDOM - a simple and fast HTML parser.

README in Russian
DiDOM 1.x documentation. To upgrade from 1.x, see the changelog.

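Contents

Installation
Quick start
Creating a new document
Search for elements
Verify if element exists
Search in element
Supported selectors
Changing content
Output
Working with elements
Working with cache
Miscellaneous
Comparison with other parsers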

Installation

To install DiDOM, run the following command:

composer require imangazaliev/didom

Quick start

use DiDom\Document;

$document = new Document('http://www.news.com/', true);

$posts = $document->find('.post');

foreach($posts as $post) {
    echo $post->text(), "\n";
}

Creating a new document

DiDOM can load HTML in several ways.

With constructor

// the first parameter is a string with HTML
$document = new Document($html);

// file path
$document = new Document('page.html', true);

// or URL
$document = new Document('http://www.example.com/', true);

The second parameter indicates that the first argument is a file path or URL to load. It defaults to false.

Signature:

__construct($string = null, $isFile = false, $encoding = 'UTF-8', $type = Document::TYPE_HTML)

$string - an HTML or XML string, or a file path.

$isFile - indicates that the first parameter is a file path.

$encoding - the document encoding.

$type - the document type (HTML - Document::TYPE_HTML, XML - Document::TYPE_XML).
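The four parameters can be combined to load XML directly through the constructor. The following is a minimal sketch, assuming DiDOM 1.x installed with Composer; the autoload path and the sample XML are assumptions made for illustration.

```php
<?php
// Assumption: DiDOM was installed with Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use DiDom\Document;

// Hypothetical sample data, not part of the library.
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<library><book>Foo</book><book>Bar</book></library>';

// $isFile = false: the first argument is markup, not a path.
// Document::TYPE_XML selects the XML parser instead of the HTML one.
$document = new Document($xml, false, 'UTF-8', Document::TYPE_XML);

// Collect the text of every <book> element.
$titles = [];
foreach ($document->find('book') as $book) {
    $titles[] = $book->text();
}

echo implode(', ', $titles), "\n";
```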

With separate methods

$document = new Document();

$document->loadHtml($html);

$document->loadHtmlFile('page.html');

$document->loadHtmlFile('http://www.example.com/');

Two methods are available for loading XML: loadXml and loadXmlFile.

The loading methods accept additional libxml options:

$document->loadHtml($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$document->loadHtmlFile($url, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$document->loadXml($xml, LIBXML_PARSEHUGE);
$document->loadXmlFile($url, LIBXML_PARSEHUGE);
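To make the effect of the options more concrete, the sketch below loads the same fragment twice and checks whether an html element was added around it. It assumes DiDOM 1.x installed with Composer and a libxml build that supports LIBXML_HTML_NOIMPLIED; the fragment is a made-up example.

```php
<?php
// Assumption: DiDOM was installed with Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use DiDom\Document;

$fragment = '<p>Hello</p>'; // hypothetical input

// Default loading: libxml wraps the fragment in implied html/body elements.
$wrapped = new Document();
$wrapped->loadHtml($fragment);

// NOIMPLIED keeps the fragment as-is; NODEFDTD skips the default doctype.
$bare = new Document();
$bare->loadHtml($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

var_dump($wrapped->has('html'));
var_dump($bare->has('html'));
```

This is useful when the parsed fragment will be printed back out and the extra wrapper elements are not wanted.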

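Search for elements

DiDOM accepts a CSS selector or an XPath expression for searching. Pass the expression as the first parameter and its type as the second (the default type is Query::TYPE_CSS).

With method `find()`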

use DiDom\Document;
use DiDom\Query;

...

// CSS selector
$posts = $document->find('.post');

// XPath
$posts = $document->find("//div[contains(@class, 'post')]", Query::TYPE_XPATH);

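If elements matching the expression are found, the method returns an array of DiDom\Element instances; otherwise it returns an empty array. To get an array of DOMElement objects instead, pass false as the third parameter.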
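The three return shapes described above can be compared side by side. This is a minimal sketch, assuming DiDOM 1.x installed with Composer; the HTML and the `.comment` selector are invented for the example.

```php
<?php
// Assumption: DiDOM was installed with Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use DiDom\Document;
use DiDom\Element;
use DiDom\Query;

// Hypothetical markup with two matching elements.
$html = '<div class="post">First</div><div class="post">Second</div>';
$document = new Document($html);

// Default: every match is wrapped in DiDom\Element.
$wrapped = $document->find('.post');

// Third argument false: the raw DOMElement objects are returned.
$raw = $document->find('.post', Query::TYPE_CSS, false);

// No match: an empty array, so foreach and count() need no extra guard.
$missing = $document->find('.comment');

var_dump($wrapped[0] instanceof Element);
var_dump($raw[0] instanceof DOMElement);
var_dump(count($missing));
```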

With magic method `__invoke()`

$posts = $document('.post');

Warning: using this method is discouraged because it may be removed in the future.

With method `xpath()`

$posts = $document->xpath("//*[contains(concat(' ', normalize-space(@class), ' '), ' post ')]");

You can also search inside an element:

echo $document->find('nav')[0]->first('ul.menu')->xpath('//li')[0]->text();

Verify if element exists

To verify that an element exists, use the has() method:

if ($document->has('.post')) {
    // code
}

If you need to check that an element exists and then get it:

if ($document->has('.post')) {
    $elements = $document->find('.post');
    // code
}

But this is faster:

if (count($elements = $document->find('.post')) > 0) {
    // code
}

because the first version runs the query twice.

Search in element

The methods find(), first(), xpath(), has(), and count() are also available on Element.

Example:

echo $document->find('nav')[0]->first('ul.menu')->xpath('//li')[0]->text();

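Method `findInDocument()`

If you change, replace, or remove an element that was found inside another element, the document itself is not changed. This happens because the find() method of the Element class (and likewise first() and xpath()) creates a new document to search in.

To search for elements in the source document, use the findInDocument() and firstInDocument() methods.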

// nothing will happen
$document->first('head')->first('title')->remove();

// but this will do
$document->first('head')->firstInDocument('title')->remove();

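Warning: findInDocument() and firstInDocument() work only for elements that belong to a document, and for elements created via new Element(...). If an element does not belong to a document, a LogicException is thrown.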
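The difference between the two search modes can be checked with has() after each removal. The sketch below assumes a DiDOM 1.x release that provides firstInDocument(), installed with Composer; the HTML is a made-up example.

```php
<?php
// Assumption: DiDOM was installed with Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use DiDom\Document;

// Hypothetical page with a single <title>.
$html = '<html><head><title>Old title</title></head><body><p>Text</p></body></html>';
$document = new Document($html);

// first() on an Element searches a copy, so the title is removed from the copy only.
$document->first('head')->first('title')->remove();
$afterCopyRemoval = $document->has('title');

// firstInDocument() searches the source document, so this removal is visible there.
$document->first('head')->firstInDocument('title')->remove();
$afterDocumentRemoval = $document->has('title');

var_dump($afterCopyRemoval);
var_dump($afterDocumentRemoval);
```

Following the description above, the title should still be present after the first removal and gone after the second.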

Supported selectors

DiDOM supports searching by:

tag
class, ID, attribute name, and attribute value
pseudo-classes:
- first-, last-, nth-child
- empty and not-empty
- contains
- has

// all links
$document->find('a');

// any element with id = "foo" and "bar" class
$document->find('#foo.bar');

// any element with attribute "name"
$document->find('[name]');
// the same as
$document->find('*[name]');

// input field with the name "foo"
$document->find('input[name=foo]');
$document->find('input[name=\'bar\']');
$document->find('input[name="baz"]');

// any element that has an attribute starting with "data-" and the value "foo"
$document->find('*[^data-=foo]');

// all links starting with https
$document->find('a[href^=https]');

// all images with the extension png
$document->find('img[src$=png]');

// all links containing the string "example.com"
$document->find('a[href*=example.com]');

// text of the links with "foo" class
$document->find('a.foo::text');

// href and title of all the links with "bar" class
$document->find('a.bar::attr(href|title)');
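The ::text and ::attr(...) suffixes change what find() returns: plain strings rather than elements. This is a minimal sketch, assuming DiDOM 1.x installed with Composer; the links are invented for the example.

```php
<?php
// Assumption: DiDOM was installed with Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use DiDom\Document;

// Hypothetical markup: two links that share the "foo" class.
$html = '<a class="foo" href="/one">First</a><a class="foo" href="/two">Second</a>';
$document = new Document($html);

// ::text selects the text nodes of the matched links; each result is a string.
$texts = $document->find('a.foo::text');

// ::attr(href) selects the href attributes; each result is the attribute value.
$hrefs = $document->find('a.foo::attr(href)');

echo implode(', ', $texts), "\n";
echo implode(', ', $hrefs), "\n";
```

Because the results are already strings, no further text() or attr() call is needed on them.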

Changing content

Change inner HTML

$element->setInnerHtml('<a href="#">Foo</a>');

Change inner XML

$element->setInnerXml(' Foo <span>Bar</span><!-- Baz --><![CDATA[
    <root>Hello world!</root>
]]>');

Change value (as plain text)

$element->setValue('Foo');
// will be encoded like using htmlentities()
$element->setValue('<a href="#">Foo</a>');

Output

Getting HTML

With method `html()`

$posts = $document->find('.post');

echo $posts[0]->html();

Type casting to string

$html = (string) $posts[0];

Formatting HTML output

$html = $document->format()->html();

Element has no format() method, so to output an element's formatted HTML, convert it to a document first:

$html = $element->toDocument()->format()->html();

Inner HTML

$innerHtml = $element->innerHtml();

Document has no innerHtml() method, so to get a document's inner HTML, convert it to an element first:

$innerHtml = $document->toElement()->innerHtml();

Getting XML

echo $document->xml();

echo $document->first('book')->xml();

Getting content

$posts = $document->find('.post');

echo $posts[0]->text();

Creating a new element

Creating an instance of the class

use DiDom\Element;

$element = new Element('span', 'Hello');

// Outputs "<span>Hello</span>"
echo $element->html();

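The first parameter is the tag name, the second is the element's value (optional), and the third is an array of element attributes (optional).

An example of creating an element with attributes: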

$attributes = ['name' => 'description', 'placeholder' => 'Enter description of item'];

$element = new Element('textarea', 'Text', $attributes);

An element can also be created from an instance of the DOMElement class:

use DiDom\Element;
use DOMElement;

$domElement = new DOMElement('span', 'Hello');

$element = new Element($domElement);

With method `createElement`

$document = new Document($html);

$element = $document->createElement('span', 'Hello');

Getting the name of an element

$element->tagName();

Getting parent element

$document = new Document($html);

$input = $document->find('input[name=email]')[0];

var_dump($input->parent());

Getting sibling elements

$document = new Document($html);

$item = $document->find('ul.menu > li')[1];

var_dump($item->previousSibling());

var_dump($item->nextSibling());

Getting the child elements

$html = '<div>Foo<span>Bar</span><!--Baz--></div>';

$document = new Document($html);

$div = $document->first('div');

// element node (DOMElement)
// string(3) "Bar"
var_dump($div->child(1)->text());

// text node (DOMText)
// string(3) "Foo"
var_dump($div->firstChild()->text());

// comment node (DOMComment)
// string(3) "Baz"
var_dump($div->lastChild()->text());

// array(3) { ... }
var_dump($div->children());

Getting owner document

$document = new Document($html);

$element = $document->find('input[name=email]')[0];

$document2 = $element->ownerDocument();

// bool(true)
var_dump($document->is($document2));

Working with element attributes

Creating/updating an attribute

With method `setAttribute`

$element->setAttribute('name', 'username');

With method `attr`

$element->attr('name', 'username');

With magic method `__set`

$element->name = 'username';

Getting value of an attribute

With method `getAttribute`

$username = $element->getAttribute('value');

With method `attr`

$username = $element->attr('value');

With magic method `__get`

$username = $element->name;

If the attribute is not found, null is returned.

Verify if attribute exists

With method `hasAttribute`

if ($element->hasAttribute('name')) {
    // code
}

With magic method `__isset`

if (isset($element->name)) {
    // code
}

Removing attribute

With method `removeAttribute`

$element->removeAttribute('name');

With magic method `__unset`

unset($element->name);

Comparing elements

$element  = new Element('span', 'hello');
$element2 = new Element('span', 'hello');

// bool(true)
var_dump($element->is($element));

// bool(false)
var_dump($element->is($element2));

Appending child elements

$list = new Element('ul');

$item = new Element('li', 'Item 1');

$list->appendChild($item);

$items = [
    new Element('li', 'Item 2'),
    new Element('li', 'Item 3'),
];

$list->appendChild($items);

Adding child elements

$list = new Element('ul');

$item = new Element('li', 'Item 1');
$items = [
    new Element('li', 'Item 2'),
    new Element('li', 'Item 3'),
];

$list->appendChild($item);
$list->appendChild($items);

Replacing element

$element = new Element('span', 'hello');

$document->find('.post')[0]->replace($element);

Warning: you can replace only elements that were found directly in the document:

// nothing will happen
$document->first('head')->first('title')->replace($title);

// but this will do
$document->first('head title')->replace($title);

See the "Search in element" section for more on this.

Removing element

$document->find('.post')[0]->remove();

Warning: you can remove only elements that were found directly in the document:

// nothing will happen
$document->first('head')->first('title')->remove();

// but this will do
$document->first('head title')->remove();

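See the "Search in element" section for more on this.

Working with cache

The cache is an array of XPath expressions that were converted from CSS selectors.

Getting from cache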

use DiDom\Query;

...

$xpath    = Query::compile('h2');
$compiled = Query::getCompiled();

// array('h2' => '//h2')
var_dump($compiled);

Cache setting

Query::setCompiled(['h2' => '//h2']);

Miscellaneous

`preserveWhiteSpace`

By default, whitespace preservation is disabled.

You can enable the preserveWhiteSpace option before loading the document:

$document = new Document();

$document->preserveWhiteSpace();

$document->loadXml($xml);

`count`

The count() method counts the children that match the selector:

// prints the number of links in the document
echo $document->count('a');

// prints the number of items in the list
echo $document->first('ul')->count('li');

`matches`

Returns true if the node matches the selector:

$element->matches('div#content');

// strict match
// returns true if the element is a div with id equals content and nothing else
// if the element has any other attributes the method returns false
$element->matches('div#content', true);
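The following sketch contrasts the default and strict modes on one element. It assumes DiDOM 1.x installed with Composer; the markup, with its extra class attribute, is a made-up example chosen so that the two modes disagree.

```php
<?php
// Assumption: DiDOM was installed with Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use DiDom\Document;

// Hypothetical element that has an id and an additional class attribute.
$document = new Document('<div id="content" class="wide">Text</div>');
$div = $document->first('div');

// Default mode: tag and id match, and extra attributes are ignored.
$loose = $div->matches('div#content');

// Strict mode: the class attribute is not named in the selector, so the match fails.
$strict = $div->matches('div#content', true);

// A different tag name does not match in either mode.
$other = $div->matches('span');

var_dump($loose, $strict, $other);
```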

`isElementNode`

Checks whether the node is an element (DOMElement):

$element->isElementNode();

`isTextNode`

Checks whether the node is a text node (DOMText):

$element->isTextNode();

`isCommentNode`

Checks whether the node is a comment (DOMComment):

$element->isCommentNode();
