alexanderpavlov / stream-parser
PHP Multiformat Streaming Parser
v1.4.4
2021-09-14 08:34 UTC
Requires
- php: ^7.1.3|^8.0
- ext-xmlreader: *
- maxakawizard/json-collection-parser: ^1.1
- tightenco/collect: ^5.0|^6.0|^7.0|^8.0
Requires (Dev)
- mockery/mockery: ^0.9.9
- phpunit/phpunit: ^6.0
This package is auto-updated.
Last update: 2024-09-14 15:07:04 UTC
README
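When parsing XML/CSV/JSON/... documents, there are 2 approaches to consider:
DOM loading: loads the entire document into memory, which makes it easy to navigate and parse and gives the developer maximum flexibility.
Streaming: walks through the document like a cursor, stopping at each element along the way, which avoids excessive memory consumption.
https://www.linkedin.com/pulse/processing-xml-documents-dom-vs-streaming-marius-ilina/
With streaming, callbacks on large files run while the file is still downloading, which makes this approach far more memory-efficient.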
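The difference between the two approaches can be sketched with PHP's built-in extensions. This is a minimal illustration, not this package's implementation: it assumes ext-dom is available alongside ext-xmlreader, and the inline sample document is hypothetical.

```php
<?php
// Hypothetical sample document; a real feed would be a large file or a URL.
$xml = '<bookstore>'
     . '<book ISBN="10-000000-001"><title>The Iliad and The Odyssey</title></book>'
     . '<book ISBN="10-000000-002"><title>Anthology of World Literature</title></book>'
     . '</bookstore>';

// DOM loading: the whole tree is built in memory before anything is read.
$dom = new DOMDocument();
$dom->loadXML($xml);
$domTitles = [];
foreach ($dom->getElementsByTagName('title') as $node) {
    $domTitles[] = $node->textContent;
}

// Streaming: XMLReader acts as a forward-only cursor over the document.
$reader = new XMLReader();
$reader->XML($xml);
$streamTitles = [];
while ($reader->read()) {
    // Stop only at the opening tag of each <title> element.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'title') {
        $streamTitles[] = $reader->readString();
    }
}
$reader->close();
```

Both loops collect the same titles; the streaming loop just never holds more than the current node, which is what matters once the document no longer fits comfortably in memory.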
Installation
composer require rodenastyle/stream-parser
Recommended usage
Delegate callback execution as much as possible so that it does not block document reading
(Laravel Queue based example)
use Tightenco\Collect\Support\Collection;

StreamParser::xml("https://example.com/users.xml")->each(function(Collection $user){
    dispatch(new App\Jobs\SendEmail($user));
});
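Outside Laravel, the same idea can be sketched with a plain in-process queue. This is only an illustration of the pattern: `SplQueue` stands in for a real queue backend, and the `$users` array and `$enqueue` closure are hypothetical names, not part of this package.

```php
<?php
// Stand-in for a real queue backend (assumption: a real setup would use a
// job queue such as the one behind Laravel's dispatch()).
$queue = new SplQueue();

// Hypothetical parsed elements; in practice they arrive one by one from the parser.
$users = [
    ['email' => 'a@example.com'],
    ['email' => 'b@example.com'],
];

// Cheap callback: it only records the work to be done and returns immediately.
$enqueue = function (array $user) use ($queue): void {
    $queue->enqueue($user);
};

foreach ($users as $user) {
    $enqueue($user);
}

// Worker side: drains the queue separately from the parsing loop.
$sent = [];
while (!$queue->isEmpty()) {
    $job = $queue->dequeue();
    $sent[] = $job['email']; // the slow work (e.g. sending an email) would happen here
}
```

Keeping the callback this small means the time spent per element is dominated by reading the document rather than by the work triggered for each element.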
Practical Input/Code/Output demos
XML
<bookstore>
    <book ISBN="10-000000-001">
        <title>The Iliad and The Odyssey</title>
        <price>12.95</price>
        <comments>
            <userComment rating="4">
                Best translation I've read.
            </userComment>
            <userComment rating="2">
                I like other versions better.
            </userComment>
        </comments>
    </book>
    [...]
</bookstore>

use Tightenco\Collect\Support\Collection;

StreamParser::xml("https://example.com/books.xml")->each(function(Collection $book){
    var_dump($book);
    var_dump($book->get('comments')->toArray());
});
class Tightenco\Collect\Support\Collection#19 (1) {
  protected $items =>
  array(4) {
    'ISBN' =>
    string(13) "10-000000-001"
    'title' =>
    string(25) "The Iliad and The Odyssey"
    'price' =>
    string(5) "12.95"
    'comments' =>
    class Tightenco\Collect\Support\Collection#17 (1) {
      protected $items =>
      array(2) {
        ...
      }
    }
  }
}
array(2) {
  [0] =>
  array(2) {
    'rating' =>
    string(1) "4"
    'userComment' =>
    string(27) "Best translation I've read."
  }
  [1] =>
  array(2) {
    'rating' =>
    string(1) "2"
    'userComment' =>
    string(29) "I like other versions better."
  }
}
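Additionally, you can use ->withSeparatedParametersList() to get the parameters (attributes) of each element separated out under the __params property. Likewise, ->withoutSkippingFirstElement() can help when you need to parse the very first element (usually the element that contains the others).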
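To make the idea of separated parameters concrete, here is a minimal sketch built directly on XMLReader. It is not the package's implementation, and the exact structure the package returns is not confirmed here; the sketch only shows one way an element's attributes can be kept apart from its child values under a `__params` key.

```php
<?php
// Hypothetical one-book document, trimmed from the example above.
$xml = '<bookstore><book ISBN="10-000000-001">'
     . '<title>The Iliad and The Odyssey</title>'
     . '</book></bookstore>';

$reader = new XMLReader();
$reader->XML($xml);

$element = [];
while ($reader->read()) {
    if ($reader->nodeType !== XMLReader::ELEMENT) {
        continue;
    }
    if ($reader->name === 'book') {
        // Collect the attributes separately instead of mixing them with child values.
        $params = [];
        if ($reader->hasAttributes) {
            while ($reader->moveToNextAttribute()) {
                $params[$reader->name] = $reader->value;
            }
            $reader->moveToElement(); // return the cursor to the <book> element
        }
        $element['__params'] = $params;
    } elseif ($reader->name === 'title') {
        $element['title'] = $reader->readString();
    }
}
$reader->close();
```

With this layout an attribute such as ISBN can no longer collide with a child element of the same name.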
JSON
[
    {
        "title": "The Iliad and The Odyssey",
        "price": 12.95,
        "comments": [
            {"comment": "Best translation I've read."},
            {"comment": "I like other versions better."}
        ]
    },
    {
        "title": "Anthology of World Literature",
        "price": 24.95,
        "comments": [
            {"comment": "Needs more modern literature."},
            {"comment": "Excellent overview of world literature."}
        ]
    }
]

use Tightenco\Collect\Support\Collection;

StreamParser::json("https://example.com/books.json")->each(function(Collection $book){
    var_dump($book->get('comments')->count());
});
int(2)
int(2)
CSV
title,price,comments
The Iliad and The Odyssey,12.95,"Best translation I've read.,I like other versions better."
Anthology of World Literature,24.95,"Needs more modern literature.,Excellent overview of world literature."
use Tightenco\Collect\Support\Collection;

StreamParser::csv("https://example.com/books.csv")->each(function(Collection $book){
    var_dump($book->get('comments')->last());
});
string(29) "I like other versions better."
string(39) "Excellent overview of world literature."
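The CSV output above depends on the quoted comments field being split into a list. A rough equivalent with PHP's built-in `fgetcsv` is sketched below; it is an illustration under assumptions rather than the package's code, and the in-memory stream simply stands in for a remote file.

```php
<?php
// Sample CSV from the demo above, written to an in-memory stream
// (assumption: a real source would be a file or URL opened with fopen()).
$csv = "title,price,comments\n"
     . "The Iliad and The Odyssey,12.95,\"Best translation I've read.,I like other versions better.\"\n"
     . "Anthology of World Literature,24.95,\"Needs more modern literature.,Excellent overview of world literature.\"\n";

$handle = fopen('php://memory', 'r+');
fwrite($handle, $csv);
rewind($handle);

// The first row holds the column names.
$header = fgetcsv($handle, 0, ',', '"', '\\');

$lastComments = [];
// Read one row at a time, so only the current row is held in memory.
while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
    $book = array_combine($header, $row);
    // The quoted field contains several comma-separated comments.
    $comments = explode(',', $book['comments']);
    $lastComments[] = end($comments);
}
fclose($handle);
```

Note that splitting on every comma would also break up a comment that itself contains a comma, so this layout only suits data where that cannot happen.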
License
This library is released under the MIT license.