rodenastyle/stream-parser

PHP Multiformat Streaming Parser

v2.0.1 2023-10-03 09:39 UTC

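When it comes to parsing XML/CSV/JSON/... documents, there are two approaches to consider:

DOM loading: the whole document is loaded at once, which makes it easy to navigate and parse and therefore gives the developer maximum flexibility.

Streaming: the document is traversed like a cursor that stops at each element, which avoids excessive memory use.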

https://www.linkedin.com/pulse/processing-xml-documents-dom-vs-streaming-marius-ilina/
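As a rough illustration of the difference, the sketch below extracts the same titles both ways using PHP's built-in SimpleXML and XMLReader extensions, not stream-parser itself. The two-book sample document is an assumption, trimmed down from the XML example further on.

```php
<?php
// Compares the two approaches with PHP's built-in extensions
// (SimpleXML and XMLReader); this is not stream-parser's code.

$xml = <<<XML
<bookstore>
    <book ISBN="10-000000-001"><title>The Iliad and The Odyssey</title></book>
    <book><title>Anthology of World Literature</title></book>
</bookstore>
XML;

// DOM-style loading: the whole tree is built in memory, then navigated freely.
$domTitles = [];
foreach (simplexml_load_string($xml)->book as $book) {
    $domTitles[] = (string) $book->title;
}

// Streaming: a cursor walks the document node by node, and only the
// current <book> subtree is expanded into an object.
$streamTitles = [];
$reader = new XMLReader();
$reader->XML($xml);
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'book') {
        $book = simplexml_load_string($reader->readOuterXml());
        $streamTitles[] = (string) $book->title;
    }
}
$reader->close();
```

Both produce the same list; the difference is that the streaming loop never holds more than one `<book>` in memory as an object.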

Thus, for large files, callbacks are executed while the file is still downloading, which is much more efficient in terms of memory.
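A minimal sketch of that idea, assuming nothing about stream-parser's internals: a generator yields one record at a time from an open stream, so per-record work happens before the rest of the input has been read. `streamLines()` is a hypothetical helper, and `php://memory` merely stands in for a remote download.

```php
<?php
// Hypothetical helper, not part of stream-parser: yields one line at a time
// from a readable stream, so each line can be handled before the next is read.
function streamLines($handle): Generator
{
    while (($line = fgets($handle)) !== false) {
        yield rtrim($line, "\r\n");
    }
}

// php://memory stands in for a file that is still being downloaded.
$handle = fopen('php://memory', 'r+');
fwrite($handle, "first record\nsecond record\nthird record\n");
rewind($handle);

$processed = [];
foreach (streamLines($handle) as $line) {
    // Per-record "callback": runs while the remaining lines are still unread.
    $processed[] = strtoupper($line);
}
fclose($handle);
```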

Installation

composer require rodenastyle/stream-parser

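Recommended usage

Delegate callback execution whenever possible, so that it does not block reading the document

(Example based on Laravel queues)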

use Illuminate\Support\Collection;

StreamParser::xml("https://example.com/users.xml")->each(function(Collection $user){
    dispatch(new App\Jobs\SendEmail($user));
});
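The same principle can be sketched without Laravel: the callback below only records a job description in an `SplQueue`, and the slow work happens afterwards. This in-process queue is an assumption made for illustration; with a real queue such as Laravel's, the worker runs in a separate process, which is what actually keeps reading unblocked. The records and the `send_email` job shape are hypothetical.

```php
<?php
// Sketch: the per-record callback only enqueues a lightweight job
// description, so reading is not held up by slow work.
// SplQueue is an in-process stand-in for a real job queue.

$jobs = new SplQueue();

$onRecord = function (array $user) use ($jobs): void {
    // Cheap: note what needs doing, do not do it here.
    $jobs->enqueue(['type' => 'send_email', 'to' => $user['email']]);
};

// Hypothetical records standing in for parsed <user> elements.
foreach ([['email' => 'a@example.com'], ['email' => 'b@example.com']] as $user) {
    $onRecord($user);
}

// A worker (in practice a separate process) drains the queue later.
$sent = [];
while (!$jobs->isEmpty()) {
    $job = $jobs->dequeue();
    $sent[] = $job['to']; // the slow "send email" step would go here
}
```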

Practical input/code/output demos

XML

<bookstore>
    <book ISBN="10-000000-001">
        <title>The Iliad and The Odyssey</title>
        <price>12.95</price>
        <comments>
            <userComment rating="4">
                Best translation I've read.
            </userComment>
            <userComment rating="2">
                I like other versions better.
            </userComment>
        </comments>
    </book>
    [...]
</bookstore>

use Illuminate\Support\Collection;

StreamParser::xml("https://example.com/books.xml")->each(function(Collection $book){
    var_dump($book);
    var_dump($book->get('comments')->toArray());
});

class Tightenco\Collect\Support\Collection#19 (1) {
  protected $items =>
  array(4) {
    'ISBN' =>
    string(13) "10-000000-001"
    'title' =>
    string(25) "The Iliad and The Odyssey"
    'price' =>
    string(5) "12.95"
    'comments' =>
    class Tightenco\Collect\Support\Collection#17 (1) {
      protected $items =>
      array(2) {
        ...
      }
    }
  }
}
array(2) {
  [0] =>
  array(2) {
    'rating' =>
    string(1) "4"
    'userComment' =>
    string(27) "Best translation I've read."
  }
  [1] =>
  array(2) {
    'rating' =>
    string(1) "2"
    'userComment' =>
    string(29) "I like other versions better."
  }
}

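Additionally, you can use ->withSeparatedParametersList() to get each element's parameters (its XML attributes) as a separate list under the __params property. Also, ->withoutSkippingFirstElement() lets you parse the first element as well (usually the element that contains all the others).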
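To make the two layouts concrete, the hypothetical `flattenElement()` function below converts a single `<book>` element into an array, either with attributes merged alongside the child elements (as in the output above) or grouped under `__params`. It is a SimpleXML-based sketch, not the library's implementation, and the exact shape of the library's `__params` output is an assumption.

```php
<?php
// Hypothetical function illustrating the two attribute layouts; it is not
// stream-parser's implementation, and the __params shape is assumed.
function flattenElement(SimpleXMLElement $el, bool $separateParams): array
{
    // Attributes such as ISBN="..." become name => value pairs.
    $attributes = [];
    foreach ($el->attributes() as $name => $value) {
        $attributes[$name] = (string) $value;
    }

    // Simple child elements become name => text pairs.
    $children = [];
    foreach ($el->children() as $name => $child) {
        $children[$name] = trim((string) $child);
    }

    return $separateParams
        ? ['__params' => $attributes] + $children // attributes kept apart
        : $attributes + $children;                // attributes merged in
}

$book = simplexml_load_string(
    '<book ISBN="10-000000-001"><title>The Iliad and The Odyssey</title><price>12.95</price></book>'
);

$merged = flattenElement($book, false);
$separated = flattenElement($book, true);
```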

JSON

[
  {
    "title": "The Iliad and The Odyssey",
    "price": 12.95,
    "comments": [
      {"comment": "Best translation I've read."},
      {"comment": "I like other versions better."}
    ]
  },
  {
    "title": "Anthology of World Literature",
    "price": 24.95,
    "comments": [
      {"comment": "Needs more modern literature."},
      {"comment": "Excellent overview of world literature."}
    ]
  }
]

use Illuminate\Support\Collection;

StreamParser::json("https://example.com/books.json")->each(function(Collection $book){
    var_dump($book->get('comments')->count());
});

int(2)
int(2)
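Streaming a JSON array requires recognizing where each top-level element ends without decoding the whole document. The hypothetical `jsonArrayItems()` generator below sketches one way to do it: it tracks bracket depth and string state across arbitrarily split chunks and decodes each element with `json_decode` once it is complete. It is not stream-parser's parser, and it assumes the top-level elements are objects or arrays.

```php
<?php
// Sketch only (not stream-parser's parser): yields each top-level element of a
// JSON array as soon as it is complete. Assumes those elements are objects or
// arrays; top-level scalars are not handled.
function jsonArrayItems(iterable $chunks): Generator
{
    $depth = 0;        // nesting depth; 1 means "directly inside the outer array"
    $inString = false; // currently inside a string literal?
    $escaped = false;  // was the previous string character a backslash?
    $buffer = '';      // characters of the element being assembled

    foreach ($chunks as $chunk) {
        $length = strlen($chunk);
        for ($i = 0; $i < $length; $i++) {
            $char = $chunk[$i];

            if ($inString) {
                // Inside strings, brackets are plain text; track escapes and the end quote.
                $buffer .= $char;
                if ($escaped) {
                    $escaped = false;
                } elseif ($char === '\\') {
                    $escaped = true;
                } elseif ($char === '"') {
                    $inString = false;
                }
                continue;
            }

            if ($char === '{' || $char === '[') {
                $depth++;
                if ($depth === 1) {
                    continue; // opening bracket of the outer array
                }
            } elseif ($char === '}' || $char === ']') {
                $depth--;
                if ($depth === 0) {
                    continue; // closing bracket of the outer array
                }
            } elseif ($char === '"') {
                $inString = true;
            } elseif ($depth === 1) {
                continue; // commas and whitespace between elements
            }

            $buffer .= $char;

            // Back at depth 1 after a closing bracket: one element is complete.
            if ($depth === 1 && ($char === '}' || $char === ']')) {
                yield json_decode($buffer, true);
                $buffer = '';
            }
        }
    }
}

// The books document, split at arbitrary points to mimic network chunks.
$chunks = [
    '[{"title": "The Iliad and The Odyssey", "price": 12.95, "comments": [{"comment": "Best transl',
    'ation I\'ve read."}, {"comment": "I like other versions better."}]},',
    ' {"title": "Anthology of World Literature", "price": 24.95, "comments": [',
    '{"comment": "Needs more modern literature."}, {"comment": "Excellent overview of world literature."}]}]',
];

$counts = [];
foreach (jsonArrayItems($chunks) as $book) {
    $counts[] = count($book['comments']);
}
```

The first book is yielded as soon as its closing `}` arrives in the second chunk, before the second book has been read at all.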

CSV

title,price,comments
The Iliad and The Odyssey,12.95,"Best translation I've read.,I like other versions better."
Anthology of World Literature,24.95,"Needs more modern literature.,Excellent overview of world literature."

use Illuminate\Support\Collection;

StreamParser::csv("https://example.com/books.csv")->each(function(Collection $book){
    var_dump($book->get('comments')->last());
});

string(29) "I like other versions better."
string(39) "Excellent overview of world literature."
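The CSV case can be sketched with PHP's own `fgetcsv`, reading one row at a time and pairing it with the header row. Splitting the quoted `comments` field on commas mirrors the output shown above; that the library splits list-like fields exactly this way is an assumption.

```php
<?php
// Minimal sketch using PHP's fgetcsv, not stream-parser: each row is read,
// combined with the header, and handled before the next row is read.

$csv = <<<CSV
title,price,comments
The Iliad and The Odyssey,12.95,"Best translation I've read.,I like other versions better."
Anthology of World Literature,24.95,"Needs more modern literature.,Excellent overview of world literature."
CSV;

$handle = fopen('php://memory', 'r+');
fwrite($handle, $csv);
rewind($handle);

// Separator, enclosure and escape are passed explicitly.
$header = fgetcsv($handle, null, ',', '"', '');
$lastComments = [];
while (($row = fgetcsv($handle, null, ',', '"', '')) !== false) {
    $book = array_combine($header, $row);           // title => ..., price => ..., comments => ...
    $comments = explode(',', $book['comments']);    // assumed list splitting
    $lastComments[] = end($comments);
}
fclose($handle);
```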

License

This library is released under the MIT License.