pietercolpaert / hardf
A fast parser for RDF serializations such as Turtle, N-Triples, N-Quads, TriG and N3
Requires
- php: ^7.1|^8.0
Requires (Dev)
- friendsofphp/php-cs-fixer: *
- phpstan/phpstan: ^0.12.36
- phpunit/phpunit: ^7 || ^8 || ^9
This package is auto-updated.
Last update: 2024-08-26 08:14:23 UTC
README
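hardf is a PHP 7.1+ library that lets you handle Linked Data (RDF). It offers:
- Parsing triples/quads from Turtle, TriG, N-Triples, N-Quads, and Notation3 (N3)
- Writing triples/quads to Turtle, TriG, N-Triples, and N-Quads
Both the parser and the serializer have streaming support.
This library is a port of N3.js to PHP.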
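Triple representation
We use the triple representation of the N3.js library for NodeJS, ported to PHP. For more information, see https://github.com/rdfjs/N3.js/tree/v0.10.0#triple-representation
We deliberately prioritized performance over developer friendliness. We therefore implement this triple representation with associative arrays rather than PHP objects: what is an object in N3.js is an array here. For example: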
<?php
$triple = [
    'subject' => 'http://example.org/cartoons#Tom',
    'predicate' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
    'object' => 'http://example.org/cartoons#Cat',
    'graph' => 'http://example.org/mycartoon', #optional
];
Encode literals as follows (similar to N3.js):
'"Tom"@en-gb' // lowercase language
'"1"^^http://www.w3.org/2001/XMLSchema#integer' // no angle brackets <>
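To make the literal encoding concrete, here is a minimal sketch that splits such a string into its value, language tag, and datatype. The function name `splitLiteral` is hypothetical and not part of hardf; it only mirrors the string layout shown above and assumes the input is well formed.

```php
<?php
// Hypothetical helper (not part of hardf): splits a literal written in the
// string encoding above into its value, language tag, and datatype IRI.
function splitLiteral(string $literal): array
{
    // The value sits between the first quote and the LAST quote, so quotes
    // inside the value do not confuse the split.
    $end = strrpos($literal, '"');
    if ($literal === '' || $literal[0] !== '"' || $end === false || $end === 0) {
        throw new InvalidArgumentException("Not a literal: $literal");
    }
    $value = (string) substr($literal, 1, $end - 1);
    $suffix = (string) substr($literal, $end + 1);

    $language = null;
    $datatype = null;
    if (strncmp($suffix, '@', 1) === 0) {
        $language = substr($suffix, 1);   // e.g. "en-gb"
    } elseif (strncmp($suffix, '^^', 2) === 0) {
        $datatype = substr($suffix, 2);   // datatype IRI without <>
    }

    return ['value' => $value, 'language' => $language, 'datatype' => $datatype];
}
```

In real code, the library's own `Util` functions listed further down are the better choice; this sketch only shows what the encoding contains.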
Library functions
Install this library using composer:
composer require pietercolpaert/hardf
Writing
use pietercolpaert\hardf\TriGWriter;
A class that should be instantiated and can write TriG or Turtle.
Example use:
$writer = new TriGWriter([
    "prefixes" => [
        "schema" => "http://schema.org/",
        "dct" => "http://purl.org/dc/terms/",
        "geo" => "http://www.w3.org/2003/01/geo/wgs84_pos#",
        "rdf" => "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs" => "http://www.w3.org/2000/01/rdf-schema#"
    ],
    "format" => "n-quads" //Other possible values: trig or turtle
]);
$writer->addPrefix("ex","http://example.org/");
$writer->addTriple("schema:Person","dct:title","\"Person\"@en","http://example.org/#test");
$writer->addTriple("schema:Person","schema:label","\"Person\"@en","http://example.org/#test");
$writer->addTriple("ex:1","dct:title","\"Person1\"@en","http://example.org/#test");
$writer->addTriple("ex:1","http://www.w3.org/1999/02/22-rdf-syntax-ns#type","schema:Person","http://example.org/#test");
$writer->addTriple("ex:2","dct:title","\"Person2\"@en","http://example.org/#test");
$writer->addTriple("schema:Person","dct:title","\"Person\"@en","http://example.org/#test2");
echo $writer->end();
All methods
//The method names should speak for themselves:
$writer = new TriGWriter(["prefixes" => [ /* ... */]]);
$writer->addTriple($subject, $predicate, $object, $graph);
$writer->addTriples($triples);
$writer->addPrefix($prefix, $iri);
$writer->addPrefixes($prefixes);
//Creates a blank node ($predicate and/or $object are optional)
$writer->blank($predicate, $object);
//Creates an rdf:list with $elements
$list = $writer->addList($elements);
//Returns the output it is already able to create and clears the internal memory (useful for streaming)
$out .= $writer->read();
//Alternatively, you can listen for new chunks through a callback:
$writer->setReadCallback(function ($output) { echo $output; });
//Call this at the end. The return value will be the full triple output, or the rest of the output such as closing dots and brackets, unless a callback was set.
$out .= $writer->end();
//OR
$writer->end();
Parsing
Next to TriG, the TriGParser class also parses Turtle, N-Triples, N-Quads, and the W3C Team Submission N3.
All methods
$parser = new TriGParser($options, $tripleCallback, $prefixCallback);
$parser->setTripleCallback($function);
$parser->setPrefixCallback($function);
$parser->parse($input, $tripleCallback, $prefixCallback);
$parser->parseChunk($input);
$parser->end();
Basic examples for small files
Using the return value and passing it to a writer:
use pietercolpaert\hardf\TriGParser;
use pietercolpaert\hardf\TriGWriter;
$parser = new TriGParser(["format" => "n-quads"]); //also parses n-triples, n3, turtle and trig. Format is optional
$writer = new TriGWriter();
$triples = $parser->parse("<A> <B> <C> <G> .");
$writer->addTriples($triples);
echo $writer->end();
Using the callback and passing it to a writer:
$parser = new TriGParser();
$writer = new TriGWriter(["format"=>"trig"]);
$parser->parse("<http://A> <https://B> <http://C> <http://G> . <A2> <https://B2> <http://C2> <http://G3> .", function ($e, $triple) use ($writer) {
    if (isset($e)) {
        echo "Error occurred: " . $e;
    } else if (isset($triple)) {
        $writer->addTriple($triple);
        echo $writer->read(); //write out what we have so far
    } else { //no error and no triple flags the end of the file
        echo $writer->end(); //write the end
    }
});
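The callback in the example above is called in three situations: with an error, with a triple, or with neither once the input is finished. As a rough sketch of that contract, the hypothetical helper below (`makeTripleCollector` is not part of hardf) builds a callback that collects triples into an array and records when parsing has ended. Whether errors arrive as exceptions or as plain messages is an assumption, so the sketch handles both.

```php
<?php
// Hypothetical helper (not part of hardf): builds a callback with the
// ($error, $triple) signature used in the example above.
function makeTripleCollector(array &$triples, bool &$finished): callable
{
    return function ($error, $triple = null) use (&$triples, &$finished) {
        if (isset($error)) {
            // Assumption: the error is either a Throwable or a message string.
            throw $error instanceof Throwable
                ? $error
                : new RuntimeException((string) $error);
        }
        if (isset($triple)) {
            $triples[] = $triple; // one parsed triple (associative array)
            return;
        }
        $finished = true;         // neither error nor triple: end of input
    };
}

// Possible wiring, assuming $parser and $input exist as in the example above:
// $triples = []; $finished = false;
// $parser->parse($input, makeTripleCollector($triples, $finished));
```

Collecting into an array suits small inputs only; for large files, writing each triple out as it arrives keeps memory use flat.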
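Example using chunks and keeping prefixes
When you need to parse a large file, you will need to parse it chunk by chunk and process each chunk as it arrives. You can do that as follows: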
$writer = new TriGWriter(["format"=>"n-quads"]);
$tripleCallback = function ($error, $triple) use ($writer) {
    if (isset($error)) {
        throw $error;
    } else if (isset($triple)) {
        $writer->addTriple($triple);
        echo $writer->read();
    } else {
        //no error and no triple: the input has ended
        echo $writer->end();
    }
};
$prefixCallback = function ($prefix, $iri) use (&$writer) {
    $writer->addPrefix($prefix, $iri);
};
$parser = new TriGParser(["format" => "trig"], $tripleCallback, $prefixCallback);
$parser->parseChunk($chunk);
$parser->parseChunk($chunk);
$parser->parseChunk($chunk);
$parser->end(); //Needs to be called
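The example above leaves open where `$chunk` comes from. One possible way to produce chunks is to read a stream in fixed-size blocks. The helper below is a hypothetical sketch (`feedInChunks`, its parameters, and the file name in the comment are not part of hardf); it only depends on two callables, so it can be wired to `parseChunk` and `end` without knowing anything else about the parser.

```php
<?php
// Hypothetical helper (not part of hardf): reads a stream in fixed-size
// chunks, hands each chunk to $onChunk, then calls $onEnd exactly once.
// Returns the number of chunks that were delivered.
function feedInChunks($stream, callable $onChunk, callable $onEnd, int $chunkSize = 8192): int
{
    $chunks = 0;
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing left to read
        }
        $onChunk($chunk);
        $chunks++;
    }
    $onEnd();
    return $chunks;
}

// Possible wiring, assuming $parser was created with callbacks as above:
// $handle = fopen('large-file.trig', 'r'); // hypothetical file name
// feedInChunks($handle, [$parser, 'parseChunk'], [$parser, 'end']);
// fclose($handle);
```

The chunk size of 8192 bytes is an arbitrary default; a larger value means fewer calls at the cost of more memory per call.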
Parser options
- format: the input format (case-insensitive)
- blankNodePrefix (defaults to b0_): a prefix forced on blank node names, e.g. TriGParser(["blankNodePrefix" => 'foo']) will parse _:bar as _:foobar.
- documentIRI: sets the base URI used to resolve relative URIs (not applicable if format indicates n-triples or n-quads)
- lexer: allows the use of your own lexer class. A lexer must provide the following public methods:
  - tokenize(string $input, bool $finalize = true): array<array{'subject': string, 'predicate': string, 'object': string, 'graph': string}>
  - tokenizeChunk(string $input): array<array{'subject': string, 'predicate': string, 'object': string, 'graph': string}>
  - end(): array<array{'subject': string, 'predicate': string, 'object': string, 'graph': string}>
- explicitQuantifiers
- [...]
Empty document base IRI
Some Turtle and N3 documents may use IRIs that are relative to the base IRI (see here and here), e.g.
<> <someProperty> "some value" .
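To parse such documents properly, the document base IRI must be known. Otherwise we might end up with empty IRIs (e.g. for the subject in the example above).
Sometimes the base IRI is encoded in the document, e.g.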
@base <http://some.base/iri/> .
<> <someProperty> "some value" .
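But sometimes it is missing. In that case, the Turtle specification requires us to follow section 5.1 of RFC 3986: if the base IRI is not embedded in the document (section 5.1.1) and no encapsulating entity supplies one, it is taken from the document retrieval URI (section 5.1.3), e.g. the URL you downloaded the document from, or a file path converted to a URL. Unfortunately, the hardf parser cannot guess this; you must provide it with the documentIRI parser creation option, e.g.
$parser = new TriGParser(["documentIRI" => "http://some.base/iri/"]);
In short, if you run into the error subject/predicate/object on line X can not be parsed without knowing the the document base IRI.(...), initialize the parser with the documentIRI option.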
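When the document comes from a local file, the documentIRI value can be derived from the file path. The sketch below shows one way to do that. The helper `pathToFileIri` is hypothetical and not part of hardf, and it assumes an absolute POSIX path (Windows drive letters are not handled).

```php
<?php
// Hypothetical helper (not part of hardf): turns an absolute POSIX file path
// into a file:// IRI that can be passed as the documentIRI option.
function pathToFileIri(string $absolutePath): string
{
    if ($absolutePath === '' || $absolutePath[0] !== '/') {
        throw new InvalidArgumentException('Expected an absolute POSIX path');
    }
    // Percent-encode every path segment but keep the slashes between them.
    $segments = array_map('rawurlencode', explode('/', $absolutePath));
    return 'file://' . implode('/', $segments);
}

// Possible wiring, assuming $file holds the path of the document to parse:
// $parser = new TriGParser(["documentIRI" => pathToFileIri(realpath($file))]);
```

For documents fetched over HTTP, the download URL itself can be passed as documentIRI without conversion.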
Utility
use pietercolpaert\hardf\Util;
A static class with a few helpful functions for handling our specific triple representation. It helps you create and evaluate literals and IRIs, and expand prefixes.
$bool = Util::isIRI($term);
$bool = Util::isLiteral($term);
$bool = Util::isBlank($term);
$bool = Util::isDefaultGraph($term);
$bool = Util::inDefaultGraph($triple);
$value = Util::getLiteralValue($literal);
$literalType = Util::getLiteralType($literal);
$lang = Util::getLiteralLanguage($literal);
$bool = Util::isPrefixedName($term);
$expanded = Util::expandPrefixedName($prefixedName, $prefixes);
$iri = Util::createIRI($iri);
$literalObject = Util::createLiteral($value, $modifier = null);
For more information, see the documentation at https://github.com/RubenVerborgh/N3.js#utility.
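To illustrate what expanding a prefixed name means, here is a standalone sketch. It is not hardf's implementation, and the function name `expandWithPrefixes` is hypothetical; it assumes the prefix map has the same shape as the "prefixes" option of the writer (prefix => IRI).

```php
<?php
// Hypothetical stand-in (not hardf's implementation): expands "prefix:local"
// against a prefix map and leaves every other term untouched.
function expandWithPrefixes(string $term, array $prefixes): string
{
    $colon = strpos($term, ':');
    if ($colon === false) {
        return $term; // no colon, so nothing to expand
    }
    $prefix = substr($term, 0, $colon);
    if (!isset($prefixes[$prefix])) {
        return $term; // unknown prefix, e.g. the "http" of a full IRI
    }
    // Replace "prefix:" with the IRI registered for that prefix.
    return $prefixes[$prefix] . substr($term, $colon + 1);
}
```

The library's own functions may treat edge cases differently (for example literals that contain a colon), so prefer those in real code.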
Two executables
We also offer two simple tools in bin/ as example implementations: a validator and a translator. For example, try:
curl -H "accept: application/trig" http://fragments.dbpedia.org/2015/en | php bin/validator.php trig
curl -H "accept: application/trig" http://fragments.dbpedia.org/2015/en | php bin/convert.php trig n-triples
Performance
We compared performance on two Turtle files, parsing them with the EasyRDF library for PHP, the N3.js library for NodeJS, and hardf. These were the results:
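License, status and contributions
The hardf library is copyrighted by Ruben Verborgh and Pieter Colpaert and released under the MIT License.
Contributions are welcome, and bug reports or pull requests are always helpful. If you plan to implement a larger feature, it is best to discuss it first by filing an issue.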