README

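hardf is a PHP 7.1+ library that lets you handle Linked Data (RDF). It offers:

- Parsing triples/quads from Turtle, TriG, N-Triples, N-Quads, and Notation3 (N3)
- Writing triples/quads to Turtle, TriG, N-Triples, and N-Quads

Both the parser and the serializer support streaming.

This library is a port of N3.js to PHP.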

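Triple representation

We use the triple representation of the NodeJS N3.js library, ported to PHP. See https://github.com/rdfjs/N3.js/tree/v0.10.0#triple-representation for more information.

We deliberately prioritized performance over developer friendliness. We therefore implement this triple representation with associative arrays rather than PHP objects, so what is an object in N3.js is an array here. For example: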

<?php
$triple = [
    'subject' =>   'http://example.org/cartoons#Tom',
    'predicate' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
    'object' =>    'http://example.org/cartoons#Cat',
    'graph' =>     'http://example.org/mycartoon', #optional
    ];

Encode literals as follows (similar to N3.js):

'"Tom"@en-gb' // lowercase language
'"1"^^http://www.w3.org/2001/XMLSchema#integer' // no angular brackets <>
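
To make the literal encoding concrete, here is a minimal sketch in plain PHP that splits a literal string in this representation into its value, language, and datatype. The function name parseLiteralSketch is hypothetical and not part of hardf; the library's own Util class offers the real helpers. Lowercasing the language tag follows the "lowercase language" note above.

```php
<?php
// Hypothetical helper, not part of hardf: splits a literal string of the form
// '"value"', '"value"@lang' or '"value"^^datatypeIRI' into its parts.
function parseLiteralSketch(string $literal): ?array
{
    // The value is everything between the first and the last double quote.
    $end = strrpos($literal, '"');
    if ($literal === '' || $literal[0] !== '"' || $end === false || $end === 0) {
        return null; // not a literal in this representation
    }
    $value = substr($literal, 1, $end - 1);
    $suffix = substr($literal, $end + 1);

    $language = null;
    $datatype = null;
    if (strncmp($suffix, '@', 1) === 0) {
        $language = strtolower(substr($suffix, 1)); // language tags are kept lowercase
    } elseif (strncmp($suffix, '^^', 2) === 0) {
        $datatype = substr($suffix, 2); // datatype IRI, written without <>
    }
    return ['value' => $value, 'language' => $language, 'datatype' => $datatype];
}

$tom = parseLiteralSketch('"Tom"@en-gb');
$one = parseLiteralSketch('"1"^^http://www.w3.org/2001/XMLSchema#integer');
echo $tom['value'], ' / ', $tom['language'], PHP_EOL;
echo $one['value'], ' / ', $one['datatype'], PHP_EOL;
```

Anything that does not start with a double quote, such as an IRI, makes the sketch return null.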

Library functions

Install this library using composer:

composer require pietercolpaert/hardf

Writing

use pietercolpaert\hardf\TriGWriter;

A class that should be instantiated and can write TriG or Turtle.

Example use:

$writer = new TriGWriter([
    "prefixes" => [
        "schema" =>"http://schema.org/",
        "dct" =>"http://purl.org/dc/terms/",
        "geo" =>"http://www.w3.org/2003/01/geo/wgs84_pos#",
        "rdf" => "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs"=> "http://www.w3.org/2000/01/rdf-schema#"
        ],
    "format" => "n-quads" //Other possible values: trig or turtle
]);

$writer->addPrefix("ex","http://example.org/");
$writer->addTriple("schema:Person","dct:title","\"Person\"@en","http://example.org/#test");
$writer->addTriple("schema:Person","schema:label","\"Person\"@en","http://example.org/#test");
$writer->addTriple("ex:1","dct:title","\"Person1\"@en","http://example.org/#test");
$writer->addTriple("ex:1","http://www.w3.org/1999/02/22-rdf-syntax-ns#type","schema:Person","http://example.org/#test");
$writer->addTriple("ex:2","dct:title","\"Person2\"@en","http://example.org/#test");
$writer->addTriple("schema:Person","dct:title","\"Person\"@en","http://example.org/#test2");
echo $writer->end();

All methods

//The method names should speak for themselves:
$writer = new TriGWriter(["prefixes" => [ /* ... */]]);
$writer->addTriple($subject, $predicate, $object, $graph);
$writer->addTriples($triples);
$writer->addPrefix($prefix, $iri);
$writer->addPrefixes($prefixes);
//Creates blank node($predicate and/or $object are optional)
$writer->blank($predicate, $object);
//Creates rdf:list with $elements
$list = $writer->addList($elements);

//Returns the output produced so far and clears the internal buffer (useful for streaming)
$out .= $writer->read();
//Alternatively, you can listen for new chunks through a callback:
$writer->setReadCallback(function ($output) { echo $output; });

//Call this at the end. The return value will be the full triple output, or the rest of the output such as closing dots and brackets, unless a callback was set.
$out .= $writer->end();
//OR
$writer->end();

Parsing

Besides TriG, the TriGParser class also parses Turtle, N-Triples, N-Quads, and the W3C Team Submission N3.

All methods

$parser = new TriGParser($options, $tripleCallback, $prefixCallback);
$parser->setTripleCallback($function);
$parser->setPrefixCallback($function);
$parser->parse($input, $tripleCallback, $prefixCallback);
$parser->parseChunk($input);
$parser->end();

Basic examples for small files

Using return values and passing these to a writer:

use pietercolpaert\hardf\TriGParser;
use pietercolpaert\hardf\TriGWriter;
$parser = new TriGParser(["format" => "n-quads"]); //also parses n-triples, n3, turtle and trig. Format is optional
$writer = new TriGWriter();
$triples = $parser->parse("<A> <B> <C> <G> .");
$writer->addTriples($triples);
echo $writer->end();

Using callbacks and passing these to a writer:

$parser = new TriGParser();
$writer = new TriGWriter(["format"=>"trig"]);
$parser->parse("<http://A> <https://B> <http://C> <http://G> . <A2> <https://B2> <http://C2> <http://G3> .", function ($e, $triple) use ($writer) {
    if (!isset($e) && isset($triple)) {
        $writer->addTriple($triple);
        echo $writer->read(); //write out what we have so far
    } else if (!isset($triple))      // flags the end of the file
        echo $writer->end();  //write the end
    else
        echo "Error occurred: " . $e;
});

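Example using chunks and keeping prefixes

When you need to parse a large file, you have to parse it chunk by chunk and process each chunk as it arrives. You can do that as follows: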

$writer = new TriGWriter(["format"=>"n-quads"]);
$tripleCallback = function ($error, $triple) use ($writer) {
    if (isset($error)) {
        throw $error;
    } else if (isset($triple)) {
        $writer->addTriple($triple); //hand the parsed triple to the writer
        echo $writer->read(); //write out what we have so far
    } else {
        echo $writer->end(); //no error and no triple: end of input
    }
};
$prefixCallback = function ($prefix, $iri) use (&$writer) {
    $writer->addPrefix($prefix, $iri);
};
$parser = new TriGParser(["format" => "trig"], $tripleCallback, $prefixCallback);
$parser->parseChunk($chunk);
$parser->parseChunk($chunk);
$parser->parseChunk($chunk);
$parser->end(); //Needs to be called
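
The $chunk variable above has to come from somewhere. As a rough sketch, assuming the input is a local file, a generator can yield fixed-size chunks that are then handed to parseChunk one by one. The function name readChunks and the 8192-byte default are illustrative choices and not part of hardf; the sketch also makes no attempt to align chunk boundaries with line ends.

```php
<?php
// Hypothetical helper, not part of hardf: lazily yields a file in fixed-size chunks.
function readChunks(string $path, int $chunkSize = 8192): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (!feof($handle)) {
            $chunk = fread($handle, $chunkSize);
            if ($chunk === false) {
                throw new RuntimeException("Cannot read from $path");
            }
            if ($chunk !== '') {
                yield $chunk;
            }
        }
    } finally {
        fclose($handle); // runs even if the consumer stops iterating early
    }
}

// Demonstration on a temporary file. With hardf installed, the loop body
// would call $parser->parseChunk($chunk), followed by $parser->end() after the loop.
$path = tempnam(sys_get_temp_dir(), 'hardf');
file_put_contents($path, "<http://A> <http://B> <http://C> .\n");
foreach (readChunks($path, 16) as $chunk) {
    echo strlen($chunk), " bytes\n";
}
unlink($path);
```

Using a generator keeps only one chunk in memory at a time, which matches the streaming use case described above.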

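Parser options

format: input format (case-insensitive)
- if not provided or not matching any of the options below, any Turtle, TriG, N-Triples or N-Quads input can be parsed (but not N3)
- turtle - Turtle
- trig - TriG
- contains triple, e.g. triple, ntriples, N-Triples - N-Triples
- contains quad, e.g. quad, nquads, N-Quads - N-Quads
- contains n3, e.g. n3 - N3
blankNodePrefix (defaults to b0_): prefix forced on blank node names, e.g. TriGParser(["blankNodePrefix" => 'foo']) will parse _:bar as _:foobar.
documentIRI: sets the base URI used to resolve relative URIs (not applicable if format indicates n-triples or n-quads)
lexer: allows the use of your own lexer class. A lexer must provide the following public methods: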
- tokenize(string $input, bool $finalize = true): array<array{'subject': string, 'predicate': string, 'object': string, 'graph': string}>
- tokenizeChunk(string $input): array<array{'subject': string, 'predicate': string, 'object': string, 'graph': string}>
- end(): array<array{'subject': string, 'predicate': string, 'object': string, 'graph': string}>
explicitQuantifiers - [...]
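
The matching rules for the format option can be sketched in plain PHP as follows. This only illustrates the rules listed above and is not the parser's actual code; the function name describeFormat is hypothetical. Matching turtle and trig exactly rather than by substring, and the order of the substring checks, are assumptions.

```php
<?php
// Hypothetical illustration of the format rules listed above; not hardf's own code.
function describeFormat(?string $format): string
{
    $format = strtolower((string) $format); // the option is case-insensitive
    if ($format === 'turtle') {
        return 'Turtle';
    }
    if ($format === 'trig') {
        return 'TriG';
    }
    if (strpos($format, 'triple') !== false) {
        return 'N-Triples';
    }
    if (strpos($format, 'quad') !== false) {
        return 'N-Quads';
    }
    if (strpos($format, 'n3') !== false) {
        return 'N3';
    }
    // Missing or unrecognized: any of Turtle, TriG, N-Triples or N-Quads, but not N3.
    return 'Turtle, TriG, N-Triples or N-Quads';
}

foreach (['N-Triples', 'nquads', 'TriG', 'n3', null] as $format) {
    echo var_export($format, true), ' => ', describeFormat($format), PHP_EOL;
}
```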

Empty document base IRI

Some Turtle and N3 documents use IRIs that are relative to the base IRI, e.g.

<> <someProperty> "some value" .

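To parse such documents correctly, the document base IRI must be known. Otherwise we may end up with empty IRIs (e.g. for the subject in the example above).

Sometimes the base IRI is encoded in the document, e.g.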

@base <http://some.base/iri/> .
<> <someProperty> "some value" .

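But sometimes it is missing. In that case the Turtle specification requires us to follow section 5.1 of RFC 3986, which says that if the base IRI is not embedded in the document, it should be assumed to be the document retrieval URI (e.g. the URL you downloaded the document from, or a file path converted to a URL). Unfortunately the hardf parser cannot guess this, so you have to provide it using the documentIRI parser creation option, e.g.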

$parser = new TriGParser(["documentIRI" => "http://some.base/iri/"]);

In short, if you run into the error subject/predicate/object on line X can not be parsed without knowing the the document base IRI.(...), initialize the parser with the documentIRI option.
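
When the document comes from a local file, one way to obtain a value for documentIRI is to turn the file path into a file:// URL. The sketch below assumes an absolute POSIX-style path; the function name pathToDocumentIri is hypothetical and not part of hardf.

```php
<?php
// Hypothetical helper, not part of hardf: converts an absolute POSIX path to a file:// IRI.
function pathToDocumentIri(string $absolutePath): string
{
    if ($absolutePath === '' || $absolutePath[0] !== '/') {
        throw new InvalidArgumentException('Expected an absolute POSIX path');
    }
    // Percent-encode each path segment, but keep the slashes that separate them.
    $segments = array_map('rawurlencode', explode('/', $absolutePath));
    return 'file://' . implode('/', $segments);
}

// The options array is what would be passed to the parser constructor.
$options = ['documentIRI' => pathToDocumentIri('/data/my cartoons/tom.ttl')];
echo $options['documentIRI'], PHP_EOL;
```

For documents fetched over HTTP, the download URL itself can be used as the documentIRI value instead.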

Utility

use pietercolpaert\hardf\Util;

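A static class with a number of helpful functions for handling our specific triple representation. It helps you create and inspect literals and IRIs, and expand prefixes.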

$bool = Util::isIRI($term);
$bool = Util::isLiteral($term);
$bool = Util::isBlank($term);
$bool = Util::isDefaultGraph($term);
$bool = Util::inDefaultGraph($triple);
$value = Util::getLiteralValue($literal);
$literalType = Util::getLiteralType($literal);
$lang = Util::getLiteralLanguage($literal);
$bool = Util::isPrefixedName($term);
$expanded = Util::expandPrefixedName($prefixedName, $prefixes);
$iri = Util::createIRI($iri);
$literalObject = Util::createLiteral($value, $modifier = null);

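See the documentation at https://github.com/RubenVerborgh/N3.js#utility for more information.

Two executables

We also offer two simple tools in bin/ as example implementations: a validator and a translator. Try for example: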

curl -H "accept: application/trig" http://fragments.dbpedia.org/2015/en | php bin/validator.php trig
curl -H "accept: application/trig" http://fragments.dbpedia.org/2015/en | php bin/convert.php trig n-triples

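Performance

We compared performance on two Turtle files, parsing them with the EasyRDF library for PHP, the N3.js library for NodeJS, and hardf. These were the results:

License, status and contributions

Contributions are welcome, and bug reports or pull requests are always helpful. If you plan to implement a larger feature, it is best to discuss it first by filing an issue.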
