frictionlessdata/tableschema

A utility library for working with Table Schema

v1.2.1 2024-08-26 09:54 UTC


Last update: 2024-08-26 09:58:27 UTC


README


A utility library for working with Table Schema in PHP.

Features summary and usage guide

Installation

$ composer require frictionlessdata/tableschema

The Table class iterates over data that conforms to a Table Schema.

Instantiate a Table object from a data source and a Table Schema.

use frictionlessdata\tableschema\Table;

$table = new Table("tests/fixtures/data.csv", ["fields" => [
    ["name" => "first_name"],
    ["name" => "last_name"],
    ["name" => "order"]
]]);

The schema can be any argument that the Schema constructor accepts (see below), so you can also pass a URL or a file name that contains the schema.

$table = new Table("tests/fixtures/data.csv", "tests/fixtures/data.json");

Iterate over the data; all values are cast and validated according to the schema.

foreach ($table as $row) {
    print($row["order"]." ".$row["first_name"]." ".$row["last_name"]."\n");
}

The validate function validates the schema and also reads a sample of the data to validate it against that schema.

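// Assumed namespace for CsvDataSource; $schema is any value accepted by the Schema constructor.
use frictionlessdata\tableschema\DataSources\CsvDataSource;

Table::validate(new CsvDataSource("http://invalid.data.source/"), $schema);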

You can also instantiate a Table object without a schema, in which case the schema is inferred automatically from the data.

$table = new Table("tests/fixtures/data.csv");
$table->schema()->fields();  // ["first_name" => StringField, "last_name" => StringField, "order" => IntegerField]

Optionally, specify a CSV Dialect.

$table = new Table("tests/fixtures/data.csv", null, ["delimiter" => ";"]);
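To see why the delimiter matters, the following minimal sketch parses one line with PHP's built-in str_getcsv(). It does not use the library's own parser, and the sample line is made up for illustration.

```php
<?php
// Illustrative sample line; not taken from the library's fixtures.
$line = 'Jane;Doe;1';

// Parsed with a comma delimiter, the whole line is a single cell.
$asComma = str_getcsv($line, ',', '"', '\\');

// Parsed with ";" as the delimiter, the line splits into three cells.
$asSemicolon = str_getcsv($line, ';', '"', '\\');

var_dump(count($asComma));      // int(1)
var_dump(count($asSemicolon));  // int(3)
```

A file written with ";" separators therefore needs the matching dialect option, otherwise every row would be read as one column.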

The Table::read method returns all the data as an array. It also supports options that modify the reader's behavior.

$table->read();  // returns all the data as an array

read accepts an options argument, for example

$table->read(["cast" => false, "limit" => 5]);

The following options are available (the values shown are the defaults)

$table->read([
    "keyed" => true,  // flag to emit keyed rows
    "extended" => false,  // flag to emit extended rows
    "cast" => true,  // flag to disable data casting if false
    "limit" => null,  // integer limit of rows to return
]);
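As a rough illustration of what "keyed" rows and a row "limit" mean, here is a minimal sketch in plain PHP. It is not the library's implementation; the headers match the earlier example, while the row values are invented for the illustration.

```php
<?php
// Headers as in the example above; the row values are made up.
$headers = ['first_name', 'last_name', 'order'];
$rawRows = [
    ['Jane', 'Doe', '1'],
    ['John', 'Roe', '2'],
    ['Ann', 'Poe', '3'],
];

// Keyed rows: each value is addressed by its header name.
$keyed = array_map(
    fn(array $values): array => array_combine($headers, $values),
    $rawRows
);

// A row limit keeps only the first N rows.
$limited = array_slice($keyed, 0, 2);

echo $limited[1]['first_name'], "\n";  // John
echo count($limited), "\n";            // 2
```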

Additional methods and functionality

$table->headers();  // ["first_name", "last_name", "order"]
$table->save("output.csv");  // iterate over all the rows and save them to a csv file
$table->schema();  // get the Schema object
$table->read();  // returns all the data as an array

Schema

The Schema class provides helpful methods for working with a Table Schema and related data.

use frictionlessdata\tableschema\Schema;

A Schema object can be constructed from any of the following

  • a php array (or object)
$schema = new Schema([
    'fields' => [
        [
            'name' => 'id', 'title' => 'Identifier', 'type' => 'integer', 
            'constraints' => [
                "required" => true,
                "minimum" => 1,
                "maximum" => 500
            ]
        ],
        ['name' => 'name', 'title' => 'Name', 'type' => 'string'],
    ],
    'primaryKey' => 'id'
]);
  • a string containing json
$schema = new Schema("{
    \"fields\": [
        {\"name\": \"id\"},
        {\"name\": \"height\", \"type\": \"integer\"}
    ]
}");
  • a string containing a URL or file name from which the schema is loaded
$schema = new Schema("https://raw.githubusercontent.com/frictionlessdata/testsuite-extended/ecf1b2504332852cca1351657279901eca6fdbb5/datasets/synthetic/schema.json");
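The array form and the JSON string form describe the same descriptor. The following minimal sketch uses only PHP's built-in JSON functions (no library calls; the variable names are illustrative) to convert between the two.

```php
<?php
// A Table Schema descriptor expressed as a PHP array.
$descriptor = [
    'fields' => [
        ['name' => 'id', 'type' => 'integer'],
        ['name' => 'name', 'type' => 'string'],
    ],
    'primaryKey' => 'id',
];

// The same descriptor as a JSON string.
$json = json_encode($descriptor, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);

// Decoding the JSON gives back an identical array.
$roundTrip = json_decode($json, true);
var_dump($roundTrip === $descriptor);  // bool(true)
```

This can be handy when a descriptor is built in code but needs to be stored or sent as JSON.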

The schema is loaded, parsed and validated; an exception is raised if there is any problem.

Access the schema data, which is guaranteed to conform to the specification

$schema->missingValues(); // [""]
$schema->primaryKey();  // ["id"]
$schema->foreignKeys();  // []
$schema->fields(); // ["id" => IntegerField, "name" => StringField]
$field = $schema->field("id");  // Field object (See Field reference below)

The validate function accepts the same arguments as the Schema constructor, but returns a list of errors instead of raising an exception.

// the validate function accepts the same arguments as the Schema constructor
$validationErrors = Schema::validate("http://invalid.schema.json");
foreach ($validationErrors as $validationError) {
    print($validationError->getMessage());
}

Validate and cast a row of data according to the schema

$row = $schema->castRow(["id" => "1", "name" => "First Name"]);

An exception is raised if the row fails validation.

Otherwise the returned row contains all values cast to their native types.

$row;  // ["id" => 1, "name" => "First Name"]

Validate a row to get a list of errors

$schema->validateRow(["id" => "foobar"]);  // ["id is not numeric", "name is required" .. ]

Infer a schema from source data

$schema = Schema::infer("tests/fixtures/data.csv");
$schema->fields();  // ["first_name" => StringField, "last_name" => StringField, "order" => IntegerField]

You can also create a new, empty schema for editing

$schema = new Schema();

Set the fields

$schema->fields([
    "id" => (object)["type" => "integer"],
    "name" => (object)["type" => "string"],
]);

The appropriate Field object is created from the given descriptor (see the Field class reference below)

$schema->field("id");  // IntegerField object

Add, update or remove fields

$schema->field("email", ["type" => "string", "format" => "email"]);
$schema->field("name", ["type" => "string"]);
$schema->removeField("name");

Set or update other Table Schema attributes

$schema->primaryKey(["id"]);

After every change the schema is validated, and an exception is raised if there are validation errors.

Finally, you can get the full validated descriptor

$schema->fullDescriptor();

And save it to a json file

$schema->save("my-schema.json");

Field

The Field class represents a single Table Schema field descriptor.

Create a field from a descriptor

use frictionlessdata\tableschema\Fields\FieldsFactory;
$field = FieldsFactory::field([
    "name" => "id", "type" => "integer",
    "constraints" => ["required" => true, "minimum" => 5]
]);

Use the field to cast and validate values

$field->castValue("3");  // exception: value is below minimum
$field->castValue("7");  // 7
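The behavior above (cast a raw string to an integer, then enforce the constraints) can be sketched in plain PHP. The helper castIntegerSketch() is hypothetical and is not part of the library; it only illustrates the idea for the "minimum" and "maximum" constraints.

```php
<?php
// Hypothetical helper, NOT part of the library: casts a raw string to an
// integer and enforces optional "minimum" / "maximum" constraints.
function castIntegerSketch(string $raw, array $constraints = []): int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT);
    if ($value === false) {
        throw new InvalidArgumentException("'$raw' is not an integer");
    }
    if (isset($constraints['minimum']) && $value < $constraints['minimum']) {
        throw new InvalidArgumentException("$value is below minimum");
    }
    if (isset($constraints['maximum']) && $value > $constraints['maximum']) {
        throw new InvalidArgumentException("$value is above maximum");
    }
    return $value;
}

$constraints = ['minimum' => 5];
var_dump(castIntegerSketch('7', $constraints));  // int(7)

try {
    castIntegerSketch('3', $constraints);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";  // 3 is below minimum
}
```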

Additional methods for accessing field data

$field->format();  // "default"
$field->name();  // "id"
$field->type();  // "integer"
$field->constraints();  // (object)["required"=>true, "minimum"=>5]
$field->enum();  // []
$field->required();  // true
$field->unique();  // false
$field->title();  // e.g. "Id" (or null if not provided in descriptor)
$field->description();  // e.g. "The ID" (or null if not provided in descriptor)
$field->rdfType();  // e.g. "http://schema.org/Thing" (or null if not provided in descriptor)

Contributing

Please read the contribution guidelines: How to Contribute