prewk/snapper-php

匿名化(反)序列化

3.2.6 2018-02-25 17:14 UTC

README

将关系型数据库中的行转换为序列化快照,这些快照可以在稍后时间反序列化成新的行,同时保留复杂的关系。

示例

从数据库到序列化器

// Parent table "parents"
[
  "id" => 1,
  "name" => "The parent",
  "favorite_child" => 2 // Note: Circular dependency
],
// Child table "children"
[
  "id" => 1,
  "parent_id" => 1,
  "description" => "I'm child A"
],
// Child table "children"
[
  "id" => 2,
  "parent_id" => 1, // Note: Circular dependency
  "description" => "I'm child B"
]

从序列化器出来

[
  "op" => "INSERT",
  "type" => "parents",
  "rows" => [ 
    [
      "id" => "dee78c67-7c0b-4750-9f44-414f5a45006f",
      "name" => "The parent",
      "favorite_child" => null
    ]
  ]
],
[
  "op" => "INSERT",
  "type" => "children",
  "rows" => [
    [
      "id" => "3d228dca-11e2-43ec-bb03-ea4dace489f7",
      "parent_id" => "dee78c67-7c0b-4750-9f44-414f5a45006f",
      "description" => "I'm child A"
    ],
    [
      "id" => "9eebf63a-69a5-42c7-b1fb-81a5e2058ec9",
      "parent_id" => "dee78c67-7c0b-4750-9f44-414f5a45006f",
      "description" => "I'm child B"
    ]    
  ]
],
[
  "op" => "UPDATE",
  "type" => "parents",
  "rows" => [
    [
      "id" => "dee78c67-7c0b-4750-9f44-414f5a45006f",
      "favorite_child" => "9eebf63a-69a5-42c7-b1fb-81a5e2058ec9"
    ]
  ]
]

再次反序列化到数据库

INSERT INTO parents (name, favorite_child) VALUES ("The parent", NULL); (LAST_INSERT_ID 555)
INSERT INTO children (parent_id, description) VALUES (555, "I'm child A"); (LAST_INSERT_ID 333)
INSERT INTO children (parent_id, description) VALUES (555, "I'm child B"); (LAST_INSERT_ID 334)
UPDATE parents SET favorite_child = 334 WHERE id = 555;

如何序列化?

<?php
use Prewk\Snapper;

// Define a recipe defining the fields and their references
$r = new Snapper\Recipe;

$recipe = [
  "parents" => $r
    ->primary("id") // Primary key at field "id"
    ->ingredients([
      "name" => $r->value(), // Field "name" is just a value
      // Field "favorite_child" has a circular dependency to the "children" table
      "favorite_child" => $r->circular(
        // Define the relationship and values considered "no relationship"
        $r->ref("children")->optional(null),
        // Fallback to that value until the circular relationship can be resolved
        $r->raw(null)
      ),
    ]),
  "children" => $r
    ->primary("id") // Primary key at field "id"
    ->ingredients([
      "parent_id" => $r->ref("parents"), // Field "parent_id" is referencing the "parents" table
      "description" => $r->value() // Field "description" is just a value
    ])
];

// Create a serializer
$serializer = new Snapper\Serializer(
  new Snapper\Sorter,
  new Snapper\Serializer\SerializationBookKeeper,
  $recipe
);

// Feed the serializer with database rows 
$serializer->add("parents", [
  "id" => 1,
  "name" => "The parent",
  "favorite_child" => 2
]);
$serializer->add("children", [
  "id" => 1,
  "parent_id" => 1,
  "description" => "I'm child A"
]);
$serializer->add("children", [
  "id" => 2,
  "parent_id" => 1,
  "description" => "I'm child B"
]);

// Serialize into a snapshot
$serialization = $serializer->compile()->getOps();

如何反序列化?

<?php
use Prewk\Snapper;

// $recipe = <Same as above>
// $serialization = <The snapshot>
// $dbh = <PDO handle>

// Create inserters
$inserters = [
  "parents" => function(array $rows) use ($dbh) {
    $ids = [];
    
    foreach ($rows as $row) {
      $stmt = $dbh->prepare("INSERT INTO parents (name, favorite_child) VALUES (:name, :favorite_child)");
      $stmt->execute([
        ":name" => $row["name"],
        ":favorite_child" => $row["favorite_child"]
      ]);
    
      $ids[] = $dbh->lastInsertId();        
    }
    
    return $ids;
  },
  "children" => function(array $rows) use ($dbh) {
    $ids = [];
    
    foreach ($rows as $row) {
      $stmt = $dbh->prepare("INSERT INTO children (parent_id, description) VALUES (:parent_id, :description)");
      $stmt->execute([
        ":parent_id" => $row["parent_id"],
        ":description" => $row["description"]
      ]);
    
      $ids[] = $dbh->lastInsertId();
    }
    
    return $ids;
  }
];

// Create updaters
$updaters = [
  "parents" => function(array $rows) use ($dbh) {
    foreach ($rows as $row) {
      $stmt = $dbh->prepare("UPDATE parents SET favorite_child=:favorite_child WHERE id=:id");
      $stmt->execute([
        ":id" => $row["id"],
        ":favorite_child" => $row["favorite_child"]
      ]);
    }    
  },
  "children" => null, // Won't be called in this example
];

// Create a deserializer
$deserializer = new Snapper\Deserializer(
  new Snapper\DeserializationBookKeeper,
  $recipes,
  $inserters,
  $updaters
);

// Deserialize
$deserializer->deserialize($serialization);

成分类型

<?php
[
  "foo" => $recipe->value()
]

通过字段的值。

原始

<?php
[
  "foo" => $recipe->raw(123)
]

强制字段的值。

引用

<?php
[
  "foo_id" => $recipe->ref("foos"),
  "bar_id" => $recipe->ref("bars")->optional(0, null)
]

字段是外键,引用另一个表。使用 optional 方法传递参数,当与遇到的字段值相等时,被视为非引用,并且像 Value 一样传递。

形态

<?php
[
  "foo_type" => $recipe->value(),
  "foo_id" => $recipe->morph("foo_type", function(\Prewk\Snapper\Ingredients\Morph\MorphMapper $mapper) {
    return $mapper
      ->on("FOO", "foos")
      ->on("BAR", "bars");
  })->optional(null)
]

使用两个字段指定多态关系,一个类型(foo_type)和一个id(foo_id),并将 foo_type 中的值映射到其他表中。

需要两个字段都存在才能正常工作,支持可选值,当匹配时忽略外键关系并直接传递值。

匹配

<?php
[
  "type" => $recipe->value(),
  "varies" => $recipe->match("type", function(\Prewk\Snapper\Ingredients\Match\MatchMapper $mapper) us ($recipe) {
    return $mapper
      ->on("FOO", $recipe->ref("foos"))
      ->pattern("/BAR/", $recipe->ref("bars"))
      ->default($recipe->value());
  })
]

查看给定的字段(例如上例中的 type),并根据其值变成不同的成分。

  • on: 精确匹配
  • pattern: 正则表达式匹配
  • default: 回退

如果没有任何匹配成功且没有提供 default,则该字段将不会被包含。

JSON

<?php
use \Prewk\Snapper\Ingredients\Json;

[
  "data" => $recipe->json(function(Json\JsonRecipe $json) {
    return $json
      // Match { "foo": { "bar": { "baz": <value> } } }
      ->path("foo.bar.baz", function(Json\MatchedJson $matched) {
        return $matched
          ->ref("bazes")->optional(null, 0); // Treat null and 0 as value instead of reference
      })
      // Match { "quxes": [<value>, <value>, <value>, <value>] }
      ->pattern("/quxes\\.\\d+$/", function(Json\MatchedJson $matched) {
        return $matched
          ->ref("quxes");
      })
      // Match { "content": <value> }
      ->path("content", function(Json\MatchedJson $matched) {
        return $matched
          // Match { "content": "Lorem ipsum qux:=123= dolor qux:=456= amet" }
          ->pattern("qux:=(.*?)=", function(
            Json\PatternReplacer $replacer,
            string $replacement
          ) {
            // Here we tell the recipe about what references we found and
            // teach it to search and replace them
            return $replacer->replace(
              "quxes",
              1, // Refers to the index of the resulting preg, so: $matches[1]
              "qux:=$replacement="
            );
          });
      });
  })
]

定义嵌套在JSON中的引用。

在JSON中的字符串和整数引用上

此库是为正常的整数键(AUTO_INCREMENT)情况构建的,但在序列化引用时,它们将被转换为UUIDs(v4)。它们看起来像这样:a0ff60f5-87fe-4d4e-855b-8993f1c3b065

这会在JSON中引起问题...

{ "foo_id": 123 }

...被序列化成...

{ "foo_id": "a0ff60f5-87fe-4d4e-855b-8993f1c3b065" }

...再回到整数键

{ "foo_id": "456" }

"456" 并不严格等于 456。因此,反序列化逻辑如下

  1. 如果从插入器返回的id 是数字,则所有 "UUID" 将被替换为 INSERT_ID
  2. 所有 UUID 将被替换为 INSERT_ID

循环

<?php
[
  "foo_id" => $recipe->circular(
    $recipe->ref("foos")->optional(0),
    $recipe->raw(0)
  )
]

如果您的两个表互相包含循环引用,请将其中一个引用包裹在一个 Circular 成分中。指定一个有效的回退作为可选值,并将其指定为回退 Raw 成分。

结果序列化将以包含回退值的 INSERT 操作开始,并以包含实际引用的 UPDATE 操作结束。

事件

onDeps

<?php
$deserializer->onDeps("foos", function(string $dependeeType, $dependeeId, $dependencyId) {
  // Every time a dependency of type "foos" has been deserialized, this closure will be called
});

覆盖配方/更新器/插入器

<?php

$serializer
  ->setRecipe("foos", $fooRecipe)
  ->setRecipe("bars", $barRecipe);

$deserializer
  ->setRecipe("foos", $fooRecipe)
  ->setInserter("foos", $fooInserter)
  ->setUpdater("foos", $fooUpdater);

配方JSON

可以将配方转换为/从JSON转换,这可能很有用。

<?php
use Prewk\Snapper;

$someRecipe = $r->primary("id")->ingredients(["name" => $r->value()]);

$json = json_encode($someRecipe);

file_put_contents("recipe.json", $json);

$json = file_get_contents("recipe.json");

// Note: decode to associative array
$someRecipe = Snapper\Recipe::fromArray(json_decode($json, true));

验证

验证序列化

检查序列化中的不可解析的引用

<?php
use Prewk\Snapper;

$validator = new Snapper\Validator(new DeserializationBookKeeper, $recipes);

$isValid = $validator->validate($serialization);

验证JSON配方

可以使用JSON模式验证器验证配方。

<?php
use Prewk\Snapper;
use JsonSchema\Validator as JsonValidator; // https://github.com/justinrainbow/json-schema

$validator = new Snapper\SchemaValidator(new JsonValidator);

$json = file_get_contents("recipe.json");

// Note: don't decode to associative array
$validator->validate(json_decode($json));

批量插入/更新

如果需要优化,则将调用插入/更新闭包,以批处理行形式,每行可以在一个SQL操作中执行。

<?php
$inserters = [
  "foos" => function(array $rows) use ($db) {
    $allValues = [];
    $vars = [];
    foreach ($rows as $index => $row) {
      $values = [];

      foreach ($row as $field => $value) {
        $vars[":" . $field . "_" . $index] = $value;
        $values[] = ":" . $field . "_" . $index;
      }

      $allValues[] = "(" . implode(", ", $values) . ")";
    }
    
    /*
     * $rows = [
     *   ["some_field" => "foo", "another_field" => "bar"],
     *   ["some_field" => "baz", "another_field" => "qux"],
     *   ["some_field" => "lorem", "another_field" => "ipsum"]
     * ]
     * 
     * -->
     * 
     * INSERT INTO foos (some_field, another_field) VALUES
     *   ("foo", "bar"),
     *   ("baz", "qux")
     *   ("lorem", "ipsum")
     */
    $insert = "INSERT INTO foos (some_field, another_field) VALUES " . implode(", ", $allValues);
    $stmt = $db->prepare($insert);
    $stmt->execute($vars);
    
    $lastId = $db->lastInsertId();
    
    // If last insert id is 666, then return [664, 665, 666] 
    return range($lastId - count($rows) + 1, $lastId);
  },
];

以下规则适用于插入器的返回值

  • 返回void是可以接受的,但如果后续行依赖于跳过的主键,则序列化将失败
  • 要返回主键,请返回与 $rows 相同长度的数组,其他任何内容都是无效的

批量分组逻辑将以下条件之一视为“开始新的操作批处理”

  • 遇到新的行类型(表)
  • 该行依赖于同一批处理中较早行的主键
  • 字段的确切数量或名称在每一行之间都发生了变化

Id清单

编译的结果($serializer->compile())是一个Serialization。它有两个方法

  • getOps() - 获取一个表示操作序列的数组(如果你愿意,就是实际的“序列化”)
  • getIdManifest() - 获取一个字典,将序列化中的内部uuid映射到给定的数据库id,按类型(表名)分组

为什么?

这个库对于提供多租户服务的快照功能很有用,其中用户拥有一组或多组具有复杂内部关系的复杂数据。

操作指南

我不想创建表X,因为它已经存在

反序列化器不关心插入器闭包的内部逻辑,它只要求你返回主键

$somethingId = 123; // Predetermined

$deserializer->setInserter("something", function(array $row) use ($somethingId) {
  // Maybe you don't want to do anything to the database here or maybe
  // you want to do an UPDATE instead of an INSERT - up to you
  
  return $somethingId;
});

我需要序列化行元数据

只需在配方中使用$recipe->value(),并将任何你想要的字段输入Serializer,它最终会出现在快照中。

在反序列化时,在你的插入器中选择要插入数据库的字段。

我有复合键

目前不支持,一种解决方案是添加一个唯一的id,并假装它是主键。

许可证

MIT