inetprocess / neuralyzer
Requires
- php: >=7.2.5
- ext-pdo: *
- doctrine/dbal: ^2
- fakerphp/faker: ^1
- symfony/config: ^5
- symfony/console: ^5
- symfony/dependency-injection: ^5
- symfony/expression-language: ^5
- symfony/finder: ^5
- symfony/stopwatch: ^5
- symfony/yaml: ^5
This package is not auto-updated.
Last update: 2022-02-01 12:54:32 UTC
README
edyan/neuralyzer
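Summary
This project is a library and a command-line tool that anonymizes a database by updating existing data or generating fake data (update vs. insert). It uses Faker to generate data from rules defined in a configuration file.
Because it can work either row by row or with a batch mechanism, you can load tables with tens of millions of fake records.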
It uses Doctrine DBAL to abstract interactions with the database, so it should work with any database type. So far it has been tested with MySQL, PostgreSQL and SQL Server.
Neuralyzer used to have an option (config parameters delete and delete_where) that cleaned a table by issuing a DELETE FROM with a WHERE condition before anonymization started. Those parameters are deprecated; cleaning is now handled by pre- and post-actions:

    entities:
        books:
            cols:
                title: { method: sentence, params: [8], unique: true }
            action: update
            pre_actions:
                - db.query("DELETE FROM books")
            post_actions:
                - db.query("DELETE FROM books WHERE title LIKE '%war%'")
Installation as a library

    composer require edyan/neuralyzer

Installation as an executable
You can also download the executable directly (example with v4.0):

    $ wget https://github.com/edyan/neuralyzer/raw/v4.0/neuralyzer.phar
    $ sudo mv neuralyzer.phar /usr/local/bin/neuralyzer
    $ sudo chmod +x /usr/local/bin/neuralyzer
    $ neuralyzer
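Usage
The easiest way to use the tool is to start with the command-line tool. After cloning the project and running composer install, try:

    $ bin/neuralyzer

Generate the configuration automatically
Neuralyzer can read a database and generate the configuration for you. The config:generate command accepts the following options: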
Options:
-D, --driver=DRIVER Driver (check Doctrine documentation to have the list) [default: "pdo_mysql"]
-H, --host=HOST Host [default: "127.0.0.1"]
-d, --db=DB Database Name
-u, --user=USER User Name [default: "www-data"]
-p, --password=PASSWORD Password (or it'll be prompted)
-f, --file=FILE File [default: "neuralyzer.yml"]
--protect Protect IDs and other fields
--ignore-table=IGNORE-TABLE Table to ignore. Can be repeated (multiple values allowed)
--ignore-field=IGNORE-FIELD Field to ignore. Regexp in the form "table.field". Can be repeated (multiple values allowed)
Example

    bin/neuralyzer config:generate --db test_db -u root -p root --ignore-table config --ignore-field ".*\.id.*"

That produces a file that looks like this:

    entities:
        authors:
            cols:
                first_name: { method: firstName, unique: false }
                last_name: { method: lastName, unique: false }
            action: update # Will update existing data, "insert" would create new data
            pre_actions: {  }
            post_actions: {  }
        books:
            cols:
                name: { method: sentence, params: [8] }
                date_modified: { method: date, params: ['Y-m-d H:i:s', now] }
            action: update
            pre_actions: {  }
            post_actions: {  }
    guesser: Edyan\Neuralyzer\Guesser
    guesser_version: '3.0'
    language: en_US
You have to edit that file to change its configuration. For example, to delete data while anonymizing and to change the language (see Faker's documentation for the available languages), do the following:

    # be careful that some languages have only a few methods.
    # Example : https://github.com/FakerPHP/Faker/tree/v1.14.1/src/Faker/Provider/fr_FR
    language: fr_FR

INFO: You can also use a delete on its own, without anonymizing anything. This deletes everything in books:

    entities:
        authors:
            cols:
                first_name: { method: firstName, unique: false }
                last_name: { method: lastName, unique: false }
            action: update
        books:
            pre_actions:
                - db.query("DELETE FROM books")

If you want to delete everything and then insert 1000 new books:

    guesser_version: '3.0'
    entities:
        authors:
            cols:
                first_name: { method: firstName, unique: false }
                last_name: { method: lastName, unique: false }
            action: update
        books:
            cols:
                name: { method: sentence, params: [8] }
            action: insert
            pre_actions:
                - db.query("DELETE FROM books")
            limit: 1000
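Each entry under cols names a Faker method plus optional params. As a minimal sketch of how such a rule could be turned into a method call, the snippet below uses a hypothetical StubGenerator in place of a real Faker generator and a hypothetical applyRule helper. It is not Neuralyzer's code, and it ignores the unique flag.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for a Faker generator; returns fixed values. */
final class StubGenerator
{
    public function firstName(): string
    {
        return 'Sasha';
    }

    public function sentence(int $nbWords = 6): string
    {
        return implode(' ', array_fill(0, $nbWords, 'lorem')) . '.';
    }
}

/**
 * Apply one "cols" rule: call the named method with the optional params.
 * The "unique" flag is deliberately ignored in this sketch.
 */
function applyRule(object $generator, array $rule)
{
    $method = $rule['method'];
    $params = $rule['params'] ?? [];
    if (!is_callable([$generator, $method])) {
        throw new InvalidArgumentException("Unknown method: $method");
    }

    return $generator->$method(...$params);
}

// Same shape as the "cols" section of the YAML configuration.
$cols = [
    'first_name' => ['method' => 'firstName', 'unique' => false],
    'name'       => ['method' => 'sentence', 'params' => [8]],
];

$row = [];
foreach ($cols as $col => $rule) {
    $row[$col] = applyRule(new StubGenerator(), $rule);
}
```

With a real Faker generator the same call shape applies, since Faker exposes its formatters as methods on the generator object.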
Run the anonymizer
To run the anonymizer, the command is simply "run". It expects:
Options:
-D, --driver=DRIVER Driver (check Doctrine documentation to have the list) [default: "pdo_mysql"]
-H, --host=HOST Host [default: "127.0.0.1"]
-d, --db=DB Database Name
-u, --user=USER User Name [default: "www-data"]
-p, --password=PASSWORD Password (or prompted)
-c, --config=CONFIG Configuration File [default: "neuralyzer.yml"]
-t, --table=TABLE Do a single table
--pretend Don't run the queries
-s, --sql Display the SQL
-m, --mode=MODE Set the mode : batch or queries [default: "batch"]
Example

    bin/neuralyzer run --db test_db -u root -p root

That produces this kind of output:

    Anonymizing authors
     2/2 [============================] 100%
    Queries:
    UPDATE authors SET first_name = 'Don', last_name = 'Wisoky' WHERE id = '1'
    UPDATE authors SET first_name = 'Sasha', last_name = 'Denesik' WHERE id = '2'
    ....

WARNING: On a large table, --sql produces a huge output. Use it for debugging only.
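The batch mode mentioned above is what makes very large tables practical. The sketch below is not Neuralyzer's implementation; it only illustrates the general idea of generating rows lazily and flushing them in chunks instead of issuing one statement per row. The names fakeRows and writeInBatches are hypothetical, and the row content is a placeholder rather than Faker output.

```php
<?php
declare(strict_types=1);

/**
 * Yield $count placeholder rows lazily so the whole data set
 * never has to sit in memory at once.
 */
function fakeRows(int $count): Generator
{
    for ($i = 1; $i <= $count; $i++) {
        yield ['id' => $i, 'title' => "title $i"];
    }
}

/**
 * Group rows into batches of $size and hand each batch to $flush.
 * Returns the number of batches flushed.
 */
function writeInBatches(iterable $rows, int $size, callable $flush): int
{
    if ($size < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1');
    }

    $batch = [];
    $batches = 0;
    foreach ($rows as $row) {
        $batch[] = $row;
        if (count($batch) === $size) {
            $flush($batch);
            $batches++;
            $batch = [];
        }
    }
    if ($batch !== []) {
        // Flush the final, partial batch.
        $flush($batch);
        $batches++;
    }

    return $batches;
}

$written = 0;
$batches = writeInBatches(
    fakeRows(2500),
    1000,
    function (array $batch) use (&$written): void {
        // A real writer would send one bulk statement or file load here.
        $written += count($batch);
    }
);
```

In this sketch memory use is bounded by the batch size, not by the total number of rows.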
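Library
The library is designed to be integrated with any tool, such as a CLI tool. It contains:
- A configuration reader and a configuration writer
- A guesser
- A DB anonymizer
Guesser
The guesser is the central piece of the configuration generator. It guesses which Faker method to apply to a field, based on the field's name or type.
Because it has to be injected into the writer, it is easy to extend.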
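To make the name-then-type idea concrete, here is a minimal sketch of a guessing function. It is an assumption-laden illustration, not the Edyan\Neuralyzer\Guesser class: the function name guessRule and both mapping tables are hypothetical, and only the Faker method names (firstName, lastName, email, sentence, randomNumber, date) are real.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical guesser: try the column name first, then fall back to the type.
 * The mappings are illustrative and are not the ones shipped with Neuralyzer.
 */
function guessRule(string $column, string $type): array
{
    // Patterns tested against the column name, most specific first.
    $byName = [
        '/first_?name$/i' => ['method' => 'firstName'],
        '/last_?name$/i'  => ['method' => 'lastName'],
        '/e?mail/i'       => ['method' => 'email'],
    ];
    foreach ($byName as $pattern => $rule) {
        if (preg_match($pattern, $column) === 1) {
            return $rule;
        }
    }

    // Fallback keyed by the column type.
    $byType = [
        'string'   => ['method' => 'sentence', 'params' => [8]],
        'integer'  => ['method' => 'randomNumber', 'params' => [9]],
        'datetime' => ['method' => 'date', 'params' => ['Y-m-d H:i:s', 'now']],
    ];
    if (!isset($byType[$type])) {
        throw new InvalidArgumentException("No rule for type: $type");
    }

    return $byType[$type];
}

$firstName = guessRule('first_name', 'string');
$title     = guessRule('title', 'string');
$modified  = guessRule('date_modified', 'datetime');
```

Checking the name before the type lets a specific field such as first_name get a meaningful value even though its type is a plain string.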
Configuration Writer
The writer helps generate a YAML file containing all the tables and fields of the database. Basic usage:

    <?php

    require_once 'vendor/autoload.php';

    // Create a container
    $container = Edyan\Neuralyzer\ContainerFactory::createContainer();
    // Configure DB Utils, required
    $dbUtils = $container->get('Edyan\Neuralyzer\Utils\DBUtils');
    // See Doctrine DBAL configuration :
    // https://www.doctrine-project.org/projects/doctrine-dbal/en/2.7/reference/configuration.html
    $dbUtils->configure([
        'driver' => 'pdo_mysql',
        'host' => '127.0.0.1',
        'dbname' => 'test_db',
        'user' => 'root',
        'password' => 'root',
    ]);

    $writer = new \Edyan\Neuralyzer\Configuration\Writer;
    $data = $writer->generateConfFromDB($dbUtils, new \Edyan\Neuralyzer\Guesser);
    $writer->save($data, 'neuralyzer.yml');

If needed, you can protect some columns or ignore some tables with regular expressions:

    <?php
    // ...
    $writer = new \Edyan\Neuralyzer\Configuration\Writer;
    $writer->protectCols(true); // will protect primary keys
    // define cols to protect (must be prefixed with the table name)
    $writer->setProtectedCols([
        '.*\.id',
        '.*\..*_id',
        '.*\.date_modified',
        '.*\.date_entered',
        '.*\.date_created',
        '.*\.deleted',
    ]);
    // Define tables to ignore, also with regexp
    $writer->setIgnoredTables([
        'acl_.*',
        'config',
        'email_cache',
    ]);
    // Write the configuration
    $data = $writer->generateConfFromDB($dbUtils, new \Edyan\Neuralyzer\Guesser);
    $writer->save($data, 'neuralyzer.yml');
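To see which names such patterns would select, the following sketch tests a few "table.column" and table names against them. The helper matchesAny is hypothetical, and anchoring each pattern with ^ and $ is an assumption of this sketch, not a statement about how the writer matches internally.

```php
<?php
declare(strict_types=1);

/**
 * Return true when $name fully matches one of the patterns.
 * Anchoring with ^...$ is an assumption of this sketch.
 */
function matchesAny(string $name, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        if (preg_match('/^' . $pattern . '$/', $name) === 1) {
            return true;
        }
    }

    return false;
}

// Column patterns are written as "table.column", table patterns as "table".
$protectedCols = ['.*\.id', '.*\..*_id', '.*\.date_modified'];
$ignoredTables = ['acl_.*', 'config'];

$checks = [
    'books.id'        => matchesAny('books.id', $protectedCols),
    'books.author_id' => matchesAny('books.author_id', $protectedCols),
    'books.title'     => matchesAny('books.title', $protectedCols),
    'acl_roles'       => matchesAny('acl_roles', $ignoredTables),
    'configuration'   => matchesAny('configuration', $ignoredTables),
];
```

Under the full-match assumption, config does not catch configuration; a pattern such as config.* would be needed for that.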
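Configuration Reader
The configuration reader is the exact opposite of the writer. Its main job is to validate that the configuration in the YAML file is correct, then to provide methods to access its parameters. Example:

    <?php

    require_once 'vendor/autoload.php';

    // will throw an exception if it's not valid
    $reader = new Edyan\Neuralyzer\Configuration\Reader('neuralyzer.yml');
    $tables = $reader->getEntities();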
DB Anonymizer
The only anonymizer currently available is the DB one. As the example shows, it is constructed from the Expression utility and a configured DBUtils object, and it receives a configuration reader through setConfiguration().

    <?php

    require_once 'vendor/autoload.php';

    // Create a container
    $container = Edyan\Neuralyzer\ContainerFactory::createContainer();
    $expression = $container->get('Edyan\Neuralyzer\Utils\Expression');
    // Configure DB Utils, required
    $dbUtils = $container->get('Edyan\Neuralyzer\Utils\DBUtils');
    // See Doctrine DBAL configuration :
    // https://www.doctrine-project.org/projects/doctrine-dbal/en/2.7/reference/configuration.html
    $dbUtils->configure([
        'driver' => 'pdo_mysql',
        'host' => '127.0.0.1',
        'dbname' => 'test_db',
        'user' => 'root',
        'password' => 'root',
    ]);

    $db = new \Edyan\Neuralyzer\Anonymizer\DB($expression, $dbUtils);
    $db->setConfiguration(
        new \Edyan\Neuralyzer\Configuration\Reader('neuralyzer.yml')
    );

Once initialized, the method that anonymizes a table is the following:

    <?php

    public function processEntity(string $entity, callable $callback = null): array;

Parameters
- Entity: the entity to process, for example a table name (required)
- Callback: (callable / optional) for example to drive a progress bar

Some options can be set by calling:

    <?php

    // Limit of fake generated records for updates and creates.
    // Default : 0 = everything to update / nothing to insert
    public function setLimit(int $limit);

    // Don't do anything, default true
    public function setPretend(bool $pretend);

    // Return or not a result, default false
    public function setReturnRes(bool $returnRes);
Full example

    <?php

    require_once 'vendor/autoload.php';

    // Create a container
    $container = Edyan\Neuralyzer\ContainerFactory::createContainer();
    $expression = $container->get('Edyan\Neuralyzer\Utils\Expression');
    // Configure DB Utils, required
    $dbUtils = $container->get('Edyan\Neuralyzer\Utils\DBUtils');
    // See Doctrine DBAL configuration :
    // https://www.doctrine-project.org/projects/doctrine-dbal/en/2.7/reference/configuration.html
    $dbUtils->configure([
        'driver' => 'pdo_mysql',
        'host' => 'mysql',
        'dbname' => 'test_db',
        'user' => 'root',
        'password' => 'root',
    ]);

    $reader = new \Edyan\Neuralyzer\Configuration\Reader('neuralyzer.yml');

    $db = new \Edyan\Neuralyzer\Anonymizer\DB($expression, $dbUtils);
    $db->setConfiguration($reader);
    $db->setPretend(false);

    // Get tables
    $tables = $reader->getEntities();
    foreach ($tables as $table) {
        $total = $dbUtils->countResults($table);
        if ($total === 0) {
            fwrite(STDOUT, "$table is empty" . PHP_EOL);
            continue;
        }
        // Report success only once the entity has been processed
        $db->processEntity($table);
        fwrite(STDOUT, "$table anonymized" . PHP_EOL);
    }
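Pre and Post Actions
You can set an array of pre_actions and post_actions; they are executed before and after Neuralyzer anonymizes an entity.
These actions are Symfony expressions (see Symfony Expression Language) that rely on services. The services are loaded from the Service/ directory.
Currently there is only one service, Database, which provides a single method, query. It is used as follows: db.query("DELETE FROM table").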
Configuration Reference
bin/neuralyzer config:example prints a default configuration in which every parameter is explained:

    config:

        # Set the guesser class
        guesser:              Edyan\Neuralyzer\Guesser

        # Set the version of the guesser the conf has been written with
        guesser_version:      '3.0'

        # Faker's language, make sure all your methods have a translation
        language:             en_US

        # List all entities, theirs cols and actions
        entities:             # Required, Example: people

            # Prototype
            -

                # Either "update" or "insert" data
                action:               update

                # Should we delete data with what is defined in "delete_where" ?
                delete:               ~ # Deprecated (delete and delete_where have been deprecated. Use now pre and post_actions)

                # Condition applied in a WHERE if delete is set to "true"
                delete_where:         ~ # Deprecated (delete and delete_where have been deprecated. Use now pre and post_actions), Example: '1 = 1'
                cols:

                    # Examples:
                    first_name:
                        method:               firstName
                    last_name:
                        method:               lastName

                    # Prototype
                    -

                        # Faker method to use, see doc : https://fakerphp.github.io/
                        method:               ~ # Required

                        # Set this option to true to generate unique values for that field (see faker->unique() generator)
                        unique:               false

                        # Faker's parameters, see Faker's doc
                        params:               []

                # Limit the number of written records (update or insert). 100 by default for insert
                limit:                0

                # The list of expressions language actions to executed before neuralyzing. Be careful that "pretend" has no effect here.
                pre_actions:          []

                # The list of expressions language actions to executed after neuralyzing. Be careful that "pretend" has no effect here.
                post_actions:         []
Custom application logic
When custom Doctrine types are used, Doctrine raises an error saying that the type is unknown. This can be solved by providing a bootstrap file that registers the custom Doctrine type.
bootstrap.php

    <?php

    require_once '../vendor/autoload.php';

    \Doctrine\DBAL\Types\Type::addType('custom_type', 'Namespace\Of\The\Custom\Type');

Then pass the bootstrap file to the run command:

    bin/neuralyzer run --db test_db -u root -p root -b bootstrap.php
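Development
Neuralyzer uses Robo to run its tests (via Docker) and to build its phar.
Clone the project, run composer install, then...
Run the tests
- Change the --wait option if you get a lot of errors because the DB is not ready yet.
- Change the --php option to another version, such as 7.2 or 7.4.
- Set --no-coverage to disable PHPUnit code coverage.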
With MySQL

    $ vendor/bin/robo test --php 7.2 --wait 10 --db mysql --db-version 5
    $ vendor/bin/robo test --php 7.3 --wait 10 --db mysql --db-version 8
    $ vendor/bin/robo test --php 7.4 --wait 10 --db mysql --db-version 8
    $ vendor/bin/robo test --php 8.0 --wait 10 --db mysql --db-version 8

With PostgreSQL 9, 10 and 11 (12 works too)

    $ vendor/bin/robo test --php 7.2 --wait 10 --db pgsql --db-version 10
    $ vendor/bin/robo test --php 7.3 --wait 10 --db pgsql --db-version 11
    $ vendor/bin/robo test --php 7.4 --wait 10 --db pgsql --db-version 12
    $ vendor/bin/robo test --php 8.0 --wait 10 --db pgsql --db-version 13

With SQL Server
WARNING: 2 tests fail because of strange behavior from SQL Server ... or Doctrine/DBAL. PHPUnit cannot compare the 2 datasets because the fields are not in the same order.

    $ vendor/bin/robo test --php 7.2 --wait 15 --db sqlsrv
    $ vendor/bin/robo test --php 7.3 --wait 15 --db sqlsrv
    $ vendor/bin/robo test --php 7.4 --wait 15 --db sqlsrv
    $ vendor/bin/robo test --php 8.0 --wait 15 --db sqlsrv
Build a release (with a phar and a git tag)

    $ php -d phar.readonly=0 vendor/bin/robo release

Build the phar only

    $ php -d phar.readonly=0 vendor/bin/robo phar

Improve code quality with phpinsights

    docker run -it --rm -v $(pwd):/app nunomaduro/phpinsights analyse --fix

Update dependencies to ensure compatibility with PHP 7.2

    vendor/bin/robo composer:update