edyan / neuralyzer
Data anonymization library and CLI tool
Requires
- php: >=7.2.5
- ext-pdo: *
- doctrine/dbal: ^2
- fakerphp/faker: ^1
- symfony/config: ^5
- symfony/console: ^5
- symfony/dependency-injection: ^5
- symfony/expression-language: ^5
- symfony/finder: ^5
- symfony/stopwatch: ^5
- symfony/yaml: ^5
Requires (Dev)
This package is not auto-updated.
Last update: 2024-09-13 10:03:16 UTC
README
edyan/neuralyzer
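Summary
This project is a library and a command-line tool that anonymizes a database by updating existing data or generating fake data (update vs. insert). It uses Faker to generate data from rules defined in a configuration file.
Because it can work row by row or through a batch mechanism, you can load tables with hundreds of millions of fake records.
It uses Doctrine DBAL to abstract database interactions, so it should work with any database type. So far it has been tested extensively with MySQL, PostgreSQL and SQL Server.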
Neuralyzer used to offer an option to clean tables by injecting a DELETE FROM with a WHERE condition before anonymizing (the delete and delete_where configuration parameters). That option is deprecated; cleanup is now handled by pre- and post-actions:
entities:
    books:
        cols:
            title: { method: sentence, params: [8], unique: true }
        action: update
        pre_actions:
            - db.query("DELETE FROM books")
        post_actions:
            - db.query("DELETE FROM books WHERE title LIKE '%war%'")
Install as a library
composer require edyan/neuralyzer
Install as an executable
You can also download the executable directly (v4.0 in this example):
$ wget https://github.com/edyan/neuralyzer/raw/v4.0/neuralyzer.phar
$ sudo mv neuralyzer.phar /usr/local/bin/neuralyzer
$ sudo chmod +x /usr/local/bin/neuralyzer
$ neuralyzer
Usage
The easiest way to get started is with the command-line tool. After cloning the project and running composer install, try:
$ bin/neuralyzer
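Generate the configuration automatically
Neuralyzer can read a database and generate the configuration for you. The config:generate command accepts the following options: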
Options:
-D, --driver=DRIVER Driver (check Doctrine documentation to have the list) [default: "pdo_mysql"]
-H, --host=HOST Host [default: "127.0.0.1"]
-d, --db=DB Database Name
-u, --user=USER User Name [default: "www-data"]
-p, --password=PASSWORD Password (or it'll be prompted)
-f, --file=FILE File [default: "neuralyzer.yml"]
--protect Protect IDs and other fields
--ignore-table=IGNORE-TABLE Table to ignore. Can be repeated (multiple values allowed)
--ignore-field=IGNORE-FIELD Field to ignore. Regexp in the form "table.field". Can be repeated (multiple values allowed)
Example
bin/neuralyzer config:generate --db test_db -u root -p root --ignore-table config --ignore-field ".*\.id.*"
This generates a file that looks like this:
entities:
    authors:
        cols:
            first_name: { method: firstName, unique: false }
            last_name: { method: lastName, unique: false }
        action: update # Will update existing data, "insert" would create new data
        pre_actions: {  }
        post_actions: {  }
    books:
        cols:
            name: { method: sentence, params: [8] }
            date_modified: { method: date, params: ['Y-m-d H:i:s', now] }
        action: update
        pre_actions: {  }
        post_actions: {  }
guesser: Edyan\Neuralyzer\Guesser
guesser_version: '3.0'
language: en_US
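You need to edit this file to change its configuration. For example, if you need to delete data while anonymizing and change the language (see Faker's documentation for the available languages), do the following:
# be careful that some languages have only a few methods.
# Example : https://github.com/FakerPHP/Faker/tree/v1.14.1/src/Faker/Provider/fr_FR
language: fr_FR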
INFO: You can also use deletion on its own, without anonymizing anything. The following deletes everything in books:
entities:
    authors:
        cols:
            first_name: { method: firstName, unique: false }
            last_name: { method: lastName, unique: false }
        action: update
    books:
        pre_actions:
            - db.query("DELETE FROM books")
If you want to delete everything and then insert 1000 new books:
guesser_version: '3.0'
entities:
    authors:
        cols:
            first_name: { method: firstName, unique: false }
            last_name: { method: lastName, unique: false }
        action: update
    books:
        cols:
            name: { method: sentence, params: [8] }
        action: insert
        pre_actions:
            - db.query("DELETE FROM books")
        limit: 1000
Run the anonymizer
To run the anonymizer, the command is simply run. It accepts the following options:
Options:
-D, --driver=DRIVER Driver (check Doctrine documentation to have the list) [default: "pdo_mysql"]
-H, --host=HOST Host [default: "127.0.0.1"]
-d, --db=DB Database Name
-u, --user=USER User Name [default: "www-data"]
-p, --password=PASSWORD Password (or prompted)
-c, --config=CONFIG Configuration File [default: "neuralyzer.yml"]
-t, --table=TABLE Do a single table
--pretend Don't run the queries
-s, --sql Display the SQL
-m, --mode=MODE Set the mode : batch or queries [default: "batch"]
Example
bin/neuralyzer run --db test_db -u root -p root
This produces output like the following (the Queries section is shown when --sql is set):
Anonymizing authors

 2/2 [============================] 100%

Queries:
UPDATE authors SET first_name = 'Don', last_name = 'Wisoky' WHERE id = '1'
UPDATE authors SET first_name = 'Sasha', last_name = 'Denesik' WHERE id = '2'

....
WARNING: On large tables, --sql produces a very large output. Use it for debugging purposes only.
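Library
The library is designed to be integrated with any tool, such as the CLI tool. It contains:
- A configuration reader and a configuration writer
- A guesser
- A database anonymizer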
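Guesser
The guesser is the core of the configuration generator. It guesses which Faker method to apply based on the field name or the field type.
It can be extended very easily, since it has to be injected into the writer.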
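To make the idea of guessing concrete, here is a minimal, self-contained sketch of name-then-type guessing. It is not Neuralyzer's Guesser class: the function name guessFakerMethod, the patterns and the type names are assumptions made for illustration. Only the Faker method names (firstName, lastName, email, city, sentence, randomNumber, date, word) are real.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Neuralyzer): pick a Faker method for a column.
 * Name-based rules are tried first, then a fallback keyed on the column type.
 */
function guessFakerMethod(string $table, string $column, string $type): array
{
    // Patterns are matched against "table.column"; the first match wins.
    $byName = [
        '/^.*\.first_?name$/i' => ['method' => 'firstName'],
        '/^.*\.last_?name$/i'  => ['method' => 'lastName'],
        '/^.*\.e?mail$/i'      => ['method' => 'email'],
        '/^.*\.city$/i'        => ['method' => 'city'],
    ];
    foreach ($byName as $pattern => $rule) {
        if (preg_match($pattern, "$table.$column") === 1) {
            return $rule;
        }
    }

    // Fall back on the column type when the name gives no hint.
    $byType = [
        'string'   => ['method' => 'sentence', 'params' => [8]],
        'integer'  => ['method' => 'randomNumber', 'params' => [9]],
        'datetime' => ['method' => 'date', 'params' => ['Y-m-d H:i:s', 'now']],
    ];

    // Last resort when neither the name nor the type is known.
    return $byType[$type] ?? ['method' => 'word'];
}

echo json_encode(guessFakerMethod('authors', 'first_name', 'string')), PHP_EOL;
echo json_encode(guessFakerMethod('books', 'date_modified', 'datetime')), PHP_EOL;
```

Trying name rules before type rules means a column called first_name gets a realistic first name even though its type alone would only suggest a generic sentence.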
Configuration writer
The writer helps generate a YAML file containing all tables and fields. Basic usage could look like this:
<?php

require_once 'vendor/autoload.php';

// Create a container
$container = Edyan\Neuralyzer\ContainerFactory::createContainer();

// Configure DB Utils, required
$dbUtils = $container->get('Edyan\Neuralyzer\Utils\DBUtils');
// See Doctrine DBAL configuration :
// https://www.doctrine-project.org/projects/doctrine-dbal/en/2.7/reference/configuration.html
$dbUtils->configure([
    'driver' => 'pdo_mysql',
    'host' => '127.0.0.1',
    'dbname' => 'test_db',
    'user' => 'root',
    'password' => 'root',
]);

$writer = new \Edyan\Neuralyzer\Configuration\Writer;
$data = $writer->generateConfFromDB($dbUtils, new \Edyan\Neuralyzer\Guesser);
$writer->save($data, 'neuralyzer.yml');
If needed, you can protect some columns (using regular expressions) or ignore some tables:
<?php

// ...
$writer = new \Edyan\Neuralyzer\Configuration\Writer;
$writer->protectCols(true); // will protect primary keys
// define cols to protect (must be prefixed with the table name)
$writer->setProtectedCols([
    '.*\.id',
    '.*\..*_id',
    '.*\.date_modified',
    '.*\.date_entered',
    '.*\.date_created',
    '.*\.deleted',
]);

// Define tables to ignore, also with regexp
$writer->setIgnoredTables([
    'acl_.*',
    'config',
    'email_cache',
]);

// Write the configuration
$data = $writer->generateConfFromDB($dbUtils, new \Edyan\Neuralyzer\Guesser);
$writer->save($data, 'neuralyzer.yml');
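The column patterns above are written against names of the form table.column. The sketch below shows one way such patterns could be applied. It assumes each pattern has to match the whole "table.column" string, which may differ from what the writer does internally, and isProtected is a hypothetical helper rather than a Neuralyzer function.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: is "table.column" covered by one of the patterns?
 * Assumption: each pattern must match the whole "table.column" string.
 */
function isProtected(string $table, string $column, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        // '~' is used as the delimiter so the patterns need no extra escaping.
        if (preg_match('~^' . $pattern . '$~', "$table.$column") === 1) {
            return true;
        }
    }

    return false;
}

$patterns = ['.*\.id', '.*\..*_id', '.*\.date_modified'];

var_dump(isProtected('books', 'id', $patterns));        // primary key
var_dump(isProtected('books', 'author_id', $patterns)); // foreign key
var_dump(isProtected('books', 'title', $patterns));     // regular column
```

Under the full-match assumption, '.*\.id' covers books.id but not books.identifier, which is why foreign keys get their own '.*\..*_id' pattern.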
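Configuration reader
The configuration reader is the exact opposite of the writer. Its main job is to validate that the YAML file's configuration is correct, then to provide methods for accessing its parameters. Example:
<?php

require_once 'vendor/autoload.php';

// will throw an exception if it's not valid
$reader = new Edyan\Neuralyzer\Configuration\Reader('neuralyzer.yml');
$tables = $reader->getEntities();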
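As an illustration of the kind of checks such a validation involves, here is a small sketch that works on an already-parsed configuration array. It is not the reader's actual validation logic. findConfigErrors is a hypothetical function, and the two rules it applies (action is "update" or "insert", every column needs a method) are taken from the configuration reference in this README.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical validator, not Neuralyzer's own: returns the list of problems
 * found in an already-parsed configuration array (empty list = valid).
 */
function findConfigErrors(array $config): array
{
    if (empty($config['entities']) || !is_array($config['entities'])) {
        return ['"entities" is required and must be a non-empty map'];
    }

    $errors = [];
    foreach ($config['entities'] as $name => $entity) {
        // "update" is the default action in the configuration reference.
        $action = $entity['action'] ?? 'update';
        if (!in_array($action, ['update', 'insert'], true)) {
            $errors[] = "$name: action must be \"update\" or \"insert\"";
        }
        // Every column rule needs a Faker method.
        foreach ($entity['cols'] ?? [] as $col => $rule) {
            if (empty($rule['method'])) {
                $errors[] = "$name.$col: \"method\" is required";
            }
        }
    }

    return $errors;
}

$config = [
    'entities' => [
        'books' => [
            'action' => 'insert',
            'cols' => ['name' => ['method' => 'sentence', 'params' => [8]]],
        ],
    ],
];

var_dump(findConfigErrors($config) === []);
```

Returning a list of messages instead of throwing on the first problem lets a caller report every issue at once. The reader itself throws an exception when the file is not valid.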
Database anonymizer
The only anonymizer currently available is the database anonymizer. It is built from the Expression and DBUtils services and is then given a configuration reader object:
<?php

require_once 'vendor/autoload.php';

// Create a container
$container = Edyan\Neuralyzer\ContainerFactory::createContainer();
$expression = $container->get('Edyan\Neuralyzer\Utils\Expression');

// Configure DB Utils, required
$dbUtils = $container->get('Edyan\Neuralyzer\Utils\DBUtils');
// See Doctrine DBAL configuration :
// https://www.doctrine-project.org/projects/doctrine-dbal/en/2.7/reference/configuration.html
$dbUtils->configure([
    'driver' => 'pdo_mysql',
    'host' => '127.0.0.1',
    'dbname' => 'test_db',
    'user' => 'root',
    'password' => 'root',
]);

$db = new \Edyan\Neuralyzer\Anonymizer\DB($expression, $dbUtils);
$db->setConfiguration(
    new \Edyan\Neuralyzer\Configuration\Reader('neuralyzer.yml')
);
Once initialized, the method that anonymizes a table is:
<?php
public function processEntity(string $entity, callable $callback = null): array;
Parameters
- Entity: the table name, for example (required)
- Callback (callable, optional): for example, to drive a progress bar
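The following sketch illustrates how a progress callback of this kind can be wired up. It does not call Neuralyzer: processRows is a hypothetical stand-in, and the assumption that the callback receives the number of rows handled so far is mine, not something this README states.

```php
<?php
declare(strict_types=1);

/**
 * Stand-in for an anonymizer (hypothetical, for illustration only):
 * walks over rows and reports progress through an optional callback.
 * Assumption: the callback receives the number of rows handled so far.
 */
function processRows(array $rows, ?callable $callback = null): array
{
    $queries = [];
    $done = 0;
    foreach ($rows as $row) {
        // Placeholder query; a real anonymizer would inject fake values here.
        $queries[] = sprintf("UPDATE authors SET first_name = '?' WHERE id = '%d'", $row['id']);
        $done++;
        if ($callback !== null) {
            $callback($done);
        }
    }

    return $queries;
}

$rows = [['id' => 1], ['id' => 2], ['id' => 3]];
$total = count($rows);

// A real CLI would advance a progress bar here instead of printing a counter.
$progress = static function (int $done) use ($total): void {
    fwrite(STDERR, sprintf("\r%d/%d", $done, $total));
};

$queries = processRows($rows, $progress);
fwrite(STDERR, PHP_EOL);
```

Keeping the callback optional means library users who do not need feedback can call the method with the entity name only.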
Some options can be set by calling:
<?php
// Limit of fake generated records for updates and creates.
// Default : 0 = everything to update / nothing to insert
public function setLimit(int $limit);

// Don't do anything, default true
public function setPretend(bool $pretend);

// Return or not a result, default false
public function setReturnRes(bool $returnRes);
Full example
<?php

require_once 'vendor/autoload.php';

// Create a container
$container = Edyan\Neuralyzer\ContainerFactory::createContainer();
$expression = $container->get('Edyan\Neuralyzer\Utils\Expression');

// Configure DB Utils, required
$dbUtils = $container->get('Edyan\Neuralyzer\Utils\DBUtils');
// See Doctrine DBAL configuration :
// https://www.doctrine-project.org/projects/doctrine-dbal/en/2.7/reference/configuration.html
$dbUtils->configure([
    'driver' => 'pdo_mysql',
    'host' => 'mysql',
    'dbname' => 'test_db',
    'user' => 'root',
    'password' => 'root',
]);

$reader = new \Edyan\Neuralyzer\Configuration\Reader('neuralyzer.yml');

$db = new \Edyan\Neuralyzer\Anonymizer\DB($expression, $dbUtils);
$db->setConfiguration($reader);
$db->setPretend(false);

// Get tables
$tables = $reader->getEntities();
foreach ($tables as $table) {
    $total = $dbUtils->countResults($table);

    if ($total === 0) {
        fwrite(STDOUT, "$table is empty" . PHP_EOL);
        continue;
    }

    fwrite(STDOUT, "$table anonymized" . PHP_EOL);
    $db->processEntity($table);
}
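Pre and post actions
You can set an array of pre_actions and post_actions. They are executed before and after an entity is anonymized.
These actions are Symfony expressions (see Symfony Expression Language) that rely on services. The services are loaded from the Service/ directory.
For now there is only one service, Database, which exposes a single query method, used as db.query("DELETE FROM table").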
Configuration reference
bin/neuralyzer config:example prints a default configuration in which every parameter is explained:
config:

    # Set the guesser class
    guesser:              Edyan\Neuralyzer\Guesser

    # Set the version of the guesser the conf has been written with
    guesser_version:      '3.0'

    # Faker's language, make sure all your methods have a translation
    language:             en_US

    # List all entities, theirs cols and actions
    entities:             # Required, Example: people

        # Prototype
        -

            # Either "update" or "insert" data
            action:               update

            # Should we delete data with what is defined in "delete_where" ?
            delete:               ~ # Deprecated (delete and delete_where have been deprecated. Use now pre and post_actions)

            # Condition applied in a WHERE if delete is set to "true"
            delete_where:         ~ # Deprecated (delete and delete_where have been deprecated. Use now pre and post_actions), Example: '1 = 1'
            cols:

                # Examples:
                first_name:
                    method:               firstName
                last_name:
                    method:               lastName

                # Prototype
                -

                    # Faker method to use, see doc : https://fakerphp.github.io/
                    method:               ~ # Required

                    # Set this option to true to generate unique values for that field (see faker->unique() generator)
                    unique:               false

                    # Faker's parameters, see Faker's doc
                    params:               []

            # Limit the number of written records (update or insert). 100 by default for insert
            limit:                0

            # The list of expressions language actions to executed before neuralyzing. Be careful that "pretend" has no effect here.
            pre_actions:          []

            # The list of expressions language actions to executed after neuralyzing. Be careful that "pretend" has no effect here.
            post_actions:         []
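Custom application logic
When custom Doctrine types are used, Doctrine raises an error saying that the type is unknown. This can be solved by providing a bootstrap file that registers the custom Doctrine types.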
bootstrap.php
<?php

require_once '../vendor/autoload.php';

\Doctrine\DBAL\Types\Type::addType('custom_type', 'Namespace\Of\The\Custom\Type');
Then pass the bootstrap file to the run command:
bin/neuralyzer run --db test_db -u root -p root -b bootstrap.php
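Development
Neuralyzer uses Robo to run its tests (via Docker) and to build its phar.
Clone the project, run composer install, then...
Run the tests
- If you get a lot of errors because the database is not ready, change the --wait option.
- Change the --php option to select the PHP version, for example 7.2 or 7.4.
- If you want to disable PHPUnit code coverage, set --no-coverage.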
With MySQL
$ vendor/bin/robo test --php 7.2 --wait 10 --db mysql --db-version 5
$ vendor/bin/robo test --php 7.3 --wait 10 --db mysql --db-version 8
$ vendor/bin/robo test --php 7.4 --wait 10 --db mysql --db-version 8
$ vendor/bin/robo test --php 8.0 --wait 10 --db mysql --db-version 8
With PostgreSQL 9, 10 and 11 (12 works too)
$ vendor/bin/robo test --php 7.2 --wait 10 --db pgsql --db-version 10
$ vendor/bin/robo test --php 7.3 --wait 10 --db pgsql --db-version 11
$ vendor/bin/robo test --php 7.4 --wait 10 --db pgsql --db-version 12
$ vendor/bin/robo test --php 8.0 --wait 10 --db pgsql --db-version 13
With SQL Server
Warning: 2 tests fail because of odd behavior in SQL Server... or in Doctrine DBAL. PHPUnit cannot compare the two datasets because the fields are not in the same order.
$ vendor/bin/robo test --php 7.2 --wait 15 --db sqlsrv
$ vendor/bin/robo test --php 7.3 --wait 15 --db sqlsrv
$ vendor/bin/robo test --php 7.4 --wait 15 --db sqlsrv
$ vendor/bin/robo test --php 8.0 --wait 15 --db sqlsrv
Build a release (with a phar and a git tag)
$ php -d phar.readonly=0 vendor/bin/robo release
Build the phar only
$ php -d phar.readonly=0 vendor/bin/robo phar
Improve code quality with phpinsights
docker run -it --rm -v $(pwd):/app nunomaduro/phpinsights analyse --fix
Update dependencies while making sure they stay compatible with PHP 7.2
vendor/bin/robo composer:update