n3xt0r / mysql-data-anonymizer
Anonymize sensitive data in test databases
Requires
- php: >=7.2
- amphp/mysql: ^1.1
- fzaninotto/faker: ^1.8
README
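MySQL Data Anonymizer is a PHP library that anonymizes the data in your database. Do you always test your programs against a production database, but worry about leaking customer data? MySQL Data Anonymizer is the right tool for you. It helps you replace all sensitive data with fake data. By default the fake data is provided by the fzaninotto/Faker generator, but you can also use your own generator. To improve performance, it uses amphp/mysql to open multiple MySQL connections that run simultaneously.
MySQL Data Anonymizer requires PHP >= 7.2.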
Configuration
Rename the file config-sample.php to config.php and adjust the settings to match your environment.
<?php

return array(
    'DB_HOST' => '127.0.0.1',
    'DB_NAME' => 'dbname',
    'DB_USER' => 'username',
    'DB_PASSWORD' => 'password',
    'NB_MAX_MYSQL_CLIENT' => 50,
    'NB_MAX_PROMISE_IN_LOOP' => 50,
    'DEFAULT_GENERATOR_LOCALE' => 'en_US'
);
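NB_MAX_MYSQL_CLIENT is the maximum number of MySQL connections open at the same time while the script runs. By default MySQL allows at most 151 simultaneous connections, but you can raise this limit by changing the MySQL variable 'max_connections'.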
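As a rough illustration of choosing a safe value, the sketch below caps the configured client count against the server's 'max_connections', leaving some connections free for other clients. The function name capClientCount and the $reserved headroom are hypothetical and not part of this library. In practice you would read the server value with SHOW VARIABLES LIKE 'max_connections'.

```php
<?php

/**
 * Hypothetical helper (not part of mysql-data-anonymizer): cap the number of
 * anonymizer connections so that other clients can still reach the server.
 */
function capClientCount(int $requested, int $maxConnections = 151, int $reserved = 10): int
{
    // Connections the server can spare once the reserve is set aside.
    $available = max(1, $maxConnections - $reserved);

    // Never go below one connection, never above what is available.
    return max(1, min($requested, $available));
}

echo capClientCount(50), PHP_EOL;   // 50
echo capClientCount(500), PHP_EOL;  // 141
```

The result could then be written to NB_MAX_MYSQL_CLIENT in config.php.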
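NB_MAX_PROMISE_IN_LOOP is the maximum number of promises kept in the promise table. Each promise represents the future result of one SQL query. The larger the number, the faster the execution. Be careful, though: holding a large number of promises consumes a lot of memory and CPU. If your processor cannot keep up, the run time will be at least 10 times longer than expected. If you are unsure how powerful your processor is, leave this value at 50, or set it to 20 to be safe.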
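The trade-off can be pictured with a simplified, synchronous sketch: pending queries are split into batches of at most NB_MAX_PROMISE_IN_LOOP entries, and each batch is resolved before the next one is queued. This only illustrates the bounding idea and is not the library's actual promise-based code. runInBatches and the callable tasks are hypothetical.

```php
<?php

/**
 * Hypothetical illustration: run tasks in batches so that at most
 * $maxPending of them are held at any one time.
 *
 * @param callable[] $tasks      Each task stands in for one pending SQL query.
 * @param int        $maxPending Plays the role of NB_MAX_PROMISE_IN_LOOP.
 * @return array Results in the original task order.
 */
function runInBatches(array $tasks, int $maxPending): array
{
    $results = [];

    foreach (array_chunk($tasks, max(1, $maxPending)) as $batch) {
        // With an asynchronous driver the whole batch would be awaited
        // together; in this sketch each task is simply called in turn.
        foreach ($batch as $task) {
            $results[] = $task();
        }
    }

    return $results;
}

$tasks = [];
foreach (range(1, 5) as $n) {
    $tasks[] = function () use ($n) {
        return $n * 10;
    };
}

print_r(runInBatches($tasks, 2)); // processed as batches of 2, 2 and 1 tasks
```

A smaller batch size lowers peak memory use at the cost of more round trips through the loop.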
DEFAULT_GENERATOR_LOCALE determines the language and format of the data produced by the Faker generator. You can find the full list of locales here.
Example code
<?php

require './vendor/autoload.php';

use Globalis\MysqlDataAnonymizer\Anonymizer;

$anonymizer = new Anonymizer();

// Describe `users` table.
$anonymizer->table('users', function ($table) {
    // Specify a primary key of the table. An array should be passed in for a composite key.
    $table->primary('id');

    // Add a global filter to the queries.
    // Only a string is accepted, so you need to write the complete WHERE statement here.
    $table->globalWhere('email4 != email5 AND id != 10');

    // Replace with static data.
    $table->column('email1')->replaceWith('john@example.com');

    // Use the #row# template to get "email_0@example.com", "email_1@example.com", "email_2@example.com"
    $table->column('email2')->replaceWith('email_#row#@example.com');

    // To replace with dynamic data a $generator is needed.
    // By default, a fzaninotto/Faker generator will be used.
    // Any generator object can be set like this: `$anonymizer->setGenerator($generator);`
    $table->column('email3')->replaceWith(function ($generator) {
        return $generator->email;
    });

    // Use `where` to leave some data untouched for a specific column.
    // If you don't list a column here, it will be left untouched too.
    $table->column('email4')->where('ID != 1')->replaceWith(function ($generator) {
        return $generator->unique()->email;
    });

    // Use the values of the current row to update a field.
    // This is a position-sensitive operation, so the value of field 'email4' here is the updated value.
    // If you put this line before the previous one, the value of 'email4' here would be the value of 'email4' before the update.
    $table->column('email5')->replaceByFields(function ($rowData) {
        return strtolower($rowData['email4']);
    });

    // Here we assume that there is a foreign key in the table 'class' on the column 'user_id'.
    // To make sure 'user_id' gets updated when we update 'id', use the function 'synchronizeColumn'.
    $table->column('id')->replaceWith(function ($generator) {
        return $generator->unique()->uuid;
    })->synchronizeColumn(['user_id', 'class']);
});

$anonymizer->run();

echo 'Anonymization has been completed!';
For more fake data types and details about the fake data generator, see the fzaninotto/Faker GitHub page.
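To make the #row# template concrete, the sketch below shows the substitution it implies, based on the outputs listed in the example comment ("email_0@example.com", "email_1@example.com", "email_2@example.com"). expandRowTemplate is a hypothetical stand-in, not the library's internal function.

```php
<?php

/**
 * Hypothetical stand-in for the #row# substitution: replace the placeholder
 * with the zero-based index of the row being processed.
 */
function expandRowTemplate(string $template, int $rowIndex): string
{
    return str_replace('#row#', (string) $rowIndex, $template);
}

for ($i = 0; $i < 3; $i++) {
    echo expandRowTemplate('email_#row#@example.com', $i), PHP_EOL;
}
// email_0@example.com
// email_1@example.com
// email_2@example.com
```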
Helpers and providers
You can add your own helper and generator classes in src/helpers and src/providers. Helper and provider file names must follow the formats 'XXXHelper.php' and 'XXXProvider.php'; otherwise they will not be loaded.
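The naming rule can be expressed as a simple filename check. The sketch below is an assumption about how such a filter might look, not the library's loader. isLoadableClassFile is a hypothetical name.

```php
<?php

/**
 * Hypothetical check: does a file name follow the 'XXXHelper.php' or
 * 'XXXProvider.php' convention?
 */
function isLoadableClassFile(string $filename): bool
{
    // Require at least one character before the suffix and a .php extension.
    return preg_match('/^\w+(Helper|Provider)\.php$/', basename($filename)) === 1;
}

var_dump(isLoadableClassFile('src/helpers/StrHelper.php'));      // bool(true)
var_dump(isLoadableClassFile('src/providers/EnumProvider.php')); // bool(true)
var_dump(isLoadableClassFile('src/helpers/str_utils.php'));      // bool(false)
```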
Example of a custom helper
<?php

namespace Globalis\MysqlDataAnonymizer\Helpers; // Default namespace, should always use this one

class StrHelper // Class name needs to be the same as the file name
{
    public static function toLower($string)
    {
        return strtolower($string);
    }
}
You can then use it in your script as follows:
<?php

require './vendor/autoload.php';

use Globalis\MysqlDataAnonymizer\Anonymizer;
use Globalis\MysqlDataAnonymizer\Helpers;

$anonymizer = new Anonymizer();

$anonymizer->table('users', function ($table) {
    $table->primary('id');

    // Lowercase the current value of 'name' with the custom helper.
    $table->column('name')->replaceByFields(function ($rowData, $generator) {
        return Helpers\StrHelper::toLower($rowData['name']);
    });
});

$anonymizer->run();
Example of a custom provider
<?php

namespace Globalis\MysqlDataAnonymizer\Provider; // Default namespace, should always use this one

// Class name needs to be the same as the file name,
// and provider classes need to extend \Faker\Provider\Base
class EnumProvider extends \Faker\Provider\Base
{
    // This simple method returns a random fruit from the list
    public function fruit()
    {
        $enum = ['apple', 'orange', 'banana'];

        return $enum[rand(0, 2)];
    }
}
You can then use it in your script as follows:
<?php

require './vendor/autoload.php';

use Globalis\MysqlDataAnonymizer\Anonymizer;

$anonymizer = new Anonymizer();

$anonymizer->table('users', function ($table) {
    $table->primary('id');

    // 'fruit' is resolved through the custom EnumProvider.
    $table->column('favorite_fruit')->replaceWith(function ($generator) {
        return $generator->fruit;
    });
});

$anonymizer->run();
Forked from globalis-ms