digiaonline / common-sanitization-stages
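This package is abandoned and no longer maintained. No replacement package was suggested.
A collection of pipeline stages commonly used during data sanitization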
1.2.0
2019-02-06 13:42 UTC
Requires
- php: >=7.1
- league/pipeline: ^0.3.0 | ^1.0
Requires (Dev)
- ezyang/htmlpurifier: ^4.10
- phpstan/phpstan: ^0.9.2
- phpunit/phpunit: ^7.2
Suggests
- ezyang/htmlpurifier: Required to use the HTML purification stage
README
A collection of pipeline stages commonly used during data sanitization
Requirements
- PHP >= 7.1
Installation
composer require digiaonline/common-sanitization-stages
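Usage
Suppose you are importing legacy data into a new system. The original data is user-generated content, so it cannot be trusted. It also contains some simple HTML that you want to strip out. On top of that, the legacy data has to be storable in a CSV file, so it has to be encoded in a way that guarantees the CSV delimiter never appears inside a text value.
To achieve this, combine the stages you need into a pipeline, then run your data through that pipeline: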
<?php

require_once(__DIR__.'/vendor/autoload.php');

$rawInputData = <<<EOT
The quick brown fox<br />
jumped over the <i>incredibly lazy dog</i> & it ran away.
EOT;

$encodedInputData = \base64_encode($rawInputData);

/** @var \League\Pipeline\Pipeline $pipeline */
$pipeline = (new \League\Pipeline\Pipeline())
    ->pipe(new \Digia\Sanitization\Stages\Base64DecodeStage())
    ->pipe(new \Digia\Sanitization\Stages\HtmlPurifierStage())
    ->pipe(new \Digia\Sanitization\Stages\HtmlEntityDecodeStage())
    ->pipe(new \Digia\Sanitization\Stages\StripLineFeedsStage(["\n"], true))
    ->pipe(new \Digia\Sanitization\Stages\TrimStringStage());

$outputData = $pipeline->process($encodedInputData);

var_dump($outputData);
// string(70) "The quick brown fox jumped over the incredibly lazy dog & it ran away."
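To make the data flow above easier to follow, here is a minimal, dependency-free sketch of what such a chain of stages does conceptually, using only PHP built-ins. It is not this package's implementation: `runStages()` is a hypothetical helper, and `strip_tags()` is only a rough stand-in for the HTML purification stage (it does not offer the same guarantees as HTML Purifier). Replacing line feeds with a space is likewise an assumption chosen to match the output shown above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of this package): feeds $payload through
 * each callable in order, passing one step's result to the next step.
 */
function runStages(array $stages, $payload)
{
    foreach ($stages as $stage) {
        $payload = $stage($payload);
    }

    return $payload;
}

$stages = [
    // Undo the Base64 encoding; strict mode rejects malformed input.
    function (string $value): string {
        $decoded = base64_decode($value, true);
        if ($decoded === false) {
            throw new InvalidArgumentException('Input is not valid Base64');
        }

        return $decoded;
    },
    // Rough stand-in for HTML purification (assumption, see lead-in).
    function (string $value): string {
        return strip_tags($value);
    },
    // Turn entities such as &amp; back into plain characters.
    function (string $value): string {
        return html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    },
    // Replace line feeds with a single space (assumed behaviour).
    function (string $value): string {
        return str_replace("\n", ' ', $value);
    },
    // Remove leading and trailing whitespace.
    function (string $value): string {
        return trim($value);
    },
];

$raw = "The quick brown fox<br />\njumped over the <i>incredibly lazy dog</i> &amp; it ran away.  ";
$result = runStages($stages, base64_encode($raw));

echo $result, PHP_EOL;
// The quick brown fox jumped over the incredibly lazy dog & it ran away.
```

Keeping each step as a small single-purpose callable mirrors the idea behind the stage classes: steps can be reordered, removed, or tested in isolation without touching the others.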
License
MIT