digiaonline/common-sanitization-stages

This package is abandoned and no longer maintained. No replacement package was suggested.

A collection of pipeline stages commonly used when sanitizing data

1.2.0 2019-02-06 13:42 UTC

This package is auto-updated.

Last update: 2023-04-06 23:10:49 UTC


README


A collection of pipeline stages commonly used when sanitizing data

Requirements

  • PHP >= 7.1

Installation

composer require digiaonline/common-sanitization-stages

Usage

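Suppose you are importing some legacy data into a new system. The original data is user-generated content and therefore cannot be trusted. It also contains some simple HTML that you want to strip out. On top of that, the legacy data must be storable in a CSV file, so it has to be encoded in some way that guarantees the CSV delimiter never appears in a text value.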

To achieve this, combine the stages you need into a pipeline, then run your data through it:

<?php

require_once(__DIR__.'/vendor/autoload.php');

$rawInputData = <<<EOT
  The quick brown fox<br />
jumped over the <i>incredibly lazy dog</i> &amp; it
ran away.

EOT;

$encodedInputData = \base64_encode($rawInputData);

/** @var \League\Pipeline\Pipeline $pipeline */
$pipeline = (new \League\Pipeline\Pipeline())
    ->pipe(new \Digia\Sanitization\Stages\Base64DecodeStage())
    ->pipe(new \Digia\Sanitization\Stages\HtmlPurifierStage())
    ->pipe(new \Digia\Sanitization\Stages\HtmlEntityDecodeStage())
    ->pipe(new \Digia\Sanitization\Stages\StripLineFeedsStage(["\n"], true))
    ->pipe(new \Digia\Sanitization\Stages\TrimStringStage());

$outputData = $pipeline->process($encodedInputData);

var_dump($outputData); // string(70) "The quick brown fox jumped over the incredibly lazy dog & it ran away."
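
To make the effect of each stage easier to follow, here is a dependency-free sketch that approximates the same chain with PHP built-ins. It is not how this package implements its stages: `composeStages` is a hypothetical helper, `strip_tags()` is only a crude stand-in for an HTML purifier, and replacing each line feed with a space is an assumption about what `StripLineFeedsStage` does with the arguments shown above.

```php
<?php

declare(strict_types=1);

// Hypothetical helper (not part of this package or league/pipeline):
// chains callables so each one receives the previous one's return value.
function composeStages(callable ...$stages): callable
{
    return function ($payload) use ($stages) {
        foreach ($stages as $stage) {
            $payload = $stage($payload);
        }

        return $payload;
    };
}

$sketchPipeline = composeStages(
    // Stand-in for Base64DecodeStage: strict decoding, fail loudly on bad input.
    function (string $value): string {
        $decoded = base64_decode($value, true);
        if ($decoded === false) {
            throw new InvalidArgumentException('Input is not valid base64');
        }

        return $decoded;
    },
    // Stand-in for HtmlPurifierStage. ASSUMPTION: strip_tags() is only a crude
    // substitute and is not a safe replacement for a real HTML purifier.
    function (string $value): string {
        return strip_tags($value);
    },
    // Stand-in for HtmlEntityDecodeStage: turns "&amp;" back into "&".
    function (string $value): string {
        return html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    },
    // Stand-in for StripLineFeedsStage. ASSUMPTION: each "\n" becomes one space.
    function (string $value): string {
        return str_replace("\n", ' ', $value);
    },
    // Stand-in for TrimStringStage: drops leading and trailing whitespace.
    function (string $value): string {
        return trim($value);
    }
);

// Same sample text as the example above, written without a heredoc.
$sketchInput = "  The quick brown fox<br />\n"
    . "jumped over the <i>incredibly lazy dog</i> &amp; it\n"
    . "ran away.\n";

$sketchOutput = $sketchPipeline(base64_encode($sketchInput));

var_dump($sketchOutput);
```

With this sample input the sketch yields the same 70-character string as the example above. For untrusted HTML, prefer the package's `HtmlPurifierStage` over `strip_tags()`.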

License

MIT