rgilyov/laravel-csv-importer

使用 Laravel 以简单可靠的方式导入、解析、验证和转换您的 CSV 文件

1.0.1 2017-03-12 21:11 UTC

This package is not auto-updated.

Last update: 2024-09-30 01:34:22 UTC


README

使用 Laravel 以灵活可靠的方式导入、解析、验证和转换您的 CSV 文件

安装

composer require rgilyov/laravel-csv-importer

config/app.php 中注册 \RGilyov\CsvImporter\CsvImporterServiceProvider

    'providers' => [
        //...
        \RGilyov\CsvImporter\CsvImporterServiceProvider::class,
    ];

安装后,您可以发布默认配置文件

php artisan vendor:publish --tag=config

支持 Laravel 5 及以上版本,hhvm 也得到支持。

要求

每个创建的导入器都将内置 mutex 功能,以确保导入的安全性并避免可能的数据不兼容性,这在需要导入超过 100k 行的 CSV 文件时尤为重要,因此 Laravel 应用程序应该在 .env 文件中设置 fileredismemcached 缓存驱动程序。

基本用法

要创建新的 CSV 导入器,应该创建一个继承自 \RGilyov\CsvImporter\BaseCsvImporter 抽象类的类,或者可以使用 php artisan make:csv-importer MyImporter 控制台命令,执行后将在 app/CsvImporters/ 目录中创建名为 MyImporter.php 的新文件,并设置基本导入器。

    <?php

    namespace App\CsvImporters;

    use RGilyov\CsvImporter\BaseCsvImporter;

    class MyImporter extends BaseCsvImporter
    {
        /**
         *  Specify mappings and rules for the csv importer, you also may create csv files to write csv entities
         *  and overwrite global configurations
         *
         * @return array
         */
        public function csvConfigurations()
        {
            return [
                'mappings' => [
                    'serial_number' => ['required', 'validation' => ['numeric'], 'cast' => 'string'],
                    'title'         => ['validation' => ['required'], 'cast' => ['string']],
                    'company'       => ['validation' => ['string']]
                ],
                'csv_files' => [
                    'valid_entities'   => '/valid_entities.csv',
                    'invalid_entities' => '/invalid_entities.csv',
                ]
            ];
        }

        /**
         *  Will be executed for a csv line if it passed validation
         *
         * @param $item
         * @throws \RGilyov\CsvImporter\Exceptions\CsvImporterException
         * @return void
         */
        public function handle($item)
        {
            $this->insertTo('valid_entities', $item);
        }

        /**
         *  Will be executed if a csv line did not pass validation
         *
         * @param $item
         * @throws \RGilyov\CsvImporter\Exceptions\CsvImporterException
         * @return void
         */
        public function invalid($item)
        {
            $this->insertTo('invalid_entities', $item);
        }
    }

主要有 3 个方法

  • csvConfigurations 返回给定类型 CSV 的配置,配置有 3 个部分

    • 'mappings':您可以指定期望给定 CSV 具有字段的(标题),并为每个字段(标题)附加规则,您可以指定 3 种类型的规则(过滤器)

      • 要使字段(标题)对导入是必需的,需要设置字段(标题)的 required 参数,例如 'name' => ['required'],如果给定的 CSV 文件没有该字段(标题),将抛出错误。

      • 您可以为每个字段(标题)设置验证,导入器使用 Laravel 验证,因此您可以使用其中的任何规则来检查 CSV 值(例如 'email' => ['required', validation => ['email']]),如果值不是有效的电子邮件,则包含电子邮件的 CSV 行将被放入 invalid($item) 方法中,否则放入 handle($item)

      • 您可以将 CSV 值转换为任何原生 PHP 类型,并格式化 CSV 中包含的日期,例如 'name' => ['required', 'cast' => 'string']'birth_date' => ['cast' => ['string', 'date']],转换将在验证之前进行。

    • 'csv_files':将创建键中指定的文件,您可以使用 $this->insertTo('csv_file_name', $item); 方法将 CSV 行写入每个文件,例如,您可以区分无效的 CSV 行和有效的,它使用 Laravel 文件系统 \Storage 支持类 https://laravel.net.cn/docs/5.4/filesystem

    • 'config':您可以为给定 CSV 导入器在此处覆盖全局 config/csv-importer.php 配置。

  • handle:将执行验证通过的 CSV 行。

  • invalid:将执行未通过验证的 CSV 行。

让我们最终导入一个 CSV 文件

    $importer = (new \App\CsvImporters\MyImporter())->setCsvFile('my_huge_csv_with_1000k_lines.csv');
    $importer->run();

    // progress information will be here, due to the import process already started above
    $result = $importer->run();

导入开始后,您将无法启动另一个导入,直到第一个导入完成。

虽然在导入过程中,您可能想了解运行过程中的进度

    $progress = $importer->getProgress();

    /*
        [
            'data' => ["message"  => 'The import process is running'],
            'meta' => [
                'processed'  => 250000,
                'remains'    => 750000,
                'percentage' => 25,
                'finished'   => false,
                'init'       => false,
                'running'    => true
            ]
        ]
    */

在导入结束时,您将在数据中看到关键值finished => true。因此,您需要完成csv导入。

    $finishDetails = $importer->finish();

    /*
        [
            [
                'data' => [
                    "message" => 'The import process successfully finished.'
                ],
                'meta' => ["finished" => true, 'init' => false, 'running' => false],
                'csv_files' => [
                    'valid_entities.csv',
                    'invalid_entities.csv'
                ]
            ]
        ]
    */

如果出现问题,您可以取消当前的导入进程。

    $importer->cancel();

导入器自定义

除了上述方法外,导入器还有一个可以帮助您轻松扩展特定情况的函数列表。

  • before - 将在导入过程开始之前执行
  • after - 将在导入过程完成后执行
  • onCancel - 将在导入过程被取消之前执行
  • initProgressBar - 将初始化新的进度条
  • progressBarDetails - 进度条的附加信息
  • setFinalDetails - 在导入过程的finish()之后设置附加信息
  • setError - 向错误列表中添加一个错误(如果存在)
    <?php

    namespace App\CsvImporters;

    use RGilyov\CsvImporter\BaseCsvImporter;

    class MyImporter extends BaseCsvImporter
    {
        //...

        /**
         * Will be executed before importing
         *
         * @return void
         */
        protected function before()
        {
            // do something before the import start
            if (! $this->checkSomething()) {
                $this->setError('Oops', 'something went wrong.');
            };
        }

        /**
         *  Adjust additional information to progress bar during import process
         *
         * @return null|string|array
         */
        public function progressBarDetails()
        {
            return "I'm a csv importer and I'm running :)";
        }

        /**
         * Will be executed after importing
         *
         * @return void
         */
        protected function after()
        {
            // do something after the import finished
            $entities = \App\CsvEntity::all(); // just a demo, in real life you don't want to do it ;)
            $this->initProgressBar('Something running.', $entities->count());

            $entities->each(function ($entity) {
                // do something
                $this->incrementProgress();
            });

            $this->setFinalDetails('Final details.');
        }

        /**
         *  Will be executed during the import process canceling
         */
        protected function onCancel()
        {
            \DB::rollBack();
        }
    }

基本的csv聚合

如果将csv文件设置为导入类,您可以对其count,从中获取distinct值,或者遍历csv

    $importer = (new \App\CsvImporters\MyImporter())->setCsvFile('my_huge_csv_with_1000k_lines.csv');

    $quantity = $importer->countCsv(); // returns amount of csv lines without headers
    $distinctNames = $importer->distinct('name'); // returns array with distinct names

    $importer->each(function ($item) { // encoded and casted csv line
        // do something
    });

上述所有方法在没有设置csv文件时都返回false

配置

存在3层配置

    1. 全局配置文件config/csv-importer.php,它为所有csv导入器应用默认参数
    1. 局部配置,由csvConfigurations()方法返回,覆盖全局配置
    1. 使用setter进行的手动配置自定义,覆盖globallocal配置
  1. 全局配置
        /*
        |--------------------------------------------------------------------------
        | Main csv import configurations
        |--------------------------------------------------------------------------
        |
        | `cache_driver` - keeps all progress and final information, it also allows
        |   the mutex functionality to work, there are only 3 cache drivers supported:
        |   redis, file and memcached
        |
        | `mutex_lock_time` - how long script will be executed and how long
        |   the import process will be locked, another words if we will import
        |   list of electric guitars we won't be able to run another import of electric
        |   guitars at the same time, to avoid duplicates and different sorts of
        |   incompatibilities. The value set in minutes.
        |
        | `memory_limit` - if you want store all csv values in memory or something like that,
        |   you may increase amount of memory for the script
        |
        | `encoding` - which encoding we have, UTF-8 by default
        |
        */
        'cache_driver' => env('CACHE_DRIVER', 'file'),
        
        'mutex_lock_time' => 300,

        'memory_limit' => 128,

        /*
         * An import class's short name (without namespace) by default
         */
        'mutex_lock_key' => null,

        /*
         * Encoding of given csv file
         */
        'input_encoding' => 'UTF-8',

        /*
         * Encoding of processed csv values
         */
        'output_encoding' => 'UTF-8',

        /*
         * Specify which date format the given csv file has
         * to use `date` ('Y-m-d') and `datetime` ('Y-m-d H:i:s') casters,
         * if the parameter will be set to `null` `date` caster will replace
         * `/` and `\` and `|` and `.` and `,` on `-` and will assume that
         * the given csv file has `Y-m-d` or `d-m-Y` date format
         */
        'csv_date_format' => null,

        'delimiter' => ',',

        'enclosure' => '"',

        /*
         * Warning: The library depends on PHP SplFileObject class.
         * Since this class exhibits a reported bug (https://bugs.php.net/bug.php?id=55413),
         * Data using the escape character are correctly
         * escaped but the escape character is not removed from the CSV content.
         */
        'escape' => '\\',

        'newline' => "\n",

        /*
        |--------------------------------------------------------------------------
        | Progress bar messages
        |--------------------------------------------------------------------------
        */

        'does_not_running' => 'Import process does not run',
        'initialization'   => 'Initialization',
        'progress'         => 'Import process is running',
        'final_stage'      => 'Final stage',
        'finished'         => 'Almost done, please click to the `finish` button to proceed',
        'final'            => 'The import process successfully finished!'
  1. 局部配置
    <?php

    namespace App\CsvImporters;

    use RGilyov\CsvImporter\BaseCsvImporter;

    class MyImporter extends BaseCsvImporter
    {
        /**
         *  Specify mappings and rules for the csv importer, you also may create csv files to write csv entities
         *  and overwrite global configurations
         *
         * @return array
         */
        public function csvConfigurations()
        {
            return [
                'mappings' => [//...],
                'csv_files' => [//...],
                'config' => [
                    'mutex_lock_time' => 500,
                    'memory_limit' => 256,
                    'mutex_lock_key' => 'my-key',
                    'input_encoding' => 'cp1252',
                    'output_encoding' => 'UTF-8',
                    'csv_date_format' => 'm/d/Y',
                    'delimiter' => ';',
                    'enclosure' => '\'',
                    'escape' => '\\',
                    'newline' => "\n",

                    /*
                    |--------------------------------------------------------------------------
                    | Progress bar messages
                    |--------------------------------------------------------------------------
                    */

                    'does_not_running' => 'Something does not run',
                    'initialization'   => 'Init',
                    'progress'         => 'Something running',
                    'final_stage'      => 'After the import had finished',
                    'finished'         => 'Please click to the `finish` button to proceed',
                    'final'            => 'Something successfully finished!'
                ]
            ];
        }
    }
  1. 具有setter的配置
    (new \App\CsvImporters\MyImporter())
                ->setCsvFile('my_huge_csv_with_1000k_lines.csv')
                ->setCsvDateFormat('y-m-d')
                ->setDelimiter('d')
                ->setEnclosure('e')
                ->setEscape("x")
                ->setInputEncoding('cp1252')
                ->setOutputEncoding('UTF-8')
                ->setNewline('newline')
                ->run();

从csv标题到定义的映射和反向数组转换

听起来很糟糕,还是直接展示它是如何工作的吧

假设我们有一个具有这种结构的csv文件

name,some_weird_header
John,some_weird_data

然后我们创建一个导入类并定义mappings,在这种情况下,我们只对name字段(标题)感兴趣

    class GuitarsCsvImporter extends BaseCsvImporter
    {
        /**
         *  Specify mappings and rules for the csv importer, you also may create csv files to write csv entities
         *  and overwrite global configurations
         *
         * @return array
         */
        public function csvConfigurations()
        {
            return [
                'mappings' => [
                    'name' => [] // <- defined mappings, we only need data from this column
                ]
            ];
        }

        /**
         *  Will be executed for a csv line if it passed validation
         *
         * @param $item
         * @throws \RGilyov\CsvImporter\Exceptions\CsvImporterException
         * @return void
         */
        public function handle($item)
        {
            /*
                $item contains ['name' => 'John', 'some_weird_header' => 'some_weird_data']
                so the $item will have all columns inside, so we need extract only columns we need, which was defined
                inside csv configurations mappings array
            */

            $dataOnlyFromDefinedFields = $this->extractDefinedFields($item); // will return ['name' => 'John']

            /*
                Assume we need to do some manipulations with the $item
                array and then write it to the `valid_entities.csv`
                we need to make sure that data inside formatted array are
                match headers inside the csv, we can do it with this `toCsvHeaders($item)` method:
            */

            // will return ['name' => 'John', 'some_weird_header' => null]
            $csvHeadersData = $this->toCsvHeaders($dataOnlyFromDefinedFields);

            $this->insertTo('valid_entities', $csvHeadersData);
        }
    }

互斥键连接

有些情况下,您需要能够在同一时间运行多个类似的导入进程,例如,您数据库中有guitarsguitar_companies表,以及两个csv文件ltd_guitars.csvblack_machine_guitars.csv,您使用相同的导入类对这两个csv进行操作,但由于导入过程被锁定,您无法同时导入两个,在这种情况下使用互斥键连接,为每个guitar company拥有不同的互斥键

    class GuitarsCsvImporter extends BaseCsvImporter
    {
        //..

        protected $guitarCompany;

        public function setCompany(\App\GuitarCompany $guitarCompany)
        {
            $this->guitarCompany = $guitarCompany;
            $this->concatMutexKey($guitarCompany->id);

            return $this;
        }

        /**
         *  Will be executed for a csv line if it passed validation
         *
         * @param $item
         * @throws \RGilyov\CsvImporter\Exceptions\CsvImporterException
         * @return void
         */
        public function handle($item)
        {
            \App\Guitars::create(
                array_merge(['guitar_company_id' => $this->guitarCompany->id], $this->extractDefinedFields($item))
            );
        }
    }

现在您可以在同一时间运行每个公司的导入器。但不是同一个公司。

自定义过滤器

如上所述,在基本用法章节中,csv导入器有3种类型的过滤器,您可以针对每个csv字段(标题)指定这些过滤器,但有时您需要做一些更复杂的事情,例如

  • 如果给定的csv有字段(标题)A或字段(标题)B,并且如果两者都缺失则抛出标题验证错误,在这种情况下参数required由于csv导入将检查具有参数的每个字段(标题),这些参数是AND逻辑,在这种情况下您需要创建headers filter

  • 或者例如,您需要对特定字段的全部值进行白名单检查,这是一种新的validation rule(filter)

  • 或者您可能想要对csv值进行一些高级转换,在这种情况下您需要创建cast filter

自定义标题过滤器

要创建自定义标题过滤器,您需要创建一个继承自\RGilyov\CsvImporter\BaseHeadersFilter的类,或者运行php artisan make:csv-importer-headers-filter MyHeadersFilter,这将创建一个具有基本设置的MyHeadersFilter.php文件,并将其放在app/CsvImporters/HeadersFilters/文件夹中

    <?php

    namespace App\CsvImporters\HeadersFilters;

    use RGilyov\CsvImporter\BaseHeadersFilter;

    class MyHeadersFilter extends BaseHeadersFilter
    {
        /**
         * Specify error message
         *
         * @var string
         */
        public $errorMessage = 'The csv must contain either `name` field either `first_name` and `last_name` fields';

        /**
         * @param array $csvHeaders
         * @return bool
         */
        public function filter(array $csvHeaders)
        {
            if (isset($csvHeaders['name']) || (isset($csvHeaders['first_name']) && isset($csvHeaders['last_name']))) {
                return true;
            }

            return false;
        }
    }

过滤器具有属性 errorMessage,您可以指定在导入过程开始后,当 filter 方法返回 false 时抛出的错误消息。您还可以指定要返回的过滤器名称,或检查其是否存在,或将其重置等,要这样做,您需要指定 protected $name 属性并设置您想要的任何名称,默认情况下为类的简短名称,在这种情况下为 MyHeadersFilter

要为导入器注册新的头过滤器,您需要使用 addHeadersFilter() 静态函数

    \App\CsvImporters\MyImporter::addHeadersFilter(new \App\CsvImporters\HeadersFilters\MyHeadersFilter());

    // you may also use closure, you can specify name by passing second argument, otherwise it will be called `filter`
    \App\CsvImporters\MyImporter::addHeadersFilter(function ($csvHeaders) {
        if (isset($csvHeaders['A']) || isset($csvHeaders['B'])) {
            return true;
        }

        return false;
    }, 'ab-filter');

    // you may add multiple filters with one method
    $ABFilter = function ($csvHeaders) {
                    if (isset($csvHeaders['A']) || isset($csvHeaders['B'])) {
                        return true;
                    }

                    return false;
                }

    $myHeadersFilter = new \App\CsvImporters\HeadersFilters\MyHeadersFilter();

    \App\CsvImporters\MyImporter::addHeadersFilters($ABFilter, $myHeadersFilter);

当然,您可以在导入过程中清除、获取、取消设置和检查过滤器

    \App\CsvImporters\MyImporter::headersFilterExists('MyHeadersFilter'); // will return `true`
    \App\CsvImporters\MyImporter::getHeadersFilter('MyHeadersFilter'); // will return the filter object
    \App\CsvImporters\MyImporter::getHeadersFilters(); // will return array with all filter objects
    \App\CsvImporters\MyImporter::unsetHeadersFilter('MyHeadersFilter'); // will return `true`
    \App\CsvImporters\MyImporter::flushHeadersFilters(); // will return empty array

    // example case
    if ($request->get('without_filters')) {
        \App\CsvImporters\MyImporter::flushHeadersFilters();
    }

自定义验证过滤器

要创建自定义验证过滤器,您需要创建一个扩展 \RGilyov\CsvImporter\BaseValidationFilter 的类,或者运行 php artisan make:csv-importer-validation-filter MyValidationFilter,这将创建一个包含基本设置的 MyValidationFilter.php 文件,位于 app/CsvImporters/ValidationFilters/ 目录中

    <?php

    namespace App\CsvImporters\ValidationFilters;

    use RGilyov\CsvImporter\BaseValidationFilter;

    class MyValidationFilter extends BaseValidationFilter
    {
        /**
         * @var string
         */
        protected $name = 'bad_word_validation';

        /**
         * @param mixed $value
         * @return bool
         */
        public function filter($value)
        {
            if (strpos($value, 'bad_word') !== false) {
                return false;
            }

            return true;
        }
    }

对于头过滤器属性 name 并不重要,但对于验证过滤器来说,指定它是非常有用的,因为我们将需要在 csvConfigurations() 方法中的 validation 属性内设置它,对于字段(头):'name' => ['validation' => 'bad_word_validation']

但是,有时您可能想要创建一个全局的验证过滤器,以便能够验证整个 CSV 实体数组,例如,只有当 CSV 行不包含空的用户名或用户名和姓氏都不为空时才有效,要获取所有 CSV 列的数组而不是仅仅一个值,您需要设置 public $global = true,当然,无需在导入类 CSV 配置中指定此类验证过滤器

    <?php

    namespace App\CsvImporters\ValidationFilters;

    use RGilyov\CsvImporter\BaseValidationFilter;

    class MyGlobalValidationFilter extends BaseValidationFilter
    {
        /**
         * @var string
         */
        protected $name = 'global_validation';

        /**
         * @var string
         */
        public $global = true;

        /**
         * @param mixed $value
         * @return bool
         */
        public function filter($value) // we will get array here, coz $this->global set to `true`
        {
            if (empty($value['name']) || (empty($value['first_name']) && empty($value['last_name']))) {
                return false;
            }

            return true;
        }
    }

所有过滤器的操作与 自定义头过滤器 章节中描述的操作类似

    \App\CsvImporters\MyImporter::addValidationFilter(new \App\CsvImporters\ValidationFilters\MyValidationFilter());

    // closure validation filters are global
    \App\CsvImporters\MyImporter::addValidationFilter(function ($item) {
        if (!empty($csvHeaders['A']) || !empty($csvHeaders['B'])) {
            return true;
        }

        return false;
    }, 'not-empty');

    // you may add multiple filters with one method
    $notEmpty = function ($item) {
                    if (!empty($item['A']) || !empty($item['B'])) {
                        return true;
                    }

                    return false;
                }

    $myValidationFilter       = new \App\CsvImporters\ValidationFilters\MyValidationFilter();
    $myGlobalValidationFilter = new \App\CsvImporters\ValidationFilters\MyGlobalValidationFilter();

    \App\CsvImporters\MyImporter::addValidationFilters($notEmpty, $myValidationFilter, $myGlobalValidationFilter);

    /////////////////////////////////////////////////////////////////////////////

    \App\CsvImporters\MyImporter::validationFilterExists('bad_word_validation'); // will return `true`
    \App\CsvImporters\MyImporter::getValidationFilter('global_validation'); // will return instance of MyGlobalValidationFilter class
    \App\CsvImporters\MyImporter::getValidationFilters(); // will return array with all filter objects
    \App\CsvImporters\MyImporter::unsetValidationFilter('bad_word_validation'); // will return `true`
    \App\CsvImporters\MyImporter::flushValidationFilters(); // will return empty array

    // example case
    if ($request->get('without_filters')) {
        \App\CsvImporters\MyImporter::flushValidationFilters();
    }
    WARNING!!!
    All closure validation filters are global.
    WARNING!!!
    If you will set not related validation rule(filter) for a field(header) and not specify and register
    a custom validation filter for that the RGilyov\CsvImporter\Exceptions\ImportValidationException will be thrown

自定义转换过滤器

要创建自定义转换过滤器,您需要创建一个扩展 \RGilyov\CsvImporter\BaseCastFilter 的类,或者运行 php artisan make:csv-importer-cast-filter MyCastFilter,这将创建一个包含基本设置的 MyCastFilter.php 文件,位于 app/CsvImporters/CastFilters/ 目录中

    <?php

    namespace App\CsvImporters\CastFilters;

    use RGilyov\CsvImporter\BaseCastFilter;

    class MyCastFilter extends BaseCastFilter
    {
        protected $name = 'lowercase';

        /**
         * @param $value
         * @return mixed
         */
        public function filter($value)
        {
            return strtolower($value);
        }
    }

与验证过滤器一样,name 对于转换过滤器也很重要,因为我们将需要在 csvConfigurations() 映射中设置它,对于字段(头):'field' => ['cast' => 'lowercase']

所有过滤器的操作与 自定义头过滤器自定义验证过滤器 章节中描述的操作类似

    \App\CsvImporters\MyImporter::addCastFilter(new \App\CsvImporters\CastFilters\MyCastFilter());

    \App\CsvImporters\MyImporter::addCastFilter(function ($value) {
        return htmlspecialchars($value);
    });

    $htmlentities = function ($value) {
                    return htmlentities($value);
                }

    $myCastFilter = new \App\CsvImporters\CastFilters\MyCastFilter();

    \App\CsvImporters\MyImporter::addCastFilters($htmlentities, $myCastFilter);

    /////////////////////////////////////////////////////////////////////////////

    \App\CsvImporters\MyImporter::castFilterExists('lowercase'); // will return `true`
    \App\CsvImporters\MyImporter::getCastFilter('lowercase'); // will return instance of MyCastFilter class
    \App\CsvImporters\MyImporter::getCastFilters(); // will return array with all filter objects
    \App\CsvImporters\MyImporter::unsetCastFilter('lowercase'); // will return `true`
    \App\CsvImporters\MyImporter::flushCastFilters(); // will return empty array

    // example case
    if ($request->get('without_filters')) {
        \App\CsvImporters\MyImporter::flushCastFilters();
    }

注册自定义过滤器的最佳方式

我认为注册自定义过滤器的最佳方式是使用服务提供者:https://laravel.net.cn/docs/5.4/providers