rgilyov / laravel-csv-importer
使用 Laravel 以简单可靠的方式导入、解析、验证和转换您的 CSV 文件
Requires
- php: >=5.6.4
- arvenil/ninja-mutex: 0.6.0
- laravel/framework: ~4.2|^5
- league/csv: 8.0
- nesbot/carbon: ~1.20
Requires (Dev)
- mockery/mockery: ^0.9.4
- orchestra/testbench: ~3.0
- phpunit/phpunit: 5.7
- predis/predis: 1.0.*
This package is not auto-updated.
Last update: 2024-09-30 01:34:22 UTC
README
使用 Laravel 以灵活可靠的方式导入、解析、验证和转换您的 CSV 文件
安装
composer require rgilyov/laravel-csv-importer
在 config/app.php 中注册 \RGilyov\CsvImporter\CsvImporterServiceProvider
'providers' => [ //... \RGilyov\CsvImporter\CsvImporterServiceProvider::class, ];
安装后,您可以发布默认配置文件
php artisan vendor:publish --tag=config
支持 Laravel 5 及以上版本,hhvm 也得到支持。
要求
每个创建的导入器都将内置 mutex 功能,以确保导入的安全性并避免可能的数据不兼容性,这在需要导入超过 100k 行的 CSV 文件时尤为重要,因此 Laravel 应用程序应该在 .env 文件中设置 file、redis 或 memcached 缓存驱动程序。
基本用法
要创建新的 CSV 导入器,应该创建一个继承自 \RGilyov\CsvImporter\BaseCsvImporter 抽象类的类,或者可以使用 php artisan make:csv-importer MyImporter 控制台命令,执行后将在 app/CsvImporters/ 目录中创建名为 MyImporter.php 的新文件,并设置基本导入器。
<?php namespace App\CsvImporters; use RGilyov\CsvImporter\BaseCsvImporter; class MyImporter extends BaseCsvImporter { /** * Specify mappings and rules for the csv importer, you also may create csv files to write csv entities * and overwrite global configurations * * @return array */ public function csvConfigurations() { return [ 'mappings' => [ 'serial_number' => ['required', 'validation' => ['numeric'], 'cast' => 'string'], 'title' => ['validation' => ['required'], 'cast' => ['string']], 'company' => ['validation' => ['string']] ], 'csv_files' => [ 'valid_entities' => '/valid_entities.csv', 'invalid_entities' => '/invalid_entities.csv', ] ]; } /** * Will be executed for a csv line if it passed validation * * @param $item * @throws \RGilyov\CsvImporter\Exceptions\CsvImporterException * @return void */ public function handle($item) { $this->insertTo('valid_entities', $item); } /** * Will be executed if a csv line did not pass validation * * @param $item * @throws \RGilyov\CsvImporter\Exceptions\CsvImporterException * @return void */ public function invalid($item) { $this->insertTo('invalid_entities', $item); } }
主要有 3 个方法
-
csvConfigurations返回给定类型 CSV 的配置,配置有 3 个部分-
'mappings':您可以指定期望给定 CSV 具有字段的(标题),并为每个字段(标题)附加规则,您可以指定 3 种类型的规则(过滤器)-
要使字段(标题)对导入是必需的,需要设置字段(标题)的
required参数,例如'name' => ['required'],如果给定的 CSV 文件没有该字段(标题),将抛出错误。 -
您可以为每个字段(标题)设置验证,导入器使用 Laravel 验证,因此您可以使用其中的任何规则来检查 CSV 值(例如
'email' => ['required', validation => ['email']]),如果值不是有效的电子邮件,则包含电子邮件的 CSV 行将被放入invalid($item)方法中,否则放入handle($item)。 -
您可以将 CSV 值转换为任何原生 PHP 类型,并格式化 CSV 中包含的日期,例如
'name' => ['required', 'cast' => 'string']或'birth_date' => ['cast' => ['string', 'date']],转换将在验证之前进行。
-
-
'csv_files':将创建键中指定的文件,您可以使用$this->insertTo('csv_file_name', $item);方法将 CSV 行写入每个文件,例如,您可以区分无效的 CSV 行和有效的,它使用 Laravel 文件系统\Storage支持类 https://laravel.net.cn/docs/5.4/filesystem。 -
'config':您可以为给定 CSV 导入器在此处覆盖全局config/csv-importer.php配置。
-
-
handle:将执行验证通过的 CSV 行。 -
invalid:将执行未通过验证的 CSV 行。
让我们最终导入一个 CSV 文件
$importer = (new \App\CsvImporters\MyImporter())->setCsvFile('my_huge_csv_with_1000k_lines.csv'); $importer->run(); // progress information will be here, due to the import process already started above $result = $importer->run();
导入开始后,您将无法启动另一个导入,直到第一个导入完成。
虽然在导入过程中,您可能想了解运行过程中的进度
$progress = $importer->getProgress(); /* [ 'data' => ["message" => 'The import process is running'], 'meta' => [ 'processed' => 250000, 'remains' => 750000, 'percentage' => 25, 'finished' => false, 'init' => false, 'running' => true ] ] */
在导入结束时,您将在数据中看到关键值finished => true。因此,您需要完成csv导入。
$finishDetails = $importer->finish(); /* [ [ 'data' => [ "message" => 'The import process successfully finished.' ], 'meta' => ["finished" => true, 'init' => false, 'running' => false], 'csv_files' => [ 'valid_entities.csv', 'invalid_entities.csv' ] ] ] */
如果出现问题,您可以取消当前的导入进程。
$importer->cancel();
导入器自定义
除了上述方法外,导入器还有一个可以帮助您轻松扩展特定情况的函数列表。
before- 将在导入过程开始之前执行after- 将在导入过程完成后执行onCancel- 将在导入过程被取消之前执行initProgressBar- 将初始化新的进度条progressBarDetails- 进度条的附加信息setFinalDetails- 在导入过程的finish()之后设置附加信息setError- 向错误列表中添加一个错误(如果存在)
<?php namespace App\CsvImporters; use RGilyov\CsvImporter\BaseCsvImporter; class MyImporter extends BaseCsvImporter { //... /** * Will be executed before importing * * @return void */ protected function before() { // do something before the import start if (! $this->checkSomething()) { $this->setError('Oops', 'something went wrong.'); }; } /** * Adjust additional information to progress bar during import process * * @return null|string|array */ public function progressBarDetails() { return "I'm a csv importer and I'm running :)"; } /** * Will be executed after importing * * @return void */ protected function after() { // do something after the import finished $entities = \App\CsvEntity::all(); // just a demo, in real life you don't want to do it ;) $this->initProgressBar('Something running.', $entities->count()); $entities->each(function ($entity) { // do something $this->incrementProgress(); }); $this->setFinalDetails('Final details.'); } /** * Will be executed during the import process canceling */ protected function onCancel() { \DB::rollBack(); } }
基本的csv聚合
如果将csv文件设置为导入类,您可以对其count,从中获取distinct值,或者遍历csv
$importer = (new \App\CsvImporters\MyImporter())->setCsvFile('my_huge_csv_with_1000k_lines.csv'); $quantity = $importer->countCsv(); // returns amount of csv lines without headers $distinctNames = $importer->distinct('name'); // returns array with distinct names $importer->each(function ($item) { // encoded and casted csv line // do something });
上述所有方法在没有设置csv文件时都返回false。
配置
存在3层配置
-
- 全局配置文件
config/csv-importer.php,它为所有csv导入器应用默认参数
- 全局配置文件
-
- 局部配置,由
csvConfigurations()方法返回,覆盖全局配置
- 局部配置,由
-
- 使用setter进行的手动配置自定义,覆盖
global和local配置
- 使用setter进行的手动配置自定义,覆盖
- 全局配置
/* |-------------------------------------------------------------------------- | Main csv import configurations |-------------------------------------------------------------------------- | | `cache_driver` - keeps all progress and final information, it also allows | the mutex functionality to work, there are only 3 cache drivers supported: | redis, file and memcached | | `mutex_lock_time` - how long script will be executed and how long | the import process will be locked, another words if we will import | list of electric guitars we won't be able to run another import of electric | guitars at the same time, to avoid duplicates and different sorts of | incompatibilities. The value set in minutes. | | `memory_limit` - if you want store all csv values in memory or something like that, | you may increase amount of memory for the script | | `encoding` - which encoding we have, UTF-8 by default | */ 'cache_driver' => env('CACHE_DRIVER', 'file'), 'mutex_lock_time' => 300, 'memory_limit' => 128, /* * An import class's short name (without namespace) by default */ 'mutex_lock_key' => null, /* * Encoding of given csv file */ 'input_encoding' => 'UTF-8', /* * Encoding of processed csv values */ 'output_encoding' => 'UTF-8', /* * Specify which date format the given csv file has * to use `date` ('Y-m-d') and `datetime` ('Y-m-d H:i:s') casters, * if the parameter will be set to `null` `date` caster will replace * `/` and `\` and `|` and `.` and `,` on `-` and will assume that * the given csv file has `Y-m-d` or `d-m-Y` date format */ 'csv_date_format' => null, 'delimiter' => ',', 'enclosure' => '"', /* * Warning: The library depends on PHP SplFileObject class. * Since this class exhibits a reported bug (https://bugs.php.net/bug.php?id=55413), * Data using the escape character are correctly * escaped but the escape character is not removed from the CSV content. */ 'escape' => '\\', 'newline' => "\n", /* |-------------------------------------------------------------------------- | Progress bar messages |-------------------------------------------------------------------------- */ 'does_not_running' => 'Import process does not run', 'initialization' => 'Initialization', 'progress' => 'Import process is running', 'final_stage' => 'Final stage', 'finished' => 'Almost done, please click to the `finish` button to proceed', 'final' => 'The import process successfully finished!'
- 局部配置
<?php namespace App\CsvImporters; use RGilyov\CsvImporter\BaseCsvImporter; class MyImporter extends BaseCsvImporter { /** * Specify mappings and rules for the csv importer, you also may create csv files to write csv entities * and overwrite global configurations * * @return array */ public function csvConfigurations() { return [ 'mappings' => [//...], 'csv_files' => [//...], 'config' => [ 'mutex_lock_time' => 500, 'memory_limit' => 256, 'mutex_lock_key' => 'my-key', 'input_encoding' => 'cp1252', 'output_encoding' => 'UTF-8', 'csv_date_format' => 'm/d/Y', 'delimiter' => ';', 'enclosure' => '\'', 'escape' => '\\', 'newline' => "\n", /* |-------------------------------------------------------------------------- | Progress bar messages |-------------------------------------------------------------------------- */ 'does_not_running' => 'Something does not run', 'initialization' => 'Init', 'progress' => 'Something running', 'final_stage' => 'After the import had finished', 'finished' => 'Please click to the `finish` button to proceed', 'final' => 'Something successfully finished!' ] ]; } }
- 具有setter的配置
(new \App\CsvImporters\MyImporter()) ->setCsvFile('my_huge_csv_with_1000k_lines.csv') ->setCsvDateFormat('y-m-d') ->setDelimiter('d') ->setEnclosure('e') ->setEscape("x") ->setInputEncoding('cp1252') ->setOutputEncoding('UTF-8') ->setNewline('newline') ->run();
从csv标题到定义的映射和反向数组转换
听起来很糟糕,还是直接展示它是如何工作的吧
假设我们有一个具有这种结构的csv文件
name,some_weird_header
John,some_weird_data
然后我们创建一个导入类并定义mappings,在这种情况下,我们只对name字段(标题)感兴趣
class GuitarsCsvImporter extends BaseCsvImporter { /** * Specify mappings and rules for the csv importer, you also may create csv files to write csv entities * and overwrite global configurations * * @return array */ public function csvConfigurations() { return [ 'mappings' => [ 'name' => [] // <- defined mappings, we only need data from this column ] ]; } /** * Will be executed for a csv line if it passed validation * * @param $item * @throws \RGilyov\CsvImporter\Exceptions\CsvImporterException * @return void */ public function handle($item) { /* $item contains ['name' => 'John', 'some_weird_header' => 'some_weird_data'] so the $item will have all columns inside, so we need extract only columns we need, which was defined inside csv configurations mappings array */ $dataOnlyFromDefinedFields = $this->extractDefinedFields($item); // will return ['name' => 'John'] /* Assume we need to do some manipulations with the $item array and then write it to the `valid_entities.csv` we need to make sure that data inside formatted array are match headers inside the csv, we can do it with this `toCsvHeaders($item)` method: */ // will return ['name' => 'John', 'some_weird_header' => null] $csvHeadersData = $this->toCsvHeaders($dataOnlyFromDefinedFields); $this->insertTo('valid_entities', $csvHeadersData); } }
互斥键连接
有些情况下,您需要能够在同一时间运行多个类似的导入进程,例如,您数据库中有guitars和guitar_companies表,以及两个csv文件ltd_guitars.csv和black_machine_guitars.csv,您使用相同的导入类对这两个csv进行操作,但由于导入过程被锁定,您无法同时导入两个,在这种情况下使用互斥键连接,为每个guitar company拥有不同的互斥键
class GuitarsCsvImporter extends BaseCsvImporter { //.. protected $guitarCompany; public function setCompany(\App\GuitarCompany $guitarCompany) { $this->guitarCompany = $guitarCompany; $this->concatMutexKey($guitarCompany->id); return $this; } /** * Will be executed for a csv line if it passed validation * * @param $item * @throws \RGilyov\CsvImporter\Exceptions\CsvImporterException * @return void */ public function handle($item) { \App\Guitars::create( array_merge(['guitar_company_id' => $this->guitarCompany->id], $this->extractDefinedFields($item)) ); } }
现在您可以在同一时间运行每个公司的导入器。但不是同一个公司。
自定义过滤器
如上所述,在基本用法章节中,csv导入器有3种类型的过滤器,您可以针对每个csv字段(标题)指定这些过滤器,但有时您需要做一些更复杂的事情,例如
-
如果给定的csv有字段(标题)
A或字段(标题)B,并且如果两者都缺失则抛出标题验证错误,在这种情况下参数required由于csv导入将检查具有参数的每个字段(标题),这些参数是AND逻辑,在这种情况下您需要创建headers filter。 -
或者例如,您需要对特定字段的全部值进行白名单检查,这是一种新的
validation rule(filter)。 -
或者您可能想要对csv值进行一些高级转换,在这种情况下您需要创建
cast filter。
自定义标题过滤器
要创建自定义标题过滤器,您需要创建一个继承自\RGilyov\CsvImporter\BaseHeadersFilter的类,或者运行php artisan make:csv-importer-headers-filter MyHeadersFilter,这将创建一个具有基本设置的MyHeadersFilter.php文件,并将其放在app/CsvImporters/HeadersFilters/文件夹中
<?php namespace App\CsvImporters\HeadersFilters; use RGilyov\CsvImporter\BaseHeadersFilter; class MyHeadersFilter extends BaseHeadersFilter { /** * Specify error message * * @var string */ public $errorMessage = 'The csv must contain either `name` field either `first_name` and `last_name` fields'; /** * @param array $csvHeaders * @return bool */ public function filter(array $csvHeaders) { if (isset($csvHeaders['name']) || (isset($csvHeaders['first_name']) && isset($csvHeaders['last_name']))) { return true; } return false; } }
过滤器具有属性 errorMessage,您可以指定在导入过程开始后,当 filter 方法返回 false 时抛出的错误消息。您还可以指定要返回的过滤器名称,或检查其是否存在,或将其重置等,要这样做,您需要指定 protected $name 属性并设置您想要的任何名称,默认情况下为类的简短名称,在这种情况下为 MyHeadersFilter
要为导入器注册新的头过滤器,您需要使用 addHeadersFilter() 静态函数
\App\CsvImporters\MyImporter::addHeadersFilter(new \App\CsvImporters\HeadersFilters\MyHeadersFilter()); // you may also use closure, you can specify name by passing second argument, otherwise it will be called `filter` \App\CsvImporters\MyImporter::addHeadersFilter(function ($csvHeaders) { if (isset($csvHeaders['A']) || isset($csvHeaders['B'])) { return true; } return false; }, 'ab-filter'); // you may add multiple filters with one method $ABFilter = function ($csvHeaders) { if (isset($csvHeaders['A']) || isset($csvHeaders['B'])) { return true; } return false; } $myHeadersFilter = new \App\CsvImporters\HeadersFilters\MyHeadersFilter(); \App\CsvImporters\MyImporter::addHeadersFilters($ABFilter, $myHeadersFilter);
当然,您可以在导入过程中清除、获取、取消设置和检查过滤器
\App\CsvImporters\MyImporter::headersFilterExists('MyHeadersFilter'); // will return `true` \App\CsvImporters\MyImporter::getHeadersFilter('MyHeadersFilter'); // will return the filter object \App\CsvImporters\MyImporter::getHeadersFilters(); // will return array with all filter objects \App\CsvImporters\MyImporter::unsetHeadersFilter('MyHeadersFilter'); // will return `true` \App\CsvImporters\MyImporter::flushHeadersFilters(); // will return empty array // example case if ($request->get('without_filters')) { \App\CsvImporters\MyImporter::flushHeadersFilters(); }
自定义验证过滤器
要创建自定义验证过滤器,您需要创建一个扩展 \RGilyov\CsvImporter\BaseValidationFilter 的类,或者运行 php artisan make:csv-importer-validation-filter MyValidationFilter,这将创建一个包含基本设置的 MyValidationFilter.php 文件,位于 app/CsvImporters/ValidationFilters/ 目录中
<?php namespace App\CsvImporters\ValidationFilters; use RGilyov\CsvImporter\BaseValidationFilter; class MyValidationFilter extends BaseValidationFilter { /** * @var string */ protected $name = 'bad_word_validation'; /** * @param mixed $value * @return bool */ public function filter($value) { if (strpos($value, 'bad_word') !== false) { return false; } return true; } }
对于头过滤器属性 name 并不重要,但对于验证过滤器来说,指定它是非常有用的,因为我们将需要在 csvConfigurations() 方法中的 validation 属性内设置它,对于字段(头):'name' => ['validation' => 'bad_word_validation']
但是,有时您可能想要创建一个全局的验证过滤器,以便能够验证整个 CSV 实体数组,例如,只有当 CSV 行不包含空的用户名或用户名和姓氏都不为空时才有效,要获取所有 CSV 列的数组而不是仅仅一个值,您需要设置 public $global = true,当然,无需在导入类 CSV 配置中指定此类验证过滤器
<?php namespace App\CsvImporters\ValidationFilters; use RGilyov\CsvImporter\BaseValidationFilter; class MyGlobalValidationFilter extends BaseValidationFilter { /** * @var string */ protected $name = 'global_validation'; /** * @var string */ public $global = true; /** * @param mixed $value * @return bool */ public function filter($value) // we will get array here, coz $this->global set to `true` { if (empty($value['name']) || (empty($value['first_name']) && empty($value['last_name']))) { return false; } return true; } }
所有过滤器的操作与 自定义头过滤器 章节中描述的操作类似
\App\CsvImporters\MyImporter::addValidationFilter(new \App\CsvImporters\ValidationFilters\MyValidationFilter()); // closure validation filters are global \App\CsvImporters\MyImporter::addValidationFilter(function ($item) { if (!empty($csvHeaders['A']) || !empty($csvHeaders['B'])) { return true; } return false; }, 'not-empty'); // you may add multiple filters with one method $notEmpty = function ($item) { if (!empty($item['A']) || !empty($item['B'])) { return true; } return false; } $myValidationFilter = new \App\CsvImporters\ValidationFilters\MyValidationFilter(); $myGlobalValidationFilter = new \App\CsvImporters\ValidationFilters\MyGlobalValidationFilter(); \App\CsvImporters\MyImporter::addValidationFilters($notEmpty, $myValidationFilter, $myGlobalValidationFilter); ///////////////////////////////////////////////////////////////////////////// \App\CsvImporters\MyImporter::validationFilterExists('bad_word_validation'); // will return `true` \App\CsvImporters\MyImporter::getValidationFilter('global_validation'); // will return instance of MyGlobalValidationFilter class \App\CsvImporters\MyImporter::getValidationFilters(); // will return array with all filter objects \App\CsvImporters\MyImporter::unsetValidationFilter('bad_word_validation'); // will return `true` \App\CsvImporters\MyImporter::flushValidationFilters(); // will return empty array // example case if ($request->get('without_filters')) { \App\CsvImporters\MyImporter::flushValidationFilters(); }
WARNING!!!
All closure validation filters are global.
WARNING!!!
If you will set not related validation rule(filter) for a field(header) and not specify and register
a custom validation filter for that the RGilyov\CsvImporter\Exceptions\ImportValidationException will be thrown
自定义转换过滤器
要创建自定义转换过滤器,您需要创建一个扩展 \RGilyov\CsvImporter\BaseCastFilter 的类,或者运行 php artisan make:csv-importer-cast-filter MyCastFilter,这将创建一个包含基本设置的 MyCastFilter.php 文件,位于 app/CsvImporters/CastFilters/ 目录中
<?php namespace App\CsvImporters\CastFilters; use RGilyov\CsvImporter\BaseCastFilter; class MyCastFilter extends BaseCastFilter { protected $name = 'lowercase'; /** * @param $value * @return mixed */ public function filter($value) { return strtolower($value); } }
与验证过滤器一样,name 对于转换过滤器也很重要,因为我们将需要在 csvConfigurations() 映射中设置它,对于字段(头):'field' => ['cast' => 'lowercase']
所有过滤器的操作与 自定义头过滤器 和 自定义验证过滤器 章节中描述的操作类似
\App\CsvImporters\MyImporter::addCastFilter(new \App\CsvImporters\CastFilters\MyCastFilter()); \App\CsvImporters\MyImporter::addCastFilter(function ($value) { return htmlspecialchars($value); }); $htmlentities = function ($value) { return htmlentities($value); } $myCastFilter = new \App\CsvImporters\CastFilters\MyCastFilter(); \App\CsvImporters\MyImporter::addCastFilters($htmlentities, $myCastFilter); ///////////////////////////////////////////////////////////////////////////// \App\CsvImporters\MyImporter::castFilterExists('lowercase'); // will return `true` \App\CsvImporters\MyImporter::getCastFilter('lowercase'); // will return instance of MyCastFilter class \App\CsvImporters\MyImporter::getCastFilters(); // will return array with all filter objects \App\CsvImporters\MyImporter::unsetCastFilter('lowercase'); // will return `true` \App\CsvImporters\MyImporter::flushCastFilters(); // will return empty array // example case if ($request->get('without_filters')) { \App\CsvImporters\MyImporter::flushCastFilters(); }
注册自定义过滤器的最佳方式
我认为注册自定义过滤器的最佳方式是使用服务提供者:https://laravel.net.cn/docs/5.4/providers