fiedsch / datamanagement

一个从文本文件中读取数据的辅助库

1.3.1 2024-08-25 06:51 UTC

README

用于从文本文件中读取数据的PHP类和辅助函数

  • Data\FileReader 读取文本文件
  • Data\CsvFileReader 读取CSV文件
  • Data\FixedWidthReader 读取包含固定宽度列数据的文本文件
  • Data\Helper 包含诸如 SC() 的辅助函数,该函数将电子表格列名转换为(例如)CsvFileReader->getLine() 生成的数组索引

示例

处理CSV数据

<?php

require __DIR__ . '/../vendor/autoload.php';

use Fiedsch\Data\File\CsvReader;
 
try {
 
  $reader = new CsvReader("testdata.csv", ";");

  // Read and handle all lines containing data.

  while (($line = $reader->getLine()) !== null) {
    // ignore empty lines (i.e. lines containing no data)
    if (!$reader->isEmpty($line)) {
      print_r($line);
    }
  }
  // $reader->close(); // not needed as it will be automatically called when there are no more lines

} catch (Exception $e) {
    print $e->getMessage() . "\n";
}

特性

从v0.3.2版本开始,典型的样板代码 "打开文件、读取每行非空数据、关闭文件" 可以用更花哨的方式编写。使用 getLine() 的可选参数

<?php

  while (($line = $reader->getLine(Reader::SKIP_EMPTY_LINES)) !== null) {
      print_r($line);
  }
  

数据增强

<?php
 
require __DIR__ . '/../vendor/autoload.php';
 
use Fiedsch\Data\File\CsvReader;
use Fiedsch\Data\Augmentation\Augmentor;
use Fiedsch\Data\Augmentation\Provider\TokenServiceProvider;
use Fiedsch\Data\File\CsvWriter;
  
try {

  $augmentor = new Augmentor();
 
  $augmentor->register(new TokenServiceProvider());
  
  $augmentor->addRule('token', function (Augmentor $augmentor, $data) {
     return [ 'token' => $augmentor['token']->getUniqueToken() ];
   });
  
   $reader = new CsvReader("testdata.csv", ";");
   
   $writer = new CsvWriter("testdata.augmented.txt", "\t");
   
   $header_written = false;
   
   while (($line = $reader->getLine(Reader::SKIP_EMPTY_LINES)) !== null) {
     $result = $augmentor->augment($line);
     if (!$header_written) {
        $writer->printLine(array_merge(['input_line'], array_keys($result), $reader->getHeader()));
        $header_written = true;
     }
     $writer->printLine(array_merge([$reader->getLineNumber()], $result, $line));
   }
   
   $writer->close();
 
 } catch (Exception $e) {
     print $e->getMessage() . "\n";
 }

创建标记

方法一:让 TokenCreator 确保我们拥有唯一的标记

<?php
 
require __DIR__ . '/../vendor/autoload.php';
 
use Fiedsch\Data\Utility\TokenCreator;
use Fiedsch\Data\File\Writer;


$creator = new TokenCreator(10, TokenCreator::UPPER);

$output = new Writer('mytokens.txt');
$numTokens = 1000;

while ($numTokens-- > 0) {
 $token = $creator->getUniqueToken();
 $output->printLine([$token]);
}
$output->close();

方法二:先生成标记,然后检查它们是否唯一。对于大量标记,这可能会更快,并且消耗更少的资源

 // same as above, exept 
 // $token = $creator->getUniqueToken();
 // becomes
 $token = $creator->cretateToken();

检查生成的标记是否唯一

echo " both lines show the same numbers, there were no duplicate tokens"
wc -l mytokens.csv
sort mytokens.csv | uniq | wc -l