avadim/fast-excel-reader

轻量级且非常快速的PHP XLSX电子表格阅读器


README

GitHub Release Packagist Downloads GitHub License Static Badge

FastExcelReader

FastExcelReader 是 FastExcelPhp 项目的一部分,该项目包括

简介

这个库设计得轻量级,超级快,并且占用内存最少。

FastExcelReader 可以读取XLSX格式的Excel兼容电子表格(Office 2007+)。它只读取数据,但读取速度非常快,并且占用内存最少。

特性

  • 仅支持XLSX格式(Office 2007+)的多个工作表
  • 支持自动检测货币/数值/日期类型
  • 支持自动格式化和自定义日期值的格式化器
  • 库可以定义和从XLSX文件中提取图像
  • 库可以读取单元格的样式选项 - 格式化模式、颜色、边框、字体等。

安装

使用 composerFastExcelReader 安装到您的项目中

composer require avadim/fast-excel-reader

跳转到

用法

您可以在 /demo 文件夹中找到更多示例

简单示例

demo file

use \avadim\FastExcelReader\Excel;

$file = __DIR__ . '/files/demo-00-simple.xlsx';

// Open XLSX-file
$excel = Excel::open($file);
// Read all values as a flat array from current sheet
$result = $excel->readCells();

您将得到这个数组

Array
(
    [A1] => 'col1'
    [B1] => 'col2'
    [A2] => 111
    [B2] => 'aaa'
    [A3] => 222
    [B3] => 'bbb'
)
// Read all rows in two-dimensional array (ROW x COL)
$result = $excel->readRows();

您将得到这个数组

Array
(
    [1] => Array
        (
            ['A'] => 'col1'
            ['B'] => 'col2'
        )
    [2] => Array
        (
            ['A'] => 111
            ['B'] => 'aaa'
        )
    [3] => Array
        (
            ['A'] => 222
            ['B'] => 'bbb'
        )
)
// Read all columns in two-dimensional array (COL x ROW)
$result = $excel->readColumns();

您将得到这个数组

Array
(
    [A] => Array
        (
            [1] => 'col1'
            [2] => 111
            [3] => 222
        )

    [B] => Array
        (
            [1] => 'col2'
            [2] => 'aaa'
            [3] => 'bbb'
        )

)

循环中逐行读取值

$sheet = $excel->sheet();
foreach ($sheet->nextRow() as $rowNum => $rowData) {
    // $rowData is array ['A' => ..., 'B' => ...]
    $addr = 'C' . $rowNum;
    if ($sheet->hasImage($addr)) {
        $sheet->saveImageTo($addr, $fullDirectoryPath);
    }
    // handling of $rowData here
    // ...
}

// OR
foreach ($sheet->nextRow() as $rowNum => $rowData) {
    // handling of $rowData here
    // ...
    // get image list from current row
    $imageList = $sheet->getImageListByRow();
    foreach ($imageList as $imageInfo) {
        $imageBlob = $sheet->getImageBlob($imageInfo['address']);
    }
}

// OR
foreach ($sheet->nextRow(['A' => 'One', 'B' => 'Two'], Excel::KEYS_FIRST_ROW) as $rowNum => $rowData) {
    // $rowData is array ['One' => ..., 'Two' => ...]
    // ...
}

逐行读取的另一种方式

// Init internal read generator
$sheet->reset(['A' => 'One', 'B' => 'Two'], Excel::KEYS_FIRST_ROW);
// read the first row
$rowData = $sheet->readNextRow();
var_dump($rowData);

// read the next 3 rows
for ($i = 0; $i < 3; $i++) {
    $rowData = $sheet->readNextRow();
    var_dump($rowData);
}

// Reset internal generator and read all rows
$sheet->reset(['A' => 'One', 'B' => 'Two'], Excel::KEYS_FIRST_ROW);
$result = [];
while ($rowData = $sheet->readNextRow()) {
    $result[] = $rowData;
}
var_dump($result);

结果数组中的键

// Read rows and use the first row as column keys
$result = $excel->readRows(true);

您将得到这个结果

Array
(
    [2] => Array
        (
            ['col1'] => 111
            ['col2'] => 'aaa'
        )
    [3] => Array
        (
            ['col1'] => 222
            ['col2'] => 'bbb'
        )
)

可选的第二个参数指定结果数组的键

// Rows and cols start from zero
$result = $excel->readRows(false, Excel::KEYS_ZERO_BASED);

您将得到这个结果

Array
(
    [0] => Array
        (
            [0] => 'col1'
            [1] => 'col2'
        )
    [1] => Array
        (
            [0] => 111
            [1] => 'aaa'
        )
    [2] => Array
        (
            [0] => 222
            [1] => 'bbb'
        )
)

结果模式的允许值

可以与结果模式结合使用的附加选项

例如

$result = $excel->readRows(['A' => 'bee', 'B' => 'honey'], Excel::KEYS_FIRST_ROW | Excel::KEYS_ROW_ZERO_BASED);

您将得到这个结果

Array
(
    [0] => Array
        (
            [bee] => 111
            [honey] => 'aaa'
        )

    [1] => Array
        (
            [bee] => 222
            [honey] => 'bbb'
        )

)

高级示例

use \avadim\FastExcelReader\Excel;

$file = __DIR__ . '/files/demo-02-advanced.xlsx';

$excel = Excel::open($file);

$result = [
    'sheets' => $excel->getSheetNames() // get all sheet names
];

$result['#1'] = $excel
    // select sheet by name
    ->selectSheet('Demo1') 
    // select area with data where the first row contains column keys
    ->setReadArea('B4:D11', true)  
    // set date format
    ->setDateFormat('Y-m-d') 
    // set key for column 'C' to 'Birthday'
    ->readRows(['C' => 'Birthday']); 

// read other arrays with custom column keys
// and in this case we define range by columns only
$columnKeys = ['B' => 'year', 'C' => 'value1', 'D' => 'value2'];
$result['#2'] = $excel
    ->selectSheet('Demo2', 'B:D')
    ->readRows($columnKeys);

$result['#3'] = $excel
    ->setReadArea('F5:H13')
    ->readRows($columnKeys);

您可以通过在工作簿中定义的名称设置读取区域。例如,如果工作簿中定义了名为 Headers 的名称,其范围为 Demo1!$B$4:$D$4,那么您可以通过此名称读取单元格

$excel->setReadArea('Values');
$cells = $excel->readCells();

注意,由于该值包含工作表名称,因此该工作表变为默认工作表。

您可以在工作表中设置读取区域

$sheet = $excel->getSheet('Demo1')->setReadArea('Headers');
$cells = $sheet->readCells();

但是,如果您试图在另一个工作表上使用此名称,您将得到一个错误

$sheet = $excel->getSheet('Demo2')->setReadArea('Headers');
// Exception: Wrong address or range "Values"

如果需要,您可以使用带回调函数的 readSheetCallback() 方法完全控制读取过程

use \avadim\FastExcelReader\Excel;

$excel = Excel::open($file);

$result = [];
$excel->readCallback(function ($row, $col, $val) use(&$result) {
    // Any manipulation here
    $result[$row][$col] = (string)$val;

    // if the function returns true then data reading is interrupted  
    return false;
});
var_dump($result);

日期格式化器

默认情况下,所有日期时间值都返回为时间戳。但您可以使用 dateFormatter() 改变此行为

demo date

$excel = Excel::open($file);
$sheet = $excel->sheet()->setReadArea('B5:D7');
$cells = $sheet->readCells();
echo $cells['C5']; // -2205187200

// If argument TRUE is passed, then all dates will be formatted as specified in cell styles
// IMPORTANT! The datetime format depends on the locale
$excel->dateFormatter(true);
$cells = $sheet->readCells();
echo $cells['C5']; // '14.02.1900'

// You can specify date format pattern
$excel->dateFormatter('Y-m-d');
$cells = $sheet->readCells();
echo $cells['C5']; // '1900-02-14'

// set date formatter function
$excel->dateFormatter(fn($value) => gmdate('m/d/Y', $value));
$cells = $sheet->readCells();
echo $cells['C5']; // '02/14/1900'

// returns DateTime instance
$excel->dateFormatter(fn($value) => (new \DateTime())->setTimestamp($value));
$cells = $sheet->readCells();
echo get_class($cells['C5']); // 'DateTime'

// custom manipulations with datetime values
$excel->dateFormatter(function($value, $format, $styleIdx) use($excel) {
    // get Excel format of the cell, e.g. '[$-F400]h:mm:ss\ AM/PM'
    $excelFormat = $excel->getFormatPattern($styleIdx);

    // get format converted for use in php functions date(), gmdate(), etc
    // for example the Excel pattern above would be converted to 'g:i:s A'
    $phpFormat = $excel->getDateFormatPattern($styleIdx);
    
    // and if you need you can get value of numFmtId for this cell
    $style = $excel->getCompleteStyleByIdx($styleIdx, true);
    $numFmtId = $style['format-num-id'];
    
    // do something and write to $result
    $result = gmdate($phpFormat, $value);

    return $result;
});

有时,如果单元格的格式指定为日期但不含日期,则库可能错误地解释此值。为了避免这种情况,您可以禁用日期格式化

demo date

在此,单元格B1包含字符串 "3.2",单元格B2包含日期 2024-02-03,但两个单元格都设置为日期格式

$excel = Excel::open($file);
// default mode
$cells = $sheet->readCells();
echo $cell['B1']; // -2208798720 - the library tries to interpret the number 3.2 as a timestamp
echo $cell['B2']; // 1706918400 - timestamp of 2024-02-03

// date formatter is on
$excel->dateFormatter(true);
$cells = $sheet->readCells();
echo $cell['B1']; // '03.01.1900'
echo $cell['B2']; // '3.2'

// date formatter is off
$excel->dateFormatter(false);
$cells = $sheet->readCells();
echo $cell['B1']; // '3.2'
echo $cell['B2']; // 1706918400 - timestamp of 2024-02-03

图像函数

// Returns count images on all sheets
$excel->countImages();

// Returns count images on sheet
$sheet->countImages();

// Returns image list of sheet
$sheet->getImageList();

// Returns image list of specified row
$sheet->getImageListByRow($rowNumber);

// Returns TRUE if the specified cell has an image
$sheet->hasImage($cellAddress);

// Returns mime type of image in the specified cell (or NULL)
$sheet->getImageMimeType($cellAddress);

// Returns inner name of image in the specified cell (or NULL)
$sheet->getImageName($cellAddress);

// Returns an image from the cell as a blob (if exists) or NULL
$sheet->getImageBlob($cellAddress);

// Writes an image from the cell to the specified filename
$sheet->saveImage($cellAddress, $fullFilenamePath);

// Writes an image from the cell to the specified directory
$sheet->saveImageTo($cellAddress, $fullDirectoryPath);

单元格值类型

库尝试确定单元格值的类型,并且在大多数情况下它做得正确。因此,您得到数字或字符串值。日期值默认返回为时间戳。但您可以通过设置日期格式(参见 date() PHP 函数的格式选项)来改变此行为。

$excel = Excel::open($file);
$result = $excel->readCells();
print_r($result);

上面的示例将输出

Array
(
    [B2] => -2205187200
    [B3] => 6614697600
    [B4] => -6845212800
)
$excel = Excel::open($file);
$excel->setDateFormat('Y-m-d');
$result = $excel->readCells();
print_r($result);

上面的示例将输出

Array
(
    [B2] => '1900-02-14'
    [B3] => '2179-08-12'
    [B4] => '1753-01-31'
)

如何获取单元格样式的完整信息

通常读取函数只返回单元格值,但您也可以以样式读取值。在这种情况下,对于每个单元格,返回的将不是一个标量值,而是一个数组,如 ['v' => 标量值, 's' => 样式数组, 'f' => 公式]

$excel = Excel::open($file);

$sheet = $excel->sheet();

$rows = $sheet->readRowsWithStyles();
$columns = $sheet->readColumnsWithStyles();
$cells = $sheet->readCellsWithStyles();

$cells = $sheet->readCellsWithStyles();

或者您可以选择只读取样式(不包含值)

$cells = $sheet->readCellStyles();
/*
array (
  'format' => 
  array (
    'format-num-id' => 0,
    'format-pattern' => 'General',
  ),
  'font' => 
  array (
    'font-size' => '10',
    'font-name' => 'Arial',
    'font-family' => '2',
    'font-charset' => '1',
  ),
  'fill' => 
  array (
    'fill-pattern' => 'solid',
    'fill-color' => '#9FC63C',
  ),
  'border' => 
  array (
    'border-left-style' => NULL,
    'border-right-style' => NULL,
    'border-top-style' => NULL,
    'border-bottom-style' => NULL,
    'border-diagonal-style' => NULL,
  ),
)
 */
$cells = $sheet->readCellStyles(true);
/*
array (
  'format-num-id' => 0,
  'format-pattern' => 'General',
  'font-size' => '10',
  'font-name' => 'Arial',
  'font-family' => '2',
  'font-charset' => '1',
  'fill-pattern' => 'solid',
  'fill-color' => '#9FC63C',
  'border-left-style' => NULL,
  'border-right-style' => NULL,
  'border-top-style' => NULL,
  'border-bottom-style' => NULL,
  'border-diagonal-style' => NULL,
)
 */

但我们不推荐在大型文件中使用这些方法

一些有用的方法

Excel对象

  • getSheetNames() -- 返回所有工作表名称数组
  • sheet(?string $name = null) -- 返回默认或指定的工作表
  • getSheet(string $name, ?string $areaRange = null, ?bool $firstRowKeys = false) -- 通过名称获取工作表
  • getSheetById(int $sheetId, ?string $areaRange = null, ?bool $firstRowKeys = false) -- 通过ID获取工作表
  • getFirstSheet(?string $areaRange = null, ?bool $firstRowKeys = false) -- 获取第一个工作表
  • selectSheet(string $name, ?string $areaRange = null, ?bool $firstRowKeys = false) -- 通过名称选择默认工作表并返回它
  • selectSheetById(int $sheetId, ?string $areaRange = null, ?bool $firstRowKeys = false) -- 通过ID选择默认工作表并返回它
  • selectFirstSheet(?string $areaRange = null, ?bool $firstRowKeys = false) -- 将第一个工作表设置为默认并返回它
  • getDefinedNames() -- 返回工作簿中定义的名称

工作表对象

  • name() -- 返回字符串名称
  • isActive() -- 活跃的工作表
  • isHidden() -- 如果工作表被隐藏
  • isVisible() -- 如果工作表可见
  • state() -- 返回工作表字符串状态(用于isHidden()isVisible()
  • dimension() -- 返回工作表属性中的默认工作区域尺寸
  • countRows() -- 从尺寸计数行
  • countColumns() -- 从尺寸计数列
  • firstRow() -- 第一行编号
  • firstCol() -- 第一列字母
  • readFirstRow() -- 以数组形式返回第一行单元格的值
  • readFirstRowWithStyles() -- 以数组形式返回第一行单元格的值和样式

您想支持FastExcelReader吗?

如果您觉得这个包很有用,您可以在GitHub上给我一个星。

或者您可以捐款给我 :)

  • USDT (TRC20) TSsUFvJehQBJCKeYgNNR1cpswY6JZnbZK7
  • USDT (ERC20) 0x5244519D65035aF868a010C2f68a086F473FC82b
  • ETH 0x5244519D65035aF868a010C2f68a086F473FC82b