csskevin / tabledude
帮助您使用 PHP 处理 HTML 表格,并将其保存为关联数组等。
Requires
- php: >=7.3.0
Requires (Dev)
- phpunit/phpunit: ^8.4@dev
This package is auto-updated.
Last update: 2024-09-24 04:10:34 UTC
README
TableDude 解析、转换和分析 HTML 表格。此软件是用纯 PHP 编写的,不使用任何外部库(除了用于测试的 PHPUnit)。
兼容性
您可以使用此软件在 PHP 版本 >= 7
上。
它可能与旧版本兼容,但我尚未对其进行测试。
快速入门
要安装此软件,您可以使用 composer 或手动安装。
Composer 方式
composer require csskevin/tabledude
<?php // If you are working with composer require __DIR__ . '/vendor/autoload.php'; ?>
手动方式
git clone https://github.com/csskevin/TableDude.git
<?php // If you installed this software manual require __DIR__ . "/<pathToTableDude>/src/autoload.php"; ?>
示例
require __DIR__ . "/../src/autoload.php"; // Fetching HTML content $content = file_get_contents("https://github.com/csskevin/TableDude/"); $simpleparser = new \TableDude\Parser\SimpleParser($content); // Parses HTML Tables to PHP Array $tables = $simpleparser->parseHTMLTables(); foreach($tables as $table) { // Creating an instance of horizontal table $horizontalTable = new \TableDude\Converter\HorizontalTable($table); // Setting Header Row Index $horizontalTable->setHeaderRowIndex(0); $groupedTable = $horizontalTable->getGroupedTable(); foreach($groupedTable as $row) { echo "Type: " . $row["Type"] . "\n"; echo "Name: " . $row["Name"] . "\n"; echo "Commit Message: " . $row["Latest commit message"] . "\n"; echo "Commit Time: " . $row["Commit time"] . "\n"; echo "----------------\n"; } }
您还可以通过获取它们的指纹来识别表格标题。
require __DIR__ . "/../src/autoload.php"; // Fetching HTML content $content = file_get_contents("https://w3schools.org.cn/html/html_tables.asp"); $simpleparser = new \TableDude\Parser\SimpleParser($content); $tables = $simpleparser->parseHTMLTables(); foreach($tables as $table) { // Creating an instance of horizontal table $horizontalTable = new \TableDude\Converter\HorizontalTable($table); // Setting Header Row Index $horizontalTable->setHeaderRowIndex(0); $groupedTable = $horizontalTable->getGroupedTable(); $fingerprint = $horizontalTable->getFingerprint(); // Identifying via fingerprint if($fingerprint === 3377504250) { print_r($groupedTable); } }
测试
此软件使用 PHPUnit 进行了测试。在 composer.json
中包含了一个测试脚本。
composer install
composer run-script test
文档
有四个主要组件
- 解析器(将 HTML 表格解析为 PHP 数组)
- 转换器(允许您转换和分组 PHP 数组)
- 分析器(分析表格)
- 工具(一些修改表格的工具)
解析器
\TableDude\Parser\SimpleParser
$htmlContent = "<html><body><table><tr><td>Cell<td></tr></table></body></html>"; // Initializes an SimpleParser instance with a HTML content. // __constructor($htmlContent); $simpleParser = new \TableDude\Parser\SimpleParser($htmlContent); // Returns an array of parsed html tables $parsedHTMLTables = $simpleParser->parseHTMLTables(); /* $parsedHTMLTables would be: array( // First Table array( // First row array( "Cell" ) ) ) */
转换器
有三种不同的转换器
- HorizontalTable
- VerticalTable
- MixedTable
\TableDude\Converter\HorizontalTable
此表格将简单的解析 HTML 表格转换为关联数组,其中键等于定义的标题行。
方法将在下面的示例中记录
$parsedHTMLTable = array( array( "Cell1 in Row1", "Cell2 in Row1", "Cell3 in Row1" ), array( "Cell1 in Row2", "Cell2 in Row2", "Cell3 in Row2" ), array( "Cell1 in Row3", "Cell2 in Row3", "Cell3 in Row3" ) ); // __constructor($parsedHTMLTable) $horizontalTable = \TableDude\Converter\HorizontalTable($parsedHTMLTable); // Sets the first array in parsedHTMLTable as header row /* You can also set the index to -1, then the last row will be selected */ $horizontalTable->setHeaderRowIndex(0); /* Header would now be array( "Cell1 in Row1", "Cell2 in Row1", "Cell3 in Row1" ) */ // convertes the array $groupedTable = $horizontalTable->getGroupedTable(); /* This would return array( array( "Cell1 in Row1" => "Cell1 in Row2", "Cell2 in Row1" => "Cell2 in Row2", "Cell3 in Row1" => "Cell3 in Row2" ), array( "Cell1 in Row1" => "Cell1 in Row3", "Cell2 in Row1" => "Cell2 in Row3", "Cell3 in Row1" => "Cell3 in Row3" ), ) */ // You can also get an fingerprint by the header of an table with, which will be return as an integer $fingerprint = $horizontalTable->getFingerprint(); // This can be used to identify tables by the header for the next time
\TableDude\Converter\VerticalTable
此表格将简单的解析 HTML 表格转换为关联数组,其中键等于定义的标题列。
方法将在下面的示例中记录
$parsedHTMLTable = array( array( "Cell1 in Row1", "Cell2 in Row1", "Cell3 in Row1" ), array( "Cell1 in Row2", "Cell2 in Row2", "Cell3 in Row2" ), array( "Cell1 in Row3", "Cell2 in Row3", "Cell3 in Row3" ) ); // __constructor($parsedHTMLTable) $verticalTable = \TableDude\Converter\VerticalTable($parsedHTMLTable); /* You can also set the index to -1, then the last column will be selected */ $verticalTable->setHeaderColumnIndex(0); // Sets the first column array in parsedHTMLTable as header column /* Header would now be array( "Cell1 in Row1", "Cell1 in Row2", "Cell1 in Row3" ) // convertes the array $groupedTable = $verticalTable->getGroupedTable(); /* This would return array( array( "Cell1 in Row1" => "Cell2 in Row1", "Cell1 in Row2" => "Cell2 in Row2", "Cell1 in Row3" => "Cell2 in Row3" ), array( "Cell1 in Row1" => "Cell3 in Row1", "Cell1 in Row2" => "Cell3 in Row2", "Cell1 in Row3" => "Cell3 in Row3" ), ) */ // You can also get an fingerprint by the header of an table with, which will be return as an integer $fingerprint = $verticalTable->getFingerprint(); // This can be used to identify tables by the header for the next time
\TableDude\Converter\MixedTable
此表格将简单的解析 HTML 表格转换为关联数组,其中键等于定义的标题行和列。
方法将在下面的示例中记录
$parsedHTMLTable = array( array( "Cell1 in Row1", "Cell2 in Row1", "Cell3 in Row1" ), array( "Cell1 in Row2", "Cell2 in Row2", "Cell3 in Row2" ), array( "Cell1 in Row3", "Cell2 in Row3", "Cell3 in Row3" ) ); // __constructor($parsedHTMLTable) $mixedTable = \TableDude\Converter\VerticalTable($parsedHTMLTable); /* You can also set the index to -1, then the last row will be selected */ $mixedTable->setHeaderRowIndex(0); $mixedTable->setHeaderColumnIndex(0); // Nests the vertical Table in the horizontal Table $mixedTable->nestVerticalTableInHorizontalTable(); // Nests the horizontal Table in the vertical Table // $mixedTable->nestHorizontalTableInVerticalTable(); // convertes the array $groupedTable = $mixedTable->getGroupedTable(); /* This would return array( "Cell1 in Row1" => array( "Cell1 in Row2" => "Cell2 in Row2", "Cell1 in Row3" => "Cell2 in Row3" ), "Cell2 in Row1" => array( "Cell1 in Row2" => "Cell3 in Row2", "Cell1 in Row3" => "Cell3 in Row3" ), ) */ // You can also get an fingerprint by the header of an table with, which will be return as an integer // In the mixed table there are different types of fingerprints // There are 6 constants to get the fingerprint of the header you wish /* // TD_FINGERPRINT_VERTICAL_HEADER Would take fingerprint of array( "Cell1 in Row2", "Cell1 in Row3" ) The cell which overlaps with the horizontal header is not included // TD_FINGERPRINT_HORIZONTAL_HEADER Would take fingerprint of array( "Cell2 in Row1", "Cell3 in Row2" ) The cell which overlaps with the vertical header is not included // TD_FINGERPRINT_MIXED_HEADER Would take fingerprint of array( array( "Cell1 in Row2", "Cell1 in Row3" ), array( "Cell2 in Row1", "Cell3 in Row2" ) ) The cell which overlaps with the vertical header is not included ------ The same as above but it includes the cell, which overlaps with the horizontal and the vertical header // TD_FINGERPRINT_VERTICAL_HEADER_WITH_CROSSED_CELL // TD_FINGERPRINT_HORIZONTAL_HEADER_WITH_CROSSED_CELL // TD_FINGERPRINT_MIXED_HEADER_WITH_CROSSED_CELL */ $fingerprint = $mixedTable->getFingerprint(); // This can be used to identify tables by the header for the next time
工具
\TableDude\Tools\ArrayTool
public static function swapArray($array);
交换数组
$array = array( array("1", "2", "3"), array("4", "5", "6") ); $swappedArray = \TableDude\Tools\ArrayTool::swapArray($array); /* Content of $swappedArray array( array("1", "4"), array("2", "5"), array("3", "6") ) */
public static function countLongestRowOfArrayTable($array);
返回最长行的长度
$array = array( array("1", "2", "3"), array("4", "5", "6", "7") ); $longestRow = \TableDude\Tools\ArrayTool::countLongestRowOfArrayTable($array); /* $longestRow would be 4 in this case */
public static function validateVerticalTable($table);
验证,是否为有效的垂直表格。
public static function validateHorizontalTable($table);
验证,是否为有效的水平表格。
public static function validateMixedTable($table);
验证,是否为有效的混合表格。
分析
\TableDude\Analysis\HeaderAnalyzation
public static function getFingerPrintOfArray($array)
返回数组的指纹。这对于通过标题识别表格非常有用。