nagyatka / php-dataframe
Table editing library
v0.1.1
2021-01-02 11:05 UTC
Requires
- php: ^7.2
- phpoffice/phpspreadsheet: 1.15.*
Requires (Dev)
- phpunit/phpunit: 8.5.10
README
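A lightweight DataFrame package for PHP.
php-dataframe is not a data science package; it only tries to replicate the main functionality of the pandas DataFrame.
Software requirements
Developing with php-dataframe requires PHP 7.2 or later.
Installation
Use Composer to install php-dataframe into your project: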
composer require nagyatka/php-dataframe
Usage
Creating a DataFrame
Create a simple DataFrame object from a 2D array
>>> $df = new DataFrame([[0,1],[2,3]]);
>>> print($df);
     |0         |1         |
============================
0    |0         |1         |
1    |2         |3         |
Shape: 2x2
Create a simple DataFrame object from a 2D array and column names
>>> $df = new DataFrame([[0,1],[2,3]], ["a", "b"]);
>>> print($df);
     |a         |b         |
============================
0    |0         |1         |
1    |2         |3         |
Shape: 2x2
Create a simple DataFrame object from a 2D array, column names, and indices
>>> $df = new DataFrame([[0,1],[2,3]], ["a", "b"], ["e", "f"]);
>>> print($df);
     |a         |b         |
============================
e    |0         |1         |
f    |2         |3         |
Shape: 2x2
Getting basic information about a DataFrame
A DataFrame stores a 2D PHP array together with its column names and indices.
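The storage model described above can be sketched in a few lines of plain PHP. This is only an illustration under stated assumptions: `MiniFrame` and its properties are hypothetical names, not php-dataframe's actual classes, and the default of positional labels (0, 1, ...) is inferred from the printed examples rather than from the library source.

```php
<?php
// Hypothetical MiniFrame: NOT php-dataframe's implementation, only an
// illustration of "2D array + column names + indices".
final class MiniFrame
{
    /** @var array[] rows, each a list of cell values */
    public $values;
    /** @var array column names, one per cell position */
    public $columns;
    /** @var array row labels, one per row */
    public $indices;
    /** @var int[] [rowCount, columnCount] */
    public $shape;

    public function __construct(array $values, ?array $columns = null, ?array $indices = null)
    {
        $rowCount = count($values);
        $colCount = $rowCount > 0 ? count($values[0]) : 0;

        // Assumption: fall back to positional labels when none are given.
        $this->columns = $columns ?? ($colCount > 0 ? range(0, $colCount - 1) : []);
        $this->indices = $indices ?? ($rowCount > 0 ? range(0, $rowCount - 1) : []);

        // Every column and every row needs exactly one label.
        if (count($this->columns) !== $colCount || count($this->indices) !== $rowCount) {
            throw new InvalidArgumentException('Label counts must match the data dimensions.');
        }

        $this->values = $values;
        $this->shape = [$rowCount, $colCount];
    }
}

$frame = new MiniFrame([[0, 1, 2], [3, 4, 5]], ['a', 'b', 'c'], ['e', 'f']);
$unlabeled = new MiniFrame([[0, 1], [2, 3]]);
```

Keeping the labels in separate lists, rather than as array keys of the data, is one way to allow repeated row labels, which PHP array keys could not represent.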
Accessing the raw 2D array
>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print_r($df->values);
Array
(
    [0] => Array
        (
            [a] => 0
            [b] => 1
            [c] => 2
        )

    [1] => Array
        (
            [a] => 3
            [b] => 4
            [c] => 5
        )

)
Getting shape information
>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print("Number of rows: " . $df->shape[0]);
Number of rows: 2
>>> print("Number of columns: " . $df->shape[1]);
Number of columns: 3
Getting/setting column names
>>> print_r($df->getColumnNames());
Array
(
    [0] => a
    [1] => b
    [2] => c
)
>>> $df->setColumnNames(["x", "y", "z"]);
>>> print($df);
     |x         |y         |z         |
=======================================
e    |0         |1         |2         |
f    |3         |4         |5         |
Shape: 2x3
Getting/setting indices
>>> print_r($df->getIndices());
Array
(
    [0] => e
    [1] => f
)
>>> $df->setIndices(["p", "q"]);
>>> print($df);
     |x         |y         |z         |
=======================================
p    |0         |1         |2         |
q    |3         |4         |5         |
Shape: 2x3
Indexing and selecting data
Selecting a single column of a DataFrame
When you select column "a", the DataFrame returns a Series object.
>>> $df = new DataFrame([[0,1],[2,3]], ["a", "b"], ["e", "f"]);
>>> print($df["a"]);
Series(Name=a, length=2){[ e: 0, f: 2, ]}
Selecting a column by its positional index works as well
>>> $df = new DataFrame([[0,1],[2,3]], ["a", "b"], ["e", "f"]);
>>> print($df[0]);
Series(Name=a, length=2){[ e: 0, f: 2, ]}
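One way to picture how a single key can mean either a column name or a position is the plain-PHP sketch below. `selectColumn` is a hypothetical helper, not part of php-dataframe, and the "name first, then integer position" order is an assumption made for illustration; the library may resolve keys differently.

```php
<?php
// Hypothetical helper, not part of php-dataframe.
function selectColumn(array $values, array $columns, array $indices, $key): array
{
    // Assumption: try the key as a column name first, then as a position.
    $position = array_search($key, $columns, true);
    if ($position === false) {
        if (!is_int($key) || !array_key_exists($key, $columns)) {
            throw new OutOfBoundsException('Unknown column: ' . var_export($key, true));
        }
        $position = $key;
    }

    $data = [];
    foreach ($values as $rowNumber => $row) {
        // Pair each cell with its row label (this sketch assumes unique labels).
        $data[$indices[$rowNumber]] = $row[$position];
    }

    return ['name' => $columns[$position], 'data' => $data];
}

$values = [[0, 1], [2, 3]];
$byName = selectColumn($values, ['a', 'b'], ['e', 'f'], 'a');
$byPosition = selectColumn($values, ['a', 'b'], ['e', 'f'], 0);
```

Both calls yield the column named `a` with the cells `e: 0` and `f: 2`, mirroring the two outputs above.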
Selecting multiple columns of a DataFrame
Because PHP does not support arrays as keys, use the cols helper function to select multiple columns.
>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print($df[\PHPDataFrame\cols(["b", "c"])]);
     |b         |c         |
============================
e    |1         |2         |
f    |4         |5         |
Shape: 2x2
You can also use column positions to get a sub-DataFrame
>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print($df[\PHPDataFrame\cols([0,1])]);
     |a         |b         |
============================
e    |0         |1         |
f    |3         |4         |
Shape: 2x2
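The helper pattern used here can be sketched with PHP's built-in `ArrayAccess` interface: the helper wraps the list of keys in a marker object, and `offsetGet` branches on that object's type. Everything below (`ColumnSelection`, `columnSelection`, `TinyTable`) is hypothetical and written for illustration; it makes no claim about how `cols` is implemented inside php-dataframe.

```php
<?php
// Hypothetical names throughout; this is not php-dataframe's source code.
final class ColumnSelection
{
    /** @var array requested column names or positions */
    public $keys;

    public function __construct(array $keys)
    {
        $this->keys = $keys;
    }
}

// Stand-in for a cols()-style helper: wraps the list in a marker object.
function columnSelection(array $keys): ColumnSelection
{
    return new ColumnSelection($keys);
}

final class TinyTable implements ArrayAccess
{
    private $rows;
    private $columns;

    public function __construct(array $rows, array $columns)
    {
        $this->rows = $rows;
        $this->columns = $columns;
    }

    // Resolve a column name, or failing that an integer position.
    private function position($key): int
    {
        $position = array_search($key, $this->columns, true);
        if ($position === false && is_int($key) && array_key_exists($key, $this->columns)) {
            $position = $key;
        }
        if ($position === false) {
            throw new OutOfBoundsException('Unknown column');
        }
        return $position;
    }

    public function offsetExists($offset): bool
    {
        return !($offset instanceof ColumnSelection) && in_array($offset, $this->columns, true);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        if ($offset instanceof ColumnSelection) {
            // Multi-column request: build a narrower table in the requested order.
            $rows = array_fill(0, count($this->rows), []);
            $names = [];
            foreach ($offset->keys as $key) {
                $p = $this->position($key);
                $names[] = $this->columns[$p];
                foreach ($this->rows as $i => $row) {
                    $rows[$i][] = $row[$p];
                }
            }
            return new self($rows, $names);
        }
        // Single key: return that column's cells as a plain list.
        return array_column($this->rows, $this->position($offset));
    }

    public function offsetSet($offset, $value): void
    {
        throw new LogicException('Read-only sketch');
    }

    public function offsetUnset($offset): void
    {
        throw new LogicException('Read-only sketch');
    }

    public function toArray(): array
    {
        return ['columns' => $this->columns, 'rows' => $this->rows];
    }
}

$table = new TinyTable([[0, 1, 2], [3, 4, 5]], ['a', 'b', 'c']);
$single = $table['b'];
$subset = $table[columnSelection(['b', 'c'])]->toArray();
```

A dedicated wrapper type keeps the two cases unambiguous: a scalar key always means one column, and a `ColumnSelection` always means a sub-table, even when it holds a single key.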
Selecting a single row of a DataFrame
You can select a single row of a DataFrame with the iloc operator.
>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print($df->iloc["e"]);
Series(Index=e, Length=3){[ a: 0, b: 1, c: 2, ]}
A numeric index can be used as well.
>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print($df->iloc[0]);
Series(Index=0, Length=3){[ a: 0, b: 1, c: 2, ]}
Selecting multiple rows of a DataFrame
Because PHP does not support arrays as keys, use the inds helper function to select multiple rows.
>>> $df = new DataFrame([[0,1,2], [3,4,5], [6,7,8]], ["a", "b", "c"], ["e", "f", "g"]);
>>> print($df->iloc[\PHPDataFrame\inds(["g","f"])]);
     |a         |b         |c         |
=======================================
g    |6         |7         |8         |
f    |3         |4         |5         |
Shape: 2x3
>>> print($df->iloc[\PHPDataFrame\inds([0,1])]);
     |a         |b         |c         |
=======================================
e    |0         |1         |2         |
f    |3         |4         |5         |
Shape: 2x3
Because index values do not have to be unique, an iloc operation with a label index may return multiple rows.
>>> $df = new DataFrame([[0,1,2], [3,4,5], [6,7,8]], ["a", "b", "c"], ["e", "f", "e"]);
>>> print($df->iloc["e"]);
     |a         |b         |c         |
=======================================
e    |0         |1         |2         |
e    |6         |7         |8         |
Shape: 2x3
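The effect of repeated labels can be reproduced with plain PHP arrays. The sketch below is not php-dataframe's code: `rowsByLabel` is a hypothetical helper that keeps the labels in a separate list and collects every matching row, which is one simple way to get the behaviour shown above.

```php
<?php
// Hypothetical helper, not php-dataframe's API: collect every row whose
// label equals $label, preserving the original row order.
function rowsByLabel(array $values, array $indices, $label): array
{
    $matches = [];
    // array_keys() with a search value returns all matching positions.
    foreach (array_keys($indices, $label, true) as $rowNumber) {
        $matches[] = $values[$rowNumber];
    }
    return $matches;
}

$values = [[0, 1, 2], [3, 4, 5], [6, 7, 8]];
$indices = ['e', 'f', 'e'];

$rowsE = rowsByLabel($values, $indices, 'e'); // label "e" appears twice
$rowsF = rowsByLabel($values, $indices, 'f'); // label "f" appears once
```

A caller that expects exactly one row therefore has to check the number of matches, or ensure the labels are unique before selecting by label.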