nagyatka/php-dataframe

v0.1.1 2021-01-02 11:05 UTC

This package is auto-updated.

Last update: 2024-08-29 05:43:41 UTC


README

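This is a lightweight PHP package for working with data frames.

php-dataframe is not a data science package; it only tries to replicate the main functionality of the pandas DataFrame.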

Requirements

Developing with php-dataframe requires PHP 7.2 or later.

Installation

Use Composer to install php-dataframe into your project

composer require nagyatka/php-dataframe

Usage

Creating a DataFrame

Create a simple DataFrame object from a two-dimensional array

>>> $df = new DataFrame([[0,1],[2,3]]);
>>> print($df);

     |0         |1         |
============================
0    |0         |1         |
1    |2         |3         |
Shape: 2x2

Create a simple DataFrame object from a two-dimensional array and column names

>>> $df = new DataFrame([[0,1],[2,3]], ["a", "b"]);
>>> print($df);

     |a         |b         |
============================
0    |0         |1         |
1    |2         |3         |
Shape: 2x2

Create a simple DataFrame object from a two-dimensional array, column names, and indices

>>> $df = new DataFrame([[0,1],[2,3]], ["a", "b"], ["e", "f"]);
>>> print($df);

     |a         |b         |
============================
e    |0         |1         |
f    |2         |3         |
Shape: 2x2

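Getting basic information about a DataFrame

A DataFrame stores a two-dimensional PHP array together with its column names and indices.

Accessing the raw two-dimensional array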

>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print_r($df->values);

Array
(
    [0] => Array
        (
            [a] => 0
            [b] => 1
            [c] => 2
        )

    [1] => Array
        (
            [a] => 3
            [b] => 4
            [c] => 5
        )

)

Getting the shape

>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print("Number of rows: " . $df->shape[0]);
Number of rows: 2
>>> print("Number of columns: " . $df->shape[1]);
Number of columns: 3
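
The storage model described in this section can be pictured in plain PHP. The sketch below is only an illustration under stated assumptions: the class name TinyFrame and its properties are hypothetical and are not taken from php-dataframe's source. It keeps each row as an array keyed by column name (matching the print_r output above), stores the row labels in a parallel list, and derives the shape from the input.

```php
<?php
// Hypothetical stand-in for illustration only; not the php-dataframe implementation.
final class TinyFrame
{
    /** @var array rows, each one keyed by column name */
    public $values = [];
    /** @var array column names, in order */
    public $columns;
    /** @var array row labels, one per row */
    public $index;
    /** @var array [number of rows, number of columns] */
    public $shape;

    public function __construct(array $rows, array $columns, array $index)
    {
        foreach ($rows as $row) {
            // array_combine pairs each column name with the value at the same position
            $this->values[] = array_combine($columns, $row);
        }
        $this->columns = $columns;
        $this->index = $index;
        $this->shape = [count($rows), count($columns)];
    }
}

$frame = new TinyFrame([[0, 1, 2], [3, 4, 5]], ["a", "b", "c"], ["e", "f"]);
print("Rows: " . $frame->shape[0] . ", columns: " . $frame->shape[1] . "\n");
```

Keeping the labels in a separate list rather than using them as array keys is what would allow duplicate index values, since PHP array keys must be unique.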

Getting/setting column names

>>> print_r($df->getColumnNames());

Array
(
    [0] => a
    [1] => b
    [2] => c
)

>>> $df->setColumnNames(["x", "y", "z"]);
>>> print($df);

     |x         |y         |z         |
=======================================
e    |0         |1         |2         |
f    |3         |4         |5         |
Shape: 2x3

Getting/setting indices

>>> print_r($df->getIndices());

Array
(
    [0] => e
    [1] => f
)

>>> $df->setIndices(["p", "q"]);
>>> print($df);

     |x         |y         |z         |
=======================================
p    |0         |1         |2         |
q    |3         |4         |5         |
Shape: 2x3

Indexing and selecting data

Selecting a single column of a DataFrame

Selecting column "a" returns a Series object.

>>> $df = new DataFrame([[0,1],[2,3]], ["a", "b"], ["e", "f"]);
>>> print($df["a"]);

Series(Name=a, length=2){[
	e: 0,
	f: 2,
]}

Selecting a column by its positional index also works

>>> $df = new DataFrame([[0,1],[2,3]], ["a", "b"], ["e", "f"]);
>>> print($df[0]);

Series(Name=a, length=2){[
	e: 0,
	f: 2,
]}

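Selecting multiple columns of a DataFrame

Because PHP does not support arrays as keys, you need to use the cols helper function to select multiple columns.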

>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print($df[\PHPDataFrame\cols(["b", "c"])]);

     |b         |c         |
============================
e    |1         |2         |
f    |4         |5         |
Shape: 2x2

You can also use positional column indices to get a sub-DataFrame

>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print($df[\PHPDataFrame\cols([0,1])]);

     |a         |b         |
============================
e    |0         |1         |
f    |3         |4         |
Shape: 2x2
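
One way to picture what a helper such as cols does is that it wraps the list of names in a small marker object, so the bracket handler can tell "one column" apart from "several columns". The sketch below is a guess at the general pattern, not php-dataframe's internals: ColumnList, pick_cols, and TinyTable are hypothetical names invented for this illustration.

```php
<?php
// Hypothetical illustration; php-dataframe's real cols() may work differently.
final class ColumnList
{
    public $names;

    public function __construct(array $names)
    {
        $this->names = $names;
    }
}

// Hypothetical counterpart of a cols()-style helper.
function pick_cols(array $names): ColumnList
{
    return new ColumnList($names);
}

final class TinyTable implements ArrayAccess
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        if ($offset instanceof ColumnList) {
            // Several columns: keep only the requested keys in every row
            // (the columns come back in the row's own order).
            $wanted = array_flip($offset->names);
            return array_map(function (array $row) use ($wanted) {
                return array_intersect_key($row, $wanted);
            }, $this->rows);
        }
        // Single column: one value per row.
        return array_column($this->rows, $offset);
    }

    public function offsetExists($offset): bool
    {
        return is_string($offset) && isset($this->rows[0][$offset]);
    }

    public function offsetSet($offset, $value): void
    {
        throw new LogicException("read-only sketch");
    }

    public function offsetUnset($offset): void
    {
        throw new LogicException("read-only sketch");
    }
}

$table = new TinyTable([
    ["a" => 0, "b" => 1, "c" => 2],
    ["a" => 3, "b" => 4, "c" => 5],
]);
$one = $table["a"];                     // single column
$many = $table[pick_cols(["b", "c"])];  // sub-table with two columns
```

The marker object keeps the two cases unambiguous: a plain string or integer always means one column, and only a wrapped list means several.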

Selecting a single row of a DataFrame

You can select a row of the DataFrame with the iloc operator.

>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print($df->iloc["e"]);

Series(Index=e, Length=3){[
	a: 0,
	b: 1,
	c: 2,
]}

Numeric indices work as well, of course.

>>> $df = new DataFrame([[0,1,2], [3,4,5]], ["a", "b", "c"], ["e", "f"]);
>>> print($df->iloc[0]);

Series(Index=0, Length=3){[
	a: 0,
	b: 1,
	c: 2,
]}

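Selecting multiple rows of a DataFrame

Because PHP does not support arrays as keys, you need to use the inds helper function to select multiple rows.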

>>> $df = new DataFrame([[0,1,2], [3,4,5], [6,7,8]], ["a", "b", "c"], ["e", "f", "g"]);
>>> print($df->iloc[\PHPDataFrame\inds(["g","f"])]);

     |a         |b         |c         |
=======================================
g    |6         |7         |8         |
f    |3         |4         |5         |
Shape: 2x3

>>> print($df->iloc[\PHPDataFrame\inds([0,1])]);

     |a         |b         |c         |
=======================================
e    |0         |1         |2         |
f    |3         |4         |5         |
Shape: 2x3

Because index values do not have to be unique, an iloc lookup by label may return multiple rows.

>>> $df = new DataFrame([[0,1,2], [3,4,5], [6,7,8]], ["a", "b", "c"], ["e", "f", "e"]);
>>> print($df->iloc["e"]);

     |a         |b         |c         |
=======================================
e    |0         |1         |2         |
e    |6         |7         |8         |
Shape: 2x3
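
The duplicate-label behaviour above can be reproduced with plain arrays. This is a minimal sketch under assumptions: rows_for_label is a hypothetical helper written for this illustration, and it says nothing about how php-dataframe performs the lookup internally. It collects every position at which the label occurs in the index and returns the matching rows.

```php
<?php
// Hypothetical helper for illustration; not part of php-dataframe.
function rows_for_label(array $rows, array $index, $label): array
{
    // array_keys with a search value returns every position where the label occurs.
    $positions = array_keys($index, $label, true);

    $selected = [];
    foreach ($positions as $position) {
        $selected[] = $rows[$position];
    }
    return $selected;
}

$rows = [[0, 1, 2], [3, 4, 5], [6, 7, 8]];
$index = ["e", "f", "e"];

$matches = rows_for_label($rows, $index, "e");
print(count($matches) . " row(s) carry the label e\n");
```

A caller that expects exactly one row for a label therefore has to check how many rows came back.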