jotaelesalinas / php-mapreduce
Local implementation of the map-reduce strategy in PHP
v2.0.0
2022-07-31 04:55 UTC
Requires
- php: >=8.0
Requires (Dev)
- phpunit/phpunit: ^9.5
- squizlabs/php_codesniffer: ^3.7
This package is auto-updated.
Last update: 2024-09-12 18:43:16 UTC
README
PSR-4 compliant PHP library to easily run non-distributed, local map-reduce.
Install
Via Composer
$ composer require jotaelesalinas/php-mapreduce
Basic usage
require_once __DIR__ . '/vendor/autoload.php';

$source = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
$mapper = fn($item) => $item * 2;
$reducer = fn($carry, $item) => ($carry ?? 0) + $item;

$result = MapReduce\MapReduce::create()
    ->setInput($source)
    ->setMapper($mapper)
    ->setReducer($reducer)
    ->run();

print_r($result);
The output is:
Array
(
[0] => 110
)
Filters
$even_numbers = fn($item) => $item % 2 === 0;
$greater_than_10 = fn($item) => $item > 10;

$result = MapReduce\MapReduce::create([
        "input" => $source,
        "mapper" => $mapper,
        "reducer" => $reducer,
    ])
    // only even numbers are passed to the mapper function
    ->setPreFilter($even_numbers)
    // only numbers greater than 10 are passed to the reducer function
    ->setPostFilter($greater_than_10)
    ->run();

print_r($result);
The output is:
Array
(
[0] => 48
)
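To make the order of the stages explicit, here is a minimal plain-PHP sketch of the same computation: pre-filter, then map, then post-filter, then reduce. It uses only built-in array functions and is not the library's implementation; the variable names are chosen for illustration. The result 48 is what you get when the pre-filter keeps the even values of the input.

```php
<?php
// Plain-PHP sketch of the stage order; this is NOT the library's code.
$source = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

$preFilter  = fn($item) => $item % 2 === 0;            // keeps even numbers
$mapper     = fn($item) => $item * 2;
$postFilter = fn($item) => $item > 10;
$reducer    = fn($carry, $item) => ($carry ?? 0) + $item;

$filtered = array_filter($source, $preFilter);   // values 2, 4, 6, 8, 10
$mapped   = array_map($mapper, $filtered);       // values 4, 8, 12, 16, 20
$kept     = array_filter($mapped, $postFilter);  // values 12, 16, 20
$total    = array_reduce($kept, $reducer);       // 12 + 16 + 20

echo $total, PHP_EOL; // prints 48
```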
Grouping
Group by the value of a field (works with arrays and objects)
$source = [
    [ "first_name" => "Susanna", "last_name" => "Connor", "member" => "y", "age" => 20],
    [ "first_name" => "Adrian", "last_name" => "Smith", "member" => "n", "age" => 22],
    [ "first_name" => "Mike", "last_name" => "Mendoza", "member" => "n", "age" => 24],
    [ "first_name" => "Linda", "last_name" => "Duguin", "member" => "y", "age" => 26],
    [ "first_name" => "Bob", "last_name" => "Svenson", "member" => "n", "age" => 28],
    [ "first_name" => "Nancy", "last_name" => "Potier", "member" => "y", "age" => 30],
    [ "first_name" => "Pete", "last_name" => "Adams", "member" => "n", "age" => 32],
    [ "first_name" => "Susana", "last_name" => "Zommers", "member" => "y", "age" => 34],
    [ "first_name" => "Adrian", "last_name" => "Deville", "member" => "n", "age" => 36],
    [ "first_name" => "Mike", "last_name" => "Cole", "member" => "n", "age" => 38],
    [ "first_name" => "Mike", "last_name" => "Angus", "member" => "n", "age" => 40],
];

// mapper does nothing
$mapper = fn($x) => $x;

// number of persons and sum of ages
$reduceAgeSum = function ($carry, $item) {
    // first item of a group: start the accumulator
    if (is_null($carry)) {
        return [
            'count' => 1,
            'age_sum' => $item['age'],
        ];
    }

    $count = $carry['count'] + 1;
    $age_sum = $carry['age_sum'] + $item['age'];

    return compact('count', 'age_sum');
};

$result = MapReduce\MapReduce::create([
        "input" => $source,
        "mapper" => $mapper,
        "reducer" => $reduceAgeSum,
    ])
    // group by field 'member'
    ->setGroupBy('member')
    ->run();

print_r($result);
The output is:
Array
(
[y] => Array
(
[count] => 4
[age_sum] => 110
)
[n] => Array
(
[count] => 7
[age_sum] => 220
)
)
Group by a custom value generated from each item
// rounds the age down to the nearest multiple of 10
$closestTen = fn($x) => floor($x['age'] / 10) * 10;

$result = MapReduce\MapReduce::create([
        "input" => $source,
        "mapper" => $mapper,
        "reducer" => $reduceAgeSum,
    ])
    // group by age ranges of 10
    ->setGroupBy($closestTen)
    ->run();

print_r($result);
The output is:
Array
(
[20] => Array
(
[count] => 5
[age_sum] => 120
)
[30] => Array
(
[count] => 5
[age_sum] => 170
)
[40] => Array
(
[count] => 1
[age_sum] => 40
)
)
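The two grouping modes above can be pictured with a small plain-PHP sketch: the group key is either read from a field (array or object) or computed by a closure, and the reducer keeps one accumulator per key. The helpers `groupKey()` and `reduceByGroup()` are hypothetical names introduced here for illustration; they are not part of the library and do not claim to mirror its internals.

```php
<?php
// Hypothetical helpers, NOT part of the library.

// Resolve the group key of an item: a closure computes it,
// a string names a field of an array or a property of an object.
function groupKey(string|Closure $groupBy, array|object $item): mixed
{
    if ($groupBy instanceof Closure) {
        return $groupBy($item);
    }
    return is_array($item) ? $item[$groupBy] : $item->{$groupBy};
}

// Run the reducer separately for each group, one accumulator per key.
function reduceByGroup(iterable $items, string|Closure $groupBy, callable $reducer): array
{
    $groups = [];
    foreach ($items as $item) {
        $key = groupKey($groupBy, $item);
        $groups[$key] = $reducer($groups[$key] ?? null, $item);
    }
    return $groups;
}

$people = [
    ['member' => 'y', 'age' => 20],
    ['member' => 'n', 'age' => 22],
    ['member' => 'y', 'age' => 26],
];

$countAndSum = fn($carry, $item) => [
    'count'   => ($carry['count'] ?? 0) + 1,
    'age_sum' => ($carry['age_sum'] ?? 0) + $item['age'],
];

$byMember = reduceByGroup($people, 'member', $countAndSum);
$byDecade = reduceByGroup($people, fn($p) => intdiv($p['age'], 10) * 10, $countAndSum);

print_r($byMember);
print_r($byDecade);
```

The sketch uses `intdiv()` so that the computed keys are integers; `floor()` returns a float, which PHP converts to an integer when it is used as an array key.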
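Input
MapReduce accepts any data of type iterable as input. That means arrays and Traversable objects, such as generators.
This is very handy when reading big files that do not fit in memory.
$result = MapReduce\MapReduce::create([
        "mapper" => $mapper,
        "reducer" => $reducer,
    ])
    ->setInput(csvReadGenerator('myfile.csv'))
    ->run();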
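The README calls `csvReadGenerator()` without defining it. A minimal sketch of what such a helper could look like follows, assuming the first line of the file is a header row and every record should be yielded as an associative array. The function body is an illustration, not code from this library or from jotaelesalinas/php-generators.

```php
<?php
// Hypothetical helper: yields one associative array per CSV record,
// reading the file line by line so it never has to fit in memory.
function csvReadGenerator(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // Assumption: the first line holds the column names.
        $header = fgetcsv($handle, null, ',', '"', '\\');
        if ($header === false) {
            return; // empty file, nothing to yield
        }
        while (($row = fgetcsv($handle, null, ',', '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // skip blank lines
            }
            yield array_combine($header, $row);
        }
    } finally {
        fclose($handle);
    }
}

// Usage with a small temporary file.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "name,age\nAnn,20\nBob,30\n");

$ages = [];
foreach (csvReadGenerator($path) as $row) {
    $ages[] = (int) $row['age'];
}
unlink($path);

echo array_sum($ages), PHP_EOL; // prints 50
```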
Multiple inputs can be specified by passing several arguments to setInput(), as long as all of them are iterable.
$result = MapReduce\MapReduce::create([
        "mapper" => $mapper,
        "reducer" => $reducer,
    ])
    ->setInput($arrayData, csvReadGenerator('myfile.csv'))
    ->run();
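One way to picture several inputs being read as a single stream is a generator that walks each iterable in turn. The sketch below assumes the inputs are consumed one after another in the order given, which the README does not state; `chainInputs()` and `countTo()` are hypothetical names used only for this illustration.

```php
<?php
// Hypothetical helper: reads several iterables as one sequential stream.
// Assumption: inputs are consumed in the order they are passed.
function chainInputs(iterable ...$inputs): Generator
{
    foreach ($inputs as $input) {
        foreach ($input as $item) {
            // Keys are dropped on purpose: different sources may reuse them.
            yield $item;
        }
    }
}

// A tiny generator standing in for a file reader.
function countTo(int $n): Generator
{
    for ($i = 1; $i <= $n; $i++) {
        yield $i;
    }
}

$all = iterator_to_array(chainInputs([10, 20], countTo(3)), false);

print_r($all); // values 10, 20, 1, 2, 3
```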
Output
MapReduce can be configured to write the final data to one or more destinations.
Each destination has to be a Generator.
$result = MapReduce\MapReduce::create([
        "mapper" => $mapper,
        "reducer" => $reducer,
    ])
    ->setOutput(csvWriteGenerator('results.csv'))
    ->run();
Multiple outputs can be specified as well.
$result = MapReduce\MapReduce::create([
        "mapper" => $mapper,
        "reducer" => $reducer,
    ])
    ->setOutput(csvWriteGenerator('results.csv'), consoleGenerator())
    ->run();
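A generator can act as a destination because values can be pushed into it with `Generator::send()`. The sketch below shows a possible `csvWriteGenerator()` built that way. It rests on two assumptions that the README does not confirm: that results are delivered with `send()`, and that `null` marks the end of the stream. Check the examples folder or the php-generators package for the actual contract.

```php
<?php
// Hypothetical helper: a write-only generator that appends each received
// row to a CSV file. ASSUMPTIONS: rows arrive through Generator::send(),
// and sending null signals the end of the stream.
function csvWriteGenerator(string $path): Generator
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    while (true) {
        $row = yield;          // suspended here until the next send()
        if ($row === null) {
            break;             // end-of-stream marker (assumption)
        }
        fputcsv($handle, $row, ',', '"', '\\');
    }
    fclose($handle);
}

// Usage: push two rows, then close the stream.
$path = tempnam(sys_get_temp_dir(), 'out');
$out = csvWriteGenerator($path);
foreach ([['Ann', 20], ['Bob', 30]] as $row) {
    $out->send($row);
}
$out->send(null);

$written = file_get_contents($path);
unlink($path);

echo $written;
```

Note that the file is only opened when the first value is sent, because a generator body does not start running until it is first advanced.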
To help with input and output generators, the package jotaelesalinas/php-generators is recommended, but it is not mandatory.
You can find more elaborate examples in the examples folder.
Change log
Please see CHANGELOG for more information on what has changed recently.
Testing
$ composer test
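Contributing
Please see CONTRIBUTING and CONDUCT for details.
Security
If you discover any security-related issues, please contact me directly via @jotaelesalinas instead of using the issue tracker.
To do
- Add events to help see progress in large batches
- Add docs
- Insurance example
- Adapt to new library
- Add insured values
- Improve kml output (info, markers)
Credits
License
The MIT License (MIT). Please see the License File for more information.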