jotaelesalinas/php-mapreduce

Local implementation of the map-reduce strategy in PHP

v2.0.0 2022-07-31 04:55 UTC


Last update: 2024-09-12 18:43:16 UTC


README


PSR-4 compliant PHP library for easy, non-distributed, local map-reduce.

Install

Via Composer

$ composer require jotaelesalinas/php-mapreduce

Basic usage

require_once __DIR__ . '/vendor/autoload.php';

$source = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
$mapper = fn($item) => $item * 2;
$reducer = fn($carry, $item) => ($carry ?? 0) + $item;

$result = MapReduce\MapReduce::create()
    ->setInput($source)
    ->setMapper($mapper)
    ->setReducer($reducer)
    ->run();

print_r($result);

The output is:

Array
(
    [0] => 110
)
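For comparison, here is the same computation written with PHP's built-in array_map() and array_reduce(), without the library. It is only a sketch of what the pipeline computes, not of how the library is implemented; array_reduce() starts from a null carry, which the `$carry ?? 0` guard in the reducer handles.

```php
<?php
// Same data and callbacks as in the basic usage example
$source  = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
$mapper  = fn($item) => $item * 2;
$reducer = fn($carry, $item) => ($carry ?? 0) + $item;

// Map every item, then fold the mapped values; array_reduce() starts with a null carry
$mapped = array_map($mapper, $source);
$total  = array_reduce($mapped, $reducer);

echo $total, "\n"; // 110
```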

Filters

// $source, $mapper and $reducer are the ones from the basic usage example
$even_numbers = fn($item) => $item % 2 === 0;
$greater_than_10 = fn($item) => $item > 10;

$result = MapReduce\MapReduce::create([
        "input" => $source,
        "mapper" => $mapper,
        "reducer" => $reducer,
    ])
    // only even numbers are passed to the mapper function
    ->setPreFilter($even_numbers)
    // only mapped values greater than 10 are passed to the reducer function
    ->setPostFilter($greater_than_10)
    ->run();

print_r($result);

The output is:

Array
(
    [0] => 48
)
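The comments in the example say where each filter sits: the pre-filter sees the raw input items and the post-filter sees the mapped values before they reach the reducer. Under that reading, the plain PHP below (no library involved) traces how the result of 48 comes about.

```php
<?php
$source          = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
$mapper          = fn($item) => $item * 2;
$reducer         = fn($carry, $item) => ($carry ?? 0) + $item;
$even_numbers    = fn($item) => $item % 2 === 0;
$greater_than_10 = fn($item) => $item > 10;

// Pre-filter on the raw input: 2, 4, 6, 8, 10
$kept = array_filter($source, $even_numbers);
// Mapper doubles them: 4, 8, 12, 16, 20
$mapped = array_map($mapper, $kept);
// Post-filter on the mapped values: 12, 16, 20
$passed = array_filter($mapped, $greater_than_10);
// Reducer sums what is left
$total = array_reduce($passed, $reducer);

echo $total, "\n"; // 48
```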

Grouping

Group by the value of a field (works with both arrays and objects)

$source = [
    [ "first_name" => "Susanna", "last_name" => "Connor",  "member" => "y", "age" => 20],
    [ "first_name" => "Adrian",  "last_name" => "Smith",   "member" => "n", "age" => 22],
    [ "first_name" => "Mike",    "last_name" => "Mendoza", "member" => "n", "age" => 24],
    [ "first_name" => "Linda",   "last_name" => "Duguin",  "member" => "y", "age" => 26],
    [ "first_name" => "Bob",     "last_name" => "Svenson", "member" => "n", "age" => 28],
    [ "first_name" => "Nancy",   "last_name" => "Potier",  "member" => "y", "age" => 30],
    [ "first_name" => "Pete",    "last_name" => "Adams",   "member" => "n", "age" => 32],
    [ "first_name" => "Susana",  "last_name" => "Zommers", "member" => "y", "age" => 34],
    [ "first_name" => "Adrian",  "last_name" => "Deville", "member" => "n", "age" => 36],
    [ "first_name" => "Mike",    "last_name" => "Cole",    "member" => "n", "age" => 38],
    [ "first_name" => "Mike",    "last_name" => "Angus",   "member" => "n", "age" => 40],
];

// mapper does nothing
$mapper = fn($x) => $x;

// number of persons and sum of ages
$reduceAgeSum = function ($carry, $item) {
    if (is_null($carry)) {
        return [
            'count' => 1,
            'age_sum' => $item['age'],
        ];
    }
    
    $count = $carry['count'] + 1;
    $age_sum = $carry['age_sum'] + $item['age'];
    
    return compact('count', 'age_sum');
};

$result = MapReduce\MapReduce::create([
        "input" => $source, 
        "mapper" => $mapper, 
        "reducer" => $reduceAgeSum, 
    ])
    // group by field 'member'
    ->setGroupBy('member')
    ->run();

print_r($result);

The output is:

Array
(
    [y] => Array
        (
            [count] => 4
            [age_sum] => 110
        )

    [n] => Array
        (
            [count] => 7
            [age_sum] => 220
        )

)
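Conceptually, grouping keeps one carry per key: each item is reduced only into the carry of its own group, and the first item of a group sees a null carry. The plain PHP sketch below (no library; arrays only, with the name fields dropped for brevity) reproduces the figures above using the same reducer.

```php
<?php
// Only the fields the example uses: 'member' (group key) and 'age'
$source = [
    ["member" => "y", "age" => 20], ["member" => "n", "age" => 22],
    ["member" => "n", "age" => 24], ["member" => "y", "age" => 26],
    ["member" => "n", "age" => 28], ["member" => "y", "age" => 30],
    ["member" => "n", "age" => 32], ["member" => "y", "age" => 34],
    ["member" => "n", "age" => 36], ["member" => "n", "age" => 38],
    ["member" => "n", "age" => 40],
];

// Same reducer as above: count people and sum their ages
$reduceAgeSum = function ($carry, $item) {
    if (is_null($carry)) {
        return ['count' => 1, 'age_sum' => $item['age']];
    }
    return [
        'count'   => $carry['count'] + 1,
        'age_sum' => $carry['age_sum'] + $item['age'],
    ];
};

// One carry per group key; a group's first item gets null as its carry
$result = [];
foreach ($source as $item) {
    $key = $item['member'];
    $result[$key] = $reduceAgeSum($result[$key] ?? null, $item);
}

print_r($result);
```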

Group by a custom value computed from each item

// lower bound of the age's 10-year range (floor, not rounding), e.g. 26 => 20
$closestTen = fn($x) => floor($x['age'] / 10) * 10;

$result = MapReduce\MapReduce::create([
        "input" => $source, 
        "mapper" => $mapper, 
        "reducer" => $reduceAgeSum, 
    ])
    // group by age ranges of 10
    ->setGroupBy($closestTen)
    ->run();

print_r($result);

The output is:

Array
(
    [20] => Array
        (
            [count] => 5
            [age_sum] => 120
        )

    [30] => Array
        (
            [count] => 5
            [age_sum] => 170
        )

    [40] => Array
        (
            [count] => 1
            [age_sum] => 40
        )

)
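A grouping key can thus be either a field name or a callback. The sketch below shows one way a helper could accept both; `groupReduce()` is a hypothetical name, not part of the library's API. It checks for a string first because some strings (for example `'count'` or `'max'`) are also valid PHP callables, so testing is_callable() first would misread such field names.

```php
<?php
// Hypothetical helper (not the library's API): reduces items per group, where
// $groupBy is either a field name (string) or a callback returning the key.
function groupReduce(iterable $items, $groupBy, callable $reducer): array
{
    $result = [];
    foreach ($items as $item) {
        if (is_string($groupBy)) {
            // Field name: read it as an array key or as an object property
            $key = is_array($item) ? $item[$groupBy] : $item->{$groupBy};
        } else {
            // Callback: derive the key from the whole item
            $key = $groupBy($item);
        }
        $result[$key] = $reducer($result[$key] ?? null, $item);
    }
    return $result;
}

// Plain objects this time, to show the helper is not limited to arrays
$people = array_map(
    fn($age) => (object) ['age' => $age],
    [20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40]
);

// Counts people and sums ages; a null carry counts as zero
$countAges = fn($carry, $person) => [
    'count'   => ($carry['count'] ?? 0) + 1,
    'age_sum' => ($carry['age_sum'] ?? 0) + $person->age,
];

// intdiv() keeps the key an integer (floor() would return a float)
$byDecade = groupReduce($people, fn($p) => intdiv($p->age, 10) * 10, $countAges);
print_r($byDecade);
```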

Input

MapReduce accepts any iterable as input: arrays as well as Traversable objects such as generators.

This is very handy when reading large files that do not fit in memory.

$result = MapReduce\MapReduce::create([
        "mapper" => $mapper, 
        "reducer" => $reducer, 
    ])
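    // csvReadGenerator() is not defined in this README; it stands for any function returning a Generator
    ->setInput(csvReadGenerator('myfile.csv'))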
    ->run();
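The README does not define `csvReadGenerator()`. A minimal hypothetical version, built on PHP's fgetcsv(), yields one row at a time so memory use does not grow with the file size. It is a sketch, not a function shipped by this library or by jotaelesalinas/php-generators.

```php
<?php
// Hypothetical input helper: yields one CSV row (array of strings) at a time.
function csvReadGenerator(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // Pass delimiter, enclosure and escape explicitly instead of relying on defaults
        while (($row = fgetcsv($handle, null, ',', '"', '\\')) !== false) {
            yield $row;
        }
    } finally {
        // Runs when the loop ends or the generator is destroyed early
        fclose($handle);
    }
}
```

Because rows are produced lazily, only the current row is held in memory while the pipeline consumes the generator.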

Multiple inputs can be specified by passing several arguments to setInput(), as long as all of them are iterable:

$result = MapReduce\MapReduce::create([
        "mapper" => $mapper, 
        "reducer" => $reducer, 
    ])
    ->setInput($arrayData, csvReadGenerator('myfile.csv'))
    ->run();
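Passing several iterables presumably means their items are read one after another. The sketch below shows that idea in plain PHP; `chainIterables()` is a hypothetical name, and the library may combine its inputs differently.

```php
<?php
// Hypothetical helper: streams the items of several iterables in order,
// without copying them into one big array first.
function chainIterables(iterable ...$sources): Generator
{
    foreach ($sources as $source) {
        foreach ($source as $item) {
            // Yield without keys so the combined stream gets fresh 0, 1, 2... keys
            yield $item;
        }
    }
}

$fromGenerator = (function () {
    yield 'c';
    yield 'd';
})();

foreach (chainIterables(['a', 'b'], $fromGenerator) as $item) {
    echo $item, "\n";
}
```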

Output

MapReduce can be configured to write the final data to one or more destinations.

Each destination must be a Generator:

$result = MapReduce\MapReduce::create([
        "mapper" => $mapper, 
        "reducer" => $reducer, 
    ])
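    // csvWriteGenerator() is not defined in this README; it stands for any function returning a Generator
    ->setOutput(csvWriteGenerator('results.csv'))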
    ->run();
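The README does not show what an output generator looks like. One plausible shape, assuming the library pushes each final value into the destination with Generator::send() (an assumption, not confirmed by the README), is a generator that loops on a bare `yield`. `csvWriteGenerator()` below is a hypothetical sketch built on fputcsv().

```php
<?php
// Hypothetical output destination (not part of the library). ASSUMPTION: values
// arrive through Generator::send(), each one becoming the result of `yield`.
function csvWriteGenerator(string $path): Generator
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    try {
        while (true) {
            $row = yield;
            // Scalars such as 110 become one-column rows
            fputcsv($handle, (array) $row, ',', '"', '\\');
        }
    } finally {
        // Runs when the generator is destroyed, flushing and closing the file
        fclose($handle);
    }
}
```

Since the body only runs on the first send(), the file is opened lazily, and it is closed once the generator object is released.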

Multiple outputs can be specified as well:

$result = MapReduce\MapReduce::create([
        "mapper" => $mapper, 
        "reducer" => $reducer, 
    ])
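    // csvWriteGenerator() and consoleGenerator() are not defined in this README; each stands for a Generator
    ->setOutput(csvWriteGenerator('results.csv'), consoleGenerator())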
    ->run();
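With several destinations, each final value has to reach every one of them. Below is a sketch of that fan-out under the same assumption that values are delivered with Generator::send(); `consoleGenerator()` and `writeToAll()` are hypothetical stand-ins, not library functions.

```php
<?php
// Hypothetical console destination: prints each value it is sent, as JSON
function consoleGenerator(): Generator
{
    while (true) {
        $value = yield;
        echo json_encode($value), "\n";
    }
}

// Hypothetical fan-out: every result goes to every destination, in order
function writeToAll(iterable $results, Generator ...$outputs): void
{
    foreach ($results as $value) {
        foreach ($outputs as $output) {
            $output->send($value);
        }
    }
}

writeToAll([['count' => 4, 'age_sum' => 110]], consoleGenerator());
```

Sending to each destination in turn keeps them independent: one generator can write a file while another prints, and neither needs to know about the other.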

To help with input and output generators, the package jotaelesalinas/php-generators is recommended, but not required.

More detailed examples can be found in the examples folder.

Change log

Please see CHANGELOG for more information on what has changed recently.

Testing

$ composer test

Contributing

Please see CONTRIBUTING and CONDUCT for details.

Security

If you discover any security-related issues, please contact me directly at @jotaelesalinas instead of using the issue tracker.

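To do

  • Add events to help track progress in large batches
  • Add documentation
  • Insurance example
    • Adapt to the new library
    • Add insurance values
    • Improve kml output (info, markers)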

Credits

License

The MIT License (MIT). Please see the License File for more information.