cylab-be/wowa-training

0.0.7 2021-08-27 12:33 UTC

This package is auto-updated.

Last update: 2024-08-27 19:22:24 UTC


README


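The WOWA operator (Torra) is a powerful aggregation operator that combines multiple input values into a single score. This is particularly interesting for detection and ranking systems that rely on multiple heuristics: such a system can use WOWA to produce a single meaningful score.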

A PHP implementation of WOWA is available at https://github.com/tdebatty/php-aggregation-operators

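The WOWA operator requires two sets of parameters: the p weights and the w weights. In this project we use a genetic algorithm to compute the best values for the p and w weights. For training, the algorithm uses a dataset of input vectors together with the expected aggregated score of each vector.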
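To make the role of the two weight vectors concrete, here is a minimal sketch of a WOWA computation. It is not the code of php-aggregation-operators: the function names `interpolate_w` and `wowa_sketch` are hypothetical, and the sketch assumes a piecewise-linear interpolation of the cumulative w weights, whereas other monotone interpolations are possible. Both weight vectors are assumed to sum to 1.

```php
<?php
// Hypothetical helpers, not part of cylab-be/wowa-training.

// Piecewise-linear interpolation through (0, 0) and (i/n, w_1 + ... + w_i).
// Assumption: linear interpolation; other monotone interpolations exist.
function interpolate_w(array $w, float $x): float
{
    $n = count($w);
    if ($x <= 0.0) {
        return 0.0;
    }
    if ($x >= 1.0) {
        return array_sum($w);
    }
    $pos = $x * $n;                          // position on the grid 0..n
    $k = min((int) floor($pos), $n - 1);     // segment that contains x
    $below = array_sum(array_slice($w, 0, $k)); // cumulative w before segment k
    return $below + ($pos - $k) * $w[$k];
}

// WOWA of $values, where $p[i] is the importance of source i and
// $w[j] is the weight given to the j-th largest value.
function wowa_sketch(array $w, array $p, array $values): float
{
    // Indices of the values, sorted by decreasing value.
    $order = array_keys($values);
    usort($order, function ($a, $b) use ($values) {
        return $values[$b] <=> $values[$a];
    });

    $result = 0.0;
    $cumulative = 0.0; // running sum of p in sorted order
    $previous = 0.0;   // interpolated value at the previous step
    foreach ($order as $i) {
        $cumulative += $p[$i];
        $current = interpolate_w($w, $cumulative);
        $result += ($current - $previous) * $values[$i];
        $previous = $current;
    }
    return $result;
}
```

With uniform w weights this sketch reduces to the weighted mean defined by p, and with uniform p weights it reduces to an ordered weighted average defined by w, which is a quick way to sanity-check a set of trained weights.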

Installation

composer require cylab-be/wowa-training

Usage

Example

require __DIR__ . "/vendor/autoload.php";

use RUCD\Training\Trainer;
use RUCD\Training\TrainerParameters;

use Monolog\Logger;
use Monolog\Handler\StreamHandler;

$populationSize = 100;
$crossoverRate = 60;
$mutationRate = 3;
$selectionMethod = TrainerParameters::SELECTION_METHOD_RWS;
$maxGeneration = 100;
$populationInitializationMethod = TrainerParameters::INITIAL_POPULATION_GENERATION_RANDOM;

// For logging you can use any implementation of PSR Logger
$logger = new Logger('wowa-training-test');
$logger->pushHandler(new StreamHandler('php://stdout', Logger::DEBUG));

$parameters = new TrainerParameters(
    $logger, $populationSize, $crossoverRate, $mutationRate, $selectionMethod, $maxGeneration, $populationInitializationMethod);
$trainer = new Trainer($parameters);

// Input data
$data = [
  [0.1, 0.2, 0.3, 0.4],
  [0.1, 0.8, 0.3, 0.4],
  [0.2, 0.6, 0.3, 0.4],
  [0.1, 0.2, 0.5, 0.8],
  [0.5, 0.1, 0.2, 0.3],
  [0.1, 0.1, 0.1, 0.1],
];

// expected aggregated value for each data vector
$expected = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6];

var_dump($trainer->run($data, $expected));

The example above will produce something similar to:

class RUCD\Training\Solution#56 (3) {
  public $weights_w =>
  array(4) {
    [0] =>
    double(0.31568310640557)
    [1] =>
    double(0.37517587135019)
    [2] =>
    double(0.23165073663557)
    [3] =>
    double(0.077490285608666)
  }
  public $weights_p =>
  array(4) {
    [0] =>
    double(0.67852325915809)
    [1] =>
    double(0.0083157109614166)
    [2] =>
    double(0.082353710617992)
    [3] =>
    double(0.2308073192625)
  }
  public $distance =>
  double(0.51636277259465)
}

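The run method returns a solution object composed of the p weights and w weights to use with the WOWA operator, together with the total distance between the expected aggregated values and the WOWA aggregated values computed with these weights.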

Parameters description

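  • populationSize: size of the population in the algorithm. Suggested value: 100
  • crossoverRate: defines the percentage of the population generated by crossover. Must be between 1 and 100. Suggested value: 60
  • mutationRate: defines the number of random element changes in the population. Must be between 1 and 100. Suggested value: 15
  • selectionMethod: determines the method used to select elements in the population (to generate the next generation). SELECTION_METHOD_RWS for roulette wheel selection and SELECTION_METHOD_TOS for tournament selection.
  • maxGeneration: determines the maximum number of iterations of the algorithm.
  • populationInitializationMethod: determines the method used to generate the initial population. INITIAL_POPULATION_GENERATION_RANDOM for a random generation and INITIAL_POPULATION_GENERATION_QUASI_RANDOM for a "quasi" random generation.
  • solutionType: specifies the class of the solution object. Solution objects must extend the SolutionAbstract class. The class of the solution object defines the criterion used to evaluate the fitness of the individuals in the population. SolutionDistance implements a distance criterion, while SolutionAUC uses a criterion based on the area under the curve (AUC). Note that SolutionAUC is designed for binary classification, so the expected vector may contain only 0 or 1 values.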
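As an illustration of the roulette wheel selection named in the selectionMethod entry, the following sketch picks one individual with a probability proportional to its fitness. It is a generic textbook version, not the library's implementation: the function name `roulette_pick` is hypothetical, and it assumes non-negative fitness values where higher is better, so a distance that should be minimised would first have to be converted (for example by inverting it).

```php
<?php
// Hypothetical helper illustrating roulette wheel selection.
// $fitness: list of non-negative fitness values (assumption: higher is better).
// $r: a number in [0, 1), e.g. mt_rand() / (mt_getrandmax() + 1).
// Returns the index of the selected individual.
function roulette_pick(array $fitness, float $r): int
{
    $threshold = $r * array_sum($fitness);
    $cumulative = 0.0;
    foreach ($fitness as $index => $value) {
        $cumulative += $value;
        if ($threshold < $cumulative) {
            return $index;
        }
    }
    // Only reached through floating-point rounding: fall back to the last one.
    return count($fitness) - 1;
}

// Example: individual 1 owns three quarters of the wheel.
echo roulette_pick([1.0, 3.0], 0.1), PHP_EOL; // 0
echo roulette_pick([1.0, 3.0], 0.5), PHP_EOL; // 1
```

Passing the random number as an argument keeps the function deterministic, which makes it easier to test than a version that calls the random generator internally.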

Cross validation

Example

require __DIR__ . "/vendor/autoload.php";

use RUCD\Training\Trainer;
use RUCD\Training\TrainerParameters;
use RUCD\Training\SolutionDistance;
use RUCD\Training\SolutionAUC;

use Monolog\Logger;
use Monolog\Handler\StreamHandler;

$populationSize = 100;
$crossoverRate = 60;
$mutationRate = 3;
$selectionMethod = TrainerParameters::SELECTION_METHOD_RWS;
$maxGeneration = 100;
$populationInitializationMethod = TrainerParameters::INITIAL_POPULATION_GENERATION_RANDOM;
$solutionType = new SolutionDistance();

// For logging you can use any implementation of PSR Logger
$logger = new Logger('wowa-training-test');
$logger->pushHandler(new StreamHandler('php://stdout', Logger::WARNING));

$parameters = new TrainerParameters(
    $logger, $populationSize, $crossoverRate, $mutationRate, $selectionMethod, $maxGeneration, $populationInitializationMethod);
$trainer = new Trainer($parameters, $solutionType);

// Input data
$data = [
  [0.1, 0.2, 0.3, 0.4],
  [0.1, 0.8, 0.3, 0.4],
  [0.2, 0.6, 0.3, 0.4],
  [0.1, 0.2, 0.5, 0.8],
  [0.5, 0.1, 0.2, 0.3],
  [0.1, 0.1, 0.1, 0.1],
  [0.1, 0.2, 0.3, 0.4],
  [0.1, 0.8, 0.3, 0.4],
  [0.2, 0.6, 0.3, 0.4],
  [0.1, 0.2, 0.5, 0.8],
  [0.5, 0.1, 0.2, 0.3],
  [0.1, 0.1, 0.1, 0.1],
];

// expected aggregated value for each data vector
$expected = [1,0,0,1,0,1,0,0,0,1,0,0];

var_dump($trainer->runKFold($data, $expected, 3));

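The runKFold method performs a k-fold cross-validation. Concretely, it splits the dataset into k folds. Each fold in turn is held out as validation data to test the model, while the remaining k - 1 folds are used as training data. The process is repeated k times, so that each fold is used exactly once as validation data. The k results can then be averaged to produce a single estimate. For each test fold, the area under the curve (AUC) is also computed to evaluate the classification performance (only meaningful when the expected vector contains only 0 and 1).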
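The splitting step can be sketched as follows. This is only an illustration of the k-fold idea: the helper name `k_fold_indices` is hypothetical, it uses contiguous folds and requires the dataset size to be divisible by k, and the way the library actually builds its folds may differ.

```php
<?php
// Hypothetical helper: build the train/test index lists of a k-fold split.
// Assumptions: contiguous folds, and $n divisible by $k.
function k_fold_indices(int $n, int $k): array
{
    if ($k < 2 || $n % $k !== 0) {
        throw new InvalidArgumentException('n must be divisible by k, with k >= 2.');
    }
    $indices = range(0, $n - 1);
    $folds = array_chunk($indices, intdiv($n, $k));

    $splits = [];
    foreach ($folds as $testIndices) {
        $splits[] = [
            // Everything that is not in the test fold is used for training.
            'train' => array_values(array_diff($indices, $testIndices)),
            'test' => $testIndices,
        ];
    }
    return $splits;
}

// 12 vectors and 3 folds, as in the example above: each fold tests 4 vectors.
foreach (k_fold_indices(12, 3) as $split) {
    echo implode(',', $split['test']), PHP_EOL;
}
```

Each index appears in exactly one test list, which is the property that lets every vector be used once for validation.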

The method returns an array containing, for each fold, the w and p vectors and the AUC value.
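To help interpret the AUC values, the sketch below computes the AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counting for one half. This is a standard equivalent definition of the area under the ROC curve. The function name `auc_sketch` is hypothetical, and the library may compute the value differently.

```php
<?php
// Hypothetical helper: AUC through pairwise comparison of scores.
// $scores: aggregated score of each vector; $labels: expected values (0 or 1).
function auc_sketch(array $scores, array $labels): float
{
    $positives = [];
    $negatives = [];
    foreach ($scores as $i => $score) {
        if ($labels[$i] == 1) {
            $positives[] = $score;
        } else {
            $negatives[] = $score;
        }
    }
    if (count($positives) === 0 || count($negatives) === 0) {
        throw new InvalidArgumentException('AUC needs at least one 0 and one 1 label.');
    }

    $wins = 0.0;
    foreach ($positives as $pos) {
        foreach ($negatives as $neg) {
            if ($pos > $neg) {
                $wins += 1.0;   // positive correctly ranked above negative
            } elseif ($pos == $neg) {
                $wins += 0.5;   // a tie counts for one half
            }
        }
    }
    return $wins / (count($positives) * count($negatives));
}
```

An AUC of 1 means every positive is ranked above every negative, 0 means the ranking is fully inverted, and 0.5 is what a constant or random score would give.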

The runKFold example above produces a result similar to:

array(3) {
  [0]=>
  array(2) {
    ["auc"]=>
    float(0.5)
    ["solution"]=>
    object(RUCD\Training\SolutionDistance)#133 (3) {
      ["weights_w"]=>
      array(4) {
        [0]=>
        float(0.16573697533351)
        [1]=>
        float(0.76165292950897)
        [2]=>
        float(0.024253730247718)
        [3]=>
        float(0.048356364909798)
      }
      ["weights_p"]=>
      array(4) {
        [0]=>
        float(0.20097150002833)
        [1]=>
        float(0.020364990979043)
        [2]=>
        float(0.17636230606784)
        [3]=>
        float(0.60230120292479)
      }
      ["distance"]=>
      float(1.7892117370011)
    }
  }
  [1]=>
  array(2) {
    ["auc"]=>
    float(0)
    ["solution"]=>
    object(RUCD\Training\SolutionDistance)#146 (3) {
      ["weights_w"]=>
      array(4) {
        [0]=>
        float(0.18742088232865)
        [1]=>
        float(0.57233147854378)
        [2]=>
        float(0.22507083815429)
        [3]=>
        float(0.015176800973267)
      }
      ["weights_p"]=>
      array(4) {
        [0]=>
        float(0.076670559592882)
        [1]=>
        float(0.019193144442706)
        [2]=>
        float(0.18316950831007)
        [3]=>
        float(0.72096678765435)
      }
      ["distance"]=>
      float(1.3403524893715)
    }
  }
  [2]=>
  array(2) {
    ["auc"]=>
    float(1)
    ["solution"]=>
    object(RUCD\Training\SolutionDistance)#12 (3) {
      ["weights_w"]=>
      array(4) {
        [0]=>
        float(0.16274887804484)
        [1]=>
        float(0.527446888854)
        [2]=>
        float(0.21225455965351)
        [3]=>
        float(0.097549673447646)
      }
      ["weights_p"]=>
      array(4) {
        [0]=>
        float(0.10891441031576)
        [1]=>
        float(0.023649196569852)
        [2]=>
        float(0.24106562811561)
        [3]=>
        float(0.62637076499877)
      }
      ["distance"]=>
      float(2.0314776184856)
    }
  }
}
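The per-fold results can be averaged into a single estimate, as mentioned in the description of runKFold. The sketch below does this for the AUC, relying only on the "auc" key visible in the dump above. The helper name `mean_auc` is hypothetical, and the sample array reproduces just the AUC values of that dump.

```php
<?php
// Hypothetical helper: average the "auc" entries of a runKFold-style result.
function mean_auc(array $results): float
{
    if (count($results) === 0) {
        throw new InvalidArgumentException('No fold results to average.');
    }
    return array_sum(array_column($results, 'auc')) / count($results);
}

// Same AUC values as in the dump above; the solution objects are left out.
$results = [
    ['auc' => 0.5],
    ['auc' => 0.0],
    ['auc' => 1.0],
];
echo mean_auc($results), PHP_EOL; // 0.5
```

With only a few vectors per fold, as here, individual AUC values swing between 0 and 1, so the mean is more informative than any single fold.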

References