cylab-be / wowa-training
0.0.7
2021-08-27 12:33 UTC
Requires
- php: >=7.4
- cylab-be/php-roc: ^1.1.0
- psr/log: ^1.0
- webd/aggregation: *
Requires (Dev)
- monolog/monolog: ^1.23
- phpstan/phpstan: ^0.12.5
- phpunit/phpunit: ^7
- squizlabs/php_codesniffer: ^3.3
This package is auto-updated.
Last update: 2024-08-27 19:22:24 UTC
README
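The WOWA operator (Torra) is a powerful aggregation operator that combines multiple input values into a single score. This is particularly useful for detection and ranking systems that rely on multiple heuristics, because such a system can use WOWA to produce a single, meaningful score.
A PHP implementation of WOWA is available at https://github.com/tdebatty/php-aggregation-operators
The WOWA operator requires two sets of parameters: the p weights and the w weights. In this project, we use a genetic algorithm to compute the optimal values of the p and w weights. For training, the algorithm uses a dataset of input vectors together with the expected aggregated score of each vector.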
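The following minimal PHP sketch shows how the two weight vectors interact: the w weights are attached to the rank of each value (as in OWA), and the p weights are attached to each input source (as in a weighted mean). It is not the API of the linked library. The function name wowaSketch is hypothetical, and the sketch builds Torra's interpolation function w* by piecewise-linear interpolation of the cumulative w weights, which is a simplifying assumption (other monotone interpolation functions are possible).

```php
<?php
/**
 * Hypothetical WOWA sketch (not the webd/aggregation API).
 * $w: OWA weights by rank, $p: weights by input source, both summing to 1.
 * Assumption: w* is the piecewise-linear interpolation of the cumulative w weights.
 */
function wowaSketch(array $w, array $p, array $values): float
{
    $n = count($values);
    // Permutation sigma: input indices sorted by decreasing value.
    $idx = array_keys($values);
    usort($idx, fn($a, $b) => $values[$b] <=> $values[$a]);

    // w*(x) passes through the points (i/n, w_1 + ... + w_i).
    $wStar = function (float $x) use ($w, $n): float {
        if ($x <= 0.0) {
            return 0.0;
        }
        if ($x >= 1.0) {
            return array_sum($w);
        }
        $pos = $x * $n;                 // position in [0, n)
        $k = (int) floor($pos);         // number of complete segments
        $cum = array_sum(array_slice($w, 0, $k));
        return $cum + ($pos - $k) * $w[$k];
    };

    $result = 0.0;
    $acc = 0.0;                         // cumulative p weight of the sorted inputs
    foreach ($idx as $i) {
        $prev = $wStar($acc);
        $acc += $p[$i];
        // Effective weight of this value: w*(acc) - w*(previous acc).
        $result += ($wStar($acc) - $prev) * $values[$i];
    }
    return $result;
}
```

With uniform w weights this reduces to the weighted mean with weights p, and with uniform p weights it reduces to the OWA operator with weights w; for example, w = [1, 0, 0, 0] returns the maximum input.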
Installation
composer require cylab-be/wowa-training
Usage
Example
<?php
require __DIR__ . "/vendor/autoload.php";
use RUCD\Training\Trainer;
use RUCD\Training\TrainerParameters;
use Monolog\Logger;
use Monolog\Handler\StreamHandler;
// Genetic algorithm settings (see the parameter descriptions below)
$populationSize = 100;
$crossoverRate = 60;
$mutationRate = 3;
$selectionMethod = TrainerParameters::SELECTION_METHOD_RWS;
$maxGeneration = 100;
$populationInitializationMethod = TrainerParameters::INITIAL_POPULATION_GENERATION_RANDOM;
// For logging you can use any implementation of PSR Logger
$logger = new Logger('wowa-training-test');
$logger->pushHandler(new StreamHandler('php://stdout', Logger::DEBUG));
$parameters = new TrainerParameters(
    $logger, $populationSize, $crossoverRate, $mutationRate, $selectionMethod, $maxGeneration, $populationInitializationMethod);
$trainer = new Trainer($parameters);
// Input data
$data = [
[0.1, 0.2, 0.3, 0.4],
[0.1, 0.8, 0.3, 0.4],
[0.2, 0.6, 0.3, 0.4],
[0.1, 0.2, 0.5, 0.8],
[0.5, 0.1, 0.2, 0.3],
[0.1, 0.1, 0.1, 0.1],
];
// expected aggregated value for each data vector
$expected = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6];
var_dump($trainer->run($data, $expected));
The example above will produce something similar to:
class RUCD\Training\Solution#56 (3) {
public $weights_w =>
array(4) {
[0] =>
double(0.31568310640557)
[1] =>
double(0.37517587135019)
[2] =>
double(0.23165073663557)
[3] =>
double(0.077490285608666)
}
public $weights_p =>
array(4) {
[0] =>
double(0.67852325915809)
[1] =>
double(0.0083157109614166)
[2] =>
double(0.082353710617992)
[3] =>
double(0.2308073192625)
}
public $distance =>
double(0.51636277259465)
}
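The run method returns a Solution object that contains the p weights and w weights to use with the WOWA operator, along with the total distance between the expected aggregated values and the WOWA aggregated values computed with these weights.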
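This README does not specify which distance SolutionDistance computes. The following sketch assumes it is the sum of absolute differences between the expected values and the aggregated values; the library may use another metric (for example a squared or Euclidean distance). totalDistance is a hypothetical helper that accepts any aggregation function, so a trained WOWA operator could be plugged in where the sketch uses a simple stand-in.

```php
<?php
/**
 * Hypothetical helper: total distance between expected scores and the
 * scores produced by an aggregation function.
 * Assumption: distance = sum of absolute differences (the library may differ).
 *
 * @param callable $aggregate maps one input vector to a single score
 */
function totalDistance(callable $aggregate, array $data, array $expected): float
{
    $distance = 0.0;
    foreach ($data as $i => $vector) {
        $distance += abs($aggregate($vector) - $expected[$i]);
    }
    return $distance;
}

// Usage with the maximum as a stand-in aggregator (a trained WOWA would be used instead).
$d = totalDistance(fn(array $v) => max($v), [[0.1, 0.2], [0.3, 0.4]], [0.2, 0.5]);
```

A lower total distance means the weights reproduce the expected scores more closely.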
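Parameters
- populationSize: size of the population used by the algorithm. Suggested value: 100
- crossoverRate: percentage of the population produced by crossover. Must be between 1 and 100. Suggested value: 60
- mutationRate: amount of random changes applied to elements of the population. Must be between 1 and 100. Suggested value: 15
- selectionMethod: method used to select elements from the population in order to generate the next generation. Use SELECTION_METHOD_RWS for roulette wheel selection or SELECTION_METHOD_TOS for tournament selection.
- maxGeneration: maximum number of iterations (generations) of the algorithm.
- populationInitializationMethod: method used to generate the initial population. Use INITIAL_POPULATION_GENERATION_RANDOM for random generation or INITIAL_POPULATION_GENERATION_QUASI_RANDOM for "quasi-random" generation.
- solutionType: the solution object, passed as the second argument of the Trainer constructor (see the cross-validation example). It must extend the SolutionAbstract class, and its class defines the criterion used to evaluate the performance of individuals in the population. SolutionDistance implements the distance criterion, whereas SolutionAUC uses a criterion based on the area under the curve (AUC). Note that SolutionAUC is designed for binary classification: the expected vector may only contain 0 or 1 values.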
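SELECTION_METHOD_RWS and SELECTION_METHOD_TOS name two standard genetic-algorithm selection operators. The sketch below illustrates both on an array of fitness values where higher is better. The function names are hypothetical, the tournament draws its contestants without replacement (an assumption), and how the library converts a solution's distance or AUC into a fitness value is not described in this README.

```php
<?php
/**
 * Roulette wheel selection: pick an index with probability proportional
 * to its fitness. Assumes non-negative fitness values with a positive sum.
 * Hypothetical helper, not the library API.
 */
function rouletteWheelSelect(array $fitness): int
{
    $total = array_sum($fitness);
    // Uniform random point in [0, total).
    $r = mt_rand() / (mt_getrandmax() + 1) * $total;
    $cumulative = 0.0;
    foreach ($fitness as $i => $f) {
        $cumulative += $f;
        if ($r < $cumulative) {
            return $i;
        }
    }
    // Guard against floating-point rounding on the last slot.
    return array_key_last($fitness);
}

/**
 * Tournament selection: draw $size distinct individuals at random and
 * return the fittest one. Requires 1 <= $size <= count($fitness).
 */
function tournamentSelect(array $fitness, int $size): int
{
    $keys = array_keys($fitness);
    shuffle($keys);
    $contestants = array_slice($keys, 0, $size);
    $best = $contestants[0];
    foreach ($contestants as $c) {
        if ($fitness[$c] > $fitness[$best]) {
            $best = $c;
        }
    }
    return $best;
}
```

Roulette wheel selection gives every individual a chance proportional to its fitness, while a larger tournament size increases the pressure towards the best individuals.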
Cross-validation
Example
<?php
require __DIR__ . "/vendor/autoload.php";
use RUCD\Training\Trainer;
use RUCD\Training\TrainerParameters;
use RUCD\Training\SolutionDistance;
use RUCD\Training\SolutionAUC;
use Monolog\Logger;
use Monolog\Handler\StreamHandler;
$populationSize = 100;
$crossoverRate = 60;
$mutationRate = 3;
$selectionMethod = TrainerParameters::SELECTION_METHOD_RWS;
$maxGeneration = 100;
$populationInitializationMethod = TrainerParameters::INITIAL_POPULATION_GENERATION_RANDOM;
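// SolutionDistance uses the distance criterion; new SolutionAUC() uses the AUC criterion (expected values must be 0 or 1)
$solutionType = new SolutionDistance();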
// For logging you can use any implementation of PSR Logger
$logger = new Logger('wowa-training-test');
$logger->pushHandler(new StreamHandler('php://stdout', Logger::WARNING));
$parameters = new TrainerParameters(
$logger, $populationSize, $crossoverRate, $mutationRate, $selectionMethod, $maxGeneration, $populationInitializationMethod);
$trainer = new Trainer($parameters, $solutionType);
// Input data
$data = [
[0.1, 0.2, 0.3, 0.4],
[0.1, 0.8, 0.3, 0.4],
[0.2, 0.6, 0.3, 0.4],
[0.1, 0.2, 0.5, 0.8],
[0.5, 0.1, 0.2, 0.3],
[0.1, 0.1, 0.1, 0.1],
[0.1, 0.2, 0.3, 0.4],
[0.1, 0.8, 0.3, 0.4],
[0.2, 0.6, 0.3, 0.4],
[0.1, 0.2, 0.5, 0.8],
[0.5, 0.1, 0.2, 0.3],
[0.1, 0.1, 0.1, 0.1],
];
// expected aggregated value for each data vector
$expected = [1,0,0,1,0,1,0,0,0,1,0,0];
var_dump($trainer->runKFold($data, $expected, 3));
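The runKFold method performs k-fold cross-validation. The dataset is split into k folds. For each fold, that fold is held out as validation data to test the model, and the remaining k - 1 folds are used as training data. The process is repeated k times, so that each fold is used exactly once as validation data. The k results can then be averaged to produce a single estimate. For each test fold, the area under the curve (AUC) is also computed to evaluate classification performance (only applicable when the expected vector contains 0 and 1 values).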
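The following sketch shows one way to build the k (train, test) splits described above. The README does not say how runKFold assigns vectors to folds; the round-robin assignment (index i goes to fold i mod k) and the helper name kFoldSplits are assumptions for illustration, and the library may shuffle or order the data differently.

```php
<?php
/**
 * Hypothetical helper: split indices 0..n-1 into k folds and return one
 * ['train' => [...], 'test' => [...]] entry per fold.
 * Assumption: round-robin fold assignment, no shuffling. Requires n >= k >= 2.
 */
function kFoldSplits(int $n, int $k): array
{
    $folds = array_fill(0, $k, []);
    for ($i = 0; $i < $n; $i++) {
        $folds[$i % $k][] = $i;   // index $i goes to fold $i % $k
    }
    $splits = [];
    foreach ($folds as $f => $test) {
        // Training set: every index that is not in the current test fold.
        $train = [];
        foreach ($folds as $g => $fold) {
            if ($g !== $f) {
                $train = array_merge($train, $fold);
            }
        }
        $splits[] = ['train' => $train, 'test' => $test];
    }
    return $splits;
}
```

With the 12 vectors and k = 3 of the example, each split trains on 8 vectors and tests on the remaining 4, and every vector appears in exactly one test fold.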
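runKFold returns an array with one entry per fold. Each entry contains the AUC value ("auc") and the trained solution ("solution"), which holds the w and p vectors and the distance.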
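For binary expected values, the AUC can be computed as the probability that a randomly chosen positive vector receives a higher score than a randomly chosen negative one, counting ties as 1/2; this rank-based form gives the same value as the trapezoidal area under the ROC curve. The sketch below uses that formula. Whether the library computes AUC this way or by integrating the ROC curve is not stated, and aucScore is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: AUC for binary labels (0/1), computed pairwise as
 * P(score of a positive > score of a negative), ties counted as 1/2.
 * O(P * N) for clarity.
 */
function aucScore(array $scores, array $labels): float
{
    $pos = [];
    $neg = [];
    foreach ($labels as $i => $label) {
        if ($label == 1) {
            $pos[] = $scores[$i];
        } else {
            $neg[] = $scores[$i];
        }
    }
    if (count($pos) === 0 || count($neg) === 0) {
        throw new InvalidArgumentException('Need at least one positive and one negative label.');
    }
    $wins = 0.0;
    foreach ($pos as $p) {
        foreach ($neg as $q) {
            if ($p > $q) {
                $wins += 1.0;
            } elseif ($p == $q) {
                $wins += 0.5;
            }
        }
    }
    return $wins / (count($pos) * count($neg));
}
```

An AUC of 1 means every positive vector is ranked above every negative one, 0.5 corresponds to random ranking, and 0 means the ranking is fully inverted.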
The example above produces a result similar to the following:
array(3) {
[0]=>
array(2) {
["auc"]=>
float(0.5)
["solution"]=>
object(RUCD\Training\SolutionDistance)#133 (3) {
["weights_w"]=>
array(4) {
[0]=>
float(0.16573697533351)
[1]=>
float(0.76165292950897)
[2]=>
float(0.024253730247718)
[3]=>
float(0.048356364909798)
}
["weights_p"]=>
array(4) {
[0]=>
float(0.20097150002833)
[1]=>
float(0.020364990979043)
[2]=>
float(0.17636230606784)
[3]=>
float(0.60230120292479)
}
["distance"]=>
float(1.7892117370011)
}
}
[1]=>
array(2) {
["auc"]=>
float(0)
["solution"]=>
object(RUCD\Training\SolutionDistance)#146 (3) {
["weights_w"]=>
array(4) {
[0]=>
float(0.18742088232865)
[1]=>
float(0.57233147854378)
[2]=>
float(0.22507083815429)
[3]=>
float(0.015176800973267)
}
["weights_p"]=>
array(4) {
[0]=>
float(0.076670559592882)
[1]=>
float(0.019193144442706)
[2]=>
float(0.18316950831007)
[3]=>
float(0.72096678765435)
}
["distance"]=>
float(1.3403524893715)
}
}
[2]=>
array(2) {
["auc"]=>
float(1)
["solution"]=>
object(RUCD\Training\SolutionDistance)#12 (3) {
["weights_w"]=>
array(4) {
[0]=>
float(0.16274887804484)
[1]=>
float(0.527446888854)
[2]=>
float(0.21225455965351)
[3]=>
float(0.097549673447646)
}
["weights_p"]=>
array(4) {
[0]=>
float(0.10891441031576)
[1]=>
float(0.023649196569852)
[2]=>
float(0.24106562811561)
[3]=>
float(0.62637076499877)
}
["distance"]=>
float(2.0314776184856)
}
}
}
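As described above, the per-fold results can be averaged into a single estimate. A minimal sketch, assuming the array shape shown in the output (each entry has an "auc" key); meanAuc is a hypothetical helper, and the three AUC values are copied from the example output with the solutions omitted.

```php
<?php
/**
 * Hypothetical helper: average the per-fold AUC values returned by runKFold().
 */
function meanAuc(array $folds): float
{
    $aucs = array_column($folds, 'auc');
    return array_sum($aucs) / count($aucs);
}

// AUC values from the example output above: 0.5, 0 and 1.
$folds = [['auc' => 0.5], ['auc' => 0.0], ['auc' => 1.0]];
echo meanAuc($folds), PHP_EOL; // prints 0.5
```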