README

bayes takes a document (a piece of text) and tells you which category that document belongs to.

This library was ported from the nodejs library https://github.com/ttezel/bayes.

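A proven and popular classifier in nodejs - https://npmjs.net.cn/package/bayes
We kept the JSON serialization signature, so the learned/trained JSON output can be used interchangeably between the PHP and nodejs libraries.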

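What can I use this for?

You can use this to categorize any text content into any arbitrary set of categories. For example:

Is an email spam or not spam?
Is a news article about technology, politics, or sports?
Does a piece of text express positive or negative sentiment?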

Installation

composer require niiknow/bayes

Usage

$classifier = new \Niiknow\Bayes();

// teach it positive phrases

$classifier->learn('amazing, awesome movie!! Yeah!! Oh boy.', 'positive');
$classifier->learn('Sweet, this is incredibly, amazing, perfect, great!!', 'positive');

// teach it a negative phrase

$classifier->learn('terrible, shitty thing. Damn. Sucks!!', 'negative');

// now ask it to categorize a document it has never seen before

$classifier->categorize('awesome, cool, amazing!! Yay.');
// => 'positive'

// serialize the classifier's state as a JSON string.
$stateJson = $classifier->toJson();

// load the classifier back from its JSON representation.
$classifier->fromJson($stateJson);

API

`$classifier = new \Niiknow\Bayes([options])`

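Returns an instance of a Naive Bayes classifier.

Pass in an optional options array to configure the instance. If you specify a tokenizer function in options, it will be used as the instance's tokenizer.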

`$classifier->learn(text, category)`

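Teach your classifier which category the text belongs to. The more you teach your classifier, the more reliable it becomes. It uses what it has learned to identify new documents that it has not seen before.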

`$classifier->categorize(text)`

Returns the category it thinks text belongs to. Its judgement is based on what you have taught it with learn().

`$classifier->probabilities(text)`

Extracts the probabilities for each known category.
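
To make the idea behind learn, categorize and probabilities more concrete, here is a minimal sketch of naive Bayes scoring. It is not this library's source code: the function names sketchTokenize, sketchTrain and sketchScores are hypothetical, the training phrases are illustrative, and add-one (Laplace) smoothing is an assumption made for the example.

```php
<?php
// Illustrative only: a tiny multinomial naive Bayes with add-one smoothing.
// All names below are hypothetical and not part of the library's API.

/** Split text into lowercase alphabetic tokens (ASCII lowercasing only). */
function sketchTokenize(string $text): array
{
    preg_match_all('/[[:alpha:]]+/u', strtolower($text), $matches);
    return $matches[0];
}

/** Count documents and word frequencies per category from [text, category] pairs. */
function sketchTrain(array $samples): array
{
    $model = ['docs' => [], 'words' => [], 'vocab' => []];
    foreach ($samples as [$text, $category]) {
        $model['docs'][$category] = ($model['docs'][$category] ?? 0) + 1;
        foreach (sketchTokenize($text) as $word) {
            $model['words'][$category][$word] = ($model['words'][$category][$word] ?? 0) + 1;
            $model['vocab'][$word] = true;
        }
    }
    return $model;
}

/** For each category, return log P(category) + sum of log P(word | category). */
function sketchScores(array $model, string $text): array
{
    $totalDocs = array_sum($model['docs']);
    $vocabSize = count($model['vocab']);
    $scores = [];
    foreach ($model['docs'] as $category => $docCount) {
        $counts = $model['words'][$category] ?? [];
        $totalWords = array_sum($counts);
        $score = log($docCount / $totalDocs);
        foreach (sketchTokenize($text) as $word) {
            // add-one smoothing keeps unseen words from zeroing out the score
            $score += log((($counts[$word] ?? 0) + 1) / ($totalWords + $vocabSize));
        }
        $scores[$category] = $score;
    }
    return $scores;
}

$model = sketchTrain([
    ['amazing, awesome movie!! Yeah!! Oh boy.', 'positive'],
    ['Sweet, this is incredibly, amazing, perfect, great!!', 'positive'],
    ['terrible, boring thing. Sucks!!', 'negative'],
]);

$scores = sketchScores($model, 'awesome, cool, amazing!! Yay.');
arsort($scores);                   // highest log-score first
$best = array_key_first($scores);  // the category a classifier would pick
echo $best, PHP_EOL;
```

In this sketch, training only updates counts, scoring yields one log-score per category, and picking a category means taking the highest score. The values the library itself returns from probabilities may be scaled or shaped differently.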

`$classifier->toJson()`

Returns the JSON representation of the classifier.

`$classifier->fromJson(jsonStr)`

Returns a classifier instance from the JSON representation. Use this with the JSON representation obtained from $classifier->toJson().

Stopwords

You can pass in your own tokenizer function in the constructor. Example:

// array containing stopwords
$stopwords = array("der", "die", "das", "the");

// escape each stopword and join with pipes into one whole-word pattern,
// so stopwords are matched anywhere in the text, not only at its start or end
$s = '~\b(?:'.implode("|", array_map("preg_quote", $stopwords)).')\b~iu';

$options['tokenizer'] = function($text) use ($s) {
    // convert everything to lowercase
    $text = mb_strtolower($text);

    // remove stop words
    $text = preg_replace($s, '', $text);

    // split the words
    preg_match_all('/[[:alpha:]]+/u', $text, $matches);

    // first match list of words
    return $matches[0];
};

$classifier = new \Niiknow\Bayes($options);
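
A tokenizer closure can be called directly to check its output before it is handed to the classifier. The sketch below is a hypothetical variant that splits the text first and then drops stopwords by exact match, instead of using a regular expression. The sample sentence is illustrative only.

```php
<?php
// Hypothetical variant: tokenize first, then filter out stopwords.
$stopwords = ['der', 'die', 'das', 'the'];

$tokenizer = function (string $text) use ($stopwords): array {
    // lowercase, then keep runs of letters as tokens
    preg_match_all('/[[:alpha:]]+/u', mb_strtolower($text), $matches);

    // drop stopwords and reindex so the result is a plain list
    return array_values(array_diff($matches[0], $stopwords));
};

$tokens = $tokenizer('The cat and das Haus');
print_r($tokens); // the remaining tokens are: cat, and, haus
```

The closure can then be assigned to $options['tokenizer'] and passed to the constructor in the same way as the example above.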
