nai-php / naipostagger
A part-of-speech tagger written in PHP.
v0.2
2021-08-22 08:39 UTC
Requires
- php: >=7.2.0
- aura/sql: ^3.0
- monolog/monolog: ^2.3
This package is auto-updated.
Last update: 2024-09-16 14:08:52 UTC
README
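This is a lightweight, framework-independent, pure-PHP library for part-of-speech tagging. It can be used for chatbots, personal assistants, keyword extraction, and similar tasks. Because it is written in PHP, it can be easily integrated into existing or new applications, giving them a real ability to understand what the user writes.
It is based on a vocabulary and predefined grammar rules, and needs no third-party systems, neural networks, machine learning, or resource-hungry models.
This is the English version. Documentation and a TODO list are coming soon; for more information, visit n-ai.cloud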
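To make the idea of "vocabulary plus predefined rules" concrete, here is a minimal sketch of that general technique. It is not this package's implementation: the function `toyTag`, the tiny lexicon, and the single contextual rule are all hypothetical and invented for illustration; only the short tag names (ADJ, NOUN, VER, NPR) are borrowed from this README.

```php
<?php
declare(strict_types=1);

/**
 * Toy two-pass tagger (illustrative only, not the library's algorithm).
 * Pass 1 looks each word up in a lexicon; pass 2 applies one contextual rule.
 *
 * @param array<string, string[]> $lexicon lowercase form => candidate tags
 * @return string[] tokens rendered as "form/TAG"
 */
function toyTag(string $sentence, array $lexicon): array
{
    $tokens = preg_split('/\s+/', trim($sentence), -1, PREG_SPLIT_NO_EMPTY);
    $tagged = [];

    // Pass 1: take the first candidate tag listed in the lexicon.
    foreach ($tokens as $token) {
        $candidates = $lexicon[strtolower($token)] ?? null;
        if ($candidates === null) {
            // Unknown word: guess a proper noun if capitalised, else a noun.
            $candidates = [ctype_upper($token[0]) ? 'NPR' : 'NOUN'];
        }
        $tagged[] = ['form' => $token, 'tag' => $candidates[0], 'candidates' => $candidates];
    }

    // Pass 2: hypothetical rule "after an ADJ, prefer NOUN if it is a candidate".
    for ($i = 1; $i < count($tagged); $i++) {
        if ($tagged[$i - 1]['tag'] === 'ADJ'
            && in_array('NOUN', $tagged[$i]['candidates'], true)) {
            $tagged[$i]['tag'] = 'NOUN';
        }
    }

    $out = [];
    foreach ($tagged as $t) {
        $out[] = $t['form'] . '/' . $t['tag'];
    }
    return $out;
}

// Invented lexicon; "name" is deliberately ambiguous so the rule has work to do.
$lexicon = [
    'my'   => ['ADJ'],
    'name' => ['VER', 'NOUN'],
    'is'   => ['VER'],
];

echo implode(' ', toyTag('my name is Fred', $lexicon)), PHP_EOL;
// prints: my/ADJ name/NOUN is/VER Fred/NPR
```

The lexicon alone would tag "name" as a verb; the contextual rule corrects it because the preceding word is tagged ADJ. A real rule set would of course be much larger.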
Accuracy
In this table I will show the results for corpora of different sentence types.
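Until results are published, the sketch below shows one common way such a figure could be computed: token-level accuracy, the share of tokens whose predicted tag matches a hand-annotated ("gold") tag. The function name `tagAccuracy` and the sample tags are hypothetical; the numbers are toy data, not a measurement of this package.

```php
<?php
declare(strict_types=1);

/**
 * Token-level accuracy: fraction of positions where the predicted tag
 * equals the gold tag. Returns 0.0 for empty input.
 *
 * @param string[] $predicted
 * @param string[] $gold
 */
function tagAccuracy(array $predicted, array $gold): float
{
    // Normalise keys so both arrays are compared position by position.
    $predicted = array_values($predicted);
    $gold = array_values($gold);

    if (count($predicted) !== count($gold)) {
        throw new InvalidArgumentException('Both tag sequences must have the same length.');
    }
    if ($gold === []) {
        return 0.0;
    }

    $correct = 0;
    foreach ($gold as $i => $goldTag) {
        if ($predicted[$i] === $goldTag) {
            $correct++;
        }
    }
    return $correct / count($gold);
}

// Invented toy data: one of four tags is wrong.
$gold      = ['ADJ', 'NOUN', 'VER', 'NPR'];
$predicted = ['ADJ', 'VER',  'VER', 'NPR'];

printf("%.2f\n", tagAccuracy($predicted, $gold)); // prints: 0.75
```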
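Installation
- Install this package via Composer in your project folder (e.g. "myproject");
- Create a "dictionaries" folder;
- Inside the "dictionaries" folder, clone or download the English dictionary repository;
- Run this example script: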
    <?php

    use NaiPosTagger\Pipelines\PipelinePosTagging;
    use NaiPosTagger\Models\NaiPosArr;

    // Composer autoloader plus the package's helper functions
    include('vendor/autoload.php');
    include(__DIR__ . '/vendor/nai-php/naipostagger/src/Utilities/common_functions_helper.php');

    // Path prefix of the dictionaries and path to the package sources
    define('DICTIONARIES_PATH', __DIR__ . '/./dictionaries/dictionaries-');
    define('TRAITS_PATH', __DIR__ . '/./vendor/nai-php/naipostagger/src/');

    $sentence = 'my name is Fred';

    $PipelinePosTagging = new PipelinePosTagging();
    $PipelinePosTagging->language = 'en';
    $pos_arr = $PipelinePosTagging->transform($sentence);

    // for a clear output, better hide metadata
    $pos_arr = NaiPosArr::clearMetadata($pos_arr);

    // and further simplify the output
    $pos_arr = NaiPosArr::flatPosArr($pos_arr);

    // print the result (diex() is presumably provided by the helper file included above)
    diex($pos_arr);
The output will look like this:
    Array
    (
        [0] => Array
            (
                [form] => .
                [lemma] => .
                [features] => SENT
                [sh-feat] => SENT
                [label] =>
                [rule] =>
                [pos_score] => 0
            )

        [1] => Array
            (
                [form] => my
                [lemma] => my
                [features] => ADJ:pos+m+s
                [sh-feat] => ADJ
                [label] =>
                [rule] =>
                [pos_score] => 0
            )

        [2] => Array
            (
                [form] => name
                [lemma] => name
                [features] => NOUN-m:s
                [sh-feat] => NOUN
                [label] =>
                [rule] =>
                [pos_score] => 0
            )

        [3] => Array
            (
                [form] => is
                [lemma] => is
                [features] => VER:ind+pres+3+s
                [sh-feat] => VER
                [label] =>
                [rule] =>
                [pos_score] => 0
            )

        [4] => Array
            (
                [form] => Fred
                [lemma] => Fred
                [features] => NPR
                [sh-feat] => NPR
                [label] =>
                [rule] =>
                [pos_score] => 0
            )

        [5] => Array
            (
                [form] => .
                [lemma] => .
                [features] => SENT
                [sh-feat] => SENT
                [label] =>
                [rule] =>
                [pos_score] => 0
            )

    )
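Since keyword extraction is one of the stated use cases, here is a minimal sketch of how the flattened array could be consumed for that purpose. It assumes only the array shape shown in the sample output (the `lemma` and `sh-feat` keys); the function `extractKeywords` is hypothetical and not part of the package, and the input is a hand-copied subset of that output rather than a live result.

```php
<?php
declare(strict_types=1);

/**
 * Collect the lemmas of tokens whose short tag ('sh-feat') is in $keep.
 *
 * @param array<int, array<string, mixed>> $posArr flattened tagger output
 * @param string[] $keep short tags to treat as keywords
 * @return string[]
 */
function extractKeywords(array $posArr, array $keep = ['NOUN', 'NPR']): array
{
    $keywords = [];
    foreach ($posArr as $token) {
        $tag = $token['sh-feat'] ?? null;
        if (isset($token['lemma']) && in_array($tag, $keep, true)) {
            $keywords[] = $token['lemma'];
        }
    }
    return $keywords;
}

// Hand-copied from the sample output, keeping only the fields used here.
$posArr = [
    ['form' => '.',    'lemma' => '.',    'sh-feat' => 'SENT'],
    ['form' => 'my',   'lemma' => 'my',   'sh-feat' => 'ADJ'],
    ['form' => 'name', 'lemma' => 'name', 'sh-feat' => 'NOUN'],
    ['form' => 'is',   'lemma' => 'is',   'sh-feat' => 'VER'],
    ['form' => 'Fred', 'lemma' => 'Fred', 'sh-feat' => 'NPR'],
    ['form' => '.',    'lemma' => '.',    'sh-feat' => 'SENT'],
];

echo implode(', ', extractKeywords($posArr)), PHP_EOL;
// prints: name, Fred
```

Filtering on the short tag rather than the full `features` string keeps the check independent of the morphological details (for example `NOUN-m:s`).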
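TODO
- Find contributors
- Clean, check, fix, and tag the terms in the dictionaries
- Clean, check, and fix the Brill rules
- Add more n-grams
- Add more tests, especially for the filters
- Collect and load frill words
- Better OOP for some classes?
- Collect synonyms and time expressions in a module for logical analysis (not yet released)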