mehrdad-dadkhah / php-persian-natural-language-processor
A simple PHP and Python wrapper for the hazm Persian text processor
1.0.5
2020-09-07 14:21 UTC
Requires
- php: >=7.2
- symfony/process: >=3.0
README
A simple PHP and Python wrapper built on hazm, the Persian text processor.
System requirements
Install hazm
If Python is not installed:
sudo apt install python
Then:
sudo apt install python-pip
Then:
pip install hazm
Installation
composer require mehrdad-dadkhah/php-persian-natural-language-processor
Usage
PHP
use MehrdadDadkhah\Language\PersianLanguageProcessor;

$parser = new PersianLanguageProcessor();
$parser->allNLP('سلام. این یک متن تست است. موفق باشید');
Python
python /path/to/pr/processor.py allNLP json.dumps('سلام. این یک متن تست است. موفق باشید')
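The command above writes the text argument as json.dumps(...), which suggests the script expects a JSON-encoded string. Under that assumption, here is a minimal sketch of building the argument list from Python. The helper name build_command is hypothetical, and the script path is the same placeholder used above.

```python
import json


def build_command(script_path, function_name, text):
    """Build the argument list for the processor script.

    Assumption: the script takes the function name first and the
    JSON-encoded text second, as the command line above suggests.
    """
    # json.dumps escapes non-ASCII characters by default, so the Persian
    # text is passed as plain ASCII \uXXXX sequences.
    return ["python", script_path, function_name, json.dumps(text)]


command = build_command(
    "/path/to/pr/processor.py",  # placeholder path, as in the README
    "allNLP",
    "سلام. این یک متن تست است. موفق باشید",
)
print(command)
```

The resulting list could then be passed to subprocess.run(command, capture_output=True, text=True) once the placeholder path points at the real script.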
Result of allNLP for the sample text:
array:7 [▼
  "chunksGroup" => array:2 [▼
    "main" => "[سلام NP] . [این یک متن تست NP] [است VP] . [موفق ADJP] [باشید VP]"
    "normalized" => "[سلام NP] . [این یک متن تست NP] [است VP] . [موفق ADJP] [باشید VP]"
  ]
  "postTags" => array:2 [▼
    "main" => array:10 [▶]
    "normalized" => array:10 [▼
      0 => array:2 [▶]
      1 => array:2 [▶]
      2 => array:2 [▶]
      3 => array:2 [▶]
      4 => array:2 [▼
        0 => "متن"
        1 => "N"
      ]
      5 => array:2 [▶]
      6 => array:2 [▶]
      7 => array:2 [▶]
      8 => array:2 [▶]
      9 => array:2 [▶]
    ]
  ]
  "stem" => array:2 [▼
    "main" => array:4 [▶]
    "normalized" => array:4 [▼
      "ADV" => []
      "N" => array:2 [▶]
      "Ne" => []
      "V" => array:3 [▶]
    ]
  ]
  "wordTokenize" => array:2 [▼
    "main" => array:10 [▶]
    "normalized" => array:10 [▼
      0 => "سلام"
      1 => "."
      2 => "این"
      3 => "یک"
      4 => "متن"
      5 => "تست"
      6 => "است"
      7 => "."
      8 => "موفق"
      9 => "باشید"
    ]
  ]
  "lemmatized" => array:2 [▼
    "main" => array:4 [▼
      "ADV" => []
      "N" => array:2 [▼
        0 => "سلام"
        1 => "متن"
      ]
      "Ne" => []
      "V" => array:3 [▼
        0 => "تست"
        1 => "است"
        2 => "بود#باش"
      ]
    ]
    "normalized" => array:4 [▼
      "ADV" => []
      "N" => array:2 [▼
        0 => "سلام"
        1 => "متن"
      ]
      "Ne" => []
      "V" => array:3 [▼
        0 => "تست"
        1 => "است"
        2 => "بود#باش"
      ]
    ]
  ]
  "normalized" => "سلام. این یک متن تست است. موفق باشید"
  "sentTokenize" => array:2 [▼
    "main" => array:3 [▶]
    "normalized" => array:3 [▼
      0 => "سلام."
      1 => "این یک متن تست است."
      2 => "موفق باشید"
    ]
  ]
]
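The chunksGroup entries in the result are plain strings in which each chunk is wrapped in brackets and followed by its label. A consumer may want them as structured pairs. The sketch below shows one way to split such a string, assuming the bracket format seen in the sample output. The parse_chunks helper and the regular expression are illustrative and not part of the package.

```python
import re

# Assumed format: "[<phrase> <LABEL>]", with the label in uppercase
# ASCII letters, as in the sample output above.
CHUNK_PATTERN = re.compile(r"\[([^\]]+?) ([A-Z]+)\]")


def parse_chunks(chunk_string):
    """Split a bracketed chunk string into (phrase, label) pairs."""
    return [
        (match.group(1), match.group(2))
        for match in CHUNK_PATTERN.finditer(chunk_string)
    ]


sample = "[سلام NP] . [این یک متن تست NP] [است VP] . [موفق ADJP] [باشید VP]"
chunks = parse_chunks(sample)
labels = [label for _, label in chunks]

for phrase, label in chunks:
    print(label, phrase)
```

Punctuation that sits outside the brackets, such as the sentence-ending periods, is skipped by this pattern.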
Functions
- normilizeText(string $text)
- sentTokenizeText(string $text)
- wordTokenizeText(string $text)
- postTagText(string $text)
- chunksText(string $text)
- getChunksGroup(string $text)
- stemText(string $text)
- lemmatizeText(string $text)
- allNLP(string $text)
The allNLP function calls all of the other functions and returns all of their results.
Credits
Uses
License
php-persian-natural-language-processor is released under the GPLv3 license.