amaccis / php-stemmer
PHP interface to the Snowball stemming algorithms
2.0.0
2023-04-30 09:50 UTC
Requires
- php: ^8.1.0
- ext-ffi: *
Requires (Dev)
- ext-iconv: *
- phpstan/phpstan: ^1.10
- phpunit/phpunit: ^10.1
README
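What is PHP Stemmer?
PHP Stemmer is a PHP interface to the stemming algorithms of the Snowball project (https://snowballstem.org/), largely inspired by Richard Boulton's PyStemmer (https://github.com/snowballstem/pystemmer). It uses FFI (available in PHP >= 7.4.0) and expects to find the file libstemmer.so (the shared-library version of Libstemmer) in LD_LIBRARY_PATH.
To set up such an environment, you can look at the docker-php-libstemmer Dockerfile, or use the corresponding Docker image: amaccis/php-libstemmer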
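Because the package depends on both the ffi extension and a loadable libstemmer.so, it can help to verify the environment before installing. The following is a minimal sketch of such a check, not part of PHP Stemmer itself: the function name checkStemmerEnvironment is hypothetical, and it only assumes that the shared library exports sb_stemmer_list(), which belongs to the libstemmer C API. How PHP Stemmer loads the library internally may differ.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns a list of problems that would prevent
 * loading libstemmer through FFI. An empty list means the basic
 * prerequisites look satisfied.
 *
 * @return list<string>
 */
function checkStemmerEnvironment(string $library = 'libstemmer.so'): array
{
    if (!extension_loaded('ffi')) {
        return ['The ffi extension is not loaded.'];
    }

    $problems = [];
    try {
        // Declaring one known libstemmer symbol forces FFI to open the library.
        FFI::cdef('const char **sb_stemmer_list(void);', $library);
    } catch (\FFI\Exception $e) {
        $problems[] = "Could not load {$library}: " . $e->getMessage();
    }

    return $problems;
}

// Print any problems found with the default library name.
foreach (checkStemmerEnvironment() as $problem) {
    echo $problem, PHP_EOL;
}
```

If the script prints nothing, the extension is loaded and the library could be opened with the current library search path.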
Installation
PHP Stemmer is available on Packagist (https://packagist.org.cn/packages/amaccis/php-stemmer), and you can install it with Composer (https://getcomposer.org.cn).
composer require amaccis/php-stemmer
Usage
<?php

use Amaccis\Stemmer\Stemmer;
use Amaccis\Stemmer\Enum\CharacterEncodingEnum;

$algorithms = Stemmer::algorithms();
var_dump($algorithms);
/*
array(29) {
  [0] => string(6) "arabic"
  [1] => string(8) "armenian"
  [2] => string(6) "basque"
  [3] => string(7) "catalan"
  [4] => string(6) "danish"
  [5] => string(5) "dutch"
  [6] => string(7) "english"
  [7] => string(7) "finnish"
  [8] => string(6) "french"
  [9] => string(6) "german"
  [10] => string(5) "greek"
  [11] => string(5) "hindi"
  [12] => string(9) "hungarian"
  [13] => string(10) "indonesian"
  [14] => string(5) "irish"
  [15] => string(7) "italian"
  [16] => string(10) "lithuanian"
  [17] => string(6) "nepali"
  [18] => string(9) "norwegian"
  [19] => string(6) "porter"
  [20] => string(10) "portuguese"
  [21] => string(8) "romanian"
  [22] => string(7) "russian"
  [23] => string(7) "serbian"
  [24] => string(7) "spanish"
  [25] => string(7) "swedish"
  [26] => string(5) "tamil"
  [27] => string(7) "turkish"
  [28] => string(7) "yiddish"
}
*/

$algorithm = "english";
$word = "cycling";
$stemmer = new Stemmer($algorithm); // default character encoding is UTF-8
$stem = $stemmer->stemWord($word);
var_dump($stem);
/*
string(4) "cycl"
*/

$algorithm = "basque";
$word = "aberatsenetakoa";
$stemmer = new Stemmer($algorithm, CharacterEncodingEnum::ISO_8859_1);
$stem = $stemmer->stemWord($word);
var_dump($stem);
/*
string(8) "aberatse"
*/
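The usage example stems one word at a time. As a sketch of how that could extend to a whole phrase, the hypothetical helper below (stemTokens is not part of PHP Stemmer) splits text on whitespace and applies any string-to-string callable to each token. It passes tokens through unchanged, so lowercasing or punctuation stripping, if needed, would have to happen beforehand.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: splits $text on whitespace and applies $stem
 * to every token, preserving token order.
 *
 * @param callable(string): string $stem
 * @return list<string>
 */
function stemTokens(callable $stem, string $text): array
{
    // PREG_SPLIT_NO_EMPTY drops empty pieces, so blank input yields [].
    $tokens = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    return array_map($stem, $tokens);
}

// With php-stemmer installed, the stemWord method from the usage example
// could be passed as a first-class callable (PHP >= 8.1), for instance:
//   $stemmer = new Stemmer("english");
//   $stems = stemTokens($stemmer->stemWord(...), "cycling");
```

Taking a callable rather than a Stemmer instance keeps the helper independent of the library, so it can be exercised with a stand-in function where libstemmer.so is not available.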
License
All files are MIT © Andrea Maccis, except resources/libstemmer.h, which is BSD-3 © Snowball Project.