tarastalian / text-language-detect
Detects the language of a given text.
dev-master
2017-05-17 08:25 UTC
Requires
- php: >=5.3.0
This package is not auto-updated.
Last update: 2024-09-29 02:29:02 UTC
README
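Detects the language of a given text.
The package attempts to detect the language of a text sample by correlating its ranked 3-gram frequencies against a table of 3-gram frequencies for known languages.
It implements a version of the technique originally proposed by Cavnar & Trenkle (1994) in "N-Gram-Based Text Categorization".
This is a fork of Text_LanguageDetect 0.3.0 (alpha).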
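The ranked 3-gram comparison described above can be sketched roughly as follows. This is a minimal illustration of the Cavnar & Trenkle "out-of-place" measure, not the package's actual code: the function names `trigramRanks` and `outOfPlaceDistance`, the 300-trigram cutoff, and the toy reference sentences are all assumptions made for this example. It returns a raw distance where lower means closer, which is not the same scale as the scores the package reports.

```php
<?php
// Hypothetical helpers for illustration only; they are NOT part of the package's API.
// The sketch assumes valid UTF-8 input.

/**
 * Build a ranked trigram profile: trigram => rank (0 = most frequent).
 */
function trigramRanks($text, $maxTrigrams = 300)
{
    // Lowercase, then collapse every run of non-letters into a single space.
    $text = function_exists('mb_strtolower')
        ? mb_strtolower($text, 'UTF-8')
        : strtolower($text);
    $text = ' ' . trim(preg_replace('/[^\p{L}]+/u', ' ', $text)) . ' ';

    // Split into UTF-8 characters so multi-byte letters stay intact.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Count every window of three consecutive characters.
    $counts = array();
    for ($i = 0, $n = count($chars) - 2; $i < $n; $i++) {
        $gram = $chars[$i] . $chars[$i + 1] . $chars[$i + 2];
        $counts[$gram] = isset($counts[$gram]) ? $counts[$gram] + 1 : 1;
    }

    // Order by descending count; break ties alphabetically so ranks are deterministic.
    $grams = array_keys($counts);
    usort($grams, function ($a, $b) use ($counts) {
        if ($counts[$a] !== $counts[$b]) {
            return $counts[$b] - $counts[$a];
        }
        return strcmp($a, $b);
    });

    // Keep the top trigrams and map each one to its rank.
    return array_flip(array_slice($grams, 0, $maxTrigrams));
}

/**
 * Out-of-place distance: sum of rank differences between two profiles.
 * Trigrams missing from the reference get a fixed maximum penalty.
 */
function outOfPlaceDistance(array $sample, array $reference)
{
    $maxPenalty = count($reference);
    $distance = 0;
    foreach ($sample as $gram => $rank) {
        $distance += isset($reference[$gram])
            ? abs($rank - $reference[$gram])
            : $maxPenalty;
    }
    return $distance;
}

// Toy reference profiles built from single sentences (assumed data, far too
// small for real use; a real table would be trained on large corpora).
$references = array(
    'german'  => trigramRanks('Das ist ein kurzer deutscher Beispieltext mit einigen Worten.'),
    'english' => trigramRanks('This is a short English sample text with a few words.'),
);

$sample = trigramRanks('Das ist ein Text in deutscher Sprache.');
foreach ($references as $name => $profile) {
    echo $name, ': ', outOfPlaceDistance($sample, $profile), "\n";
}
```

The language whose reference profile gives the smallest distance would be the best guess in this sketch.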
Dependencies
PHP Version: PHP 5.3 or newer, PHP 7
PHP Extension: pcre
PHP Extension: mbstring (optional)
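The requirements listed above can be checked at runtime before the detector is used. The following is a small sketch using only standard PHP functions; the helper name `checkLanguageDetectRequirements` is hypothetical and not provided by the package.

```php
<?php
// Hypothetical helper, not part of the package: returns a list of
// human-readable problems, or an empty array when the hard requirements are met.
function checkLanguageDetectRequirements()
{
    $problems = array();

    // The package metadata requires PHP >= 5.3.0.
    if (version_compare(PHP_VERSION, '5.3.0', '<')) {
        $problems[] = 'PHP 5.3.0 or newer is required, found ' . PHP_VERSION;
    }

    // pcre is listed as a required extension.
    if (!extension_loaded('pcre')) {
        $problems[] = 'The pcre extension is required';
    }

    return $problems;
}

$problems = checkLanguageDetectRequirements();
foreach ($problems as $problem) {
    echo 'Error: ', $problem, "\n";
}

// mbstring is listed as optional, so its absence is reported but not treated as an error.
if (!extension_loaded('mbstring')) {
    echo "Note: the optional mbstring extension is not loaded\n";
}
```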
Usage example
<?php

use TextLanguageDetect\TextLanguageDetect;
use TextLanguageDetect\LanguageDetect\TextLanguageDetectException;

$l = new TextLanguageDetect();

echo "Supported languages:\n";

try {
    $langs = $l->getLanguages();
    sort($langs);
    echo implode(', ', $langs) . "\n\n";
} catch (TextLanguageDetectException $e) {
    die($e->getMessage());
}

$text = <<<EOD
Hallo! Das ist ein Text in deutscher Sprache.
Mal sehen, ob die Klasse erkennt, welche Sprache das hier ist.
EOD;

try {
    // return 2-letter language codes only
    $l->setNameMode(2);
    $result = $l->detect($text, 4);
    print_r($result);
} catch (TextLanguageDetectException $e) {
    die($e->getMessage());
}
Output
// output
Supported languages:
albanian, arabic, azeri, bengali, bulgarian, cebuano, croatian, czech,
danish, dutch, english, estonian, farsi, finnish, french, german, hausa,
hawaiian, hindi, hungarian, icelandic, indonesian, italian, kazakh, kyrgyz,
latin, latvian, lithuanian, macedonian, mongolian, nepali, norwegian, pashto,
pidgin, polish, portuguese, romanian, russian, serbian, slovak, slovene, somali,
spanish, swahili, swedish, tagalog, turkish, ukrainian, urdu, uzbek, vietnamese,
welsh
Array
(
    [de] => 0.40703703703704
    [nl] => 0.2880658436214
    [en] => 0.28333333333333
    [da] => 0.23452674897119
)
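The array printed above maps language codes to scores, and in this sample the highest score belongs to the correct language. One possible way to turn such an array into a single decision is sketched below. It works on a copy of the sample output rather than calling the library, and both the helper name `pickLanguage` and the 0.05 margin are assumptions chosen for illustration, not values recommended by the package.

```php
<?php
// Scores copied from the sample output above.
$result = array(
    'de' => 0.40703703703704,
    'nl' => 0.2880658436214,
    'en' => 0.28333333333333,
    'da' => 0.23452674897119,
);

/**
 * Hypothetical helper, not part of the package: return the best-scoring code,
 * or null when the input is empty or the lead over the runner-up is smaller
 * than $minMargin (an arbitrary threshold chosen for this sketch).
 */
function pickLanguage(array $scores, $minMargin = 0.05)
{
    if (count($scores) === 0) {
        return null;
    }

    // Sort highest score first rather than relying on the input order.
    arsort($scores);
    $codes  = array_keys($scores);
    $values = array_values($scores);

    // Treat a narrow lead as "undecided".
    if (count($values) > 1 && ($values[0] - $values[1]) < $minMargin) {
        return null;
    }

    return $codes[0];
}

$best = pickLanguage($result);
echo $best === null ? 'undecided' : $best, "\n";   // prints "de"
```

Requiring a margin is a design choice: closely related languages can score near each other, so callers may prefer an explicit "undecided" over a weak guess.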
Author
Nicholas Pisarro - infinityminusnine+pear@gmail.com