landrok/language-detector

A fast and reliable PHP library for detecting languages

Version 1.4.0, released 2023-12-18 21:52 UTC

This package is auto-updated.

Last update: 2024-09-21 14:39:07 UTC


LanguageDetector is a PHP library that detects the language of a text string.

Features

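  • Supports more than 50 languages, including Klingon
  • Very fast, no database needed
  • Packaged with a 2 MB dataset
  • Learning steps are already done; the library is ready to use
  • Small code base with a small footprint
  • N-gram algorithm
  • Supports PHP 5.4+, 7+ and 8+, as well as HHVM. The latest release line, 1.4.x, supports only PHP >= 7.4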
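
The feature list mentions an N-gram algorithm. As a rough, self-contained illustration of the idea (not the library's actual tokenizer, padding or weighting), the sketch below counts overlapping character trigrams in a text; a detector of this kind compares such counts against per-language profiles built in advance. The function name `trigramCounts()` is hypothetical.

```php
<?php
// Hypothetical helper, for illustration only: counts overlapping character
// trigrams. This is not the library's own implementation.
function trigramCounts(string $text): array
{
    // Pad with spaces so word starts and ends form their own trigrams.
    // strtolower() is not multibyte-aware; mb_strtolower() would be the
    // better choice for non-ASCII text when mbstring is available.
    $padded = ' ' . strtolower($text) . ' ';

    // Split into UTF-8 characters so multibyte letters stay intact.
    $chars = preg_split('//u', $padded, -1, PREG_SPLIT_NO_EMPTY);

    $counts = [];
    for ($i = 0, $n = count($chars) - 2; $i < $n; $i++) {
        $gram = $chars[$i] . $chars[$i + 1] . $chars[$i + 2];
        $counts[$gram] = ($counts[$gram] ?? 0) + 1;
    }

    // Most frequent trigrams first, keeping the trigram keys.
    arsort($counts);
    return $counts;
}

$counts = trigramCounts('the cat and the hat');
// ' th', 'the', 'he ' and 'at ' each occur twice; every other trigram once.
```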

Install

composer require landrok/language-detector

Quick usage

Detect a language

Instantiate a detector, pass it a text and get the detected language.

require_once 'vendor/autoload.php';

$text = 'My tailor is rich and Alison is in the kitchen with Bob.';

$detector = new LanguageDetector\LanguageDetector();

$language = $detector->evaluate($text)->getLanguage();

echo $language; // Prints something like 'en'

Once a detector is instantiated, you can evaluate several texts with it.

require_once 'vendor/autoload.php';

// An array of texts to evaluate
$texts = [
    'My tailor is rich and Alison is in the kitchen with Bob.',
    'Mon tailleur est riche et Alison est dans la cuisine avec Bob'
];

$detector = new LanguageDetector\LanguageDetector();

foreach ($texts as $key => $text) {

    $language = $detector->evaluate($text)->getLanguage();

    echo sprintf(
        "Text %d, language=%s\n",
        $key,
        $language
    );
}

The output might look like this:

Text 0, language=en
Text 1, language=fr

You can also use a LanguageDetector instance as a string.

require_once 'vendor/autoload.php';

$text = 'My tailor is rich and Alison is in the kitchen with Bob.';

$detector = new LanguageDetector\LanguageDetector();

echo $detector->evaluate($text); // Prints something like 'en'
echo $detector; // Prints something like 'en' after an evaluate()

API methods

evaluate()

Type \LanguageDetector\LanguageDetector

Evaluates the given text.

Example

After evaluate() has run, the result is stored and available for later calls.

$detector->evaluate('My tailor is rich and Alison is in the kitchen with Bob.');

// Then you have access to the detected language
$detector->getLanguage(); // Returns 'en'

You can also chain both calls in a single statement.

$detector->evaluate('My tailor is rich and Alison is in the kitchen with Bob.')
         ->getLanguage(); // Returns 'en'

The output of evaluate() can be printed directly.

// Prints 'en'
echo $detector->evaluate('My tailor is rich and Alison is in the kitchen with Bob.');
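
The behaviour described above (a stored result, chained calls, echoing the detector) matches a common PHP pattern: evaluate() returns `$this`, and the magic method `__toString()` is invoked whenever the object is used as a string. The class below is a hypothetical toy written only to show that pattern; its word-list rule is a placeholder and says nothing about how the library actually detects languages.

```php
<?php
// Hypothetical toy class, not part of landrok/language-detector.
final class ToyDetector
{
    private string $text = '';
    private string $language = '';

    public function evaluate(string $text): self
    {
        // Store the input and the result so later getters can read them.
        $this->text = $text;

        // Placeholder rule: count a few common English and French words.
        $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        $fr = count(array_intersect($words, ['est', 'et', 'dans', 'avec']));
        $en = count(array_intersect($words, ['is', 'and', 'in', 'with']));
        $this->language = $fr > $en ? 'fr' : 'en';

        // Returning the instance is what makes chaining possible.
        return $this;
    }

    public function getLanguage(): string
    {
        return $this->language;
    }

    public function getText(): string
    {
        return $this->text;
    }

    // Called automatically by echo, string concatenation and (string) casts.
    public function __toString(): string
    {
        return $this->language;
    }
}

$detector = new ToyDetector();
echo $detector->evaluate('Mon tailleur est riche et Alison est dans la cuisine avec Bob'), "\n"; // fr
```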

getLanguage()

Type string

The detected language.

Example

$detector->getLanguage(); // Returns 'en'

getLanguages()

Type array

The list of loaded language models that will be evaluated.

Example

$detector->getLanguages(); // Returns something like ['de', 'en', 'fr']

getScores()

Type array

The scores of all evaluated languages.

Example

$detector->getScores();

// Returns something like
Array
(
    [en] => 0.43950135722745
    [nl] => 0.40898789832569
    [...]
    [ja] => 0
    [fa] => 0
)
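
Because getScores() returns a plain array of language codes mapped to scores, ordinary PHP array functions can post-process it. The sketch below reuses the sample values shown above and returns `null` when the two best scores are close. The helper name `bestLanguage()` and the 0.05 margin are assumptions for illustration, not library features or recommended thresholds.

```php
<?php
// Hypothetical helper: picks the top-scoring language code, or null when the
// runner-up is within $minMargin of it (treated here as "uncertain").
function bestLanguage(array $scores, float $minMargin = 0.05): ?string
{
    if ($scores === []) {
        return null;
    }
    arsort($scores);                    // highest score first, keys preserved
    $codes = array_keys($scores);
    $top = $scores[$codes[0]];
    $runnerUp = isset($codes[1]) ? $scores[$codes[1]] : 0.0;

    return ($top - $runnerUp) >= $minMargin ? $codes[0] : null;
}

// Values copied from the sample getScores() output above (truncated).
$scores = ['en' => 0.43950135722745, 'nl' => 0.40898789832569, 'ja' => 0, 'fa' => 0];

var_dump(bestLanguage($scores));        // NULL: en leads nl by only about 0.03
var_dump(bestLanguage($scores, 0.01));  // string(2) "en"
```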

getSupportedLanguages()

Type array

The list of supported languages.

Example

$detector->getSupportedLanguages();

// Returns something like
Array
(
    [0] => af
    [1] => ar
    [...]
    [51] => zh-cn
    [52] => zh-tw

)
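
A list shaped like getSupportedLanguages() output also makes it easy to check a set of codes, for example ones coming from user input, before restricting which models are loaded. A minimal sketch: the helper `splitRequestedLanguages()` is hypothetical, and `$supported` is a short hand-written excerpt, not the full list returned by the library.

```php
<?php
// Hypothetical helper: separates requested language codes into those found in
// the supported list and those that are not.
function splitRequestedLanguages(array $requested, array $supported): array
{
    return [
        // array_values() reindexes so each entry is a plain list.
        'usable'  => array_values(array_intersect($requested, $supported)),
        'unknown' => array_values(array_diff($requested, $supported)),
    ];
}

// Hand-written excerpt; in practice this would come from
// $detector->getSupportedLanguages().
$supported = ['af', 'ar', 'de', 'en', 'fr', 'zh-cn', 'zh-tw'];

print_r(splitRequestedLanguages(['en', 'fr', 'xx'], $supported));
```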

getText()

Type string

Returns the last evaluated string.

Example

$detector->getText();

// Returns 'My tailor is rich and Alison is in the kitchen with Bob.'

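Options

Type \LanguageDetector\LanguageDetector

For better performance, you can explicitly specify which language models are loaded by passing their codes as the second constructor argument.

Example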

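require_once 'vendor/autoload.php';

$text = 'My tailor is rich and Alison is in the kitchen with Bob.';

// Load only the English, French and German models
$detector = new LanguageDetector\LanguageDetector(null, ['en', 'fr', 'de']);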

$language = $detector->evaluate($text);

echo $language; // Prints something like 'en'
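
Why fewer models can mean less work: in an n-gram detector, each loaded language profile is compared with the text, so the cost grows with the number of profiles. The toy sketch below makes that visible by scoring only the requested languages. The tiny trigram "profiles", the overlap score and the function names are all invented for illustration; they are not the library's data or scoring method.

```php
<?php
// Hypothetical helper: overlapping character trigrams of a padded string.
function trigrams(string $text): array
{
    $chars = preg_split('//u', ' ' . strtolower($text) . ' ', -1, PREG_SPLIT_NO_EMPTY);
    $grams = [];
    for ($i = 0, $n = count($chars) - 2; $i < $n; $i++) {
        $grams[] = $chars[$i] . $chars[$i + 1] . $chars[$i + 2];
    }
    return $grams;
}

// Hypothetical scorer: share of the text's trigrams found in each profile,
// computed only for the languages listed in $only.
function scoreLanguages(string $text, array $profiles, array $only): array
{
    $grams = trigrams($text);
    $scores = [];
    // Profiles not listed in $only are never visited.
    foreach (array_intersect_key($profiles, array_flip($only)) as $lang => $profile) {
        $hits = count(array_intersect($grams, $profile));
        $scores[$lang] = count($grams) > 0 ? (float) $hits / count($grams) : 0.0;
    }
    arsort($scores);
    return $scores;
}

// Toy profiles, far smaller than any real language model.
$profiles = [
    'en' => [' th', 'the', 'he ', 'ing', ' an', 'and'],
    'fr' => [' le', 'les', ' de', 'est', ' et', 'ent'],
    'de' => ['der', ' di', 'die', 'ein', 'sch', 'ich'],
];

// en scores 8/19, fr scores 0, and de is never evaluated.
print_r(scoreLanguages('the cat and the dog', $profiles, ['en', 'fr']));
```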

One-liner

Type \LanguageDetector\LanguageDetector

A static call to detect() evaluates the given text in a single line.

Example

echo LanguageDetector\LanguageDetector::detect(
    'My tailor is rich and Alison is in the kitchen with Bob.'
); // Prints 'en'

All API methods are available on the object returned by detect().

$detector = LanguageDetector\LanguageDetector::detect(
    'My tailor is rich and Alison is in the kitchen with Bob.'
);

// en
echo $detector;

// en
echo $detector->getLanguage();

// An array of all scores, see API method
print_r($detector->getScores());

// An array of all supported languages, see API method
print_r($detector->getSupportedLanguages());

// The last evaluated string
echo $detector->getText();

// Limit loaded languages for even better performance
echo LanguageDetector\LanguageDetector::detect(
    'My tailor is rich and Alison is in the kitchen with Bob.',
    ['en', 'de', 'fr', 'es']
); // en