aflorea4/php-nlp-client

Library for accessing NLP APIs (fixed HTTP header interpretation)

v0.40.7 2022-07-19 10:10 UTC

This package is auto-updated.

Last update: 2024-09-19 15:12:22 UTC


README

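This is a simple PHP library for performing multilingual natural language processing tasks using Web64's NLP-Server https://github.com/web64/nlpserver and other providers.

NLP tasks available through Web64's NLP Server

NLP tasks available through Stanford's CoreNLP Server

NLP tasks available through the Microsoft Labs API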

Laravel Package

A Laravel wrapper for this library is also available here: https://github.com/web64/laravel-nlp

Installation

composer require web64/php-nlp-client

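NLP Server

Most of the NLP features in this package require a running instance of the NLP Server, a simple Python Flask application that exposes common Python NLP libraries through a web service API.

Installation instructions: https://github.com/web64/nlpserver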

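Entity Extraction - Named Entity Recognition (NER)

This library provides three different methods for entity extraction.

If you are working with text in English or one of the major European languages, you will get the best results with CoreNLP or Spacy.

The entities extracted by Polyglot are of noticeably lower quality, but for many languages it is currently the only available option.

Polyglot and Spacy NER are accessed through the NLP Server, while CoreNLP requires its own standalone Java server.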
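The advice above can be turned into a small routing rule. The sketch below is only an illustration: the function name `choose_ner_backend()` is hypothetical, and both language lists are assumptions (based on the languages this README mentions and on commonly published spaCy models), so adjust them to the models you have actually installed.

```php
<?php
// Hypothetical routing rule for picking an NER backend by language code.
// Both language lists are assumptions; edit them to match your setup.
const CORENLP_LANGS = ['en', 'fr', 'de', 'es'];
const SPACY_LANGS   = ['en', 'fr', 'de', 'es', 'it', 'nl', 'pt'];

function choose_ner_backend(string $lang, bool $coreNlpRunning = false): string
{
    $lang = strtolower(trim($lang));

    // CoreNLP only helps if its separate Java server is actually running.
    if ($coreNlpRunning && in_array($lang, CORENLP_LANGS, true)) {
        return 'corenlp';
    }
    // Spacy via the NLP Server, provided the language model is downloaded.
    if (in_array($lang, SPACY_LANGS, true)) {
        return 'spacy';
    }
    // Fallback: lower quality, but the widest language coverage.
    return 'polyglot';
}

// Usage:
echo choose_ner_backend('de'), PHP_EOL;       // spacy
echo choose_ner_backend('de', true), PHP_EOL; // corenlp
echo choose_ner_backend('sw'), PHP_EOL;       // polyglot
```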

Usage

Language Detection

$nlp = new \Web64\Nlp\NlpClient('http://localhost:6400/');
$detected_lang = $nlp->language( "The quick brown fox jumps over the lazy dog" );
// 'en'

Article & Metadata Extraction

// From URL
$nlp = new \Web64\Nlp\NlpClient('http://localhost:6400/');
$newspaper = $nlp->newspaper('https://github.com/web64/nlpserver');

// or from HTML
$html = file_get_contents( 'https://github.com/web64/nlpserver' );
$newspaper = $nlp->newspaper_html( $html );

/*
Array
(
    [article_html] => <div><h1><a id="user-content-nlp-server" class="anchor" href="#nlp-server"></a>NLP Server</h1> .... </div>
    [authors] => Array()
    [canonical_url] => https://github.com/web64/nlpserver
    [meta_data] => Array()
    [meta_description] => GitHub is where people build software. More than 27 million people use GitHub to discover, fork, and contribute to over 80 million projects.
    [meta_lang] => en
    [source_url] => 
    [text] => NLP Server. Python Flask web service for easy access to multilingual NLP tasks such as language detection, article extraction...
    [title] => web64/nlpserver: NLP Web Service
    [top_image] => https://avatars2.githubusercontent.com/u/76733?s=400&v=4
)
*/
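Not every field of the result is always filled in; in the sample, `authors` and `source_url` are empty. A minimal sketch, assuming the array shape shown above, of normalising the result before storing it. `newspaper_record()` is a hypothetical helper, not part of this library.

```php
<?php
// Hypothetical helper: keep the fields an application typically stores
// from a newspaper()/newspaper_html() result. Empty strings become null,
// and canonical_url falls back to source_url.
function newspaper_record(array $article): array
{
    // Read a scalar field, trimming it and mapping '' to null.
    $pick = function (string $key) use ($article): ?string {
        $value = trim((string) ($article[$key] ?? ''));
        return $value === '' ? null : $value;
    };

    return [
        'title'   => $pick('title'),
        'url'     => $pick('canonical_url') ?? $pick('source_url'),
        'lang'    => $pick('meta_lang'),
        'image'   => $pick('top_image'),
        'authors' => is_array($article['authors'] ?? null) ? $article['authors'] : [],
    ];
}

// Usage with values from the sample output above:
$record = newspaper_record([
    'canonical_url' => 'https://github.com/web64/nlpserver',
    'source_url'    => '',
    'title'         => 'web64/nlpserver: NLP Web Service',
    'meta_lang'     => 'en',
    'authors'       => [],
    'top_image'     => 'https://avatars2.githubusercontent.com/u/76733?s=400&v=4',
]);
print_r($record);
```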

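Entity Extraction & Sentiment Analysis (Polyglot)

This uses the Polyglot multilingual NLP library to return the entities and a sentiment score for the given text. Make sure the Polyglot models for the required languages have been downloaded.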

$polyglot = $nlp->polyglot_entities( $text, 'en' );

$polyglot->getSentiment(); // -1

$polyglot->getEntityTypes(); 
/*
Array
(
    [Locations] => Array
    (
        [0] => United Kingdom
    )
    [Organizations] =>
    [Persons] => Array
    (
        [0] => Ben
        [1] => Sir Benjamin Hall
        [2] => Benjamin Caunt
    )
)
*/

$polyglot->getLocations();  // Array of Locations
$polyglot->getOrganizations(); // Array of organisations
$polyglot->getPersons(); // Array of people

$polyglot->getEntities();
/*                                              
Returns flat array of all entities
Array                                          
(                                              
    [0] => Ben                                 
    [1] => United Kingdom                      
    [2] => Sir Benjamin Hall                   
    [3] => Benjamin Caunt                      
)
*/
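`getEntities()` returns the same entities as `getEntityTypes()`, just without the grouping. If you only have the grouped array, a sketch like the following produces a comparable flat list. `flatten_entity_types()` is a hypothetical helper; its output follows the order of the groups, whereas the library's sample output above appears to be ordered differently.

```php
<?php
// Flatten a type => entities array (the getEntityTypes() shape) into one
// list without duplicates. Empty types may not be arrays (see
// [Organizations] in the sample), so non-array values are skipped.
function flatten_entity_types(array $byType): array
{
    $flat = [];
    foreach ($byType as $entities) {
        if (!is_array($entities)) {
            continue; // e.g. [Organizations] => '' or null
        }
        foreach ($entities as $entity) {
            if (!in_array($entity, $flat, true)) {
                $flat[] = $entity; // keep the first occurrence only
            }
        }
    }
    return $flat;
}

// Usage with the sample output shown above:
$types = [
    'Locations'     => ['United Kingdom'],
    'Organizations' => '',
    'Persons'       => ['Ben', 'Sir Benjamin Hall', 'Benjamin Caunt'],
];
print_r(flatten_entity_types($types));
```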

Entity Extraction with Spacy

$text = "Harvesters is a 1905 oil painting on canvas by the Danish artist Anna Ancher, a member of the artists' community known as the Skagen Painters.";

$nlp = new \Web64\Nlp\NlpClient('http://localhost:6400/');
$entities = $nlp->spacy_entities( $text );
/*
Array
(
    [DATE] => Array
        (
            [0] => 1905
        )

    [NORP] => Array
        (
            [0] => Danish
        )

    [ORG] => Array
        (
            [0] => the Skagen Painters
        )

    [PERSON] => Array
        (
            [0] => Anna Ancher
        )
)
*/

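English is used by default. To use another language, make sure the corresponding Spacy language model has been downloaded and pass the language code as the second parameter: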

$entities = $nlp->spacy_entities( $spanish_text, 'es' );
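Because `spacy_entities()` groups entities by label, finding the label of one particular entity means searching every group. A minimal sketch, assuming the array shape shown above, that inverts the result into an entity => label map. `entity_label_map()` is hypothetical, and keeping the first label when an entity appears under several labels is an arbitrary choice.

```php
<?php
// Invert a label => entities array (the spacy_entities() shape) into an
// entity => label lookup. If an entity appears under several labels, the
// first label encountered is kept.
function entity_label_map(array $byLabel): array
{
    $map = [];
    foreach ($byLabel as $label => $texts) {
        if (!is_array($texts)) {
            continue; // skip empty or malformed groups
        }
        foreach ($texts as $text) {
            if (!array_key_exists($text, $map)) {
                $map[$text] = $label;
            }
        }
    }
    return $map;
}

// Usage with the sample output shown above:
$entities = [
    'DATE'   => ['1905'],
    'NORP'   => ['Danish'],
    'ORG'    => ['the Skagen Painters'],
    'PERSON' => ['Anna Ancher'],
];
$labels = entity_label_map($entities);
echo $labels['Anna Ancher'], PHP_EOL; // PERSON
```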

Sentiment Analysis

$sentiment = $nlp->sentiment( "This is the worst product ever" );
// -1

$sentiment = $nlp->sentiment( "This is great! " );
// 1

// Specify the language in the second parameter for non-English text
$sentiment = $nlp->sentiment( $french_text, 'fr' );
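The examples above return -1 for negative and 1 for positive text. If your application needs a label rather than a number, a sketch like this one can map the score. It assumes scores fall between -1 and 1; the `sentiment_label()` name and the 0.1 neutral band are hypothetical choices.

```php
<?php
// Map a sentiment score (assumed to lie in [-1, 1]) to a label. Scores
// within +/- $deadZone of zero are treated as neutral.
function sentiment_label(float $score, float $deadZone = 0.1): string
{
    if ($score > $deadZone) {
        return 'positive';
    }
    if ($score < -$deadZone) {
        return 'negative';
    }
    return 'neutral';
}

// Usage:
echo sentiment_label(-1), PHP_EOL; // negative
echo sentiment_label(1), PHP_EOL;  // positive
```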

Neighbouring Words (Embeddings)

$nlp = new \Web64\Nlp\NlpClient('http://localhost:6400/');
$neighbours = $nlp->neighbours('obama', 'en');
/*
Array
(
    [0] => Bush
    [1] => Reagan
    [2] => Clinton
    [3] => Ahmadinejad
    [4] => Nixon
    [5] => Karzai
    [6] => McCain
    [7] => Biden
    [8] => Huckabee
    [9] => Lula
)
*/
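Neighbours are the words whose embedding vectors lie closest to the query word's vector. The toy sketch below illustrates that idea with made-up two-dimensional vectors and Euclidean distance. It is not how the NLP Server computes neighbours (its embeddings, dimensionality and distance metric may differ), and `nearest_neighbours()` is a hypothetical name.

```php
<?php
// Toy nearest-neighbour lookup: rank every other word by the Euclidean
// distance between its vector and the query word's vector.
function nearest_neighbours(string $word, array $vectors, int $k = 10): array
{
    if (!isset($vectors[$word])) {
        throw new InvalidArgumentException("Unknown word: $word");
    }
    $query = $vectors[$word];
    $distances = [];
    foreach ($vectors as $other => $vec) {
        if ($other === $word) {
            continue; // a word is not its own neighbour
        }
        $sum = 0.0;
        foreach ($vec as $i => $x) {
            $sum += ($x - $query[$i]) ** 2;
        }
        $distances[$other] = sqrt($sum);
    }
    asort($distances); // smallest distance first
    return array_slice(array_keys($distances), 0, $k);
}

// Made-up 2-D vectors, for illustration only.
$toy = [
    'obama'   => [1.0, 1.0],
    'bush'    => [1.1, 0.9],
    'clinton' => [0.8, 1.2],
    'banana'  => [-3.0, 4.0],
];
print_r(nearest_neighbours('obama', $toy, 2)); // bush, clinton
```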

Summarization

Extract a short summary from a long text

$summary = $nlp->summarize( $long_text );
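To illustrate what extractive summarization means (selecting the most representative sentences of a text), here is a deliberately naive sketch that scores each sentence by the average frequency of its words. This is a swapped-in technique for illustration, not the algorithm the NLP Server uses, and `naive_summarize()` is hypothetical.

```php
<?php
// Naive extractive summariser: score each sentence by the mean frequency
// (over the whole text) of its words, keep the top $n sentences, and
// return them in their original order.
function naive_summarize(string $text, int $n = 2): array
{
    // Split after '.', '!' or '?' followed by whitespace.
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    $tokenize = function (string $s): array {
        preg_match_all('/\w+/', strtolower($s), $m);
        return $m[0];
    };

    // Word frequencies over the whole text.
    $freq = array_count_values($tokenize($text));

    // Mean word frequency per sentence.
    $scores = [];
    foreach ($sentences as $i => $sentence) {
        $words = $tokenize($sentence);
        $total = 0;
        foreach ($words as $w) {
            $total += $freq[$w];
        }
        $scores[$i] = $words ? $total / count($words) : 0;
    }

    arsort($scores);                                 // highest score first
    $keep = array_slice(array_keys($scores), 0, $n); // indices of the top $n
    sort($keep);                                     // restore text order
    return array_map(fn($i) => $sentences[$i], $keep);
}

// Usage:
print_r(naive_summarize('Cats chase dogs. Birds sing at dawn. Cats purr.', 2));
```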

Readability

Article extraction using the Python port of Readability.js

$nlp = new \Web64\Nlp\NlpClient( 'http://localhost:6400/' );

// From URL:
$article = $nlp->readability('https://github.com/web64/nlpserver');

// From HTML:
$html = file_get_contents( 'https://github.com/web64/nlpserver' );
$article = $nlp->readability_html( $html );

/*
Array
(
    [article_html] => <div><h1>NLP Server</h1><p>Python 3 Flask web service for easy access to multilingual NLP tasks ...
    [short_title] => web64/nlpserver: NLP Web Service
    [text] => NLP Server Python 3 Flask web service for easy access to multilingual NLP tasks such as language detection  ...
    [title] => GitHub - web64/nlpserver: NLP Web Service
)
*/
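`readability_html()` and `newspaper_html()` both return an `article_html` field. To compare the two extractors on the same page, a rough sketch like this one derives plain text from that HTML. `html_to_plain_text()` is a hypothetical helper, and its output only approximates the `text` field each extractor produces itself.

```php
<?php
// Rough HTML-to-text conversion: replace tags with spaces (so "</h1><p>"
// does not glue words together), decode entities, collapse whitespace.
function html_to_plain_text(string $html): string
{
    $text = preg_replace('/<[^>]*>/', ' ', $html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = preg_replace('/\s+/', ' ', $text);
    return trim($text);
}

// Usage:
echo html_to_plain_text('<div><h1>NLP Server</h1><p>Python 3 Flask web service</p></div>'), PHP_EOL;
// NLP Server Python 3 Flask web service
```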

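CoreNLP - Entity Extraction (NER)

CoreNLP's NER is much more accurate than Polyglot's, but it supports only a few languages, including English, French, German and Spanish.

Download the CoreNLP Server (Java) here: https://stanfordnlp.github.io/CoreNLP/index.html#download

Installing CoreNLP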

# Update download links with latest versions from the download page

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip
unzip stanford-corenlp-full-2018-10-05.zip
cd stanford-corenlp-full-2018-10-05

# Download English language model:
wget http://nlp.stanford.edu/software/stanford-english-kbp-corenlp-2018-10-05-models.jar

Running the CoreNLP Server

# Run the server using all jars in the current directory (e.g., the CoreNLP home directory)
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000

# To run the server as a background process
nohup java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 &

Once the CoreNLP server is running, you can access it on port 9000: http://localhost:9000/

More information about running the CoreNLP server: https://stanfordnlp.github.io/CoreNLP/corenlp-server.html
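This library wraps the CoreNLP server, but when debugging it can help to see what a raw request looks like. The CoreNLP server documentation describes sending the text as the POST body and configuring the pipeline through a JSON `properties` query parameter. The sketch below assumes that interface; all function names are hypothetical, and the `sentences[].entitymentions` response shape is an assumption to check against your CoreNLP version.

```php
<?php
// Build the request URL: the pipeline is configured through a JSON
// "properties" query parameter.
function corenlp_url(string $baseUrl, array $annotators): string
{
    $properties = json_encode([
        'annotators'   => implode(',', $annotators),
        'outputFormat' => 'json',
    ]);
    return rtrim($baseUrl, '/') . '/?' . http_build_query(['properties' => $properties]);
}

// POST the raw text and decode the JSON response (needs a running server).
function corenlp_annotate(string $baseUrl, string $text): array
{
    $url = corenlp_url($baseUrl, ['tokenize', 'ssplit', 'pos', 'lemma', 'ner']);
    $context = stream_context_create(['http' => [
        'method'  => 'POST',
        'header'  => "Content-Type: text/plain; charset=utf-8\r\n",
        'content' => $text,
        'timeout' => 15,
    ]]);
    $body = @file_get_contents($url, false, $context);
    $data = $body === false ? null : json_decode($body, true);
    if (!is_array($data)) {
        throw new RuntimeException('No valid JSON response from the CoreNLP server');
    }
    return $data;
}

// Group mentions by NER type, e.g. ['COUNTRY' => ['Turkey', 'Germany']].
// Assumes mentions are listed under sentences[].entitymentions.
function group_entity_mentions(array $response): array
{
    $grouped = [];
    foreach ($response['sentences'] ?? [] as $sentence) {
        foreach ($sentence['entitymentions'] ?? [] as $mention) {
            $type = $mention['ner'];
            if (!in_array($mention['text'], $grouped[$type] ?? [], true)) {
                $grouped[$type][] = $mention['text'];
            }
        }
    }
    return $grouped;
}

// Usage (requires a running server):
// $entities = group_entity_mentions(corenlp_annotate('http://localhost:9000/', $text));
```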

$corenlp = new \Web64\Nlp\CoreNlp('http://localhost:9000/');
$entities = $corenlp->entities( $text );
/*
Array
(
    [NATIONALITY] => Array
        (
            [0] => German
            [1] => Turkish
        )
    [ORGANIZATION] => Array
        (
            [0] => Foreign Ministry
        )
    [TITLE] => Array
        (
            [0] => reporter
            [1] => journalist
            [2] => correspondent
        )
    [COUNTRY] => Array
        (
            [0] => Turkey
            [1] => Germany
        )
)
*/
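CoreNLP returns finer-grained types (COUNTRY, NATIONALITY, TITLE and so on) than Polyglot's Locations, Organizations and Persons. If you switch between backends, a mapping like the one below lets downstream code stay unchanged. It is a hypothetical sketch: `coarse_entities()` is not part of the library, and which CoreNLP types count as locations is an assumption.

```php
<?php
// Assumed mapping from CoreNLP entity types to Polyglot-style groups.
// Types not listed here (TITLE, NATIONALITY, DATE, ...) are dropped.
const CORENLP_TO_COARSE = [
    'PERSON'            => 'Persons',
    'ORGANIZATION'      => 'Organizations',
    'LOCATION'          => 'Locations',
    'CITY'              => 'Locations',
    'STATE_OR_PROVINCE' => 'Locations',
    'COUNTRY'           => 'Locations',
];

function coarse_entities(array $coreNlpEntities): array
{
    $coarse = ['Locations' => [], 'Organizations' => [], 'Persons' => []];
    foreach ($coreNlpEntities as $type => $names) {
        $group = CORENLP_TO_COARSE[$type] ?? null;
        if ($group === null || !is_array($names)) {
            continue; // unmapped type or empty group
        }
        foreach ($names as $name) {
            if (!in_array($name, $coarse[$group], true)) {
                $coarse[$group][] = $name;
            }
        }
    }
    return $coarse;
}

// Usage with the sample output shown above:
$coreEntities = [
    'NATIONALITY'  => ['German', 'Turkish'],
    'ORGANIZATION' => ['Foreign Ministry'],
    'TITLE'        => ['reporter', 'journalist', 'correspondent'],
    'COUNTRY'      => ['Turkey', 'Germany'],
];
print_r(coarse_entities($coreEntities));
```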

Concept Graph

Microsoft Concept Graph for Short Text Understanding: https://concept.research.microsoft.com/

Find concepts related to a given keyword

$concept = new \Web64\Nlp\MsConceptGraph;
$res = $concept->get('php');
/*
Array
(
    [language] => 0.40301612064483
    [technology] => 0.19656786271451
    [programming language] => 0.14456578263131
    [open source technology] => 0.057202288091524
    [scripting language] => 0.049921996879875
    [server side language] => 0.044201768070723
    [web technology] => 0.031201248049922
    [server-side language] => 0.027561102444098
    [server side scripting language] => 0.023920956838274
    [feature] => 0.021840873634945
)
*/
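Each concept comes with a score, and in the `php` example the scores fall off quickly after the first few entries. A minimal sketch for keeping only the strongest concepts; `top_concepts()` and its cut-off parameters are hypothetical.

```php
<?php
// Keep the $k highest-scoring concepts whose score is at least $minScore.
// Concept names (string keys) are preserved.
function top_concepts(array $scores, int $k = 3, float $minScore = 0.0): array
{
    $scores = array_filter($scores, fn($s) => $s >= $minScore);
    arsort($scores); // highest score first
    return array_slice($scores, 0, $k, true);
}

// Usage with the sample output shown above:
$conceptScores = [
    'language'                       => 0.40301612064483,
    'technology'                     => 0.19656786271451,
    'programming language'           => 0.14456578263131,
    'open source technology'         => 0.057202288091524,
    'scripting language'             => 0.049921996879875,
    'server side language'           => 0.044201768070723,
    'web technology'                 => 0.031201248049922,
    'server-side language'           => 0.027561102444098,
    'server side scripting language' => 0.023920956838274,
    'feature'                        => 0.021840873634945,
];
print_r(top_concepts($conceptScores, 3));
```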

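Python Libraries

The following Python libraries are used by the NLP Server for NLP and data extraction tasks.

Other PHP NLP Projects

Contributing

If you have any feedback or ideas on how to improve this package or its documentation, please get in touch.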