shdev / phpflashtext
A PHP port of the Python flashtext implementation
1.1.7
2019-07-22 08:56 UTC
Requires
- php: >=5.6.0
Requires (Dev)
- php-coveralls/php-coveralls: ^2.0
- phpunit/phpunit: ^5.7
- symfony/stopwatch: ^3.4
- symfony/var-dumper: ^3.4
This package is not auto-updated.
Last update: 2024-09-29 05:54:42 UTC
README
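This is a port of the excellent Python project https://github.com/vi3k6i5/flashtext; see that project for the internals of the algorithm.
The algorithm lets you extract or replace multiple keywords in a single pass. With 300 keywords of 5 variants each, the regular-expression approach is slower than the FlashText approach. With 1000 keywords of 5 variants each, the regular expression could not be built at all.
Regular expressions are very slow in PHP 5.6. Newer PHP versions perform better.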
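To give a feel for why a single pass can handle many keywords, here is a minimal sketch of trie-based keyword matching. It is an illustration only, not the internals of shdev/phpflashtext or of the Python project: the function names `buildTrie`, `isWordChar` and `extractWithTrie`, the `_end_` marker key, and the rule that a word character is a letter, digit or underscore are all assumptions made for this example.

```php
<?php
// Toy keyword matcher. Every name below is hypothetical and chosen for
// illustration; this is not the code used by shdev/phpflashtext.

/** Build a nested-array trie from ['clean name' => ['variant', ...]]. */
function buildTrie(array $keywords)
{
    $trie = [];
    foreach ($keywords as $clean => $variants) {
        foreach ($variants as $variant) {
            $node = &$trie;
            foreach (str_split($variant) as $char) {
                if (!isset($node[$char])) {
                    $node[$char] = [];
                }
                $node = &$node[$char];
            }
            // A multi-character key cannot collide with a single character.
            $node['_end_'] = $clean;
            unset($node);
        }
    }
    return $trie;
}

/** Assumed word-character rule: ASCII letters, digits and underscore. */
function isWordChar($char)
{
    return ctype_alnum($char) || $char === '_';
}

/** Scan the sentence once and return [clean name, start, end] triples. */
function extractWithTrie(array $trie, $sentence)
{
    $found = [];
    $length = strlen($sentence);
    $i = 0;
    while ($i < $length) {
        // Only start a match at the beginning of a word.
        if ($i > 0 && isWordChar($sentence[$i - 1])) {
            $i++;
            continue;
        }
        $node = $trie;
        $best = null;
        $bestEnd = $i;
        // Walk the trie as far as the text allows, remembering the longest
        // variant that ends on a word boundary.
        for ($j = $i; $j < $length && isset($node[$sentence[$j]]); $j++) {
            $node = $node[$sentence[$j]];
            $atBoundary = ($j + 1 === $length) || !isWordChar($sentence[$j + 1]);
            if (isset($node['_end_']) && $atBoundary) {
                $best = $node['_end_'];
                $bestEnd = $j + 1;
            }
        }
        if ($best !== null) {
            $found[] = [$best, $i, $bestEnd];
            $i = $bestEnd; // continue after the matched variant
        } else {
            $i++;
        }
    }
    return $found;
}

$demoTrie = buildTrie([
    'java' => ['java_2e', 'java programing'],
    'product management' => ['product management techniques', 'product management'],
]);
$demoSpans = extractWithTrie($demoTrie, 'I know java_2e and product management techniques');
// $demoSpans = [['java', 7, 14], ['product management', 19, 48]]
```

In this sketch the work per character depends on the depth of the trie walk rather than on how many keywords were registered, which is the intuition behind the comparison with regular expressions above. The real library may differ in how it treats case, word boundaries and non-ASCII text.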
Installation
composer require shdev/phpflashtext
Usage
<?php

use Shdev\FlashText\KeywordProcessor;

$keywordProcessor = new KeywordProcessor();

// Map each clean name to the variants that should resolve to it.
$keywords = [
    'java' => ['java_2e', 'java programing'],
    'product management' => ['product management techniques', 'product management'],
];
$keywordProcessor->addKeywordsFromAssocArray($keywords);

$sentence = 'I know java_2e and product management techniques';

$keywordsExtracted = $keywordProcessor->extractKeywords($sentence);
// $keywordsExtracted = ['java', 'product management']

// Pass true as the second argument to also get start and end offsets.
$keywordsExtractedWithSpanInfo = $keywordProcessor->extractKeywords($sentence, true);
// $keywordsExtractedWithSpanInfo = [
//     ['java', 7, 14],
//     ['product management', 19, 48],
// ]

$sentenceNew = $keywordProcessor->replaceKeywords($sentence);
// $sentenceNew = 'I know java and product management';
Citation
The original paper on the FlashText algorithm is published on arXiv.
@ARTICLE{2017arXiv171100046S,
author = {{Singh}, V.},
title = "{Replace or Retrieve Keywords In Documents at Scale}",
journal = {ArXiv e-prints},
archivePrefix = "arXiv",
eprint = {1711.00046},
primaryClass = "cs.DS",
keywords = {Computer Science - Data Structures and Algorithms},
year = 2017,
month = oct,
adsurl = {http://adsabs.harvard.edu/abs/2017arXiv171100046S},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
An article about the algorithm was also published on Medium freeCodeCamp.