jity/tag-generator

Generates tags from a given text.

v0.2.1 2012-11-24 10:38 UTC

This package is not auto-updated.

Last update: 2024-09-22 03:27:33 UTC


README

About

This package is part of the Jity project. With this generator you can turn any text into a useful collection of tags.

Installation

Add JityTagGenerator to your composer.json

{
    "require": {
        "jity/tag-generator": "dev-master"
    }
}

Download the package

php composer.phar update

Add JityTagGenerator to your AppKernel.php

public function registerBundles()
{
    $bundles = array(
        ...
        new Jity\TagGeneratorBundle\JityTagGeneratorBundle(),
        ...
    );
    ...
}

Usage

This is a simple example of how to use the TagGenerator.

use Jity\Tag\TagGenerator,
    Jity\Tag\Filter\Score,
    Jity\Tag\Filter\ScoreGroup,
    Jity\Tag\Filter\Length,
    Jity\Tag\Filter\Occurrence,
    Jity\Tag\Filter\Dictionary,
    Jity\Tag\Filter\Capitalized,
    Jity\Tag\Filter\Uppercase,
    Jity\Tag\Filter\Camelcase,
    Jity\Tag\Filter\Regex;

/* ------------------------------------------------------ */
/* - Configuration */
/* ------------------------------------------------------ */

// Instantiate a new Generator
$generator = new TagGenerator();

// Configure all Filters
$generator

    /* Remove words shorter than the configured minimum length */
    ->addFilter(
        new Length(1, true, array(
            'min' => 2
        ))
    )

    /* Remove most useless words from collection (stop-words) */
    ->addFilter(
        new Dictionary(1, true, array(
            'match'         => true,
            'casesensitive' => false,
            'dictionaries'  => array(
                'german'    => array(
                    'adjektive',
                    'verben',
                    'klein',
                    'fixwords'
                )
            )
        ))
    )  

    /* Score occurrence of remaining words */
    ->addFilter(
        new Occurrence(5)
    ) 

    /* Score uppercased words */
    ->addFilter(new Uppercase(15))

    /* Score camelcased words */
    ->addFilter(new Camelcase(15))

    /* Score capitalized words */
    ->addFilter(new Capitalized(5));

// Receive the collection of tags
$tags = $generator->getTags('Lorem ipsum etc');
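The filter chain above mixes two kinds of steps: filters that remove words (length, stop-word dictionary) and filters that add to a word's score (occurrence, uppercase, camelcase, capitalized). The standalone sketch below illustrates that idea in plain PHP. It does not use the library; the function name `sketchTags`, the additive scoring, and the way words are split are assumptions made for illustration and may differ from what TagGenerator actually does. It assumes PHP 7.3 or later with the mbstring extension.

```php
<?php
// Hypothetical illustration only: this does not use or reproduce Jity\Tag\TagGenerator.

/**
 * Drop short words and stop words, then score the remaining words.
 *
 * @param string   $text      Input text
 * @param string[] $stopWords Lowercase words to remove
 * @param int      $minLength Words shorter than this are removed
 * @return array<string,int>  Word => score, highest score first
 */
function sketchTags(string $text, array $stopWords = [], int $minLength = 2): array
{
    // Split on anything that is not a letter (Unicode-aware).
    $words = preg_split('/[^\p{L}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    $scores = [];
    foreach ($words as $word) {
        // Removal step 1: minimum length.
        if (mb_strlen($word) < $minLength) {
            continue;
        }
        // Removal step 2: case-insensitive stop-word match.
        if (in_array(mb_strtolower($word), $stopWords, true)) {
            continue;
        }
        // Scoring step: 5 points per occurrence (assumed weight).
        $scores[$word] = ($scores[$word] ?? 0) + 5;
    }

    foreach ($scores as $word => $score) {
        if (preg_match('/^\p{Lu}{2,}$/u', $word)) {
            $scores[$word] += 15; // fully uppercase, e.g. "PHP"
        } elseif (preg_match('/\p{Ll}\p{Lu}/u', $word)) {
            $scores[$word] += 15; // camelcase, e.g. "camelCase"
        } elseif (preg_match('/^\p{Lu}\p{Ll}+$/u', $word)) {
            $scores[$word] += 5;  // capitalized, e.g. "Symfony"
        }
    }

    arsort($scores); // highest score first, keys preserved
    return $scores;
}

// Usage: the stop words and minimum length here are sample values.
$tags = sketchTags('PHP makes the web. PHP and Symfony power the web', ['the', 'and'], 3);
print_r($tags);
```

In this sketch each word receives at most one spelling bonus; whether the library's Uppercase, Camelcase and Capitalized filters can stack is not stated in this README.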

Development

Writing your own filters

To do this, implement Jity\Tag\Filter\FilterInterface or extend Jity\Tag\Filter\AbstractFilter. The Jity\Tag\Filter\Uppercase filter is a good, simple example to look at.
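This README does not list the methods that FilterInterface requires, so the sketch below defines its own stand-in interface to show the general shape of a scoring filter. `SketchFilterInterface`, its `apply()` method, and the `LongWordBonus` class are all hypothetical names; the real method names and signatures should be taken from the Uppercase filter's source. It assumes PHP 7.4 or later with the mbstring extension.

```php
<?php
// Stand-in for illustration; NOT Jity\Tag\Filter\FilterInterface,
// whose actual methods are not shown in this README.
interface SketchFilterInterface
{
    /**
     * @param array<string,int> $scores Word => score
     * @return array<string,int>        Adjusted word => score
     */
    public function apply(array $scores): array;
}

// Hypothetical filter: adds a bonus to words of at least $minLength characters.
final class LongWordBonus implements SketchFilterInterface
{
    private int $bonus;
    private int $minLength;

    public function __construct(int $bonus, int $minLength = 8)
    {
        $this->bonus = $bonus;
        $this->minLength = $minLength;
    }

    public function apply(array $scores): array
    {
        foreach ($scores as $word => $score) {
            // Cast because PHP turns numeric-string keys into integers.
            if (mb_strlen((string) $word) >= $this->minLength) {
                $scores[$word] = $score + $this->bonus;
            }
        }
        return $scores;
    }
}

// Usage with sample data.
$filter = new LongWordBonus(10, 8);
$result = $filter->apply(['web' => 5, 'framework' => 5]);
// $result is ['web' => 5, 'framework' => 15]
```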

Recompiling the dictionaries

Go to resources/dictionaries/LANG/source and run

for i in stopwords fixwords adjektive verben compound klein worte; do cat source/${i}*.txt | ../compiler.sh "$i"; done