mehrdad-dadkhah/php-persian-natural-language-processor

A simple PHP and Python wrapper for the hazm Persian text processor

1.0.5 2020-09-07 14:21 UTC

This package is auto-updated.

Last update: 2024-09-08 00:14:37 UTC


README

A simple PHP and Python wrapper built on hazm, the Persian text processor.


Requirements

Install hazm

If Python is not installed:

sudo apt install python

Then install pip:

sudo apt install python-pip

Then install hazm:

pip install hazm

Installation

composer require mehrdad-dadkhah/php-persian-natural-language-processor

Usage

PHP

use MehrdadDadkhah\Language\PersianLanguageProcessor;

$parser = new PersianLanguageProcessor();

// Run every processing step on the sample text and keep the combined result.
$result = $parser->allNLP('سلام. این یک متن تست است. موفق باشید');

Python

python /path/to/pr/processor.py allNLP json.dumps('سلام. این یک متن تست است. موفق باشید')
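
The `json.dumps(...)` in the command above is Python, not shell syntax, so it reads most naturally as "pass the text as a JSON-encoded string". The sketch below shows one way to build that argument. It assumes processor.py accepts a single JSON string literal as its last argument, which this README does not spell out, and it keeps the README's placeholder script path.

```python
import json
import shlex

text = "سلام. این یک متن تست است. موفق باشید"

# json.dumps wraps the text in double quotes and escapes non-ASCII characters,
# which yields a single ASCII-only JSON string literal.
encoded = json.dumps(text)

# Keeping the arguments in a list makes the encoded text exactly one argument.
# Assumption: the script path is the README's placeholder, not a real location.
command = ["python", "/path/to/pr/processor.py", "allNLP", encoded]

# Shell-quoted form of the same command, suitable for pasting into a terminal.
print(shlex.join(command))
```

The same list can be handed to `subprocess.run(command, capture_output=True, text=True)` to call the script from Python, provided hazm is installed and the path is replaced with a real one.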

This produces the following result:

array:7 [▼
  "chunksGroup" => array:2 [▼
    "main" => "[سلام NP] . [این یک متن تست NP] [است VP] . [موفق ADJP] [باشید VP]"
    "normalized" => "[سلام NP] . [این یک متن تست NP] [است VP] . [موفق ADJP] [باشید VP]"
  ]
  "postTags" => array:2 [▼
    "main" => array:10 [▶]
    "normalized" => array:10 [▼
      0 => array:2 [▶]
      1 => array:2 [▶]
      2 => array:2 [▶]
      3 => array:2 [▶]
      4 => array:2 [▼
        0 => "متن"
        1 => "N"
      ]
      5 => array:2 [▶]
      6 => array:2 [▶]
      7 => array:2 [▶]
      8 => array:2 [▶]
      9 => array:2 [▶]
    ]
  ]
  "stem" => array:2 [▼
    "main" => array:4 [▶]
    "normalized" => array:4 [▼
      "ADV" => []
      "N" => array:2 [▶]
      "Ne" => []
      "V" => array:3 [▶]
    ]
  ]
  "wordTokenize" => array:2 [▼
    "main" => array:10 [▶]
    "normalized" => array:10 [▼
      0 => "سلام"
      1 => "."
      2 => "این"
      3 => "یک"
      4 => "متن"
      5 => "تست"
      6 => "است"
      7 => "."
      8 => "موفق"
      9 => "باشید"
    ]
  ]
  "lemmatized" => array:2 [▼
    "main" => array:4 [▼
      "ADV" => []
      "N" => array:2 [▼
        0 => "سلام"
        1 => "متن"
      ]
      "Ne" => []
      "V" => array:3 [▼
        0 => "تست"
        1 => "است"
        2 => "بود#باش"
      ]
    ]
    "normalized" => array:4 [▼
      "ADV" => []
      "N" => array:2 [▼
        0 => "سلام"
        1 => "متن"
      ]
      "Ne" => []
      "V" => array:3 [▼
        0 => "تست"
        1 => "است"
        2 => "بود#باش"
      ]
    ]
  ]
  "normalized" => "سلام. این یک متن تست است. موفق باشید"
  "sentTokenize" => array:2 [▼
    "main" => array:3 [▶]
    "normalized" => array:3 [▼
      0 => "سلام."
      1 => "این یک متن تست است."
      2 => "موفق باشید"
    ]
  ]
]
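
To make the shape of this result easier to work with, here is a minimal sketch that reads a few fields from it. The dictionary is a hand-copied subset of the dump above. Whether the Python script emits JSON with exactly these keys is an assumption, since the dump shows the PHP side only.

```python
# Subset of the result shown above, copied by hand for illustration.
result = {
    "normalized": "سلام. این یک متن تست است. موفق باشید",
    "sentTokenize": {
        "normalized": ["سلام.", "این یک متن تست است.", "موفق باشید"],
    },
    "wordTokenize": {
        "normalized": ["سلام", ".", "این", "یک", "متن", "تست", "است", ".", "موفق", "باشید"],
    },
    "lemmatized": {
        "normalized": {"ADV": [], "N": ["سلام", "متن"], "Ne": [], "V": ["تست", "است", "بود#باش"]},
    },
}

# Each step stores its output under "main" and "normalized"; use the latter here.
sentences = result["sentTokenize"]["normalized"]
tokens = result["wordTokenize"]["normalized"]

# Drop the sentence-final periods to keep only word tokens.
words = [token for token in tokens if token != "."]

# Lemmas are grouped by part-of-speech tag.
noun_lemmas = result["lemmatized"]["normalized"]["N"]

print(len(sentences), len(tokens), len(words))
print(noun_lemmas)
```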

Functions

  • normilizeText(string $text)
  • sentTokenizeText(string $text)
  • wordTokenizeText(string $text)
  • postTagText(string $text)
  • chunksText(string $text)
  • getChunksGroup(string $text)
  • stemText(string $text)
  • lemmatizeText(string $text)
  • allNLP(string $text)

The allNLP function calls all of the other functions and returns their combined results.
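
The aggregation pattern can be sketched as follows. This is not the package's implementation: `run_all`, `collapse_spaces`, and `split_words` are hypothetical stand-ins, and the pairing of "main" (raw input) with "normalized" (input after normalization) is inferred from the sample result in this README rather than from the source code.

```python
from typing import Callable, Dict


def run_all(
    text: str,
    normalize: Callable[[str], str],
    steps: Dict[str, Callable[[str], object]],
) -> Dict[str, object]:
    """Run every step on both the raw and the normalized text."""
    normalized = normalize(text)
    result: Dict[str, object] = {"normalized": normalized}
    for name, step in steps.items():
        # Assumed layout: one entry per step, holding both variants.
        result[name] = {"main": step(text), "normalized": step(normalized)}
    return result


# Hypothetical stand-ins for the real hazm-backed steps.
def collapse_spaces(text: str) -> str:
    return " ".join(text.split())


def split_words(text: str) -> list:
    return text.split()


combined = run_all("  hello   world ", collapse_spaces, {"wordTokenize": split_words})
print(combined)
```

Keeping the steps in a dictionary means a new step only needs a new entry, and the combined result gains a matching key.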

Credits

Uses

License

php-persian-natural-language-processor is released under the GPLv3 license.