geocurly / name-splitter
Name splitting tool
0.1
2020-05-29 19:53 UTC
Requires
- php: ^7.4
Requires (Dev)
- phpunit/phpunit: ^9.1
- symfony/var-dumper: ^5.0
This package is not auto-updated.
Last update: 2024-09-23 03:16:34 UTC
README
Unfortunately, this tool supports only Cyrillic names.
This is a name splitting tool. It takes an input string and parses it into an object.
Usage
<?php declare(strict_types=1);

use NameSplitter\NameSplitter;

$splitter = new NameSplitter(['enc' => 'CP1251']);
$result = $splitter->split('Иванов Иван Иванович');

[$surname, $name, $middleName] = [
    $result->getSurname(),
    $result->getName(),
    $result->getMiddleName(),
];
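Quality
NameSplitter is tested against about 13,000 Russian name cases, with an accuracy of 99.65%. Each case is run through several templates, so the resulting number of test cases is 124,283. You can run the tests on your own dataset (use the --verbose option to see template errors):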
[aleksandr@aleksandr name-splitter]$ ./bin/name-split-test --file=$(realpath fio.csv)
TESTED TEMPLATES:
%Surname %Name %Middle
%Name %Middle %Surname
%Name %Middle
%Name %Surname
%Surname %Name
%Surname %StrictInitials
%StrictInitials %Surname
%Surname %SplitInitials
%SplitInitials %Surname
ACCURACY: 99.65
COUNT CASE TOTAL: 124283
COUNT CASE PASS: 123848
COUNT CASE ERROR: 435
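The accuracy figure follows directly from the counts in the report. A minimal sketch of that arithmetic, using the numbers printed above; the helper name accuracyPercent is hypothetical and not part of the package:

```php
<?php declare(strict_types=1);

// Hypothetical helper, not part of name-splitter:
// share of passed cases, in percent, rounded to two decimals.
function accuracyPercent(int $passed, int $total): float
{
    if ($total <= 0) {
        throw new InvalidArgumentException('Total must be positive.');
    }

    return round($passed / $total * 100, 2);
}

// Counts taken from the test report above.
$total  = 124283;
$passed = 123848;

echo accuracyPercent($passed, $total), PHP_EOL; // 99.65
echo $total - $passed, PHP_EOL;                 // 435
```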
The fio.csv file has the following format:
SomeSurname;SomeName;SomeMiddleName
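To check your own dataset before running the tests, a line in this format can be parsed as sketched below. This assumes exactly three semicolon-separated fields with no quoting and no header row, which the README does not confirm; parseFioLine is a hypothetical helper, not part of the package:

```php
<?php declare(strict_types=1);

// Hypothetical helper, not part of name-splitter:
// split one "Surname;Name;MiddleName" line into named parts.
function parseFioLine(string $line): array
{
    // Assumption: plain semicolon-separated values, no quoting or escaping.
    $fields = explode(';', trim($line));

    if (count($fields) !== 3) {
        throw new InvalidArgumentException(
            'Expected 3 fields, got ' . count($fields)
        );
    }

    [$surname, $name, $middleName] = $fields;

    return [
        'surname'    => $surname,
        'name'       => $name,
        'middleName' => $middleName,
    ];
}

$row = parseFioLine('Иванов;Иван;Иванович');
echo $row['surname'], ' ', $row['name'], ' ', $row['middleName'], PHP_EOL;
```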
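Problems
- When a surname looks like a middle name (for example, Иван Иванович), the tool cannot detect templates such as %Name %Surname.
- When the name parts being split are not in the dictionary, some templates may not work correctly.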
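The first problem can be illustrated with a naive suffix check. This is only a sketch of why the ambiguity exists, not the algorithm the package uses: the suffix list and the function looksLikePatronymic are assumptions made for the illustration. Real surnames such as Абрамович end in the same suffix as a patronymic, so the word's form alone cannot tell the two apart.

```php
<?php declare(strict_types=1);

// Hypothetical illustration, not part of name-splitter:
// does the word end with a common Russian patronymic suffix?
function looksLikePatronymic(string $word): bool
{
    // Assumption: a small, incomplete list of suffixes; /u enables UTF-8 matching.
    return preg_match('/(ович|евич|овна|евна)$/ui', $word) === 1;
}

// A patronymic and a surname both match, so "Иван Иванович" could be
// read as %Name %Middle or as %Name %Surname.
var_dump(looksLikePatronymic('Иванович'));  // a patronymic
var_dump(looksLikePatronymic('Абрамович')); // a surname with the same ending
var_dump(looksLikePatronymic('Иванов'));    // a typical surname ending
```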
Solution
You can use "before" and "after" templates:
<?php declare(strict_types=1);

use NameSplitter\{
    NameSplitter,
    Template\SimpleMatch,
    Contract\TemplateInterface as TPL,
    Contract\StateInterface
};

$before = [
    // For this case we explicitly match the name parts with a template.
    new SimpleMatch([
        TPL::SURNAME => 'Difficult Surname',
        TPL::NAME => 'Difficult Name'
    ]),
    static function(StateInterface $state) {
        // TODO: your implementation goes here
        return [
            TPL::SURNAME => $surname ?? null,
            TPL::NAME => $name ?? null,
        ];
    },
];

// Any callable type is accepted, as long as it takes a StateInterface as input.
$after = [];

$splitter = new NameSplitter([], $before, $after);
$result = $splitter->split('Difficult Surname Difficult Name');