tabuna/similar

Unlock effortless grouping by identifying similar strings in a set of sentences based on a shared topic.

2.2.0 2022-02-23 22:18 UTC

This package is auto-updated.

Last update: 2024-09-12 23:43:02 UTC



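This is a basic PHP library for identifying similar strings without using machine learning. It lets you group the sentences you pass in by topic. For example, it can combine news headlines from different publications, the way Google News does.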

Installation

Run this command from the command line:

$ composer require esplora/similar

Usage

Create the object by passing a closure that checks whether two strings are similar:

use Esplora\Similar\Similar;

$similar = new Similar(function (string $a, string $b) {
    similar_text($a, $b, $copy);

    return 51 < $copy;
});

Note that you do not have to use similar_text. You can use another implementation, such as soundex, or anything else.
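As an illustration of swapping in a different comparison, here is a minimal sketch of a closure that measures word overlap (Jaccard similarity) instead of calling similar_text. The variable name $wordOverlap, the tokenizing rule, and the 0.2 threshold are assumptions made for this example and are not part of the library.

```php
<?php

// Hypothetical comparator: Jaccard overlap of the two strings' word sets.
$wordOverlap = function (string $a, string $b): bool {
    // Lowercase, then split on anything that is not an ASCII letter or digit.
    $tokenize = static fn (string $s): array => array_unique(
        preg_split('/[^a-z0-9]+/', strtolower($s), -1, PREG_SPLIT_NO_EMPTY)
    );

    $wordsA = $tokenize($a);
    $wordsB = $tokenize($b);

    $union = count(array_unique(array_merge($wordsA, $wordsB)));
    if ($union === 0) {
        return false; // neither string contains a word
    }

    // Both word lists are de-duplicated, so this counts the shared words.
    $shared = count(array_intersect($wordsA, $wordsB));

    // 0.2 is an arbitrary threshold chosen for illustration; tune it for your data.
    return $shared / $union >= 0.2;
};
```

Because it has the same shape as the similar_text closure (two strings in, a boolean out), a closure like this could be passed to the Similar constructor in its place.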

Then call the findOut method, passing a one-dimensional array of strings:

$similar->findOut([
    'Elon Musk gets mixed COVID-19 test results as SpaceX launches astronauts to the ISS',
    'Elon Musk may have Covid-19, should quarantine during SpaceX astronaut launch Sunday',

    // Unrelated headline, left out of the result
    'Can Trump win with ‘fantasy’ electors bid? State GOP says no',
]);

The result will contain a single group with these headlines:

'Elon Musk gets mixed COVID-19 test results as SpaceX launches astronauts to the ISS',
'Elon Musk may have Covid-19, should quarantine during SpaceX astronaut launch Sunday',

The keys of the input array are preserved, so you can use them for additional processing:

$similar->findOut([
  'kos' => "Trump acknowledges Biden's win in latest tweet",
  'foo' => 'Elon Musk gets mixed COVID-19 test results as SpaceX launches astronauts to the ISS',
  'baz' => 'Trump says Biden won but again refuses to concede',
  'bar' => 'Elon Musk may have Covid-19, should quarantine during SpaceX astronaut launch Sunday',
]);

The result will be split into two groups:

[
  'foo' => 'Elon Musk gets mixed COVID-19 test results as SpaceX launches astronauts to the ISS',
  'bar' => 'Elon Musk may have Covid-19, should quarantine during SpaceX astronaut launch Sunday',
],
[
  'baz' => 'Trump says Biden won but again refuses to concede',
  'kos' => "Trump acknowledges Biden's win in latest tweet",
],
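Preserved keys make it possible to tie each grouped headline back to its original record. The sketch below hard-codes the two groups shown above rather than calling the library, and assumes the result is a list of such arrays. The $sources lookup table and the publisher names are hypothetical.

```php
<?php

// Hypothetical metadata, keyed the same way as the array given to findOut().
$sources = [
    'kos' => 'Publisher A',
    'foo' => 'Publisher B',
    'baz' => 'Publisher C',
    'bar' => 'Publisher D',
];

// Groups copied from the result above (assumed to be a list of arrays).
$groups = [
    [
        'foo' => 'Elon Musk gets mixed COVID-19 test results as SpaceX launches astronauts to the ISS',
        'bar' => 'Elon Musk may have Covid-19, should quarantine during SpaceX astronaut launch Sunday',
    ],
    [
        'baz' => 'Trump says Biden won but again refuses to concede',
        'kos' => "Trump acknowledges Biden's win in latest tweet",
    ],
];

// For every group, look up the source of each headline by its preserved key.
$sourcesPerGroup = array_map(
    static fn (array $group): array => array_map(
        static fn (string $key): string => $sources[$key],
        array_keys($group)
    ),
    $groups
);

print_r($sourcesPerGroup);
```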

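Objects

You can also pass objects in order to evaluate more complex conditions. Every object you pass must be convertible to a string through the __toString() method.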

$similar->findOut([
    new FixtureStingObject('Lorem ipsum dolor sit amet, consectetur adipiscing elit.'),
]);
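For reference, a class that meets the __toString() requirement might look like the minimal sketch below. The Headline class, its source() accessor, and the sample values are hypothetical and are not shipped with the library.

```php
<?php

// Hypothetical value object: carries extra data but casts to its title.
final class Headline
{
    private string $title;
    private string $source;

    public function __construct(string $title, string $source)
    {
        $this->title = $title;
        $this->source = $source;
    }

    // Extra data that stays available after grouping.
    public function source(): string
    {
        return $this->source;
    }

    // The string form is what a string comparison would see.
    public function __toString(): string
    {
        return $this->title;
    }
}

$headline = new Headline('Trump says Biden won but again refuses to concede', 'Publisher C');

echo (string) $headline, PHP_EOL;
```

Instances of a class like this satisfy the requirement stated above, while keeping fields such as the source alongside the text being compared.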

License

The MIT License (MIT). Please see the License File for more information.