rizanola/draconic

A drop-in package for full-text search. Supports typo correction and stemming.

1.2.1 2024-09-23 07:21 UTC


Last update: 2024-09-23 07:21:57 UTC



Draconic is a simple, relatively lightweight full-text search system for websites. The only extension it requires is SQLite, which PHP includes by default.

Installation

Draconic is available through Composer:

composer require rizanola/draconic
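Since SQLite is the only extension Draconic needs, you may want to confirm it is enabled before installing. The sketch below reports both of PHP's bundled SQLite extensions, sqlite3 and pdo_sqlite, because this README does not say which one Draconic uses; the function name sqliteExtensionsLoaded() is only for illustration.

```php
<?php
// Check which of PHP's bundled SQLite extensions are loaded.
// Assumption: this README does not say whether Draconic uses the SQLite3 class
// or PDO's SQLite driver, so both are reported.
function sqliteExtensionsLoaded(): array
{
    return [
        'sqlite3'    => extension_loaded('sqlite3'),
        'pdo_sqlite' => extension_loaded('pdo_sqlite'),
    ];
}

foreach (sqliteExtensionsLoaded() as $extension => $loaded) {
    echo $extension, ': ', $loaded ? 'loaded' : 'missing', PHP_EOL;
}
```

Running php -m on the command line lists the same loaded modules.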

Usage

Usage is fairly straightforward:

<?php
use Rizanola\Draconic\Draconic;
use Rizanola\Draconic\Entry;
use Rizanola\Draconic\Section;

// Create a new entry to track
$entry = new Entry
(
    // This is the ID, which can be a string, int or float. It uniquely identifies the entry
    "test-entry",
    
    // This is the entry type. Results can be filtered by type, e.g. you might have a "product" or an "article"
    "test",
    
    // Sections consist of some text, a priority and an optional label. Sections with a higher priority are weighted
    // higher in search results, so if a query matches the title of one article and the content of another, the
    // article with the matching title can be ranked higher in the search results
    new Section("Test Heading", 2, "heading"), 
    new Section("Test content", 1, "content")
);

// Make a new draconic object
$draconic = new Draconic("/path/to/store/the/sqlite/database.db");

// Insert the new entry. If an entry with that ID already exists in the database, it will be replaced with the new
// entry.
$draconic->addOrUpdateEntries([$entry]);

// Do a search. Draconic will automatically manage typos and stemming. Draconic also supports quoted words, for exact
// matches. The second parameter is the type, which you can use to filter for just one type of entry.
$results = $draconic->search('"test" content', "test");

// Remove the entry from the database
$draconic->removeEntries([$entry->id]);

Notes

Draconic is designed to extract words from plain text. If you are inserting HTML, consider stripping the tags and decoding the entities first, e.g. html_entity_decode(strip_tags($content)).
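That preprocessing step can be wrapped in a small helper. This is a minimal sketch using only PHP's standard strip_tags(), html_entity_decode() and preg_replace(); the name htmlToPlainText() and the whitespace collapsing are illustrative additions, not part of Draconic.

```php
<?php
// Convert an HTML fragment to plain text before indexing it (hypothetical helper).
// Tags are stripped first and entities decoded second, so encoded text such as
// "&lt;b&gt;" survives as the literal characters "<b>" instead of being removed.
function htmlToPlainText(string $html): string
{
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse newlines and repeated spaces into single spaces.
    return trim(preg_replace('/\s+/u', ' ', $text) ?? $text);
}

echo htmlToPlainText("<h1>Fish &amp; chips</h1>\n<p>Served with &lt;b&gt;peas&lt;/b&gt;</p>"), PHP_EOL;
// Fish & chips Served with <b>peas</b>
```

The returned string is what you would pass to new Section(...) in place of the raw HTML. Note that strip_tags() does not insert spaces between adjacent elements, so "</h1><p>" with no whitespace between them would join the neighbouring words.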

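Draconic detects typos by deleting single characters from every word, both in the inserted content and in the search query. This lets it catch four kinds of typo:

Extra character: if the user types gfram and the content contains gram, one of the variants of the query word will be gram.

Missing character: if the user types gam and the content contains gram, one of the variants of the content word will be gam.

Substituted character: if the user types fram and the content contains gram, one of the variants of the query word will be ram, and so will one of the variants of the content word.

Transposed characters: if the user types garm and the content contains gram, one of the variants of the query word will be gam, and so will one of the variants of the content word.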
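To make the four cases concrete, here is a minimal sketch of single-deletion matching in plain PHP. The helpers deletionVariants() and isTypoMatch() are hypothetical and ignore stemming, indexing and scoring; they only show why comparing the two sets of variants catches each typo described above.

```php
<?php
// Single-deletion typo matching. Hypothetical helpers for illustration only;
// Draconic's internals (storage, stemming, scoring) are not shown here.

/** The word itself plus every variant with exactly one character removed. */
function deletionVariants(string $word): array
{
    // Split into UTF-8 characters so multi-byte letters are removed whole.
    $chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
    $variants = [$word];
    foreach (array_keys($chars) as $i) {
        $copy = $chars;
        unset($copy[$i]);
        $variants[] = implode('', $copy);
    }
    return array_values(array_unique($variants));
}

/** A query word and a content word match if any of their variants are equal. */
function isTypoMatch(string $queryWord, string $contentWord): bool
{
    return array_intersect(deletionVariants($queryWord), deletionVariants($contentWord)) !== [];
}

foreach (['gfram', 'gam', 'fram', 'garm', 'dog'] as $typed) {
    echo $typed, ' vs gram: ', isTypoMatch($typed, 'gram') ? 'match' : 'no match', PHP_EOL;
}
```

In this sketch a word of n characters yields up to n + 1 variants, so the approach trades extra variants per word for typo checks that are plain equality comparisons rather than edit-distance calculations.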

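Draconic supports quoted words, excluded words and optional words:

  • "test search": results for this query must contain the words "test" and "search", in that exact order and with that exact spelling.
  • test -search: results for this query must contain "test" but must not contain "search".
  • test|search: results for this query must contain "test" or "search".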
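A rough sketch of how a query in this syntax might be split into its parts. The function parseQuery() and its output shape are assumptions made for illustration; Draconic's own parser may treat punctuation, casing and mixed operators differently.

```php
<?php
// Split a query into quoted phrases, required words, excluded words and
// "any of" groups. Hypothetical sketch, not Draconic's parser.
function parseQuery(string $query): array
{
    $parsed = ['phrases' => [], 'required' => [], 'excluded' => [], 'anyOf' => []];
    $query = strtolower($query);

    // Pull out "quoted phrases" first so the words inside them stay together.
    preg_match_all('/"([^"]+)"/', $query, $matches);
    foreach ($matches[1] as $phrase) {
        $words = preg_split('/\s+/', $phrase, -1, PREG_SPLIT_NO_EMPTY);
        if ($words !== []) $parsed['phrases'][] = $words;
    }
    $query = preg_replace('/"[^"]+"/', ' ', $query);

    // Classify the remaining space-separated tokens.
    foreach (preg_split('/\s+/', $query, -1, PREG_SPLIT_NO_EMPTY) as $token) {
        if ($token[0] === '-' && strlen($token) > 1) {
            $parsed['excluded'][] = substr($token, 1);      // -word: must not appear
        } elseif (str_contains($token, '|')) {
            $options = array_filter(explode('|', $token), fn($w) => $w !== '');
            $parsed['anyOf'][] = array_values($options);    // a|b: at least one must appear
        } else {
            $parsed['required'][] = $token;                 // plain word: must appear
        }
    }
    return $parsed;
}

echo json_encode(parseQuery('"Test Search" fish -chips salt|vinegar')), PHP_EOL;
// {"phrases":[["test","search"]],"required":["fish"],"excluded":["chips"],"anyOf":[["salt","vinegar"]]}
```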

Customisation

Draconic uses its own logic to filter and sort results, but sometimes you need something more tailored:

Metadata

You can attach metadata to an Entry and then use it for custom filtering and sorting:

<?php
use Rizanola\Draconic\Entry;
use Rizanola\Draconic\Section;

$entry = new Entry("test-entry", null, 
    new Section("test-section")
);

// Metadata is stored as a JSON object, so setMetadata() accepts most values for the second argument
$entry->setMetadata("important", true);
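Because metadata is stored as JSON, whatever you set has to survive a JSON round trip. Below is a minimal sketch of that round trip, assuming (as the $result->metadata->important syntax in the filtering and sorting examples suggests) that metadata is read back as a stdClass object; roundTripMetadata() is a hypothetical helper, not a Draconic method.

```php
<?php
// What "stored as a JSON object" implies for metadata values. Hypothetical helper;
// assumes metadata is read back as a stdClass, as $result->metadata->important suggests.
function roundTripMetadata(array $metadata): object
{
    // JSON_THROW_ON_ERROR makes values JSON cannot represent (NAN, INF, resources) throw.
    $json = json_encode((object) $metadata, JSON_THROW_ON_ERROR);
    return json_decode($json, false, 512, JSON_THROW_ON_ERROR);
}

$metadata = roundTripMetadata(['important' => true, 'price' => 9.5, 'tags' => ['news', 'featured']]);

var_dump($metadata->important);         // bool(true)
var_dump($metadata->tags[1]);           // string(8) "featured"
// A key that was never set needs a default to avoid an "undefined property" warning.
var_dump($metadata->discount ?? false); // bool(false)
```

The same ?? default is a useful guard inside custom filter and sort callbacks when not every entry sets every metadata key.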

Filtering

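By default, Draconic filters out results that don't contain every search word, results that don't contain every sub-phrase, and results that contain any excluded word. You can also write your own filtering logic: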

<?php
use Rizanola\Draconic\Draconic;
use Rizanola\Draconic\Matching\Result;

$draconic = new Draconic(":memory:");

// Add a filter that will hide results that aren't important
$draconic->filterCallable = function(array $words, Result $result) use($draconic): bool
{
    // If a result isn't important (or has no "important" metadata at all), return false
    if(!($result->metadata->important ?? false)) return false;
    
    // Otherwise, use Draconic's native filtering to filter out poor matches
    return $draconic->filterResult($words, $result);
};
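To illustrate the default rules described above, here is a standalone sketch that applies them to a plain list of words. passesDefaultFilter() and containsPhrase() are hypothetical; Draconic's real filterResult() operates on its own Result objects and also allows for typo variants and stemming, which this sketch ignores.

```php
<?php
// Default filtering rules applied to a plain list of words. Hypothetical sketch:
// the real filterResult() works on Result objects and allows for typos and stemming.
function passesDefaultFilter(array $resultWords, array $required, array $phrases, array $excluded): bool
{
    // Every search word must appear somewhere in the result.
    foreach ($required as $word) {
        if (!in_array($word, $resultWords, true)) return false;
    }
    // Every sub-phrase must appear as consecutive words, in order.
    foreach ($phrases as $phrase) {
        if (!containsPhrase($resultWords, $phrase)) return false;
    }
    // No excluded word may appear anywhere.
    foreach ($excluded as $word) {
        if (in_array($word, $resultWords, true)) return false;
    }
    return true;
}

/** True if $phrase occurs in $words as a contiguous run. */
function containsPhrase(array $words, array $phrase): bool
{
    $words = array_values($words);
    $length = count($phrase);
    for ($i = 0; $i + $length <= count($words); $i++) {
        if (array_slice($words, $i, $length) === array_values($phrase)) return true;
    }
    return false;
}

$words = ['fresh', 'test', 'search', 'results'];
var_dump(passesDefaultFilter($words, ['fresh'], [['test', 'search']], ['stale'])); // bool(true)
var_dump(passesDefaultFilter($words, [], [['search', 'test']], []));              // bool(false)
```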

Sorting

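By default, results are sorted first by how close together the matched words are, then by the priority of the section they appear in. You can also write your own sorting logic: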

<?php
use Rizanola\Draconic\Draconic;
use Rizanola\Draconic\Matching\Result;

$draconic = new Draconic(":memory:");

// Add a sort callback that will display important results first
$draconic->sortCallable = function(array $words, Result $first, Result $second) use($draconic): int
{
    // If one result is more important, then that result should come first (missing metadata counts as not important)
    $importanceComparison = ($second->metadata->important ?? false) <=> ($first->metadata->important ?? false);
    if($importanceComparison !== 0) return $importanceComparison;
    
    // Otherwise, use Draconic's default sorting to sort matches
    return $draconic->sortResults($words, $first, $second);
};
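The default ordering (proximity first, then section priority) can likewise be sketched as an ordinary comparator. The data shape here, a map from each matched word to its positions plus the best priority among the matched sections, is an assumption made for illustration, as are the names matchSpan() and compareResults(); Draconic's sortResults() may measure proximity differently.

```php
<?php
// "Sort by proximity, then by section priority" on a hypothetical data shape:
// each result maps every matched word to its positions, plus its best section priority.

/** Smallest number of words spanned by a window containing every matched word. */
function matchSpan(array $positionsByWord): int
{
    $events = [];
    foreach ($positionsByWord as $word => $positions) {
        foreach ($positions as $position) $events[] = [$position, $word];
    }
    usort($events, fn($a, $b) => $a[0] <=> $b[0]);

    $needed = count($positionsByWord);
    $counts = [];
    $best = PHP_INT_MAX;
    $left = 0;
    foreach ($events as [$position, $word]) {
        $counts[$word] = ($counts[$word] ?? 0) + 1;
        // Shrink from the left while the window still contains every word.
        while (count($counts) === $needed) {
            $best = min($best, $position - $events[$left][0] + 1);
            $leftWord = $events[$left][1];
            if (--$counts[$leftWord] === 0) unset($counts[$leftWord]);
            $left++;
        }
    }
    return $best;
}

function compareResults(array $first, array $second): int
{
    // Tighter clusters of matched words come first...
    $byProximity = matchSpan($first['positions']) <=> matchSpan($second['positions']);
    if ($byProximity !== 0) return $byProximity;
    // ...then results matched in higher-priority sections.
    return $second['priority'] <=> $first['priority'];
}

$results = [
    'spread-out' => ['positions' => ['test' => [0], 'search' => [7]], 'priority' => 2],
    'low-prio'   => ['positions' => ['test' => [3], 'search' => [4]], 'priority' => 1],
    'heading'    => ['positions' => ['test' => [0], 'search' => [1]], 'priority' => 2],
];
uasort($results, 'compareResults');
echo implode(', ', array_keys($results)), PHP_EOL; // heading, low-prio, spread-out
```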