front/redisearch

A CMS/framework-agnostic RediSearch client

2.0.4 2021-05-04 17:47 UTC

This package is auto-updated.

Last update: 2024-09-11 10:29:33 UTC


README

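This is a lightweight, CMS/framework-agnostic RediSearch client.

For RediSearch version 1.0, use version 1.0.2.

How to use

First, you need to install and configure Redis and RediSearch.

Then add this package to your project with composer require front/redisearch.

For every type of operation, the first step is to connect to the Redis server.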

$client = \FKRediSearch\RediSearch\Setup::connect( $server, $port, $password, 0 );

The default values are: $server = '127.0.0.1', $port = 6379, $password = null, $database = 0

Field types

RediSearch supports several field types:

  1. TEXT [NOSTEM] [WEIGHT {weight}] [PHONETIC {matcher}] ex: TEXT NOSTEM WEIGHT 5.6 PHONETIC {matcher}*
  2. NUMERIC [SORTABLE] [NOINDEX]
  3. GEO [SORTABLE] [NOINDEX]
  4. TAG [SEPARATOR {sep}]

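*matcher: Declaring a text field as PHONETIC makes searches on it use phonetic matching by default. The required {matcher} argument specifies the phonetic algorithm and the language. The following matchers are supported:

  • dm:en - Double Metaphone for English
  • dm:fr - Double Metaphone for French
  • dm:pt - Double Metaphone for Portuguese
  • dm:es - Double Metaphone for Spanish

Create index

To create an index, first specify the index name, then pass any optional flags, and finally the schema (fields).

The available commands are: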

$index = new Index( $client ); // Initiate Index
$index->setIndexName('test'); // Set indexName

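In RediSearch version 2.0, the overall structure of the index has changed. Version 1.0 stored indexed documents in Redis HASHes itself, whereas version 2.0 automatically indexes existing Redis HASHes. This means that:

  1. When creating an index, you need to tell RediSearch which HASH prefixes it should index
  2. The HASH keys you want indexed must start with a prefix specified for the index

You also need to tell RediSearch which Redis type it should index. Currently only HASH is supported, but other data types may be added in the future.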

$index->on('HASH');
$index->setPrefix('doc:');
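
For orientation, the index name, type, and prefix set above correspond to the ON and PREFIX arguments of RediSearch 2.0's FT.CREATE command. The following is a minimal sketch in plain PHP of how such an argument list could be assembled. The helper buildCreateArgs is hypothetical and not part of this package, and the arguments the package actually sends may differ.

```php
<?php
// Hypothetical helper (not part of front/redisearch): assembles the raw
// FT.CREATE arguments for an index that follows HASH keys by prefix.
function buildCreateArgs(string $indexName, array $prefixes, array $schema): array
{
    $args = ['FT.CREATE', $indexName, 'ON', 'HASH'];

    // PREFIX is followed by the number of prefixes, then the prefixes.
    $args[] = 'PREFIX';
    $args[] = (string) count($prefixes);
    foreach ($prefixes as $prefix) {
        $args[] = $prefix;
    }

    // SCHEMA lists each field name followed by its type and options.
    $args[] = 'SCHEMA';
    foreach ($schema as $field => $definition) {
        $args[] = (string) $field;
        foreach ($definition as $token) {
            $args[] = $token;
        }
    }

    return $args;
}

$args = buildCreateArgs('test', ['doc:'], [
    'title'   => ['TEXT', 'WEIGHT', '0.5'],
    'content' => ['TEXT'],
]);

echo implode(' ', $args), PHP_EOL;
// FT.CREATE test ON HASH PREFIX 1 doc: SCHEMA title TEXT WEIGHT 0.5 content TEXT
```

With this layout, any HASH whose key starts with doc: is picked up by the index without a separate "add" command on the RediSearch side.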

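In newer versions of RediSearch, an index can be created as temporary, expiring after a specified period of inactivity. The internal idle timer is reset whenever the index is searched or added to. This is useful, for example, for the order history of individual users on an e-commerce site.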

$index->setTemporary(3600); // The index will automatically be deleted after one hour. The underlying HASH values remain untouched. 

RediSearch supports a basic set of English stop-words by default. There are two options: disable stop-word exclusion entirely, or specify your own list.

$index->setStopWords( 0 ); // This disables stop-word exclusion
// Or you can add your list of stop-words
$stop_words = array('this', 'that', 'it', 'what', 'is', 'are');
$index->setStopWords( $stop_words );
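
At the RediSearch level, both options map onto the STOPWORDS clause of FT.CREATE, where a count of 0 disables stop-words and a positive count is followed by the custom list. Below is a small sketch of that mapping in plain PHP. The function stopWordsClause is a hypothetical illustration, not the package's implementation.

```php
<?php
// Hypothetical helper, not part of front/redisearch: maps a stop-word
// setting to the STOPWORDS clause of FT.CREATE.
function stopWordsClause($stopWords): array
{
    if ($stopWords === null) {
        return []; // no clause: RediSearch keeps its default list
    }
    if ($stopWords === 0) {
        return ['STOPWORDS', '0']; // a count of 0 disables stop-words
    }
    // Otherwise: the number of words, followed by the words themselves.
    return array_merge(
        ['STOPWORDS', (string) count($stopWords)],
        array_values($stopWords)
    );
}

echo implode(' ', stopWordsClause(0)), PHP_EOL;
// STOPWORDS 0
echo implode(' ', stopWordsClause(['this', 'that', 'it'])), PHP_EOL;
// STOPWORDS 3 this that it
```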

Other flags include:

$index->setNoOffsetsEnabled( true ); // If set, term offsets won't be stored for documents (saves memory, does not allow exact searches or highlighting).
$index->setNoFieldsEnabled( true ); // If set, field bits for each term won't be stored. This saves memory, does not allow filtering by specific fields.
$index->setDefaultLang( 'english' ); // Set default language of the index. 
$index->setLangField( 'language' ); // Set the document field used to specify each document's language.
$index->setScore(1); // Set a default score for that specific index. 
$index->setScoreField('scoreField'); // Set a field which specifies each individual document's score.
$index->setPayloadField('payloadField'); // Document field that should be used as a binary safe payload string
$index->setMaxFields(34); // Maximum allowed number of fields. This is to preserve memory use
$index->setNoHighlight(); // This will deactivate highlighting feature. 
$index->setNoFreqs(); // Prevents storing term frequencies. 
$index->skipInitialScan(); // Cancels initial HASH scanning when index created. 

Finally, we need to specify the schema (fields, their types, and flags).

$index->addTextField( $name, $weight = 1.0, $sortable = false, $noindex = false)
      ->addTagField( $name, $sortable = false, $noindex = false, $separator = ',')
      ->addNumericField( $name, $sortable = false, $noindex = false )
      ->addGeoField( $name, $noindex = false );
      
// Example 
$index->addTextField('title', 0.5, true, true) // `title TEXT WEIGHT 0.5 SORTABLE NOINDEX`
    ->addTextField('content') // `content TEXT WEIGHT 1.0`
    ->addTagField('category', true, true, ';') // `category TAG SEPARATOR ';' SORTABLE NOINDEX`
    ->addGeoField('location') // `location GEO`
    ->create(); // Finally, create the index.

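Some notes:

  • Field weight is a float value and applies only to TEXT fields. The default value is 1.0
  • Don't forget to call the create() method at the end

Synonyms

Synonym matching is also supported.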

$synonym = array(
  array( 'boy', 'child', 'baby' ),
  array( 'girl', 'child', 'baby' ),
  array( 'man', 'person', 'adult' )
);

$index->synonymAdd( $synonym );

Add a document to the index (indexing)

$document = new Document();
$document->setScore(0.3); // The document's rank based on the user's ranking. This must be between 0.0 and 1.0. Default value is 1.0
$document->setLanguage('english'); // This is useful for stemming
$document->setId('doc:123'); // This is the HASH key. RediSearch uses the prefix of document ID to index the document
// And the fields 
$document->setFields(
  array(
    'title'       => 'Document title like post title',
    'category'    => 'search, fuzzy, synonym, phonetic',
    'date'        => strtotime( '2019-01-14 01:12:00' ),
    'location'    => new GeoLocation(-77.0366, 38.8977),
  )
);

$index->add( $document ); // Finally, add the document to the index (in other words, index the document)
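
Because RediSearch 2.0 indexes plain Redis HASHes, a document like the one above ends up as a HASH whose key starts with the index prefix. The sketch below shows, in plain PHP, what an equivalent HSET argument list could look like and how the prefix check works. The helper buildHsetArgs and the exact way this package stores score and language are assumptions and are not taken from the package source. A GEO value is written as a "longitude,latitude" string.

```php
<?php
// Hypothetical helper (not part of front/redisearch): flattens a field map
// into the arguments of a Redis HSET command.
function buildHsetArgs(string $key, array $fields): array
{
    $args = ['HSET', $key];
    foreach ($fields as $name => $value) {
        $args[] = (string) $name;
        $args[] = (string) $value;
    }
    return $args;
}

$prefix = 'doc:';
$key    = 'doc:123';

// Only keys that start with the index prefix are picked up by the index.
$matchesPrefix = strncmp($key, $prefix, strlen($prefix)) === 0;

$args = buildHsetArgs($key, [
    'title'    => 'Document title like post title',
    'category' => 'search, fuzzy, synonym, phonetic',
    'date'     => strtotime('2019-01-14 01:12:00'),
    'location' => '-77.0366,38.8977', // GEO value: "longitude,latitude"
]);

var_dump($matchesPrefix);
print_r($args);
```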

Persistence

After indexing documents, the index can be written to disk so that you do not lose it in case of network issues.

$index->writeToDisk();

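Query builder

The QueryBuilder class is designed to help build RediSearch queries for search and aggregation. It uses the addCondition(), addGenericCondition(), and addSubcondition() methods to add conditions to a query, and supports partial search, fuzzy search, escaping, tokenization, and stop-words. Example: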

$query = new QueryBuilder();
$query->setTokenize()
  ->setFuzzyMatching()
  ->addCondition('field', ['value1', 'value2'], 'OR');
$condition = $query->buildRedisearchQuery();
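
To give a feel for the kind of string such a condition represents: in RediSearch's query syntax, @field: restricts a clause to one field, | denotes a union (OR), and %term% requests a fuzzy match. The following plain-PHP sketch builds a query string of that shape. The function orCondition is hypothetical, performs no escaping, and its output is not necessarily what buildRedisearchQuery() returns.

```php
<?php
// Hypothetical illustration (not the QueryBuilder implementation): builds an
// OR condition on one field, optionally wrapping each term in % for fuzzy
// matching. No escaping of special characters is performed here.
function orCondition(string $field, array $values, bool $fuzzy = false): string
{
    $terms = array_map(
        function (string $value) use ($fuzzy): string {
            return $fuzzy ? '%' . $value . '%' : $value;
        },
        $values
    );

    return '@' . $field . ':(' . implode('|', $terms) . ')';
}

echo orCondition('field', ['value1', 'value2'], true), PHP_EOL;
// @field:(%value1%|%value2%)
```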

Search

Here is how to perform a search:

$search = new Query( $client, 'indexName' );
$results = $search
        ->sortBy( $fieldName, $order = 'ASC' )
        ->geoFilter( $fieldName, $longitude, $latitude, $radius, $distanceUnit = 'km' )
        ->numericFilter( $fieldName, $min, $max = null )
        ->withScores() // If set, we also return the relative internal score of each document. this can be used to merge results from multiple instances
        ->withSortKey() // Returns the value of the sorting key
        ->verbatim() // if set, we do not try to use stemming for query expansion but search the query terms verbatim.
        ->withPayloads() // If set, we retrieve optional document payloads (see FT.ADD). the payloads follow the document id, and if WITHSCORES was set, follow the scores
        ->noStopWords() //  If set, we do not filter stopwords from the query
        ->slop() // If set, we allow a maximum of N intervening number of unmatched offsets between phrase terms. (i.e the slop for exact phrases is 0)
        ->inKeys( $number, $keys ) // If set, we limit the result to a given set of keys specified in the list. the first argument must be the length of the list, and greater than zero. Non-existent keys are ignored - unless all the keys are non-existent.
        ->inFields( $number, $fields ) // If set, filter the results to ones appearing only in specific fields of the document, like title or URL. num is the number of specified field arguments
        ->limit( $offset, $pageSize = 10 ) // If set, we limit the results to the offset and number of results given. The default is 0 10
        ->highlight( $fields, $openTag = '<strong>', $closeTag = '</strong>') 
        ->summarize( $fields = array(), $fragmentCount = 3, $fragmentLength = 50, $separator = '...') // Use this option to return only the sections of the field which contain the matched text
        ->return( $fields ) // Use this keyword to limit which fields from the document are returned. num is the number of fields following the keyword. If num is 0, it acts like NOCONTENT.
        ->noContent() // If it appears after the query, we only return the document ids and not the content. This is useful if RediSearch is only an index on an external document collection
        ->search( $query, $documentsAsArray = false ); // By default, return values will be object, but if TRUE is passed as `$documentsAsArray` results will return as array
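
Most of the methods listed above correspond to optional arguments of RediSearch's FT.SEARCH command (WITHSCORES, FILTER, SORTBY, LIMIT, and so on). As a rough illustration, the plain-PHP sketch below assembles such an argument list for a few of them. The helper buildSearchArgs and its option names are hypothetical, and the package's own command building may differ.

```php
<?php
// Hypothetical helper (not part of front/redisearch): assembles raw
// FT.SEARCH arguments for a handful of the options described above.
function buildSearchArgs(string $index, string $query, array $options = []): array
{
    $args = ['FT.SEARCH', $index, $query];

    if (!empty($options['withScores'])) {
        $args[] = 'WITHSCORES';
    }
    if (isset($options['numericFilter'])) {
        // FILTER {field} {min} {max}
        [$field, $min, $max] = $options['numericFilter'];
        array_push($args, 'FILTER', $field, (string) $min, (string) $max);
    }
    if (isset($options['sortBy'])) {
        // SORTBY {field} ASC|DESC
        [$field, $order] = $options['sortBy'];
        array_push($args, 'SORTBY', $field, $order);
    }

    // LIMIT {offset} {num}; this sketch falls back to 0 10.
    [$offset, $num] = $options['limit'] ?? [0, 10];
    array_push($args, 'LIMIT', (string) $offset, (string) $num);

    return $args;
}

$args = buildSearchArgs('indexName', 'hello', [
    'withScores'    => true,
    'numericFilter' => ['date', 1546300800, 1577836799],
    'sortBy'        => ['date', 'DESC'],
]);

echo implode(' ', $args), PHP_EOL;
// FT.SEARCH indexName hello WITHSCORES FILTER date 1546300800 1577836799 SORTBY date DESC LIMIT 0 10
```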

Two methods are available on the search results:

$results->getCount(); // Returns the number of search results
$results->getDocuments(); // Returns the search results as objects or arrays

Drop the index

$index->drop($deleteHashes); // This method accepts one parameter; if it is set, the underlying HASHes are deleted as well.

Notes:

  • Only NUMERIC, TAG, and GEO fields can be used as filters, and matching does not apply to them.

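To do:

  • Add support for suggestions (auto-complete)
  • Implement deletion of individual documents
  • Implement filtering on the index
  • Implement document updates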