zjttfs/zpb-redisearch

CMS/framework-agnostic RediSearch client

v1.0.1 2021-09-29 02:57 UTC


Last update: 2024-09-29 05:56:55 UTC


README

This is a lightweight, CMS/framework-agnostic RediSearch client.

For RediSearch version 1.0, please use version 1.0.2.

How to use

First, you need to install and configure Redis and RediSearch.

Then add this package to your project with composer require front/redisearch.

For every kind of operation, the first step is to connect to the Redis server:

$client = \FKRediSearch\RediSearch\Setup::connect( $server, $port, $password, 0 );

The default values are: $server = '127.0.0.1', $port = 6379, $password = null, $database = 0.

Field types

RediSearch supports the following field types:

  1. TEXT [NOSTEM] [WEIGHT {weight}] [PHONETIC {matcher}], e.g. TEXT NOSTEM WEIGHT 5.6 PHONETIC {matcher}*
  2. NUMERIC [SORTABLE] [NOINDEX]
  3. GEO [SORTABLE] [NOINDEX]
  4. TAG [SEPARATOR {sep}]

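*matcher: Declaring a text field as PHONETIC makes searches on it perform phonetic matching by default. The required {matcher} argument specifies the phonetic algorithm and the language used. Some of the supported matchers are: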

  • dm:en - Double Metaphone for English
  • dm:fr - Double Metaphone for French
  • dm:pt - Double Metaphone for Portuguese
  • dm:es - Double Metaphone for Spanish

Creating an index

To create an index, first specify the indexName, then optionally pass some flags, and finally define the schema (fields).

The available commands are:

$index = new Index( $client ); // Initiate Index
$index->setIndexName('test'); // Set indexName

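In RediSearch 2.0, the overall structure of indexes has changed. Unlike version 1.0, which stored indexed documents in Redis HASHes itself, in version 2.0 Redis HASHes are indexed automatically. This means that:

  1. When creating an index, you need to tell RediSearch which HASH key prefixes it should index.
  2. The HASH keys you want to index must start with the same prefix that was specified for the index.

You also need to tell RediSearch which Redis data type to index. Currently only HASH is supported, but other data types may be added in the future.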

$index->on('HASH');
$index->setPrefix('doc:');
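A minimal sketch of what these two calls amount to at the protocol level, assuming the package emits the standard RediSearch 2.0 `FT.CREATE ... ON HASH PREFIX` arguments. The helpers `buildCreateHead()` and `isIndexed()` are hypothetical and not part of this package; they only show which keys an index with a given prefix would pick up.

```php
<?php
// Hypothetical helper: leading arguments of an FT.CREATE command for an
// index over HASH keys with one or more key prefixes.
function buildCreateHead(string $indexName, string $on, array $prefixes): array
{
    $args = ['FT.CREATE', $indexName, 'ON', $on, 'PREFIX', (string) count($prefixes)];
    foreach ($prefixes as $prefix) {
        $args[] = $prefix;
    }
    return $args;
}

// Hypothetical helper: a HASH key is indexed only if it starts with one of the prefixes.
function isIndexed(string $key, array $prefixes): bool
{
    foreach ($prefixes as $prefix) {
        if (strncmp($key, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}

echo implode(' ', buildCreateHead('test', 'HASH', ['doc:'])), PHP_EOL;
// FT.CREATE test ON HASH PREFIX 1 doc:
var_dump(isIndexed('doc:123', ['doc:'])); // bool(true)
var_dump(isIndexed('user:1', ['doc:']));  // bool(false)
```

This is why the document IDs used later in this README (such as `doc:123`) carry the same `doc:` prefix as the index.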

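Newer versions of RediSearch can create temporary indexes that expire after a specified period of inactivity. The internal idle timer is reset every time the index is searched or added to. This is especially useful for cases such as a single user's order history on an e-commerce site.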

$index->setTemporary(3600); // The index is deleted automatically after one hour of inactivity. The underlying HASH values remain untouched.
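The idle-timer behaviour can be illustrated with a small in-memory model. RediSearch tracks this timer on the server; the class below is a hypothetical illustration only and is not part of this package. Whether expiry happens exactly at the TTL boundary is a modelling choice here, not something the README specifies.

```php
<?php
// Hypothetical model of a TEMPORARY index: it expires after $ttl seconds
// without activity, and every search or add resets the idle timer.
final class TemporaryIndexModel
{
    private int $ttl;
    private int $lastAccess;

    public function __construct(int $ttl, int $now)
    {
        $this->ttl = $ttl;
        $this->lastAccess = $now;
    }

    // Stands in for any search or document add against the index.
    public function touch(int $now): void
    {
        $this->lastAccess = $now;
    }

    public function isExpired(int $now): bool
    {
        return $now - $this->lastAccess >= $this->ttl;
    }
}

$model = new TemporaryIndexModel(3600, 0);
$model->touch(3000);               // a search at t = 3000 s resets the timer
var_dump($model->isExpired(3600)); // bool(false): only 600 s idle
var_dump($model->isExpired(7000)); // bool(true): 4000 s idle
```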

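By default, RediSearch excludes a basic list of English stop words. You have two options: disable stop-word exclusion entirely, or specify your own list: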

$index->setStopWords( 0 ); // This disables stop-word exclusion
// Or you can add your list of stop-words
$stop_words = array('this', 'that', 'it', 'what', 'is', 'are');
$index->setStopWords( $stop_words );
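At the command level, RediSearch expresses both options through the `STOPWORDS {count} {word} ...` clause of `FT.CREATE`, where a count of 0 disables stop words. The sketch below assumes `setStopWords()` maps onto that clause; `buildStopWordsArgs()` is hypothetical and takes an empty array for the "disable" case rather than the integer `0` the package accepts.

```php
<?php
// Hypothetical helper: the STOPWORDS clause of FT.CREATE.
// An empty list yields "STOPWORDS 0", which disables stop-word filtering.
function buildStopWordsArgs(array $stopWords): array
{
    return array_merge(['STOPWORDS', (string) count($stopWords)], array_values($stopWords));
}

echo implode(' ', buildStopWordsArgs([])), PHP_EOL;
// STOPWORDS 0
echo implode(' ', buildStopWordsArgs(['this', 'that', 'it', 'what', 'is', 'are'])), PHP_EOL;
// STOPWORDS 6 this that it what is are
```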

The other flags are:

$index->setNoOffsetsEnabled( true ); // If set, term offsets are not stored for documents (saves memory, but disables exact searches and highlighting).
$index->setNoFieldsEnabled( true ); // If set, field bits are not stored for each term (saves memory, but disables filtering by specific fields).
$index->setDefaultLang( 'english' ); // Sets the default language of the index.
$index->setLangField( 'language' ); // Sets the document field that specifies each document's language.
$index->setScore(1); // Sets the default score for documents in this index.
$index->setScoreField('scoreField'); // Sets the field that specifies each document's score.
$index->setPayloadField('payloadField'); // Document field to use as a binary-safe payload string.
$index->setMaxFields(34); // Maximum allowed number of fields, to limit memory use.
$index->setNoHighlight(); // Disables the highlighting feature.
$index->setNoFreqs(); // Prevents storing term frequencies.
$index->skipInitialScan(); // Skips the initial scan of existing HASHes when the index is created.
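Most of these setters correspond to `FT.CREATE` options in RediSearch 2.0 (`LANGUAGE`, `LANGUAGE_FIELD`, `SCORE`, `SCORE_FIELD`, `PAYLOAD_FIELD`, `TEMPORARY`, `NOOFFSETS`, `NOHL`, `NOFIELDS`, `NOFREQS`, `SKIPINITIALSCAN`). The sketch below assumes that mapping and emits the options in the order the `FT.CREATE` reference lists them; the option keys are hypothetical, and `setMaxFields()` is left out because its exact mapping is not documented here.

```php
<?php
// Hypothetical: turn an option set mirroring the setters above into FT.CREATE options.
function buildCreateOptions(array $o): array
{
    $args = [];
    // Options that take a value.
    $valued = [
        'language'      => 'LANGUAGE',
        'languageField' => 'LANGUAGE_FIELD',
        'score'         => 'SCORE',
        'scoreField'    => 'SCORE_FIELD',
        'payloadField'  => 'PAYLOAD_FIELD',
        'temporary'     => 'TEMPORARY',
    ];
    foreach ($valued as $key => $keyword) {
        if (isset($o[$key])) {
            array_push($args, $keyword, (string) $o[$key]);
        }
    }
    // Boolean flags, emitted only when enabled.
    $flags = [
        'noOffsets'       => 'NOOFFSETS',
        'noHighlight'     => 'NOHL',
        'noFields'        => 'NOFIELDS',
        'noFreqs'         => 'NOFREQS',
        'skipInitialScan' => 'SKIPINITIALSCAN',
    ];
    foreach ($flags as $key => $keyword) {
        if (!empty($o[$key])) {
            $args[] = $keyword;
        }
    }
    return $args;
}

echo implode(' ', buildCreateOptions([
    'language'  => 'english',
    'score'     => 1,
    'noOffsets' => true,
    'noFreqs'   => true,
])), PHP_EOL;
// LANGUAGE english SCORE 1 NOOFFSETS NOFREQS
```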

Finally, we need to specify the schema (the fields, their types, and flags):

$index->addTextField( $name, $weight = 1.0, $sortable = false, $noindex = false)
      ->addTagField( $name, $sortable = false, $noindex = false, $separator = ',')
      ->addNumericField( $name, $sortable = false, $noindex = false )
      ->addGeoField( $name, $noindex = false );
      
// Example 
$index->addTextField('title', 0.5, true, true) // `title TEXT WEIGHT 0.5 SORTABLE NOINDEX`
    ->addTextField('content') // `content TEXT WEIGHT 1.0`
    ->addTagField('category', true, true, ';') // `category TAG SEPARATOR ; SORTABLE NOINDEX`
    ->addGeoField('location') // `location GEO`
    ->create(); // Finally, create the index.

Some notes:

  • Field weight is a float value and applies only to TEXT fields. The default is 1.0.
  • Don't forget to call the create() method at the end.
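To see how the builder calls above line up with the `SCHEMA` part of `FT.CREATE`, here is a hypothetical sketch that renders the same example fields. The helper functions are illustrations, not this package's internals; note that PHP renders the float `1.0` as `1`, which RediSearch accepts as the same weight.

```php
<?php
// Hypothetical helper: append SORTABLE / NOINDEX when requested.
function withFlags(array $args, bool $sortable, bool $noindex): array
{
    if ($sortable) {
        $args[] = 'SORTABLE';
    }
    if ($noindex) {
        $args[] = 'NOINDEX';
    }
    return $args;
}

function textField(string $name, float $weight = 1.0, bool $sortable = false, bool $noindex = false): array
{
    return withFlags([$name, 'TEXT', 'WEIGHT', (string) $weight], $sortable, $noindex);
}

function tagField(string $name, bool $sortable = false, bool $noindex = false, string $separator = ','): array
{
    return withFlags([$name, 'TAG', 'SEPARATOR', $separator], $sortable, $noindex);
}

function geoField(string $name, bool $noindex = false): array
{
    return withFlags([$name, 'GEO'], false, $noindex);
}

$schema = array_merge(
    ['SCHEMA'],
    textField('title', 0.5, true, true),
    textField('content'),
    tagField('category', true, true, ';'),
    geoField('location')
);
echo implode(' ', $schema), PHP_EOL;
// SCHEMA title TEXT WEIGHT 0.5 SORTABLE NOINDEX content TEXT WEIGHT 1 category TAG SEPARATOR ; SORTABLE NOINDEX location GEO
```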

Synonyms

Synonym matching is also supported:

$synonym = array(
  array( 'boy', 'child', 'baby' ),
  array( 'girl', 'child', 'baby' ),
  array( 'man', 'person', 'adult' )
);

$index->synonymAdd( $synonym );
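In RediSearch 2.0, synonyms are managed with `FT.SYNUPDATE {index} {group_id} {term} ...`, where every term in a group is treated as equivalent. The sketch below assumes each inner array becomes one such group; the helper and the generated group ids (`group0`, `group1`, ...) are hypothetical, and the package may name or send groups differently.

```php
<?php
// Hypothetical: turn synonym groups into FT.SYNUPDATE commands.
// The group ids are made up for this sketch.
function buildSynonymCommands(string $indexName, array $groups): array
{
    $commands = [];
    foreach (array_values($groups) as $i => $terms) {
        $commands[] = array_merge(['FT.SYNUPDATE', $indexName, 'group' . $i], $terms);
    }
    return $commands;
}

$synonym = [
    ['boy', 'child', 'baby'],
    ['girl', 'child', 'baby'],
    ['man', 'person', 'adult'],
];
foreach (buildSynonymCommands('test', $synonym) as $command) {
    echo implode(' ', $command), PHP_EOL;
}
// FT.SYNUPDATE test group0 boy child baby
// FT.SYNUPDATE test group1 girl child baby
// FT.SYNUPDATE test group2 man person adult
```

A term such as `child` may appear in several groups, so a search for `boy` can match documents containing `child` or `baby`.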

Adding documents to the index (indexing)

$document = new Document();
$document->setScore(0.3); // The document's rank based on the user's ranking. This must be between 0.0 and 1.0. Default value is 1.0
$document->setLanguage('english'); // This is useful for stemming
$document->setId('doc:123'); // This is the HASH key. RediSearch uses the prefix of document ID to index the document
// And the fields 
$document->setFields(
  array(
    'title'       => 'Document title like post title',
    'category'    => 'search, fuzzy, synonym, phonetic',
    'date'        => strtotime( '2019-01-14 01:12:00' ),
    'location'    => new GeoLocation(-77.0366, 38.8977),
  )
);

$index->add( $document ); // Finally, add the document to the index (in other words, index the document)
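Because RediSearch 2.0 indexes HASHes, adding a document boils down to writing a HASH under a prefixed key. The sketch below shows a plausible `HSET` for the document above, assuming the RediSearch 2.0 default score and language field names (`__score`, `__language`) and the `"longitude,latitude"` string format RediSearch uses for GEO values. The helper is hypothetical, a plain `[lon, lat]` array stands in for `GeoLocation`, and a fixed timestamp replaces `strtotime()` so the output does not depend on the server's timezone.

```php
<?php
// Hypothetical: build HSET arguments for a document.
// Assumes default RediSearch 2.0 field names __score and __language.
function buildHset(string $id, array $fields, float $score, string $language): array
{
    $args = ['HSET', $id];
    foreach ($fields as $name => $value) {
        if (is_array($value)) {
            // [longitude, latitude] becomes the "lon,lat" string GEO fields expect.
            $value = $value[0] . ',' . $value[1];
        }
        array_push($args, $name, (string) $value);
    }
    array_push($args, '__score', (string) $score, '__language', $language);
    return $args;
}

$args = buildHset('doc:123', [
    'title'    => 'Document title like post title',
    'date'     => 1547428320,
    'location' => [-77.0366, 38.8977],
], 0.3, 'english');
echo implode(' | ', $args), PHP_EOL;
// HSET | doc:123 | title | Document title like post title | date | 1547428320 | location | -77.0366,38.8977 | __score | 0.3 | __language | english
```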

Persistence

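After documents are indexed, the index can be written to disk, so you won't lose it if something goes wrong, such as a network problem.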

$index->writeToDisk();

Query builder

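The Query\QueryBuilder class is designed to help build RediSearch queries for searching and aggregation. It adds conditions to the query with the addCondition(), addGenericCondition(), and addSubcondition() methods, and supports partial search, fuzzy search, escaping, tokenization, and stop words. Example: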

$query = new QueryBuilder();
$query->setTokenize()
  ->setFuzzyMatching()
  ->addCondition('field', ['value1', 'value2'], 'OR');
$condition = $query->buildRedisearchQuery();
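For orientation, RediSearch query syntax restricts a match to a field with `@field:(...)`, uses `|` for OR and a space for AND, and wraps a term in `%...%` for fuzzy matching (Levenshtein distance 1). The sketch below renders a single condition in that syntax; `renderCondition()` is hypothetical, ignores tokenization and escaping, and the exact string `buildRedisearchQuery()` produces may differ.

```php
<?php
// Hypothetical: render one field condition in RediSearch query syntax.
// '|' is OR, a space is AND, and %term% requests fuzzy matching.
function renderCondition(string $field, array $values, string $operator, bool $fuzzy): string
{
    $terms = array_map(fn(string $v): string => $fuzzy ? "%{$v}%" : $v, $values);
    $glue = strtoupper($operator) === 'OR' ? '|' : ' ';
    return '@' . $field . ':(' . implode($glue, $terms) . ')';
}

echo renderCondition('field', ['value1', 'value2'], 'OR', true), PHP_EOL;
// @field:(%value1%|%value2%)
echo renderCondition('title', ['red', 'car'], 'AND', false), PHP_EOL;
// @title:(red car)
```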

Searching

The search methods are as follows:

$search = new Query( $client, 'indexName' );
$results = $search
        ->sortBy( $fieldName, $order = 'ASC' )
        ->geoFilter( $fieldName, $longitude, $latitude, $radius, $distanceUnit = 'km' )
        ->numericFilter( $fieldName, $min, $max = null )
        ->withScores() // If set, also returns the relative internal score of each document. This can be used to merge results from multiple instances
        ->withSortKey() // Returns the value of the sorting key
        ->verbatim() // If set, stemming is not used for query expansion; the query terms are searched verbatim.
        ->withPayloads() // If set, retrieves optional document payloads (see FT.ADD). The payloads follow the document id and, if WITHSCORES was set, the scores
        ->noStopWords() // If set, stop words are not filtered from the query
        ->slop() // If set, allows at most N intervening unmatched offsets between phrase terms (i.e. the slop for exact phrases is 0)
        ->inKeys( $number, $keys ) // If set, limits the results to the given set of keys. The first argument must be the length of the list and greater than zero. Non-existent keys are ignored, unless all the keys are non-existent.
        ->inFields( $number, $fields ) // If set, keeps only results that match in the given fields of the document, such as title or URL. $number is the number of fields passed
        ->limit( $offset, $pageSize = 10 ) // If set, we limit the results to the offset and number of results given. The default is 0 10
        ->highlight( $fields, $openTag = '<strong>', $closeTag = '</strong>') 
        ->summarize( $fields = array(), $fragmentCount = 3, $fragmentLength = 50, $separator = '...') // Use this option to return only the sections of the field which contain the matched text
        ->return( $fields ) // Use this keyword to limit which fields from the document are returned. num is the number of fields following the keyword. If num is 0, it acts like NOCONTENT.
        ->noContent() // If it appears after the query, we only return the document ids and not the content. This is useful if RediSearch is only an index on an external document collection
        ->search( $query, $documentsAsArray = false ); // By default, return values will be object, but if TRUE is passed as `$documentsAsArray` results will return as array
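These options mirror `FT.SEARCH` arguments. The hypothetical sketch below assembles a few of them in the order the `FT.SEARCH` reference lists them (`VERBATIM`, `WITHSCORES`, `FILTER`, `GEOFILTER`, `SORTBY`, `LIMIT`). Treating a `null` maximum in `numericFilter()` as `+inf` is an assumption about the package, not documented behaviour.

```php
<?php
// Hypothetical: assemble FT.SEARCH arguments for a subset of the options above.
function buildSearchArgs(string $index, string $query, array $opts): array
{
    $args = ['FT.SEARCH', $index, $query];
    if (!empty($opts['verbatim'])) {
        $args[] = 'VERBATIM';
    }
    if (!empty($opts['withScores'])) {
        $args[] = 'WITHSCORES';
    }
    if (isset($opts['numericFilter'])) {
        [$field, $min, $max] = $opts['numericFilter'];
        // Assumption: a null maximum means "no upper bound".
        array_push($args, 'FILTER', $field, (string) $min, $max === null ? '+inf' : (string) $max);
    }
    if (isset($opts['geoFilter'])) {
        [$field, $lon, $lat, $radius, $unit] = $opts['geoFilter'];
        array_push($args, 'GEOFILTER', $field, (string) $lon, (string) $lat, (string) $radius, $unit);
    }
    if (isset($opts['sortBy'])) {
        [$field, $order] = $opts['sortBy'];
        array_push($args, 'SORTBY', $field, $order);
    }
    [$offset, $size] = $opts['limit'] ?? [0, 10]; // RediSearch defaults to LIMIT 0 10
    array_push($args, 'LIMIT', (string) $offset, (string) $size);
    return $args;
}

echo implode(' ', buildSearchArgs('indexName', 'hello', [
    'numericFilter' => ['date', 1500000000, null],
    'sortBy'        => ['date', 'DESC'],
    'limit'         => [0, 5],
])), PHP_EOL;
// FT.SEARCH indexName hello FILTER date 1500000000 +inf SORTBY date DESC LIMIT 0 5
```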

Two methods are available for working with the search results:

$results->getCount(); // Returns the number of search results
$results->getDocuments(); // Returns search results object or array
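A raw `FT.SEARCH` reply (without `WITHSCORES`, `WITHPAYLOADS`, or `NOCONTENT`) is a flat list: the total number of matches, then alternating document ids and `[field, value, ...]` arrays. The hypothetical parser below shows how such a reply could be turned into a count plus documents, roughly the information `getCount()` and `getDocuments()` expose; it is not this package's implementation.

```php
<?php
// Hypothetical: parse a raw FT.SEARCH reply that includes document content
// but no scores or payloads.
function parseSearchReply(array $reply): array
{
    $count = (int) array_shift($reply); // total number of matches
    $documents = [];
    for ($i = 0; $i + 1 < count($reply); $i += 2) {
        $doc = ['id' => $reply[$i]];
        $pairs = $reply[$i + 1];
        for ($j = 0; $j + 1 < count($pairs); $j += 2) {
            $doc[$pairs[$j]] = $pairs[$j + 1];
        }
        $documents[] = $doc;
    }
    return ['count' => $count, 'documents' => $documents];
}

$parsed = parseSearchReply([2, 'doc:1', ['title', 'Hello'], 'doc:2', ['title', 'World', 'category', 'news']]);
echo $parsed['count'], PHP_EOL;                       // 2
echo $parsed['documents'][1]['category'], PHP_EOL;    // news
```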

Dropping an index

$index->drop($deleteHashes); // Accepts one parameter; if it is set, the underlying HASHes are deleted as well.
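In RediSearch 2.0, dropping an index is `FT.DROPINDEX {index}`, and the optional `DD` argument also deletes the indexed HASHes. A hypothetical sketch of how the `$deleteHashes` parameter could map onto that command:

```php
<?php
// Hypothetical: FT.DROPINDEX arguments; "DD" also deletes the underlying HASHes.
function buildDropArgs(string $indexName, bool $deleteHashes = false): array
{
    return $deleteHashes
        ? ['FT.DROPINDEX', $indexName, 'DD']
        : ['FT.DROPINDEX', $indexName];
}

echo implode(' ', buildDropArgs('test')), PHP_EOL;       // FT.DROPINDEX test
echo implode(' ', buildDropArgs('test', true)), PHP_EOL; // FT.DROPINDEX test DD
```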

Notes:

  • Only NUMERIC, TAG, and GEO fields can be used as filter conditions, and they do not support match operations.

TODO:

  • Add support for suggestions (autocomplete).
  • Implement deletion of single documents.
  • Implement filtering on indexes.
  • Implement document updates.