front / redisearch
CMS/framework-agnostic RediSearch client
Requires
- predis/predis: ^1.1
README
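This is a lightweight, CMS/framework-agnostic RediSearch client.
For RediSearch version 1.0, use version 1.0.2 of this package.
How to use
First, install and configure Redis and RediSearch.
Then add this package to your project with composer require front/redisearch.
For every type of operation, the first step is to connect to the Redis server: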
$client = \FKRediSearch\RediSearch\Setup::connect( $server, $port, $password, 0 );
The defaults are: $server = '127.0.0.1', $port = 6379, $password = null, $database = 0.
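As an illustration of how the defaults combine with caller-supplied values, here is a minimal sketch. The helper name redisConnectionSettings and the host name redis.internal are hypothetical and not part of the package; only the Setup::connect() argument order comes from the line above.

```php
<?php
// Hypothetical helper (not part of front/redisearch): merge caller overrides
// with the connection defaults documented in this README.
function redisConnectionSettings(array $overrides = []): array
{
    $defaults = [
        'server'   => '127.0.0.1',
        'port'     => 6379,
        'password' => null,
        'database' => 0,
    ];

    // Keep only known keys so a misspelled option is dropped instead of passed on.
    $known = array_intersect_key($overrides, $defaults);

    return array_merge($defaults, $known);
}

// 'redis.internal' is a placeholder host name used for illustration only.
$settings = redisConnectionSettings(['server' => 'redis.internal', 'port' => 6380]);

// With the package installed, the values can be passed in the documented order:
// $client = \FKRediSearch\RediSearch\Setup::connect(
//     $settings['server'], $settings['port'], $settings['password'], $settings['database']
// );
```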
Field types
RediSearch supports several field types:
- TEXT [NOSTEM] [WEIGHT {weight}] [PHONETIC {matcher}], e.g. TEXT NOSTEM WEIGHT 5.6 PHONETIC {matcher}*
- NUMERIC [SORTABLE] [NOINDEX]
- GEO [SORTABLE] [NOINDEX]
- TAG [SEPARATOR {sep}]
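*matcher: Declaring a text field as PHONETIC makes searches perform phonetic matching on it by default. The required {matcher} argument specifies the phonetic algorithm and language. The following matchers are supported:
- dm:en - Double Metaphone for English
- dm:fr - Double Metaphone for French
- dm:pt - Double Metaphone for Portuguese
- dm:es - Double Metaphone for Spanish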
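To make the TEXT option syntax concrete, the following sketch assembles the option list for a TEXT field and rejects matchers that are not in the list above. The function textFieldOptions is a hypothetical illustration and not part of the package's API.

```php
<?php
// Hypothetical helper: build the schema options for a TEXT field in the order
// TEXT [NOSTEM] [WEIGHT {weight}] [PHONETIC {matcher}].
function textFieldOptions(bool $noStem = false, ?float $weight = null, ?string $matcher = null): array
{
    // Matchers listed in this README.
    $supported = ['dm:en', 'dm:fr', 'dm:pt', 'dm:es'];

    $args = ['TEXT'];
    if ($noStem) {
        $args[] = 'NOSTEM';
    }
    if ($weight !== null) {
        $args[] = 'WEIGHT';
        $args[] = (string) $weight;
    }
    if ($matcher !== null) {
        if (!in_array($matcher, $supported, true)) {
            throw new InvalidArgumentException("Unsupported phonetic matcher: $matcher");
        }
        $args[] = 'PHONETIC';
        $args[] = $matcher;
    }

    return $args;
}

echo implode(' ', textFieldOptions(true, 5.6, 'dm:en')), PHP_EOL;
// prints "TEXT NOSTEM WEIGHT 5.6 PHONETIC dm:en"
```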
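Creating an index
To create an index, first specify the indexName, then pass any optional flags, and finally the schema (fields).
The available commands are:
$index = new Index( $client ); // Instantiate the index
$index->setIndexName('test'); // Set indexName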
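In RediSearch version 2.0 the overall structure of indexing changed. Unlike version 1.0, which stored indexed documents in Redis HASHes itself, version 2.0 automatically indexes existing Redis HASHes. This means:
- When creating an index, you need to tell RediSearch which HASH prefixes it should index
- The keys of the HASHes you want indexed must start with a prefix specified for the index
You also need to tell RediSearch which Redis type to index. Currently only HASH is supported, but other data types may be added in the future.
$index->on('HASH');
$index->setPrefix('doc:');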
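The prefix rule can be pictured with a small sketch. RediSearch performs this matching on the server; the function keyMatchesPrefix below is a hypothetical client-side illustration of which keys an index with prefix doc: would pick up.

```php
<?php
// Hypothetical helper: true when a HASH key starts with the index prefix.
function keyMatchesPrefix(string $key, string $prefix): bool
{
    return strncmp($key, $prefix, strlen($prefix)) === 0;
}

// Example keys; only those starting with 'doc:' would be indexed.
$keys = ['doc:123', 'doc:456', 'user:1'];

$indexed = array_values(array_filter($keys, function (string $key): bool {
    return keyMatchesPrefix($key, 'doc:');
}));

print_r($indexed); // lists doc:123 and doc:456
```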
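In newer versions of RediSearch, an index can be created as temporary, with a specified period of inactivity after which it expires. The internal idle timer is reset whenever the index is searched or added to. This is useful, for example, for the order history of individual users on an e-commerce site.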
$index->setTemporary(3600); // The index will automatically be deleted after one hour. The underlying HASH values remain untouched.
RediSearch ships with a basic list of English stop-words by default. There are two options: you can disable stop-word exclusion entirely, or specify your own list.
// Disable stop-word exclusion
$index->setStopWords( 0 );
// Or supply your own list of stop-words
$stop_words = array('this', 'that', 'it', 'what', 'is', 'are');
$index->setStopWords( $stop_words );
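For orientation, RediSearch's FT.CREATE command takes stop-words as STOPWORDS {count} {word}..., where a count of 0 disables stop-word exclusion. The sketch below builds that argument list; the function stopWordsArguments is hypothetical, and whether the package builds its arguments the same way is an assumption.

```php
<?php
// Hypothetical helper: build the STOPWORDS portion of an FT.CREATE call.
// An empty list yields ['STOPWORDS', '0'], which disables stop-word exclusion.
function stopWordsArguments(array $stopWords): array
{
    // Normalize case and drop duplicates so the count matches the words sent.
    $unique = array_values(array_unique(array_map('strtolower', $stopWords)));

    return array_merge(['STOPWORDS', (string) count($unique)], $unique);
}

echo implode(' ', stopWordsArguments(['this', 'that', 'it'])), PHP_EOL;
// prints "STOPWORDS 3 this that it"
```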
Other flags include:
// If set, term offsets won't be stored for documents (saves memory, but disables exact searches and highlighting).
$index->setNoOffsetsEnabled( true );
// If set, field bits for each term won't be stored (saves memory, but disables filtering by specific fields).
$index->setNoFieldsEnabled( true );
// Set the default language of the index.
$index->setDefaultLang( 'english' );
// Set the document field that specifies each document's language.
$index->setLangField( 'language' );
// Set a default score for this index.
$index->setScore(1);
// Set the field that specifies each document's score.
$index->setScoreField('scoreField');
// Document field to use as a binary-safe payload string.
$index->setPayloadField('payloadField');
// Maximum allowed number of fields; this limits memory use.
$index->setMaxFields(34);
// Deactivate the highlighting feature.
$index->setNoHighlight();
// Prevent storing term frequencies.
$index->setNoFreqs();
// Skip the initial HASH scan when the index is created.
$index->skipInitialScan();
Finally, we need to specify the schema (the fields, their types and flags).
// Method signatures
$index->addTextField( $name, $weight = 1.0, $sortable = false, $noindex = false)
    ->addTagField( $name, $sortable = false, $noindex = false, $separator = ',')
    ->addNumericField( $name, $sortable = false, $noindex = false )
    ->addGeoField( $name, $noindex = false );
// Example
$index->addTextField('title', 0.5, true, true) // `title TEXT WEIGHT 0.5 SORTABLE NOINDEX`
    ->addTextField('content') // `content TEXT WEIGHT 1.0`
    ->addTagField('category', true, true, ';') // `category TAG SEPARATOR ';' SORTABLE NOINDEX`
    ->addGeoField('location') // `location GEO`
    ->create(); // Finally, create the index.
Some notes:
- Field weight is a float value and applies only to TEXT fields. The default is 1.0
- Don't forget to call the create() method at the end
Synonyms
Synonym matching is also supported.
$synonym = array(
    array( 'boy', 'child', 'baby' ),
    array( 'girl', 'child', 'baby' ),
    array( 'man', 'person', 'adult' )
);
$index->synonymAdd( $synonym );
Adding documents to the index (indexing)
$document = new Document();
// The document's rank based on the user's ranking. Must be between 0.0 and 1.0; the default is 1.0.
$document->setScore(0.3);
// The language is used for stemming.
$document->setLanguage('english');
// This is the HASH key. RediSearch uses the prefix of the document ID to decide whether to index the document.
$document->setId('doc:123');
// And the fields
$document->setFields(
    array(
        'title' => 'Document title like post title',
        'category' => 'search, fuzzy, synonym, phonetic',
        'date' => strtotime( '2019-01-14 01:12:00' ),
        'location' => new GeoLocation(-77.0366, 38.8977),
    )
);
// Finally, add the document to the index (in other words, index the document).
$index->add( $document );
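Because RediSearch 2.0 indexes plain HASHes, a document ultimately becomes a flat set of field/value strings, and a GEO value in a HASH is written as a "longitude,latitude" string. The sketch below shows one way a field array could be flattened. GeoPoint is a hypothetical stand-in for the package's GeoLocation class so that the example runs on its own, and toHashFields is likewise hypothetical; how the package serializes fields internally is not stated in this README.

```php
<?php
// Hypothetical stand-in for the package's GeoLocation class.
final class GeoPoint
{
    public $longitude;
    public $latitude;

    public function __construct(float $longitude, float $latitude)
    {
        $this->longitude = $longitude;
        $this->latitude = $latitude;
    }
}

// Hypothetical helper: flatten document fields into HASH field/value strings.
function toHashFields(array $fields): array
{
    $hash = [];
    foreach ($fields as $name => $value) {
        if ($value instanceof GeoPoint) {
            // GEO values are stored as "longitude,latitude".
            $hash[$name] = $value->longitude . ',' . $value->latitude;
        } else {
            $hash[$name] = (string) $value;
        }
    }

    return $hash;
}

$hash = toHashFields([
    'title'    => 'Document title like post title',
    'category' => 'search, fuzzy, synonym, phonetic',
    'location' => new GeoPoint(-77.0366, 38.8977),
]);

echo $hash['location'], PHP_EOL; // prints "-77.0366,38.8977"
```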
Persistence
After documents have been indexed, the index can be written to disk so that it is not lost if network problems occur.
$index->writeToDisk();
Query builder
The QueryBuilder class helps build RediSearch queries for searching and aggregation. It uses the addCondition(), addGenericCondition() and addSubcondition() methods to add conditions to the query, and supports partial search, fuzzy search, escaping, tokenization and stop-words. Example:
$query = new QueryBuilder();
$query->setTokenize()
    ->setFuzzyMatching()
    ->addCondition('field', ['value1', 'value2'], 'OR');
$condition = $query->buildRedisearchQuery();
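To give a feel for the kind of string such a builder produces, here is a rough sketch in RediSearch query syntax, where @field:(a|b) matches either term and %term% requests fuzzy matching. The function fieldCondition is hypothetical and performs no escaping; the exact string returned by buildRedisearchQuery() may differ.

```php
<?php
// Hypothetical helper: build a single-field condition in RediSearch query syntax.
// Values are used as-is; real input would need escaping of special characters.
function fieldCondition(string $field, array $values, string $operator = 'OR', bool $fuzzy = false): string
{
    if ($fuzzy) {
        // %term% asks RediSearch for fuzzy (Levenshtein distance 1) matching.
        $values = array_map(function (string $value): string {
            return '%' . $value . '%';
        }, $values);
    }

    // '|' means OR between terms; a space means AND.
    $glue = strtoupper($operator) === 'OR' ? '|' : ' ';

    return '@' . $field . ':(' . implode($glue, $values) . ')';
}

echo fieldCondition('field', ['value1', 'value2'], 'OR'), PHP_EOL;
// prints "@field:(value1|value2)"
echo fieldCondition('field', ['value1', 'value2'], 'OR', true), PHP_EOL;
// prints "@field:(%value1%|%value2%)"
```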
Searching
Here is how to run a search:
$search = new Query( $client, 'indexName' );
$results = $search
    ->sortBy( $fieldName, $order = 'ASC' )
    ->geoFilter( $fieldName, $longitude, $latitude, $radius, $distanceUnit = 'km' )
    ->numericFilter( $fieldName, $min, $max = null )
    // If set, the relative internal score of each document is also returned. This can be used to merge results from multiple instances.
    ->withScores()
    // Returns the value of the sorting key.
    ->withSortKey()
    // If set, stemming is not used for query expansion; the query terms are searched verbatim.
    ->verbatim()
    // If set, optional document payloads are retrieved (see FT.ADD). The payloads follow the document id and, if WITHSCORES was set, follow the scores.
    ->withPayloads()
    // If set, stop-words are not filtered from the query.
    ->noStopWords()
    // If set, allows a maximum of N intervening unmatched offsets between phrase terms (i.e. the slop for exact phrases is 0).
    ->slop()
    // If set, limits the result to the given set of keys. The first argument must be the length of the list and greater than zero. Non-existent keys are ignored, unless all the keys are non-existent.
    ->inKeys( $number, $keys )
    // If set, filters the results to those appearing only in specific fields of the document, such as title or URL. num is the number of specified field arguments.
    ->inFields( $number, $fields )
    // If set, limits the results to the given offset and number of results. The default is 0 10.
    ->limit( $offset, $pageSize = 10 )
    ->highlight( $fields, $openTag = '<strong>', $closeTag = '</strong>')
    // Returns only the sections of the field that contain the matched text.
    ->summarize( $fields = array(), $fragmentCount = 3, $fragmentLength = 50, $separator = '...')
    // Limits which fields of the document are returned. num is the number of fields following the keyword. If num is 0, it acts like NOCONTENT.
    ->return( $fields )
    // If it appears after the query, only the document ids are returned and not the content. This is useful if RediSearch is only an index on an external document collection.
    ->noContent()
    // By default, documents are returned as objects; if TRUE is passed as `$documentsAsArray`, they are returned as arrays.
    ->search( $query, $documentsAsArray = false );
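Since limit() takes an offset rather than a page number, a small conversion is often needed for paginated listings. The function pageToLimit below is a hypothetical helper, not part of the package.

```php
<?php
// Hypothetical helper: convert a 1-based page number into the
// [$offset, $pageSize] pair expected by limit( $offset, $pageSize ).
function pageToLimit(int $page, int $pageSize = 10): array
{
    if ($page < 1 || $pageSize < 1) {
        throw new InvalidArgumentException('Page and page size must be at least 1.');
    }

    return [($page - 1) * $pageSize, $pageSize];
}

list($offset, $size) = pageToLimit(3, 20);
echo $offset, ' ', $size, PHP_EOL; // prints "40 20"

// With the package installed, the pair can be passed on:
// $search->limit( $offset, $size );
```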
Two methods are available on the search results:
$results->getCount(); // Returns the number of search results
$results->getDocuments(); // Returns the search results as objects or arrays
Dropping an index
$index->drop($deleteHashes); // Accepts one parameter; if it is set, the underlying HASHes are deleted as well.
Note:
- NUMERIC, TAG and GEO fields can only be used as filters; text matching is not performed on them.
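As an example of what filtering on a NUMERIC field looks like in RediSearch query syntax, a range is written as @field:[min max], with +inf for an open upper bound. The function numericRange and the field name price below are hypothetical; treating a null maximum as "no upper bound" is an assumption and not documented behavior of numericFilter().

```php
<?php
// Hypothetical helper: build a numeric range filter in RediSearch query syntax.
// Assumption: a null $max means "no upper bound".
function numericRange(string $field, $min, $max = null): string
{
    $upper = $max === null ? '+inf' : (string) $max;

    return '@' . $field . ':[' . $min . ' ' . $upper . ']';
}

echo numericRange('price', 10, 20), PHP_EOL; // prints "@price:[10 20]"
echo numericRange('price', 10), PHP_EOL;     // prints "@price:[10 +inf]"
```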
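To do:
- Add support for suggestions (autocomplete).
- Implement deletion of single documents
- Implement filtering on the index
- Implement document updates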