gdbots / query-parser
Converts search queries into words, phrases, hashtags, mentions, etc.
v3.0.0
2021-12-05 19:44 UTC
Requires
- php: >=8.1
Requires (Dev)
- phpunit/phpunit: ^9.5
- ruflin/elastica: ^7.1
README
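This library supports a simple search query standard. It is meant to handle the most common search combinations a user is likely to type into a search box on your website or in a dashboard application. It intentionally leaves out the more complex nested capabilities you might expect from SQL builders, Lucene, and similar tools.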
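Tokenizer
Tokenizing a query splits it on whitespace, except inside double quotes, where the quoted text is kept together. The Tokenizer extracts the following tokens.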
class Token implements \JsonSerializable
{
    const T_EOI              = 0;  // end of input
    const T_WHITE_SPACE      = 1;
    const T_IGNORED          = 2;  // an ignored token, e.g. #, !, etc. when found by themselves, don't do anything with them.
    const T_NUMBER           = 3;  // 10, 0.8, .64, 6.022e23
    const T_REQUIRED         = 4;  // '+'
    const T_PROHIBITED       = 5;  // '-'
    const T_GREATER_THAN     = 6;  // '>'
    const T_LESS_THAN        = 7;  // '<'
    const T_EQUALS           = 8;  // '='
    const T_FUZZY            = 9;  // '~'
    const T_BOOST            = 10; // '^'
    const T_RANGE_INCL_START = 11; // '['
    const T_RANGE_INCL_END   = 12; // ']'
    const T_RANGE_EXCL_START = 13; // '{'
    const T_RANGE_EXCL_END   = 14; // '}'
    const T_SUBQUERY_START   = 15; // '('
    const T_SUBQUERY_END     = 16; // ')'
    const T_WILDCARD         = 17; // '*'
    const T_AND              = 18; // 'AND' or '&&'
    const T_OR               = 19; // 'OR' or '||'
    const T_TO               = 20; // 'TO' or '..'
    const T_WORD             = 21;
    const T_FIELD_START      = 22; // The "field:" portion of "field:value".
    const T_FIELD_END        = 23; // when a field lexeme ends, i.e. "field:value". This token has no value.
    const T_PHRASE           = 24; // Phrase (one or more quoted words)
    const T_URL              = 25; // a valid url
    const T_DATE             = 26; // date in the format YYYY-MM-DD
    const T_HASHTAG          = 27; // #hashtag
    const T_MENTION          = 28; // @mention
    const T_EMOTICON         = 29; // see https://en.wikipedia.org/wiki/Emoticon
    const T_EMOJI            = 30; // see https://en.wikipedia.org/wiki/Emoji
    // ... remaining members omitted from this excerpt
}
The T_WHITE_SPACE and T_IGNORED tokens are removed before the scan returns its output.
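The splitting rule and the removal of whitespace and ignored tokens can be illustrated with a small standalone sketch. This is not the library's Tokenizer: `sketchSplit` is a hypothetical helper, it only treats a lone `#` or `!` as ignorable, and it does not handle quotes that start in the middle of a word (for example `field:"two words"`).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper for illustration only; not part of gdbots/query-parser.
 * Splits on whitespace, keeps double-quoted runs whole, and drops lexemes
 * made up solely of "#" or "!" (a rough stand-in for T_IGNORED).
 *
 * @return string[]
 */
function sketchSplit(string $input): array
{
    // Either a complete "quoted run" or a run of non-whitespace characters.
    // Whitespace itself never matches, so it is dropped implicitly.
    preg_match_all('/"[^"]*"|\S+/u', $input, $matches);

    $lexemes = [];
    foreach ($matches[0] as $lexeme) {
        // A "#" or "!" standing alone carries no meaning, so skip it.
        if (preg_match('/^[#!]+$/', $lexeme) === 1) {
            continue;
        }
        $lexemes[] = $lexeme;
    }

    return $lexemes;
}

print_r(sketchSplit('hello "big planet" # #omg'));
```

The quoted phrase comes back as a single lexeme, the lone `#` disappears, and `#omg` is kept because it contains more than the marker character.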
QueryParser
The default query parser produces a ParsedQuery object, which can be passed to a builder to generate a query for a specific search service.
Basic Usage
<?php
use Gdbots\QueryParser\QueryParser;
use Gdbots\QueryParser\Builder\XmlQueryBuilder;

$parser = new QueryParser();
$builder = (new XmlQueryBuilder())->setHashtagFieldName('tags');

$result = $parser->parse('hello^5 planet:earth +date:2015-12-25 #omg');
echo $builder->addParsedQuery($result)->toXmlString();
This produces the following XML:
<?xml version="1.0"?>
<query>
  <word boost="5" rule="should_match">hello</word>
  <field name="planet">
    <word rule="should_match_term">earth</word>
  </field>
  <field name="date" bool_operator="required" cacheable="true">
    <date rule="must_match_term">2015-12-25</date>
  </field>
  <field name="tags" bool_operator="required" cacheable="true">
    <hashtag rule="must_match_term">omg</hashtag>
  </field>
</query>
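To make the mapping from query terms to XML attributes easier to follow, here is a rough standalone sketch that breaks a single term into its operator, field, value, and boost. It is only an approximation for simple terms: `sketchDescribeTerm` and the array keys it returns are hypothetical, and the library's own parser works from the token stream rather than from a regular expression like this one.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper for illustration only; not the library's parser.
 * Handles simple terms such as "hello^5", "planet:earth" or "+date:2015-12-25".
 *
 * @return array{bool_operator: ?string, field: ?string, value: string, boost: ?float}
 */
function sketchDescribeTerm(string $term): array
{
    // Optional +/- prefix, optional "field:" part, the value, optional "^boost".
    $pattern = '/^(?<op>[+-])?(?:(?<field>[A-Za-z_][\w.]*):)?(?<value>[^^]+)(?:\^(?<boost>\d+(?:\.\d+)?))?$/';

    if (preg_match($pattern, $term, $m) !== 1) {
        throw new InvalidArgumentException("Unrecognized term: {$term}");
    }

    // Unmatched groups are either empty strings or missing from $m entirely.
    $op    = $m['op'] ?? '';
    $field = $m['field'] ?? '';
    $boost = $m['boost'] ?? '';

    return [
        // '+' corresponds to T_REQUIRED, '-' to T_PROHIBITED.
        'bool_operator' => $op === '+' ? 'required' : ($op === '-' ? 'prohibited' : null),
        'field'         => $field !== '' ? $field : null,
        'value'         => $m['value'],
        'boost'         => $boost !== '' ? (float) $boost : null,
    ];
}

foreach (['hello^5', 'planet:earth', '+date:2015-12-25'] as $term) {
    print_r(sketchDescribeTerm($term));
}
```

Read against the XML above, `hello^5` yields a boost of 5 with no field, `planet:earth` yields the field `planet`, and `+date:2015-12-25` yields a required term on the field `date`.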
To get a list of Node objects of a given type, use:
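<?php
use Gdbots\QueryParser\QueryParser;
use Gdbots\QueryParser\Node\Hashtag;

$parser = new QueryParser();

// Parse the query, then keep only the hashtag nodes.
$result = $parser->parse('#hashtag1 AND #hashtag2');
$hashtags = $result->getNodesOfType(Hashtag::NODE_TYPE);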
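Conceptually, fetching nodes by type is a filter over the parsed nodes. The sketch below models that idea with stand-in classes so it runs without the library. `SketchNode` and `SketchParsedQuery` are hypothetical and make no claim about how ParsedQuery is actually implemented.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a parsed node: just a type name and a value.
final class SketchNode
{
    public function __construct(
        public readonly string $type,
        public readonly string $value,
    ) {
    }
}

// Hypothetical stand-in for the parser's result object.
final class SketchParsedQuery
{
    /** @var SketchNode[] */
    private array $nodes = [];

    public function addNode(SketchNode $node): self
    {
        $this->nodes[] = $node;
        return $this;
    }

    /** @return SketchNode[] nodes whose type matches, in their original order */
    public function getNodesOfType(string $type): array
    {
        return array_values(array_filter(
            $this->nodes,
            static fn (SketchNode $node): bool => $node->type === $type
        ));
    }
}

$query = (new SketchParsedQuery())
    ->addNode(new SketchNode('hashtag', 'hashtag1'))
    ->addNode(new SketchNode('word', 'hello'))
    ->addNode(new SketchNode('hashtag', 'hashtag2'));

foreach ($query->getNodesOfType('hashtag') as $node) {
    echo $node->value, PHP_EOL;
}
```

Filtering by a type string keeps the calling code independent of how many node classes exist, which fits the pattern of passing a constant such as `Hashtag::NODE_TYPE`.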