gdbots/query-parser

将搜索查询转换为单词、短语、标签、提及等。

v3.0.0 2021-12-05 19:44 UTC

This package is auto-updated.

Last update: 2024-09-06 01:49:38 UTC


README

Build Status

将搜索查询转换为单词、短语、标签、提及等。

此库支持简单的搜索查询标准。它旨在支持用户可能输入到您网站搜索框或仪表板应用程序中最常见的搜索组合。它有意限制了您可能从 SQL 构建器、Lucene 等处期望的更复杂的嵌套功能。

分词器

除非被双引号包围,否则分词在空白处分割。以下是由 Tokenizer 提取的标记。

class Token implements \JsonSerializable
{
    const T_EOI              = 0;  // end of input
    const T_WHITE_SPACE      = 1;
    const T_IGNORED          = 2;  // an ignored token, e.g. #, !, etc.  when found by themselves, don't do anything with them.
    const T_NUMBER           = 3;  // 10, 0.8, .64, 6.022e23
    const T_REQUIRED         = 4;  // '+'
    const T_PROHIBITED       = 5;  // '-'
    const T_GREATER_THAN     = 6;  // '>'
    const T_LESS_THAN        = 7;  // '<'
    const T_EQUALS           = 8;  // '='
    const T_FUZZY            = 9;  // '~'
    const T_BOOST            = 10; // '^'
    const T_RANGE_INCL_START = 11; // '['
    const T_RANGE_INCL_END   = 12; // ']'
    const T_RANGE_EXCL_START = 13; // '{'
    const T_RANGE_EXCL_END   = 14; // '}'
    const T_SUBQUERY_START   = 15; // '('
    const T_SUBQUERY_END     = 16; // ')'
    const T_WILDCARD         = 17; // '*'
    const T_AND              = 18; // 'AND' or '&&'
    const T_OR               = 19; // 'OR' or '||'
    const T_TO               = 20; // 'TO' or '..'
    const T_WORD             = 21;
    const T_FIELD_START      = 22; // The "field:" portion of "field:value".
    const T_FIELD_END        = 23; // when a field lexeme ends, i.e. "field:value". This token has no value.
    const T_PHRASE           = 24; // Phrase (one or more quoted words)
    const T_URL              = 25; // a valid url
    const T_DATE             = 26; // date in the format YYYY-MM-DD
    const T_HASHTAG          = 27; // #hashtag
    const T_MENTION          = 28; // @mention
    const T_EMOTICON         = 29; // see https://en.wikipedia.org/wiki/Emoticon
    const T_EMOJI            = 30; // see https://en.wikipedia.org/wiki/Emoji

在扫描过程返回输出之前,删除了 T_WHITE_SPACET_IGNORED 标记。

QueryParser

默认查询解析器生成一个 ParsedQuery 对象,该对象可用于使用构建器生成针对特定搜索服务的查询。

基本用法

<?php

use Gdbots\QueryParser\QueryParser;
use Gdbots\QueryParser\Builder\XmlQueryBuilder;

$parser  = new QueryParser();
$builder = (new XmlQueryBuilder())->setHashtagFieldName('tags');

$result = $parser->parse('hello^5 planet:earth +date:2015-12-25 #omg');
echo $builder->addParsedQuery($result)->toXmlString();

生成以下 xml

<?xml version="1.0"?>
<query>
  <word boost="5" rule="should_match">hello</word>
  <field name="planet">
    <word rule="should_match_term">earth</word>
  </field>
  <field name="date" bool_operator="required" cacheable="true">
    <date rule="must_match_term">2015-12-25</date>
  </field>
  <field name="tags" bool_operator="required" cacheable="true">
    <hashtag rule="must_match_term">omg</hashtag>
  </field>
</query>

要按类型获取 Node 对象的列表,请使用

<?php

use Gdbots\QueryParser\Node\Hashtag;

$result = $parser->parse('#hashtag1 AND #hashtag2');
$hashtags = $result->getNodesOfType(Hashtag::NODE_TYPE);