Elasticsearch exploration: partial matching

Brief introduction

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html

Partial matching lets the user specify part of a search term and find all terms that contain that fragment.

Prefix query

Matches documents whose field contains terms that begin with the specified prefix. The prefix itself is not analyzed. The prefix query corresponds to Lucene's PrefixQuery.

GET /_search
{
  "query": {
    "prefix": {
      "user.id": {
        "value": "ki"
      }
    }
  }
}
  • field: (Required, object) Field you wish to search.
  • value: (Required, string) Beginning characters of terms you wish to find in the provided <field>.
  • rewrite: (Optional, string) Method used to rewrite the query. For valid values and more information, see the rewrite parameter.
  • case_insensitive: (Optional, Boolean) Allows ASCII case-insensitive matching of the value with the indexed field values when set to true. Default is false, which means the case sensitivity of matching depends on the underlying field's mapping.
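As a rough illustration of the parameters above, the sketch below builds the same request body as a Python dict and mimics prefix matching against a handful of indexed terms. The helper names (prefix_query, prefix_matches) and the sample terms are made up for this example; the real matching happens inside Lucene, not in client code.

```python
import json


def prefix_query(field, value, case_insensitive=False):
    """Build a prefix query body (hypothetical helper, not part of any client library)."""
    return {"query": {"prefix": {field: {"value": value,
                                         "case_insensitive": case_insensitive}}}}


def prefix_matches(terms, prefix, case_insensitive=False):
    """Mimic prefix matching: the prefix is compared with the indexed terms as-is."""
    if case_insensitive:
        return [t for t in terms if t.lower().startswith(prefix.lower())]
    return [t for t in terms if t.startswith(prefix)]


indexed_terms = ["kimchy", "kity", "Kibana", "elkbee"]  # sample data for illustration
print(json.dumps(prefix_query("user.id", "ki"), indent=2))
print(prefix_matches(indexed_terms, "ki"))
print(prefix_matches(indexed_terms, "ki", case_insensitive=True))
```

Because the prefix is not analyzed, "ki" does not match the term "Kibana" in this toy model unless case_insensitive is set.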

Speed up prefix queries

You can speed up prefix queries using the index_prefixes mapping parameter. If enabled, Elasticsearch indexes prefixes between 2 and 5 characters in a separate field. This lets Elasticsearch run prefix queries more efficiently at the cost of a larger index.
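A mapping that enables this might look like the sketch below, written as a Python dict that can be serialized and sent when creating an index. The field name full_name and the index name in the comment are assumptions for illustration; min_chars and max_chars are the index_prefixes options, shown here with the 2 and 5 character bounds mentioned above.

```python
import json

# Hypothetical mapping for a text field named "full_name".
mapping = {
    "mappings": {
        "properties": {
            "full_name": {
                "type": "text",
                "index_prefixes": {"min_chars": 2, "max_chars": 5},
            }
        }
    }
}

# Body for a create-index request such as PUT /my-index (index name is made up).
print(json.dumps(mapping, indent=2))
```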

Allow expensive queries

Prefix queries will not be executed if search.allow_expensive_queries is set to false. However, if index_prefixes are enabled, an optimised query is built which is not considered slow, and will be executed in spite of this setting.
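search.allow_expensive_queries is a cluster-level setting. Assuming it is changed through the cluster settings API (PUT /_cluster/settings), the request body could be built as in this sketch. The helper name is hypothetical, and the choice of "persistent" rather than "transient" is an assumption, not something the text above prescribes.

```python
import json


def expensive_queries_setting(allowed):
    """Body for PUT /_cluster/settings toggling expensive queries (illustrative helper)."""
    return {"persistent": {"search.allow_expensive_queries": allowed}}


print(json.dumps(expensive_queries_setting(False)))
```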

Range query

Returns documents that contain terms within a provided range.

GET /_search
{
  "query": {
    "range": {
      "age": {
        "gte": 10,
        "lte": 20,
        "boost": 2.0
      }
    }
  }
}
  • field: (Required, object) Field you wish to search.
  • gt:(Optional) Greater than.
  • gte:(Optional) Greater than or equal to.
  • lt:(Optional) Less than.
  • lte:(Optional) Less than or equal to.
  • format: (Optional, string) Date format used to convert date values in the query.
    • By default, Elasticsearch uses the date format provided in the <field>'s mapping. This value overrides that mapping format.
    • For valid syntax, see format.
  • relation: (Optional, string) Indicates how the range query matches values for range fields. Valid values are:
    • INTERSECTS: Matches documents with a range field value that intersects the query's range.
    • CONTAINS: Matches documents with a range field value that entirely contains the query's range.
    • WITHIN: Matches documents with a range field value that lies entirely within the query's range.
  • time_zone:(Optional, string) Coordinated Universal Time (UTC) offset or IANA time zone used to convert date values in the query to UTC.
    • Valid values are ISO 8601 UTC offsets, such as +01:00 or -08:00, and IANA time zone IDs, such as America/Los_Angeles.
    • For an example query using the time_zone parameter, see Time zone in range queries.
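  • boost: (Optional, float) Floating point number used to decrease or increase the relevance scores of a query. Defaults to 1.0.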

Numeric ranges

{
    "query" : {
        "constant_score" : {
            "filter" : {
                "range" : {
                    "price" : {
                        "gte" : 20,
                        "lt"  : 40
                    }
                }
            }
        }
    }
}

Date ranges

Range queries can also be applied to date fields:

"range" : {
    "timestamp" : {
        "gt" : "2014-01-01 00:00:00",
        "lt" : "2014-01-07 00:00:00"
    }
}

When applied to date fields, the range query supports date math. For example, to find all documents with a timestamp in the past hour:

"range" : {
    "timestamp" : {
        "gt" : "now-1h"
    }
}

{
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-1d/d",
        "lt": "now/d"
      }
    }
  }
}

The now-1h filter always finds the documents with a timestamp in the past hour, so it acts as a sliding time window over the documents. The second example rounds with /d, so it matches every document from yesterday: from the start of yesterday up to, but not including, the start of today.

Date math can also be applied to a specific date rather than only to a placeholder such as now. Append a double pipe (||) to the date, followed by a date math expression:

"range" : {
    "timestamp" : {
        "gt" : "2014-01-01 00:00:00",
        "lt" : "2014-01-01 00:00:00||+1M"   # 1 January 2014 plus one month (1 February 2014, 00:00:00)
    }
}

Date math is calendar aware: it knows the number of days in each month and the total number of days in a year (including leap years). For more details, see the date format reference documentation.
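To make "calendar aware" concrete, here is a small Python sketch of adding months to a date. It is only an illustration of the idea: add_months is a made-up helper, and clamping to the last day of the target month is this sketch's assumption about end-of-month handling, not a statement about Elasticsearch internals.

```python
import calendar
from datetime import datetime


def add_months(moment, months):
    """Add calendar months, clamping the day to the last day of the target month."""
    index = moment.year * 12 + (moment.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return moment.replace(year=year, month=month, day=min(moment.day, last_day))


print(add_months(datetime(2014, 1, 1), 1))   # the "2014-01-01 00:00:00||+1M" case
print(add_months(datetime(2016, 1, 31), 1))  # leap year: clamps to 29 February
```

A fixed offset such as "30 days" would give a different answer for most months, which is why month arithmetic has to consult the calendar.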

You can use the time_zone parameter to convert date values to UTC using a UTC offset. For example:

GET /_search
{
  "query": {
    "range": {
      "timestamp": {
        "time_zone": "+01:00",  // Indicates that date values use a UTC offset of +01:00.      
        "gte": "2020-01-01T00:00:00", // With a UTC offset of +01:00, Elasticsearch converts this date to 2019-12-31T23:00:00 UTC.
        "lte": "now"     // The time_zone parameter does not affect the now value.             
      }
    }
  }
}
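The conversion described in the comments of that request can be checked with plain Python datetime arithmetic. This is only a sketch of the offset calculation, not of how Elasticsearch implements it.

```python
from datetime import datetime, timedelta, timezone

offset = timezone(timedelta(hours=1))                 # the "+01:00" time_zone value
local = datetime(2020, 1, 1, 0, 0, 0, tzinfo=offset)  # the "gte" value as written
as_utc = local.astimezone(timezone.utc)

print(as_utc.isoformat())
```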

Formatted dates are parsed using the default format specified on the date field, but you can override that default by passing the format parameter to the range query:

GET _search
{
    "query": {
        "range" : {
            "timestamp" : {
                "gte": "01/01/2012",
                "lte": "2013",
                "format": "dd/MM/yyyy||yyyy"
            }
        }
    }
}

String ranges

Range queries can also handle string fields. String ranges are calculated lexicographically or alphabetically. For example, the following strings are in lexicographic order:

5, 50, 6, B, C, a, ab, abb, abc, b

Terms in the inverted index are sorted in lexicographic order, which is why string ranges are determined using this order.
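Lexicographic order can be surprising, so here is a quick Python check on some sample strings. Python compares strings by code point, which for plain ASCII terms like these gives the same order as the byte-wise ordering of an inverted index; for other text this equivalence is an assumption of the sketch.

```python
terms = ["b", "abc", "C", "50", "a", "6", "abb", "B", "ab", "5"]

# Python compares str values by code point, so digits sort before
# uppercase letters, which sort before lowercase letters.
ordered = sorted(terms)
print(ordered)

# A range of "a" (inclusive) to "b" (exclusive) is then a contiguous slice.
in_range = [t for t in ordered if "a" <= t < "b"]
print(in_range)
```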

To find every term from a up to, but not including, b, we can use the same range query syntax:

"range" : {
    "title" : {
        "gte" : "a",
        "lt" :  "b"
    }
}

Numeric and date fields are indexed in a way that makes ranges efficient to calculate. String fields are not: to apply a range to them, Elasticsearch effectively executes a term filter for every term that falls within the range, which is much slower than a date or numeric range.

String ranges work fine on low-cardinality fields (fields with only a few unique terms), but the more unique terms there are, the slower the string range becomes.
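The effect of cardinality can be sketched with a toy term dictionary: a sorted Python list searched with bisect. The function name and the sample terms are invented for illustration. The point is only that the number of terms falling inside the range, each of which would need its own term lookup according to the description above, grows with the number of unique terms.

```python
from bisect import bisect_left


def terms_in_range(sorted_terms, gte, lt):
    """Return the slice of a sorted term list with gte <= term < lt."""
    start = bisect_left(sorted_terms, gte)
    end = bisect_left(sorted_terms, lt)
    return sorted_terms[start:end]


low_cardinality = sorted(["active", "archived", "blocked", "closed"])
high_cardinality = sorted(f"a{n:05d}" for n in range(50_000)) + ["b0"]

print(len(terms_in_range(low_cardinality, "a", "b")))
print(len(terms_in_range(high_cardinality, "a", "b")))
```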

Regexp query

The regexp query lets you run term queries using regular expressions. Be careful: used incorrectly, regexp can put serious load on the server. For example, a pattern that begins with .* has to be matched against every term in the inverted index, which is almost equivalent to a full table scan and will be slow. Where possible, start the regular expression with a literal prefix. Patterns that use .*? or + also reduce query performance.

GET /_search
{
  "query": {
    "regexp": {
      "user.id": {
        "value": "k.*y",
        "flags": "ALL",
        "case_insensitive": true,
        "max_determinized_states": 10000,
        "rewrite": "constant_score"
      }
    }
  }
}
  • field: (Required, object) Field you wish to search.
  • value: (Required, string) Regular expression for terms you wish to find in the provided <field>.
  • flags: (Optional, string) Enables optional operators for the regular expression. For valid values and more information, see Regular expression syntax.
  • case_insensitive: (Optional, Boolean) Allows case-insensitive matching of the regular expression value with the indexed field values when set to true. Default is false, which means the case sensitivity of matching depends on the underlying field's mapping.
  • max_determinized_states: (Optional, integer) Maximum number of automaton states required for the query. Default is 10000.
  • rewrite: (Optional, string) Method used to rewrite the query. For valid values and more information, see the rewrite parameter.

If search.allow_expensive_queries is set to false, regexp queries will not be executed.

Character    Meaning
.            Matches any single character
*            Repeats the preceding match zero or more times
?            Repeats the preceding match zero or one time
+            Repeats the preceding match one or more times
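The four operators can be tried out locally with Python's re module. This is only an approximation: Python's regular expression syntax differs from Lucene's in general, but these four operators behave the same way, and fullmatch is used because Lucene patterns are matched against the whole term. The sample terms and the helper name are invented for the example.

```python
import re

terms = ["kimchy", "kity", "ky", "sky", "kimchi"]


def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    compiled = re.compile(pattern)
    return [t for t in candidates if compiled.fullmatch(t)]


print(full_matches("k.*y", terms))    # zero or more characters between k and y
print(full_matches("k.+y", terms))    # at least one character between k and y
print(full_matches("ki?t?y", terms))  # optional i and optional t
```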

{
  "query": {
    "regexp": {
      "drugname.keyword": {
        "value": ".* Cedipine . Cyril .*",
        "flags_value": 65535,
        "max_determinized_states": 10000,
        "boost": 1
      }
    }
  }
}

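import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
// CharacterUtil.replaceAllForRegexp is the author's own helper (see the remark below); keyword is the user's search text.
String reg = ".*" + CharacterUtil.replaceAllForRegexp(keyword) + ".*";

// Wrap the regexp query in its own bool query so it can be combined with other clauses.
BoolQueryBuilder drugNameBuilder = QueryBuilders.boolQuery();
drugNameBuilder.must(QueryBuilders.regexpQuery("drugname.keyword", reg));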

Remark: if keyword contains special characters, you can manually replace each of them with a dot (.):
q.replace("\"", ".").replace("?", ".").replace("*", ".").replace("^", ".")
 .replace("$", ".").replace("+", ".").replace("|", ".").replace("{", ".")
 .replace("}", ".").replace("[", ".").replace("]", ".").replace("(", ".")
 .replace(")", ".").replace("\\", ".")

Special characters: . ? + * | { } [ ] ( ) " \
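Replacing a special character with a dot makes the pattern match any character in that position. An alternative, sketched below in Python, is to escape each reserved character with a backslash so that it matches literally. The helper names are hypothetical, the reserved set covers only the characters listed above (optional operators enabled through flags are not handled), and this is a different technique from the replacement approach in the remark.

```python
# Reserved characters listed above; each is escaped with a backslash.
RESERVED = set('.?+*|{}[]()"\\')


def escape_regexp(text):
    """Backslash-escape reserved characters so they match literally (illustrative helper)."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def contains_pattern(keyword):
    """Build a 'contains' pattern like the Java snippet, but with escaping."""
    return ".*" + escape_regexp(keyword) + ".*"


print(contains_pattern("a+b (50mg)"))
```

When the pattern is placed in a JSON request body, the backslashes themselves still need JSON escaping, which a JSON serializer takes care of.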

Wildcard query

Like the prefix query, the wildcard query is a low-level, term-based query. Unlike the prefix query, it lets you specify a pattern to match. It uses standard shell wildcard syntax: ? matches any single character, and * matches zero or more characters.

The following search returns documents where the user.id field contains a term that begins with ki and ends with y. Matching terms can include kiy, kity, or kimchy.

GET /_search
{
  "query": {
    "wildcard": {
      "user.id": {
        "value": "ki*y",
        "boost": 1.0,
        "rewrite": "constant_score"
      }
    }
  }
}
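  • field: (Required, object) Field you wish to search.
  • value: (Required, string) Wildcard pattern for terms you wish to find in the provided <field>. This parameter supports two wildcard operators:
    • ?, which matches any single character
    • *, which can match zero or more characters, including an empty one
  • boost: (Optional, float) Floating point number used to decrease or increase the relevance scores of a query. Defaults to 1.0.
  • rewrite: (Optional, string) Method used to rewrite the query. For valid values and more information, see the rewrite parameter.
  • case_insensitive: (Optional, Boolean) Allows case-insensitive matching of the pattern with the indexed field values when set to true. Default is false, which means the case sensitivity of matching depends on the underlying field's mapping.

Wildcard queries will not be executed if search.allow_expensive_queries is set to false.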
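The two wildcard operators can be modelled in a few lines of Python by translating the pattern into a regular expression. This is a sketch for experimenting with patterns locally, not how Elasticsearch evaluates them; the function names are invented, and backslash escaping inside the wildcard pattern is not handled.

```python
import re


def wildcard_to_regex(pattern):
    """Translate ? and * into regex equivalents; every other character is literal."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def wildcard_match(pattern, term):
    """True when the whole term matches the wildcard pattern."""
    return re.fullmatch(wildcard_to_regex(pattern), term, flags=re.DOTALL) is not None


for term in ["kiy", "kity", "kimchy", "kim", "skiy"]:
    print(term, wildcard_match("ki*y", term))
```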

Copyright notice
This article was written by [HLee]. Please include a link to the original when reposting. Thank you.
https://cdmana.com/2020/12/20201224214851730W.html
