
Curious? 25 Elasticsearch default values you need to know


In a technical discussion group, some members asked about "the default limit of 1000 shards per ES node". That got me thinking about Elasticsearch defaults in general.

A quick search confirmed the interest: the chat history contains nearly 400 discussions involving the keyword "default".

These defaults are a useful reference for architecture selection, development, and troubleshooting performance problems in operations. The official documentation covers them in detail, but the information is scattered across many pages.

Out of sheer curiosity, I decided it was worth collecting the most commonly used Elasticsearch defaults: applicable scenario, parameter, default value, static or dynamic parameter type, practical recommendations, and related notes.

The only goal is to help more people build a (relatively) complete picture ahead of time and avoid detours.

0、Parameter types and the difference between static and dynamic parameters

0.1 Parameter types

Parameters fall into cluster-level, index-level, and mapping-level settings, among others.

0.1.1 Cluster-level parameters

  • Example 1: cluster.max_shards_per_node

The prefix is cluster.*; a change takes effect for the whole cluster.

  • Example 2: indices.query.bool.max_clause_count

This one must be set in the elasticsearch.yml configuration file and takes effect after ES is restarted.

0.1.2 Index-level parameters

  • Example: index.number_of_shards

  • The prefix is index.*; a change takes effect for that index.

0.2 Static vs. dynamic parameters

  • The number of primary shards cannot be changed after an index is created (short of a reindex).

index.number_of_shards is a static parameter.

  • The number of replicas, by contrast, can be adjusted at any time with the update-index-settings API.

index.number_of_replicas is a dynamic parameter.
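The static/dynamic split can be made concrete with a small client-side guard. The sketch below is illustrative only: the helper name build_settings_update and the STATIC_INDEX_SETTINGS set are assumptions made for this example (a real cluster has many more static settings), and nothing is sent to a cluster.

```python
import json

# Static index settings mentioned in this article (assumed subset, not exhaustive).
STATIC_INDEX_SETTINGS = {"index.number_of_shards", "index.codec"}


def build_settings_update(index_name, settings):
    """Return (method, path, body) for an update-index-settings request.

    Raises ValueError if a static setting is included, because static
    settings cannot be changed on a live index.
    """
    static = sorted(key for key in settings if key in STATIC_INDEX_SETTINGS)
    if static:
        raise ValueError(f"static settings cannot be updated on a live index: {static}")
    return "PUT", f"/{index_name}/_settings", json.dumps(settings, sort_keys=True)
```

Calling build_settings_update("my_index", {"index.number_of_replicas": 2}) yields a PUT request for /my_index/_settings, while passing index.number_of_shards raises an error before any request is made.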

The sections below cover, in order, cluster-level, index-level, mapping-level, and other commonly used defaults.

1、What is the default maximum number of clauses in a bool query on an ES cluster?

  • Applicable scenario: bool queries that combine a large number of clauses, for example to implement rule-based filtering.

  • Parameter: indices.query.bool.max_clause_count

  • Parameter type: static (must be set in elasticsearch.yml).

  • Default maximum: 1024.

  • Reason for the limit: it prevents queries with too many clauses from consuming excessive CPU and memory and degrading cluster performance.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-settings.html
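A rough client-side pre-flight check can catch oversized bool queries before they are sent. This is a sketch under assumptions: count_clauses is a hypothetical helper, and the way Elasticsearch itself counts clauses differs between versions, so treat the result as an approximation.

```python
DEFAULT_MAX_CLAUSE_COUNT = 1024  # default of indices.query.bool.max_clause_count
BOOL_OCCURS = ("must", "should", "filter", "must_not")


def count_clauses(query):
    """Count leaf clauses in a query dict, descending into nested bool queries."""
    if "bool" not in query:
        return 1
    total = 0
    for occur in BOOL_OCCURS:
        clauses = query["bool"].get(occur, [])
        if isinstance(clauses, dict):  # a single clause may be given without a list
            clauses = [clauses]
        total += sum(count_clauses(clause) for clause in clauses)
    return total
```

A query built from 1025 term clauses would be flagged because count_clauses(query) exceeds DEFAULT_MAX_CLAUSE_COUNT.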

2、How many shards does an ES data node support by default?

  • Applicable scenario: choosing shard counts for clusters with large data volumes.

  • Parameter: cluster.max_shards_per_node

  • Default maximum: 1000 (version 7.X and later).

  • Further notes: (1) Very large clusters run into this limit:

1) The number of shards a node can hold is proportional to its available heap memory.

2) Elastic's official blog recommends at most 20 shards per GB of heap; for example, a node with a 30GB heap should hold at most 600 shards.

https://www.elastic.co/guide/en/elasticsearch/reference/7.0/misc-cluster.html#cluster-shard-limit

https://github.com/elastic/kibana/issues/35529

(2) Poorly chosen shard counts hurt in both directions:

1) Too many shards amplify writes, fill the bulk queue, and raise the rejection rate;

2) Once the data volume is large enough, too few shards cannot make full use of multi-node resources, and machine load becomes unbalanced.
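The two numbers in this section (20 shards per GB of heap, 1000 shards per data node) reduce to simple arithmetic. A minimal sketch; the function names are made up for illustration.

```python
SHARDS_PER_GB_HEAP = 20             # rule of thumb cited in this section
DEFAULT_MAX_SHARDS_PER_NODE = 1000  # cluster.max_shards_per_node default in 7.X


def recommended_max_shards(heap_gb):
    """Upper bound on shards for one node under the 20-per-GB rule of thumb."""
    return int(heap_gb * SHARDS_PER_GB_HEAP)


def cluster_shard_limit(data_nodes, max_shards_per_node=DEFAULT_MAX_SHARDS_PER_NODE):
    """Cluster-wide cap: the per-node limit multiplied by the number of data nodes."""
    return data_nodes * max_shards_per_node
```

For the 30GB example above, recommended_max_shards(30) gives 600, well under the hard per-node default of 1000.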

3、What is the default size of the index buffer on an ES cluster?

  • Applicable scenario: the indexing buffer in heap memory holds newly indexed documents. When it fills up, the buffered documents are written to a segment on disk. The buffer is shared among all shards on the node.

  • Parameters:

(1) indices.memory.index_buffer_size

(2) indices.memory.min_index_buffer_size

(3) indices.memory.max_index_buffer_size

  • Parameter type: static (must be set in elasticsearch.yml).

  • Default values:

(1) indices.memory.index_buffer_size: 10%

(2) indices.memory.min_index_buffer_size: 48mb

  • Recommendations:

(1) It must be configured on every data node in the cluster.

(2) It is one of the first parameters to consider when tuning writes, and helps improve write performance and stability.

https://www.elastic.co/guide/en/elasticsearch/reference/current/indexing-buffer.html
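How the percentage and the minimum interact can be sketched as follows. The function index_buffer_bytes is a hypothetical illustration of the arithmetic (percentage of heap, raised to the minimum, optionally capped by a maximum), not Elasticsearch's own code.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def index_buffer_bytes(heap_bytes, percent=10, min_bytes=48 * MB, max_bytes=None):
    """Effective indexing buffer for one node, in bytes."""
    size = max(heap_bytes * percent // 100, min_bytes)  # never below the minimum
    if max_bytes is not None:
        size = min(size, max_bytes)                     # optional upper cap
    return size
```

With a 256MB heap, 10% would be about 25MB, so the 48mb minimum applies; with a 4GB heap the buffer is roughly 410MB.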

4、Does ES stop accepting writes by default once disk usage reaches 85%?

  • Applicable scenario: disk-based shard allocation; these parameters set the low, high, and flood-stage disk usage watermarks.

  • Parameters: cluster.routing.allocation.disk.watermark.low/high/flood_stage

  • Default values:

(1) cluster.routing.allocation.disk.watermark.low: 85%

(2) cluster.routing.allocation.disk.watermark.high: 90%

(3) cluster.routing.allocation.disk.watermark.flood_stage: 95%

  • Parameter type: dynamic cluster parameters.

  • Recommendations:

(1) 85%: no new shards are allocated to the node (writes to shards already on it continue); 90%: ES tries to relocate shards from the node to other available nodes; 95%: every index with a shard on the node is made read-only (index.blocks.read_only_allow_delete).

(2) Disk utilization should be one of the core monitoring metrics.
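For monitoring, the three thresholds can be mapped to a stage label. A minimal sketch: disk_stage is a hypothetical helper, the thresholds are the defaults listed above, and behaviour exactly at a boundary value is an assumption.

```python
def disk_stage(used_percent, low=85.0, high=90.0, flood=95.0):
    """Return the highest disk watermark that the given usage has reached."""
    if used_percent >= flood:
        return "flood_stage"
    if used_percent >= high:
        return "high"
    if used_percent >= low:
        return "low"
    return "ok"
```

An alerting job could, for instance, warn on "low" and page someone on "high", before the flood stage is ever reached.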

5、What is the default GC for an ES cluster?

  • Applicable scenario: JVM garbage collection tuning.

  • Default parameters:

-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
  • Recommendations:

(1) Official advice:

At present we still consider the CMS garbage collector the best choice for most deployments, but since ES 6.5.0 (when running on JDK 11 or later) we also support G1GC.

https://github.com/elastic/elasticsearch/issues/44321

(2) Configuration location: jvm.options. For tuning, following wood Uncle's suggestion, change the settings to:

-XX:+UseG1GC
-XX:MaxGCPauseMillis=50

Here -XX:MaxGCPauseMillis sets the target maximum GC pause time; the default is 200ms. If the online workload is very sensitive to GC pauses, it can be set lower. Setting it too low, however, may increase CPU consumption.

When the cluster is otherwise healthy, G1 is quite effective at reducing the impact of GC pauses on service latency. But if GC is making the cluster hang, switching to G1 is unlikely to fix the root cause; usually the cluster's data model or queries need optimizing.

https://elasticsearch.cn/question/4589

6、What is the default number of primary shards of an ES index?

  • Applicable scenario: data storage.

  • Parameter: index.number_of_shards

  • Parameter type: static.

  • Default value: 1 (7.X; earlier versions defaulted to 5). Maximum number of shards per index: 1024.

  • Recommendations:

(1) This value can only be set when the index is created.

(2) The 1024-shard cap per index is a safety limit that prevents cluster instability caused by resource allocation problems.

(3) The limit can be changed by setting a system property on every node, for example: export ES_JAVA_OPTS="-Des.index.max_number_of_shards=128"
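Because the shard count is fixed at creation time, it is worth validating before the create-index request is built. The helper below is a hypothetical sketch that only assembles the request body, using the 7.X defaults and the 1024 cap described in this section.

```python
MAX_SHARDS_PER_INDEX = 1024  # default safety cap per index


def create_index_body(number_of_shards=1, number_of_replicas=1):
    """Build a create-index body, rejecting shard counts outside 1..1024."""
    if not 1 <= number_of_shards <= MAX_SHARDS_PER_INDEX:
        raise ValueError(f"number_of_shards must be between 1 and {MAX_SHARDS_PER_INDEX}")
    return {
        "settings": {
            "index": {
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }
```

The returned dict can be serialized as the body of a PUT /<index-name> request.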

7、What is the default compression algorithm of an ES index?

  • Applicable scenario: compressing written data.

  • Parameter: index.codec

  • Parameter type: static.

  • Default value: LZ4

  • Recommendations:

(1) It can be set to best_compression, which uses DEFLATE for a higher compression ratio at the cost of slower stored-field performance.

(2) Users who care more about low disk usage than about compression speed are advised to use best_compression.

8、What is the default number of replicas of an ES index?

  • Applicable scenario: keeping business data highly available.

  • Parameter: index.number_of_replicas

  • Parameter type: dynamic.

  • Default value: 1

  • Recommendations:

Set the number of replicas according to business needs. For data safety, at least 1 replica is recommended.

9、What is the default refresh interval of an ES index?

  • Applicable scenario: the minimum delay between writing a document and it becoming searchable (unit: s).

  • Parameter: index.refresh_interval

  • Parameter type: dynamic.

  • Default value: 1s.

  • Recommendation: for workloads that do not need near-real-time search and want faster writes, increase the refresh interval to suit the actual business situation.

10、What is the default maximum number of terms in an ES terms query?

  • Applicable scenario: terms query.

  • Parameter: index.max_terms_count

  • Parameter type: dynamic.

  • Default maximum: 65536

  • Recommendation: this maximum is rarely exceeded in practice.

11、What is the default maximum number of results an ES index returns through paging?

  • Applicable scenario: deep paging through search results.

  • Parameter: index.max_result_window

  • Parameter type: dynamic.

  • Default maximum: 10000.

  • Recommendations:

(1) Deep paging is slow by design. Unless the business specifically requires it, changing the default is not recommended; see how Baidu and Google implement paging.

(2) To traverse all data, use the scroll API. If only paging forward is needed, use the Search After API.
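The difference between from/size paging and Search After can be sketched by building the request bodies only. Both helper names are hypothetical; the sketch assumes each hit carries a "sort" array, which Elasticsearch returns when the request specifies a sort.

```python
DEFAULT_MAX_RESULT_WINDOW = 10000  # index.max_result_window default


def exceeds_result_window(from_, size, max_result_window=DEFAULT_MAX_RESULT_WINDOW):
    """True if a from/size request would go past index.max_result_window."""
    return from_ + size > max_result_window


def next_page_body(query, sort, page_size, last_hit=None):
    """Build a search body; after the first page, continue from the last hit's sort values."""
    body = {"size": page_size, "query": query, "sort": sort}
    if last_hit is not None:
        body["search_after"] = last_hit["sort"]
    return body
```

The first call omits last_hit; every later call passes the final hit of the previous page, so no request ever needs a from value beyond the window.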

12、Is it necessary to set a default pipeline for an ES index?

  • Applicable scenario: adding an ETL step by default on the index's write path.

  • Parameter: index.default_pipeline

  • Parameter type: dynamic.

  • Value: the name of a custom pipeline (no default pipeline is set out of the box).

  • Recommendations:

(1) Where the business requires it, add the basic ETL functions it needs.

(2) Going without index.default_pipeline also works: update_by_query combined with a custom pipeline achieves the same result. Approach (1), however, is the more comprehensive and concise solution.
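Wiring up a default pipeline takes two requests: define the ingest pipeline, then point the index setting at it. The sketch below only builds the requests; the pipeline id add_ingest_time and the field ingested_at are invented for illustration.

```python
def default_pipeline_requests(index_name, pipeline_id="add_ingest_time"):
    """Return the two (method, path, body) requests that attach a default pipeline."""
    pipeline = {
        "description": "stamp each document with its ingest time",
        "processors": [
            # The set processor writes the ingest timestamp into a new field.
            {"set": {"field": "ingested_at", "value": "{{_ingest.timestamp}}"}}
        ],
    }
    return [
        ("PUT", f"/_ingest/pipeline/{pipeline_id}", pipeline),
        ("PUT", f"/{index_name}/_settings", {"index.default_pipeline": pipeline_id}),
    ]
```

Once both requests have been applied, documents indexed without an explicit pipeline parameter pass through the pipeline automatically.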

13、What is the default maximum number of fields in an ES index mapping?

  • Use case: prevents the index mapping from growing horizontally without bound, which can lead to out-of-memory errors and other failures.

  • Parameter: index.mapping.total_fields.limit

  • Parameter type: dynamic.

  • Default maximum: 1000

  • Recommendation: changing it is not recommended.

14、What is the default maximum depth of fields in an ES index mapping?

  • Use case: prevents the index mapping from growing vertically without bound and causing failures.

  • Parameter: index.mapping.depth.limit

  • Parameter type: dynamic.

  • Default maximum: 20

  • Recommendation: changing it is not recommended.

  • How depth is counted: if all fields are defined at the root object level, the depth is 1. If there is one object mapping, the depth is 2, and so on. The default limit is 20.
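The depth rule can be checked against a mapping before it is submitted. This is a sketch under assumptions: mapping_depth is a hypothetical helper that walks only the "properties" keys of a mapping and ignores multi-fields.

```python
DEFAULT_DEPTH_LIMIT = 20  # index.mapping.depth.limit default


def mapping_depth(properties):
    """Depth of a mapping's properties dict: 1 for root-level fields, plus 1 per object level."""
    depth = 1
    for field in properties.values():
        sub_properties = field.get("properties")
        if sub_properties:
            depth = max(depth, 1 + mapping_depth(sub_properties))
    return depth
```

A mapping whose fields all sit at the root yields 1, and one with a single object field yields 2, matching the counting rule described above.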

15、What are the default limits on nested fields in an ES index mapping?

  • Applicable scenario: deciding whether to use the nested type.

  • Parameters:

(1) index.mapping.nested_fields.limit

The maximum number of distinct nested mappings in an index.

(2) index.mapping.nested_objects.limit

The maximum number of nested JSON objects a single document may contain, across all nested types.

  • Parameter type: dynamic (verified).

  • Default values:

(1) index.mapping.nested_fields.limit: 50

(2) index.mapping.nested_objects.limit: 10000

  • Recommendations:

(1) The performance cost of nested should not be underestimated.

How nested works: each nested object is indexed as a separate Lucene document. If we index a single document containing 100 user objects, 101 Lucene documents are created.

(2) nested vs. parent-child documents:

If child documents are updated frequently, parent-child documents are recommended.

If child documents are rarely updated but queried frequently, the nested type is recommended.
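The "100 user objects become 101 Lucene documents" arithmetic can be sketched for a single level of nesting. The helper lucene_doc_count and the field name "user" are assumptions for illustration; multi-level nesting is not handled.

```python
DEFAULT_NESTED_OBJECTS_LIMIT = 10000  # index.mapping.nested_objects.limit default


def lucene_doc_count(doc, nested_paths):
    """One Lucene document for the root, plus one per object under each nested field."""
    count = 1
    for path in nested_paths:
        value = doc.get(path, [])
        if isinstance(value, dict):  # a single object counts as one nested document
            value = [value]
        count += len(value)
    return count
```

For a document holding 100 objects under a nested "user" field, the function returns 101, which is why bulk updates of documents with large nested arrays are expensive.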

16、With dynamic mapping, what type does ES assign to strings by default?

  • Applicable scenario: fields whose precise mapping was not defined in advance.

  • Default type: text + keyword.

  • A hands-on example:

{
  "my_index_0001" : {
    "mappings" : {
      "properties" : {
        "cont" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
  • Practical advice: define the mapping precisely in advance according to business needs, and optimize the data model.

17、What is the default scoring mechanism in ES?

  • Default value: BM25

  • Unless the business requires it, changing it is not recommended. https://www.elastic.co/guide/en/elasticsearch/reference/current/similarity.html

18、How long can a value of the ES keyword type be by default?

1) From ES 5.X onward, a keyword term can be at most 32766 bytes when encoded as UTF-8; text fields have no such length limit.

2) Once ignore_above is set, values longer than the given length are not indexed, so exact-match term queries cannot return them.
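The two limits use different units: ignore_above counts characters, while the 32766 term limit is in UTF-8 bytes. The sketch below shows the distinction; both function names are hypothetical, and Python's len on a str (code points) is used as a stand-in for the character count.

```python
LUCENE_MAX_TERM_BYTES = 32766  # maximum size of a single indexed term, in bytes


def is_indexed(value, ignore_above=256):
    """A value longer than ignore_above characters is skipped at index time."""
    return len(value) <= ignore_above


def fits_term_limit(value):
    """True if the UTF-8 encoding of the value fits in one Lucene term."""
    return len(value.encode("utf-8")) <= LUCENE_MAX_TERM_BYTES
```

Since a UTF-8 character can occupy up to 4 bytes, an ignore_above of 32766 / 4 = 8191 keeps any value within the byte limit.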

https://blog.csdn.net/laoyang360/article/details/78207980

19、Why is it said that if you don't use ES aliases, you haven't really learned ES?

In one sentence: aliases make zero-downtime changes possible (a classic technique for seamless switching). https://www.elastic.co/guide/en/elasticsearch/reference/6.8/indices-aliases.html

20、What roles does an ES cluster node have by default?

  • Default: master-eligible, data, ingest, coordinating, and machine learning (if licensed) roles.

  • Recommendation: once the cluster reaches a certain size, be sure to set up dedicated master, coordinating, and data nodes so that roles are clearly separated.

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html

21、Which node handles an ES client request by default?

  • If no coordinating node is explicitly specified, the node that receives the request acts as the coordinating node.

  • Every node is implicitly a coordinating node. A coordinating node needs enough memory and CPU to handle the gather phase.

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html

22、What is the default ES analyzer?

  • Applicable scenario: cases where no analyzer is explicitly specified.

  • Default type: the standard analyzer.

  • A hands-on example:

POST /_analyze
{
  "text": " Standing in the forest of the East ",
  "analyzer": "standard"
}

Tokenization result (the original text was Chinese, which the standard analyzer splits into one token per character):

 Yi 
 state 
 stay 
 In the east 
 Fang 
 And 
 Lin 
  • Practical advice: the _analyze API is very useful for troubleshooting tokenization problems!

23、ES aggregations use UTC by default; can this be changed?

  • Yes, it can be changed in the aggregation itself: setting the time_zone parameter solves it.

  • "+08:00": UTC+8.

GET my_index/_search?size=0
{
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field":     "date",
        "calendar_interval":  "day",
        "time_zone": "+08:00"
      }
    }
  }
}
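Why time_zone matters for daily buckets can be shown with plain date arithmetic. This is an illustrative sketch, not Elasticsearch code: day_bucket is a hypothetical function that mimics which calendar day a timestamp falls into under a given UTC offset.

```python
from datetime import datetime, timedelta, timezone


def day_bucket(ts_utc, offset_hours=0):
    """Return the calendar day (ISO date) a UTC timestamp falls into at a given UTC offset."""
    tz = timezone(timedelta(hours=offset_hours))
    return ts_utc.astimezone(tz).date().isoformat()
```

An event at 20:00 UTC on January 1 belongs to January 1 in UTC buckets but to January 2 in "+08:00" buckets, so daily counts shift unless time_zone is set.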

24、What is the default ES heap size?

  • Default value: 2GB in older releases (the stock jvm.options in 6.X and 7.X sets 1GB). Adjust it to fit the actual machine.

  • Deploy ES on dedicated machines; do not share machine resources with other processes such as logstash, hadoop, or redis.

  • Recommended JVM heap setting: min(31GB, half of the machine's memory).
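The min(31GB, half of memory) rule translates directly into the two heap flags. A minimal sketch: recommended_heap_flags is a hypothetical helper, and it rounds down to whole gigabytes for simplicity.

```python
HEAP_CAP_GB = 31  # cap from the rule above


def recommended_heap_flags(machine_ram_gb):
    """Return (-Xms, -Xmx) flags sized to min(31GB, half of RAM), in whole GB."""
    heap_gb = int(min(HEAP_CAP_GB, machine_ram_gb / 2))
    # Minimum and maximum heap are set to the same value.
    return f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"
```

A 16GB machine gets an 8GB heap, while machines with 64GB or more hit the 31GB cap.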

25、From which version does ES ship with its own JDK by default?

Version 7.0. From 7.0 onward a JDK is bundled by default (the installation package includes one), so there is no need to install a JDK separately.

Summary

No other reason; I was simply curious!

For complete coverage, refer to the official documentation. The 25 defaults above all come from practical questions raised in the "dead beat Elasticsearch" discussion group.

If a default value has puzzled you in practice, feel free to leave a comment.

Index settings (dynamic/static) reference:

https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html


Copyright notice
This article was written by [Mingyi world]. Please include a link to the original when reposting. Thank you.
