
Elasticsearch: the top X key metrics you have to focus on

0、 Opening questions

  • In write-heavy business scenarios, have you ever run into Elasticsearch cluster performance problems?

  • Have you ever hit the limits of Elasticsearch's data indexing speed?

  • Have you ever experienced search latency, with queries taking too long to execute?

  • Have you ever faced challenges troubleshooting an Elasticsearch cluster?

  • Are you trying to improve the stability of your Elasticsearch cluster with zero downtime?

  • Have you ever wondered which Elasticsearch metrics matter most from a monitoring point of view?

If your answer to any of the above questions is "yes", then this article is for you.

I'm going to share my experience troubleshooting and resolving Elasticsearch performance issues.

By the end of this article, you should have a good understanding of the key metrics to watch when monitoring the performance and operation of an Elasticsearch cluster.

So, what are the top X Elasticsearch metrics to watch? This article reveals the answer.

1、 Cluster configuration

Elasticsearch is a distributed search engine that delivers fast data indexing and strong search performance.

Out of the box, Elasticsearch's configuration satisfies a great many business scenarios. However, if you want the best performance, it is critical to understand your indexing and search requirements and to make sure the cluster configuration follows Elasticsearch best practices.

Elasticsearch is built to scale with your business, and an optimal configuration ensures better cluster performance. An Elasticsearch cluster can be broken down into measurable elements. You can think of a node as a machine running an Elasticsearch process. An index can be seen as a complete search engine in itself, consisting of one or more shards. You can visualize a shard as a single instance of Apache Lucene that holds documents for indexing and searching, with documents distributed evenly among the shards.

Figure: a three-node cluster, with an index divided into six shards.

Sharding can improve ingest and search performance, but too many shards will slow things down. A sound sharding strategy is vital for a cluster. It is recommended to keep the size of a single shard between 30 and 50 GB.

Having more shards than nodes makes it easy to scale the cluster out later. But over-allocating shards can slow down search operations, because a search first has to hit every shard in the index during the query phase, and then gather and aggregate the results during the fetch phase.

If a shard holds only 5 GB of data, it can be considered underutilized.

As many earlier articles have mentioned, a simple API for viewing the overall health of the cluster is:

GET _cluster/health

The result looks like this (in this single-node example the status is yellow because the five replica shards cannot be assigned anywhere):

{
  "cluster_name" : "my-application",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 11,
  "active_shards" : 11,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 5,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 68.75
}

One of the questions developers often ask is: "What is the optimal number of shards to configure for a cluster, so as to optimize overall cluster performance?" Here is the thing: this question has no one-size-fits-all answer. It depends on your business requirements and the SLA (service level agreement) you need to meet.

The following statistics will help you make the right capacity planning decisions, including but not limited to:

  • The number of documents that need to be indexed per second

  • Single document size

  • Queries per second

  • The growth pattern of data sets

Benchmarking with a small amount of data can help you make the right decisions (this is key).

It is highly recommended that you understand your data and your indexing and search requirements in order to build a balanced and efficient cluster.
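For a feel of how these numbers combine (the figures here are purely illustrative assumptions, not recommendations): at 2,000 documents per second and 1 KB per document, you ingest roughly 2,000 × 1 KB × 86,400 s ≈ 173 GB per day, which at the recommended 30-50 GB per shard corresponds to roughly four to six primary shards' worth of new data each day. Running this kind of back-of-the-envelope math for your own workload is a good first step before fixing the shard count.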

2、 Total available storage space

If your Elasticsearch cluster nodes run out of disk space, cluster performance suffers.

Once available storage falls below a certain threshold, Elasticsearch starts blocking write operations, which in turn affects data ingestion into the cluster.

Many readers will have run into the following error:

ElasticsearchStatusException[Elasticsearch exception [type=cluster_block_exception, reason=blocked by: [FORBIDDEN/12/index read-only / allow

This is the protection mechanism that kicks in when a disk is nearly full.

Specifically, Elasticsearch has three default disk watermarks:

  • Low watermark

cluster.routing.allocation.disk.watermark.low 

Defaults to 85% of disk capacity. Elasticsearch will not allocate new shards to nodes whose disk usage exceeds 85%. It can also be set to an absolute byte value (such as 500mb) to prevent Elasticsearch from allocating shards when less than the specified amount of free space is available. This setting has no effect on the primary shards of newly created indices, or, specifically, on shards that have never been allocated before.

  • High watermark

cluster.routing.allocation.disk.watermark.high 

Defaults to 90% of disk capacity. Elasticsearch will try to relocate shards away from nodes whose disk usage exceeds 90% (moving that node's data to other nodes). It can also be set to an absolute byte value, relocating shards away from a node when it has less than the specified amount of free space. This setting affects the allocation of all shards, whether or not they were previously allocated.

  • Flood stage watermark

cluster.routing.allocation.disk.watermark.flood_stage 

Defaults to 95% of disk capacity. Elasticsearch enforces a read-only index block (index.blocks.read_only_allow_delete) on every affected index. This is the last resort to prevent nodes from running out of disk space. Once disk space is sufficient again, the read-only block must be removed manually, as sketched below.
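For reference, a minimal sketch of lifting the block manually once disk space has been freed (my_index is a placeholder; setting the value to null resets it to the default):

PUT my_index/_settings
{
  "index.blocks.read_only_allow_delete": null
}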

Therefore, it is important to monitor the available storage space in the cluster.
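A quick, hedged way to keep an eye on per-node disk usage is the cat allocation API, which reports disk used, available, and percentage for each node:

GET _cat/allocation?v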

3、 Deleted documents

Documents in Elasticsearch are immutable: they cannot be modified in place. When you delete or update a document, Elasticsearch first marks it as deleted (a logical delete) rather than physically removing it right away. As you continue to index more data, these documents are cleaned up in the background. Logically deleted documents are invisible to search operations, but they continue to take up disk space.

  • If disk space becomes a bottleneck, you can force a segment merge. Segment merging combines small segments into larger ones and purges deleted documents:

POST my_index/_forcemerge

  • If you reindex documents into a new index via the reindex API, you can then delete the old index (delete index); deleting an index physically removes its documents.

  • If your index is updated frequently, the number of deleted documents will grow large.

Therefore, it is best to have a policy in place to clean up logically deleted documents before disk space becomes a bottleneck.
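As a starting point, a hedged sketch for watching how many logically deleted documents an index is carrying (my_index is a placeholder; index, docs.count, docs.deleted, and store.size are standard cat-API columns):

GET _cat/indices/my_index?v&h=index,docs.count,docs.deleted,store.size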

4、 Master node metrics

In production, I recommend configuring dedicated master nodes for your Elasticsearch cluster.

  • The master node improves cluster stability by managing cluster-wide activity (for example, tracking every node, index, and shard in the cluster).

  • The master node also monitors cluster health, makes sure data nodes are not overloaded, and keeps the cluster fault tolerant.

Another suggestion: for large clusters, run at least three master-eligible nodes. This ensures that during a failure event a quorum is still in place to elect a new master node.

You can look at the master node's CPU/memory utilization and JVM heap usage percentage to decide how to size the master instances.

Figure: a cerebro monitoring screenshot.

Generally speaking, because the master node focuses on cluster state, it can usually run on machines with lower CPU/memory resources.
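A quick, hedged way to see which node is the elected master and how loaded each node is (all columns shown are standard cat-API fields):

GET _cat/nodes?v&h=name,node.role,master,heap.percent,cpu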

5、 Data node metrics

Data nodes host the shards containing your indexed documents. They also perform all data-related operations such as search and aggregation, and handle client requests.

Compared with master nodes, data nodes need servers with higher CPU/memory resources.

If your cluster has no dedicated master node, one of the data nodes will act as master, which leads to imbalanced CPU and JVM usage.

Document create, delete, update, and search operations consume significant CPU and IO, so it is important to monitor data node utilization metrics.

From a CPU/memory perspective, you should make sure the data nodes are balanced and not overloaded.
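One hedged way to compare load across data nodes is the node stats API, restricted here to the OS and JVM metric groups:

GET _nodes/stats/os,jvm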

6、 Data write performance metrics

If you are writing a large volume of documents into Elasticsearch, monitor the write latency and indexing rate metrics to verify that indexing throughput meets your business needs.

There are several ways to increase write speed; they can be summarized as the following four measures:

6.1 Bulk operations and multithreaded writes

Use Elasticsearch's bulk API to index a batch of documents in a single request. You can also write to Elasticsearch from multiple threads to make the most of all cluster resources.

Note that document size and cluster configuration both affect write speed. To find your cluster's best throughput, run performance tests with different batch sizes and numbers of concurrent threads.
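As a minimal sketch of the bulk format (my_index and the title field are illustrative), each action line is followed on the next line by its source document:

POST _bulk
{ "index": { "_index": "my_index" } }
{ "title": "first document" }
{ "index": { "_index": "my_index" } }
{ "title": "second document" }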

6.2 Tune the refresh interval appropriately

An Elasticsearch refresh is the process that makes newly indexed documents searchable.

By default, a refresh happens once per second. If your main goal is to improve indexing throughput, you can change Elasticsearch's default refresh interval from 1 second to 30 seconds. Documents then only become visible to search after 30 seconds, which trades search freshness for indexing speed.

To update the refresh interval of a specific index, run:

PUT my_index/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}

In business scenarios where writes are heavy or indexing speed matters more than search freshness, this can be good practice.
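If you can tolerate documents being unsearchable for the duration of a one-off bulk load, you can go a step further and disable refresh entirely by setting the interval to -1 (remember to restore it afterwards); a sketch:

PUT my_index/_settings
{
  "index": {
    "refresh_interval": "-1"
  }
}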

6.3 Dynamically adjust replicas before and after writes

Replicas improve the cluster's availability and, as backups of the primary shards' data, protect against data loss to a degree, but they come at a cost.

During an initial data load, you can disable replicas for faster indexing:

PUT my_index/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}

Once the initial load is complete, you can re-enable replicas to restore the cluster's high availability.
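For example, restoring one replica per primary shard (the value 1 here is illustrative; choose whatever your availability requirements demand):

PUT my_index/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}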

6.4 Reasonable data modeling

Don't index fields that the business never needs to search. Skipping the indexing of redundant fields saves storage space (for example, by setting index: false).

In the following example, the cont field's index property is set to false, so the cont field cannot be searched.

PUT blog_index
{
  "mappings": {
    "properties": {
      "cont": {
        "type": "text",
        "index": false
      }
    }
  }
}

If you try to search on the cont field anyway, an error is reported, returning the following:

  "reason": "Cannot search on field [cont] since it is not indexed."

A picture is worth a thousand words; when modeling, walk through the figure below carefully.

Therefore, it is strongly recommended that you model your data around the actual business scenario: minimize storage, maximize cluster write and search performance, and set the type of each field in the mapping appropriately.

Recommended:

On the importance of data modeling in Elasticsearch

An in-depth interpretation of Elasticsearch's internal data structures

7、 Data search performance metrics

A search request in Elasticsearch is sent to a copy of every shard in the index (primary or replica). The node that receives the request then aggregates the results from all shards and returns them to the calling application.

Shards consume CPU/memory resources, so too many shards can degrade query performance.

If the cluster is updated frequently, your search SLA may suffer. Properly configuring and horizontally scaling the cluster improves both write and search performance.

7.1 Use filters to limit the number of returned documents

Based on my experience with search performance tuning, I strongly recommend adding appropriate filters to limit the number of documents a search query has to consider. With a filter applied, scoring runs only over a limited set of documents, which improves query performance.

You should also monitor the search latency and search rate metrics to investigate performance issues on the search side.

The core idea: don't always return the full data set; exploit the inverted index and filtering to return the minimal set of data that satisfies the business requirement.
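A minimal sketch of this pattern, assuming hypothetical status and title fields: the filter clause narrows the candidate set without scoring (and its results can be cached), so only the match clause is scored, and only over the filtered documents:

GET my_index/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "published" } }
      ],
      "must": [
        { "match": { "title": "elasticsearch" } }
      ]
    }
  }
}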

7.2 Enable slow query logging

I suggest enabling slow query logs in your Elasticsearch cluster to help pinpoint performance issues and capture queries that run longer than a configured threshold.

For example, if your search SLA is 2 seconds, you can configure the search slow log as follows, and any query exceeding this threshold will be logged:

PUT my_index/_settings
{
    "index.search.slowlog.threshold.query.warn" : "2s"
}
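The summary below also mentions index slow logs; the indexing-side counterpart of the threshold above (keeping the same 2-second example) can be set like this:

PUT my_index/_settings
{
    "index.indexing.slowlog.threshold.index.warn" : "2s"
}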

8、 Summary

This article has walked through the most critical metrics for optimizing Elasticsearch performance from the search and indexing perspectives. To sum up, the key points are:

  • Run dedicated master nodes and data nodes in the cluster to ensure the best cluster performance.

  • Improve the cluster's availability by adding data nodes and increasing the number of replica shards.

  • A search query must hit every shard (primary or replica), so too many shards slow down searches.

  • Slow query and slow indexing logs help troubleshoot search and indexing performance problems.

  • Make sure your Elasticsearch cluster has a sensible number of shards, data nodes, and master nodes.

  • Optimize Elasticsearch indexing performance with bulk requests, multithreaded writes, and horizontal scaling of the cluster.

Note: this is a translated article. Original article:

https://iamondemand.com/blog/top-5-elasticsearch-metrics-to-monitor/ 

I have expanded and refined it based on my own business practice.

Recommended reading

  1. Elasticsearch segment merging, explained thoroughly

  2. The underlying logic of Elasticsearch cluster sizing and capacity planning

  3. Elasticsearch advanced tuning methodology: curing slow queries!

  4. Elasticsearch performance optimization guide

  5. Make Elasticsearch fly! Hands-on performance optimization tips

Learn more practical content, faster, in less time!

40%+ of China's Elastic Certified Engineers come from here!

Work on Elasticsearch together with 800+ Elastic enthusiasts worldwide!

Copyright notice: this article was created by [Mingyi world]; please include a link to the original when reposting. Thank you.
