0、 The question
Hello! I currently need to run a segment merge on a read-only index, and I have a few questions:
1、 Is it best to merge down to a single segment, i.e. max_num_segments=1?
2、 When merging with _forcemerge, will it consume all of the machine's resources and make the service temporarily unavailable? (optimize?max_num_segments=1 is said to consume all resources.) I could not find anything in the official documentation about the resource consumption of _forcemerge.
3、 In ES 6.7 and above, do any of the index.merge parameters need special attention or tuning?
(At present I use all the default values.)
Question from the Elasticsearch Knowledge Planet community: http://t.cn/RmwM3N9
The question touches on basic concepts. To keep the statements accurate, links to the official documentation and other sources are given below, so that you can build a deeper understanding of segment merging.
1、 What is a segment?
Image source: https://medium.com/@hansrajchoudhary_88463/
As shown in the figure, reading from top to bottom:
A cluster contains one or more nodes;
A node contains one or more indices;
An index is similar to a database in MySQL;
Each index consists of one or more shards;
Each shard is a Lucene index instance. You can think of it as a self-contained search engine that indexes a subset of the data in the Elasticsearch cluster and handles the queries against it;
Each shard contains multiple segments, and every segment is an inverted index.
At query time, the results from all segments are gathered and merged into the final result for the shard.
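To make the shard/segment relationship concrete, here is a minimal toy model in Python. It is not how Lucene is implemented; `build_segment` and `search_shard` are hypothetical helpers that only illustrate "every segment is an inverted index, and a shard query merges the per-segment hits".

```python
from collections import defaultdict


def build_segment(docs):
    """Build a toy inverted index (term -> set of doc ids) for one segment."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def search_shard(segments, term):
    """Query every segment and merge the hits into one shard-level result."""
    hits = set()
    for segment in segments:
        hits |= segment.get(term.lower(), set())
    return sorted(hits)


# A toy shard made of two segments (illustrative data only).
segments = [
    build_segment({1: "segment merge basics", 2: "lucene index"}),
    build_segment({3: "force merge api"}),
]
print(search_shard(segments, "merge"))  # [1, 3]
```

Note that the loop in `search_shard` visits every segment, which is why the segment count matters for search cost.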
2、 Why are segments immutable?
Lucene uses a segmented storage architecture in order to achieve high indexing speed.
A batch of written data is stored in one segment, and each segment is persisted on disk as its own set of files.
Because modifying files in place between writes is very expensive, a segment is made immutable, and all subsequent writes go to new segments.
3、 What is segment merging?
Because the automatic refresh process creates a new segment every second (controlled by the dynamic setting refresh_interval), the number of segments can grow rapidly in a short time.
Too many segments cause problems:
Resource consumption: each segment consumes file handles, memory, and CPU cycles;
Slower searches: every search request must check each segment in turn, so the more segments there are, the slower the search.
Elasticsearch solves this problem by merging segments in the background.
Small segments are merged into larger segments, which are in turn merged into even larger ones.
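If you want to see how many segments each shard currently holds, one option is to count the rows returned by `GET _cat/segments/<index>?format=json`. The sketch below only does the counting; the sample rows are made up, the index name `my-index` is hypothetical, and the rows are trimmed to a few of the columns the cat API returns.

```python
from collections import Counter


def segments_per_shard(rows):
    """Count segments per (index, shard, prirep) from _cat/segments JSON rows."""
    counts = Counter()
    for row in rows:
        counts[(row["index"], row["shard"], row["prirep"])] += 1
    return dict(counts)


# Made-up rows in the shape of GET _cat/segments/my-index?format=json
sample_rows = [
    {"index": "my-index", "shard": "0", "prirep": "p", "segment": "_0"},
    {"index": "my-index", "shard": "0", "prirep": "p", "segment": "_1"},
    {"index": "my-index", "shard": "0", "prirep": "p", "segment": "_2"},
    {"index": "my-index", "shard": "1", "prirep": "p", "segment": "_0"},
]

for key, count in sorted(segments_per_shard(sample_rows).items()):
    print(key, count)
```

Running this before and after a merge gives a quick check of whether the segment count actually dropped.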
4、 What does segment merging do?
Segment merging purges old deleted documents from the file system.
Deleted documents (or old versions of updated documents) are not copied into the new, larger segment.
You do not need to do anything to enable merging; it happens automatically while you are indexing and searching.
While indexing, the refresh operation creates new segments and opens them for search.
The merge process selects a small number of similarly sized segments and merges them into a larger segment in the background, without interrupting indexing or searching.
5、 Why merge segments?
The more segments an index has, the lower the search performance and the more memory it consumes.
Index segments are immutable, so you cannot physically delete information from them.
You can delete a document logically, but it is only marked as deleted; nothing is physically removed.
When segments are merged, documents marked as deleted are not copied into the new segment, which reduces the number of documents in the final segment.
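The "mark as deleted, drop on merge" behaviour can be sketched with a toy model. This is an illustration only, not Lucene's merge code: the segment layout (a `docs` dict plus a `deleted` set) and `merge_segments` are assumptions made up for the example.

```python
def merge_segments(segments):
    """Merge toy segments into one new segment, copying only live documents."""
    merged_docs = {}
    for seg in segments:
        for doc_id, source in seg["docs"].items():
            if doc_id not in seg["deleted"]:  # skip docs marked as deleted
                merged_docs[doc_id] = source
    # The new segment starts with no deletion marks.
    return {"docs": merged_docs, "deleted": set()}


# "a" was updated: its old version in seg_a is marked deleted,
# and the new version lives in seg_b. "c" was deleted outright.
seg_a = {"docs": {"a": "v1", "b": "v1"}, "deleted": {"a"}}
seg_b = {"docs": {"a": "v2", "c": "v1"}, "deleted": {"c"}}

merged = merge_segments([seg_a, seg_b])
print(merged["docs"])  # {'b': 'v1', 'a': 'v2'}
```

Four stored documents go in and two live documents come out, which is the reduction in document count described above.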
6、 What are the benefits of segment merging?
Fewer index segments and faster retrieval;
A smaller index (fewer documents).
Reason: segment merging removes the documents that are marked as deleted.
7、 Possible problems with segment merging?
The cost of disk I/O;
On slow systems, segment merging can significantly affect performance.
8、 How many segments to merge down to (usually 1): answering question 1
Earlier versions of the documentation contain the following explanation:
optimize API (now deprecated, but the principle is the same)
The optimize API is best described as the forced merge API. It forces a shard to be merged down to the number of segments specified in the max_num_segments parameter.
The intention is to reduce the number of segments (usually to one) in order to improve search performance.
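In current versions the same request goes to the `_forcemerge` endpoint: `POST /<index>/_forcemerge?max_num_segments=1`. The sketch below only builds that request with the Python standard library and does not send it; the host `http://localhost:9200` and the index name `logs-2019.04` are placeholders, and `build_force_merge_request` is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_force_merge_request(base_url, index, max_num_segments=1):
    """Build (but do not send) a POST /<index>/_forcemerge request."""
    if max_num_segments < 1:
        raise ValueError("max_num_segments must be >= 1")
    query = urlencode({"max_num_segments": max_num_segments})
    url = f"{base_url.rstrip('/')}/{quote(index, safe='')}/_forcemerge?{query}"
    return Request(url, method="POST")


# Placeholder host and index name, for illustration only.
req = build_force_merge_request("http://localhost:9200", "logs-2019.04")
print(req.get_method(), req.full_url)
```

To actually run the merge you would pass `req` to `urllib.request.urlopen`, ideally only against an index that is no longer being written to.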
9、 Resource consumption of segment merging: answering question 2
The official explanation of resource consumption:
Force merge should only be called against an index after you have finished writing to it. Force merge can cause very large (>5GB) segments to be produced, and if you continue to write to such an index then the automatic merge policy will never consider these segments for future merges until they mostly consist of deleted documents. This can cause very large segments to remain in the index which can result in increased disk usage and worse search performance.
In short: it consumes disk I/O and can degrade search performance.
Source: Force merge API (official documentation)
The following explanation comes from the old version of the documentation. The API is deprecated, but the principle is the same and it is still worth referring to.
Be aware that merges triggered by the optimize API are not throttled at all.
They can consume all of the I/O on your nodes, leaving nothing for search and potentially making your cluster unresponsive.
If you plan on optimizing an index, you should use shard allocation (see Migrate Old Indices) to first move the index to a node where it is safe to run.
So yes, it is very resource intensive. It is recommended to run it outside of peak business hours.
In my production environment, I run the segment merge at 1 a.m. (controlled by a script, since nobody uses the system at night).
10、 Recommended parameters: answering question 3
Reduce how often segments are created by changing refresh_interval: the default is 1s; if near-real-time visibility is not critical, 30s is suggested.
index.merge.scheduler.max_thread_count: adjust according to the number of CPU cores.
The parameter advice in the old version of the documentation is of limited value today, but it is still worth a look:
Index performance tips
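As a rough sketch, the two settings above can be sent together as the body of `PUT /<index>/_settings`. The helper names are hypothetical. The default for max_thread_count is documented by Elasticsearch as max(1, min(4, processors / 2)), which `default_merge_threads` mirrors; the documentation also suggests lowering it to 1 on spinning disks.

```python
import json
import os


def default_merge_threads(cpu_count):
    """Mirror the documented default: max(1, min(4, processors / 2))."""
    return max(1, min(4, cpu_count // 2))


def build_index_settings(refresh_interval="30s", merge_threads=None, cpu_count=None):
    """Build a request body for PUT /<index>/_settings (nothing is sent here)."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    if merge_threads is None:
        merge_threads = default_merge_threads(cpu_count)
    return {
        "index.refresh_interval": refresh_interval,
        "index.merge.scheduler.max_thread_count": merge_threads,
    }


# Example for an 8-core node (the core count is an assumption for illustration).
print(json.dumps(build_index_settings(cpu_count=8), indent=2))
```

Passing `merge_threads=1` explicitly is the variant to consider for indices on spinning disks.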