This article grew out of an assignment posted in the "dead beat Elasticsearch" group on Knowledge Planet:
Which Elasticsearch features are basic but very important?
0. A secured cluster matters more than running "naked" with no protection!
1. Templates matter more than one-off mappings.
2. Explicit (strict) mapping matters more than implicit (dynamic) mapping!
3. Aliases matter!
4. Choosing an analyzer that fits the business, or even building a custom one, matters more than using the default!
Please leave a comment and write down your own thoughts.
More than 20 people took part and wrote out, in one go, the core problems and suggestions they had run into in architecture, development, and day-to-day operations.
I have organized and extended their answers below, and I believe they will be of some help for architecture, development, and operations work.
1. Cluster planning
Evaluate each node's disk space carefully.
Use a benchmarking tool such as esrally to measure cluster metrics such as write and search throughput.
Configure a reasonable number of shards for each index.
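As a rough illustration of the shard-count point, the sketch below derives a primary shard count from the expected data volume. The 30 GB target shard size is an assumption chosen for the example, not a figure from this article, and the helper names are hypothetical. Benchmark results should override any rule of thumb like this.

```python
import math

# Assumption for illustration only: aim for roughly 30 GB per primary shard.
TARGET_SHARD_GB = 30.0


def estimate_primary_shards(total_data_gb, target_shard_gb=TARGET_SHARD_GB):
    """Return a primary shard count so that each shard stays near the target size."""
    if total_data_gb <= 0:
        return 1
    return max(1, math.ceil(total_data_gb / target_shard_gb))


def index_settings(total_data_gb, replicas=1):
    """Build the settings body that would be sent when creating the index."""
    return {
        "settings": {
            "number_of_shards": estimate_primary_shards(total_data_gb),
            "number_of_replicas": replicas,
        }
    }


if __name__ == "__main__":
    print(index_settings(200))
```

The number of primary shards is fixed at index creation, so it is worth estimating before the first write.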
2. Data preprocessing
Clean the data before writing it into Elasticsearch.
Elasticsearch is good at retrieval and at aggregations that are not too complex. Leave everything else to a relational database or to a third-party open-source big-data store such as ClickHouse.
3. Data modeling
Compared with a strict model, I prefer dynamic mapping, with types assigned by field-name prefix. Since adopting this set of rules, the field conflicts that kept Kibana from building reports have disappeared, which has been a real relief.
Decide whether each field needs scoring, sorting, aggregation, or filtering. If it does not, turn off the corresponding properties: norms (the .nvd and .nvm files, used for scoring), doc_values (the .dvd and .dvm files, used for sorting and aggregation), and so on.
Templates are more flexible than per-index mappings. Using dynamic templates together with aliases is recommended, especially in business scenarios where the daily data increment is very large.
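The prefix-based rules and the template-plus-alias recommendation can be sketched as a request body. The example assumes the composable index template format (the body sent to PUT _index_template/<name>). The prefixes, index pattern, alias, and helper name are all hypothetical.

```python
import json


def build_index_template(index_pattern, alias, prefix_types):
    """Build an index template whose dynamic templates map field-name prefixes to types."""
    dynamic_templates = [
        {
            # The rule name is arbitrary; it only needs to be unique in the list.
            f"{es_type}_for_{prefix}": {
                "match": f"{prefix}*",
                "mapping": {"type": es_type},
            }
        }
        for prefix, es_type in prefix_types.items()
    ]
    return {
        "index_patterns": [index_pattern],
        "template": {
            "aliases": {alias: {}},
            "mappings": {"dynamic_templates": dynamic_templates},
        },
    }


if __name__ == "__main__":
    # Hypothetical naming convention: kw_ for keyword, num_ for long, dt_ for date.
    body = build_index_template(
        "logs-*", "logs", {"kw_": "keyword", "num_": "long", "dt_": "date"}
    )
    print(json.dumps(body, indent=2))
```

Every new daily index that matches the pattern then picks up the same rules and joins the same alias, so queries can target the alias instead of individual index names.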
If the fields are well defined and fixed, and no new fields will be added later, consider setting "dynamic": "strict" when creating the mapping, to keep mapping explosion under strict control.
Choose an analyzer that fits the business, or even build a custom one.
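For the strict-mapping case, a minimal index body might look like the sketch below. The field names and the analyzer name are hypothetical. The unknown_fields helper is only a local illustration of which top-level fields a strict mapping would not accept; it is not part of Elasticsearch.

```python
import json

# Hypothetical index body: fixed fields, a custom analyzer, and unused features turned off.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "business_text": {  # hypothetical analyzer name
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
    "mappings": {
        "dynamic": "strict",  # documents with unmapped fields are rejected
        "properties": {
            # Filter-only text field: scoring is not needed, so norms are off.
            "title": {"type": "text", "analyzer": "business_text", "norms": False},
            # Lookup-only keyword: never sorted or aggregated, so doc_values are off.
            "trace_id": {"type": "keyword", "doc_values": False},
            "created_at": {"type": "date"},
        },
    },
}


def unknown_fields(doc, body):
    """Return the top-level fields of doc that the mapping does not declare."""
    known = set(body["mappings"]["properties"])
    return sorted(set(doc) - known)


if __name__ == "__main__":
    print(json.dumps(index_body, indent=2))
    print(unknown_fields({"title": "hello", "extra": 1}, index_body))
```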
4. Retrieval
If query speed needs to be optimized and the sort field is essentially fixed, consider configuring index sorting (indexSort) so that queries can terminate early.
By pre-sorting the data, index sorting avoids a full scan and lets the query stop early, which improves query performance. It suits queries that sort by a fixed field very well (note that it does not suit relevance ranking).
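A minimal sketch of the two pieces involved: the index is created with a sort order, and the search asks for the same order without an exact hit count so that collection can stop early. The field name timestamp and the helper names are hypothetical.

```python
import json


def sorted_index_body(sort_field, order="desc"):
    """Index creation body with index sorting on a single date field."""
    return {
        "settings": {"index": {"sort.field": sort_field, "sort.order": order}},
        "mappings": {"properties": {sort_field: {"type": "date"}}},
    }


def early_terminating_search(sort_field, order="desc", size=10):
    """Search body whose sort matches the index sort.

    Skipping the exact total hit count lets each segment stop after
    collecting the first `size` documents.
    """
    return {
        "size": size,
        "sort": [{sort_field: order}],
        "track_total_hits": False,
    }


if __name__ == "__main__":
    print(json.dumps(sorted_index_body("timestamp"), indent=2))
    print(json.dumps(early_terminating_search("timestamp"), indent=2))
```

Index sorting is defined when the index is created, and it adds some cost at write time, so it pays off mainly when reads with that sort order dominate.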
Design queries around actual business needs. It is best to restrict wildcard and fuzzy queries, patterns such as *.*, and anything else that triggers a query over a large amount of data.
Limit from + size (limit + offset) paging, limit the length of the text passed to query_string and similar queries, limit term length, and keep an eye on the slow query log. Elasticsearch is powerful, but it all depends on how it is used, and you never know how callers will hit your interface...
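One way to apply these limits is to check requests in the service layer before they reach the cluster. The sketch below is a hypothetical guard: the 256-character cap is an arbitrary example value, while 10,000 is the Elasticsearch default for index.max_result_window. The slow log thresholds are illustrative and should be set from real latency targets.

```python
MAX_RESULT_WINDOW = 10_000  # Elasticsearch default for index.max_result_window
MAX_QUERY_CHARS = 256       # hypothetical cap chosen for this example


def check_search(from_, size, query_text):
    """Return a list of problems with a search request; an empty list means it may run."""
    problems = []
    if from_ < 0 or size < 0:
        problems.append("from and size must be non-negative")
    if from_ + size > MAX_RESULT_WINDOW:
        problems.append("deep paging: from + size exceeds the result window")
    if len(query_text) > MAX_QUERY_CHARS:
        problems.append("query text is too long")
    if query_text.strip().startswith(("*", "?")):
        problems.append("leading wildcard is not allowed")
    return problems


# Index-level slow log thresholds (example values).
slowlog_settings = {
    "index.search.slowlog.threshold.query.warn": "5s",
    "index.search.slowlog.threshold.query.info": "1s",
    "index.search.slowlog.threshold.fetch.warn": "1s",
}

if __name__ == "__main__":
    print(check_search(0, 10, "disk full error"))
    print(check_search(9995, 10, "*.*"))
```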
5. Hardware resources
5.1 Disk
Is the disk large enough? Should the compression codec be the default, which favors speed, or best_compression?
5.2 Memory
Is the store type in use NIOFS or MMAP? With MMAP, decide which files need to be preloaded into the off-heap cache.
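The disk and memory choices above correspond to index settings that are fixed when the index is created. The following sketch builds such a settings body. The helper name is hypothetical, and the preload list (norms and doc values files) is only an example of what one might choose to warm.

```python
ALLOWED_STORE_TYPES = {"fs", "hybridfs", "niofs", "mmapfs"}


def storage_settings(best_compression, store_type, preload=()):
    """Build index creation settings for the compression codec and store type."""
    if store_type not in ALLOWED_STORE_TYPES:
        raise ValueError(f"unknown store type: {store_type}")
    store = {"type": store_type}
    if preload:
        # File extensions to load into the file system cache when the index opens.
        store["preload"] = list(preload)
    return {
        "settings": {
            "index": {
                "codec": "best_compression" if best_compression else "default",
                "store": store,
            }
        }
    }


if __name__ == "__main__":
    # Trade some speed for disk space, use mmap, and warm norms and doc values.
    print(storage_settings(True, "mmapfs", preload=("nvd", "dvd")))
```

Preloading consumes file system cache, so it only helps when the machine has memory to spare outside the JVM heap.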
6. Cluster management
Remember to configure delayed shard allocation: index.unassigned.node_left.delayed_timeout.
Tune the refresh and flush intervals to actual business needs.
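These knobs can be adjusted on a live index through the update settings API. The sketch below only builds the request body; the values are illustrative and the helper name is hypothetical.

```python
def tuning_settings(delayed_timeout="5m", refresh_interval="30s", flush_threshold="1gb"):
    """Build a body for the update index settings API (PUT <index>/_settings)."""
    return {
        "settings": {
            # Wait this long after a node leaves before reallocating its shards.
            "index.unassigned.node_left.delayed_timeout": delayed_timeout,
            # How often newly written documents become visible to search.
            "index.refresh_interval": refresh_interval,
            # Translog size that triggers a flush.
            "index.translog.flush_threshold_size": flush_threshold,
        }
    }


if __name__ == "__main__":
    print(tuning_settings())
```

A longer refresh interval generally favors write throughput at the cost of search freshness, which is why the right value depends on the business.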
The more comprehensive the cluster's performance monitoring, the better: catch slow queries promptly, assess usage as fully as possible against the business, and be able to spot a bottleneck and upgrade the configuration in time.
In a multi-node cluster, divide node roles sensibly. In particular, separate master nodes, data nodes, and coordinating nodes.
7. Security and disaster recovery
Disabling bulk (wildcard) index deletion matters more than leaving indices open to being deleted at will.
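In Elasticsearch this is controlled by the cluster setting action.destructive_requires_name. The sketch below shows the cluster settings body and a small, hypothetical client-side check that refuses wildcard or _all deletes before they are ever sent.

```python
# Body for the cluster update settings API (PUT _cluster/settings).
cluster_settings = {
    "persistent": {
        # Require explicit index names for destructive operations such as delete.
        "action.destructive_requires_name": True,
    }
}


def is_bulk_delete(target):
    """Return True if a delete target would match more than one explicitly named index."""
    cleaned = target.strip()
    return cleaned == "_all" or "*" in cleaned


if __name__ == "__main__":
    for name in ("logs-2024.01.01", "logs-*", "_all"):
        print(name, is_bulk_delete(name))
```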
Periodic or incremental backups matter more than having no backup (if conditions permit).
Security is a must. We encrypt core fields during log cleaning, the whole ELK stack is reachable only from the intranet, and externally facing service interfaces must also be protected with a soft token.
When offering ES to business development teams, also consider access control and lowering the barrier to entry. It is best to wrap a gateway layer around it for business developers, and then run more sharing sessions and training to improve their understanding of ES.
8. Performance optimization
Disable system swap.
If you have a lot of data, use bulk (batch) operations as much as possible.
(1) Write-side batch operations, including but not limited to: the bulk API for batch writes, updates, and deletes of multiple documents.
(2) Search-side batch operations, including but not limited to: Multi Get (mget), Scroll, and Multi Search.
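For the write side, the bulk API expects newline-delimited JSON: one action line, then (for index actions) one source line, with a trailing newline at the end of the body. A minimal sketch that builds such a body follows. The index name, IDs, and documents are made up for the example.

```python
import json


def build_bulk_body(index, docs):
    """Build an NDJSON body for the bulk API from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Action line: index this document under the given ID.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body(
        "logs-demo",
        [("1", {"level": "info", "msg": "started"}),
         ("2", {"level": "error", "msg": "disk full"})],
    )
    print(body, end="")
```

Sending one such body replaces many single-document requests, which is where the throughput gain comes from.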
Enable the slow query log early, with thresholds set according to business requirements.
Heap size should not exceed 32 GB.
When using scripts, consider the possible slowness, the security risk (in early versions), and other negative effects.
Under the right conditions (for example, an index that is no longer being written to), force-merging segments can improve query speed considerably.
This article lists, level by level, the tips that matter in practice and are often overlooked. The goal is to be useful, not to be complete.
Because the tips cover a lot of ground, no single point is expanded here. Leave a comment if you would like more detail.
If you have better tips from practice, you are also welcome to leave a comment and share them.
Thanks to the community members who contributed their hands-on tips to the discussion.