
Learning Elasticsearch? Please read this article first!


I have been studying Elasticsearch for a while. In this article I organize its core knowledge and principles from a beginner's point of view, covering the following 9 aspects in detail. Discussion is welcome.

0. Setting out with questions: how did ES come about?

(1) Reflection: how do we search large-scale data?

For example, when the data in a system grows to 1 billion or 10 billion records, we usually think about the architecture from the following angles:
1) Which database to use? (MySQL, Sybase, Oracle, DM (Dameng), ShenTong, MongoDB, HBase, ...)
2) How to avoid single points of failure? (LVS, F5, A10, ZooKeeper, MQ)
3) How to keep data safe? (hot standby, cold standby, multi-site active-active)
4) How to solve the search problem? (database proxy middleware: mysql-proxy, Cobar, MaxScale, etc.)
5) How to solve statistical analysis? (offline, near real-time)

(2) Solutions for traditional databases

For relational data, we usually use the following (or a similar) architecture to address query and write bottlenecks.
Key points:
1) Data safety is ensured by master-slave backup;
2) Heartbeat monitoring in the database proxy middleware addresses single points of failure;
3) The proxy middleware distributes each query to the slave nodes, which execute it, and then aggregates the results.
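To make point 3 concrete, here is a minimal Python sketch of the proxy's scatter-gather step. It assumes each node holds a disjoint slice of the data and returns its rows already sorted by the query's ORDER BY key; `scatter_gather`, `make_node` and the in-memory nodes are hypothetical and are not the API of mysql-proxy, Cobar or MaxScale.

```python
import heapq
from itertools import islice
from typing import Callable, Iterable, List, Tuple

Row = Tuple[int, str]                  # (sort key, payload)
Node = Callable[[str], Iterable[Row]]  # runs a query, yields rows sorted by key


def scatter_gather(nodes: List[Node], query: str, limit: int) -> List[Row]:
    # Scatter: the proxy forwards the same query to every node.
    partials = [node(query) for node in nodes]
    # Gather: each partial result is already sorted, so a k-way merge
    # produces a globally sorted stream; keep only the first `limit` rows.
    return list(islice(heapq.merge(*partials), limit))


def make_node(rows: List[Row]) -> Node:
    # Hypothetical in-memory stand-in for one database node.
    return lambda query: sorted(rows)


nodes = [make_node([(3, "c"), (1, "a")]), make_node([(2, "b"), (5, "e")])]
print(scatter_gather(nodes, "SELECT ... ORDER BY id LIMIT 3", 3))
# [(1, 'a'), (2, 'b'), (3, 'c')]
```

In practice the proxy would also push the LIMIT down to each node, so that no node returns more than `limit` rows before the final merge.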

(3) The non-relational (NoSQL) database solution

For NoSQL databases, taking MongoDB as an example (the others work on similar principles):
Key points:
1) Data safety is ensured by replica copies;
2) Single points of failure are handled by a node election mechanism;
3) The shard metadata is first looked up in the config database, the request is then dispatched to the relevant nodes, and finally the routing node merges the results.
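The lookup in point 3 can be sketched as a range table, similar in spirit to how a MongoDB router consults the chunk ranges held by the config servers. The boundaries, shard names and functions below are hypothetical, and real MongoDB chunk metadata is richer than this.

```python
import bisect
from typing import List

# Hypothetical chunk table, as a router might cache it from the config servers:
# chunk i covers shard keys in [LOWER_BOUNDS[i], LOWER_BOUNDS[i + 1]).
LOWER_BOUNDS = [0, 1000, 5000]
CHUNK_OWNERS = ["shard-a", "shard-b", "shard-c"]


def shard_for_key(key: int) -> str:
    """Return the shard that owns a single shard-key value."""
    if key < LOWER_BOUNDS[0]:
        raise ValueError("key is below the first chunk")
    # Index of the last chunk whose lower bound is <= key.
    i = bisect.bisect_right(LOWER_BOUNDS, key) - 1
    return CHUNK_OWNERS[i]


def shards_for_range(lo: int, hi: int) -> List[str]:
    """Return every shard that a range query [lo, hi) has to visit."""
    if not LOWER_BOUNDS[0] <= lo < hi:
        raise ValueError("expected LOWER_BOUNDS[0] <= lo < hi")
    first = bisect.bisect_right(LOWER_BOUNDS, lo) - 1
    last = bisect.bisect_left(LOWER_BOUNDS, hi) - 1
    return sorted(set(CHUNK_OWNERS[first:last + 1]))


print(shard_for_key(1200))          # shard-b
print(shards_for_range(500, 1500))  # ['shard-a', 'shard-b']
```

A point lookup on the shard key reaches exactly one shard; a range query that spans chunk boundaries fans out to several shards, and the router merges their results.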

A different approach: what if we put all the data in memory?

As we know, keeping data entirely in memory is unreliable, and it is not realistic either. When data reaches the PB level, assuming 96 GB of memory per node and memory filled completely with data, how many machines would we need? 1 PB = 1024 TB = 1,048,576 GB.
Number of nodes = 1,048,576 / 96 ≈ 10,923 (rounded up to whole machines).
In practice, once data backups are taken into account, the number of nodes is around 25,000. The enormous cost makes this impractical!
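A quick check of this arithmetic as a small Python sketch. The `copies` parameter is an assumption for illustration: the text does not state how many backup copies it counts, but with just one extra copy the raw requirement already passes 20,000 nodes before any headroom.

```python
import math

GIB_PER_PIB = 1024 * 1024  # 1 PB = 1024 TB = 1,048,576 GB, in the article's binary units


def nodes_needed(data_gib: int, ram_per_node_gib: int, copies: int = 1) -> int:
    """Nodes required to hold `copies` full copies of the data entirely in RAM."""
    # A fraction of a node is still a whole machine, so round up.
    return math.ceil(data_gib * copies / ram_per_node_gib)


print(nodes_needed(GIB_PER_PIB, 96))            # 10923
print(nodes_needed(GIB_PER_PIB, 96, copies=2))  # 21846
```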

From the discussion above we can see that neither keeping the data in memory nor keeping it out of memory solves the problem completely.
Keeping everything in memory solves the speed problem, but drives up the cost.
To address these problems at the root, we usually look for solutions along these lines:
1. Store data in order;
2. Separate the data from the index;
3. Compress the data.
This leads to Elasticsearch.
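Lucene, the library at the core of ES, applies these ideas to its inverted index: the index is kept separate from the stored documents, and the document IDs for each term are kept sorted and stored as gaps between consecutive IDs, which are small numbers that compress well. The Python sketch below shows the idea with delta encoding plus a 7-bit variable-length integer (varint). It is a simplified illustration, not Lucene's actual on-disk format (current Lucene versions also pack postings in fixed-size blocks with bit packing).

```python
from typing import List


def encode_postings(doc_ids: List[int]) -> bytes:
    """Delta-encode sorted doc IDs, writing each gap as a 7-bit varint."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        if doc_id < prev:
            raise ValueError("doc IDs must be sorted in ascending order")
        gap = doc_id - prev  # sorted input keeps gaps small and non-negative
        prev = doc_id
        while gap >= 0x80:   # low 7 bits per byte; high bit means "more bytes follow"
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_postings(data: bytes) -> List[int]:
    """Reverse encode_postings: read varint gaps and re-accumulate them."""
    ids: List[int] = []
    prev = gap = shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += gap
            ids.append(prev)
            gap = shift = 0
    return ids


ids = [1_000_000, 1_000_003, 1_000_010, 1_000_200]
blob = encode_postings(ids)
print(len(blob))                     # 7 bytes, versus 16 as four 32-bit integers
print(decode_postings(blob) == ids)  # True
```

Storing the IDs in order is what makes the gaps small; without sorting, the deltas could be as large as the IDs themselves and compression would gain little.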

1. ES fundamentals in one place

1.1 What is ES?

ES is short for Elasticsearch. Elasticsearch is an open-source, highly scalable, distributed full-text search engine. It stores and retrieves data in near real time, scales out to hundreds of servers, and handles petabyte-scale data.

1.2 How are Lucene and ES related?

1) Lucene is just a library. To use it, you must develop in Java and integrate it directly into your application. Worse, Lucene is very complex: you need a solid background in information retrieval to understand how it works.

2) Elasticsearch is also developed in Java and uses Lucene at its core for all indexing and search functions, but its goal is to hide Lucene's complexity behind a simple RESTful API so that full-text search becomes easy.

1.3 What does ES mainly solve?

1) Retrieving relevant data;
2) Returning statistics;
3) Doing both fast.

1.4 How ES works

When an Elasticsearch node starts, it uses multicast (or unicast, if the user changes the configuration) to look for other nodes in the cluster and connect to them. (Multicast was the default in ES 1.x; from ES 2.0 on, multicast discovery was moved into a plugin and unicast is the default.)

1.5 Core ES concepts

1) Cluster

ES can run as a single standalone search server. However, to handle large datasets and achieve fault tolerance and high availability, ES can run on many cooperating servers. Such a collection of servers is called a cluster.

2) Node

Each server that is part of a cluster is called a node.

3) Shard

When there are many documents, a single node may not be enough: memory is limited, disk capacity is insufficient, or it cannot answer client requests quickly enough. In this case the data can be split into smaller pieces called shards, and each shard is placed on a different server.
When the index you query is spread across multiple shards, ES sends the query to each relevant shard and combines the results, without the application knowing the shards exist. In other words, the process is transparent to users.
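When a document is indexed, ES chooses its shard with the documented formula shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document's _id. The Python sketch below illustrates that formula with a swapped-in hash: ES itself uses Murmur3, while `zlib.crc32` here is only a deterministic stand-in, so the shard numbers will not match a real cluster. The `route` function is hypothetical.

```python
import zlib
from typing import Optional


def route(doc_id: str, number_of_primary_shards: int, routing: Optional[str] = None) -> int:
    """Pick the primary shard for a document: hash(_routing) % number_of_primary_shards."""
    # _routing defaults to the document _id when no custom routing value is given.
    key = doc_id if routing is None else routing
    # ES uses Murmur3 here; zlib.crc32 is only a deterministic stand-in for this sketch.
    return zlib.crc32(key.encode("utf-8")) % number_of_primary_shards


for doc_id in ["doc-1", "doc-2", "doc-3"]:
    print(doc_id, "-> shard", route(doc_id, 5))
```

Because the result depends on the number of primary shards, that number could not be changed after an index was created in ES 2.x: a different modulus would look for existing documents on the wrong shards. A search, by contrast, does not know which shard holds a match, so it is sent to every shard and the results are merged.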

4) Replica

To increase query throughput or achieve high availability, you can use shard replicas.
A replica is an exact copy of a shard, and each shard can have zero or more replicas. ES can therefore hold several identical copies of a shard; one of them is chosen to handle operations that change the index, and this special copy is called the primary shard.
When the primary shard is lost, for example when the data on that shard becomes unavailable, the cluster promotes a replica to be the new primary shard.
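The promotion rule can be modelled in a few lines. This is a hypothetical, single-process Python sketch (the `ShardGroup` class is not an ES API): in a real cluster the master node makes this decision and later allocates a fresh replica elsewhere to restore redundancy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShardGroup:
    """All copies of one shard: the primary's node plus the replicas' nodes."""
    primary: str
    replicas: List[str] = field(default_factory=list)

    def on_node_lost(self, node: str) -> None:
        if node in self.replicas:
            # Losing a replica only reduces redundancy.
            self.replicas.remove(node)
        elif node == self.primary:
            if not self.replicas:
                # No copy left to promote: this shard's data is unavailable.
                raise RuntimeError("primary lost and no replica available")
            # Promote a surviving replica to be the new primary.
            self.primary = self.replicas.pop(0)


group = ShardGroup(primary="node-1", replicas=["node-2", "node-3"])
group.on_node_lost("node-1")
print(group.primary, group.replicas)  # node-2 ['node-3']
```

This is also why a shard with zero replicas is a risk: losing the one node that holds it leaves nothing to promote.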

Full-text search means indexing an article so that it can be searched by keyword. Its purpose is comparable to a LIKE query in MySQL, but instead of scanning every row it looks the keywords up in an index.
A full-text index splits the content into words (tokens) according to their meaning and indexes each token separately. For example, a Chinese sentence meaning roughly "what is your passion for" might be segmented into tokens such as "you", "passion", "what" and "come", so searching for "you" or "passion" finds that sentence.
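The idea can be sketched as a toy inverted index in Python. The tokenizer here is an assumption for illustration: it only lowercases and splits English text on non-alphanumeric characters, whereas Chinese text needs a real segmenter such as the IK plugin mentioned later in this article.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Toy analyzer: lowercase, then split on anything that is not a letter or digit.
    # Chinese text needs a real segmenter (for example the IK plugin) instead.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Inverted index: token -> set of IDs of documents containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """OR semantics: a document matches if it contains any query token."""
    hits: Set[int] = set()
    for token in tokenize(query):
        hits |= index.get(token, set())
    return hits


docs = {1: "What is your passion for?", 2: "Passion drives search engines", 3: "Hello world"}
index = build_index(docs)
print(sorted(search(index, "passion")))  # [1, 2]
```

The work done by `search` is driven by the postings of the query tokens rather than by scanning every document, which is the key difference from a `LIKE '%passion%'` table scan.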

1.6 Main ES data-architecture concepts (compared with the relational database MySQL)

(1) A database (DataBase) in a relational database corresponds to an index (Index) in ES.
(2) A database contains N tables (Table); correspondingly, an index contains N types (Type). (This Index/Type analogy reflects the ES 2.x era covered here; mapping types were deprecated in 6.x and removed in 7.x.)
(3) The data in a table consists of rows (ROW) and columns (column, i.e. attributes); correspondingly, a type consists of documents (Document) and fields (Field).
(4) In a relational database, the schema defines the tables, the fields of each table, and the relationships between tables and fields. Correspondingly, in ES a mapping defines how the fields of a type in an index are processed: how the index is built, the field types, whether to keep the original JSON document, whether to compress it, whether to apply word segmentation (analysis), and how to segment.
(5) The database operations insert, delete, update and select correspond in ES to PUT/POST, DELETE, _update and GET.
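The correspondence in point (5) can be written out as HTTP methods and URL paths. This Python sketch only builds (method, path) pairs; the function `es_request` is hypothetical, and the paths follow the ES 2.x layout /{index}/{type}/{id} that matches the version deployed later in this article (from ES 7.x the type segment is gone, for example POST /{index}/_update/{id}).

```python
from typing import Tuple


def es_request(op: str, index: str, doc_type: str, doc_id: str = "") -> Tuple[str, str]:
    """Map a SQL-style operation to an ES 2.x-style (HTTP method, path) pair."""
    if op in {"insert", "delete", "update", "get"} and not doc_id:
        raise ValueError(f"{op} needs a document id in this sketch")
    routes = {
        "insert": ("PUT", f"/{index}/{doc_type}/{doc_id}"),           # INSERT
        "delete": ("DELETE", f"/{index}/{doc_type}/{doc_id}"),        # DELETE
        "update": ("POST", f"/{index}/{doc_type}/{doc_id}/_update"),  # UPDATE (partial)
        "get": ("GET", f"/{index}/{doc_type}/{doc_id}"),              # SELECT by id
        "search": ("GET", f"/{index}/{doc_type}/_search"),            # SELECT ... WHERE
    }
    return routes[op]


print(es_request("update", "blog", "article", "1"))  # ('POST', '/blog/article/1/_update')
```

Sending POST /{index}/{type} without an id lets ES generate one, which is why both PUT and POST appear as the counterpart of insert.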

1.7 What is ELK?

ELK = Elasticsearch + Logstash + Kibana
Elasticsearch: back-end distributed storage and full-text search;
Logstash: log processing, the "porter" that moves data in;
Kibana: data visualization.
The ELK stack forms a powerful management chain covering distributed data storage, visual querying and log parsing. The three components complement each other's strengths and together handle distributed big-data processing.
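To illustrate the "porter" role, here is a Python sketch of what a Logstash-style pipeline does between a raw log file and Elasticsearch: parse each line into fields (Logstash typically does this with its grok filter) and emit documents in the newline-delimited format expected by the ES _bulk API. The log format, index and type names are hypothetical; in a real ELK setup Logstash does this work, and Kibana then queries the indexed fields.

```python
import json
import re
from typing import Iterable

# Hypothetical log format for this sketch: "<timestamp> <LEVEL> <message>"
LOG_LINE = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def to_bulk_body(lines: Iterable[str], index: str, doc_type: str) -> str:
    """Parse raw log lines and build a newline-delimited _bulk request body."""
    out = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # a real pipeline would tag or route unparsable lines
        # Action line: what to do and where (2.x-style, with _type).
        out.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the parsed document itself.
        out.append(json.dumps(match.groupdict()))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in out)


print(to_bulk_body(["2016-08-18T21:10:00 INFO node started", "garbage"], "logs", "app"), end="")
```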

2. ES features and advantages

1) Distributed, real-time document storage; every field can be indexed and searched.
2) A distributed search engine for real-time analytics.
Distributed: an index is split into multiple shards, and each shard can have zero or more replicas. Each data node in the cluster can host one or more shards and coordinates and handles the various operations;
in most cases rerouting and load balancing happen automatically.
3) It scales out to hundreds of servers and handles petabytes of structured or unstructured data. It can also run on a single PC (tested).
4) It supports plugins: word segmentation (analysis) plugins, sync plugins, Hadoop plugins, visualization plugins, etc.

3. ES performance

3.1 Performance results

(1) Hardware configuration:
CPU: 16 cores (AuthenticAMD)
Memory: 32 GB total
Disk: 500 GB total, non-SSD

(2) Results measured on the hardware above:
1) Average indexing throughput: 12,307 docs/s (document size: 40 B/doc)
2) Average CPU usage: 887.7% (16 cores; 55.48% per core on average)
3) Resulting index size: 3.30111 GB
4) Total data written: 20.2123 GB
5) Total test time: 28 min 54 s

3.2 Performance testing tool: esrally (recommended)

For usage, see: http://blog.csdn.net/laoyang360/article/details/52155481

4. Why use ES?

4.1 Notable ES users in China and abroad

1) In early 2013, GitHub dropped Solr in favor of Elasticsearch for its large-scale search: "GitHub uses Elasticsearch to search 20 TB of data, including 1.3 billion files and 130 billion lines of code."

2) Wikipedia: launched a core search architecture based on Elasticsearch.
3) SoundCloud: "SoundCloud uses Elasticsearch to provide instant and accurate music search for 180 million users."
4) Baidu: Baidu now uses Elasticsearch extensively for text data analysis. It collects metrics and user-defined data from all Baidu servers, analyzes and displays them across multiple dimensions, and helps locate instance-level or business-level anomalies. It currently covers more than 20 business lines inside Baidu (including casio, cloud analytics, Baidu Union, forecasting, Wenku, Zhida, Wallet, risk control, etc.). The largest single cluster has 100 machines and 200 ES nodes and ingests more than 30 TB of data per day.

4.2 We need it too

In real-world development almost every system has a search feature. Once search reaches a certain scale it becomes harder to maintain and extend, so many companies split search out into a separate service built on tools such as Elasticsearch.

In recent years Elasticsearch has developed rapidly and has outgrown its original role as a pure search engine: it now offers data aggregation (aggregations) and visualization. If you have millions of documents that need to be located by keyword, Elasticsearch is a natural first choice. And if your documents are JSON, you can also treat Elasticsearch as a kind of "NoSQL database" and use its aggregations for multi-dimensional analysis of the data.

[From Zhihu: Pan Fei, architect at Reku] ES can replace a traditional DB in some scenarios
Personally, I think Elasticsearch works well as internal storage and its efficiency is basically sufficient. It can also replace a traditional DB in some respects, provided your business has no special transaction requirements and does not need fine-grained permission management, because ES is weak in both areas.
Because our ES use case is only aggregating data over a time range, with no large volume of single-document requests (such as fetching a user's document by userid, a typical NoSQL use case), whether ES can replace NoSQL is something you will need to test yourself.
If I had the choice, I would try ES as a replacement for traditional NoSQL, because its scale-out mechanism is so convenient.

5. What are the application scenarios for ES?

There are usually two situations:

1) A new system is developed with ES as its storage and search server;
2) An existing system is upgraded to support full-text search and needs ES.
For how to use these two architectures, see:
http://blog.csdn.net/laoyang360/article/details/52227541

ES use cases at top-tier companies:

1) How Sina uses ES to analyze and process 3.2 billion real-time logs: http://dockone.io/article/505
2) Alibaba builds its own log collection and analysis system on ES: http://afoo.me/columns/tec/logging-platform-spec.html
3) Youzan's business log processing with ES: http://tech.youzan.com/you-zan-tong-ri-zhi-ping-tai-chu-tan/
4) Implementing on-site search with ES: http://www.wtoutiao.com/p/13bkqiZ.html

6. How do you deploy ES?

6.1 Deploying ES (no installation required)

1) Zero configuration, works out of the box;
2) No cumbersome installation or configuration;
3) Java version: 1.7 at minimum; I use 1.8:
[root@laoyang config_lhy]# echo $JAVA_HOME
/opt/jdk1.8.0_91
4) Download address:
https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/zip/elasticsearch/2.3.5/elasticsearch-2.3.5.zip
5) Start it (ES 2.x refuses to run as root, so start it as a regular user):
cd /usr/local/elasticsearch-2.3.5
./bin/elasticsearch        (run in the foreground)
./bin/elasticsearch -d     (run in the background as a daemon)
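To confirm the node actually came up, you can query the cluster health endpoint. This is a minimal Python sketch using only the standard library; it assumes the node listens on the default HTTP port 9200 on localhost, and the function names are hypothetical.

```python
import json
import urllib.request


def cluster_status(body: str) -> str:
    """Extract the status (green, yellow or red) from a _cluster/health response body."""
    return json.loads(body)["status"]


def fetch_cluster_status(base_url: str = "http://localhost:9200") -> str:
    """Ask a running node for cluster health; assumes the default HTTP port 9200."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=5) as resp:
        return cluster_status(resp.read().decode("utf-8"))
```

Calling `fetch_cluster_status()` after `./bin/elasticsearch -d` should return green or yellow once the node is up; while it is still starting, `urlopen` raises `urllib.error.URLError`. On a single node, yellow is normal whenever indices have replicas configured, because a replica is never allocated on the same node as its primary.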

6.2 Essential ES plugins

Detailed installation and usage of the essential Head, Kibana, IK (Chinese analyzer) and Graph plugins:
http://blog.csdn.net/column/details/deep-elasticsearch.html

6.3 One-click installation of ES on Windows

A self-written .bat script implements one-click installation on Windows:
1) Installs ES and the essential plugins (head, kibana, IK, logstash, etc.) in one step;
2) Runs ES as a service after installation;
3) Saves you at least 2 hours compared with doing it by hand; very efficient.
Script description:
http://blog.csdn.net/laoyang360/article/details/51900235

7. ES external interfaces (the developer's focus)

1) Java API

http://www.ibm.com/developerworks/library/j-use-elasticsearch-java-apps/index.html

2) RESTful API

Implementations of common create, read, update and delete operations:
http://blog.csdn.net/laoyang360/article/details/51931981
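Because the RESTful interface is plain HTTP plus JSON, any HTTP client can talk to ES. As a minimal Python sketch using only the standard library, this builds (but does not send) a search request with a `match` query, a full-text query that analyzes the input text before looking it up. The host, index and field names are hypothetical.

```python
import json
import urllib.request


def build_search_request(base_url: str, index: str, field: str, text: str,
                         size: int = 10) -> urllib.request.Request:
    """Build (but do not send) a full-text search request using a match query."""
    body = {"query": {"match": {field: text}}, "size": size}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("http://localhost:9200", "blog", "title", "elasticsearch")
print(req.get_method(), req.full_url)  # POST http://localhost:9200/blog/_search
```

To actually run the search against a live node, pass the request to `urllib.request.urlopen(req)` and parse the JSON response; matching documents are under `hits.hits`.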

8. Where to get help with ES problems

1) International: https://discuss.elastic.co/
2) China: http://elasticsearch.cn/


9. Anything else?

"Deep Dive into Elasticsearch: A Methodology": 10 power moves for ordinary programmers to work efficiently and keep improving (full version, free)
https://blog.csdn.net/laoyang360/article/details/79293493

2016-08-18 21:10, pondered at home by my bedside

Author: Mingyi World
When reprinting, please credit the source. Original address:
http://blog.csdn.net/laoyang360/article/details/52244917

