We know that master-slave replication is the cornerstone of high availability. When a slave goes down, requests can still be sent to the master or to the other slaves. When the master goes down, however, the system can only serve reads; write requests can no longer be executed.
So the master-slave replication architecture faces a serious problem: once the master is down, writes fail, and nothing automatically promotes a slave to master. In other words, there is no automatic failover.
Imagine it is late at night and you are with your girlfriend ... (10,000 words omitted here) when the master suddenly goes down. You can't jump out of bed, pull your pants on, switch master and slave by hand, and then tell the other programmers to point their services at the new master.
After a scramble like that, your girlfriend may well have become your ex-girlfriend. That won't do, so we need a high-availability solution. Redis officially provides one: Sentinel.
How the Redis Sentinel cluster works
Opening remarks
“Technology iterates quickly, but the thinking distilled from it pays off for a lifetime. So don't worry about a midlife crisis. People who worry about one have usually stopped growing. As long as we keep growing and keep pushing the limits of what we understand, there is no need to worry: the world always needs that kind of talent.”
What is Sentinel
“Brother 65: Ma Ge, I don't have a girlfriend yet, but I want to prepare for a rainy day and master Sentinel mode, so that I'm never interrupted in the middle of the night. Tell me how Sentinel works.”
The setup uses three Sentinels to form a cluster and three data nodes (one master and two slaves), as shown in the figure below:
Redis Sentinel cluster
Brother 65, have you heard of Zhang Sanfeng, founder of the Wudang sect? A Redis master-slave architecture is like Wudang, and the sect leader is the master. If the leader dies, a capable successor has to be chosen from among the Seven Heroes of Wudang. That calls for a department that watches over the life and death of the leader and the health of the other Wudang disciples, can hold a vote among the disciples to elect a new leader, and then holds a press conference to announce the new leader to the world. That department is Sentinel.
When electing a new leader, the Sentinels face the following problems:
- How to judge whether the leader is really dead, since he may only be feigning death;
- Which Wudang disciple to choose as the new leader;
- How to announce the new leader to all Wudang disciples (slaves and master) and to the whole martial arts world (clients) through the press conference.
The main tasks of the Sentinel department are therefore: monitor all of Wudang, choose a new leader, and notify all of Wudang and the whole martial arts world.
The main tasks of the Sentinel mechanism
Sentinel is a running mode of Redis. It focuses on monitoring the running state of Redis instances (master and slave nodes) and, when the master fails, selects a new master and switches over through a series of mechanisms, achieving failover and keeping the whole Redis system available. According to the official Redis documentation (https://redis.io/topics/sentinel), Sentinel has the following capabilities:
- Monitoring: continuously check whether the master and the slaves are working as expected.
- Automatic master switch: when the master fails, Sentinel starts the automatic recovery process and selects one of the slaves as the new master.
- Notification: have the slaves execute replicaof to synchronize with the new master, and tell clients to connect to the new master.
A Sentinel is itself a Redis process, except that it does not serve external reads and writes. The number of Sentinels is usually configured to be odd. Why? Let Code Byte explain step by step.
“Brother 65: So how does this mysterious Sentinel department deliver these three capabilities?”
Let's first look at Sentinel as a whole to get a rough picture of the full process, then analyze each task in detail. We start with monitoring.
Sentinel is just a special department of Wudang disciples. By default, each Sentinel sends a PING command by carrier pigeon once per second to all Wudang disciples, the leader, and the other Sentinels (that is, the master, the slaves, and the other Sentinels). If a slave does not respond to the Sentinel's PING within the specified time, the Sentinel concludes the fellow may have kicked the bucket and records him as offline.
If the master does not respond to the Sentinel's PING within the specified time, the Sentinel judges the leader to be offline and, after the confirmation step described below, begins the process of automatically switching the master.
A reply to PING falls into one of two categories:
- Valid reply: any one of +PONG, -LOADING, or -MASTERDOWN;
- Invalid reply: any reply other than the three valid ones, or no reply at all within the specified time.
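The classification above can be sketched in a few lines of Python. This is only an illustration of the rule, not Sentinel's actual implementation; the function name `is_valid_ping_reply` and the convention that `None` stands for a timed-out PING are assumptions made for this example.

```python
# The three replies that count as valid, as listed above.
VALID_PING_REPLIES = ("+PONG", "-LOADING", "-MASTERDOWN")


def is_valid_ping_reply(reply):
    """Return True if a raw PING reply line counts as valid.

    `reply` is the raw reply line as a string, or None when no reply
    arrived within the specified time (a hypothetical convention).
    """
    if reply is None:
        # No reply within the timeout is an invalid reply.
        return False
    # str.startswith accepts a tuple of prefixes.
    return reply.strip().startswith(VALID_PING_REPLIES)
```

An error reply such as `-ERR unknown command` is treated as invalid here, just like a timeout.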
“Brother 65: How do the Sentinels decide that the leader has kicked the bucket? What if he's only playing dead?”
To guard against a leader who is merely feigning death, Sentinel uses two signals: subjective offline and objective offline.
Sentinel uses the PING command to check whether the leader and the slaves are alive. On an invalid reply, the Sentinel marks the node as subjectively offline. If the node is an ordinary Wudang disciple, that is, a slave, marking it subjectively offline is all that needs to be done.
Because the master is still around, losing a slave has little effect on Wudang. The meetings, the sparring and swordplay, and the feasting all carry on.
If the node is the master, however, the Sentinel cannot simply mark it subjectively offline and open a new leader election.
The judgment may be wrong and the leader may still be alive. Once a leader switch starts, the subsequent election, the press conference, and the slaves' data synchronization with the new master all take time and consume a lot of resources.
So Sentinel must reduce the probability of misjudgment. Misjudgment usually happens when the cluster network is under heavy load or congested, or when the master itself is under heavy load.
Since a single person misjudges easily, let everyone vote. Sentinel works the same way: it is deployed as a cluster of multiple instances, called a Sentinel cluster. With several Sentinel instances judging together, a single Sentinel with a bad network connection can no longer wrongly declare the master offline on its own.
At the same time, the probability that the networks of several Sentinels are unstable simultaneously is small, so deciding together also lowers the misjudgment rate.
No single Sentinel has the final say on the master. Only when enough Sentinels (typically more than half; the exact number is the quorum setting described below) have judged the master subjectively offline is it marked objectively offline. At that point it is an objective fact: the leader really is dead, and not even a reincarnated Hua Tuo could save him.
Only when the master is judged objectively offline does Sentinel go on to start the master-slave switching process.
The difference between subjective offline and objective offline
Simply put, subjective offline means that one Sentinel itself believes a node is down. Objective offline means that, after this Sentinel has consulted the other Sentinels, a certain number of them also believe the node is down.
This "certain number" is the quorum, which is set in the Sentinel monitoring configuration:
# sentinel monitor <master-name> <master-host> <master-port> <quorum>
# For example:
sentinel monitor mymaster 127.0.0.1 6379 2
This configuration item tells Sentinel which master node to monitor:
- sentinel monitor: the monitoring directive.
- mymaster: the name of the master node, which you can choose freely.
- 127.0.0.1: the IP address of the monitored master node; 6379 is its port.
- 2: the quorum. The master is set to objectively offline, and failover can proceed, only when two or more Sentinels consider the master unavailable.
The criterion for objective offline is that at least quorum Sentinels judge the master subjectively offline. With N Sentinel instances the quorum is usually set to N/2 + 1, so that the decision is a majority one.
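As a rough sketch of the quorum rule, assuming each Sentinel's opinion is reduced to a boolean "I consider the master subjectively offline", the check could look like this. The function names `is_objectively_down` and `majority_quorum` are made up for illustration and are not part of Redis.

```python
def majority_quorum(sentinel_count):
    """Common choice of quorum for N Sentinels: N/2 + 1 (integer division)."""
    return sentinel_count // 2 + 1


def is_objectively_down(sdown_opinions, quorum):
    """Return True when at least `quorum` Sentinels report subjective offline.

    `sdown_opinions` holds one boolean per Sentinel, including the
    Sentinel that is doing the counting.
    """
    agreeing = sum(1 for opinion in sdown_opinions if opinion)
    return agreeing >= quorum
```

With three Sentinels and `quorum` set to 2, as in the `sentinel monitor mymaster 127.0.0.1 6379 2` example, two agreeing Sentinels are enough and a single one is not.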
Switching the master automatically
“Brother 65: Now that the master has been judged offline, it's time to choose a new leader.”
Sentinel's second task is to select a new master. A new leader has to be chosen from the Wudang disciples according to definite rules, and once chosen, the new master leads all the disciples onward.
The new leader is picked by a "filter" + "score" strategy: first an audition filters out the unqualified candidates, then everyone who passed is scored and ranked, and the candidate with the highest score becomes the new master.
As shown in the figure:
Selecting the new master
A candidate who keeps dropping off the network is a bad choice. Even if he became master, the network would soon fail again and we would have to elect yet another master. That is no joke, so such candidates must be ruled out!
“Brother 65: What are the filtering criteria?”
- Current online state of the slave: slaves that are offline are discarded outright;
- Previous network connection state, judged with down-after-milliseconds * 10: if a slave has been disconnected from the master for longer than ten times down-after-milliseconds (plus the time the master itself has been unavailable from the Sentinel's point of view), we have reason to believe that the slave's network condition is poor, and it is filtered out.
After the unsuitable slaves are filtered out, scoring begins. Scoring runs for up to three rounds, one rule per round:
- Slave priority: the slave-priority configuration item (renamed replica-priority in Redis 5) assigns different priorities to different slaves (some people simply have backing behind the scenes). The slave with the highest priority, which in Redis means the lowest configured number, is promoted to new master directly; a value of 0 means the slave is never promoted. If priorities are equal, move on to the next rule.
- Replication progress, compared through master_repl_offset (whoever's martial arts is closest to the previous leader's is the strongest): the slave whose replication progress is closest to the old master's wins. If it is still a tie, move on to the next rule.
- Slave runID: with equal priority and replication progress, the slave with the smallest runID scores highest and is selected as the new master (an arbitrary tie-break, much like ranking by seniority).
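The filter and the three scoring rounds can be put together in a short Python sketch. It is a simplified model under stated assumptions, not the Redis source: the `Slave` record, its field names, and the `master_down_ms` parameter are hypothetical, and "lower priority number wins, 0 means never promote" follows the Redis configuration convention.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    run_id: str           # unique run ID of the slave
    online: bool          # current online state as seen by the Sentinel
    priority: int         # slave-priority: lower number wins, 0 = never promote
    repl_offset: int      # how much of the master's data was replicated
    disconnected_ms: int  # how long the link to the old master has been down


def select_new_master(slaves: List[Slave], down_after_ms: int,
                      master_down_ms: int = 0) -> Optional[Slave]:
    """Filter out unsuitable slaves, then rank the rest by the three rules."""
    limit = down_after_ms * 10 + master_down_ms
    candidates = [
        s for s in slaves
        if s.online and s.priority != 0 and s.disconnected_ms <= limit
    ]
    if not candidates:
        return None
    # Round 1: lowest priority number; round 2: largest replication
    # offset; round 3: lexicographically smallest run ID.
    return min(candidates, key=lambda s: (s.priority, -s.repl_offset, s.run_id))
```

Sorting by a single tuple key gives the "move on to the next rule only on a tie" behavior for free, which is why the three rounds fit in one `min` call.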
“Brother 65: Why hold a press conference?”
Electing a new master, a new sect leader, is a big deal. How could the world not be told? Besides, the slaves also need to know who the new leader is so that they can follow him.
As its last task, Sentinel sends the new master's connection information to the other slaves and has them execute the replicaof command to connect to the new master and replicate its data, learning all of the new leader's martial arts.
In addition, Sentinel informs the whole martial arts world (the clients) of the new leader's connection information, so that everyone who wants to visit or seek advice can find him and matters can be handed to the new leader for decision (read and write requests are redirected to the new master).
Figure: the Sentinel's tasks and their goals
How the Sentinel cluster works
The Sentinel department does not fight alone. Several Sentinels work together as a Sentinel cluster, so that even if some of them are done in by Old Wang, the others can still cooperate to complete the monitoring, elect a new leader, and notify the slaves, the master, and everyone in the martial arts world (the clients).
When a Sentinel cluster is deployed, each Sentinel's configuration only sets the IP and port of the master to monitor. No connection information for the other Sentinels is configured.
sentinel monitor <master-name> <ip> <redis-port> <quorum>
So how do the Sentinels learn about one another? How do they discover the slaves and monitor them? And which Sentinel performs the master-slave switch?
With these questions in mind, follow Code Byte to the source and into the heart of the Sentinel cluster.
pub/sub: communication between Sentinels and discovery of slaves
“Brother 65: How do the Sentinels get to know one another?”
Sentinels can communicate, meet, and get things done together mainly thanks to the pub/sub (publish/subscribe) mechanism of Redis.
A Sentinel establishes a connection to the master and uses the publish/subscribe mechanism the master provides to publish its own information, such as height and weight, whether it is single, its IP and port ...
The master has a dedicated channel named __sentinel__:hello that Sentinels use to publish and subscribe to one another's messages. Think of __sentinel__:hello as a WeChat group: the Sentinels use the group set up on the master to post their own news and to follow the news from the other Sentinels.
Redis pub/sub mechanism
Once multiple Sentinel instances have published and subscribed on the master, they know each other's IP addresses and ports and can discover and connect to one another.
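To make the discovery idea concrete, here is a toy in-memory model in Python. It is not a Redis client and does not reproduce the real hello message: `ToyMaster`, `ToySentinel`, and the simplified "ip,port" payload are all assumptions made for this sketch (real hello messages carry more fields), and 26379 is simply used as an example port.

```python
from collections import defaultdict

HELLO_CHANNEL = "__sentinel__:hello"


class ToyMaster:
    """In-memory stand-in for the master's pub/sub facility."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Deliver the message to every subscriber of the channel.
        for callback in list(self._subscribers[channel]):
            callback(message)


class ToySentinel:
    """Learns about its peers purely from the hello channel."""

    def __init__(self, ip, port, master):
        self.addr = (ip, port)
        self.known_peers = set()
        self.master = master
        master.subscribe(HELLO_CHANNEL, self._on_hello)

    def _on_hello(self, message):
        ip, port = message.split(",")
        peer = (ip, int(port))
        if peer != self.addr:  # ignore our own announcement
            self.known_peers.add(peer)

    def announce(self):
        self.master.publish(HELLO_CHANNEL, f"{self.addr[0]},{self.addr[1]}")
```

After every Sentinel has called `announce()` once, each one knows the addresses of all the others, even though none of them was configured with any peer address.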
Redis manages messages separately by channel, and the channels here are effectively different WeChat groups.
“Brother 65: The Sentinels are now connected to one another, but they still need connections to the slaves, otherwise they can't monitor them. How do they discover the slaves and monitor them?”
Exactly. Connecting the Sentinels into a cluster is not enough. They also need connections to the slaves, or they cannot monitor them and cannot run heartbeat checks on the master and the slaves.
Besides, after a master-slave switch the slaves must be told to connect to the new master and synchronize data. For the principles of data synchronization in the master-slave architecture, see 《Redis High Availability: You Call This Master-Slave Architecture Data Consistency Synchronization》.
The key is again the master. A Sentinel sends the INFO command to the master, and the leader naturally knows which slaves follow him. So on receiving the command, the master returns its slave list to the Sentinel.
Based on the slave list in the master's reply, the Sentinel connects to every slave and uses those connections to monitor the slaves continuously.
As shown in the figure, Sentinel 2 sends the INFO command to the master, the master returns the slave list to Sentinel 2, and Sentinel 2 uses the connection information in the list to connect to each slave and monitor it continuously over that connection.
The other Sentinels monitor the slaves in the same way.
Obtaining slave information with the INFO command
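The slave list arrives as plain text in the replication section of the INFO reply, with one `slaveN:ip=...,port=...` line per slave. A minimal parsing sketch in Python follows; the helper name `parse_slaves` and the sample addresses are invented for illustration, and this is not how Sentinel itself is implemented.

```python
# A made-up INFO replication reply for a master with two slaves.
SAMPLE_INFO = (
    "# Replication\r\n"
    "role:master\r\n"
    "connected_slaves:2\r\n"
    "slave0:ip=10.0.0.2,port=6379,state=online,offset=1024,lag=0\r\n"
    "slave1:ip=10.0.0.3,port=6379,state=online,offset=1024,lag=1\r\n"
    "master_repl_offset:1024\r\n"
)


def parse_slaves(info_text):
    """Extract (ip, port, state) records from slaveN lines of INFO output."""
    slaves = []
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Keep only keys of the form slave0, slave1, ...
        if not sep or not key.startswith("slave") or not key[5:].isdigit():
            continue
        fields = dict(item.split("=", 1) for item in value.split(","))
        slaves.append({
            "ip": fields["ip"],
            "port": int(fields["port"]),
            "state": fields.get("state"),
        })
    return slaves
```

Calling `parse_slaves(SAMPLE_INFO)` yields one record per slave, which is all a monitor needs in order to open a connection to each of them.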
Choosing the Sentinel that performs the master-slave switch
“Brother 65: When the master has died and there are so many Sentinels, which one carries out the switch to the new master?”
Just as with judging the master objectively offline, this is decided by a vote.
Any Sentinel that judges the master subjectively offline sends the is-master-down-by-addr command to its fellow Sentinels. Each of them replies according to the state of its own connection to the master: Y is a vote in favor, N a vote against.
Once a Sentinel has collected enough votes in favor, it can mark the master objectively offline. The number of votes required is set by the quorum item in the Sentinel configuration file.
sentinel monitor <master-name> <ip> <redis-port> <quorum>
For example, with a group of 3 Sentinels, quorum can be set to 2. When a Sentinel has 2 votes in favor, it can mark the master objectively offline. This count includes the Sentinel's own vote.
A Sentinel that has collected enough votes can then send a command to the other Sentinels stating that it wants to perform the master-slave switch, and ask them to vote. This vote is called the Leader election.
Becoming Leader is not that simple; you need some real skill. Both of the following conditions must be met:
- More than half of the Sentinels vote for it;
- The number of votes in favor is greater than or equal to the quorum value in the configuration file.
Suppose the Sentinel cluster has only 2 instances. A Sentinel that wants to become Leader must then obtain 2 votes, not 1. So if one Sentinel goes down, the cluster can no longer perform a master-slave switch. That is why we usually configure at least 3 Sentinel instances.
This is also why Sentinel clusters are deployed in odd numbers: an even number adds an instance without tolerating any more failures.
The election process is shown in the figure below:
Redis Sentinel performing the master-slave switch
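The two Leader conditions, and the reason odd cluster sizes are preferred, can be checked with a small Python sketch. The function names `can_become_leader` and `tolerated_failures` are hypothetical and only model the counting rule described above.

```python
def can_become_leader(votes, sentinel_count, quorum):
    """Both conditions: more than half of all Sentinels, and at least quorum."""
    majority = sentinel_count // 2 + 1
    return votes >= max(majority, quorum)


def tolerated_failures(sentinel_count):
    """How many Sentinels may be down while a majority is still reachable."""
    return sentinel_count - (sentinel_count // 2 + 1)
```

With 2 Sentinels a candidate needs both votes, so losing either instance blocks the switch. Going from 3 to 4 Sentinels still tolerates only 1 failure, while 5 tolerates 2, which is the arithmetic behind deploying an odd number.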
Client event notification through pub/sub
“Brother 65: The new master has been chosen. How is it announced to the world?”
With a press conference, of course. The news media are invited to report and spread the word, and interested people subscribe to the events they care about and act on them.
Redis does something similar. Different events are published through the pub/sub mechanism, and clients subscribe to the messages. A client can subscribe to a Sentinel's messages; Sentinel offers many subscription channels, and different channels carry different key events of the master-slave switch.
In other words, different events are published in different "WeChat groups", and whoever is interested in an event joins that group.
Master offline events
- +sdown: the instance enters the subjectively offline state;
- -sdown: the instance leaves the subjectively offline state;
- +odown: the instance enters the objectively offline state;
- -odown: the instance leaves the objectively offline state;
Slave reconfiguration events
- +slave-reconf-sent: the Sentinel has sent the replicaof command to reconfigure the slave;
- +slave-reconf-inprog: the slave has been configured with the new master but has not synchronized yet;
- +slave-reconf-done: the slave has been configured with the new master and has finished synchronizing data with it;
New master switch
- +switch-master: the master's address has changed.
Knowing these channels, a client can subscribe to messages from Sentinel. After reading the Sentinel configuration file, the client has the Sentinel's address and port and can connect to it.
Then the client can execute subscription commands to receive the different event messages.
For example, the command SUBSCRIBE +odown subscribes to the event "an instance has entered the objectively offline state".
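Once a client is subscribed, the event it usually cares about most is +switch-master, whose payload has the form `<master name> <old ip> <old port> <new ip> <new port>`. A minimal Python sketch for turning that payload into something a client can act on is shown below; the `MasterSwitch` type and the `parse_switch_master` helper are names invented for this example, and the network subscription itself is left out.

```python
from typing import NamedTuple, Tuple


class MasterSwitch(NamedTuple):
    name: str                  # master name, e.g. "mymaster"
    old_addr: Tuple[str, int]  # address of the failed master
    new_addr: Tuple[str, int]  # address clients should now connect to


def parse_switch_master(payload: str) -> MasterSwitch:
    """Parse the space-separated payload of a +switch-master message."""
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return MasterSwitch(name, (old_ip, int(old_port)), (new_ip, int(new_port)))
```

A client would typically reconnect its write connection to `new_addr` as soon as such an event arrives.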
Notes and configuration instructions
As you may have noticed, the Redis pub/sub mechanism is particularly important. Through pub/sub the Sentinels discover one another and clients receive notifications from Sentinel, and the various events are also published through it.
The down-after-milliseconds option in the Sentinel configuration file specifies how long it takes for Sentinel to judge that an instance has entered the subjectively offline state: if an instance keeps returning invalid replies to Sentinel for down-after-milliseconds milliseconds in a row, Sentinel modifies its record of that instance to mark it as subjectively offline.
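Expressed as code, the subjective-offline check is a simple comparison of elapsed time against the threshold. The sketch below assumes the Sentinel remembers the timestamp of the last valid reply; the function and parameter names are hypothetical.

```python
def is_subjectively_down(last_valid_reply_ms, now_ms, down_after_ms):
    """True when no valid reply has been seen for longer than down_after_ms.

    All arguments are in milliseconds; `last_valid_reply_ms` is the time
    of the most recent valid PING reply from the instance.
    """
    return now_ms - last_valid_reply_ms > down_after_ms
```

With a threshold of 30000, an instance whose last valid reply is 30001 ms old is marked subjectively offline, while one at exactly 30000 ms is not yet.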
Make sure the configuration of all Sentinel instances is consistent, especially the subjective-offline threshold down-after-milliseconds. If this value differs between Sentinel instances, the Sentinel cluster may fail to reach agreement on a failed master, the switch is not performed in time, and the cluster service becomes unstable.
down-after-milliseconds * 10
down-after-milliseconds is the maximum connection timeout after which we consider the link between master and slave broken: if master and slave have had no network contact for down-after-milliseconds milliseconds, the link counts as disconnected. If a slave stays disconnected for more than 10 times that long, its network condition is evidently poor and it is not suitable as the new master.
The main tasks of Sentinel
The Redis Sentinel mechanism is one of the means by which Redis achieves high availability with uninterrupted service. Data synchronization in the master-slave cluster is the basic guarantee of data reliability; automatic master-slave switching when the master goes down is the key to uninterrupted service.
With Sentinel switching master and slave automatically, I no longer have to fear the master going down while I'm with my girlfriend:
- Monitor the running state of the master and the slaves, and judge whether a node is objectively offline;
- After the master is objectively offline, select a slave and switch it to master;
- Notify the slaves and the clients of the new master's information.
How the Sentinel cluster works
The Sentinel cluster was introduced to avoid a failed master-slave switch when a single Sentinel fails, and to reduce the misjudgment rate. A Sentinel cluster relies on several mechanisms to run properly:
- Communication among the Sentinels, based on the pub/sub mechanism;
- The slave list obtained with the INFO command, which lets the Sentinels connect to the slaves;
- Event notification between clients and Sentinels, through Sentinel's own pub/sub.
The master-slave switch is not executed by a randomly chosen Sentinel. A Leader is elected by vote, and that Leader is responsible for the switch.
This article comes from the WeChat official account High performance server development (easyserverdev).
Original publication date: 2021-04-01