
Redis Sentinel Mode

In the earlier article on Redis high concurrency we looked at the master-slave architecture with read-write separation, but we did not consider Redis high availability: what do we do if the master process dies? As an excellent NoSQL database, Redis already provides a solution: the Sentinel mechanism.

The Redis Sentinel system can monitor one or more masters together with the slaves attached to them. When a master is detected to be down, the Sentinel system promotes one of that master's slaves to take its place; when the old master comes back online, it is reconfigured as a slave of the new master.

SentinelState

Initialization status

Sentinel is a special Redis service. When the Sentinel service starts, it initializes a SentinelState structure according to its configuration; this structure holds the state of the Sentinel service.

Each sentinelRedisInstance structure represents one Redis server instance monitored by Sentinel; the instance can be a master, a slave, or another Sentinel.
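
The original post shows the C definitions here; below is a simplified Python sketch of the fields described in Redis Design and Implementation (field names and defaults are illustrative, not the actual Redis source):

from dataclasses import dataclass, field

@dataclass
class SentinelRedisInstance:
    # Rough Python mirror of the monitored-instance structure; illustrative only.
    flags: int = 0                    # SRI_MASTER / SRI_SLAVE / SRI_SENTINEL, later SRI_S_DOWN / SRI_O_DOWN
    name: str = ""                    # e.g. "mymaster"
    run_id: str = ""
    config_epoch: int = 0
    addr: tuple = ("", 0)             # (ip, port)
    down_after_period: int = 30000    # from down-after-milliseconds, in ms
    quorum: int = 2
    slaves: dict = field(default_factory=dict)     # the master's slaves, keyed by "ip:port"
    sentinels: dict = field(default_factory=dict)  # other Sentinels watching the same master

@dataclass
class SentinelState:
    # Rough Python mirror of the per-Sentinel state structure; illustrative only.
    current_epoch: int = 0
    masters: dict = field(default_factory=dict)    # monitored masters, keyed by master name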


Update status

After initialization, for Sentinel to perceive subsequent changes in the master and slave services and keep SentinelState up to date, Sentinel establishes network connections to the master and slave services.

The master service

By default, every ten seconds Sentinel sends an INFO command over the command connection to the monitored master, and by parsing the INFO reply it obtains the master's current information.

The reply consists of two parts:

  1. Information about the master itself, including the run_id field (the server's run ID) and the role field (the server's role);
  2. Information about the slaves attached to the master: each slave's IP address, port, replication offset, etc.

With this information, Sentinel:

  1. Updates the master's sentinelRedisInstance structure, for example its run_id;
  2. Updates the sentinelRedisInstance structures held in the slaves dictionary of the master's structure: an existing entry is updated, a missing one is created (see the sketch below).
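
As a hedged illustration of what the INFO reply carries (the address here is an assumption, and the snippet talks to the master directly rather than being part of Sentinel), the run_id, role and per-slave fields can be read with redis-py:

import redis

# Connect straight to the monitored master (address is an assumption for this sketch).
master = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)
info = master.info()  # includes the server section (run_id) and the replication section

print(info["run_id"], info["role"])  # e.g. "f5a1...", "master"
for key, value in info.items():
    # redis-py parses "slave0:ip=...,port=...,offset=...,..." lines into dicts
    if key.startswith("slave") and isinstance(value, dict):
        print(key, value.get("ip"), value.get("port"), value.get("offset"))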

The slave services

By default, every ten seconds Sentinel also sends an INFO command over a command connection to each slave, and by parsing the INFO reply it obtains the slave's current information and updates the corresponding slave sentinelRedisInstance structure.

The information mainly includes the following fields (a short reading sketch follows the list):

  1. The slave's run ID run_id.
  2. The slave's role role.
  3. The master's IP address master_host and port master_port.
  4. The status of the master-slave link master_link_status.
  5. The slave's priority slave_priority.
  6. The slave's replication offset slave_repl_offset.
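
A matching sketch for the slave side (address is an assumption); these are the replication-section fields listed above:

import redis

replica = redis.Redis(host="127.0.0.1", port=6380, decode_responses=True)
repl = replica.info("replication")  # only the replication section is needed here

print(repl["role"])                                  # "slave"
print(repl["master_host"], repl["master_port"])      # where this slave replicates from
print(repl["master_link_status"])                    # "up" while the link to the master is healthy
print(repl["slave_priority"], repl["slave_repl_offset"])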

Sentinel cluster

Generally, to make a service highly available it evolves from a standalone deployment into a cluster, and the Sentinel service is no exception.

Sentinels in a cluster discover each other through the Redis pub/sub system. Every two seconds, each Sentinel publishes a message to the __sentinel__:hello channel; the other Sentinels watching the same Redis service consume this message and thereby learn of each other's existence.

The message contains the Sentinel's own host, IP and runid, together with its monitoring configuration for this master.

A Sentinel that receives the message compares the runId in it with its own:

  1. If they are identical, the message came from itself and is ignored;
  2. If they differ, the message is parsed and the corresponding master's sentinelRedisInstance structure is updated (a subscription sketch follows).
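
This discovery traffic can be observed directly by subscribing to the hello channel on a monitored Redis server, as sketched below (the address is an assumption, and the exact payload layout may vary between Redis versions):

import redis

r = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)
p = r.pubsub()
p.subscribe("__sentinel__:hello")

for _ in range(5):
    msg = p.get_message(timeout=3)
    if msg and msg["type"] == "message":
        # roughly: sentinel_ip,port,runid,current_epoch,master_name,master_ip,master_port,config_epoch
        print(msg["data"])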

High availability

To sense whether a service is online, by default Sentinel actively sends a PING command once per second over its network connections to every master, slave, and other Sentinel it knows about. If no valid reply (any one of +PONG, -LOADING, or -MASTERDOWN) is received within the specified time, the service is considered down.

Subjectively down (sdown)

The sentinel.conf configuration file specifies how long Sentinel waits before deciding that an instance is subjectively down; the unit is milliseconds and the default is 30000.

# sentinel down-after-milliseconds <master-name> <milliseconds>
sentinel down-after-milliseconds mymaster 30000

With the configuration above, if this Sentinel receives no valid response from master A within 30000 milliseconds, it decides that master A is down. The Sentinel then modifies master A's sentinelRedisInstance structure, adding the SRI_S_DOWN flag to its flags property, which marks the instance as subjectively down.

The down-after-milliseconds value applies not only to the master's down state; it also applies to all slaves under that master and to the other Sentinels watching the same master.

Each Sentinel service may configure a different sentinel down-after-milliseconds value. For example, Sentinel A may configure 3000 ms while Sentinel B configures 5000 ms; if no valid response arrives within 3000 ms, Sentinel A will consider the master down while Sentinel B will not. That is why this judgment is called subjectively down (a query sketch follows).
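
A minimal sketch of how to observe this flag from the outside, assuming a Sentinel listening on the default port 26379 and a master named mymaster; the SENTINEL MASTER reply is a flat list of field/value pairs that includes flags and down-after-milliseconds:

import redis

sentinel = redis.Redis(host="127.0.0.1", port=26379, decode_responses=True)
raw = sentinel.execute_command("SENTINEL", "MASTER", "mymaster")
state = dict(zip(raw[::2], raw[1::2]))  # pair up field names and values

print(state["flags"])                    # e.g. "master", or "master,s_down" after sdown
print(state["down-after-milliseconds"])  # the threshold this Sentinel applies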

Objectively down (odown)

After Sentinel A marks the master as subjectively down, it still cannot be sure the master is really down, so it actively asks the other Sentinels monitoring the same master. When a sufficient number of Sentinels agree that the master is down, the master is judged objectively down and failover begins.

In the sentinel.conf configuration file:

# The Redis master node (ip and port) monitored by this sentinel
# master-name: a name of your choice for the master node; only the letters A-z, the digits 0-9 and the characters ".-_" are allowed
# quorum: when this many sentinels consider the master node disconnected, the master is objectively considered down
# sentinel monitor <master-name> <ip> <redis-port> <quorum>
sentinel monitor mymaster 192.168.1.108 6379 2

In the configuration above the quorum is 2, so in a cluster of 5 Sentinels, as soon as 2 Sentinels consider the master down, it is objectively down.

The down-detection process:

  1. A Sentinel receives no valid response and judges the master subjectively down.
  2. It asks the other Sentinels for their view; each of them determines the master's state and returns its decision.
  3. On receiving the replies, it counts the Sentinels that consider the master down; if the count reaches the configured quorum, the master is judged objectively down (see the sketch after this list).
  4. The Sentinel modifies master A's sentinelRedisInstance structure, adding the SRI_O_DOWN flag to its flags property.
  5. Failover begins.
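
A simplified sketch of step 3 (pure illustration, not Redis source): the asking Sentinel counts itself plus every peer that also reports the master down, then compares the total with the quorum.

def is_objectively_down(peer_replies, quorum):
    # peer_replies: one dict per other Sentinel, e.g. {"master_down": True, "runid": "..."}
    agreeing = 1 + sum(1 for reply in peer_replies if reply.get("master_down"))
    return agreeing >= quorum

# Example: quorum 2 as configured above; one peer agrees, one does not.
print(is_objectively_down([{"master_down": True}, {"master_down": False}], quorum=2))  # True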

The leader Sentinel

Failover is carried out by a single Sentinel, so the Sentinel cluster must first elect a leader Sentinel to do the work.

The election process :

  1. Every Sentinel has the right both to vote and to be elected, and gets only one vote per election round.
  2. Every Sentinel that detects the master's failure sends a command to the other Sentinels, asking to be set as their local leader Sentinel.
  3. The local leader is set on a first-come, first-served basis: a Sentinel takes the runId from the first request it receives, sets it as its local leader Sentinel, and returns a reply.
  4. The candidate receives and parses the replies to determine whether it has been set as the local leader.
  5. When a Sentinel has been set as the local leader by more than half of the Sentinels, it becomes the leader Sentinel.
  6. If no leader is elected, a new round of elections begins (a simplified sketch follows).
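
A simplified sketch of the first-come-first-served vote and the majority check (illustrative only; the real implementation also tracks epochs, timeouts and retries):

def vote(local_state, candidate_runid, epoch):
    # Grant the vote to the first candidate heard from in a given epoch.
    if epoch > local_state.get("epoch", -1):
        local_state["epoch"] = epoch
        local_state["leader"] = candidate_runid  # first come, first served
    return local_state["leader"], local_state["epoch"]

def is_leader(votes, my_runid, total_sentinels):
    # A Sentinel becomes the leader once more than half of all Sentinels voted for it.
    return sum(1 for v in votes if v == my_runid) > total_sentinels // 2

print(is_leader(["s1", "s1", "s2"], "s1", total_sentinels=3))  # True: 2 of 3 votes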

Failover

After the Sentinel cluster elects the leader Sentinel, failover starts; it consists of three parts:

  1. Elect a new master.
  2. Change the replication target of the slaves.
  3. When the old master comes back online, turn it into a slave.

Elect a new master

The leader Sentinel elects one of the remaining slaves as the new master, but not arbitrarily; the following rules apply in order:

  1. Exclude unsuitable slaves: those that are offline or disconnected, those whose link to the master has been down for more than 10 * down-after-milliseconds, and those that have not replied to the leader Sentinel's INFO command within five seconds.
  2. Among the rest, select the slave with the highest priority.
  3. If priorities are equal, select the slave with the largest replication offset.
  4. If offsets are equal, select the slave with the smallest runId (see the sketch below).
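
A simplified sketch of these rules (illustrative only; the field names are assumptions). Note that in Redis a lower slave_priority value means a higher priority, and a priority of 0 means the slave must never be promoted.

def pick_new_master(replicas, down_after_ms, info_deadline_ms=5000):
    # Rule 1: drop disconnected slaves, slaves whose link to the old master has been
    # down too long, and slaves that stopped answering the leader's INFO commands.
    candidates = [
        r for r in replicas
        if r["connected"]
        and r["priority"] > 0                                # priority 0: never promote
        and r["master_link_down_ms"] <= 10 * down_after_ms
        and r["last_info_reply_ms"] <= info_deadline_ms
    ]
    # Rules 2-4: best (numerically lowest) priority, then largest offset, then smallest run_id.
    candidates.sort(key=lambda r: (r["priority"], -r["offset"], r["run_id"]))
    return candidates[0] if candidates else None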

Transfer

After the new master is elected, the leader Sentinel sends the SLAVEOF command to the remaining slaves so that they replicate the new master. When the old master comes back online, it is also turned into a slave and made to replicate the new master.
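
Two hedged sketches of what this looks like in practice (the addresses and the mymaster name are assumptions): the SLAVEOF command the leader Sentinel effectively issues to a remaining slave, and how a client keeps finding the current master through the Sentinels with redis-py.

import redis
from redis.sentinel import Sentinel

# What the leader Sentinel effectively issues to a remaining slave:
slave = redis.Redis(host="127.0.0.1", port=6381, decode_responses=True)
slave.slaveof("127.0.0.1", 6380)  # start replicating the newly promoted master

# How a client follows failover instead of hard-coding the master address:
sentinels = Sentinel([("127.0.0.1", 26379)], socket_timeout=0.5)
print(sentinels.discover_master("mymaster"))            # (ip, port) of the current master
master = sentinels.master_for("mymaster", socket_timeout=0.5)
master.set("key", "value")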

Data loss

Nothing is absolutely perfect: during a master-standby switch in Sentinel mode, data can be lost.

Asynchronous replication

Replication between master and slave is asynchronous. If the master goes down after some write commands have been applied to it but before they have been propagated to the slaves, those commands are lost.

Split brain

When a master that has become isolated from the normal network is considered down and a new master is elected, there are two masters at the same time: the so-called split brain. Some clients, however, have not yet switched to the new master and keep writing data to the old one. When the old master rejoins and is turned into a slave, its data is wiped so that it can replicate the new master, and those writes are lost.


Solution

min-slaves-to-write 1
min-slaves-max-lag 10

The two configurations above reduce the data loss caused by asynchronous replication and split brain.

They require at least 1 slave whose replication lag is no more than 10 seconds; if every slave's replication lag exceeds 10 seconds, the master stops accepting write requests (a runtime sketch follows the list below).

  1. With min-slaves-max-lag configured, once the slaves fall too far behind in replicating and acknowledging data, write requests are rejected; this keeps the loss caused by data the master has not yet synchronized to the slaves within a controllable range.

  2. If a master splits off and loses its connection to the slaves, the two configurations ensure that when it can no longer send data to the required number of slaves, and the slaves have not sent it an ack for more than 10 seconds, it rejects client write requests directly. In this way the old master after a split brain does not accept new client data, which also avoids data loss.

  3. The configuration above ensures that once the master loses its connection to the slaves and finds after 10 seconds that no slave has acked it, it rejects new write requests.

    So in a split-brain scenario, at most about 10 seconds of data is lost.
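
As referenced above, a hedged sketch of applying the same protection at runtime with CONFIG SET. Newer Redis versions call these parameters min-replicas-to-write and min-replicas-max-lag, keeping the min-slaves-* spellings as aliases; the address is an assumption.

import redis

master = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)
master.config_set("min-replicas-to-write", 1)
master.config_set("min-replicas-max-lag", 10)

# With fewer than 1 slave acking within 10 seconds, writes start to fail:
try:
    master.set("key", "value")
except redis.exceptions.ResponseError as exc:
    print("write rejected:", exc)  # e.g. "NOREPLICAS Not enough good replicas to write."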

References:

  1. Shi Shan's architecture notes on 100-million-level traffic (亿级流量)
  2. Redis Design and Implementation

Copyright notice
This article was written by [GGuoLiang]; please include a link to the original when reposting. Thanks.
