
How Apache ZooKeeper Ensures Distributed Data Consistency - Leader Election



Previous article

Apache ZooKeeper - An In-Depth Source Code Analysis of the Leader Election Flow

In a ZooKeeper cluster, each server takes one of three roles: Leader, Follower, or Observer.

We can think of Leader election as a process during which ZooKeeper does two important jobs: synchronizing data, and electing the new Leader server.

Today, let's move on to data synchronization in a ZooKeeper cluster.


Flow chart



The Leader coordination process

The CAP theorem states that a distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance.

Today's topic is consistency. The consistency ZooKeeper implements is not strong consistency, in which the data on every server in the cluster would be identical at every moment. Instead, ZooKeeper provides eventual consistency: after a period of time, the data on all the servers in the cluster converges to the same state.

In a ZooKeeper cluster, the Leader is mainly responsible for handling transactional (write) requests. When it receives a transactional request from a client, the Leader first asks each machine in the cluster to vote on it.

To achieve eventual consistency in a ZooKeeper cluster, we first need to determine what can make the servers in the cluster inconsistent.


When the cluster starts up, the first step is to synchronize the data on every server. In addition, when the Leader crashes, a new Leader must be elected; during this process the data on the servers can become inconsistent, so once the new Leader has been elected, the data must be synchronized again.


How ZooKeeper implements it

A ZooKeeper cluster follows the majority principle: when a transactional request changes data on the servers, ZooKeeper only needs to ensure that the data on a majority of the machines in the cluster has been changed correctly in order to guarantee the consistency of the system's data.

This is because every Follower in a ZooKeeper cluster can be viewed as a replica of the Leader's data. Keeping the data consistent on a majority of machines means that when an individual machine fails, the ZooKeeper cluster can still keep running stably.
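The majority rule can be made concrete with a small Java sketch. MajorityQuorum, hasQuorum and maxTolerableFailures are hypothetical names used only for illustration, not ZooKeeper's API: a set of acknowledging server ids (sids) counts as a quorum when it contains strictly more than half of the voting members.

```java
import java.util.Set;

// Hypothetical helper illustrating the majority (quorum) rule; not ZooKeeper's API.
final class MajorityQuorum {
    private MajorityQuorum() {}

    // True when the acknowledging servers form a strict majority of the voting members.
    // Acks from servers that are not voting members are ignored.
    static boolean hasQuorum(Set<Long> ackSids, Set<Long> votingMembers) {
        long acks = ackSids.stream().filter(votingMembers::contains).count();
        return acks > votingMembers.size() / 2;
    }

    // An ensemble of n voting servers still has a live majority
    // as long as at most (n - 1) / 2 of them fail.
    static int maxTolerableFailures(int ensembleSize) {
        return (ensembleSize - 1) / 2;
    }

    public static void main(String[] args) {
        Set<Long> voters = Set.of(1L, 2L, 3L, 4L, 5L);
        System.out.println(hasQuorum(Set.of(1L, 2L, 3L), voters)); // true: 3 of 5
        System.out.println(hasQuorum(Set.of(1L, 2L), voters));     // false: 2 of 5
        System.out.println(maxTolerableFailures(5));               // 2
    }
}
```

With five voting servers, any three acknowledgements form a quorum, so the cluster tolerates two failures. Four servers tolerate only one failure, the same as three, which is why ensembles are usually given an odd number of servers.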

While the ZooKeeper cluster service is running, the data synchronization process is as shown in the figure below: once a request that changes data has been processed, the servers in the cluster need to be synchronized.



Broadcast mode

At the code level, ZooKeeper uses a HashSet field to manage the Follower servers in the cluster, and the getForwardingFollowers function returns those Followers:

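import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class Leader {

    // Followers that proposals and commits are forwarded to (excerpt; other fields omitted)
    private final HashSet<LearnerHandler> forwardingFollowers = new HashSet<LearnerHandler>();

    public List<LearnerHandler> getForwardingFollowers() {
        // Other threads add and remove followers concurrently, so copy the set
        // under its lock and give callers a snapshot they can iterate safely.
        synchronized (forwardingFollowers) {
            return new ArrayList<LearnerHandler>(forwardingFollowers);
        }
    }
}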

After the cluster servers have voted on and approved a transactional request, the Leader executes the isQuorumSynced method to check the connection state of the Follower nodes in the cluster. Because isQuorumSynced can be called by multiple threads, it locks the forwardingFollowers field while it runs.

It then iterates over the Followers in the cluster, checks whether each one is synchronized with the Leader (learnerHandler.synced()), collects the sids of those that are, and returns whether those sids form a quorum.

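public boolean isQuorumSynced(QuorumVerifier qv) {
    HashSet<Long> ids = new HashSet<Long>();
    // The Leader itself counts toward the quorum if it is a voting member of qv
    if (qv.getVotingMembers().containsKey(self.getId())) {
        ids.add(self.getId());
    }
    // This method may be called from several threads, so lock the follower set
    synchronized (forwardingFollowers) {
        for (LearnerHandler learnerHandler : forwardingFollowers) {
            // Count only followers that are synced and are voting members of qv
            if (learnerHandler.synced() && qv.getVotingMembers().containsKey(learnerHandler.getSid())) {
                ids.add(learnerHandler.getSid());
            }
        }
    }
    // True only if the collected sids form a quorum (a majority, for the default verifier)
    return qv.containsQuorum(ids);
}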

At this point, the Leader has determined the state of the Follower servers in the cluster and completed the preparation needed before synchronizing data.

Next, the Leader uses the request.setTxn method to attach the data-change transaction to the request, which is then sent to the Follower servers in the cluster.

During this process, the Leader can be thought of as a client of the ZooKeeper service: it sends data-update requests to the Follower servers, and each Follower, on receiving a request, processes it and then applies the data change.

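The code below shows the underlying implementation: a transaction is attached to the request by calling its setTxn method. The transaction passed in is a SetDataTxn whose first argument, ZooDefs.CONFIG_NODE, is the path of the znode being written (the /zookeeper/config node that stores the cluster configuration), and whose data is the string form of the request's quorum configuration (request.qv). This particular operation therefore updates the cluster configuration data held by the servers.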

request.setTxn(new SetDataTxn(ZooDefs.CONFIG_NODE, request.qv.toString().getBytes(), -1));    
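The broadcast step follows the usual propose, acknowledge, commit pattern of ZooKeeper's atomic broadcast protocol (ZAB). The following Java sketch is a simplified in-memory model of that pattern, not ZooKeeper's code; the BroadcastSketch and Follower classes and their methods are hypothetical. The Leader proposes a transaction, counts ACKs (including its own), and commits only when a strict majority of the ensemble has acknowledged.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of propose -> ACK -> commit; not ZooKeeper's implementation.
final class BroadcastSketch {
    private BroadcastSketch() {}

    // A follower replica that logs proposals and applies only committed ones.
    static final class Follower {
        final long sid;
        final boolean reachable;                 // false simulates a crashed or partitioned server
        final List<String> log = new ArrayList<>();
        final List<String> committed = new ArrayList<>();

        Follower(long sid, boolean reachable) {
            this.sid = sid;
            this.reachable = reachable;
        }

        // Returns true (an ACK) if the proposal was written to the log.
        boolean propose(String txn) {
            if (!reachable) return false;
            log.add(txn);
            return true;
        }

        // Applies a transaction only if it was previously logged.
        void commit(String txn) {
            if (reachable && log.contains(txn)) committed.add(txn);
        }
    }

    // Returns true when the transaction reached a majority and was committed.
    static boolean broadcast(long leaderSid, List<Follower> followers, String txn) {
        Set<Long> acks = new HashSet<>();
        acks.add(leaderSid);                     // the Leader acknowledges its own proposal
        for (Follower f : followers) {
            if (f.propose(txn)) acks.add(f.sid);
        }
        int ensembleSize = followers.size() + 1; // followers plus the Leader
        if (acks.size() <= ensembleSize / 2) {
            return false;                        // no majority: do not commit
        }
        for (Follower f : followers) f.commit(txn);
        return true;
    }

    public static void main(String[] args) {
        List<Follower> followers = List.of(
                new Follower(2, true), new Follower(3, true),
                new Follower(4, false), new Follower(5, false));
        System.out.println(broadcast(1, followers, "setData /app v1")); // true: 3 of 5 acked
    }
}
```

Followers that missed the proposal (the unreachable ones in the example) end up behind the others; bringing such servers back in line is the job of the data synchronization performed in recovery mode.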


Recovery mode

Having covered how the Leader manages data synchronization with the Follower servers, let's now look at how the ZooKeeper cluster recovers and synchronizes data after the Leader crashes.

When the Leader of a ZooKeeper cluster fails, a new Leader is elected from among the Follower servers.

ZooKeeper is often used in high-concurrency scenarios, so what happens to new transactional requests that arrive during this process? Since the cluster has no Leader at that moment, in theory ZooKeeper would simply drop such a request without processing it, but that is clearly unacceptable in production. So how does ZooKeeper handle it?

Re-electing a Leader takes some time, so the cluster is briefly without a Leader. If a transactional request arrives during this window, the ZooKeeper service first suspends the session. The timeout of a suspended session is not counted, and once the new Leader is in place, the system carries out these pending session operations.
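The idea of holding requests while no Leader exists, instead of dropping them, can be sketched as a simple queue. PendingRequestBuffer and its methods are hypothetical; the sketch only shows suspending requests and replaying them in arrival order, and does not model session timeouts or the way ZooKeeper actually suspends sessions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical sketch: park write requests while no Leader is known and
// replay them, in arrival order, once a new Leader has been elected.
final class PendingRequestBuffer {
    private final Queue<String> pending = new ArrayDeque<>();
    private Consumer<String> leader;             // null while an election is in progress

    synchronized void submit(String request) {
        if (leader == null) {
            pending.add(request);                // suspend instead of dropping
        } else {
            leader.accept(request);
        }
    }

    synchronized void leaderLost() {
        leader = null;
    }

    // Called when a new Leader is elected: drain everything queued during the election.
    synchronized void leaderElected(Consumer<String> newLeader) {
        leader = newLeader;
        while (!pending.isEmpty()) {
            leader.accept(pending.poll());
        }
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingRequestBuffer buffer = new PendingRequestBuffer();
        List<String> processed = new ArrayList<>();
        buffer.submit("create /a");              // no Leader yet: queued
        buffer.submit("setData /a");
        buffer.leaderElected(processed::add);    // queued requests replayed in order
        buffer.submit("delete /a");              // Leader known: forwarded directly
        System.out.println(processed);           // [create /a, setData /a, delete /a]
    }
}
```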


Source code implementation

The LearnerHandler class came up earlier, when we briefly analyzed its role from the perspective of communication and cooperation between servers. In fact, LearnerHandler can be seen as the Leader-side processor that serves each Learner server. Its responsibilities include synchronizing data between the Follower and Observer servers and the Leader, forwarding transactional requests, and voting on Proposals.

LearnerHandler is a thread class. While the ZooKeeper cluster is running, each Follower or Observer server corresponds to one LearnerHandler. As the servers coordinate with each other, the Leader maintains a long-lived connection with every Learner server and starts a separate LearnerHandler thread to handle it.
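The one-thread-per-Learner structure can be illustrated with plain Java threads. This is a hypothetical sketch: the Handler class, its thread names and runHandlers are assumptions, and the handlers only record that they ran, whereas a real LearnerHandler owns the socket connection to its Learner and performs synchronization, forwarding and ACK handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of "one handler thread per Learner"; no networking involved.
final class PerLearnerThreads {
    private PerLearnerThreads() {}

    // Stand-in for a per-Learner handler thread.
    static final class Handler extends Thread {
        private final long sid;
        private final Map<Long, String> handled;

        Handler(long sid, Map<Long, String> handled) {
            super("LearnerHandler-sid-" + sid);  // one named thread per Learner
            this.sid = sid;
            this.handled = handled;
        }

        @Override
        public void run() {
            // A real handler would sync data, forward requests and process ACKs here.
            handled.put(sid, getName());
        }
    }

    // Starts one thread per Learner sid, waits for all of them, and reports which ran.
    static Map<Long, String> runHandlers(List<Long> learnerSids) {
        Map<Long, String> handled = new ConcurrentHashMap<>();
        List<Handler> handlers = new ArrayList<>();
        for (long sid : learnerSids) {
            Handler h = new Handler(sid, handled);
            handlers.add(h);
            h.start();
        }
        for (Handler h : handlers) {
            try {
                h.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve interrupt status
                break;
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        Map<Long, String> handled = runHandlers(List.of(2L, 3L, 4L));
        System.out.println(handled.get(3L));     // LearnerHandler-sid-3
    }
}
```

A dedicated thread per Learner keeps a slow or stalled Follower from blocking communication with the others, at the cost of one thread per connected server.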

As the following code shows, the core method of the LearnerHandler thread class is run, which drives data synchronization. It first calls syncFollower to decide whether a full snapshot is needed. If so, the Leader serializes a snapshot of its in-memory database and sends it to the Follower, followed by a signature string; once the Follower receives the data, it loads it locally, completing the data synchronization.

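public void run() {
    // ... (handshake elided) peerLastZxid is the last zxid reported by the connecting Learner

    // syncFollower decides how to bring this Learner up to date;
    // it returns true when a full snapshot has to be sent
    boolean needSnap = syncFollower(peerLastZxid, leader.zk.getZKDatabase(), leader);

    if (needSnap) {
        // oa is the output archive wrapping bufferedOutput, the stream to the Learner
        leader.zk.getZKDatabase().serializeSnapshot(oa);
        // Marker written after the snapshot so the Learner can check the stream is intact
        oa.writeString("BenWasHere", "signature");
        bufferedOutput.flush();
    }

    // ... (remaining synchronization and request processing elided)
}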
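The decision inside syncFollower can be approximated by comparing the Learner's last zxid with the range of committed proposals the Leader still holds in memory. The sketch below is simplified and hypothetical (SyncDecision, choose and zxid are assumptions); the real method also considers the on-disk transaction log and other edge cases. The Mode values are named after ZooKeeper's DIFF, TRUNC and SNAP sync packets, and the sketch assumes the standard zxid layout of an epoch in the high 32 bits and a counter in the low 32 bits.

```java
// Hypothetical, simplified version of the sync-mode choice made when a Learner connects.
final class SyncDecision {
    private SyncDecision() {}

    enum Mode { DIFF, TRUNC, SNAP }

    // A zxid packs the leader epoch into the high 32 bits and a counter into the low 32 bits.
    static long zxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    /**
     * @param peerLastZxid     last zxid the Learner reports having logged
     * @param minCommittedZxid oldest committed proposal the Leader still keeps in memory
     * @param maxCommittedZxid newest committed proposal on the Leader
     */
    static Mode choose(long peerLastZxid, long minCommittedZxid, long maxCommittedZxid) {
        if (peerLastZxid > maxCommittedZxid) {
            return Mode.TRUNC;  // Learner has entries beyond the Leader's last commit: truncate them
        }
        if (peerLastZxid >= minCommittedZxid) {
            return Mode.DIFF;   // send only the missing proposals (none if already up to date)
        }
        return Mode.SNAP;       // too far behind the in-memory log: send a full snapshot
    }

    public static void main(String[] args) {
        long min = zxid(5, 100);
        long max = zxid(5, 600);
        System.out.println(choose(zxid(5, 300), min, max));  // DIFF
        System.out.println(choose(zxid(5, 700), min, max));  // TRUNC
        System.out.println(choose(zxid(4, 900), min, max));  // SNAP
    }
}
```

Only the SNAP case corresponds to the needSnap branch in the run() excerpt; the other two let a Learner that is only slightly out of date catch up without transferring the whole data tree.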


Summary

We have now covered the principles and implementation of data consistency in ZooKeeper in detail. To summarize: when dealing with consistency, a ZooKeeper cluster coordinates the work of its servers in two modes, recovery mode and broadcast mode.

  • Recovery mode: when the Leader of a ZooKeeper cluster crashes, the cluster switches to recovery mode. In this mode the cluster first re-elects a Leader, and the newly elected Leader then synchronizes all the servers in the system so that the data on every server in the cluster is consistent.

  • Broadcast mode: the ZooKeeper cluster has a Leader that is working normally and broadcasts transactional changes to the Followers. New Follower servers may join the cluster while it runs, which typically happens when the system is scaled out dynamically after reaching a performance bottleneck. If nothing were done, the data on a newly added Follower would be inconsistent with the data on the other servers in the cluster. A query routed to the new Follower could then fail to find data that clearly exists elsewhere in the cluster, producing inconsistent query results. Therefore, when a new Follower joins a ZooKeeper cluster, it starts in recovery mode, finds the cluster's Leader, and synchronizes its data with that Leader.



Copyright notice
This article was written by [Little craftsman]. Please include a link to the original when reposting. Thank you.
https://cdmana.com/2020/12/20201225043318270I.html
