编程知识 cdmana.com

Consistent hash algorithm and Java implementation

Uniformity hash Algorithm is a commonly used and easy-to-use partition algorithm in distributed computing 、 Or database sub database sub table algorithm . In the current Internet Service Architecture , To avoid a single point of failure 、 Improve processing efficiency 、 Horizontal expansion and other reasons , Distributed system has become a necessary deployment mode for home travel , So we also produced several methods of data fragmentation :
1. modulus ,2. Segmenting ,3. Uniformity hash
One of the big problems with the first two is the fixed number of nodes , That is, the number of nodes cannot be changed , You can't hang up a node or add a node in real time , If you change the fragmentation rules, you need to change , There's a lot of data that needs to be migrated .
So consistency hash How to solve this problem ?
Uniformity hash: For nodes and data , Do it all once hash operation , Then compare the nodes and the data hash value , The node whose data value is closest to the node is the processing node . In order to be more evenly distributed , By using virtual nodes , Each node calculates n individual hash value , Put evenly on hash In this way, the data can be more evenly distributed to each node .
1、 principle
(1) ring Hash Space
According to the usual hash The algorithm will correspond to key Hash to one with 2^32 In the space of the next bucket , namely 0~(2^32)-1 In the digital space .
Now we can connect these numbers head to tail , Think of it as a closed ring . Here's the picture
 Picture description here
(2) Pass the data through a certain hash After the algorithm is processed, it is mapped to the ring
Now we will object1、object2、object3、object4 Four objects pass through a specific Hash The function calculates the corresponding key value , Then hash to Hash On the ring . Here's the picture :
Hash(object1) = key1;
Hash(object2) = key2;
Hash(object3) = key3;
Hash(object4) = key4;
 Picture description here
(3) Pass the machine through hash The algorithm maps to the ring
Add new machines to a distributed cluster using a consistent hash algorithm , It works by using the same as object storage Hash The algorithm also maps machines into rings
( Generally speaking, for machines hash The calculation is machine based IP Or the unique alias of the machine as the input value ), Then calculate in a clockwise direction , Store all objects in the nearest machine .
Suppose there are now NODE1,NODE2,NODE3 Three machines , adopt Hash The algorithm gets the corresponding KEY value , Map to ring , The schematic diagram is as follows :
Hash(NODE1) = KEY1;
Hash(NODE2) = KEY2;
Hash(NODE3) = KEY3;
 Picture description here
You can see from the above figure that the object and the machine are in the same hash space , Turn clockwise in this way object1 It's stored in NODE1 in ,object3 It's stored in NODE2 in ,object2、object4 It's stored in NODE3 in .
In such a deployment environment ,hash The ring will not change , therefore , By calculating the hash Value can be quickly located in the corresponding machine , This will find the real storage location of the object .
2、 Deletion and addition of machines
Ordinary hash The most improper part of the remainder algorithm is that a large number of object storage locations will be invalid after adding or deleting machines . Let's analyze how the consistent hash algorithm is handled .
(1) node ( machine ) The deletion of
Take the distribution above as an example , If NODE2 The fault has been removed , So follow the clockwise migration method ,object3 Will be moved to NODE3 in , It's just object3 The mapping location of has changed , There are no changes to other objects . Here's the picture :
 Picture description here
(2) node ( machine ) The addition of
If you add a new node to the cluster NODE4, Through the corresponding hash algorithm KEY4, And map it to the ring , Here's the picture :
 Picture description here
By moving clockwise , that object2 Was moved to NODE4 in , Other objects remain in their original storage locations .
Through the analysis of adding and deleting nodes , The consistency hash algorithm keeps monotonicity at the same time , Or the migration of data has reached the minimum , This algorithm is very suitable for distributed cluster , Avoid a lot of data migration , Reduce the pressure on the server .
3、 Balance – Virtual node
According to the diagram above , The consistency hash algorithm satisfies the monotonicity and load balancing characteristics as well as the general hash The dispersion of the algorithm , But this is not the reason why it is widely used ,
Because of the lack of balance . The following will analyze how the consistency hash algorithm satisfies the balance .
hash Algorithms are not guaranteed to be balanced , If there is only NODE1 and NODE3 The situation of (NODE2 Deleted image ),object1 It's stored in NODE1 in , and object2、object3、object4 It's all stored in NODE3 in , This creates a very unbalanced state . In the consistent hash algorithm , In order to satisfy the balance as much as possible , It introduces virtual nodes .
——“ Virtual node ”( virtual node ) Is the actual node ( machine ) stay hash A replica of space ( replica ), An actual node ( machine ) There are several “ Virtual node ”, This corresponding number also becomes “ Number of copies ”,“ Virtual node ” stay hash In space hash Value permutation .
So there are only NODE1 and NODE3 The situation of (NODE2 Deleted image ) For example , The previous objects are very unevenly distributed on the machine , Now let's use 2 Copies ( Number of copies ) For example , So the whole hash There is... In the ring 4 Virtual nodes , Finally, the diagram of object mapping is as follows :
 Picture description here
According to the above figure, we can know the mapping relationship of objects :object1->NODE1-1,object2->NODE1-2,object3->NODE3-2,object4->NODE3-1. Through the introduction of virtual nodes , The distribution of objects is more balanced . So in practice , How does real object query work ? Objects from hash The conversion from virtual node to actual node is shown in the figure below :
 Picture description here
“ Virtual node ” Of hash The calculation can use the IP The way to add a number suffix to an address . For example, suppose NODE1 Of IP The address is 192.168.1.100. introduce “ Virtual node ” front , Calculation cache A Of hash value :
Hash(“192.168.1.100”);
introduce “ Virtual node ” after , Calculation “ Virtual section ” spot NODE1-1 and NODE1-2 Of hash value :
Hash(“192.168.1.100#1”); // NODE1-1
Hash(“192.168.1.100#2”); // NODE1-2

Two 、 Uniformity hash Algorithm Java Realization .
1、 Without virtual nodes

package hash;  

import java.util.SortedMap;  
import java.util.TreeMap;  

/** *  Consistency without virtual nodes Hash Algorithm  *  a key :1. How to make one hash Ring ,2. How to map server nodes on Hash rings ,3. How to find the corresponding node  */  
public class ConsistentHashingWithoutVirtualNode {   

    // To be added Hash List of servers for the ring  
    private static String[] servers = { "192.168.0.0:111", "192.168.0.1:111",  
            "192.168.0.2:111", "192.168.0.3:111", "192.168.0.4:111" };  

    //key Represents the... Of the server hash value ,value Presentation server  
    private static SortedMap<Integer, String> sortedMap = new TreeMap<Integer, String>();  

    // Program initialization , Put all the servers in sortedMap in  
    static {  
        for (int i=0; i<servers.length; i++) {  
            int hash = getHash(servers[i]);  
            System.out.println("[" + servers[i] + "] Join the group ,  Its Hash The value is " + hash);  
            sortedMap.put(hash, servers[i]);  
        }  
        System.out.println();  
    }  

    // Should be routed to the node  
    private static String getServer(String key) {  
        // Get it key Of hash value  
        int hash = getHash(key);  
        // The gain is greater than the Hash All that's worth Map 
        SortedMap<Integer, String> subMap = sortedMap.tailMap(hash);  
        if(subMap.isEmpty()){  
            // If there's no better than key Of hash Worth a lot of money , From the first node Start  
            Integer i = sortedMap.firstKey();  
            // Return the corresponding server  
            return sortedMap.get(i);  
        }else{  
            // first Key Is to go clockwise and leave node The nearest node  
            Integer i = subMap.firstKey();  
            // Return the corresponding server  
            return subMap.get(i);  
        }  
    }  

    // Use FNV1_32_HASH The algorithm calculates the server's Hash value , No rewriting is used here hashCode Methods , The end result doesn't make any difference  
    private static int getHash(String str) {  
        final int p = 16777619;  
        int hash = (int) 2166136261L;  
        for (int i = 0; i < str.length(); i++)  
            hash = (hash ^ str.charAt(i)) * p;  
        hash += hash << 13;  
        hash ^= hash >> 7;  
        hash += hash << 3;  
        hash ^= hash >> 17;  
        hash += hash << 5;  

        //  If the calculated value is negative, take its absolute value  
        if (hash < 0)  
            hash = Math.abs(hash);  
        return hash;  
        }  

    public static void main(String[] args) {  
        String[] keys = {
  " The sun ", " The moon ", " The stars "};  
        for(int i=0; i<keys.length; i++)  
            System.out.println("[" + keys[i] + "] Of hash The value is " + getHash(keys[i])  
                    + ",  Nodes are routed to [" + getServer(keys[i]) + "]");  
    }  
} 

Execution results :

[192.168.0.0:111] Join the group ,  Its Hash The value is 575774686
[192.168.0.1:111] Join the group ,  Its Hash The value is 8518713
[192.168.0.2:111] Join the group ,  Its Hash The value is 1361847097
[192.168.0.3:111] Join the group ,  Its Hash The value is 1171828661
[192.168.0.4:111] Join the group ,  Its Hash The value is 1764547046

[ The sun ] Of hash The value is 1977106057,  Nodes are routed to [192.168.0.1:111]
[ The moon ] Of hash The value is 1132637661,  Nodes are routed to [192.168.0.3:111]
[ The stars ] Of hash The value is 880019273,  Nodes are routed to [192.168.0.3:111]

2、 With virtual nodes

package hash;  

import java.util.LinkedList;  
import java.util.List;  
import java.util.SortedMap;  
import java.util.TreeMap;  

import org.apache.commons.lang.StringUtils;  

/** *  Consistency with virtual nodes Hash Algorithm  */  
 public class ConsistentHashingWithoutVirtualNode {   

     // To be added Hash List of servers for the ring  
     private static String[] servers = {
  "192.168.0.0:111", "192.168.0.1:111", "192.168.0.2:111",  
             "192.168.0.3:111", "192.168.0.4:111"};  

     // Real node list , Considering that the server is online 、 Offline scene , Add 、 Deleted scenes will be more frequent , Use here LinkedList Will be better  
     private static List<String> realNodes = new LinkedList<String>();  

     // Virtual node ,key Representing virtual nodes hash value ,value Represents the name of the virtual node  
     private static SortedMap<Integer, String> virtualNodes = new TreeMap<Integer, String>();  

     // Number of virtual nodes , Here to write die , To demonstrate the need for , A real node corresponds to 5 Virtual nodes  
     private static final int VIRTUAL_NODES = 5;  

     static{  
         // First add the original server to the list of real nodes  
         for(int i=0; i<servers.length; i++)  
             realNodes.add(servers[i]);  

         // Add virtual nodes , Traverse LinkedList Use foreach The cycle efficiency will be higher  
         for (String str : realNodes){  
             for(int i=0; i<VIRTUAL_NODES; i++){  
                 String virtualNodeName = str + "&&VN" + String.valueOf(i);  
                 int hash = getHash(virtualNodeName);  
                 System.out.println(" Virtual node [" + virtualNodeName + "] Be added , hash The value is " + hash);  
                 virtualNodes.put(hash, virtualNodeName);  
             }  
         }  
         System.out.println();  
     }  

     // Use FNV1_32_HASH The algorithm calculates the server's Hash value , No rewriting is used here hashCode Methods , The end result doesn't make any difference  
     private static int getHash(String str){  
         final int p = 16777619;  
         int hash = (int)2166136261L;  
         for (int i = 0; i < str.length(); i++)  
             hash = (hash ^ str.charAt(i)) * p;  
         hash += hash << 13;  
         hash ^= hash >> 7;  
         hash += hash << 3;  
         hash ^= hash >> 17;  
         hash += hash << 5;  

         //  If the calculated value is negative, take its absolute value  
         if (hash < 0)  
             hash = Math.abs(hash);  
         return hash;  
     }  

     // Should be routed to the node  
     private static String getServer(String key){  
        // Get it key Of hash value  
         int hash = getHash(key);  
         //  The gain is greater than the Hash All that's worth Map 
         SortedMap<Integer, String> subMap = virtualNodes.tailMap(hash);  
         String virtualNode;  
         if(subMap.isEmpty()){  
            // If there's no better than key Of hash Worth a lot of money , From the first node Start  
            Integer i = virtualNodes.firstKey();  
            // Return the corresponding server  
            virtualNode = virtualNodes.get(i);  
         }else{  
            // first Key Is to go clockwise and leave node The nearest node  
            Integer i = subMap.firstKey();  
            // Return the corresponding server  
            virtualNode = subMap.get(i);  
         }  
         //virtualNode The virtual node name needs to be intercepted  
         if(StringUtils.isNotBlank(virtualNode)){  
             return virtualNode.substring(0, virtualNode.indexOf("&&"));  
         }  
         return null;  
     }  

     public static void main(String[] args){  
         String[] keys = {
  " The sun ", " The moon ", " The stars "};  
         for(int i=0; i<keys.length; i++)  
             System.out.println("[" + keys[i] + "] Of hash The value is " +  
                     getHash(keys[i]) + ",  Nodes are routed to [" + getServer(keys[i]) + "]");  
     }  
 }

Execution results :

 Virtual node [192.168.0.0:111&&VN0] Be added , hash The value is 1686427075
 Virtual node [192.168.0.0:111&&VN1] Be added , hash The value is 354859081
 Virtual node [192.168.0.0:111&&VN2] Be added , hash The value is 1306497370
 Virtual node [192.168.0.0:111&&VN3] Be added , hash The value is 817889914
 Virtual node [192.168.0.0:111&&VN4] Be added , hash The value is 396663629
...
 Virtual node [192.168.0.4:111&&VN0] Be added , hash The value is 586921010
 Virtual node [192.168.0.4:111&&VN1] Be added , hash The value is 184078390
 Virtual node [192.168.0.4:111&&VN2] Be added , hash The value is 1331645117
 Virtual node [192.168.0.4:111&&VN3] Be added , hash The value is 918790803
 Virtual node [192.168.0.4:111&&VN4] Be added , hash The value is 1232193678

[ The sun ] Of hash The value is 1977106057,  Nodes are routed to [192.168.0.2:111]
[ The moon ] Of hash The value is 1132637661,  Nodes are routed to [192.168.0.4:111]
[ The stars ] Of hash The value is 880019273,  Nodes are routed to [192.168.0.3:111]

original text :
Uniformity hash Algorithm and java Realization
Make a little progress every day —— Five minutes to understand the consistency hash algorithm (consistent hashing)
For consistency Hash Algorithm ,Java In depth research on code implementation

版权声明
本文为[He Yuhua]所创,转载请带上原文链接,感谢
https://cdmana.com/2021/01/20210131204345904k.html

Scroll to Top