## 编程知识 cdmana.com

### Consistent hash algorithm and Java implementation

Uniformity hash Algorithm is a commonly used and easy-to-use partition algorithm in distributed computing 、 Or database sub database sub table algorithm . In the current Internet Service Architecture , To avoid a single point of failure 、 Improve processing efficiency 、 Horizontal expansion and other reasons , Distributed system has become a necessary deployment mode for home travel , So we also produced several methods of data fragmentation ：
1. modulus ,2. Segmenting ,3. Uniformity hash
One of the big problems with the first two is the fixed number of nodes , That is, the number of nodes cannot be changed , You can't hang up a node or add a node in real time , If you change the fragmentation rules, you need to change , There's a lot of data that needs to be migrated .
So consistency hash How to solve this problem ？
Uniformity hash： For nodes and data , Do it all once hash operation , Then compare the nodes and the data hash value , The node whose data value is closest to the node is the processing node . In order to be more evenly distributed , By using virtual nodes , Each node calculates n individual hash value , Put evenly on hash In this way, the data can be more evenly distributed to each node .
1、 principle
(1) ring Hash Space
According to the usual hash The algorithm will correspond to key Hash to one with 2^32 In the space of the next bucket , namely 0~(2^32)-1 In the digital space .
Now we can connect these numbers head to tail , Think of it as a closed ring . Here's the picture

(2) Pass the data through a certain hash After the algorithm is processed, it is mapped to the ring
Now we will object1、object2、object3、object4 Four objects pass through a specific Hash The function calculates the corresponding key value , Then hash to Hash On the ring . Here's the picture ：
Hash(object1) = key1;
Hash(object2) = key2;
Hash(object3) = key3;
Hash(object4) = key4;

(3) Pass the machine through hash The algorithm maps to the ring
Add new machines to a distributed cluster using a consistent hash algorithm , It works by using the same as object storage Hash The algorithm also maps machines into rings
（ Generally speaking, for machines hash The calculation is machine based IP Or the unique alias of the machine as the input value ）, Then calculate in a clockwise direction , Store all objects in the nearest machine .
Suppose there are now NODE1,NODE2,NODE3 Three machines , adopt Hash The algorithm gets the corresponding KEY value , Map to ring , The schematic diagram is as follows ：
Hash(NODE1) = KEY1;
Hash(NODE2) = KEY2;
Hash(NODE3) = KEY3;

You can see from the above figure that the object and the machine are in the same hash space , Turn clockwise in this way object1 It's stored in NODE1 in ,object3 It's stored in NODE2 in ,object2、object4 It's stored in NODE3 in .
In such a deployment environment ,hash The ring will not change , therefore , By calculating the hash Value can be quickly located in the corresponding machine , This will find the real storage location of the object .
2、 Deletion and addition of machines
Ordinary hash The most improper part of the remainder algorithm is that a large number of object storage locations will be invalid after adding or deleting machines . Let's analyze how the consistent hash algorithm is handled .
(1) node （ machine ） The deletion of
Take the distribution above as an example , If NODE2 The fault has been removed , So follow the clockwise migration method ,object3 Will be moved to NODE3 in , It's just object3 The mapping location of has changed , There are no changes to other objects . Here's the picture ：

(2) node （ machine ） The addition of
If you add a new node to the cluster NODE4, Through the corresponding hash algorithm KEY4, And map it to the ring , Here's the picture ：

By moving clockwise , that object2 Was moved to NODE4 in , Other objects remain in their original storage locations .
Through the analysis of adding and deleting nodes , The consistency hash algorithm keeps monotonicity at the same time , Or the migration of data has reached the minimum , This algorithm is very suitable for distributed cluster , Avoid a lot of data migration , Reduce the pressure on the server .
3、 Balance – Virtual node
According to the diagram above , The consistency hash algorithm satisfies the monotonicity and load balancing characteristics as well as the general hash The dispersion of the algorithm , But this is not the reason why it is widely used ,
Because of the lack of balance . The following will analyze how the consistency hash algorithm satisfies the balance .
hash Algorithms are not guaranteed to be balanced , If there is only NODE1 and NODE3 The situation of （NODE2 Deleted image ）,object1 It's stored in NODE1 in , and object2、object3、object4 It's all stored in NODE3 in , This creates a very unbalanced state . In the consistent hash algorithm , In order to satisfy the balance as much as possible , It introduces virtual nodes .
——“ Virtual node ”（ virtual node ） Is the actual node （ machine ） stay hash A replica of space （ replica ）, An actual node （ machine ） There are several “ Virtual node ”, This corresponding number also becomes “ Number of copies ”,“ Virtual node ” stay hash In space hash Value permutation .
So there are only NODE1 and NODE3 The situation of （NODE2 Deleted image ） For example , The previous objects are very unevenly distributed on the machine , Now let's use 2 Copies （ Number of copies ） For example , So the whole hash There is... In the ring 4 Virtual nodes , Finally, the diagram of object mapping is as follows ：

According to the above figure, we can know the mapping relationship of objects ：object1->NODE1-1,object2->NODE1-2,object3->NODE3-2,object4->NODE3-1. Through the introduction of virtual nodes , The distribution of objects is more balanced . So in practice , How does real object query work ？ Objects from hash The conversion from virtual node to actual node is shown in the figure below ：

“ Virtual node ” Of hash The calculation can use the IP The way to add a number suffix to an address . For example, suppose NODE1 Of IP The address is 192.168.1.100. introduce “ Virtual node ” front , Calculation cache A Of hash value ：
Hash(“192.168.1.100”);
introduce “ Virtual node ” after , Calculation “ Virtual section ” spot NODE1-1 and NODE1-2 Of hash value ：
Hash(“192.168.1.100#1”); // NODE1-1
Hash(“192.168.1.100#2”); // NODE1-2

Two 、 Uniformity hash Algorithm Java Realization .
1、 Without virtual nodes

``````package hash;

import java.util.SortedMap;
import java.util.TreeMap;

/** *  Consistency without virtual nodes Hash Algorithm  *  a key ：1. How to make one hash Ring ,2. How to map server nodes on Hash rings ,3. How to find the corresponding node  */
public class ConsistentHashingWithoutVirtualNode {

// To be added Hash List of servers for the ring
private static String[] servers = { "192.168.0.0:111", "192.168.0.1:111",
"192.168.0.2:111", "192.168.0.3:111", "192.168.0.4:111" };

//key Represents the... Of the server hash value ,value Presentation server
private static SortedMap<Integer, String> sortedMap = new TreeMap<Integer, String>();

// Program initialization , Put all the servers in sortedMap in
static {
for (int i=0; i<servers.length; i++) {
int hash = getHash(servers[i]);
System.out.println("[" + servers[i] + "] Join the group ,  Its Hash The value is " + hash);
sortedMap.put(hash, servers[i]);
}
System.out.println();
}

// Should be routed to the node
private static String getServer(String key) {
// Get it key Of hash value
int hash = getHash(key);
// The gain is greater than the Hash All that's worth Map
SortedMap<Integer, String> subMap = sortedMap.tailMap(hash);
if(subMap.isEmpty()){
// If there's no better than key Of hash Worth a lot of money , From the first node Start
Integer i = sortedMap.firstKey();
// Return the corresponding server
return sortedMap.get(i);
}else{
// first Key Is to go clockwise and leave node The nearest node
Integer i = subMap.firstKey();
// Return the corresponding server
return subMap.get(i);
}
}

// Use FNV1_32_HASH The algorithm calculates the server's Hash value , No rewriting is used here hashCode Methods , The end result doesn't make any difference
private static int getHash(String str) {
final int p = 16777619;
int hash = (int) 2166136261L;
for (int i = 0; i < str.length(); i++)
hash = (hash ^ str.charAt(i)) * p;
hash += hash << 13;
hash ^= hash >> 7;
hash += hash << 3;
hash ^= hash >> 17;
hash += hash << 5;

//  If the calculated value is negative, take its absolute value
if (hash < 0)
hash = Math.abs(hash);
return hash;
}

public static void main(String[] args) {
String[] keys = {
" The sun ", " The moon ", " The stars "};
for(int i=0; i<keys.length; i++)
System.out.println("[" + keys[i] + "] Of hash The value is " + getHash(keys[i])
+ ",  Nodes are routed to [" + getServer(keys[i]) + "]");
}
} ``````

Execution results ：

``````[192.168.0.0:111] Join the group ,  Its Hash The value is 575774686
[192.168.0.1:111] Join the group ,  Its Hash The value is 8518713
[192.168.0.2:111] Join the group ,  Its Hash The value is 1361847097
[192.168.0.3:111] Join the group ,  Its Hash The value is 1171828661
[192.168.0.4:111] Join the group ,  Its Hash The value is 1764547046

[ The sun ] Of hash The value is 1977106057,  Nodes are routed to [192.168.0.1:111]
[ The moon ] Of hash The value is 1132637661,  Nodes are routed to [192.168.0.3:111]
[ The stars ] Of hash The value is 880019273,  Nodes are routed to [192.168.0.3:111]``````

2、 With virtual nodes

``````package hash;

import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

import org.apache.commons.lang.StringUtils;

/** *  Consistency with virtual nodes Hash Algorithm  */
public class ConsistentHashingWithoutVirtualNode {

// To be added Hash List of servers for the ring
private static String[] servers = {
"192.168.0.0:111", "192.168.0.1:111", "192.168.0.2:111",
"192.168.0.3:111", "192.168.0.4:111"};

// Real node list , Considering that the server is online 、 Offline scene , Add 、 Deleted scenes will be more frequent , Use here LinkedList Will be better
private static List<String> realNodes = new LinkedList<String>();

// Virtual node ,key Representing virtual nodes hash value ,value Represents the name of the virtual node
private static SortedMap<Integer, String> virtualNodes = new TreeMap<Integer, String>();

// Number of virtual nodes , Here to write die , To demonstrate the need for , A real node corresponds to 5 Virtual nodes
private static final int VIRTUAL_NODES = 5;

static{
// First add the original server to the list of real nodes
for(int i=0; i<servers.length; i++)

// Add virtual nodes , Traverse LinkedList Use foreach The cycle efficiency will be higher
for (String str : realNodes){
for(int i=0; i<VIRTUAL_NODES; i++){
String virtualNodeName = str + "&&VN" + String.valueOf(i);
int hash = getHash(virtualNodeName);
System.out.println(" Virtual node [" + virtualNodeName + "] Be added , hash The value is " + hash);
virtualNodes.put(hash, virtualNodeName);
}
}
System.out.println();
}

// Use FNV1_32_HASH The algorithm calculates the server's Hash value , No rewriting is used here hashCode Methods , The end result doesn't make any difference
private static int getHash(String str){
final int p = 16777619;
int hash = (int)2166136261L;
for (int i = 0; i < str.length(); i++)
hash = (hash ^ str.charAt(i)) * p;
hash += hash << 13;
hash ^= hash >> 7;
hash += hash << 3;
hash ^= hash >> 17;
hash += hash << 5;

//  If the calculated value is negative, take its absolute value
if (hash < 0)
hash = Math.abs(hash);
return hash;
}

// Should be routed to the node
private static String getServer(String key){
// Get it key Of hash value
int hash = getHash(key);
//  The gain is greater than the Hash All that's worth Map
SortedMap<Integer, String> subMap = virtualNodes.tailMap(hash);
String virtualNode;
if(subMap.isEmpty()){
// If there's no better than key Of hash Worth a lot of money , From the first node Start
Integer i = virtualNodes.firstKey();
// Return the corresponding server
virtualNode = virtualNodes.get(i);
}else{
// first Key Is to go clockwise and leave node The nearest node
Integer i = subMap.firstKey();
// Return the corresponding server
virtualNode = subMap.get(i);
}
//virtualNode The virtual node name needs to be intercepted
if(StringUtils.isNotBlank(virtualNode)){
return virtualNode.substring(0, virtualNode.indexOf("&&"));
}
return null;
}

public static void main(String[] args){
String[] keys = {
" The sun ", " The moon ", " The stars "};
for(int i=0; i<keys.length; i++)
System.out.println("[" + keys[i] + "] Of hash The value is " +
getHash(keys[i]) + ",  Nodes are routed to [" + getServer(keys[i]) + "]");
}
}``````

Execution results ：

`````` Virtual node [192.168.0.0:111&&VN0] Be added , hash The value is 1686427075
Virtual node [192.168.0.0:111&&VN1] Be added , hash The value is 354859081
Virtual node [192.168.0.0:111&&VN2] Be added , hash The value is 1306497370
Virtual node [192.168.0.0:111&&VN3] Be added , hash The value is 817889914
Virtual node [192.168.0.0:111&&VN4] Be added , hash The value is 396663629
...
Virtual node [192.168.0.4:111&&VN0] Be added , hash The value is 586921010
Virtual node [192.168.0.4:111&&VN1] Be added , hash The value is 184078390
Virtual node [192.168.0.4:111&&VN2] Be added , hash The value is 1331645117
Virtual node [192.168.0.4:111&&VN3] Be added , hash The value is 918790803
Virtual node [192.168.0.4:111&&VN4] Be added , hash The value is 1232193678

[ The sun ] Of hash The value is 1977106057,  Nodes are routed to [192.168.0.2:111]
[ The moon ] Of hash The value is 1132637661,  Nodes are routed to [192.168.0.4:111]
[ The stars ] Of hash The value is 880019273,  Nodes are routed to [192.168.0.3:111]``````

https://cdmana.com/2021/01/20210131204345904k.html

Scroll to Top