
A deep dive into advanced core Java knowledge: Redis (distributed locks [introduction, implementation], the Redlock distributed lock, and HyperLogLog [introduction, principle, implementation, usage])

One. An In-Depth Look at Distributed Locks

1. Introduction to distributed locks

A lock is a tool for preventing errors or data inconsistency when multiple threads of execution access a shared resource.

If we compare a server to a house, then threads are its residents. When they want to use a shared resource together, say the toilet, and the toilet door has no lock (or worse, the toilet has no door at all), that is a matter of principle...

With a lock on the door, everyone can use it with much more peace of mind. The essence is that only one resident is allowed to use it at a time.

As the Internet world has developed, single-instance applications can no longer meet the high-concurrency demands of complex Internet systems, and architectures have gradually become distributed, slowly evolving into a much larger household. By the same token, we need to introduce distributed locks to coordinate concurrent access to shared resources across distributed applications.

1) Why We Need Distributed Locks

Generally speaking, there are two scenarios for using distributed locks:

  1. Avoiding duplicated work across nodes: for example, when a user triggers an operation, different nodes might each send an email, producing duplicates;
  2. Avoiding data corruption: if two nodes operate on the same data at the same time, errors or inconsistencies may result;

2) Common Implementations in Java

Above we used a simple metaphor to illustrate the essence of a lock: only one user may operate at a time. So in theory, any tool that satisfies this requirement will do (other applications can provide the lock for us):

  1. Based on MySQL locks: MySQL has a built-in pessimistic lock, the for update clause, and you can also implement pessimistic or optimistic locking yourself;
  2. Based on ZooKeeper ordered nodes: ZooKeeper allows clients to create ephemeral sequential child nodes, so when a client fetches the node list, it can tell from its sequence number in the current list of children whether it holds the lock;
  3. Based on Redis's single-threaded model: because Redis is single-threaded, commands are executed serially, and it provides instructions such as SETNX (SET if Not eXists) that are mutually exclusive by nature;

Each approach has its pros and cons. MySQL, for example, is intuitive and easy to understand, but you must additionally handle lock timeouts, transactions, and so on, and its performance is bounded by the database. We will not discuss those options here and will focus on Redis.

3) Problems with Redis Distributed Locks

① Lock Timeout

Suppose we have two parallel services, A and B. Service A crashes for some unknown, mysterious reason after acquiring the lock; service B will then never be able to acquire the lock:

So we need to set a timeout on the lock to keep the service available.

But this raises another problem: if the logic between acquiring and releasing the lock runs too long and exceeds the lock's timeout, things go wrong again. The first thread's lock expires before it finishes the critical section, the second thread acquires the lock early, and the code in the critical section is no longer strictly serialized.

To avoid this, Redis distributed locks should not be used for long-running tasks. If the overrun does occasionally happen, the resulting small data inconsistency may require human intervention.

A slightly safer approach is to set the lock's value to a random token, check that the token still matches when releasing the lock, and only then delete the key. This ensures that the lock held by the current thread is not released by another thread, unless the lock has already expired and been released by the server.

But matching the value and then deleting the key is not an atomic operation in Redis, and no single instruction guarantees it, so you may need something like a Lua script, because a Lua script executes its multiple instructions atomically as a whole.
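As an in-process analogy (this is Java's ConcurrentHashMap, not Redis): its two-argument remove() provides exactly the atomic "delete only if the value still matches" semantics that the Lua script provides on the Redis side. The key and token names below are made up for illustration:

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch, not Redis itself: the map stands in for the key space,
// and remove(key, value) is an atomic compare-and-delete, the same contract
// the Lua script enforces for GET + DEL.
public class SafeRelease {
    static final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

    // Returns true only if the caller still holds the lock (its token matches).
    static boolean release(String lockKey, String myToken) {
        return store.remove(lockKey, myToken); // atomic compare-and-delete
    }

    public static void main(String[] args) {
        store.put("lock:test", "token-A");                   // thread A acquired the lock
        System.out.println(release("lock:test", "token-B")); // false: B must not free A's lock
        System.out.println(release("lock:test", "token-A")); // true: A frees its own lock
    }
}
```

The point of the analogy is that the check and the delete happen as one indivisible step; doing GET, comparing in the client, and then calling DEL as two separate commands would reintroduce the race.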

Extended discussion: possible safety issues caused by GC

Martin Kleppmann and Antirez, the father of Redis, have debated the safety of Redis-based distributed locks in depth, and one of the issues involves GC.

Java developers are no strangers to GC: a stop-the-world (STW) pause occurs during GC so the garbage collector can run correctly, but it can cause the following problem:

Service A acquires the lock and sets a timeout, then enters a long STW pause, during which the distributed lock times out and is released; service B acquires the lock in the meantime. When service A resumes after the STW pause, it still believes it holds the lock, so service A and service B hold the lock at the same time, and the distributed lock is no longer safe.

This is not limited to Redis; ZooKeeper and MySQL have the same problem.

② Single-Point and Multi-Node Problems

If Redis is deployed as a single instance, then as soon as that Redis instance goes down, the whole service becomes unavailable.

And with a master-slave deployment, imagine this scenario: after service A acquires the lock, the Redis master goes down before the lock is replicated, and service B then obtains the same lock from the promoted slave. To solve this, the author of Redis proposed the RedLock (red lock) algorithm (implemented by clients such as Redisson):

// Three Redis clusters
RLock lock1 = redissonInstance1.getLock("lock1");
RLock lock2 = redissonInstance2.getLock("lock2");
RLock lock3 = redissonInstance3.getLock("lock3");

RedissonRedLock lock = new RedissonRedLock(lock1, lock2, lock3);
lock.lock();
// do something....
lock.unlock();

Two. Implementing a Distributed Lock with Redis

A distributed lock is a game of "claiming the pit", and SETNX (SET if Not eXists) is exactly such an operation: only one client is allowed to claim it. Let's look at the source code (t_string.c/setGenericCommand):

// Underlying implementation of SET / SETEX / PSETEX / SETNX
void setGenericCommand(client *c, int flags, robj *key, robj *val, robj *expire,
                       int unit, robj *ok_reply, robj *abort_reply) {
	long long milliseconds = 0; /* initialized to avoid any harmness warning */

	// If an expiration time is given for the key, save it into the variable above;
	// if the expiration time is invalid, reply with an error
	if (expire) {
		if (getLongLongFromObjectOrReply(c, expire, &milliseconds, NULL) != C_OK)
			return;
		if (milliseconds <= 0) {
			addReplyErrorFormat(c,"invalid expire time in %s",c->cmd->name);
			return;
		}
		if (unit == UNIT_SECONDS) milliseconds *= 1000;
	}

	// lookupKeyWrite fetches the key's value object for a write operation.
	// The checks here are:
	// 1. NX (not exists) is set, and the key IS found in the database
	// 2. or XX (exists) is set, and the key is NOT found in the database
	// => in either case reply abort_reply to the client
	if ((flags & OBJ_SET_NX && lookupKeyWrite(c->db,key) != NULL) ||
		(flags & OBJ_SET_XX && lookupKeyWrite(c->db,key) == NULL))
	{
		addReply(c, abort_reply ? abort_reply : shared.null[c->resp]);
		return;
	}

	// Set the key to the given value in the current database
	genericSetKey(c->db,key,val,flags & OBJ_SET_KEEPTTL);
	// The server increments dirty every time a key is modified
	server.dirty++;
	if (expire) setExpire(c,c->db,key,mstime()+milliseconds);
	notifyKeyspaceEvent(NOTIFY_STRING,"set",key,c->db->id);
	if (expire) notifyKeyspaceEvent(NOTIFY_GENERIC,
		"expire",key,c->db->id);
	addReply(c, ok_reply ? ok_reply : shared.ok);
}

As described above, in earlier versions of Redis, SETNX and EXPIRE were not atomic as a pair, so executing them one after another was problematic.

You might think of solving this with a Redis transaction, but that does not work here: EXPIRE depends on the result of SETNX, and transactions have no if-else branching; if SETNX fails to acquire the lock, EXPIRE must not run.

To solve this knotty problem, the Redis open-source community produced many distributed-lock libraries. To rein in the chaos, Redis 2.8 later added extension parameters to the SET command, so that SETNX and EXPIRE can effectively execute together:

> SET lock:test true ex 5 nx 
OK
... do something critical ... 
> del lock:test

You just need to follow the format SET key value [EX seconds | PX milliseconds] [NX | XX] [KEEPTTL].

In addition, the official SETNX documentation mentions another idea: set the key's value to <current Unix time + lock timeout + 1>, so that other clients, on seeing a value in that format, can judge whether they may take over the lock.
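That recipe can be sketched in-process (a minimal sketch: a ConcurrentHashMap stands in for Redis, the class name, key, and timeout are illustrative, and replace() here only approximates the GETSET step of the real recipe):

```java
import java.util.concurrent.ConcurrentHashMap;

// Simplified in-process sketch of the old SETNX pattern. All names are
// hypothetical; a real implementation talks to Redis, not a map.
public class SetnxSketch {
    static final ConcurrentHashMap<String, Long> redis = new ConcurrentHashMap<>();
    static final long LOCK_TIMEOUT_MS = 5_000;

    // Try to take the lock: the stored value is <now + lock timeout + 1>.
    static boolean tryLock(String key, long nowMs) {
        long expiresAt = nowMs + LOCK_TIMEOUT_MS + 1;
        if (redis.putIfAbsent(key, expiresAt) == null) return true; // SETNX succeeded
        Long current = redis.get(key);
        // The holder's timestamp has passed: the lock is stale and may be taken over.
        return current != null && current < nowMs
                && redis.replace(key, current, nowMs + LOCK_TIMEOUT_MS + 1);
    }

    public static void main(String[] args) {
        System.out.println(tryLock("lock:page", 1_000));  // true: lock was free
        System.out.println(tryLock("lock:page", 2_000));  // false: held, not yet stale
        System.out.println(tryLock("lock:page", 10_000)); // true: previous holder's time passed
    }
}
```

The value format is what lets a second client detect a stale lock without any help from expiry, which is exactly why this scheme predates the EX/NX extensions.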

1) Code implementation

The following uses Jedis to simulate this; the key code is:

private static final String LOCK_SUCCESS = "OK";
private static final Long RELEASE_SUCCESS = 1L;
private static final String SET_IF_NOT_EXIST = "NX";
private static final String SET_WITH_EXPIRE_TIME = "PX";

@Override
public String acquire() {
	try {
		// Deadline for acquiring the lock; give up once it has passed
		long end = System.currentTimeMillis() + acquireTimeout;
		// Randomly generate a value (the lock token)
		String requireToken = UUID.randomUUID().toString();
		while (System.currentTimeMillis() < end) {
			String result = jedis
				.set(lockKey, requireToken, SET_IF_NOT_EXIST,
					SET_WITH_EXPIRE_TIME, expireTime);
			if (LOCK_SUCCESS.equals(result)) {
				return requireToken;
			}
			try {
				Thread.sleep(100);
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();
			}
		}
	} catch (Exception e) {
		log.error("acquire lock due to error", e);
	}

	return null;
}

@Override
public boolean release(String identify) {
	if (identify == null) {
		return false;
	}

	String script = "if redis.call('get', KEYS[1]) == ARGV[1] then "
			+ "return redis.call('del', KEYS[1]) else return 0 end";
	Object result = null;
	try {
		result = jedis.eval(script, Collections.singletonList(lockKey),
				Collections.singletonList(identify));
		if (RELEASE_SUCCESS.equals(result)) {
			log.info("release lock success, requestToken:{}", identify);
			return true;
		}
	} catch (Exception e) {
		log.error("release lock due to error", e);
	} finally {
		if (jedis != null) {
			jedis.close();
		}
	}

	log.info("release lock failed, requestToken:{}, result:{}", identify, result);
	return false;
}


Two. The Redlock Distributed Lock

1. What Is RedLock

This article on the official Redis site proposes an authoritative, Redis-based way to implement distributed locks called Redlock, which is safer than the original single-node approach. It guarantees the following properties:

  1. Safety: mutual exclusion, i.e. at any moment only one client can hold the lock
  2. Deadlock freedom: eventually a client can always acquire the lock, even if the client that locked a resource crashes or a network partition occurs
  3. Fault tolerance: as long as a majority of the Redis nodes are alive, the service works normally

2. How to Implement a Distributed Lock on a Single Node

SET resource_name my_random_value NX PX 30000

Everything relies on the command above: it sets the value only if the key does not yet exist (guaranteed by NX) and sets an expiration of 30000 ms (guaranteed by PX). The value my_random_value must be unique across all clients and across all lock requests made while the lock is held. The logic for releasing the lock is:

if redis.call("get",KEYS[1]) == ARGV[1] then 
	return redis.call("del",KEYS[1]) 
else
	return 0 
end

The implementation above avoids releasing a lock created by another client. With a bare DEL, if client1 acquires lock1 and then blocks on some operation for a long time, lock1 may expire on the Redis side and be reassigned to client2; when client1 finally releases the lock, it would silently release the lock now held by client2. Giving each client a unique string value avoids this problem. As for how to generate that unique string, there are many options; pick any.

3. The Redlock Algorithm

The algorithm is easy to understand: stand up 5 master nodes, spread across different data centers to maximize availability. To acquire the lock, a client does the following:

  1. Get the current time in microseconds
  2. Try to acquire the lock on the 5 instances in sequence, using the same key and random value; the client must set a timeout for communicating with each master, to avoid wasting a long time on a failed node
  3. When the client has acquired the lock on at least 3 masters, it computes how long the acquisition took: the current time minus the timestamp from step 1. If the lock's validity time exceeds the elapsed time, the lock is considered truly acquired
  4. If the lock was acquired, its real validity time is the original lock validity time minus the time elapsed during acquisition
  5. If the client failed to acquire the lock, it performs a release on the minority of masters where it succeeded, resetting the state
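The decision in steps 3 and 4 can be sketched as a small helper (a simplified sketch: the 5-node majority and the timing rule come from the steps above, all names and numbers are illustrative):

```java
// Sketch of Redlock's acquisition decision: the lock counts as held only if
// a majority of the 5 masters granted it AND the acquisition round finished
// within the lock's validity time.
public class RedlockQuorum {
    static final int MASTERS = 5;

    static boolean acquired(int grants, long startMs, long nowMs, long validityMs) {
        long elapsed = nowMs - startMs;             // step 3: time spent asking the masters
        boolean quorum = grants >= MASTERS / 2 + 1; // majority: at least 3 of 5
        return quorum && elapsed < validityMs;
    }

    // Step 4: the effective validity remaining after the acquisition round.
    static long remainingValidity(long startMs, long nowMs, long validityMs) {
        return validityMs - (nowMs - startMs);
    }

    public static void main(String[] args) {
        System.out.println(acquired(3, 0, 50, 30_000));     // true: quorum, fast enough
        System.out.println(acquired(2, 0, 50, 30_000));     // false: no quorum
        System.out.println(acquired(5, 0, 40_000, 30_000)); // false: acquisition outlived validity
    }
}
```

Note that the third case fails even with all 5 grants: if acquiring took longer than the validity time, the lock may already have expired on some nodes by the time the client "holds" it.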

4. Retrying on Failure

If a client fails to acquire the lock, it should wait a random delay before retrying, to avoid multiple clients retrying in lockstep; in the best case a single client gets to send its lock requests to all 5 masters at nearly the same time. Also, a client that failed should release the lock on the masters where it did succeed as soon as possible, so that other clients can acquire the lock without waiting for those partial locks to expire. Of course, if a network partition cuts the client off from those masters, that waste is the price that has to be paid.

5. Releasing the Lock

Releasing the lock is very simple: just release it on all the nodes in turn.

6. Performance, Crash Recovery, and fsync

If our nodes have no persistence, mutual exclusion can break: a client gets the lock on 3 of the 5 masters, then one of those 3 reboots and forgets the lock; now there are again 3 masters available to grant the same lock to another client, violating mutual exclusion. Turning on AOF persistence makes things a little better, because Redis implements expiry at the semantic level: time keeps passing while the server is down, so after a restart the lock state is not polluted. But consider a power failure: AOF may lose commands that had not yet been flushed to disk, unless the flush policy is configured as fsync = always, which hurts performance. The solution is to stipulate that a restarted node stays unavailable for the max TTL after it comes back, so it cannot interfere with locks granted before the crash; once all pre-crash locks have expired and no historical lock remains in the environment, the node can rejoin and work normally.

Three. How to Build Reliable Distributed Locks: Is Redlock Really Feasible?

If you only want a lock for performance, you do not need Redlock; it is complex and costly. A single Redis instance is enough, with at most a slave added in case the master dies. Of course, with a single-node Redis you will lose the lock on power failure and in some other situations, but if your goal is merely speed and such outages are rare, that is not a big problem. And if a single-node Redis is acceptable, the lock granularity your application needs is clearly fuzzy and coarse, so it will not be a critical service anyway.

So is Redlock appropriate for scenarios that require correctness? Martin lists several scenarios to argue that the Redlock algorithm is unreliable.

1. Protecting Resources with Locks

In this section Martin first sets Redlock aside and discusses how distributed locks work in general. In a distributed environment, locks are more complex than a mutex, because they involve different nodes and network communication, and any of them can fail at any time without warning. Martin supposes a scenario: a client wants to modify a file; it acquires the lock, then modifies the file and writes it back, then releases the lock. Another client then acquires the lock in turn... The code flow is as follows:

// THIS CODE IS BROKEN
function writeData(filename, data) {
	var lock = lockService.acquireLock(filename);
	if (!lock) {
		throw 'Failed to acquire lock';
	}

	try {
		var file = storage.readFile(filename);
		var updated = updateContents(file, data);
		storage.writeFile(filename, updated);
	} finally {
		lock.release();
	}
}

Unfortunately, even if your lock service is perfect , The above code may still kneel , The flow chart below will show you why :

2. Making Locks Safe with Fencing

The fix is simple: add a fencing token. Here the fencing token is a monotonically increasing number (which the lock service can supply), incremented every time a client acquires the lock:

client1 acquires the lock together with token 33, then goes into a long pause, during which the lock expires. client2 then acquires the lock with token 34 and writes its data. When client1 comes back to life and tries to write, its token 33 is smaller than 34, so the write is rejected. Note that this requires the storage layer to check the token, but that is not hard to implement. If you use ZooKeeper as the lock service, you can use the zxid as the increasing number.
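A minimal sketch of this storage-side check, assuming a single in-process storage object and illustrative names:

```java
// Sketch of Martin's fencing idea: the storage layer remembers the highest
// token it has seen and rejects any write carrying an older one.
// (Real storage would persist this check; everything here is illustrative.)
public class FencedStorage {
    private long highestToken = 0;
    private String data = "";

    // Accept the write only if its token is newer than anything seen so far.
    synchronized boolean write(long token, String newData) {
        if (token <= highestToken) return false; // stale holder, e.g. woke up after a GC pause
        highestToken = token;
        data = newData;
        return true;
    }

    public static void main(String[] args) {
        FencedStorage s = new FencedStorage();
        System.out.println(s.write(34, "from client2")); // true: token 34 is the newest
        System.out.println(s.write(33, "from client1")); // false: token 33 < 34, rejected
    }
}
```

This is exactly the scenario above: client1's late write with token 33 bounces off the storage layer, so the expired lock can no longer corrupt data.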

The problem for Redlock, though, is that it has no way of generating a fencing token. And how would you modify the Redlock algorithm so that it could produce one? It is not at all obvious. The token must be monotonically increasing, which is otherwise only achievable on a single Redis node, and that is not very reliable; it seems you would need to introduce a consensus protocol to make Redlock produce a reliable fencing token.

3. Using Time to Solve Consistency

Redlock's inability to produce a fencing token should already be reason enough to abandon it wherever correctness is required, but a few more points are worth discussing.

There is a principle in academia that a good algorithm makes no assumptions about time: processes may pause for arbitrarily long, packets may arrive late because of network delay, and clocks may simply be wrong. A reliable algorithm must still do the right thing under these assumptions.

For a failure detector, a timeout can only serve as a guess that a node has failed, because of network delay, incorrect local clocks, and so on. Considering that Redis uses gettimeofday rather than a monotonic clock, it is at the mercy of system time, which may suddenly jump forward or backward, causing a key to expire faster or slower than intended.

As you can see, Redlock depends on many timing assumptions: it assumes all Redis nodes hold a key for roughly the same length of time before expiring it, that network latency is small compared with the expiry time, and that process pauses are short compared with the expiry time.

4. Breaking Redlock with Unreliable Time

In this section Martin gives examples of Redlock failing when time misbehaves.

  1. client1 acquires the lock on nodes A, B, and C; the requests to D and E never arrive due to the network
  2. the clock on node C jumps forward, causing the lock to expire
  3. client2 acquires the lock on C, D, and E; the requests to A and B never arrive due to the network
  4. now client1 and client2 both hold the lock

The official Redlock documentation mentions this too, but frames it as node C crashing. The Redlock authors themselves know the algorithm is not completely reliable in that case, and propose delayed restarts to deal with it. Martin's analysis, however, goes further, pointing out that delayed restart likewise depends on the clock being correct.

Martin then shows that a process pause breaks Redlock even when the clocks are reliable:

  1. client1 requests the lock from A, B, C, D, and E
  2. before the lock responses reach client1, client1 enters a GC pause
  3. the lock expires during the pause
  4. client2 acquires the lock on A, B, C, D, and E
  5. client1 finishes GC and receives the lock responses; now both clients hold the same lock

A long network delay can cause the same problem.

5. Redlock's Synchrony Assumptions

These examples show that Redlock works correctly only under a synchronous system model, that is, a system with the following properties:

  1. Bounded network delay: packets are guaranteed to arrive within some maximum delay
  2. Bounded process pauses: a process pauses for at most some maximum time
  3. Bounded clock error: the time is never fetched from a badly drifting NTP server

6. Conclusion

Martin concludes that Redlock is not really a good choice: for distributed-lock applications that need performance it is too heavy and expensive, and for applications that require correctness it is not safe enough, because it makes fragile assumptions about clocks and runs into the other risks listed above. If your application only needs a high-performance distributed lock and does not demand strict correctness, a single-node Redis is enough; if your application must remain correct, do not use Redlock. Use a proper consensus and coordination system such as ZooKeeper, and make sure fencing tokens are in place.



Four. The Magical HyperLogLog and the Counting Problem

1. A Brief Introduction to HyperLogLog

HyperLogLog was first proposed in 2007 by Flajolet and his colleagues as a near-optimal algorithm for approximate cardinality estimation. Unlike the original paper, however, many books, and the Redis author as well, call it a new data structure (new datastruct) (the algorithm does need a specific data structure for its implementation).

1) About Cardinality Counting

Cardinality counting (Cardinality Counting) is used to count the number of distinct elements in a collection.

Consider this scenario: you develop and maintain a large website, and one day the boss asks the product manager for the UV of every page (unique visitors; each user is counted only once per day), then asks you to build the statistics module. How would you implement it?

Counting PV (page views; every visit by a user is counted) is easy: configure a separate Redis counter for each page, suffixing the counter's key with the current date. Every request then executes one INCRBY, and in the end we have all the PV data.

UV is different: it must be deduplicated, so multiple visits by the same user on the same day count only once. This requires every page request to carry a user ID; whether the user is logged in or not, each needs one and only one ID to identify them.

You may immediately think of a simple solution: keep a separate set per page to store the IDs of all users who visited that page that day. The problems with this are:

  1. Huge storage: if the site has heavy traffic, the sets become very large, and with many pages even more so; burning that many resources on a mere deduplication feature will get you killed by the boss;
  2. Awkward statistics: aggregating across that many sets is another complicated affair;

2) Common Approaches to Cardinality Counting

For cardinality-counting needs like the above, there are generally two solutions better than a set:

① First: B-trees

The biggest advantage of B-trees is very efficient insertion and lookup. Storing the data in a B-tree lets you quickly check whether a new element already exists and quickly insert it; to get the cardinality, just count the nodes in the tree.

But keeping the B-tree structure in memory solves the counting and lookup problems without saving any memory.

② Second: bitmaps

A bitmap can be understood as a data structure that stores data bit by bit; each bit independently carries information. Since a bit is the smallest unit of storage, bitmaps save a great deal of space, and the whole structure can be loaded into memory at once for computation. Define a large bit array and map each element being counted to one bit of the array, for example:

Another obvious advantage of bitmaps is that merging multiple statistics is easy: just AND/OR the results together, which also greatly reduces storage. A quick calculation: to count the cardinality of 100 million items, a bitmap needs about 100_000_000 / 8 / 1024 / 1024 ≈ 12 MB of memory, whereas representing each item as a 32-bit int would need about 32 * 100_000_000 / 8 / 1024 / 1024 ≈ 381 MB.

Bitmaps clearly save memory, but still not enough: 12 MB to count one object means counting 10,000 objects needs nearly 120 GB, so big-data scenarios are still out of reach.
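The arithmetic above can be checked with a few lines (pure back-of-the-envelope math, no Redis involved; the class and method names are made up):

```java
// Reproducing the memory estimates from the text.
public class CardinalityMemory {
    // Bitmap: one bit per possible element.
    static double bitmapMB(long n) {
        return n / 8.0 / 1024 / 1024;
    }

    // Storing each element as a 32-bit int instead.
    static double intSetMB(long n) {
        return 32 * n / 8.0 / 1024 / 1024;
    }

    public static void main(String[] args) {
        System.out.printf("bitmap: %.0f MB%n", bitmapMB(100_000_000L)); // ~12 MB
        System.out.printf("ints:   %.0f MB%n", intSetMB(100_000_000L)); // ~381 MB
    }
}
```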

3) Probabilistic Algorithms

In fact, no efficient algorithm for exact cardinality counting in big-data scenarios has been found yet, so when absolute accuracy is not required, probabilistic algorithms are a good solution.

A probabilistic algorithm does not store the data set itself; it estimates the cardinality using probabilistic and statistical methods, which saves a great deal of memory while keeping the error within a bounded range. Current probabilistic cardinality algorithms include:

  • Linear Counting (LC): an early cardinality estimator. LC is not great in space complexity; in fact its space complexity is the same as the bitmap approach above (with a constant-factor reduction), namely O(Nmax)
  • LogLog Counting (LLC): LLC saves far more memory than LC, with a space complexity of only O(log2(log2(Nmax)))
  • HyperLogLog Counting (HLL): HLL is an optimization and improvement of LLC; at the same space complexity, its cardinality estimation error is smaller than LLC's

Among these, HyperLogLog is the truly amazing one. We computed above that counting 100 million items with a bitmap takes about 12 MB, yet HyperLogLog needs less than 1 KB! The HyperLogLog implemented in Redis needs only 12 KB and, with a standard error of 0.81%, can count up to 2^64 elements.

How on earth does it do that?! Let's find out!

2. How HyperLogLog Works

Consider a coin-tossing game: you toss a coin n times, then tell me the maximum number of consecutive heads you saw, and I guess how many times you tossed.

Intuitively, if you say the longest run of heads was 2, I know you did not toss many times, so I might guess 5 or some other small number; but if you say the longest run was 20, then however improbable it sounds, I know you must have tossed for a very long time, so my guess will be a very large number.

During the game I may also ask you to repeat the experiment, collect more data, and get a better estimate. Let's restate the game another way:

The figure means: given a series of random integers, we record K, the maximum length of the run of consecutive zeros in the low bits (the maxbit); from K we can estimate N, the count of random numbers.

1) A Code Experiment

We can write a quick experiment to explore the relationship between K and N:

import java.util.concurrent.ThreadLocalRandom;

public class PfTest {

	static class BitKeeper {

		private int maxbit;

		public void random() {
			long value = ThreadLocalRandom.current().nextLong(2L << 32);
			int bit = lowZeros(value);
			if (bit > this.maxbit) {
				this.maxbit = bit;
			}
		}

		private int lowZeros(long value) {
			int i = 0;
			for (; i < 32; i++) {
				if (value >> i << i != value) {
					break;
				}
			}
			return i - 1;
		}
	}

	static class Experiment {

		private int n;
		private BitKeeper keeper;

		public Experiment(int n) {
			this.n = n;
			this.keeper = new BitKeeper();
		}

		public void work() {
			for (int i = 0; i < n; i++) {
				this.keeper.random();
			}
		}

		public void debug() {
			System.out
				.printf("%d %.2f %d\n", this.n, Math.log(this.n) / Math.log(2), this.keeper.maxbit);
		}
	}

	public static void main(String[] args) {
		for (int i = 1000; i < 100000; i += 100) {
			Experiment exp = new Experiment(i);
			exp.work();
			exp.debug();
		}
	}
}

The code is consistent with the process in the figure above. Why is it called PfTest? Because the related Redis commands all carry a PF prefix: the initials of Philippe Flajolet, the inventor of HyperLogLog mentioned earlier.

Capture part of the output to view:

// n  log2(n)  maxbit
34000 15.05 13 
35000 15.10 13 
36000 15.14 16 
37000 15.18 17 
38000 15.21 14 
39000 15.25 16 
40000 15.29 14 
41000 15.32 16 
42000 15.36 18

You will find a clear linear correlation between K and the logarithm of N: N is approximately equal to 2 to the power of K.

2) One step further: bucket averaging


import java.util.concurrent.ThreadLocalRandom;

public class PfTest {

	static class BitKeeper {
		// unchanged, code omitted
	}

	static class Experiment {

		private int n;
		private int k;
		private BitKeeper[] keepers;

		public Experiment(int n) {
			this(n, 1024);
		}

		public Experiment(int n, int k) {
			this.n = n;
			this.k = k;
			this.keepers = new BitKeeper[k];
			for (int i = 0; i < k; i++) {
				this.keepers[i] = new BitKeeper();
			}
		}

		public void work() {
			for (int i = 0; i < this.n; i++) {
				long m = ThreadLocalRandom.current().nextLong(1L << 32);
				BitKeeper keeper = keepers[(int) (((m & 0xfff0000) >> 16) % keepers.length)];
				keeper.random();
			}
		}

		public double estimate() {
			double sumbitsInverse = 0.0;
			for (BitKeeper keeper : keepers) {
				sumbitsInverse += 1.0 / (float) keeper.maxbit;
			}
			double avgBits = (float) keepers.length / sumbitsInverse;
			return Math.pow(2, avgBits) * this.k;
		}
	}

	public static void main(String[] args) {
		for (int i = 100000; i < 1000000; i += 100000) {
			Experiment exp = new Experiment(i);
			exp.work();
			double est = exp.estimate();
			System.out.printf("%d %.2f %.2f\n", i, est, Math.abs(est - i) / i);
		}
	}
}

This process is a bit like the scoring in a talent show: a panel of professional judges score a contestant, but some judges score high because they happen to like the performer, and some score low, so the usual practice is to drop the highest and lowest scores and then average the rest; such a score is close to fair.

The code above uses 1024 "judges" (buckets), and when computing the average it uses the harmonic mean (the reciprocal of the average of the reciprocals), which effectively smooths out the influence of outliers:

avg = (3 + 4 + 5 + 104) / 4 = 29 
avg = 4 / (1/3 + 1/4 + 1/5 + 1/104) = 5.044
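The two averages above can be checked with a few lines (the harmonicMean helper is just for illustration):

```java
public class HarmonicMeanDemo {
    // harmonic mean: n divided by the sum of reciprocals
    static double harmonicMean(double... xs) {
        double sumInverse = 0.0;
        for (double x : xs) {
            sumInverse += 1.0 / x;
        }
        return xs.length / sumInverse;
    }

    public static void main(String[] args) {
        double arithmetic = (3 + 4 + 5 + 104) / 4.0;          // 29.0
        double harmonic = harmonicMean(3, 4, 5, 104);         // ~5.04
        System.out.printf("%.2f %.2f%n", arithmetic, harmonic);
    }
}
```

Note how the outlier 104 drags the arithmetic mean up to 29 but barely moves the harmonic mean.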

Watch the output of the script; the relative error stays within single-digit percentages:

100000 94274.94 0.06 
200000 194092.62 0.03 
300000 277329.92 0.08 
400000 373281.66 0.07 
500000 501551.60 0.00 
600000 596078.40 0.01 
700000 687265.72 0.02 
800000 828778.96 0.04 
900000 944683.53 0.05

The actual HyperLogLog is somewhat more complex than the example code above, and more accurate. Also note that the algorithm above throws a division-by-zero error when the number of trials is very small, because a bucket with maxbit = 0 cannot be inverted.

3) Actual HyperLogLog

There's a fantastic website that lets you observe dynamically how the HyperLogLog algorithm works: http://content.research.neustar.biz/blog/hll.html

Some of the concepts there are explained here; you can click step to observe:

  • m means the number of buckets: as you can see from the diagram, it is divided into 64 buckets;
  • The blue bits select the bucket: for example 101110 is binary for 46, so this element is counted in bucket 46 (marked in red) of the big Register Values table in the middle;
  • The green bits give the position where the first 1 appears: among the green bits, counting from right to left, the first 1 is at position 1, so 1 is written into bucket 46 of Register Values;
  • The red bits show the value recorded so far: subsequent elements that land in bucket 46 will keep updating the value stored there;

①、 Why count the position at which the first 1 appears in the hash value?

Because a good hash turns every element into a uniformly random bit string, the position of the first 1 in its low bits follows the same geometric distribution as the longest run of heads in the coin game above; it plays exactly the role of maxbit in our experiment.

②、 Why is the memory usage of PF 12 KB?

Because there are 16384 buckets of 6 bits each: 16384 × 6 / 8 = 12288 bytes = 12 KB.
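As a quick sanity check on both questions, a small sketch using the JDK's Long.numberOfTrailingZeros, which returns exactly the position of the lowest 1 bit:

```java
public class HllQuestions {
    public static void main(String[] args) {
        // 1) the position of the first 1 in a hash's low bits is the
        //    number of trailing zeros: the "maxbit" of the coin game
        System.out.println(Long.numberOfTrailingZeros(0b101110L));  // 1
        System.out.println(Long.numberOfTrailingZeros(0b1000000L)); // 6

        // 2) 16384 buckets of 6 bits each
        System.out.println(16384 * 6 / 8 / 1024 + " KB"); // 12 KB
    }
}
```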

3. The HyperLogLog implementation in Redis

From the above we now have some understanding of the HyperLogLog algorithm and its ideas, and we know that a HyperLogLog actually occupies about 12 KB of space. But Redis is extremely aggressive about memory optimization: while the count is still small, most buckets are zero, so Redis saves space by converting to another format called sparse storage; the normal format, by contrast, is called dense storage, which always occupies 12 KB.

1) Dense storage structure

The dense storage structure is very simple: a contiguous string bitmap of 16384 consecutive 6-bit registers:

As we all know, a byte consists of 8 bits, so packing 6-bit registers this way means some buckets cross byte boundaries, and we need to shift and splice one or two bytes appropriately to extract a specific count value.

Suppose the bucket number is index, the starting byte offset of its 6-bit count value is denoted offset_bytes, and the bit offset within that byte is denoted offset_bits. Then we have:

offset_bytes = (index * 6) / 8 
offset_bits = (index * 6) % 8

The former is the quotient, the latter the remainder. For example, bucket 2 has byte offset 1, i.e. the 2nd byte; its bit offset is 4, so bucket 2's count starts at the 5th bit of the 2nd byte. Note that the byte order here is low bits on the left and high bits on the right, whereas the bytes we usually write put the high bits on the left and the low bits on the right.
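The quotient/remainder computation above can be verified for a few bucket numbers (a throwaway sketch):

```java
public class OffsetDemo {
    public static void main(String[] args) {
        for (int index : new int[]{0, 1, 2, 3}) {
            int offsetBytes = index * 6 / 8;  // which byte the register starts in
            int offsetBits = index * 6 % 8;   // bit offset inside that byte
            System.out.printf("bucket %d -> byte %d, bit %d%n", index, offsetBytes, offsetBits);
        }
    }
}
```

Bucket 2 indeed lands at byte 1, bit 4, matching the text above.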

There are two cases involved here. If offset_bits is less than or equal to 2, the 6 bits lie inside a single byte, and the count value val can be obtained directly:

val = (buffer[offset_bytes] >> offset_bits) & 0b111111  # shift right, keep the low 6 bits

If offset_bits is greater than 2, the count crosses a byte boundary and we need to splice together bits from two bytes:

# low-order part
low_val = buffer[offset_bytes] >> offset_bits
# number of low-order bits
low_bits = 8 - offset_bits
# high-order part comes from the next byte
high_val = buffer[offset_bytes + 1]
# splice, keep the low 6 bits
val = (high_val << low_bits | low_val) & 0b111111

The Redis source below may look a bit obscure by comparison: at first glance it seems to handle only the byte-crossing case. That is because when the 6 bits lie inside a single byte, the contribution of the second byte shifts away to zero, so the same code covers both the single-byte and the double-byte case:

// Get the count value of the specified bucket
#define HLL_DENSE_GET_REGISTER(target,p,regnum) do { \
	uint8_t *_p = (uint8_t*) p; \
	unsigned long _byte = regnum*HLL_BITS/8; \
	unsigned long _fb = regnum*HLL_BITS&7; /* %8 == &7 */ \
	unsigned long _fb8 = 8 - _fb; \
	unsigned long b0 = _p[_byte]; \
	unsigned long b1 = _p[_byte+1]; \
	target = ((b0 >> _fb) | (b1 << _fb8)) & HLL_REGISTER_MAX; \
} while(0)

// Set the count value of the specified bucket
#define HLL_DENSE_SET_REGISTER(p,regnum,val) do { \
	uint8_t *_p = (uint8_t*) p; \
	unsigned long _byte = regnum*HLL_BITS/8; \
	unsigned long _fb = regnum*HLL_BITS&7; \
	unsigned long _fb8 = 8 - _fb; \
	unsigned long _v = val; \
	_p[_byte] &= ~(HLL_REGISTER_MAX << _fb); \
	_p[_byte] |= _v << _fb; \
	_p[_byte+1] &= ~(HLL_REGISTER_MAX >> _fb8); \
	_p[_byte+1] |= _v >> _fb8; \
} while(0)
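To make the bit twiddling concrete, here is a rough Java translation of the two macros operating on a plain byte[] (an illustrative sketch, not the Redis source; the bounds check on the second byte is an extra safety addition):

```java
public class DenseRegisters {
    static final int HLL_BITS = 6;
    static final int HLL_REGISTER_MAX = (1 << HLL_BITS) - 1; // 63

    // equivalent of HLL_DENSE_GET_REGISTER
    static int getRegister(byte[] p, int regnum) {
        int byteIdx = regnum * HLL_BITS / 8;
        int fb = regnum * HLL_BITS & 7;      // %8 == &7
        int b0 = p[byteIdx] & 0xFF;
        int b1 = byteIdx + 1 < p.length ? p[byteIdx + 1] & 0xFF : 0;
        return ((b0 >> fb) | (b1 << (8 - fb))) & HLL_REGISTER_MAX;
    }

    // equivalent of HLL_DENSE_SET_REGISTER
    static void setRegister(byte[] p, int regnum, int val) {
        int byteIdx = regnum * HLL_BITS / 8;
        int fb = regnum * HLL_BITS & 7;
        p[byteIdx] &= ~(HLL_REGISTER_MAX << fb);
        p[byteIdx] |= val << fb;
        if (fb > 2) {                        // register crosses the byte boundary
            p[byteIdx + 1] &= ~(HLL_REGISTER_MAX >> (8 - fb));
            p[byteIdx + 1] |= val >> (8 - fb);
        }
    }

    public static void main(String[] args) {
        byte[] regs = new byte[16384 * HLL_BITS / 8]; // 12 KB of registers
        setRegister(regs, 2, 46);                     // bucket 2 crosses a byte boundary
        setRegister(regs, 3, 63);
        System.out.println(getRegister(regs, 2)); // 46
        System.out.println(getRegister(regs, 3)); // 63
    }
}
```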

2) Sparse storage structure

Sparse storage is designed for the common case where most count values are zero. The figure below shows the general layout of sparse-stored count values:

When a run of consecutive buckets is zero, Redis uses several different encodings:

  • 00xxxxxx : a two-zero prefix means that the following 6 bits, plus 1, give the number of consecutive zero counters (plus 1 because a run of length zero would be meaningless). For example 00010101 means 22 consecutive zero counters.
  • 01xxxxxx yyyyyyyy : 6 bits can represent at most 64 consecutive zero counters, while this extended 14-bit form can represent up to 16384. This is exactly the initial state of the HyperLogLog data structure: all 16384 buckets zero, representable in just 2 bytes.
  • 1vvvvvxx : the middle 5 bits are the count value and the trailing 2 bits the run length, meaning (xx + 1) consecutive buckets each hold the count (vvvvv + 1). For example 10101011 means 4 consecutive buckets each with count 11.

Note that the third form can represent a count of at most 32, while a dense-stored single count uses 6 bits and can go up to 63. The moment any sparse-stored count needs to rise above 32, Redis immediately converts the HyperLogLog's storage structure from sparse to dense.
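The three encodings can be decoded with a small sketch (the ZERO/XZERO/VAL names follow Redis's own terminology; the decode helper itself is hypothetical):

```java
public class SparseDecode {
    // decode one sparse opcode; b1 is only consulted for the two-byte form
    static String decode(int b0, int b1) {
        if ((b0 & 0xC0) == 0x00) {            // 00xxxxxx: short run of zeros
            return "ZERO x" + ((b0 & 0x3F) + 1);
        } else if ((b0 & 0xC0) == 0x40) {     // 01xxxxxx yyyyyyyy: long run of zeros
            return "XZERO x" + ((((b0 & 0x3F) << 8) | b1) + 1);
        } else {                              // 1vvvvvxx: run of equal counts
            return "VAL " + (((b0 >> 2) & 0x1F) + 1) + " x" + ((b0 & 0x03) + 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(0b00010101, 0));          // ZERO x22
        System.out.println(decode(0b01111111, 0b11111111)); // prints "XZERO x16384", the initial state
        System.out.println(decode(0b10101011, 0));          // VAL 11 x4
    }
}
```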

3) Object header

Besides storing the counts of 16384 buckets, a HyperLogLog also needs to store some additional fields, such as the cached total count and the storage type, so it uses an extra object header to represent them:

struct hllhdr {
	char magic[4];       /* magic string "HYLL" */
	uint8_t encoding;    /* storage type: HLL_DENSE or HLL_SPARSE */
	uint8_t notused[3];  /* three bytes reserved for future use */
	uint8_t card[8];     /* cached total count */
	uint8_t registers[]; /* counters for all buckets */
};

So the full internal structure of a HyperLogLog is the HLL object header plus the 16384-bucket count bitmap. Inside Redis it is stored as a string bitmap, so you can treat a HyperLogLog object as an ordinary string:

> PFADD codehole python java golang 
(integer) 1 
> GET codehole 
"HYLL\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80C\x03\x84MK\x80P\xb8\x80^\x f3"

However, you cannot use HyperLogLog commands to manipulate an ordinary string, because they first check whether the magic string in the object header is "HYLL".
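A minimal sketch of that magic-string check, parsing the first five bytes of the value returned by GET (the field layout follows the hllhdr struct above; assuming HLL_SPARSE is encoded as 1, as the \x01 byte suggests):

```java
import java.nio.charset.StandardCharsets;

public class HllHeader {
    public static void main(String[] args) {
        // first 5 bytes of the string returned by GET: magic + encoding
        byte[] raw = {'H', 'Y', 'L', 'L', 0x01};
        String magic = new String(raw, 0, 4, StandardCharsets.US_ASCII);
        System.out.println(magic.equals("HYLL"));             // true
        System.out.println(raw[4] == 1 ? "sparse" : "dense"); // sparse
    }
}
```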

4. Using HyperLogLog

HyperLogLog provides two core commands, PFADD and PFCOUNT; literally, one adds elements and the other gets the count. PFADD is used the same way as the set command SADD: given a user ID, just push the user ID in. PFCOUNT is used the same way as SCARD: it returns the count directly:

> PFADD codehole user1 
(integer) 1 
> PFCOUNT codehole 
(integer) 1 
> PFADD codehole user2 
(integer) 1 
> PFCOUNT codehole 
(integer) 2 
> PFADD codehole user3 
(integer) 1 
> PFCOUNT codehole 
(integer) 3 
> PFADD codehole user4 user5 
(integer) 1 
> PFCOUNT codehole 
(integer) 5

We can write a Java script to try out how accurate HyperLogLog really is:

import redis.clients.jedis.Jedis;

public class JedisTest {
	public static void main(String[] args) {
		Jedis jedis = new Jedis(); // connects to a local Redis on the default port
		for (int i = 0; i < 100000; i++) {
			jedis.pfadd("codehole", "user" + i);
		}
		long total = jedis.pfcount("codehole");
		System.out.printf("%d %d\n", 100000, total);
		jedis.close();
	}
}

The output is as follows :

100000 99723

It turns out that 100,000 entries are off by only 277; the error rate is 0.277%, which is quite low for a massive-scale UV counting requirement.
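The 0.81% standard error quoted earlier follows from HyperLogLog's published error bound of roughly 1.04 / sqrt(m), where m is the number of buckets; a quick check with m = 16384:

```java
public class ErrorBound {
    public static void main(String[] args) {
        int m = 16384;                            // buckets in Redis's HyperLogLog
        double err = 1.04 / Math.sqrt(m);         // standard error of the estimate
        System.out.printf("%.4f%%%n", err * 100); // 0.8125%
    }
}
```

The observed 0.277% above is well within this bound.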

Of course, besides PFADD and PFCOUNT above, a third command PFMERGE is also provided, which merges several HyperLogLogs into a new pf value:

> PFADD nosql "Redis" "MongoDB" "Memcached" 
(integer) 1 

> PFADD RDBMS "MySQL" "MSSQL" "PostgreSQL" 
(integer) 1 

> PFMERGE databases nosql RDBMS 
OK 
> PFCOUNT databases 
(integer) 6


Copyright notice
This article was created by [osc_ kurqu050]; please include a link to the original when reposting. Thanks.
https://cdmana.com/2020/12/20201224113503071r.html
