编程知识 cdmana.com

Skip table of redis data structure

1、 brief introduction

Let's not talk about Redis, Take a look at the skip watch .

1.1、 Business scenario

The scene comes from Xiao Hui's algorithmic journey , We need to build an auction house system , Used to view and sell props in the game , Similar to the auction houses in world of Warcraft , There are also the following needs :

  1. Auction houses need to support four kinds of ranking of goods , Namely : According to the price 、 By grade 、 Press the remaining time 、 By seller ID Sort , Sort queries as fast as possible .
  2. It also supports precise query of item name and full query without input name .

How to design the data structure needed by such a business scenario ? The list of auction house items is linear , The most easily expressed linear structures are arrays and linked lists . If you use an ordered array , Although you can use dichotomy when searching ( Time complexity O(logN)), But the time complexity of the insertion is O(N), The overall time complexity is O(N); And if you want to use an ordered list , Although the time complexity of insertion is O(1), But the time complexity of the search is O(N), In general, it's still O(N).

Is there a data structure , When looking for , There is the efficiency of dichotomy , It's simple to insert a linked list ? yes , we have , Namely Jump watch .

1.2、skiplist

skiplist, I'm jumping the watch , It's also called a jump watch , It's also a data structure , It is used to solve the search problem in algorithmic problems .

Search in general problems can be divided into two categories , One is based on various balancing techniques , The time complexity is O(logN), One is based on hash table , Time complexity O(1). however skiplist A special , It's not in here

2、 Jump watch

2.1、 Introduction to jump watch

Jump list is also a kind of linked list , It is developed on the basis of linked list , We all know , The insertion and deletion of linked list only need to change the pointer , The time complexity is O(1), But insertion and deletion must be accompanied by a search , And the search needs to start from scratch / Tail traversal , The time complexity is O(N), As shown in the following figure is an ordered list ( The gray on the far left represents an empty head node )( Picture from the Internet , The following are the same as ):

image-20201110214951860

In the list , Each node points to the next node , Want to access the next node , It has to go through the next node , That is, node access cannot be skipped , hypothesis , Now look for 22, We have to look for 3->7->11->19->22, It takes five searches .

But if we can implement skipping some node access , Can improve the efficiency of the search , So make some changes to the linked list , Here's the picture :

image-20201110215429303

Every node of us , Will save the pointer to the next node , So we can skip a node and access , such , We actually construct two linked lists , Half of the original list after the new list .

Let's call the original linked list the first level , The new list is the second level , The second layer is based on the first layer . hypothesis , Now it's time to find 22, Let's look at the second level first , from 7 Start ,7 Less than 22, Further back ,19 Less than 22, Further back ,26 Greater than 22, So from the node 19 Go to the first floor , eureka 22, Look for 7->19->26->22, It only takes four searches .

And so on , If you extract another layer of linked list , Isn't it more efficient to find , Here's the picture :

image-20201110220408220

Now? , There's a third layer of linked list , The third layer is based on the second layer , Suppose you're still looking for 22, Let's start with layer three , from 19 Start ,19 Less than 22, Further back , The discovery is empty , Then go to the second level ,19 hinder 26 Greater than 22, Go to the first floor ,19 The back is 22, Look for 19->26>22, It only takes three times to find .

It can be seen from the above example that , When looking up , Skip multiple nodes , Can greatly improve the search efficiency ,skiplist Based on this principle .

In the example above , It's the next half of each layer , The search process is a bit like dichotomy , The time complexity of the search is O(logN), But there is a fatal flaw in the multi-layer linked list in the example , Once a node is inserted or deleted , The number of nodes in the list is 2:1 Structure , If you want to continue , After inserting or deleting nodes , Make a single readjustment of all the following nodes , thus , Insert / The time complexity of deletion becomes O(N).

2.2、 The relationship between skip table levels

As mentioned above , In order to solve the problem of subsequent nodes readjustment caused by inserting and deleting nodes , The random number of layers is introduced . The number of nodes between adjacent layers is no longer strict 2:1 Structure , Instead, a random number of layers is assigned to each newly inserted node . The following figure shows how to form a skip table step by step :

image-20201110220408220

The number of layers of each node is determined by a random algorithm , Inserting a new node does not affect the number of layers of other nodes , therefore , The insert operation only needs to modify the pointer before and after the insertion node , It avoids the readjustment of subsequent nodes . This is a very important feature of jump table , It is also due to the balance tree that the hop table performance is obvious , Because the balance tree also needs to adjust its balance after losing its balance .

In the last jump table above , We need to find nodes 22, Then the nodes traversed are :7->37->19->22, so , This kind of jump table with random levels may not be found 2:1 The efficiency of the structure , But it solved the problem of insertion / The problem of deleting nodes .

2.3、 The complexity of the skip table

The time complexity of hop table search is average O(logN), The worst O(N), Spatial complexity O(2N), namely O(N)

3、Redis Jump watch in

Understanding Redis Before the jump table of , Let's first recall Redis Ordered set of (sorted set) operation

  • A collection of non repeating but ordered string elements ;
  • Each element is associated with a double Type of score,Redis according to score Sort from small to large ;
  • score Can be repeated , Repeat in order of insertion ;

Examples are as follows :

redis 127.0.0.1:6379> ZADD runoobkey 1 redis
(integer) 1
redis 127.0.0.1:6379> ZADD runoobkey 2 mongodb
(integer) 1
redis 127.0.0.1:6379> ZADD runoobkey 3 mysql
(integer) 1
redis 127.0.0.1:6379> ZADD runoobkey 3 mysql
(integer) 0
redis 127.0.0.1:6379> ZADD runoobkey 4 mysql
(integer) 0
redis 127.0.0.1:6379> ZRANGE runoobkey 0 10 WITHSCORES

"redis"
"1"
"mongodb"
"2"
"mysql"
"4"

This is Redis The basic operation of ordered list in , We can see that , In a sequential list , There is a floating point number as score, When corresponding to a value , According to score Exact search and range search , And it's very efficient

Redis The underlying implementation of this operation is skip table .

It's a jump watch , Let's see Redis It's much easier to skip the watch in , The implementation of skip table is in Redis Source directory redis.h In file

3.1、zskiplistNode

zskiplistNode Represents a node of the hop table , The statement is as follows :

typedef struct zskiplistNode {
    robj *obj;
    double score;
    struct zskiplistNode *backward;
    struct zskiplistLevel {
        struct zskiplistNode *forward;
        unsigned int span;
    } level[];
} zskiplistNode;

robj The type is Redis of use C Language implements a set data structure , It can mean string、hash、list、set and zset Five types of data , No details here , In the hop table node , This type of pointer represents the member object of the node

score It means the score , Used for sorting and range lookup

level It's a flexible array , It represents the hierarchy of nodes , Each layer has a forward pointer forward, Used to point to the next node at the same level pointing to the tail of the table , and span It indicates the span of the current node from the next node in the current level , The distance between two nodes .

At first glance , It's easy to think that the span is related to traversing nodes , It's not , Traversal operations only use the forward pointer , Span is used to calculate rankings (rank) Of : In the process of finding a node , The spans of all layers visited along the way add up , It is the ranking of the target node in the hop table .

The following figure , Find members o3, It's only one level , The ranking is 3

image-20201110232105792

stay Redis in , The hierarchy of each node is based on the power law (power law, The bigger the tree, the less likely it will appear ) Randomly generated , It is 1~32 A number between , As level Size of array , Height

The figure below shows the three heights respectively 1、3、5 The nodes of the layer

image-20201110231806856

backward It's a backward pointer , Each node has a , The next node in the heading direction of the current node , Used to traverse from the end of the table

3.2、zskiplist

zskiplist It means a skip watch , The statement is as follows :

typedef struct zskiplist {
    struct zskiplistNode *header, *tail;
    unsigned long length;
    int level;
} zskiplist;

header and tail The pointer points to the header and tail nodes respectively

length The number of nodes is recorded

level It records the level of the highest level node among all nodes , The height of the header node is not included

The following is an example of a skip table , On the far left is a zskiplist structure , On the right are four zskiplistNode node , From left to right there are 32 layer 、4 layer 、2 layer 、5 layer . The pointer to the right of each node is the forward pointer forward, BW Indicates the back pointer backward, Each node depends on its score score Arrange

image-20201110232246288

版权声明
本文为[sherlock_lin]所创,转载请带上原文链接,感谢

Scroll to Top