编程知识 cdmana.com

When the interviewer asked me about the redis data types, I answered eight

interviewer : Xiao Ming ,redis There are several data structures ?

Xiao Ming :8 Kind of

interviewer : What are the differences ?

Xiao Ming :raw,int,ht,zipmap,linkedlist,ziplist,intset,skiplist,embstr

interviewer : forehead , What are you talking about ?

Xiao Ming : I'm answering your question , I have studied this problem before , No mistake

interviewer : ok , This is the first interview for today , You go back and wait for the notice

Xiao Ming :...


The conversation that happened above , It's the interviewer who has a problem , Or Xiao Ming has a problem ? In fact, they all have problems , The interviewer's questions are not accurate , Xiao Ming's answer is not accurate either .

But you can see , The level of the interviewer is average , Because I don't know what Xiaoming said when I heard these nouns redis The underlying encoding type , And then missed the opportunity to tap the potential of Xiaoming's technology . And Xiao Ming is also a bit clever , Ignore the knowledge that the interviewer wants to investigate , Take out some of the fur you've seen recently and show it , It turned out to be a misunderstanding .

With the introduction above , Let's talk about ,redis The data structure in the .

redis Source code selection version : 3.0.0

The goal of this article is : know redis The concept of encoding type of , And according to the depth of the source code level to understand why to set different encoding types , But it doesn't expand too much on the details of various underlying data structures

redis Object type and encoding type of

redis Object type of , That's what you often test in an interview redis What are the data types The exact answer to this question is , For those of us who can only interview but not develop , It's so familiar , It's just a string 、 Hash 、 list 、 aggregate 、 Ordered set , This is in redis The exact definition can be found in the source code :

redis.c

/* Object types */
#define REDIS_STRING 0
#define REDIS_LIST 1
#define REDIS_SET 2
#define REDIS_ZSET 3
#define REDIS_HASH 4

Many people are right. redis The understanding of data structure may stop here , But it's just redis The abstract structure of exposure , The underlying implementation depends on its encoding type to determine the data structure corresponding to the encoding type .

If an object type has only one implementation of the underlying data structure , Then the encoding type is completely redundant , In the early redis There is no such concept . But then in order to optimize performance , One object type may correspond to many different encoding implementations , So it's about redis Knowledge points of the underlying data structure , It's starting to get complicated . The encoding type is in redis There are also exact definitions in the source code :

redis.c

/* Objects encoding. Some kind of objects like Strings and Hashes can be
 * internally represented in multiple ways. The 'encoding' field of the object
 * is set to one of this fields for this object. */
#define REDIS_ENCODING_RAW 0     /* Raw representation */
#define REDIS_ENCODING_INT 1     /* Encoded as integer */
#define REDIS_ENCODING_HT 2      /* Encoded as hash table */
#define REDIS_ENCODING_ZIPMAP 3  /* Encoded as zipmap */
#define REDIS_ENCODING_LINKEDLIST 4 /* Encoded as regular linked list */
#define REDIS_ENCODING_ZIPLIST 5 /* Encoded as ziplist */
#define REDIS_ENCODING_INTSET 6  /* Encoded as intset */
#define REDIS_ENCODING_SKIPLIST 7  /* Encoded as skiplist */
#define REDIS_ENCODING_EMBSTR 8  /* Embedded sds string encoding */

We don't have to look for any additional secondary data to explain the role of coding types , Look at the source code in English comments can .

Object code ( The encoding type ): Some object types, such as strings 、 Hash , It can be implemented in many ways , One redis Object's encoding Field can set the following values to represent the underlying encoding type of the object

The same object type , There can be different encoding types as the underlying implementation . And the same encoding type , It can also support multiple object types at the upper level . Their relationship is as follows :

redis Object type and encoding type

Reading this, you must have at least three questions :

  • Why does an object type correspond to multiple encoding types , It's to solve the problem ?
  • redis How to know when to use this encoding type , When to use that encoding type , And the encoding type can be changed at any time ?
  • What are the implementation principles of various coding types ?( This chapter does not focus on , It will introduce some basic ideas throughout the paper , The specific implementation will be explained in other chapters )

Don't worry. , This part is just to let you know ,redis What is exposed to users is only an abstract data structure , It doesn't represent the concrete implementation of its underlying layer . And then I'll take you deeper .

Why does an object type correspond to multiple encoding types

Write redis Daniel is also a programmer , No, he added complexity to his code , It doesn't help performance again ? After all redis This kind of intermediate component must be superior to similar products in terms of performance . you 're right , Just for Performance improvement .

Intuitively feel the different types of coding

First of all, let's intuitively feel the scene that the same object corresponds to different coding types , It's used here object encoding xxx This redis Order to see a certain key Its value The type of encoding used by the object

127.0.0.1:6379> set number 100
OK
127.0.0.1:6379> object encoding number
"int"
127.0.0.1:6379> set number "100"
OK
127.0.0.1:6379> object encoding number
"int"
127.0.0.1:6379> set number abc
OK
127.0.0.1:6379> object encoding number
"embstr"
127.0.0.1:6379> set number aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
OK
127.0.0.1:6379> object encoding number
"raw"
127.0.0.1:6379> set number 9999999999999999999999999
OK
127.0.0.1:6379> object encoding number
"embstr"
127.0.0.1:6379> set number 99999999999999999999999999999999999999999999999999999999999999
OK
127.0.0.1:6379> object encoding number
"raw"

We tested it with the string we use most often , I observed that the encoding type was set by me value The value varies , I organized the following table to show the above test results

value The encoding type
100 int
"100" int
abc embstr
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa raw
9999999999999999999999999 embstr
99999999999999999999999999999999999999999 raw

Of course , I know the condition of encoding type of string , These representative values are specially selected for testing , We can sum up a rule

  • Whether it's 100 still "100", The encoding type is int, explain redis When judging whether an object can be represented by an integer encoding type , Just see if the value can be converted into an integer
  • Shorter strings abc Encoded as embstr, Longer strings aaaaaaa..a Encoded as raw, The encoding types of long and short strings are different , From this we can guess redis It may be the short string storage optimization strategy ( Now, of course, it's just a reasonable guess , There may also be some optimization of long strings )
  • Integers 999...9 And longer integers 9999999...9 They are also converted to the corresponding string type , Indicates a value that can be represented as an integer encoded type , There is a certain size limit

redis On the optimization of string encoding type

In the above experiment, we learned that , There are really three encoding types for string objects :int,raw,embstr.

int Type analysis doesn't mean much , If you think about it, you can use integer storage , Try to use integer storage , It must be more space efficient than string . Let's analyze , The encoding types of long and short strings are distinguished , Why is that ?

It's not just string types , Include hash 、 List these object types , They all use a unified structure redisObject To represent the . His structure is as follows :

redis.h

typedef struct redisObject {
    unsigned type:4; //  object type 
    unsigned encoding:4; //  The encoding type 
    void *ptr; //  Pointer to value 
    ...( Omit some unimportant fields )
} robj;

Account for the 4 Bit type Express object type (5 Plant that ), It also takes up 4 Bit encoding Field representation The encoding type (8 Plant that ), Pointer field ptr Representing the actual value of Memory address .

If the encoding type of the object is Integers (encoding=REDIS_ENCODING_INT), So this ptr It's going to point to a long Variable of type .

util.c

if (!string2ll(s,slen,&llval))
    return 0;
...
*lval = (long)llval;
return 1;

object.c

...
o->ptr = (void*) value;

If the encoding type of the object is raw perhaps embstr, So this ptr It's going to point to a sdshdr Structural variables

sds.h

struct sdshdr {
    unsigned int len; //  String length 
    unsigned int free; // buf Free number 
    char buf[]; //  A character array 
};

Since they all point to the same structure , How is that optimized ? Then we have to enter the following two methods to have a specific look

object.c

robj *createStringObject(char *ptr, size_t len) {
    if (len <= 39)
        return createEmbeddedStringObject(ptr,len);
    else
        return createRawStringObject(ptr,len);
}

You see , This code is very clear , String length <=39 when , create embstr Type string object , Otherwise create raw Type string object . So the difference between the two creation methods , It must be hidden in these two methods , Let's go in !

embstr type

robj *createEmbeddedStringObject(char *ptr, size_t len) {
    robj *o = zmalloc(sizeof(robj)+sizeof(struct sdshdr)+len+1);
    struct sdshdr *sh = (void*)(o+1);
    o->type = REDIS_STRING;
    o->encoding = REDIS_ENCODING_EMBSTR;
    o->ptr = sh+1;
    ... ( Some assignment operations )
    return o;
}

raw type

robj *createRawStringObject(char *ptr, size_t len) {
    return createObject(REDIS_STRING,sdsnewlen(ptr,len));
}

sds sdsnewlen(const void *init, size_t initlen) {
    ...
    struct sdshdr *sh = zmalloc(sizeof(struct sdshdr)+initlen+1);
    ...
}

robj *createObject(int type, void *ptr) {
    robj *o = zmalloc(sizeof(*o));
    o->type = type;
    o->encoding = REDIS_ENCODING_RAW;
    o->ptr = ptr;
    ...( Some assignment operations )
    return o;
}

For students who read more source code , The difference may be immediately perceived . It's very simple , Namely raw Type this way , by redisObject and sdshdr Structure respectively applied for memory space , and embstr Only applied for memory space once , Then put the two structures next to each other . There's no other difference . The visual diagram is as follows :

BjBxXj.png

BjBwpF.png

See this , It all makes sense , It's simple , It's just the difference between applying for memory . But for those of us who want to package simple things into high-end grandiose scripts , We have to find a way to install it , We concluded that using embstr Coding is compared to raw The benefits of coding :

  • embstr Only applied for memory once , and raw You need to apply twice , Therefore, the memory consumption of one application is saved
  • Release embstr You only need to free memory once , and raw It takes two times , Therefore, it saves the consumption of releasing memory once
  • embstr Of redisObject and sdshdr Put it in a continuous piece of memory , So it's better to use cache Advantages

What about? , Source level understanding , Plus the summary of the interviewer , That's interesting .

Conditions for different coding types

In the last part, we used strings , Different coding types were observed , Also understand why there are different implementations of encoding types . Let's summarize the other objects and encoding types , The principle is not in-depth analysis of the source code , And the basic idea of strings is the same .

The encoding type of the string

  • int:8 A long integer of bytes
  • embstr: Less than or equal to 39 Byte string
  • raw: Greater than 39 Byte string

The encoding type of the hash

  • ziplist: The number of elements is less than 512, And all values are less than 64 byte
  • hashtable: In addition to the above conditions

The encoding type of the list

  • ziplist: The number of elements is less than 512, And all values are less than 64 byte
  • hashtable: In addition to the above conditions

The encoding type of the set

  • intset: The number of elements is less than 512, And all values are integers
  • hashtable: In addition to the above conditions

The encoding type of an ordered set

  • ziplist: The number of elements is less than 128, And all values are less than 64 byte
  • hashtable: In addition to the above conditions

Because there is no explanation , The pure memory thing I think uses the cleanest way to describe to everybody , There's no excess . Details of the specific data structure , I'll use other articles to explain .


here , Xiao Ming, who has undergone some practice , I met a professional interviewer again

Professional interviewers : Xiao Ming ,redis There are several data knots ...

Xiaoming of evolution : The interviewer the interviewer , There are two cases of your question ,redis Object type of , This is what we often call the type of data exposed to the public , Yes 5 Kind of , Namely .... The underlying coding type , stay 3.0.0 Source code has 8 Kind of , Namely ....

Professional interviewers : Who let you answer ?

Xiaoming of evolution :...

Professional interviewers : All right , This is the first interview for today , You go back and wait for the notice

Xiaoming of evolution :...


End

版权声明
本文为[Sun]所创,转载请带上原文链接,感谢

Scroll to Top