编程知识 cdmana.com

Why does redis design simple strings as SDS?

2021 The first day of construction , Some of my friends wrote to me , He also shared with me a piece of his face redis topic ( This guy is better than he's got his year-end bonus ), I think it's very interesting after reading it , The subject is very simple , It's the typical one who doesn't understand , Problems that are often overlooked by people . Let's sort it out and share it , By the way, consolidate your foundation , I hope it's helpful to the brothers who are interviewing and want to interview .

The topic is like this

interviewer : understand redis Of String The underlying implementation of data structure ?

Iron seed : Of course I do , Is based on SDS Realized

interviewer :redis Yes, it is C Language development , Then why not use it directly C String , It's also designed separately SDS Such a structure ?

Iron seed :·····

“ In fact, I can see that the interviewer wants to see , Tiezi only stays in redis The use level of , Or did you have a more in-depth study of the underlying data structure , Interviewers like to ask like this. We all know .

We know redis Yes, it is C Written , But it's not completely direct use C String , Instead, I rebuilt a new string called simple dynamic string SDS(simple dynamic string) Abstract type of .

redis Also supports the use of C The traditional string of language , It's just used in places where you don't need to modify strings , For example, static character output .

And we use it in development redis, The value of the string is often modified , I'll use it at this time SDS To represent the value of a string . It's worth it Be careful : stay redis In the database ,key-value Key value pairs contain string values , It's all by SDS To achieve .

such as : stay redis Execute one of the simplest set command , At this time redis A new key value pair will be created .

127.0.0.1:6379> set xiaofu " Something inside the programmer "

In this case, the key value is right key and value It's all a string object , The underlying implementations of objects are two that hold strings xiaofu and Something inside the programmer Of SDS structure .

Another example : I press data into a list ,redis A new key value pair will be created .

127.0.0.1:6379> lpush xiaofu " Something inside the programmer " " Programmer Xiaofu "

In this case, the key of the key value pair is the same as above , It's also a result of SDS Implemented string object , The value of a key value pair is a list object containing two string objects , And the bottom layer of these two objects is also made up of SDS Realization .

SDS structure

One SDS Data structure of values , Mainly by lenfreebuf[] These three attributes make up .

struct sdshdr{

  int free; // buf[] The number of unused bytes in the array 

  int len; // buf[] The length of the string held by the array 

  char buf[]; //  An array of strings 
}

among buf[] To actually save the string char Type array ;free Express buf[] The number of unused bytes in the array ;len Express buf[] The length of the string held by the array .

For example, the image above shows buf[] The storage length is 6 A byte string , Number of unused bytes free by 0, But sharp eyed students will find that this is clearly 7 Characters , One more "\0" ah ?

It's mentioned above SDS Not completely direct use C String , Still use some C Characteristic , Like following C The rule that a string ends with a space character , You can also use some of them C String function . And for SDS Come on , A byte occupied by an empty string is not counted in the len In the attribute , He'll be given extra space .

Simple understanding SDS After structure , Let's take a look at SDS Compared with C What are the advantages of strings .

Efficient

for instance : Use in work redis, Often through STRLEN Command to get the length of a string , stay SDS In structure len Property records the length of the string , So we get the length of a string and take len Value , Complexity is O(1).

And if used C character string , When getting the length of a string , Need to traverse the entire string , Until the end of the traversal to the space character (C The space character encountered in represents a complete string ), The complexity at this point is O(N).

Traversing strings frequently in high concurrency scenarios , Getting the length of a string is likely to be redis Performance bottlenecks , therefore SDS Better performance .

Data overflow

Above mentioned C A string does not record its own length , Two adjacent strings may be stored as shown in the figure below , Appropriate memory space is allocated for the string .

If at this point I want to “ Something inside the programmer ” Change to “ Something inside the programmer 123”, The memory that can be allocated before is only 6 Bytes , The modified string needs 9 It's a byte to put it down , How make ?

There's no way but Embezzlement Space between adjacent strings , Self data overflow causes the contents of other strings to be modified .

and SDS It's a good way to avoid this , When we need to modify the data , First check the current SDS Space len Is it satisfactory? , If not, the space will be automatically expanded to the required size , And then make the changes , As shown in the figure below .

But there's a special thing , In the “ Something inside the programmer ” Of 6 Expand bytes to “ Something inside the programmer 123”9 After the bytes , Find out free The value of the property becomes the total length of the expanded string , This involves the following memory reallocation strategy .

Memory reallocation strategy

C The length of the string is fixed , So every time you grow or shorten a string , Do the reallocation of memory , The memory reallocation algorithm is usually a time-consuming operation , It's acceptable if the program doesn't modify the string very often .

But unfortunately ,redis As a database , Data is bound to be modified frequently , If you have to perform a memory reallocation every time you modify it , Then it will seriously affect the performance .

SDS Through two memory reallocation strategies , Good solution to the string in the growth and shortening of the memory allocation problem .

1. Space preallocation

Space pre allocation strategy is used to optimize SDS String growth operation , When you modify a string and you need to SDS When the space of is expanded , Not only for SDS Allocate the space necessary for modification , Also for the SDS Allocate additional unused space free, Check unused space next time free Is it satisfactory? , Satisfaction is not in the expansion space .

Through space pre allocation strategy ,redis Can effectively reduce the string continuous growth operation , The number of memory reallocations generated .

Allocate extra unused space free The rules of :

  • If the SDS After modifying the string ,len Less than 1M, Then allocate extra unused space at this time free The size and len equal .
  • If the SDS After modifying the string ,len A value greater than or equal to 1M, Then allocate extra unused space at this time free The size is 1M.

2. Inert space release

Inert space release strategy is used to optimize SDS String shortening operation , When shortening SDS After the string , No immediate memory reallocation is performed to reclaim the extra space , It's about using free Attribute to record these spaces , If there are subsequent growth operations , You can use .

Data format diversity

C Characters in a string must conform to certain encoding formats , And we also mentioned above ,C String to \0 The end of a null character marks the end of a string , So the string cannot contain \0 Of , Otherwise, it will be mistaken for multiple .

Because of this limitation , bring C Strings can only hold text data , Like audio and video 、 Data in binary format such as pictures cannot be stored .

redis Will operate in a binary way Buf Data in array , So there is no restriction on the data stored in it 、 Filter , Just save what you want , Take it out or something .

summary

It's just redis A little basic knowledge of data structure , It's easy , But with my interview experience , If asked such questions , Don't just say that the bottom is SDS, It's reasonable to explain why it is realized in this way .

On the one hand, I can show that I have solid basic skills , If the expression is clear , It's a great bonus ; In an interview to actively give up the idea of the interviewer to ask , Of course, I'm afraid of people who don't play according to the routine !

This article is from WeChat official account. - High performance server development (easyserverdev)

The source and reprint of the original text are detailed in the text , If there is any infringement , Please contact the yunjia_community@tencent.com Delete .

Original publication time : 2021-03-31

Participation of this paper Tencent cloud media sharing plan , You are welcome to join us , share .

版权声明
本文为[Fan Li]所创,转载请带上原文链接,感谢
https://cdmana.com/2021/04/20210408111751212K.html

Scroll to Top