编程知识 cdmana.com

Deep analysis on the principle of MySQL index for tens of millions of data

Chapter one :mysql The essence of index

1. The importance of indexing

from mysql Search the database for books , It's like looking for a book from a library , If there are fewer books , Then the search speed will be faster , If there are more books , So the speed of searching will be very slow . At this time, the classification management of books is very important , Book classification management is similar to index ( Or it can be understood as setting a catalog for all books ).

When there are few books
 Insert picture description here

A lot of books
 Insert picture description here

2.mysql Index common interview questions

Question 1 : What is the most common slow query optimization method in database ?
answer : Indexing can optimize query .

Question two : Why adding index can optimize slow query ?
answer : Because index is a data structure for optimizing query , such as mysql The index in is used in B+ Trees achieve . and B+ Tree is a kind of data structure , Index can be used to find data quickly . So the query can be optimized .

Question three : Which data structures can improve query speed ?
answer : Hashtable Completely balanced binary trees B Trees B+ Trees, etc .

Question 4 : Since these data structures can optimize the query speed , that mysql Why choose to use B+ Trees ?
The answer to this question needs to come slowly !!

3.mysql The essence of index

mysql There is such a table in ,user surface , The table structure data is as follows

 Insert picture description here

If there is no index , When you perform the following sql sentence , So the execution speed is relatively slow .
Sql The statement is as follows :select * from user where id = 7

Without index , Why is the query speed slower ??
Because the data is stored on disk , Then execute sql At the time of statement , I will definitely go to the disk to read the relevant data , That would produce disks IO. If there is no index , When you check id=7 When the data is , Then it will go on 7 Secondary disk IO operation , It's time consuming .
 Insert picture description here
 Insert picture description here
 Insert picture description here

 Insert picture description here

When you need to read data from disk , The operating system will pass the data logical address to the disk , The control circuit of the disk translates the logical address into the physical address according to the addressing logic , That is, determine which track the data to be read is on , Which sector? . In order to read the data of this sector , You need to put the head above this sector . To achieve this , The head needs to move to align with the corresponding track . This process is called seeking , The time consumed is called seek time . Disk rotation then rotates the target sector under the head , The time consumed in this process is called rotation time .

Why use indexes ?

Its purpose is to reduce the number of disks IO operation , This can improve the speed of data query .

What is the nature of the index ?

Index is a data structure to help retrieve data quickly .
The function of index is to retrieve data quickly .
The essence of index is a kind of data structure . So you create an index , Actually mysql The database of will help you create a data structure , The data structure is then used to store the data , The data structure is the index .

Chapter two :mysql Underlying principles of indexing

1. Common data structures

Hashtable ( Hash table )、 Binary tree ( Completely balanced binary trees )、B Trees 、B+ Trees, etc. .

2. Hashtable

Hash table is a kind of key - value (key-value) Structure of data storage , We just need to enter the value to be found ( namely key), You can find the corresponding value ( namely Value). The idea of hash is very simple , Put values in an array , Use a hash function to put key Convert to a certain location , And then put value In this position of the array , namely idx = Hash(key). If there is a hash conflict , The zipper method is used to solve the problem .
Because the data stored in the hash table is not orderly , Therefore, it is not suitable for interval query , Applicable to scenarios with only equivalent queries .

If mysql Index using hash table data structure for . Index default primary key column .
 Insert picture description here

The disadvantages of hash tables : Because the data of nodes is easy to overlap , To query data, you need to get real data for comparison , It's going to be less efficient . And it's not suitable for range query .

3. Binary tree

Data structure with binary tree as index .
Binary tree is a relatively balanced ordered binary tree , Insert it , lookup , The operation performance such as deletion is quite good .
characteristic : Its left child has a smaller value than its parent , The value of the right node is larger than that of the parent node .
Test your own example
 Insert picture description here

The advantages of binary trees : You can optimize the disk IO The number of times . Nodes exist in order , You can query the scope .
The disadvantages of binary trees : Data insertion will be slow , Because the data structure will change . The problem of imbalance , Will produce a skewed binary tree .

4. Complete the balanced binary tree

Inserting data will balance , But it changes the structure of the tree when it is inserted . The inserted data is slow . And the level of the tree gets higher , Will add disks IO The number of times .
Binary trees are ordered , Range lookups are supported .
 Insert picture description here

5.B+ Trees

B+ Trees

B Trees or B+ Tree nodes can store multiple data , So, the height of a binary tree relative to a fully balanced binary tree is bound to be low , Then the disk will be reduced IO The number of times .
B+ Trees are relative to B Trees have data redundancy , The data in the leaf node is sequential . So it's very convenient to search in sequence , As long as the leaf nodes are traversed backward .

The third chapter :mysql Analysis of the underlying principle of index

1.mysql How to use B+ Treelike

mysql in Innodb and myisam By default, storage engines use B+ Indexed by tree data structure , The stored data structure is as follows :
 Insert picture description here

Why only leaf nodes store data , The non leaf node does not store data ?

Locality principle : When a data is used , Nearby data may also be used , So in order to improve the efficiency of the operating system , When reading data, it is often not strictly read on demand , It's read in advance every time , Even if only one byte is needed , The operating system will also start from this location , Sequentially read back a certain length of data and put it into memory , The length here is called page . That is, the basic unit of the operating disk of the computer operating system , The size of a page in a normal operating system is 4kb.
stay mysql You can use the following command to view Innodb The default size of the engine page
SHOW GLOBAL STATUS LIKE ‘Innodb_page_size’
Mysql The default page size of is 16kb,B+ Data design is very suitable for reading data .
If the node stores data and index again , There's a lot of data , There are fewer indexes , So the higher the tree is . Causes the disk to IO More , It's less efficient .

2.B+ Number height and number of stored data

A node can store 16KB The data of , This is a mysql The default node size , One page 16KB.
Suppose the size of a row of records is 1kb( It's actually quite big )
So a leaf node can store 16 Data .
Non leaf nodes store index values and pointers ,mysql The default index value size is 8B, The size of the pointer is 6B, Together 14B. Index that non leaf node can store + The number of pointers is :
16*1024/14 = 1170 individual

So if the height of the number is 2 layer , The number of leaf nodes is 1170 individual , The number of data that can be stored is :1170 * 16 = 18720 strip .

If the height of the tree is 3 layer , The number of pieces that each leaf node can store is :1170117016= 21902400 strip

3.myisam Engine storage structure

You can use the command to create a new table structure , Use myisam Engine to create .Myisam Nonclustered index adopted by the index of the engine , Index and table data are stored separately .
 Insert picture description here

user2.frm Files are files that create tables
user2.MYD File is the data file of the table
user2.MYI Files are index files

myisam The index structure of the engine .
Index file
 Insert picture description here

Data files
 Insert picture description here

Myisam When the engine looks up data through the index , Find the corresponding address through the index value , Find data by address .

If you have name Field set index , The effect is the same .

4.Innodb Engine storage structure

Data files
file

user.frm It's the file that creates the table
user.ibd Is the data + Index file
Indexes Innodb The engine is a clustered index , Index and data file together .
 Insert picture description here

If you have name Field set index , That's the secondary index . The data and primary key value of the column stored in the leaf node , Which means that you need to find the data through the primary key again .
 Insert picture description here

Learn more free courses on Architecture :java Architect free course
Every night 20:00 Live sharing advanced java Architecture Technology
Scan to add QQ Communication group 264572737
 Insert picture description here
Enter the group and get a large amount of free java Structure the interview questions
 Insert picture description here
 Insert picture description here
 Insert picture description here
 Insert picture description here
 Insert picture description here
 Insert picture description here
 Insert picture description here
 Insert picture description here
 Insert picture description here
 Insert picture description here
 Insert picture description here
 Insert picture description here

版权声明
本文为[renlianggee]所创,转载请带上原文链接,感谢

Scroll to Top