The process of finding a piece of data
Let's take a look at InnoDB Logical storage structure ：
Table space ： You can view it as InnoDB The highest level of the logical structure of the storage engine , All the data is stored in the table space . There is a shared table space by default ibdata1. If enabled innodb_file_per_table Parameters , It should be noted that only data is stored in the table space of each table 、 Index and insert buffer Bitmap page , Other types of data , Such as rollback information 、 Insert buffered index page 、 System transaction information 、 The secondary write buffer is still stored in the original shared table space .
A table space is made up of segments , Common segments have data segments 、 Index segment 、 Rollback segments, etc . The data segment is B+ Tree leaf node (Leaf node segment), The index segment is B+ Trees are not leaf nodes (Non-leaf node segment)
District ： It's a space made up of consecutive pages , In any case, the size of each zone is 1MB. By default , The size of the storage engine page is 16KB, In other words, there are a total of continuous 64 Consecutive pages . And to ensure the continuity of the page ,InnoDB Storage engine requests from disk once 4-5 Districts .
page ( You can also weigh it ), yes InnoDB Minimum unit of disk management . The default size of each page 16KB.1.2x After the version, you can also use the parameter innodb_page_size Set to 4k、8k、16k
Look up a piece of data ：select * from user where id=5;
here id It's the primary key , We go through this B+ Tree to find , First, I'll find the root page , The root page position of each table is fixed in the table space file ; Find the root page through binary search method , Locate the id=5 The data should be in the pointer P5 Point to page , Then go further page number=5 Search for , It can also be found by dichotomy id=5 The record of ：
Calculate a tree B+ How many rows of data can a tree hold
You can also view it by command InnoDB Each page defaults to 16KB：
show variables like 'innodb_page_size';
First calculate the non leaf nodes , Suppose the primary key ID by bigint type , The length is 8 byte , And the size of the pointer is InnoDB The source code is set to 6 byte , This is a total of 14 byte
And how many such units can be stored in a page , In fact, it means how many hands there are , namely 16384/14=1170.
Then we can calculate the height of a tree as 2 Of B+ Trees , Can be stored 1170*16=18720 Data records like this .
According to the same principle, we can work out a height of 3 Of B+ Trees can store ：1170117016=21902400 Records like this .
So in InnoDB in B+ The height of the tree is generally 1-3 layer , It can satisfy tens of millions of data storage .
Index some concepts
- Cluster index (clustered index): It's about putting indexes and data together , Find the index and find the data ; As shown in the figure below, the leaf node stores all data in a row .
- Secondary index (Secondary Index Or non clustered index ): It's separating the data from the index , When searching, you need to find the index first , Then index back to the table to find the corresponding data .
Back to the table ： First, scan the database index to find the row where the data is located , And then through the row primary key id Take out the data not provided in the index , In other words, the query based on non primary key index needs to scan one more index tree .
Here's the picture , After auxiliary index search , It will go back to the cluster index , Finally, we find the data .
InnoDB There is and only one clustered index , and MyISAM All of them are nonclustered indexes .
- Joint index ： Refers to indexing multiple columns on a table .
The leftmost prefix matching principle of union index ： A composite index for multiple fields at the same time ( In order ,ABC,ACB They're two completely different kinds of joint indexes ) With the union index (a,b,c) For example , Building such an index is equivalent to building an index a、ab、abc Three indexes . In addition, the composite index is actually an index , Multiple indexes are not really created , It's just that the effect is equivalent to generating multiple indexes .
- Overlay index : That is, from the secondary index, you can get the query records , Instead of querying the records in the clustered index .
The benefits of using coverage ：
- The secondary index does not contain all the information of the entire record , So its size is much smaller than the cluster index , A lot less IO operation .
- For some statistics ( Such as count(id)) Statistics are not performed by querying cluster indexes , Reduce IO operation
unique index ： Index generated with unique Columns , Duplicate values are not allowed for this column , But you can have an empty value (NULL)
Index push down ：MySQL 5.6 Index push down optimization is introduced , During index traversal , Judge the fields included in the index first , Filter out unqualified records , Reduce the number of words back to the table .
Why B+ Trees , instead of B Trees
B Whether leaf node or non leaf node , Data will be saved , This results in a smaller number of pointers that can be saved in a non leaf node
Save a large amount of data with few pointers , Can only increase the height of the tree , Lead to IO More operations , Query performance becomes low ;
Why? InnoDB There is only one clustered index , Instead of clustering all indexes ？
Because clustering index is to store index and data in leaf node , If all indexes are clustered , Each index will hold a copy of the data , It will cause data redundancy , In the case of a large amount of data , This kind of data redundancy is very resource consuming .
What happens when an index is explicitly created , But the execution did not pass the index ？
Query optimizer .
One SQL Query of statement , There can be different execution plans , As for the final choice , You need to choose through the optimizer , Choose the scheme with the lowest execution cost .
The optimization process is as follows ：
- 1、 According to the search criteria , Find all possible indexes
- 2、 Calculate the cost of a full table scan
- 3、 Calculate the cost of executing queries using different indexes
- 4、 Compare the cost of various implementation schemes , Find the one with the lowest cost .
Advantages and disadvantages of index
The advantages of indexing are as follows ：
- 1、 A unique index guarantees the uniqueness of each row of data
- 2、 Improve query speed
- 3、 Acceleration table to table connection
- 4、 Significantly reduce the time of grouping and sorting in queries
- 5、 By using index , During the process of query , Using the optimize hide tool , Improve system performance .
The disadvantages of indexing are as follows ：
- When you create an index , You need to lock the table , While locking the table , It may affect other data operations
- Although the index greatly improves the query speed , At the same time, it will reduce the speed of updating the table , Such as on the table INSERT、UPDATE and DELETE. Because when updating tables ,MySQL Not only to save data , And save the index file .
- Index files that take up disk space . Generally, this problem is not serious , But if you create multiple composite indexes on a large table , And with a large amount of data insertion , Index file size will also expand rapidly .
- If a data column contains many duplicate contents , Indexing it doesn't have much practical effect .
- For very small tables , In most cases, a simple full table scan is more efficient .
Considerations when using indexes
Should not be
1、 More indexes is not better . Too many indexes , It takes time and space to maintain the index
2、 Frequently updated data , It's not good to index .
3、 There is no need to index a table with a small amount of data .
1、 For columns with low repetition rate, it is recommended to build an index . Because there's less duplicate data , Index tree queries are more efficient , The larger the equivalent base, the better .
2、 Data is unique , It is recommended to generate a unique index . At the database level , Make sure the data is correct
3、 frequent group by、order by The columns of the proposed index generation . Can greatly improve the efficiency of grouping and sorting
4、 The fields that are often used for query criteria are suggested to generate indexes . Query by index , Faster
Index failure scenario
1、 Fuzzy search ： Left fuzzy or full fuzzy will lead to index invalidation , such as '%a' and '%a%'. But right ambiguity can be indexed , such as 'a%'
2、 Implicit type conversion ： such as select * from t where name = xxx , name It's a string type , But without quotation marks , So is the MySQL Implicitly transformed , So it will invalidate the index
3、 When a sentence contains or When ： such as select * from t where name=‘sw’ or age=14
4、 Left most prefix matching that does not match the federated index ：(A,B,C) Joint index of , You just where 了 C or B Or just B,C
Other matters needing attention ：
- The index will not contain null Columns of values , As long as the column contains null Values will not be included in the index .
- Use short index . Short index can not only improve query speed but also save disk space and I/O operation
- Index column sort . The query uses only one index , So if where If index has been used in clause , that order by The columns in will not use indexes . So the database default sorting can meet the requirements of the case do not use sorting operations ; Try not to include sorting of multiple columns , If you need to create a composite index for these columns .
- Don't operate on Columns , This will cause the index to fail and perform a full table scan
- Don't use not in and <> operation , This is not a supported query condition , No index .