编程知识 cdmana.com

Deep understanding of MySQL series index

Indexes

The process of finding a piece of data

Let's take a look at InnoDB Logical storage structure :

  1. Table space : You can view it as InnoDB The highest level of the logical structure of the storage engine , All the data is stored in the table space . There is a shared table space by default ibdata1. If enabled innodb_file_per_table Parameters , It should be noted that only data is stored in the table space of each table 、 Index and insert buffer Bitmap page , Other types of data , Such as rollback information 、 Insert buffered index page 、 System transaction information 、 The secondary write buffer is still stored in the original shared table space .

  2. paragraph :
    A table space is made up of segments , Common segments have data segments 、 Index segment 、 Rollback segments, etc . The data segment is B+ Tree leaf node (Leaf node segment), The index segment is B+ Trees are not leaf nodes (Non-leaf node segment)
    District

  3. District : It's a space made up of consecutive pages , In any case, the size of each zone is 1MB. By default , The size of the storage engine page is 16KB, In other words, there are a total of continuous 64 Consecutive pages . And to ensure the continuity of the page ,InnoDB Storage engine requests from disk once 4-5 Districts .

  4. page :
    page ( You can also weigh it ), yes InnoDB Minimum unit of disk management . The default size of each page 16KB.1.2x After the version, you can also use the parameter innodb_page_size Set to 4k、8k、16k

Look up a piece of data :select * from user where id=5;

here id It's the primary key , We go through this B+ Tree to find , First, I'll find the root page , The root page position of each table is fixed in the table space file ; Find the root page through binary search method , Locate the id=5 The data should be in the pointer P5 Point to page , Then go further page number=5 Search for , It can also be found by dichotomy id=5 The record of :

Calculate a tree B+ How many rows of data can a tree hold

You can also view it by command InnoDB Each page defaults to 16KB:

show variables like 'innodb_page_size';

First calculate the non leaf nodes , Suppose the primary key ID by bigint type , The length is 8 byte , And the size of the pointer is InnoDB The source code is set to 6 byte , This is a total of 14 byte

And how many such units can be stored in a page , In fact, it means how many hands there are , namely 16384/14=1170.

Then we can calculate the height of a tree as 2 Of B+ Trees , Can be stored 1170*16=18720 Data records like this .

According to the same principle, we can work out a height of 3 Of B+ Trees can store :1170117016=21902400 Records like this .

So in InnoDB in B+ The height of the tree is generally 1-3 layer , It can satisfy tens of millions of data storage .

Index some concepts
  1. Cluster index (clustered index): It's about putting indexes and data together , Find the index and find the data ; As shown in the figure below, the leaf node stores all data in a row .

  1. Secondary index (Secondary Index Or non clustered index ): It's separating the data from the index , When searching, you need to find the index first , Then index back to the table to find the corresponding data .
    Back to the table : First, scan the database index to find the row where the data is located , And then through the row primary key id Take out the data not provided in the index , In other words, the query based on non primary key index needs to scan one more index tree .

Here's the picture , After auxiliary index search , It will go back to the cluster index , Finally, we find the data .

InnoDB There is and only one clustered index , and MyISAM All of them are nonclustered indexes .

  1. Joint index : Refers to indexing multiple columns on a table .

The leftmost prefix matching principle of union index : A composite index for multiple fields at the same time ( In order ,ABC,ACB They're two completely different kinds of joint indexes ) With the union index (a,b,c) For example , Building such an index is equivalent to building an index a、ab、abc Three indexes . In addition, the composite index is actually an index , Multiple indexes are not really created , It's just that the effect is equivalent to generating multiple indexes .

  1. Overlay index : That is, from the secondary index, you can get the query records , Instead of querying the records in the clustered index .

The benefits of using coverage :

  • The secondary index does not contain all the information of the entire record , So its size is much smaller than the cluster index , A lot less IO operation .
  • For some statistics ( Such as count(id)) Statistics are not performed by querying cluster indexes , Reduce IO operation
  1. unique index : Index generated with unique Columns , Duplicate values are not allowed for this column , But you can have an empty value (NULL)

  2. Index push down :MySQL 5.6 Index push down optimization is introduced , During index traversal , Judge the fields included in the index first , Filter out unqualified records , Reduce the number of words back to the table .

Why B+ Trees , instead of B Trees

B Whether leaf node or non leaf node , Data will be saved , This results in a smaller number of pointers that can be saved in a non leaf node

Save a large amount of data with few pointers , Can only increase the height of the tree , Lead to IO More operations , Query performance becomes low ;

Why? InnoDB There is only one clustered index , Instead of clustering all indexes ?

Because clustering index is to store index and data in leaf node , If all indexes are clustered , Each index will hold a copy of the data , It will cause data redundancy , In the case of a large amount of data , This kind of data redundancy is very resource consuming .

What happens when an index is explicitly created , But the execution did not pass the index ?

Query optimizer .

One SQL Query of statement , There can be different execution plans , As for the final choice , You need to choose through the optimizer , Choose the scheme with the lowest execution cost .

The optimization process is as follows :

  • 1、 According to the search criteria , Find all possible indexes
  • 2、 Calculate the cost of a full table scan
  • 3、 Calculate the cost of executing queries using different indexes
  • 4、 Compare the cost of various implementation schemes , Find the one with the lowest cost .
Advantages and disadvantages of index

The advantages of indexing are as follows :

  • 1、 A unique index guarantees the uniqueness of each row of data
  • 2、 Improve query speed
  • 3、 Acceleration table to table connection
  • 4、 Significantly reduce the time of grouping and sorting in queries
  • 5、 By using index , During the process of query , Using the optimize hide tool , Improve system performance .

The disadvantages of indexing are as follows :

  • When you create an index , You need to lock the table , While locking the table , It may affect other data operations
  • Although the index greatly improves the query speed , At the same time, it will reduce the speed of updating the table , Such as on the table INSERT、UPDATE and DELETE. Because when updating tables ,MySQL Not only to save data , And save the index file .
  • Index files that take up disk space . Generally, this problem is not serious , But if you create multiple composite indexes on a large table , And with a large amount of data insertion , Index file size will also expand rapidly .
  • If a data column contains many duplicate contents , Indexing it doesn't have much practical effect .
  • For very small tables , In most cases, a simple full table scan is more efficient .
Considerations when using indexes

principle :
Should not be

1、 More indexes is not better . Too many indexes , It takes time and space to maintain the index
2、 Frequently updated data , It's not good to index .
3、 There is no need to index a table with a small amount of data .

should

1、 For columns with low repetition rate, it is recommended to build an index . Because there's less duplicate data , Index tree queries are more efficient , The larger the equivalent base, the better .
2、 Data is unique , It is recommended to generate a unique index . At the database level , Make sure the data is correct
3、 frequent group by、order by The columns of the proposed index generation . Can greatly improve the efficiency of grouping and sorting
4、 The fields that are often used for query criteria are suggested to generate indexes . Query by index , Faster

Index failure scenario

1、 Fuzzy search : Left fuzzy or full fuzzy will lead to index invalidation , such as '%a' and '%a%'. But right ambiguity can be indexed , such as 'a%'
2、 Implicit type conversion : such as select * from t where name = xxx , name It's a string type , But without quotation marks , So is the MySQL Implicitly transformed , So it will invalidate the index
3、 When a sentence contains or When : such as select * from t where name=‘sw’ or age=14
4、 Left most prefix matching that does not match the federated index :(A,B,C) Joint index of , You just where 了 C or B Or just B,C

Other matters needing attention :

  1. The index will not contain null Columns of values , As long as the column contains null Values will not be included in the index .
  2. Use short index . Short index can not only improve query speed but also save disk space and I/O operation
  3. Index column sort . The query uses only one index , So if where If index has been used in clause , that order by The columns in will not use indexes . So the database default sorting can meet the requirements of the case do not use sorting operations ; Try not to include sorting of multiple columns , If you need to create a composite index for these columns .
  4. Don't operate on Columns , This will cause the index to fail and perform a full table scan
  5. Don't use not in and <> operation , This is not a supported query condition , No index .

《MySQL Technology insider 》
https://mp.weixin.qq.com/s/6j64s9W6ogs5Y8BbhhkgnA
https://mp.weixin.qq.com/s/KB73550tKpNccW-WKxT7-A
https://mp.weixin.qq.com/s/ovMx9Dv9NCFxSsFM98uYFw

版权声明
本文为[Cao Zibiao]所创,转载请带上原文链接,感谢
https://cdmana.com/2020/12/20201224152542969k.html

Scroll to Top