编程知识 cdmana.com

Deep understanding of MySQL series index

#### Indexes ###### Look at the process of querying a piece of information InnoDB The logical storage structure of :1. Table space : You can view it as InnoDB The highest level of the storage engine's logical structure , All the data is stored in the table space . By default, there is a shared table space ibdata1. If enabled innodb_file_per_table Arguments , Note that the table space memory of each table only contains data 、 Index and insert buffer Bitmap Page , Other categories of information , Such as rollback information 、 Insert buffered index page 、 System transaction information 、 The secondary write buffer is still stored in the original shared table space .2. paragraph : A table space is made up of segments , The common segments are data segments 、 Index segment 、 Rollback segment, etc . The data segment is B+ Tree leaf node (Leaf node segment), The index segment is B+ The tree is not a leaf node (Non-leaf node segment) District 3. District : It's a space made up of consecutive pages , In any case, the size of each zone is 1MB. By default , The size of the storage engine page is 16KB, That is to say, in a district there are continuous 64 A continuous page . To ensure the continuity of the page ,InnoDB The storage engine requests from disk at one time 4-5 District .4. Page : Page ( It can also be called block ), yes InnoDB The smallest unit of disk management . Preset the size of each page 16KB.1.2x You can also use arguments after the version innodb_page_size Set to 4k、8k、16k![](https://img2020.cnblogs.com/blog/2211828/202012/2211828-20201222171110725-1867744008.png) If you look up a piece of information :select * from user where id=5; Here id It's the primary key , We go through this tree B+ Tree to query , First, I'll find the root page , The root page position of each table is fixed in the table space file ; Find the root page through the binary query method , Locate the id=5 The information should be in the indicator P5 Point to the page , So go further page number=5 Query in the page of , The same can be found through the binary query method id=5 The record of :![](https://img2020.cnblogs.com/blog/2211828/202012/2211828-20201224131808085-2136935822.png)###### Calculate a tree B+ How many lines of data can be stored in the tree can also be viewed through commands InnoDB Default for each page 16KB:> show variables like 'innodb_page_size'; First calculate the non leaf nodes , Suppose the primary key ID For bigint Type , The length is 8 Byte , And the size of the indicator is InnoDB The source code is set to 6 Byte , This is a total of 14 Bytes and how many such units can be stored in a page , In fact, it represents the number of indicators , namely 16384/14=1170. Then we can calculate the height of a tree as 2 Of B+ Tree , Can be stored 1170*16=18720 There are records like this . According to the same principle, we can work out a height of 3 Of B+ Trees can store :1170*1170*16=21902400 Records like this . So in InnoDB in B+ The height of the tree is generally 1-3 Layer , It can satisfy tens of millions of data storage .###### Index some concepts 1. Cluster index (clustered index): It's putting the index and the data together , Find the index and find the data ; As shown in the figure below, the leaf node stores all the data in a row .![](https://img2020.cnblogs.com/blog/2211828/202012/2211828-20201224140117685-1934751501.png)2. Secondary index (Secondary Index Or non clustered index ): That is to separate the data from the index , You need to query the index first , Then through the index back to the table to find the corresponding information . Back to the table : First, scan the database index to find the row where the data is located , And then through the row primary key id Take out the data not provided in the index , In other words, a query based on a non primary key index needs to scan one more index tree . Here's the picture , After the secondary index query , It will go back to the cluster index , Finally, I found the information .![](https://img2020.cnblogs.com/blog/2211828/202012/2211828-20201224140048888-833144751.png)> InnoDB There is and only one cluster index , and MyISAM All of them are nonclustered indexes .3. Union index : Refers to indexing multiple columns on a table .> The left most prefix matching principle for federated indexes : A composite index created for multiple fields at the same time ( In order ,ABC,ACB They're two completely different kinds of joint indexes ) With the union index (a,b,c) For example , Establishing such an index is equivalent to establishing an index a、ab、abc Three indexes . In addition, the composite index is actually an index , It's not really building multiple indexes , Only the effect is equivalent to multiple indexes .4. Override index : That is, the query records can be obtained from the auxiliary index , Instead of querying the records in the clustered index . Using overlay benefits :- The secondary index does not contain all the information of the entire row , So its size is much smaller than the cluster index , Reduce a lot of IO operation .- For some statistics ( Such as count(id)) Statistics are not performed by querying clustered indexes , Reduce IO operation 5. unique index : Index generated with unique Columns , Duplicate values are not allowed for this column , But null values are allowed (NULL)6. Index push down :MySQL 5.6 Index push down optimization is introduced , During index traversal , Judge the fields included in the index first , Filter out unqualified records , Reduce the number of return table words .###### Why choose B+ Tree , instead of B Tree B No matter leaf node or non leaf node , Will store data , As a result, the number of indexes that can be stored in the non leaf node becomes less, and a large amount of data will be stored in the case of less indexes , Can only increase the height of the tree , Lead to IO More operations , Query performance becomes low ;###### Why? InnoDB There is only one cluster index , Instead of using clustered indexes for all indexes ? Because clustering index is to store index and data in leaf node , If all indexes are clustered , Each index will store a copy of the data , It will cause data redundancy , In case of large amount of data , This kind of data redundancy is very resource consuming .###### What happens when an index is clearly established , But it didn't pass the index when it was executed ? Query optimizer . One SQL Statement query , There can be different implementations , To decide which option to choose , You need to choose through the optimizer , Choose the solution with the lowest execution cost . The optimization process is as follows :- 1、 According to the search conditions , Find all possible indexes - 2、 Calculate the cost of a full table scan - 3、 Calculate the cost of executing queries using different indexes - 4、 Compare the cost of various implementation schemes , Find the one with the lowest cost .###### The advantages and disadvantages of index are as follows :- 1、 Unique index can ensure the uniqueness of each row of data - 2、 Improve query speed - 3、 Acceleration table to table connection - 4、 Significant reduction in sorting and grouping time - 5、 By using index , In the process of query , Use the optimized hider , Improve the effectiveness of the system . The disadvantages of index are as follows :- When indexing , Need to lock the table , While locking the table , It may affect other data operations - Although the index greatly improves the query speed , At the same time, it will slow down the speed of updating the table , For example, to make a table INSERT、UPDATE and DELETE. Because when updating the table ,MySQL It's not just storing data , Also store the index file .- An index file that uses disk space to create an index . Generally, this problem is not serious , But if you build multiple composite indexes on a large table , And with a large amount of data insertion , Index file size will also expand rapidly .- If a column contains many repetitions , Indexing it doesn't have much practical effect .- For a very small watch , In most cases, a simple full table scan is more efficient .###### The principle of using index : It shouldn't be > 1、 More indexes is not better . Too many indexes , It takes time and space to maintain indexes > 2、 Frequently updated information , It's not good to index .> 3、 There is no need to index a table with a small amount of data . It should be > 1、 Index generation is recommended for columns with low repetition rate . Because there are few duplicate data , Index tree queries are more efficient , The larger the equivalent base, the better .> 2、 The data is unique , It is recommended to generate a unique index . At the database level , Ensure the correctness of the data > 3、 Frequently group by、order by The columns of the proposed index generation . Can greatly improve the efficiency of grouping and sorting > 4、 The fields often used for query criteria suggest building an index . Query by index , Faster indexing scenarios > 1、 Fuzzy search : Left fuzzy or full fuzzy will cause index invalidation , such as '%a' and '%a%'. But right ambiguity can be indexed , such as 'a%' > 2、 Implicit type conversion : such as select * from t where name = xxx , name It's a string type , But without quotation marks , So is the MySQL Implicitly transformed , So it will invalidate the index > 3、 When a sentence contains or When : such as select * from t where name=‘sw’ or age=14> 4、 Left most prefix matching that does not match the federated index :(A,B,C) The joint index of , You just where 了 C or B Or just B,C Other considerations :> 1. The index will not contain null Columns of values , As long as the column contains null Values will not be included in the index .> 2. Use short index . Short index can not only improve query speed, but also save disk space and I/O operation > 3. Index column sort . The query uses only one index , So if where If an index is already used in the clause , So order by Columns in are not indexed . Therefore, do not use the sort operation when the database default sort can meet the requirements ; Try not to include multiple column sort , It is best to index these columns if necessary .> 4. Don't operate on Columns , This will cause the full index to fail > 5. Don't use not in and <> operation , This is not a supported query condition , No index .《MySQL Inside the technology 》https://mp.weixin.qq.com/s/6j64s9W6ogs5Y8BbhhkgnAhttps://mp.weixin.qq.com/s/KB73550tKpNccW-WKxT7-Ahttps://mp.weixin.qq.com/s/ovMx9Dv9NCFxSs

版权声明
本文为[itread01]所创,转载请带上原文链接,感谢
https://cdmana.com/2020/12/20201224215001820N.html

Scroll to Top