Lao Liu is a graduate student about to find a job , Self study big data development , All the way , Deep feeling , The information of big data on the Internet is good and bad , So I want to write a detailed guide to big data development . This guide puts big data into 【 Basic knowledge 】【 Frame analysis 】【 Source code understanding 】 All in their own words , Let partners learn by themselves and never ask for others .
The address of big data development guide is as follows ：
Code cloud ：https://gitee.com/BigDataLiu/BigDataGuide
Your praise is the driving force for my continuous updating , No whoring , If you see it, you will get something , Need to contact the public number ： Hard working old Liu .
What I want to share with you today is the basic part of big data development MySQL The index of , This article must be followed by practice , Otherwise, it's a waste of time ！
What is the index ？
In daily development, we often encounter the situation of slow query , Our first reaction is to index it , What is the index ？ The official introduction index is to help MySQL Data structure for efficient data acquisition , A database index is like a catalogue of a book , It can speed up the data query speed of the database .
What are the advantages of indexing ？
It can improve the efficiency of data retrieval , Reduce the cost of databases .
Sort data by index , Reduce the cost of data sorting , Reduce CPU Consume .
Everything has its pros and cons , Indexes are no exception , What are the disadvantages of indexing ？
The index takes up disk space .
Index can improve query efficiency , But it reduces the efficiency of updating tables .
MySQL It's not just storing data , And save or update the corresponding index file .
Is there any disadvantage of not using index ？
Of course not , The index has to be taken . Generally speaking, the index itself is also very large , It's impossible to store it all in memory , Therefore, the index is often stored in the file on the disk .
Classification of index
Single column index ：
General index ：add unique
unique index ： The value in the index column must be unique , But null values are allowed ,add unique index
Primary key index ： It's a special unique index , Null value is not allowed
Composite index ：
An index created on a combination of fields in a table
The use of composite indexes , You need to follow the leftmost principle
In general , It is recommended to use composite index instead of single column index （ Except for the primary key index ）
Full-text index ： Only in MyIsam、InnoDB To use , And only in char、varchar、text Use full text index on type field .
Spatial index ： Not usually used
Use of index
CREATE INDEX index_name ON table(column(length)) ;
DROP INDEX index_name ON table;
SHOW INDEX FROM table_name \G;
Index principle （ Emphasis ）
Storage structure of index
Before we talk about indexing principles , Let's talk about index storage structure first . Indexing is implemented in the storage engine , Different storage engines , Will use different indexes . among MyIsam and InnoDB Support only B+ Number index , Lao Liu didn't talk about it first B Tree and B+ The concept of trees , Search for it yourself .
Next is the focus of the index , Clear up the nonclustered index and clustered index , The indexing principle is about the same ！
Nonclustered indexes (MyIsam)
It says B+ The tree leaf node only stores Columns ( Data files ) Indicators of , The data and index are not together . It contains primary key index and secondary index , Will store the value of the indicator .
Primary key index
MyIsam in B+ The data stored in the tree leaf node is the index value of the data , Find the corresponding index through the index tree , And then through the record index stored in the index , Find the corresponding record in the data file .
Secondary index ( Secondary index )
stay MyIsam in , There is no structural difference between the primary index and the secondary index , It's just the primary index requirement key Is the only one. , And the auxiliary index key It can be repeated .
Clustered index (InnoDB)
Primary key index ( Clustered index ) The leaf node of will store Columns , That is to say, the data and index are together .
Secondary indexes store only primary key values .
If there is no primary key , A unique index is used to build a clustered index ; If there is no unique index ,MySQL It will build a clustered index according to certain rules .
Primary key index
stay InnoDB Table must have primary key in (MyIsam There can be no ), If the specified is not displayed , Then MySQL The system will automatically select a column that can uniquely identify the data record as the primary key , If there is no such column , Then MySQL Automatically for InnoDB The table generates an implied field as the primary key, and the type is long .
The picture above is InnoDB Main index （ It's also a data file ） The schematic diagram of , You can see that the leaf node contains the complete data record , This kind of index is called clustered index . Because InnoDB The data file itself should be aggregated according to the primary key .
InnoDB The secondary index of data The domain stores the value of the corresponding record primary key instead of the address . In other words ,InnoDB All secondary indexes of refer to the primary key as data Domain .
The implementation of clustered index makes the search by primary key very efficient , But the secondary index search needs to retrieve the index twice ： First retrieve the secondary index to get the primary key , Then use the primary key to retrieve the records from the primary index , That is, back to table query .
select * from user where name='Alice'
According to this paragraph SQL Sentence , Will query back to the table , Search twice , To get the record . The efficiency of return table is relatively low , Try not to look back .
Index usage scenarios
After introducing the related concepts of index , Lao Liu has to tell us which scenes can be used , Just remember a few .
When you need to use indexes
The primary key automatically creates a unique index
Fields that are frequently used as query criteria should be indexed
Multi table associated query , Associated fields should be indexed
The sorted fields in the query should be indexed
Frequently query fields
The statistics or grouping fields in the query should be indexed
In which cases you don't need to index
There are too few records , There's no need to index
Tables that are often added, deleted, or modified
Frequently updated fields
where Fields that are not frequently used in conditions
Why use composite index more ？
In order to save mysql Index storage space and improve search efficiency , If you can use a composite index, you don't use a single column index .
Using a composite index requires the leftmost prefix principle , What is the leftmost principle ？
Prefix index ：where like a%
Multi character % On the right, not on the left , What is prefix index ？ When the index is a long sequence of characters , This index will be slow , Memory usage . If the name For index , When name When the corresponding string is very long , You can use prefix index where like a%.
There are indexes from left to right , Can't break , Until you encounter a range query <,>,between.
When we're slow in data searching , There may be index failures . Don't be afraid in this situation , We can use explain The order is right select Statement execution plan for analysis .explain The information is 10 Column , The difference is
Now Lao Liu uses a case to illustrate these arguments , You can practice with Lao Liu . This 10 Lao Liu only talks about the important ones , Others learn by themselves .
-- User list
create table tuser(
id int primary key,
-- Department list
create table tdep(
id int primary key,
-- Address table
create table taddr(
id int primary key,
-- Build a normal index
mysql> alter table tuser add index idx_dep(dep);
-- Build unique index
mysql> alter table tuser add unique index idx_loginname(loginname);
-- Build a composite index
mysql> alter table tuser add index idx_name_age_sex(name,age,sex);
-- Build a full-text index
mysql> alter table taddr add fulltext ft_addr(addr);
Every SELECT Statements are automatically assigned a unique identifier , Represents the order of operation tables in the query , There are four situations ：
id identical ： The order of execution is from top to bottom
id Different ： If it is a subquery ,id The number will go up ,id The bigger it is , The higher the priority .
id The same difference exists at the same time
id Listed as null That means this is a result set , No need to use it to query .
select_type( important )
Represents a query type , It is mainly used to distinguish ordinary queries 、 Joint query (union、union all)、 Complex queries such as subqueries .
simple, It means that there is no need for union Simple to operate or not to include subqueries select Inquire about .
primary, A need union Operations or with subqueries select, For the outermost unit select_type That is to say primary, And there's only one .
Let's do the... In parentheses first sql Sentence , And then perform the outside sql Sentence , The inner query is subquery.
subquery, except from In addition to the subquery contained in the sentence , Subqueries that appear elsewhere may be subquery.
dependent subquery, It means this subquery The query of is affected by the external table query .
union, It connects the two select Inquire about , The first query is PRIMARY, Except for the first table , The second watch after select_type All are union.
dependent union, It is associated with union The same thing , Appears in union or union all In the sentence , But this query is affected by external queries .
union result, It contains union The result set of , stay union and union all In the sentence , Because it doesn't need to participate in the query , therefore id The field is null.
derived,from Sub query in the sentence , It's also called a derived table , Other databases may be called inline views or nested select.
It can be understood as from A subquery appears after the sentence , Take an alias , It's called a derived table .
Displays the table name of the query , If the query uses aliases , So here's the alias .
type( important )
It shows a lot of argument types , Performance goes from good to bad in this order ：
except all outside , Other type Index can be used , except index_merge outside , Other type Only one index can be used , The optimizer selects the best index , At least the index should be used to range Level . Lao Liu only talks about this important , Some of the contents are not clear .
Meet but not ask , There is only one row or empty table in the table .
const( important )
Use a unique index or primary key , The return record must be 1 The equivalence of row records where Conditions .
eq_ref( important )
Generally, it is the primary key or unique index of the line field .
This type of table usually appears in join Inquire about , For each result in the previous table , Can only match the result of one row in the following table . And the query comparison operation is usually '=', The query efficiency is high .
ref( important )
For non unique indexes , Use equivalence （=） Query non primary key . Or queries that use the leftmost prefix rule index .
range( important )
Index range scan , It is common to use >,<,is null,between ,in ,like And so on .
index( important )
Keywords ： The condition is the number of nodes that appear in the index tree , May not exactly match index .
Index full table scan , Scan the index from beginning to end , It is common to use index columns to handle queries that do not need to read data files 、 You can use index sorting or grouped queries .
all( important )
This is the full table scan data file , And then in server Layer to filter and return the records that meet the requirements .
possible_keys、key、key_len、ref、rows Let's not talk about it , Let's go straight to the last one extra.
This column contains additional information that is not suitable for displaying the list in other columns , This column can display a lot of information , There are dozens of , Here are some common ones .
no tables used
It means not to bring from Query of words and sentences , Use not in() Formal subquery or not exists Operator's connection query , This is called reverse wiring . In general, the connection query is to query the inner table first , Check the appearance again , Reverse connection is to query the appearance first , Then query the inner table .
using filesort( important )
When index cannot be used in sorting , There will be this , Common in order by and group by In the sentence .
using index( important )
There is no need to query back to the table , The query data can be obtained directly through the index .
using where( important )
Usually type Type is all, Not all records meet the query criteria , Usually there are where Conditions , And generally no index or index failure .
After analyzing the arguments of the index , Now Lao Liu talks about some index failures , We must keep it in mind , Lao Liu also wrote it down several times ！
Index failure analysis
commonly SQL Statement query adopts full value matching , It's called full value matching in the data. My favorite .
The left most prefix rule , For composite indexes , The query starts at the top left of the index , And you can't skip the columns in the index, otherwise it will be invalid .
Now let's take a leading index break （ Take the lead in indexing , Other indexes are invalid ） Example ：
Don't do calculations on indexes , For example, calculate 、 Function 、 Automatically / Manual type conversion , Otherwise, it will lead to index failure and turn to full table scanning .
The column to the right of the range condition is invalid , You can't continue to use the range condition in the index （bettween、<、>、in etc. ） The column on the right .
Try to use an overlay index （ Query only indexed columns ）, That is, the index column is consistent with the query column , Reduce select *.
Don't use inequality on index fields , Use... In the index field （！= perhaps < >） When judging , Will lead to index failure and turn to full table scan .
The primary key index field cannot be judged null, Use... In the index field is null When judging , You can use the index .
Index fields use like Start with a wildcard （‘% String ’） When , Will lead to index failure and turn to full table scan .like To end a range query with a wildcard is equivalent to , The index will not fail .
When the index field is a string , Put single quotation marks , Otherwise, it will lead to index failure and turn to full table scanning .
Don't use index fields or, Otherwise, it will lead to index failure and turn to full table scanning .
You must follow Lao Liu in this article , Seeing without practice is learning without practice ！ Although the current level may not be as good as you guys , But Liu will try to be better , Let you learn by yourself and never ask for help ！
If there are related questions , Contact the public number ： Hard working old Liu . The article all saw this , Like, focus on the support wave ！