编程知识 cdmana.com

Issue 16: index design (index structure of MySQL)

image

In the last chapter, the database is basically used B+ The reason why trees store indexes : Suitable for disk storage , It can make full use of the characteristics of multi fork balanced tree , Disk read ahead , And good support for equivalence , Range , Sequential scanning, etc . This article mainly introduces MySQL Two common engines ,MyISAM and InnoDB The index organization of , Learn about these ways of storage , Good for database optimization .

MySQL The index of is divided into two categories according to the storage mode :

Clustered index : Also known as Clustered Index. The physical order of relational table records is the same as the logical order of indexes . Because a table can only be stored in one physical order , A table can only have one clustered index at most . Compared to nonclustered indexes , Clustered index has faster retrieval speed .

MySQL Only in Li INNODB Tables support clustered indexes ,INNODB Table data itself is a clustered index , That is to say IOT, Index organization table . Non leaf nodes are stored in primary key order , The leaf node stores the primary key and the corresponding row records . So for INNODB It's very fast to scan the whole table sequentially .

Nonclustered indexes : Also called Secondary Index. It means that non leaf nodes are stored according to the key value order of the index , The leaf node stores the index key value and the corresponding primary key value .MySQL In addition to INNODB Outside the primary key of the table , The others are secondary indexes .MYISAM,memory The table indexes of the engine are all nonclustered indexes . To put it simply , The index is stored separately from the row data . A table can have multiple secondary indexes . 

MYISAM surface :

MYISAM A table is a typical separation of data and index storage , There is no essential difference between a primary key and a secondary index . For example MYISAM Table inside primary key 、 The only index is the same , There is no essential difference .

Hypothesis table t1 by MYISAM engine , As a ID, full name , Gender , Age , Phone number . among ID Primary key , Age is a secondary index . Record the following :

image

The corresponding two B+ The tree index is shown in the following figure ,

Primary key field index tree :

image

The image above is a 3 Step B+ Trees , Non leaf nodes are sorted according to the value of the primary key , Leaf nodes are also sorted according to the value of the primary key , And contains a pointer to the physical data row on the disk . 

Age field index tree :

image

The index tree in the age field above is also a 3 Step B+ Trees , Non leaf nodes are stored in the order of values in the age field , The leaf node holds the value of the age field and a pointer to the physical data row on the disk .

As can be seen from the two pictures above ,MYISAM The biggest disadvantage of the index storage method of the table is that it is not stored in the order of the physical data rows , In this way, both the primary key retrieval and the secondary index retrieval need to be sorted twice . 

Let me give you a simple example ,

following SQL 1 There is no sort by default , Out of order output ; Need to follow ID Sequential output , You have to use it. SQL 2, Explicitly add ORDER BY .

mysql
# SQL 1
mysql> select * from t1;
+-------+----------+--------+------+--------------+
| id    | username | gender | age  | phone_number |
+-------+----------+--------+------+--------------+
| 10001 |  floret      |  Woman      |   18 | 18501877098  |
| 10005 |  petty thief      |  Woman      |   21 | 15827654555  |
| 10006 |  The small white      |  male      |   38 | 19929933000  |
| 10009 |  Xiao He      |  male      |   35 | 19012378676  |
| 10002 |  Xiao Wang      |  male      |   20 | 17760500293  |
| 10003 |  Xiao zhao      |  Woman      |   29 | 13581386000  |
| 10004 |  indigo plant      |  Woman      |   25 | 13456712000  |
| 10007 |  millet      |  male      |   23 | 19800092354  |
| 10008 |  Xiao Xu      |  Woman      |   22 | 18953209331  |
+-------+----------+--------+------+--------------+
9 rows in set (0.00 sec)

# SQL 2
mysql> select * from t1 order by id;
+-------+----------+--------+------+--------------+
| id    | username | gender | age  | phone_number |
+-------+----------+--------+------+--------------+
| 10001 |  floret      |  Woman      |   18 | 18501877098  |
| 10002 |  Xiao Wang      |  male      |   20 | 17760500293  |
| 10003 |  Xiao zhao      |  Woman      |   29 | 13581386000  |
| 10004 |  indigo plant      |  Woman      |   25 | 13456712000  |
| 10005 |  petty thief      |  Woman      |   21 | 15827654555  |
| 10006 |  The small white      |  male      |   38 | 19929933000  |
| 10007 |  millet      |  male      |   23 | 19800092354  |
| 10008 |  Xiao Xu      |  Woman      |   22 | 18953209331  |
| 10009 |  Xiao He      |  male      |   35 | 19012378676  |
+-------+----------+--------+------+--------------+
9 rows in set (0.00 sec)

So let's see INNODB The composition of primary key index and secondary index .

INNODB surface :

INNODB The table itself is an index organized table , In other words, index is data . Here's the chart T1 The data rows are displayed in the form of clustered indexes , The non leaf node holds the value of the primary key , The leaf node holds the value of the primary key and the corresponding data row , And each page has a pointer to the front and back pages .

INNODB A watch is different from MYISAM,INNODB Tables have their own data page management , Default 16KB.MYISAM The management of table data depends on the file system , For example, the file system generally defaults to 4KB,MYISAM The size of the block is also 4KB,MYISAM Table does not have its own crash recovery mechanism , It all depends on the file system .

image

INNODB There are two advantages of this design :

  1. Data is stored in primary key order . The order of the primary keys is the physical order of the record rows , Compared to the storage of pointer to data row , Avoid reordering . We know , Sorting costs the most . Now watch t1 It's based on the primary key ID Sort .
mysql
   mysql> select * from t1;
   +-------+----------+--------+------+--------------+
   | id    | username | gender | age  | phone_number |
   +-------+----------+--------+------+--------------+
   | 10001 |  floret      |  Woman      |   18 | 18501877098  |
   | 10002 |  Xiao Wang      |  male      |   20 | 17760500293  |
   | 10003 |  Xiao zhao      |  Woman      |   29 | 13581386000  |
   | 10004 |  indigo plant      |  Woman      |   25 | 13456712000  |
   | 10005 |  petty thief      |  Woman      |   21 | 15827654555  |
   | 10006 |  The small white      |  male      |   38 | 19929933000  |
   | 10007 |  millet      |  male      |   23 | 19800092354  |
   | 10008 |  Xiao Xu      |  Woman      |   22 | 18953209331  |
   | 10009 |  Xiao He      |  male      |   35 | 19012378676  |
   +-------+----------+--------+------+--------------+
   9 rows in set (0.00 sec)
  1. Two leaf nodes contain pointers to the first and second nodes respectively , In this way, when inserting new lines or splitting pages , Just move the corresponding pointer .

And then look at INNODB Secondary index of table , As shown in the figure below :

image

INNODB The non leaf nodes of the secondary index hold the field values of the index , The index in the figure above is the table t1 Field of age. The leaf node contains the index field value and the corresponding primary key value .

The advantage of this is that when there is data row movement or data page splitting , Unnecessary maintenance of the secondary index . When data needs to be updated , The secondary index does not need to be rebuilt , Just modify the cluster index .

But there are drawbacks :

  1. The secondary index keeps the primary key value at the same time , It's going to get bigger . Especially when the primary key design is unreasonable , For example, use UUID Do primary key . In the next article, I'll explain in detail how to design a reasonable primary key .
  2. The retrieval of the secondary index needs to retrieve the index tree twice . The first time by retrieving the secondary index leaf node , Find the primary key value corresponding to the filter row ; The second time through the value of the primary key to the cluster index to find the corresponding row .

for instance :

as follows SQL sentence , The retrieval age was 23 The line record of :

mysql
select * from t1 where age = 23;

It will be split into the following two SQL sentence :

First through the index field age Find the corresponding primary key of the value :10005.

mysql
select id from t1 where age=23;

Then go to the cluster index according to the primary key ID = 10005 Retrieve the required data row , If the table reads for the first time , You need to go back to the table .

mysql
select * from t1 where id = 10005;

however MySQL It's a good optimization of this , The data was warmed up in advance ( Data preheating , I won't talk about it here , You can refer to MySQL manual , It's very detailed in the manual ).

The content of this article is introduced here , A brief review of this article . This paper mainly introduces MySQL Two common engines MYISAM and INNODB And their advantages and disadvantages . If you have any questions, you are welcome to criticize , In the next article, I'll introduce MySQL How to design the primary key well .


About MySQL Technical content of , What else do you want to know ? Leave a message and tell Xiaobian !

image

版权声明
本文为[Aikesheng cloud database]所创,转载请带上原文链接,感谢

Scroll to Top