
Optimizing Large Paging Queries in MySQL

Background

If you do back-end development, sooner or later you will run into paging as a requirement or feature. Almost everyone handles it with MySQL's LIMIT clause, and the project I am responsible for is written the same way. But once you test with a realistic amount of data, LIMIT can become extremely inefficient. This article talks about how to optimize LIMIT-based paging.

LIMIT optimization

Many business scenarios need paging, and it is almost always implemented with LIMIT (a minimal example follows the table setup below).

Create a table and insert 2 million rows:

#  Create the t5 table
CREATE TABLE `t5` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(50) NOT NULL,
  `text` varchar(100) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `ix_name` (`name`),
  KEY `ix_test` (`text`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

#  Create a stored procedure that inserts roughly 2 million rows (i runs from 1,000,000 to 3,000,000)
#  DELIMITER is needed when running this in the mysql command-line client
DELIMITER $$
CREATE PROCEDURE t5_insert_200w()
BEGIN
    DECLARE i INT;
    SET i=1000000;
    WHILE i<=3000000 DO
        INSERT INTO t5(`name`,`text`) VALUES('god-jiang666',concat('text', i));
        SET i=i+1;
    END WHILE;
END$$
DELIMITER ;

#  Call the stored procedure to insert the data
call t5_insert_200w();
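
For reference, the kind of offset-based paging this article is about looks like this (a minimal sketch against the t5 table just created, with a page size of 10):

#  Page 1: rows 1-10
select * from t5 order by id limit 0, 10;

#  Page 2: rows 11-20, i.e. offset = (page - 1) * page_size
select * from t5 order by id limit 10, 10;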

When only the first few pages are requested, LIMIT causes no performance problems.

But what if the user wants to jump to one of the last pages?

Usually we want to guarantee that every page can be jumped to, so we cannot just use a reverse-order query with order by xxx desc to fetch the trailing pages; the paging query still has to run in forward order:

select * from t5 order by text limit 1000000, 10;


Paging with this kind of SQL, fetching those 10 rows out of 2 million is very expensive: MySQL first has to sort the first 1,000,010 records and then throw away the first 1,000,000 of them. On my MacBook Pro this takes 5.578 seconds.

Let's look at the execution plan of the SQL statement above:

explain select * from t5 order by text limit 1000000, 10;


The execution plan shows that for such a large offset, MySQL does not use an index scan, even though the text column is indexed.

Why is that?

As mentioned in my earlier article MySQL Indexes (Part 2): How to Design Indexes, MySQL's query optimizer is cost-based: the cost of a query is estimated from CPU cost and I/O cost.

If, during cost estimation, MySQL decides that a full table scan is cheaper than an index scan, it abandons the index and scans the whole table.

That is why, for a large-offset paging query, MySQL falls back to a full table scan even though the field is clearly indexed.
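
If you want to see the optimizer's estimate for yourself, one simple way (assuming a reasonably recent MySQL and the same session) is to check the Last_query_cost status variable right after running the query; the value is in the optimizer's internal cost units, not seconds:

#  Run the expensive paging query, then look at the estimated cost of the last query
select * from t5 order by text limit 1000000, 10;
show status like 'Last_query_cost';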

Let's keep using the query above, varying the offset, to test this guess:

explain select * from t5 order by text limit 7774, 10;


explain select * from t5 order by text limit 7775, 10;


All of the experiments above were run on my MacBook Pro. The offset 7774 turns out to be a tipping point: on one side of it MySQL optimizes the query with an index scan, and on the other side it switches to a full table scan.

In other words, MySQL's query optimizer decides on its own, based on cost, whether to use the index.

Since we cannot manually intervene in the core of the optimizer's algorithm, the optimization approach is to keep the paging query within the range where it performs at its best.

How to optimize

1. Use a covering index

If the result of a SQL statement can be obtained entirely from an index, with no need to go back to the table for the rows, that index is called a covering index.

In MySQL you can check this with the explain keyword: if the Extra column of the execution plan shows Using index, the SQL statement is using a covering index.
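
As a quick sanity check against the t5 table above, running explain on the covering-index query used later in this article should show Using index in the Extra column (the exact plan may vary slightly across MySQL versions):

explain select id, `text` from t5 order by text limit 1000000, 10;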

Let's compare how much performance a covering index buys us.

#  Without a covering index
select * from t5 order by text limit 1000000, 10;


This query took 3.690 seconds. Now let's see how much the covering-index version improves things.

#  With a covering index
select id, `text` from t5 order by text limit 1000000, 10;


Comparing the two: with the covering index the large paging query took 0.201 seconds, versus 3.690 seconds without it, an improvement of more than 18x. In real development that is a significant performance win. (Both numbers were measured on my MacBook Pro.)

2. Subquery optimization

In real development it is rare to SELECT only one or two columns, so the covering-index approach above has fairly limited applicability.

Instead, we can improve performance by rewriting the paging SQL as a subquery:

select * from t5 where id>=(select id from t5 order by text limit 1000000, 1) limit 10;


This approach improves efficiency by roughly as much as the covering index above.

But this optimization has limitations:

  • This rewrite requires the primary key id to be continuous (no gaps)
  • The WHERE clause cannot take additional conditions

3. Deferred join

Similar to the subquery approach, we can use a JOIN: do the paging on the index column first, then join back to the table to fetch the columns we want.

select a.* from t5 a inner join (select id from t5 order by text limit 1000000, 10) b on a.id=b.id;


The experiment shows that after rewriting with a JOIN, both of the limitations above go away, and the SQL loses nothing in execution efficiency.
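
For example, if the page also needs an extra filter (a hypothetical condition on name, purely for illustration), it can simply go inside the derived table:

#  Page on the index inside the derived table, apply the extra filter there, then join back for the full rows
select a.* from t5 a inner join (select id from t5 where name = 'god-jiang666' order by text limit 1000000, 10) b on a.id = b.id;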

4. Record where the last query ended

Unlike the methods above, the idea here is to record, in some variable, the position of the last row returned; the next page then scans directly from that position, so MySQL never has to scan a large number of rows only to discard them.

select * from t5 where id>=1000000 limit 10;


From the experiment above it is easy to see that, because the paging is done on the primary key index, this SQL is the fastest of all.
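
To keep paging forward with this approach, the application only needs to remember the largest id on the page it just served and use it as the starting point of the next query. A minimal sketch, where 1000009 stands in for whatever the last id of the previous page happened to be:

#  Next page: continue from the last id returned by the previous page
select * from t5 where id > 1000009 order by id limit 10;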

Summary

  1. This article explained why large paging queries perform poorly and shared several ideas for optimizing them.
  2. The core idea is to keep the paging SQL within the range where it performs best and avoid triggering a full table scan.
  3. I hope what I shared above helps you avoid some detours on the MySQL road~~~

