
SF Express: please sign for these 10 soul-searching MySQL questions


Author丨sowhat1412

Source丨sowhat1412 (ID: sowhat9094)


Not very aggressive, but extremely humbling.

1、SQL statement execution flow

Broadly speaking, MySQL consists of two parts: the Server layer and the storage engine layer.

Server layer:
  • Connector: after the TCP handshake, the connector verifies the login user's identity. Once user A has established a connection, an administrator changing A's privileges does not affect connections that already exist; A must reconnect for the new privileges to take effect.

  • Query cache: where query results used to be cached. It was removed in MySQL 8.0, because the cache was invalidated so often that it did more harm than good.

  • Analyzer: parses the SQL statement you typed and checks whether it satisfies MySQL's grammar rules.

  • Optimizer: when several execution plans can produce the same result, it automatically picks the one it estimates to be cheapest.

  • Executor: checks privileges, then hands the final plan to the storage engine for execution.

Storage engine layer

Responsible for storing and retrieving data. Its architecture is plug-in based: InnoDB, MyISAM, Memory and other storage engines are supported. The most commonly used engine today is InnoDB, which has been the default storage engine since MySQL 5.5.5 (and is very likely the one you use every day).


SQL execution order
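The heading above originally pointed at a diagram. As a rough stand-in, here is a query annotated with the logical order in which MySQL evaluates its clauses (the table and column names are made up for illustration; the numbers mark logical evaluation order, not the written order):

```sql
SELECT dept, COUNT(*) AS cnt      -- 5. SELECT
FROM employees                    -- 1. FROM / JOIN
WHERE hire_date >= '2020-01-01'   -- 2. WHERE
GROUP BY dept                     -- 3. GROUP BY
HAVING COUNT(*) > 10              -- 4. HAVING
ORDER BY cnt DESC                 -- 6. ORDER BY
LIMIT 10;                         -- 7. LIMIT
```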


2、BinLog、RedoLog、UndoLog

BinLog

BinLog is a binary log that records all changes to table structure (e.g. CREATE, ALTER TABLE) and all modifications to table data (INSERT, UPDATE, DELETE). Master-slave replication is driven by the BinLog file. The BinLog supports three logging formats.

STATEMENT format

Content: the binlog records only the SQL statements that may change data.

Advantage: since no actual row data is recorded, log volume and IO cost are very low, and performance is the best of the three.

Disadvantage: some statements are non-deterministic. For example, uuid() generates a random unique identifier, so when the binlog is replayed the generated value is bound to differ from the original data, with potentially unforeseen consequences.
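A hypothetical statement that is unsafe to replicate in STATEMENT format (table t and its columns are assumptions for illustration):

```sql
-- Non-deterministic: replaying this statement on a replica produces
-- a different uuid than the one generated on the master.
INSERT INTO t (token, created_at) VALUES (UUID(), NOW());
-- NOW() happens to be safe (its timestamp travels inside the binlog event),
-- but UUID(), RAND() and similar functions are not.
```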

ROW format

Content: in this format the binlog records, for every operation, the before image and the after image of each affected row. Tools such as StreamSets require this format.

Advantage: data can be restored with absolute precision, which guarantees safety and reliability, and replication and data recovery can run concurrently.

Disadvantage: the binlog grows very large, and for operations that touch many rows or long fields the recording overhead is severe. Reading the log also requires a dedicated tool to decode the row data.

MIXED format

Content: a mixture of the STATEMENT and ROW formats above.

Details: most operations are logged in STATEMENT format; only the following fall back to ROW: the table's storage engine is NDB, non-deterministic functions such as uuid() are used, INSERT DELAYED is used, or temporary tables are involved.

Master-slave synchronization process

1、The master must have binary logging enabled, recording every event that modifies database data.

2、The slave starts an I/O thread that acts as a mysql client and, via the mysql protocol, requests events from the master's binary log files.

3、The master starts a dump thread that reads events from its binary log, starting from the position the slave requested; if no position parameter is given, it sends events one by one from the first event of the first log file.

4、The slave writes the received events into its relay log, and records how far it has read inside the master's binary log files (the master may have several of them).

5、The slave starts another thread (the SQL thread) that reads events out of the relay log and re-executes them locally.
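A minimal sketch of how a replica is pointed at a master (host, user, password and file/position are placeholders; newer MySQL versions spell these statements CHANGE REPLICATION SOURCE TO / START REPLICA):

```sql
-- On the replica: tell it where to start reading the master's binlog.
CHANGE MASTER TO
  MASTER_HOST='10.0.0.1',
  MASTER_USER='repl',
  MASTER_PASSWORD='***',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=4;
START SLAVE;           -- starts the I/O thread and the SQL thread
SHOW SLAVE STATUS\G    -- check Slave_IO_Running / Slave_SQL_Running
```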

MySQL replication is asynchronous by default, and it supports parallel replication on the apply side. After the master ships the log to the slave it does not wait, which raises a problem: if the master crashes before the slave has caught up and the slave is then promoted to master, the unapplied log is lost. Two stronger modes address this.

  1. Fully synchronous replication

After the master writes the binlog, it forces the log to the slaves and returns to the client only after all slaves have applied it. This obviously hurts performance badly.

  2. Semi-synchronous replication

The logic of semi-synchronous replication is this: when a slave has successfully written the log it returns an ACK to the master, and the master considers the write complete as soon as it receives confirmation from at least one slave.

From here you can extend to master-slave delay caused by mismatched master-slave configurations, big transactions on the master, overloaded slaves, network jitter, and so on. How do you avoid it? During a master-standby switch, do you follow a reliability-first or an availability-first strategy? How do you decide that the master has actually crashed? With two servers acting as master and standby for each other, how do you avoid a replication loop? How do you correctly restore an accidentally dropped database? (⊙o⊙)… It increasingly feels like work for the DBAs.

RedoLog

You can first get a feel for it through the following analogy:

A tavern can record tabs either in the ledger or on a chalkboard. When someone runs up or pays off a tab, there are generally two approaches:

1、Open the ledger directly and add to or deduct from that customer's account.

2、First jot the amount on the chalkboard, and reconcile it into the ledger after closing time.

When business is busy the shopkeeper picks the latter, because the former is too much trouble: he would have to find the customer's running total among the dense records, pull out the abacus to compute, and finally write the result back into the ledger.

Likewise in MySQL, if every update operation had to be written straight to disk, the disk would first have to locate the corresponding record and then update it, making the IO and lookup cost of the whole process very high. The chalkboard-plus-ledger workflow is exactly MySQL's Write-Ahead Logging (WAL): write the log first, write the disk later. In the analogy, the ledger is the data file on disk, and the chalkboard is the RedoLog.

1、When a record is updated, the InnoDB engine first writes the change into the RedoLog (the chalkboard) and updates memory. Later, when it is idle, InnoDB flushes the change to the data file on disk.

2、If updates come in faster than the RedoLog can absorb, part of the RedoLog must first be written to disk and then erased. The RedoLog behaves like a circular buffer.

RedoLog has a write pos and a checkpoint

write pos: the current write position. It moves forward as logs are written; after reaching the end of file 3 it wraps around to the beginning of file 0.

checkpoint: the current erase position, which also moves forward and wraps around. Before a record is erased, it must first be flushed to the data file.

The space between write pos and checkpoint is the blank part of the chalkboard, available for recording new operations. If write pos catches up with checkpoint, the chalkboard is full: no new updates can proceed until some records are flushed and erased, pushing the checkpoint forward.
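The size of this circular log is configurable; on a live server you can inspect it with the system variables below (the first two are replaced by innodb_redo_log_capacity in recent MySQL 8.0 releases):

```sql
SHOW VARIABLES LIKE 'innodb_log_file_size';       -- size of each redo log file
SHOW VARIABLES LIKE 'innodb_log_files_in_group';  -- number of files in the ring
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit'; -- 1 = flush redo log on every commit
```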

With the redo log, InnoDB can guarantee that even after an abnormal database restart, no previously committed record is lost. This ability is called crash-safe. The redo log uses a two-phase commit to keep its logic consistent with the binlog. The commit sequence is:

1 prepare stage --> 2 write binlog --> 3 commit

  1. If the crash happens before step 2: on restart there is no commit record and no binlog, so the transaction is rolled back. Backup recovery sees no binlog either. Consistent.

  2. If the crash happens before step 3: on restart there is no commit record, but the prepare record and a complete binlog both exist, so the transaction is automatically committed. Backup recovery sees the binlog. Consistent.
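How durable this protocol is in practice depends on two flush settings; the commonly recommended "double 1" configuration looks like this (both are real MySQL system variables):

```sql
-- Flush the redo log to disk at every transaction commit.
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
-- fsync the binlog at every transaction commit.
SET GLOBAL sync_binlog = 1;
```

Relaxing either value trades crash safety for write throughput.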

Differences between binlog and redolog

  1. The redo log is specific to the InnoDB engine; the binlog belongs to MySQL's Server layer and is available to all engines.

  2. The redo log is a physical log: it records what changed on which data page. The binlog is a logical log: it records the original logic of the statement, e.g. "add 1 to field c of the row with ID=2".

  3. The redo log is written in a circle, so its space gets reused; the binlog is append-only: when a binlog file reaches a certain size, writing switches to the next file without overwriting earlier logs.

UndoLog

The UndoLog is generally a logical log. There are two main kinds:

  1. insert undo log

Generated when a transaction inserts a new record. It is needed only if the transaction rolls back, and can be discarded immediately after the transaction commits.

  2. update undo log

Generated when a transaction performs an update or delete. It is needed not only for transaction rollback but also for snapshot reads, so it cannot be deleted right away: only when no snapshot read or rollback still depends on it is it cleaned up by the purge thread.

3、Indexes in MySQL

The common index models are the hash table, the ordered array, and the search tree.

Hash table: a KV storage structure, suitable only for equality queries, not for range queries.

Ordered array: only suitable for static storage engines, because insertion is expensive. Compare Java's ArrayList.

Search tree: conceptually the binary search tree from data structures class, but in practice an N-ary tree (the B+ tree), which is widely used at the storage engine layer.

Advantages of B+ trees over B trees:

  1. Non-leaf nodes of a B+ tree store only keys, so each node can hold more of them. The B+ tree is therefore "fatter" than a B tree and needs fewer IOs.

  2. B+ tree leaf nodes are linked to their neighbors, which makes range queries convenient; and since all results live in the leaves, query cost is stable.

  3. Full scans of a B+ tree only traverse the leaf chain, avoiding the backtracking scan a B tree requires.

Advantages of indexes:

1、A unique index guarantees the uniqueness of each row of data

2、Faster queries

3、Faster table-to-table joins

4、Significantly less time spent on grouping and sorting in queries

5、Indexes let the query optimizer choose better plans, improving overall system performance

Disadvantages of indexes:

1、They take time to create and maintain

2、Creating an index locks the table, which may block other data operations on it

3、Indexes consume disk space, and that space can grow quickly

4、Every CRUD operation on the table's data also triggers index maintenance, which costs time and can reduce write performance

Index design principles. Don't:

1、More indexes is not better: every extra index costs time and space to maintain.

2、Frequently updated columns are poor index candidates.

3、Tables with little data do not need indexes.

Do:

1、Index columns with a low repetition rate: the less duplicate data, the more selective the index tree lookup. The higher the cardinality, the better.

2、Generate a unique index for columns whose data must be unique, so correctness is enforced at the database level.

3、Index columns that appear frequently in GROUP BY and ORDER BY; this greatly improves grouping and sorting efficiency.

4、Index fields that are frequently used in query conditions; lookups through the index are much faster.
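The "do" list above translates into DDL like the following (table and column names are made up for illustration):

```sql
-- 2. Uniqueness enforced at the database level.
CREATE UNIQUE INDEX uk_user_email ON user (email);
-- 3/4. Frequent filter and sort columns.
CREATE INDEX idx_order_user_created ON orders (user_id, created_at);
-- 1. A low-cardinality column such as gender is usually NOT worth indexing.
```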

Index failure scenarios

1、Fuzzy search: a leading or double wildcard defeats the index, e.g. '%a' and '%a%'. A trailing wildcard can still use the index, e.g. 'a%'.

2、Implicit type conversion: e.g. select * from t where name = 110, where name is a string column. Without quotes MySQL converts implicitly, which invalidates the index.

3、OR in the condition: e.g. select * from t where name='sw' or age=14 (the index is skipped unless every branch of the OR is indexed).

4、Violating the leftmost-prefix rule of a composite index: with a (A,B,C) index, filtering only on C, only on B, or only on B and C cannot use it.
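These cases can be checked with EXPLAIN; assuming a table t with an index on the string column name, something like:

```sql
EXPLAIN SELECT * FROM t WHERE name LIKE 'a%';   -- can use the index (range scan)
EXPLAIN SELECT * FROM t WHERE name LIKE '%a%';  -- full scan, index not used
EXPLAIN SELECT * FROM t WHERE name = 110;       -- implicit cast, index not used
EXPLAIN SELECT * FROM t WHERE name = '110';     -- quoted: index used
```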

Knowledge points about indexes

Primary key index: the leaf node of the primary key index stores the entire row of data. In InnoDB the primary key index is also called the clustered index (clustered index). An auto-increment primary key is not guaranteed to be continuous: unique key conflicts, transaction rollbacks and the like can leave gaps.

Unique index: an index built on a column declared unique; duplicate values are not allowed in that column, but NULLs are.

Query performance of ordinary vs unique indexes: InnoDB reads and writes data in pages, 16KB each by default, so for queries the two kinds of index perform almost identically.

change buffer: speeds up updates through ordinary indexes. If the target page is not in memory, an ordinary index can simply buffer the change; a unique index must first read the page into memory to verify that uniqueness is not violated. So prefer ordinary indexes where possible.

Secondary index: the leaf node of a non-primary-key index stores the primary key value. In InnoDB, non-primary-key indexes are also called secondary indexes (secondary index).

Back to table (回表): first scan the secondary index tree to find the primary key of the matching row, then use that primary key id to fetch the columns the index does not provide. In other words, a query through a secondary index scans one extra index tree.

Covering index: if an index contains (covers) the values of all the fields a query needs, we call it a covering index; no back-to-table lookup is required.

Composite index: relative to a single-column index, a composite index is built over several columns combined; at most 16 columns may be joined in one index.

Leftmost-prefix rule: a composite index over several fields is ordered — (A,B,C) and (A,C,B) are two entirely different indexes. Taking the composite index (a,b,c) as an example, building it is equivalent to having the indexes a, ab and abc. Note that the composite index is still physically one index: multiple indexes are not really created, the effect is merely equivalent.
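With an assumed composite index (a,b,c) on table t, the leftmost-prefix rule plays out like this:

```sql
-- Can use the (a,b,c) index:
SELECT * FROM t WHERE a = 1;
SELECT * FROM t WHERE a = 1 AND b = 2;
SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;
-- Cannot use it (leftmost column a is missing):
SELECT * FROM t WHERE b = 2 AND c = 3;
```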

Index condition pushdown: MySQL 5.6 introduced index condition pushdown. While traversing the index, fields contained in the index are checked first and non-matching records filtered out, reducing the number of back-to-table lookups.

Index maintenance: to keep keys ordered, the B+ tree performs page splits and page merges. Page space utilization has to be considered when inserting or deleting data.

Auto-increment primary key: a business-independent auto-increment primary key is usually created, because appending at the tail does not trigger leaf page splits.

Delayed association (延迟关联): use a covering-index query to return just the needed primary keys, then join back to the original table on those keys to fetch the full rows.
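A classic use of delayed association is deep pagination. Assuming a table t with primary key id and an index on created_at, a sketch:

```sql
-- Naive deep page: fetches and discards 100000 full rows.
SELECT * FROM t ORDER BY created_at LIMIT 100000, 10;

-- Delayed association: the subquery pages through the covering index only,
-- then just 10 back-to-table lookups fetch the full rows.
SELECT t.* FROM t
JOIN (SELECT id FROM t ORDER BY created_at LIMIT 100000, 10) tmp
  ON t.id = tmp.id;
```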

InnoDB storage: the *.frm file is the table definition file, describing what kind of table this is. The *.ibd file holds the table's indexes and data: all of the table's index trees and all row records are stored in this one file.

MyISAM storage: the *.frm file is the table definition file. The *.MYD file stores all the row data of a MyISAM table, and the *.MYI file stores its index data. Under MyISAM, table data and table index data are stored separately.

MyISAM queries: under MyISAM, both the primary key index and secondary indexes are non-clustered. Whether you query through the primary key index or not, the leaf node yields the address of the target data, and that address must then be followed into the data file to find the target row.

PS: InnoDB supports clustered indexes; MyISAM does not.

4、SQL transaction isolation levels

The four ACID properties

  1. Atomicity: several operations are bundled into one transaction, guaranteeing that they either all succeed or all fail.

  2. Consistency: the sequence of operations on the data must not leave it in a bad state, e.g. values appearing out of thin air or vanishing.

  3. Isolation (also called independence): multiple transactions do not interfere with each other; even under concurrency they execute as two non-intersecting streams. In practice, isolation need not be absolute: some interference is sometimes allowed, which is why MySQL can support 4 transaction isolation levels.

  4. Durability: once an operation completes, its result stands, and the operation is persisted in the log.

PS: the difference between C in ACID and C in the CAP theorem

The C of ACID concerns single-database transaction operations: it guarantees the integrity and correctness of data, so data neither disappears nor appears from nowhere. The C of CAP refers to the read-write consistency of multiple replicas of one piece of data.

Data problems transaction operations can cause

1、Dirty read (dirty read): transaction B changes data but has not committed, yet transaction A has already seen and used it. If B rolls back, A has acted on wrong data.

2、Non-repeatable read (non-repeatable read): the emphasis is on modification. Under the same condition, data you already read has a different value when read again. Locking the records that match the condition prevents it.

3、Phantom read (phantom read): transaction A sets the status field of every record in a table to "processed" without committing; transaction B then inserts an unprocessed record and commits; when A queries again it finds a record that is still unprocessed, as if hallucinating. A phantom read refers specifically to newly inserted rows. Phantoms cause semantic problems and data consistency problems.

4、Under the repeatable read (RR) isolation level, an ordinary query is a snapshot read and will not see rows inserted by other transactions. Phantom reads therefore appear only under current reads, and gap locks were introduced to solve exactly this problem.

Before discussing isolation levels, understand this: the stricter the isolation, the lower the efficiency, so we often need to strike a balance between the two. The SQL-standard transaction isolation levels from low to high are listed below; going down the list, concurrency (and performance) decreases while safety increases.

Read uncommitted: someone else's uncommitted data modification can already be read inside my transaction.

Read committed (Oracle's default): a data modification can be read in my transaction only after it has been committed.

Repeatable read (MySQL's default): even after a data modification is committed, my transaction still does not read it, so repeated reads stay consistent.

Serializable: while my transaction is uncommitted, nobody else may modify the data.

Standard vs implementation: the levels are a standard for transactions, but each database implements them differently. For example, MySQL InnoDB defaults to the RR level, yet phantom reads do not occur there. When transaction A updates a field of every record, A acquires next-key locks covering the rows and the gaps between them; since A has not committed, the locks it holds are not released, so when transaction B tries to insert a new record it cannot acquire a lock and blocks. Only after A commits and the locks are released can B proceed. So one can say MySQL's RR isolation level solves dirty reads, non-repeatable reads and phantom reads.
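Checking and changing the level on a live server looks like this (@@transaction_isolation is the variable name on newer versions; older ones call it @@tx_isolation):

```sql
SELECT @@transaction_isolation;        -- e.g. REPEATABLE-READ
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT @@transaction_isolation;        -- now READ-COMMITTED
```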

5、Locks in MySQL

Whether it is concurrent programming in Java or database operations, locks are involved, and two lock design philosophies have emerged: the pessimistic lock and the optimistic lock.

Pessimistic locking

Advantage: suited to write-heavy, read-light concurrency. It cannot sustain very high performance, but where optimistic locking offers nothing better, it guarantees data safety.

Disadvantage: locking adds system overhead; although data safety is guaranteed, throughput is low, so it is not suited to read-heavy workloads.

Optimistic locking

Advantage: in read-heavy, write-light concurrency it avoids the cost of database locks and improves DAO-layer response performance. Many ORM tools ship with optimistic locking implementations, so these methods need not be hand-rolled.

Disadvantage: in write-heavy, read-light concurrency, i.e. when write contention is fierce, CAS retries pile up, the conflict rate gets too high, and the overhead ends up exceeding that of pessimistic locks.

Implementation: optimistic locking at the database level follows the same idea as CAS, realized through a data version number or a timestamp.
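A minimal version-number sketch of this CAS-style idea (the table, columns and values are hypothetical):

```sql
-- Read the row together with its version.
SELECT balance, version FROM account WHERE id = 1;  -- say it returns (100, 7)

-- Write back only if nobody changed the row in between.
UPDATE account
SET balance = 80, version = version + 1
WHERE id = 1 AND version = 7;
-- If the affected-row count is 0, another writer won: re-read and retry.
```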

There are three main database concurrency scenarios:

read-read: no problem, no concurrency control needed

read-write: isolation problems; dirty reads, phantom reads and non-repeatable reads may occur

write-write: lost updates may occur, of which there are two types

The two types of lost update:

Type 1: transaction A's rollback overwrites results transaction B has already committed. Type 2: transaction A's commit overwrites results transaction B has already committed.

To implement these locking ideas sensibly, MySQL provides a variety of locks:



Lock classification

MySQL supports three granularities of locking:

  1. Table-level locks

The coarsest-grained locks in MySQL. The most commonly used engines, MyISAM and InnoDB, both support table-level locking.

  2. Page-level locks

A granularity between row-level and table-level. Table locks are fast but conflict a lot; row locks conflict little but are slow. Page-level locks are a compromise, locking one adjacent group of records at a time.

  3. Row-level locks

The finest-grained locks in MySQL, locking only the row currently being operated on. Row-level locks greatly reduce conflicts between database operations, but the finer the granularity, the higher the cost: row-level locks are not necessarily better than table-level locks. Compared with a table lock that simply locks the whole table at once, a row lock must still scan to the matching row before locking it, which is relatively expensive. So table locks and row locks each have their place.

Locks in MyISAM

  1. Although MySQL supports table, page and row locks, the MyISAM storage engine supports only table locks. MyISAM's locking overhead is therefore low, but so is its concurrency for data operations. If writes are tail-appends, a certain degree of read-write concurrency is still possible.

  2. From the locks it supports, you can see MyISAM is an engine with read-read concurrency but without general read-write or write-write concurrency, so it suits read-heavy, write-light applications and is rarely used in general engineering today.

Locks in InnoDB

InnoDB supports many kinds of locks:

Shared and exclusive locks (Shared and Exclusive Locks)

Intention locks (Intention Locks)

Record locks (Record Locks)

Gap locks (Gap Locks)

Next-key locks (Next-Key Locks)

Insert intention locks (Insert Intention Locks)

Auto-increment locks (AUTO-INC Locks)

Predicate locks for spatial indexes (Predicate Locks for Spatial Indexes)

Take the shared and exclusive row locks as an example. LOCK IN SHARE MODE, the shared read lock:

used to make sure the data you read is the latest version and that nobody else is modifying it. It does not guarantee you may modify the data yourself, because other transactions may also hold an S lock on the same rows via LOCK IN SHARE MODE. If you fail to COMMIT or ROLLBACK in time, you may cause many transactions to wait.

FOR UPDATE, the exclusive write lock:

used when the data you read must be the latest and modifiable only by yourself; it locks the rows just as an UPDATE statement would. In busy periods, a transaction that does not COMMIT or ROLLBACK promptly can leave other transactions waiting a long time, hurting the database's concurrency.
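Both appear inside an explicit transaction; a sketch against an assumed account table (MySQL 8.0 also accepts the newer spelling SELECT ... FOR SHARE):

```sql
START TRANSACTION;
-- S lock: others may also read-lock these rows, but cannot modify them.
SELECT * FROM account WHERE id = 1 LOCK IN SHARE MODE;
COMMIT;

START TRANSACTION;
-- X lock: only this transaction may touch the row until COMMIT/ROLLBACK.
SELECT * FROM account WHERE id = 1 FOR UPDATE;
UPDATE account SET balance = balance - 10 WHERE id = 1;
COMMIT;
```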

Gap Lock, the gap lock:

1、A row lock can only lock existing rows; it cannot stop inserts into the gaps between records, so MySQL introduced the gap lock (Gap Lock). A gap lock covers an interval open at both ends. Gap locks do not conflict with each other.

2、A gap lock combined with a row lock is called a next-key lock; every next-key lock is an interval open at the front and closed at the back.

The locking rules for gap locks (the kind you forget right after learning):

1、The basic unit of locking is the next-key lock, an interval open at the front and closed at the back.

2、Only objects actually accessed during the lookup are locked.

3、For an equality query on an index, when a unique index is locked, the next-key lock degenerates into a row lock.

4、For an equality query on an index, when traversal moves right and the last value no longer satisfies the equality condition, the next-key lock degenerates into a gap lock.

5、A range query on a unique index accesses (and locks) up to the first value that does not satisfy the condition.

6、MVCC

MVCC:

1、Short for Multi-Version Concurrency Control. MVCC is a concurrency-control idea: maintain multiple versions of one piece of data so that read and write operations do not conflict.

2、The main purpose of MVCC in MySQL InnoDB is to improve database concurrency: it handles read-write conflicts more gracefully, so that even when reads and writes collide, reads can proceed without locks and without blocking.

Current read and snapshot read under MySQL InnoDB

  1. Current read

1、Operations such as SELECT ... LOCK IN SHARE MODE (shared lock), SELECT ... FOR UPDATE, and UPDATE / INSERT / DELETE (exclusive locks) are all current reads: they read the latest version of the record, and while reading they must ensure no concurrent transaction can modify it, so they lock the records they read.

2、The current read can be seen as the concrete realization of pessimistic locking.

  2. Snapshot read

1、An unlocked plain SELECT is a snapshot read: non-blocking and lock-free, provided the isolation level is not serializable (at the serializable level, snapshot reads degrade into current reads). Snapshot reads exist to improve concurrency; they are implemented on top of multi-version concurrency control, i.e. MVCC. MVCC can be viewed as a variant of row locking that in many cases avoids the lock entirely and thus lowers cost. Because there are multiple versions, what a snapshot read returns may not be the latest version of the data but a historical one.

2、The snapshot read is MySQL's realization of the MVCC idea for non-blocking reads. MVCC's goal is lock-free resolution of read-write conflicts to raise concurrent read-write performance, and "read" here means snapshot read.

3、The snapshot read is MySQL's concrete non-blocking read implementation of the ideal MVCC model.
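The difference can be observed directly inside one RR transaction (table t and its contents are assumptions; suppose another transaction commits an update to the row in between the two reads):

```sql
START TRANSACTION;
SELECT * FROM t WHERE k = 5;             -- snapshot read: consistent view,
                                         -- does not see the later commit
SELECT * FROM t WHERE k = 5 FOR UPDATE;  -- current read: sees (and locks)
                                         -- the latest committed version
COMMIT;
```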

Because people were not satisfied with solving read-write conflicts only through pessimistic database locks, MVCC was proposed, which gives two combinations:

MVCC + pessimistic locking: MVCC resolves read-write conflicts, pessimistic locks resolve write-write conflicts

MVCC + optimistic locking: MVCC resolves read-write conflicts, optimistic locks resolve write-write conflicts

How MVCC is implemented

MVCC is implemented mainly through the four hidden fields on each record, the undo log, and the consistent Read View.

The four hidden fields

  1. DB_TRX_ID:

6 bytes. The ID of the transaction that most recently modified (or inserted) this record.

  2. DB_ROLL_PTR:

7 bytes. The rollback pointer, pointing at the previous version of this record (stored in the rollback segment).

  3. DB_ROW_ID:

6 bytes. A hidden auto-increment row ID (hidden primary key). If the table has no primary key, InnoDB automatically uses DB_ROW_ID to build a clustered index.

  4. The deleted flag:

A hidden delete-flag field. A record that is updated or deleted is not necessarily physically removed right away; its delete flag is changed instead.

The modification of a record by a transaction , Will cause the record to undo log Become a linear table of record versions ( Linked list ),undo log The head of the chain is the latest old record , The end of the chain is the earliest old record .

Undo log: as noted above, what matters for MVCC is the update undo log; the undo log is in effect the chain of old record versions kept in the rollback segment.

Consistent Read View: a Read View is the read view produced at the moment a transaction performs a snapshot read. At that instant a snapshot of the database system is taken, recording the IDs of the transactions currently active in the system (each InnoDB transaction has a unique transaction ID, assigned by InnoDB's transaction system when the transaction begins and strictly increasing in order of request). Visibility is decided by comparing these IDs against the transaction IDs stored on each record version. The general idea is described below.

A simplified way to think about it: MVCC adds two hidden fields to each row, holding the row's creation transaction ID and its deletion transaction ID.

1、 On insert:

InnoDB stores the current transaction ID on each newly inserted row as its creation version.

2、 On select:

1、 InnoDB returns only rows whose version is no later than the current transaction's (the row's creation transaction ID <= the reading transaction's ID). This guarantees the rows the transaction reads either already existed before the transaction started or were inserted or modified by the transaction itself.

2、 The row's deletion version must be either undefined or greater than the current transaction's ID. This guarantees the rows the transaction reads were not deleted before the transaction started.

3、 Only records satisfying both 1 and 2 are returned as query results.

3、 On delete:

InnoDB stores the current transaction ID on each deleted row as its deletion mark.

4、 On update:

When InnoDB executes an update, it actually inserts a new row version, storing the current transaction ID as that version's creation mark, and also stores the current transaction ID as the deletion mark of the old version.
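The select rules above amount to a small visibility predicate. A minimal Python sketch of the simplified two-field model (illustrative only; the real InnoDB logic works on the Read View's water marks, not bare integers):

```python
from typing import Optional

def visible(create_trx: int, delete_trx: Optional[int], reader_trx: int) -> bool:
    # Rule 1: the version was created at or before the reader's version.
    created_in_time = create_trx <= reader_trx
    # Rule 2: the version is not deleted, or was deleted after the reader's version.
    not_deleted_yet = delete_trx is None or delete_trx > reader_trx
    return created_in_time and not_deleted_yet

# Transaction 5 reads:
assert visible(3, None, 5)      # created earlier, never deleted -> visible
assert not visible(3, 4, 5)     # deleted before the reader's version -> hidden
assert not visible(6, None, 5)  # created by a later transaction -> hidden
assert visible(3, 7, 5)         # deleted "in the future" -> still visible
```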

The above is only a simplified account of how MVCC selects versions; at the source-code level, visibility is decided against the Read View's low water mark and high water mark. Look up the details yourself if interested.

Key points

1、 The result of a snapshot read within a transaction depends heavily on where the transaction's first snapshot read occurs: the location of the first snapshot read determines what all later snapshot reads in that transaction can see.

2、 Under the RC isolation level, every snapshot read generates and uses a fresh, up-to-date Read View; under the RR level, the Read View created by the first snapshot read in a transaction is reused by all of that transaction's later snapshot reads.
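The RC/RR difference can be made concrete with a toy model. This sketch is purely illustrative (the Snapshot and Transaction classes are assumptions, not MySQL internals): under RC every snapshot read asks for a fresh view; under RR the first view is cached and reused.

```python
class Snapshot:
    """Stand-in for a Read View: records which transactions were active."""
    def __init__(self, active_ids):
        self.active_ids = frozenset(active_ids)

class Transaction:
    def __init__(self, isolation: str, current_view):
        self.isolation = isolation        # "RC" or "RR"
        self.current_view = current_view  # callable producing a fresh view
        self._view = None

    def snapshot_read(self) -> Snapshot:
        if self.isolation == "RC":
            return self.current_view()    # RC: a new Read View per read
        if self._view is None:            # RR: only the first read takes a view
            self._view = self.current_view()
        return self._view

views = iter([Snapshot({7, 8}), Snapshot({8})])
rr = Transaction("RR", lambda: next(views))
first = rr.snapshot_read()
assert rr.snapshot_read() is first        # RR reuses the first view

views2 = iter([Snapshot({7, 8}), Snapshot({8})])
rc = Transaction("RC", lambda: next(views2))
assert rc.snapshot_read() is not rc.snapshot_read()  # RC takes fresh views
```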

7、 Buffer pool (buffer pool)


In a layered application architecture, the most frequently accessed data is placed in a cache to speed up access and avoid hitting the database every time. Operating systems likewise use a buffer pool mechanism to avoid touching the disk on every access. MySQL, as a storage system, also has a buffer pool (buffer pool) mechanism to avoid disk IO on every query. Its main points:

1、 It exists to speed up queries.

2、 The buffer pool (buffer pool) is a common mechanism for reducing disk access.

3、 Buffer pools usually cache data in units of pages (16 KB per page).

4、 The usual buffer pool management algorithm is LRU; memcache, operating systems and InnoDB all use it.

5、 InnoDB optimizes the plain LRU by splitting the buffer pool into an old generation and a new generation. A page entering the buffer pool goes into the old generation first and is promoted to the new generation only when it is accessed again, which solves the read-ahead failure problem; and only a page whose residence time in the old generation exceeds a configured threshold is promoted, which prevents bulk data scans from evicting the hot data wholesale.

Read-ahead failure

Because of read-ahead (Read-Ahead), pages are loaded into the buffer pool in advance, but in the end MySQL never reads data from them. This is called read-ahead failure.

Buffer pool pollution

When a single SQL statement scans a large amount of data in bulk, it can replace every page in the buffer pool, swapping out large amounts of hot data and making MySQL performance drop sharply. This is called buffer pool pollution. The solution: with the old-generation residence-window policy in place, pages loaded in bulk within a short period are not inserted straight at the head of the new generation; instead, pages accessed only once within a short window are evicted first.
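A toy sketch of the two-generation LRU described above, under a simplified model (SplitLRU and its fields are illustrative, loosely mirroring the innodb_old_blocks_time idea): pages land in the old generation, and only a re-access after the residence window promotes them.

```python
from collections import OrderedDict

class SplitLRU:
    """Toy model of InnoDB's two-generation LRU: pages enter the old
    generation and are promoted only when re-accessed after surviving
    the residence window."""
    def __init__(self, window: float):
        self.young = OrderedDict()  # hot pages, most recently used last
        self.old = {}               # page -> time it entered the old generation
        self.window = window

    def load(self, page, now: float):
        # Read-ahead / first load always lands in the old generation.
        self.old[page] = now

    def access(self, page, now: float):
        if page in self.young:
            self.young.move_to_end(page)          # ordinary LRU touch
        elif page in self.old and now - self.old[page] >= self.window:
            del self.old[page]                    # survived the window:
            self.young[page] = True               # promote to the young generation

lru = SplitLRU(window=1.0)
lru.load("scan_page", now=0.0)
lru.access("scan_page", now=0.2)   # burst re-access: stays in the old generation
lru.load("hot_page", now=0.0)
lru.access("hot_page", now=5.0)    # re-accessed after the window: promoted
```

A bulk scan's pages are touched again within milliseconds, so they never leave the old generation and cannot flush the hot set.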

8、 Table slimming

Holes

When MySQL executes a delete command, it merely marks the record's position, or the data page, as reusable; the size of the disk file does not change, so table space cannot be reclaimed with delete alone. These reusable but unused regions look like holes. Inserts that cause page splits also leave holes.

How table rebuilding works

1、 Create a new table B with the same structure as table A.

2、 Read the rows of A in primary-key ID order and copy them into table B.

3、 Replace table A with table B to achieve the slimming effect.

Rebuild commands

1、 alter table A engine=InnoDB. Use with care; experienced DBAs use the open-source tool below.

2、 Recommended: GitHub: gh-ost

9、 SQL joins, row counting, random queries

There are 7 kinds of join (the original diagram is omitted here). Row counting:

1、 In MyISAM the total row count of a table is stored on disk, so it can be used directly.

2、 In InnoDB, because of MVCC, the rows have to be read out and counted.

3、 Ranked by efficiency from worst to best: count(field) < count(primary key id) < count(1) ≈ count(*). Use count(*) whenever possible.

Random query

mysql> select word from words order by rand() limit 3;

Using order by rand() directly: explain shows this statement needs Using temporary and Using filesort, so its execution cost is often high. Avoid this pattern when designing queries.

mysql> select count(*) into @C from t;
set @Y1 = floor(@C * rand());
set @Y2 = floor(@C * rand());
set @Y3 = floor(@C * rand());
select * from t limit @Y1,1;
select * from t limit @Y2,1;
select * from t limit @Y3,1;

This avoids creating temporary tables and sorting. The total number of rows scanned = C + (Y1+1) + (Y2+1) + (Y3+1).

exists vs in

1、 An in query first evaluates the subquery's table, then forms the Cartesian product of the inner and outer tables, and finally filters by the conditions.

2、 With an exists subquery, the outer query runs first, and each of its rows is plugged into the subquery to check for existence, filtering the overall returned data.

3、 When the two tables are about the same size, in and exists differ little. When the inner table is large, exists is more efficient; when the inner table is small, in is more efficient.

4、 A not in query scans both the inner and outer tables fully and uses no index, whereas the subquery of not exists can still use indexes on its table. So not exists is always faster than not in.

10、 MySQL optimization

SQL optimization has 4 main directions: SQL statements and indexes, table structure, system configuration, and hardware.

The general idea is to make maximum use of indexes, avoid full table scans as much as possible, and reduce queries over invalid data.

1、 Reduce data access: set reasonable field types, enable compression, and cut disk IO through index access.

2、 Return less data: return only the required fields and paginated data, reducing disk IO and network IO.

3、 Reduce the number of round trips: batched DML operations and stored procedures reduce the number of interactions with the database.

4、 Reduce server CPU overhead: minimize database sorting operations and full table queries to lower CPU and memory usage.

5、 Table partitioning: using partitioned tables adds parallelism and makes better use of CPU resources.

Examples of SQL statement optimization

1、 Build covering indexes sensibly: they effectively reduce lookups back to the table.

2、 union, or and in can all hit indexes; in is recommended.

3、 Negative conditions (!=, <>, not in, not exists, not like, etc.) do not use indexes; prefer in.

4、 Using a function or computation on an indexed column forces a full table scan.

5、 Watch out for implicit type conversion: comparing a string column with an integer triggers a CAST and defeats the index, while querying an int column with a string still uses the index.

6、 Fuzzy queries with a % prefix are not recommended.

7、 In multi-table joins, put the small table first and the large table last. In MySQL, the tables after from are joined from left to right (the opposite of Oracle), and the first table may involve a full table scan.

8、 Order the conditions in the where clause: MySQL analyzes the where clause left to right, top to bottom, so the conditions that filter out the most data should come first, shrinking the result set as quickly as possible.

General approach to SQL tuning

1、 First use the slow query log to locate the specific SQL that needs optimizing.

2、 Use explain to inspect the execution plan and index usage.

3、 Focus on these columns (index problems are usually found through these 4):

1、 key (is an index being used?)

2、 key_len (is the index fully used?)

3、 type (the index access type)

4、 Extra (extra information: sorting, temporary tables, where conditions, etc.)

4、 Optimize the SQL according to the index problems found in step 3. 5、 Go back to step 2.

Table structure optimization

1、 Prefer TINYINT, SMALLINT and MEDIUMINT over INT for integer types, and add UNSIGNED when the value cannot be negative.

2、 Allocate VARCHAR columns only the length actually needed.

3、 Prefer TIMESTAMP over DATETIME.

4、 Do not put too many fields in a single table; 20 or fewer is suggested.

5、 Avoid NULL-able fields: they are hard to optimize queries on and take extra index space. Default strings to ''.

Read/write separation

Write only to the primary server and read only from the replicas. The corresponding database cluster is usually one primary with one replica, or one primary with many replicas: the application sends every write operation to the primary database and every read query to the replicas, and the primary synchronizes data to the replicas to keep them consistent. Read/write separation is generally implemented in one of two ways: code-level encapsulation or database middleware.

Sharding: splitting databases and tables can be done vertically or horizontally; the usual order is vertical first, then horizontal.

1、 Vertical database split: divide the application into modules, for example an order module, user module, product module, payment module, and so on. This is essentially the microservice idea.

2、 Vertical table split: generally, separate rarely used fields and very large fields into their own tables.

3、 Horizontal table split: choose the sharding field by scenario. For example, Taobao takes 10 million orders a day and uses userId as the sharding field; order queries cover only the latest 6 months (older orders are archived), so 6 months holds about 1.8 billion rows. Split into 1024 tables, each holding about 2 million rows, and hash(userId) % 1024 locates the table.

4、 ID generator: distributed IDs need to be globally unique across databases to make storing and retrieving data easy; the numbers should be unique and increasing.
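The userId routing described in point 3 can be sketched as below, with hypothetical table names (crc32 stands in for the hash only because Python's built-in hash() is randomized per process; any stable hash works):

```python
from zlib import crc32

N_TABLES = 1024  # matches the 1024-table example above

def order_table(user_id: int) -> str:
    # Route a userId to its order table via hash(userId) % 1024.
    shard = crc32(str(user_id).encode("utf-8")) % N_TABLES
    return f"t_order_{shard:04d}"  # illustrative naming scheme

# Every order of one user lands in the same table, so a user's
# "my orders" query touches exactly one shard.
t = order_table(123456)
assert t == order_table(123456)   # routing is deterministic per user
```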

The most popular sharding tools at present are Mycat and sharding-sphere.

TiDB: an open-source distributed database that combines the best features of traditional RDBMSs and NoSQL. TiDB is MySQL-compatible, supports unlimited horizontal scaling, and provides strong consistency and high availability. Its goal is to offer a one-stop solution for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) scenarios. TiDB has the following core characteristics:

1、 Supports the MySQL protocol (low development and adoption cost).

2、 100% transaction support (data consistency is easy to implement and reliable).

3、 Unlimited horizontal scaling (no need to think about sharding), with uninterrupted service.

4、 TiDB and MySQL can back each other up.

5、 Follows JDBC conventions, so the learning cost is low; strongly relational and strongly consistent; no primary/replica configuration or sharding to worry about; and it scales seamlessly and dynamically.

Suitable for:

1、 When the original MySQL deployment hits a single-machine capacity or performance bottleneck, consider replacing MySQL seamlessly with TiDB.

2、 Large data volumes where complex MySQL queries are slow.

3、 Large, fast-growing data volumes approaching the single-machine processing limit, where you do not want sharding, or database middleware that intrudes on and constrains the business with a Sharding scheme.

4、 Large data volumes with demands for high-concurrency real-time writes, real-time queries, and real-time statistical analysis. 5、 Requirements for distributed transactions, 100% strong consistency of data across multiple data centers, and auto-failover high availability.

Not suitable for:

1、 Scenarios a single MySQL instance can satisfy do not need TiDB.

2、 Below about 50 million rows you generally would not use TiDB; it is designed for large-scale data scenarios.

3、 If your application has a small data volume (everything below the tens-of-millions level) and no requirements for high availability, strong consistency or multi-data-center replication, it is not a good fit for TiDB.

End

Here are a few more simple MySQL knowledge summaries; help yourself:

1、SQL Basics :https://juejin.im/post/6844903790571700231

2、SQL interview :https://sowhat.blog.csdn.net/article/details/71158104

3、MySQL Torture :https://www.jianshu.com/nb/22933318


Copyright notice
This article was created by [osc_ 5rxi0ziy]. Please include the original link when reposting. Thanks.
https://cdmana.com/2020/12/20201225130401765g.html
