编程知识 cdmana.com

Analysis of storage engine mvstore of H2 (1) -- mvstore initialization

Preface :

​ Come into contact with by chance H2 database , Learn that it supports memory 、 File mode , Can run in embedded mode , Server mode and cluster mode .H2 Although the database sparrow is small, it has all kinds of internal organs , Support transactions ,MVCC etc. . I remember when I was reading, I always wanted to know the specific operation principle of a database , Want to know how it puts a sql The process from interpretation to execution . But the time was limited and the level was not enough , Until you see this H2 So I decided to start with it , This article is about H2 The storage engine part of the database MVStore A little introduction to ( Please refer to the official website , Combined with my own understanding ) By the way, sort out relevant knowledge for yourself . Insufficient level , Welcome to correct .

One 、 Test method use case

​ Oracle There are table spaces in the data storage of 、 paragraph 、 District 、 block 、 Data files ;MySQL InnoDB It's the same with storage management , however MySQL The concept of shared table space and independent table space is added .

Follow Mysql,Oracle Equal similarity ,H2 The storage structure of is actually hierarchical , Respectively :file,chunk,page, The basic unit of storage is block( It's usually 4k Bytes ).flie There are several chunk, Every time a new version is written, there will be one chunk, One chunk By multiple block form .h2 It's a kind of copy on write btree The way to , Each modification btree Will get a new root page Written in this chunk in , Each transaction just needs to keep its own root page Can .

150104113477241.png

​ Mysql InnoDB Storage structure

​ In the reading H2 Source code time , Both top-down and bottom-up approaches are useful . When I first saw the source code, I couldn't start , So we first understand how it starts as a whole , function , perform sql Of . After having a general understanding, I will debug Go in part and part of the code to understand how each module is organized and operated from bottom to top . in addition ,UT It's a good thing .H2 The source code contains a lot of unit test code . There are many unit tests in the test file , The granularity of each test method is fine , From the name of the method, you can know what function has been tested . I'll start with one of them called testExample() The way to start , Step by step combing MVStore The logic of . Interested partners can also start with this test method , Less detours .

image-20210818232517731.png

H2 Unit tests in source code

image-20210818232936104.png

H2 Unit tests in source code

testExample() Method

private void testExample() {
    String fileName = getBaseDir() + "/" + getTestName();
    FileUtils.delete(fileName);

    // open the store (in-memory if fileName is null)
    try (MVStore s = MVStore.open(fileName)) {

        // create/get the map named "data"
        MVMap<Integer, String> map = s.openMap("data");

        // add and read some data
        map.put(1, "Hello World");
        // System.out.println(map.get(1));
    }
    try (MVStore s = MVStore.open(fileName)) {
        MVMap<Integer, String> map = s.openMap("data");
        assertEquals("Hello World", map.get(1));
    }
}
 Copy code 

Two 、 Introduction to the form of storage file

​ H2 The data of the database is stored in a file . This file contains... For security purposes Two file headers And a series of them chunks. Each file header occupies a block, One block yes 4096 byte . every last chunk At least one block, But usually it is 200 individual blocks Or more . The data to log structured storage The form is stored in chunks in . Every version (version) There will be one chunk.

H2 The form of storing files would be like this :

[ file header 1 ] [ file header 2 ] [ chunk ] [ chunk ] ... [ chunk ]
 Copy code 

every last chunk It contains several B+ Treelike page. Take the following example , There will be two chunks.

MVStore s = MVStore.open(fileName);
MVMap<Integer, String> map = s.openMap("data");
for (int i = 0; i < 400; i++) {
    map.put(i, "Hello");
}
s.commit();
for (int i = 0; i < 100; i++) {
    map.put(i, "Hi");
}
s.commit();
s.close();
 Copy code 
Chunk 1:

-Page 1 : The root node , Point to two child nodes page 2 and page 3

-Page 2: leaf , Contains 140 Elements ( key 0 ~ key 139)

-Page 3: leaf , Contains 260 Elements ( key 140~ key 399)

Chunk 2:

-Page 4: The root node , Point to two child nodes page 5 and page 3

-Page 5: leaf , Contains 140 Elements ( key 0~ key 139)

This shows that every chunk Contains changes for each version : The page of the new version and its root node . Like the example here ,Page2 If it is changed, there will be a new one Page, Then the pages from its parent node to the root node will also have a new version Page.

image-20210819232721151.png

​ for the first time commit

image-20210819234009093.png

​ The second time commit

3、 ... and 、 The file header

There are two file headers in the storage file , The contents of both file headers are the same . When the file header is updated , There will be “ Partial failure ” The situation of , In this way, one of the file headers will be destroyed . So the second header is used when both headers are successfully updated ( go by the name of “ Update in place ”) It is a successful update . The form of the file header is as follows :

H:2,block:2,blockSize:1000,chunk:7,created:1441235ef73,format:1,version:7,fletcher:3044e6cc
 Copy code 

Data is a kind of “ Key value pair ” In the form of , among “ value ” Is a number in hexadecimal form . Form the following :

  • H: "H:2" It means H2 database

  • block: One of the newest chunks At the beginning block Location ( It doesn't need to be up-to-date chuncks) ---- I don't quite understand that

  • blockSize: File size ( In bytes ), Currently hexadecimal 1000, That is, decimal 4096. Exactly the same size as the disk sector .

  • chunk:chunk id. Usually consistent with the version number . however chunk id It is possible to roll back to 0, But the version number will not .

  • created: File creation time ( from 1970 To the current number of milliseconds )

  • format: File format number , At present, it is 1

  • version:chunk Version number of

  • fletcher: Fletcher test and

When the file is opened , Two file headers will be read in and verified and used to verify . If both headers match , Then the header with the updated version number will be used . The latest version number chunk Can be known ( How do you know? I'll talk about it below ), The remaining metadata (metadata) Can from chunk We know . If not stored in the header file chunk id,block, and Version number . So the latest chunk From the last... In the file chunk Start looking for

Four 、chunk Format

Each version will have chunk. every last chunk Contains a header , The page modified in this version (page), And a tail . page (page) Contains the table (map) Actual data . In a chunk The pages in are stored in header Behind (right after). One chunk The size of is a block Multiple of the size of . The tail is stored in chunk Last 128 In bytes .

[ header ] [ page ] [ page ] ... [ page ] [ footer ]
 Copy code 

tail (footer) Can be used to verify chunk Whether it has been completely written . Every write operation has a chunk, And can find the last one in the file chunk Starting position .chunk The head and tail of contain the following data :

chunk:1,block:2,len:1,map:6,max:1c0,next:3,pages:2,root:4000004f8c,time:1fc,version:1
chunk:1,block:2,version:1,fletcher:aed9a4f6
 Copy code 
  • chunk:chunk id
  • block:chunk The first of block Location ( To multiply by block To get the location in the file )
  • len:chunk Size , use block The number of
  • map: Abreast of the times map Of id, When one map When created, this id I'll add one
  • max: The sum of the sizes of all the largest pages ( I don't understand )
  • next: next chunk The predicted starting position of
  • pages:chunk Number of pages in
  • root: Metadata (metadata) The location of the root page of .
  • time:chunk Time when it is written . In milliseconds after the file is written
  • version: This chunk Indicates the version number of the
  • fletcher: The checksum

Chunks It will never be updated in place , every last chunk Contains the pages modified in that version ( Each version has a page ), And the parent node pages of these pages , To the root page . If one map The elements in are modified , Delete or add , Then the corresponding page will be copied , Modify and store in the next chunk in , And the old chunk Loose leaf in (live pages) There will be less . This mechanism is called copy-on-write, And Btrfs File systems work in a similar way . No, live pages Of Chunks Will be marked as free, therefore chunk Can be reused . Because not all chunk All the same size , So there will be some free time (free) Of chunks In a certain chunk In front of . In a free (stackoverflow.com/questions/1…

The latest one chunk In the open h2 The method determined when persisting files : The file header contains a recent chunk The location of , This chunk It doesn't have to be the latest chunk. This is to reduce the number of file header updates . When you open a file , There is also the latest one in the file header chunk The tail of will be read into . If the one you read for the first time chunk There is one next The pointer to ,next The pointer points to chunk Will also be read into . Until the latest one chunk Be read into .

This passage is a little puzzling , You can directly see the implementation in the source code , Later, see how to read the latest version through the source code chunk Of . One of the file headers looks like this

image-20210825232045447.png

As mentioned earlier , The first two block Is used to store files header Of

image-20210831215238857.png

When opening the file header , I'll read it first chunk The head and tail of . Of the parameters block and chunkId It's all on the file header .

image-20210827233415016.png

image-20210827234431016.png

image-20210831215809482.png

Read Chunk In the end

image-20210831223711142.png

chunk Check the head and tail of

image-20210831223924158.png

Now let's open a MVStore The process is as follows

MVStore s = MVStore.open(fileName);
 Copy code 
  1. First read the file header .
  2. After reading the first file header, according to the... In the file header block Field to the corresponding location in the file (block * bolck_size). Read the corresponding Chunk head

Here are the file header and chunk Head example

image-20210901223022039.png ​ file head

image-20210901222921709.png

​ chunk

//  Parameters block from file In the header block Field in . Parameters expected It is also in the file header chunk Field 
private Chunk readChunkHeaderAndFooter(long block, int expectedId) {
    Chunk header = readChunkHeaderOptionally(block, expectedId);
    if (header != null) {
        Chunk footer = readChunkFooter(block + header.len);
        if (footer == null || footer.id != expectedId || footer.block != header.block) {
            return null;
        }
    }
    return header;
}

//
private Chunk readChunkHeaderOptionally(long block, int expectedId) {
        Chunk chunk = readChunkHeaderOptionally(block);
        return chunk == null || chunk.id != expectedId ? null : chunk;
    }
    
    
    private Chunk readChunkHeaderOptionally(long block) {
        try {
            Chunk chunk = readChunkHeader(block);
            return chunk.block != block ? null : chunk;
        } catch (Exception ignore) {
            return null;
        }
    }
    
     private Chunk readChunkHeader(long block) {
        long p = block * BLOCK_SIZE;
        ByteBuffer buff = fileStore.readFully(p, Chunk.MAX_HEADER_LENGTH);
        return Chunk.readChunkHeader(buff, p);
    }
    
    
      static Chunk readChunkHeader(ByteBuffer buff, long start) {
        int pos = buff.position();
        byte[] data = new byte[Math.min(buff.remaining(), MAX_HEADER_LENGTH)];
        buff.get(data);
        try {
            for (int i = 0; i < data.length; i++) {
                if (data[i] == '\n') {
                    // set the position to the start of the first page
                    buff.position(pos + i + 1);
                    String s = new String(data, 0, i, StandardCharsets.ISO_8859_1).trim();
                    return fromString(s);
                }
            }
        } catch (Exception e) {
            // there could be various reasons
            throw DataUtils.newMVStoreException(
                    DataUtils.ERROR_FILE_CORRUPT,
                    "File corrupt reading chunk at position {0}", start, e);
        }
        throw DataUtils.newMVStoreException(
                DataUtils.ERROR_FILE_CORRUPT,
                "File corrupt reading chunk at position {0}", start);
    }
    
    
 Copy code 
  1. Read chunk Check the tail of

    private Chunk readChunkHeaderAndFooter(long block, int expectedId) {
            Chunk header = readChunkHeaderOptionally(block, expectedId);
            if (header != null) {
                Chunk footer = readChunkFooter(block + header.len);
                if (footer == null || footer.id != expectedId || footer.block != header.block) {
                    return null;
                }
            }
            return header;
        }
        
     Copy code 
  2. Read the second file header . If one of them version The field is larger than that read from the first file header chunk Of version, Continue to read... As in the above steps chunk

Review what I said earlier

When opening a file , There is also the latest one in the file header chunk The tail of will be read into . If the one you read for the first time chunk There is one next The pointer to ,next The pointer points to chunk Will also be read into . Until the latest one chunk Be read into .

Look below , stay while In circulation , It will chunk(next Next to chunk Read in ), If the next chunk The version of is less than or equal to the current version . thus newest It points to the latest version chunk.

image-20210902221302944.png

  1. The last section identifies the latest chunk (newest) after , It's down here .
if (assumeCleanShutdown) {
    // quickly check latest 20 chunks referenced in meta table
    Queue<Chunk> chunksToVerify = new PriorityQueue<>(20, Collections.reverseOrder(chunkComparator));
    try {
        setLastChunk(newest);
        // load the chunk metadata: although meta's root page resides in the lastChunk,
        // traversing meta map might recursively load another chunk(s)
        Cursor<String, String> cursor = layout.cursor(DataUtils.META_CHUNK);
        while (cursor.hasNext() && cursor.next().startsWith(DataUtils.META_CHUNK)) {
            Chunk c = Chunk.fromString(cursor.getValue());
            assert c.version <= currentVersion;
            // might be there already, due to meta traversal
            // see readPage() ... getChunkIfFound()
            chunks.putIfAbsent(c.id, c);
            chunksToVerify.offer(c);
            if (chunksToVerify.size() == 20) {
                chunksToVerify.poll();
            }
        }
        Chunk c;
        while (assumeCleanShutdown && (c = chunksToVerify.poll()) != null) {
            Chunk test = readChunkHeaderAndFooter(c.block, c.id);
            assumeCleanShutdown = test != null;
            if (assumeCleanShutdown) {
                validChunksByLocation.put(test.block, test);
            }
        }
    } catch(MVStoreException ignored) {
        assumeCleanShutdown = false;
    }
}
 Copy code 

Read notes :quickly check latest 20 chunks referenced in meta table

It can be seen that it is to read the latest 20 individual chunk. A priority queue is used to chunk Sort . But here is by variable chunkComparator In reverse order .chunkComparator Is defined as follows :

Comparator<Chunk> chunkComparator = (one, two) -> {
    int result = Long.compare(two.version, one.version);
    if (result == 0) {
        // out of two copies of the same chunk we prefer the one
        // close to the beginning of file (presumably later version)
        result = Long.compare(one.block, two.block);
    }
    return result;
};
 Copy code 

First compare two chunk Version of , Whose version is big is in the front . If the version is the same , Then compare block Size , Whose? block Small ones are in front . So , Its flashback is that whose version is small comes first , Whose? block Big is in front .

Notice the method :

 setLastChunk(newest);
 Copy code 

The latest... Determined from the previous step chunk Will be used to initialize layout This MVMap. The details will not be introduced here , Later on Page The structure will be easier to understand

image-20210908222603207.png

after setLastChunk after ,layout This MVMap There will be a structure similar to the following

image-20210908223159745.png

Let's look at the next step :

Cursor<String, String> cursor = layout.cursor(DataUtils.META_CHUNK);
while (cursor.hasNext() && cursor.next().startsWith(DataUtils.META_CHUNK)) {
    Chunk c = Chunk.fromString(cursor.getValue());
    assert c.version <= currentVersion;
    // might be there already, due to meta traversal
    // see readPage() ... getChunkIfFound()
    chunks.putIfAbsent(c.id, c);
    chunksToVerify.offer(c);
    if (chunksToVerify.size() == 20) {
        chunksToVerify.poll();
    }
}
 Copy code 

DataUtils.META_CHUNK Is a string constant ”chunk.“ . From this we can guess , there while Circulation is to put layout Of "chunk." The first keys are read , Put it in chunks This map And priority lines chunksToVerify in . And keep the number of priority queues at most 19 individual .

Then there is the following cycle :

Read from the priority queue in turn chunk, Put it in validChunksByLocation This map in

while (assumeCleanShutdown && (c = chunksToVerify.poll()) != null) {
    Chunk test = readChunkHeaderAndFooter(c.block, c.id);
    assumeCleanShutdown = test != null;
    if (assumeCleanShutdown) {
        validChunksByLocation.put(test.block, test);
    }
}
 Copy code 

6. The last step , It's some cleaning work

fileStore.clear();
// build the free space list
for (Chunk c : chunks.values()) {
    if (c.isSaved()) {
        long start = c.block * BLOCK_SIZE;
        int length = c.len * BLOCK_SIZE;
        fileStore.markUsed(start, length);
    }
    if (!c.isLive()) {
        deadChunks.offer(c);
    }
}
 Copy code 

版权声明
本文为[I'll be king bidong]所创,转载请带上原文链接,感谢
https://cdmana.com/2021/09/20210909124112655G.html

Scroll to Top