(External) memory fragmentation is a long-standing problem in Linux kernel programming. As the system runs, pages are allocated to various tasks, and over time memory gradually becomes fragmented. Eventually a busy system with a long uptime may have only a few physically contiguous pages left. Because the Linux kernel supports virtual memory management, physical fragmentation is usually not a problem. Page tables let physically scattered memory appear contiguous in the virtual address space (unless huge pages are used). Allocating physically contiguous memory from the kernel's linear mapping region, however, becomes very difficult. Two examples are the slab allocator allocating structure objects, which is very common and frequent in kernel mode, and DMA buffers for devices that do not support scatter/gather. Such allocations trigger frequent direct reclaim or direct compaction, so system performance fluctuates sharply. They can also fail outright, because the slow allocation path takes different actions depending on the page allocation flags.
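Here is a toy Python model of external fragmentation. It is not kernel code. Physical memory is a list of page frames marked free (True) or allocated (False). An order-n request needs 2^n free frames forming one naturally aligned block, which is the constraint the buddy allocator works under. The helper names are hypothetical.

```python
def largest_free_run(frames):
    """Length of the longest run of contiguous free page frames."""
    best = run = 0
    for free in frames:
        run = run + 1 if free else 0
        best = max(best, run)
    return best


def can_allocate(frames, order):
    """True if a naturally aligned block of 2**order free frames exists."""
    size = 1 << order
    return any(all(frames[start:start + size])
               for start in range(0, len(frames) - size + 1, size))


# 16 frames with every other frame allocated: half of memory is free,
# but no two free frames are adjacent.
frames = [i % 2 == 0 for i in range(16)]
print(sum(frames), largest_free_run(frames))             # 8 1
print(can_allocate(frames, 0), can_allocate(frames, 1))  # True False
```

Half of memory is free, yet even an order-1 (two-page) request cannot be satisfied. This is the situation that pushes the kernel into direct reclaim or compaction.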
If kernel code no longer relied on high-order, physically contiguous allocations in the linear address space, the fragmentation problem would be solved at its root. For a project as large as the Linux kernel, however, such a change is clearly impossible. So from Linux 2.x to today, the community has kept looking for ways to mitigate fragmentation, including many very tricky patches. Some of the merged patches remain controversial. Memory compaction is one example: at [LSFMM 2014], many attendees complained that it was inefficient and too slow, and that it had bugs that were hard to reproduce. The community did not abandon the feature, though, and kept optimizing it in later kernel versions.
The most persistent contributor in this area is Mel Gorman, who wrote two important patch sets. The first was merged in Linux 2.6.24 after going through 28 versions, a process that took years: v19 was posted in February 2005, and v28 was merged in October 2007. The second set was merged in Linux 5.0. On 1- or 2-socket machines it reduced memory fragmentation events by 94% overall compared with the unpatched kernel.
This article focuses on the widely used 3.10 kernel. It covers how the buddy allocator keeps memory fragmentation from growing, how memory compaction works, how to view the fragmentation index, and how to quantify the latency cost of memory compaction.
A brief history of anti-fragmentation
Before getting to the main topic, here is a summary of some of the efforts made over the history of the Linux kernel to improve high-order memory allocation. Every article listed is worth reading carefully. I hope this table helps readers who are interested in the details of anti-fragmentation.
| LWN publication date | Title |
|---|---|
| 2004-09-08 | [Kswapd and high-order allocations](https://lwn.net/Articles/101230/) |
| 2004-05-10 | [Active memory defragmentation](https://lwn.net/Articles/105021/) |
| 2005-02-01 | [Yet another approach to memory fragmentation](https://lwn.net/Articles/121618/) |
| 2005-11-08 | [More on fragmentation avoidance](https://lwn.net/Articles/159110/) |
| 2006-11-28 | [Avoiding - and fixing - memory fragmentation](https://lwn.net/Articles/211505/) |
| 2014-03-26 | [Memory compaction issues](https://lwn.net/Articles/591998/) |
| 2015-07-14 | [Making kernel pages movable](https://lwn.net/Articles/650917/) |
| 2016-04-23 | [CMA and compaction](https://lwn.net/Articles/684611/) |
| 2016-05-10 | [make direct compaction more deterministic](https://lwn.net/Articles/686801/) |
| 2018-10-31 | [Fragmentation avoidance improvements](https://lwn.net/Articles/770235/) |
| 2020-04-21 | [Proactive compaction for the kernel](https://lwn.net/Articles/817905/) |
Now, on to the main topic.
The Linux buddy allocator
Linux uses the [buddy algorithm] as its page allocator because it is simple and efficient. Linux extends the classic algorithm in two ways:
A zoned buddy allocator;
Grouping of pages by migration type.
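The core split-and-coalesce logic of the buddy algorithm can be sketched in Python. This is a simplified model with several assumptions. It covers a single zone and keeps free lists as sets of page frame numbers. It has no migration types, locking, watermarks or per-CPU lists. The default max_order of 11 mirrors the usual MAX_ORDER in 3.10-era kernels (orders 0 to 10). The class and method names are made up for illustration and do not correspond to kernel functions.

```python
class BuddyZone:
    """Toy buddy allocator over page frame numbers [0, nr_pages).

    Only the per-order free lists are modeled: no migration types, locking,
    watermarks or per-CPU lists. Orders run from 0 to max_order - 1.
    """

    def __init__(self, nr_pages, max_order=11):
        self.max_order = max_order
        self.free_lists = [set() for _ in range(max_order)]
        # Seed with the largest naturally aligned blocks that fit.
        pfn = 0
        while pfn < nr_pages:
            order = max_order - 1
            while order > 0 and (pfn % (1 << order) or pfn + (1 << order) > nr_pages):
                order -= 1
            self.free_lists[order].add(pfn)
            pfn += 1 << order

    def alloc(self, order):
        """Return the first pfn of a free 2**order-page block, or None."""
        for cur in range(order, self.max_order):
            if self.free_lists[cur]:
                pfn = min(self.free_lists[cur])  # lowest pfn keeps the demo deterministic
                self.free_lists[cur].remove(pfn)
                # Split: hand the upper half back at each lower order.
                while cur > order:
                    cur -= 1
                    self.free_lists[cur].add(pfn + (1 << cur))
                return pfn
        return None

    def free_block(self, pfn, order):
        """Free a 2**order block and merge it with its buddy while possible."""
        while order < self.max_order - 1:
            buddy = pfn ^ (1 << order)  # the buddy differs only in bit `order`
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            pfn = min(pfn, buddy)
            order += 1
        self.free_lists[order].add(pfn)


zone = BuddyZone(16, max_order=5)   # one free 16-page (order-4) block
a = zone.alloc(0)                   # splits 16 into 8 + 4 + 2 + 1 + 1 pages
b = zone.alloc(0)
zone.free_block(a, 0)
zone.free_block(b, 0)               # coalesces back into a single order-4 block
print([sorted(s) for s in zone.free_lists])  # [[], [], [], [], [0]]
```

Each block finds its buddy with pfn ^ (1 << order), so merging needs no search. Merging only succeeds when the buddy is entirely free, which is why a single long-lived page can pin a large block.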
We have previously covered how the Linux kernel describes physical memory with node, zone and page. The zoned buddy allocator operates on a single zone within a node. Before version 4.8, the page reclaim policy was also implemented per zone, because the early design targeted 32-bit processors, which had a lot of high memory. However, pages in different zones of the same node aged at different speeds, and this caused many problems. For a long time the community added many tricky patches to address individual symptoms without fixing the root cause. As 64-bit processors with large memory have become common in recent years, Mel Gorman moved the page reclaim policy from zones to nodes, which solved the problem. Keep this change in mind when writing BPF tools to observe reclaim operations.
Per-CPU pagesets optimize single-page allocation and reduce lock contention between processors. They are unrelated to anti-fragmentation, so this article does not cover them in detail.
Grouping by migration type is the anti-fragmentation method this article describes in detail.
Grouping by migration type
Before looking at migration types, you need to understand the memory address space layout, which each processor architecture defines; for x86_64, see [mm.txt]. Memory accessed through page tables, such as user-space heap memory, does not need to be physically contiguous. Why? Take [Intel 5-level] paging as an example. From low bits to high bits, a virtual address is divided into the following fields:
the page offset;
the page table entry (PTE) index;
the page middle directory (PMD) index;
the page upper directory (PUD) index;
the page 4th-level directory (P4D) index;
the page global directory (PGD) index.
A page table entry stores the page frame number of a physical page, and the PTE index locates that entry. Combining the page frame number with the page offset gives the physical address. To replace the physical page behind a page table entry, the kernel allocates a new page, copies the data from the old page to it, and updates the entry with the new page frame number. The virtual address does not change, so such a page can be migrated freely. In the linear mapping region, however, virtual address = physical address + constant. Changing the physical address therefore changes the virtual address too, and any code that keeps accessing the old virtual address becomes a bug. Such pages are clearly unsuitable for migration. When pages accessed through page tables are mixed with linearly mapped pages, memory fragments easily. The kernel therefore defines several migration types based on how movable a page is, and groups pages by migration type to resist fragmentation.
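The address split and the "copy, then repoint the PTE" argument can be sketched in Python. The sketch assumes x86_64 5-level paging with 4 KiB pages and 9 index bits per level, which gives 57-bit virtual addresses. A plain dict stands in for the page tables. LINEAR_MAP_BASE is an illustrative value for the linear map's "constant", not something to rely on. The real migration path also has to lock and unmap the page and flush TLBs, and all of that is omitted here.

```python
PAGE_SHIFT = 12                                # 4 KiB pages
LEVEL_BITS = 9                                 # 512 entries per table on x86_64
LEVELS = ("pte", "pmd", "pud", "p4d", "pgd")   # from low bits to high bits
LINEAR_MAP_BASE = 0xFF11000000000000           # illustrative constant (assumption)


def split_va(va):
    """Split a 57-bit virtual address into its page offset and table indices."""
    parts = {"offset": va & ((1 << PAGE_SHIFT) - 1)}
    shift = PAGE_SHIFT
    for name in LEVELS:
        parts[name] = (va >> shift) & ((1 << LEVEL_BITS) - 1)
        shift += LEVEL_BITS
    return parts


def translate(va, pte_pfn):
    """Physical address = page frame number from the PTE combined with the offset."""
    return (pte_pfn << PAGE_SHIFT) | (va & ((1 << PAGE_SHIFT) - 1))


# Toy state: physical memory keyed by page frame number, one "PTE" per virtual page.
memory = {}
page_table = {}


def read(va):
    return memory[page_table[va >> PAGE_SHIFT]]


def migrate(va, new_pfn):
    """Move the page behind `va` to `new_pfn`; the virtual address stays valid."""
    vpn = va >> PAGE_SHIFT
    old_pfn = page_table[vpn]
    memory[new_pfn] = memory.pop(old_pfn)  # copy the data over and free the old frame
    page_table[vpn] = new_pfn              # repoint the PTE


def linear_map_va(pa):
    """In the linear mapping, the virtual address is tied to the physical one."""
    return pa + LINEAR_MAP_BASE


va = (1 << 48) | (2 << 39) | (3 << 30) | (4 << 21) | (5 << 12) | 0x123
print(split_va(va))  # {'offset': 291, 'pte': 5, 'pmd': 4, 'pud': 3, 'p4d': 2, 'pgd': 1}
page_table[va >> PAGE_SHIFT] = 7
memory[7] = b"hello"
migrate(va, 9)
print(read(va), page_table[va >> PAGE_SHIFT])  # b'hello' 9
```

The same va still reads the same data after migrate(). By contrast, linear_map_va() of a new physical address is a different virtual address, which is exactly why linearly mapped pages cannot move.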
The kernel defines several migration types, but usually only three matter: MIGRATE_UNMOVABLE, MIGRATE_MOVABLE and MIGRATE_RECLAIMABLE. These three are the real migration types; the others serve special purposes and are not covered here. /proc/pagetypeinfo shows how the free pages of each migration type are distributed across the orders.
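/proc/pagetypeinfo is meant to be read by humans, but it is easy to post-process. The Python sketch below assumes the usual row format "Node N, zone Z, type T c0 c1 ...". In that format, column k is the number of free blocks of order k, and each block is 2^k pages. The function names and sample numbers are made up, and some newer kernels only let root read the file.

```python
import re

# One row per (node, zone, migration type); the trailing numbers are free
# block counts for order 0, 1, 2, ... (a block of order k is 2**k pages).
ROW = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+),\s+type\s+(\S+)\s+([\d\s]+)$")


def parse_pagetypeinfo(text):
    """Return {(node, zone, migratetype): [free blocks per order]}."""
    table = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            node, zone, mtype, counts = m.groups()
            table[(int(node), zone, mtype)] = [int(c) for c in counts.split()]
    return table


def free_pages(counts):
    """Total free pages represented by a row of per-order block counts."""
    return sum(n << order for order, n in enumerate(counts))


# Made-up sample in the /proc/pagetypeinfo layout; on a real system use
# open("/proc/pagetypeinfo").read() instead.
SAMPLE = """\
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      0      0      0      1      1      1      1      1      0      0      0
Node    0, zone   Normal, type      Movable     12      5      3      0      0      0      0      0      0      0      1
"""

for key, counts in parse_pagetypeinfo(SAMPLE).items():
    print(key, free_pages(counts))
# (0, 'DMA', 'Unmovable') 248
# (0, 'Normal', 'Movable') 1058
```

Comparing the high-order columns across migration types over time shows which type is running short of large blocks.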
The migration type that a page is allocated from is determined by the page allocation flags passed when the page is requested. For example, user-space memory is allocated with __GFP_MOVABLE. Reclaimable kernel caches, such as the dentry and inode slab caches, use __GFP_RECLAIMABLE. When free pages of one migration type run out, physical pages can be stolen from other migration types. Stealing starts from the largest available free block, to avoid fragmentation; the pageblock size is determined by pageblock_order, and the details are beyond the scope of this article. The fallback order for each of the three migration types, from highest to lowest priority, is:
MIGRATE_UNMOVABLE: MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE
MIGRATE_RECLAIMABLE: MIGRATE_UNMOVABLE, MIGRATE_MOVABLE
MIGRATE_MOVABLE: MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE
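The flag-to-type mapping and the stealing order can be modeled in a few lines of Python. This is a sketch under assumptions, not kernel code. The enum order (UNMOVABLE=0, RECLAIMABLE=1, MOVABLE=2) follows the 3.10 definitions as I understand them. The GFP_* bit values are made up for the example, and MIGRATE_RESERVE and MIGRATE_CMA are left out. The loop structure is modeled loosely on __rmqueue_fallback() in mm/page_alloc.c: it tries the largest order first, then each fallback type in priority order.

```python
# Migration types in the 3.10 enum order (MIGRATE_RESERVE/CMA omitted).
UNMOVABLE, RECLAIMABLE, MOVABLE = 0, 1, 2

# Illustrative flag bits for this sketch only; the kernel's real values differ.
GFP_MOVABLE = 1 << 0
GFP_RECLAIMABLE = 1 << 1


def gfp_to_migratetype(gfp):
    """Movable flag -> bit 1, reclaimable flag -> bit 0 (the idea behind
    the kernel's allocflags_to_migratetype())."""
    return (int(bool(gfp & GFP_MOVABLE)) << 1) | int(bool(gfp & GFP_RECLAIMABLE))


# Fallback priority, highest first, matching the three lists above.
FALLBACKS = {
    UNMOVABLE: (RECLAIMABLE, MOVABLE),
    RECLAIMABLE: (UNMOVABLE, MOVABLE),
    MOVABLE: (RECLAIMABLE, UNMOVABLE),
}


def find_fallback(free_area, start_type, order, max_order=11):
    """Pick (migratetype, order) to steal from, or None.

    free_area[mtype][k] is the number of free order-k blocks of that type.
    The outer loop goes from the largest order down, so a big block of a
    lower-priority type beats a small block of a higher-priority type.
    """
    for cur in range(max_order - 1, order - 1, -1):
        for mtype in FALLBACKS[start_type]:
            if free_area[mtype][cur] > 0:
                return mtype, cur
    return None


free_area = {t: [0] * 11 for t in (UNMOVABLE, RECLAIMABLE, MOVABLE)}
free_area[MOVABLE][3] = 1        # one free 8-page movable block
free_area[RECLAIMABLE][1] = 2    # two free 2-page reclaimable blocks
print(find_fallback(free_area, gfp_to_migratetype(0), 0))  # (2, 3)
```

In the demo, an unmovable request steals from the order-3 movable block rather than from the smaller reclaimable blocks. A larger block wins over a higher-priority type, which is the "steal from the largest block" behavior described above.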
The kernel groups pages by migration type to prevent fragmentation. Frequent stealing therefore indicates external fragmentation events, and these events store up trouble for the future. In my previous article, "Why should we ban THP", I mentioned that the kernel provides an ftrace event for analyzing external fragmentation events. The steps are as follows:
echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc_extfrag/enable
cat /sys/kernel/debug/tracing/trace_pipe > ~/extfrag.log
After collecting for a while, press Ctrl-C, then run
echo 0 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc_extfrag/enable
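to stop collecting. The event records many fields; the full list is in the event's format file, /sys/kernel/debug/tracing/events/kmem/mm_page_alloc_extfrag/format.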
To count how many external fragmentation events occurred during the collection period, we only need to look at events with fallback_order < pageblock_order (9 on x86_64).
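Counting these events in ~/extfrag.log by hand is tedious, so a small Python script helps. The sketch assumes that each event line carries key=value fields such as fallback_order=3 pageblock_order=9, as in the 3.10 trace output. Check the exact field list in the event's format file before relying on it. count_extfrag is a hypothetical helper name.

```python
import re

FIELD = re.compile(r"(\w+)=(\S+)")


def count_extfrag(lines, default_pageblock_order=9):
    """Return (total, fragmenting) for mm_page_alloc_extfrag lines in a trace log.

    An event counts as fragmenting when fallback_order < pageblock_order;
    pageblock_order is read from the line when present, else the default
    (9 on x86_64) is used.
    """
    total = fragmenting = 0
    for line in lines:
        if "mm_page_alloc_extfrag:" not in line:
            continue
        fields = dict(FIELD.findall(line))
        if "fallback_order" not in fields:
            continue  # skip malformed or truncated lines
        total += 1
        pageblock = int(fields.get("pageblock_order", default_pageblock_order))
        if int(fields["fallback_order"]) < pageblock:
            fragmenting += 1
    return total, fragmenting
```

Open the collected log and pass the file object to count_extfrag(). It returns the total number of events and how many of them fell back to a block smaller than a pageblock, that is, how many mixed migration types within a pageblock.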
As we can see, grouping by migration type only delays memory fragmentation; it does not solve it. Over time, fragmentation can become severe enough that requests for contiguous physical memory can no longer be satisfied, and performance problems follow. This feature alone is therefore not enough: the kernel also needs a way to clean up memory fragmentation.
To be continued …