Linux kernel vs memory fragmentation (1)

(External) memory fragmentation is a long-standing problem in Linux kernel programming. As the system runs, pages are allocated to various tasks, and over time memory gradually becomes fragmented. Eventually, a busy system with a long uptime may have only a few physically contiguous pages left. Because the Linux kernel supports virtual memory management, physical memory fragmentation is usually not a problem: with the help of page tables, physically scattered memory is still contiguous in the virtual address space (unless huge pages are used). However, allocating physically contiguous memory from the kernel's linear mapping area becomes very difficult. Examples include the slab allocator allocating structure objects (very common and frequent in kernel mode) and operating on DMA buffers that do not support scatter/gather mode. Such requests cause frequent direct memory reclaim and compaction, which makes system performance fluctuate widely, or they fail outright (the slow path of memory allocation performs different operations depending on the page allocation flags).

If kernel programming no longer relied on high-order physical memory allocations in the linear address space, the memory fragmentation problem would be solved at its root. For a project as huge as the Linux kernel, however, such a change is obviously impossible. So from Linux 2.x to the present, the community has kept looking for ways to alleviate memory fragmentation, including many very tricky patches. Some of the merged patches are controversial, such as the memory compaction mechanism: at [LSFMM 2014], many people complained that memory compaction was too inefficient, too slow, and had bugs that were hard to reproduce. The community did not abandon the feature, though, and kept optimizing it in subsequent kernel versions.

The most persistent contributor in this field is Mel Gorman, who wrote two important patch sets. The first was merged in Linux 2.6.24 and went through 28 versions before the community accepted it, a span measured in years (an introduction to v19 already existed in February 2005, and v28 was formally merged in October 2007). The second was merged in Linux 5.0; on 1- or 2-socket machines it reduces memory fragmentation events by 94% overall compared with the unpatched kernel.

This article focuses on the commonly used 3.10 kernel: how its buddy allocator prevents memory fragmentation from spreading, how memory compaction works, how to view the fragmentation index, and how to quantify the latency overhead caused by memory compaction.

A brief history of anti-fragmentation

Before getting to the point, I would like to summarize the efforts made throughout Linux kernel development to improve high-order memory allocation. Every article listed here is worth reading carefully, and I hope this table is convenient for readers interested in the details of anti-fragmentation.

LWN publication date    Title    Link
2004-09-08    Kswapd and high-order allocations    lwn.net/Articles/101230/
2004-05-10    Active memory defragmentation    lwn.net/Articles/105021/
2005-02-01    Yet another approach to memory fragmentation    lwn.net/Articles/121618/
2005-11-02    Fragmentation avoidance    lwn.net/Articles/158211/
2005-11-08    More on fragmentation avoidance    lwn.net/Articles/159110/
2006-11-28    Avoiding - and fixing - memory fragmentation    lwn.net/Articles/211505/
2010-01-06    Memory compaction    lwn.net/Articles/368869/
2014-03-26    Memory compaction issues    lwn.net/Articles/591998/
2015-07-14    Making kernel pages movable    lwn.net/Articles/650917/
2016-04-23    CMA and compaction    lwn.net/Articles/684611/
2016-05-10    make direct compaction more deterministic    lwn.net/Articles/686801/
2017-03-21    Proactive compaction    lwn.net/Articles/717656/
2018-10-31    Fragmentation avoidance improvements    lwn.net/Articles/770235/
2020-04-21    Proactive compaction for the kernel    lwn.net/Articles/817905/

Now let's get to the point .

The Linux buddy allocator

Linux uses the [buddy algorithm] as its page allocator, which is simple and efficient. Linux extends the classic algorithm in several ways:

  1. A per-zone buddy allocator;

  2. Per-CPU pagesets;

  3. Grouping by migration type.

We have previously described how the Linux kernel uses node, zone, and page to describe physical memory. The per-zone buddy allocator focuses on one zone of a given node. Before version 4.8, the page reclaim policy was also implemented per zone, because the early design mainly targeted 32-bit processors with a lot of high memory. However, pages in different zones of the same node aged at inconsistent speeds, which led to many problems. Over a long period the community added many tricky patches to address them, but the problem was never fundamentally solved. As 64-bit processors with large memory became more and more common in recent years, Mel Gorman moved the page reclaim policy from zone to node, which solved the problem. Keep this change in mind when using BPF to write tools that observe reclaim operations.

The per-CPU pageset optimizes single-page allocation and reduces lock contention between processors. It has nothing to do with anti-fragmentation, so this article does not cover it in detail.

Grouping by migration type is the anti-fragmentation method that we will cover in detail.
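As background for the sections that follow, here is a minimal sketch of the classic buddy algorithm itself. It is a toy model and not the kernel's implementation: the class name `BuddyAllocator`, its methods, and the use of plain sets as free lists are assumptions made for illustration. It shows the two core ideas: a larger block is split in half until it matches the requested order, and a freed block is merged with its buddy (found by flipping one bit of the page frame number) whenever that buddy is also free.

```python
class BuddyAllocator:
    """Toy buddy allocator over 2**max_order pages (illustrative only)."""

    def __init__(self, max_order):
        self.max_order = max_order
        # free[order] holds the starting page frame numbers of free blocks
        self.free = {order: set() for order in range(max_order + 1)}
        self.free[max_order].add(0)  # start with one maximal free block

    def alloc(self, order):
        """Return the first pfn of a 2**order block, or None on failure."""
        for current in range(order, self.max_order + 1):
            if not self.free[current]:
                continue
            pfn = min(self.free[current])
            self.free[current].remove(pfn)
            # Split down to the requested order, keeping the lower half
            # and putting each upper half back on its free list.
            while current > order:
                current -= 1
                self.free[current].add(pfn + (1 << current))
            return pfn
        return None

    def free_block(self, pfn, order):
        """Free a block and merge it with its buddy as far as possible."""
        while order < self.max_order:
            buddy = pfn ^ (1 << order)  # the buddy differs in exactly one bit
            if buddy not in self.free[order]:
                break
            self.free[order].remove(buddy)
            pfn = min(pfn, buddy)
            order += 1
        self.free[order].add(pfn)


if __name__ == "__main__":
    allocator = BuddyAllocator(max_order=3)  # 8 pages in total
    first = allocator.alloc(0)
    second = allocator.alloc(0)
    print(first, second, allocator.alloc(3))
```

In this model a single page that stays allocated prevents every block containing it from being merged, which is the fragmentation problem that grouping by migration type tries to contain.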

Grouping by migration type

Before learning about migration types, we first need to understand the layout of the memory address space. Each processor architecture defines its own; for x86_64 it is defined in [mm.txt]. Memory accessed through page tables in the virtual address space (for example, user-space heap memory) does not need to be physically contiguous. Why? Take the [Intel 5-level] page table as an example. From low to high bits, a virtual address is divided into the page offset, the page table (PTE) index, the page middle directory (PMD) index, the page upper directory (PUD) index, the page 4th-level directory (P4D) index, and the page global directory (PGD) index. The physical page frame number is stored in the page table entry, which is located through the page table index, and combining that page frame number with the page offset gives the physical address. To replace the physical page behind a page table entry, we only need to allocate a new page, copy the data from the old page to the new one, and update the page table entry with the new page frame number. The virtual address does not change, so such a page can be migrated freely.

For the linear mapping area, however, virtual address = physical address + constant. Changing the physical address necessarily changes the virtual address, and any code that keeps accessing the original virtual address then has a bug, so such pages are clearly unsuitable for migration. When physical pages accessed through page tables are mixed with linearly mapped pages, memory fragmentation appears easily. The kernel therefore defines several migration types according to page mobility and groups pages by migration type to resist fragmentation.
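The difference between the two kinds of mapping can be sketched in a few lines. This is a simplified model under stated assumptions: 4 KiB pages and 9 index bits per level (as on x86_64 with 5-level paging), a single dictionary standing in for the whole page table hierarchy, and a placeholder value for the linear mapping constant. The function names and `LINEAR_MAP_OFFSET` are hypothetical and do not correspond to kernel symbols.

```python
PAGE_SHIFT = 12          # 4 KiB pages
BITS_PER_LEVEL = 9       # 512 entries per table
LEVELS = ("pte", "pmd", "pud", "p4d", "pgd")  # low to high

# Placeholder constant for illustration, not taken from any specific kernel.
LINEAR_MAP_OFFSET = 0xFFFF_8800_0000_0000


def split_virtual_address(vaddr):
    """Split a virtual address into the page offset and five table indexes."""
    parts = {"offset": vaddr & ((1 << PAGE_SHIFT) - 1)}
    shift = PAGE_SHIFT
    for name in LEVELS:
        parts[name] = (vaddr >> shift) & ((1 << BITS_PER_LEVEL) - 1)
        shift += BITS_PER_LEVEL
    return parts


def translate(page_table, vaddr):
    """Toy translation: page_table maps virtual page number -> frame number."""
    pfn = page_table[vaddr >> PAGE_SHIFT]
    return (pfn << PAGE_SHIFT) | (vaddr & ((1 << PAGE_SHIFT) - 1))


def migrate(page_table, memory, vpn, new_pfn):
    """Copy the data to a new frame and repoint the page table entry."""
    old_pfn = page_table[vpn]
    memory[new_pfn] = memory.pop(old_pfn)
    page_table[vpn] = new_pfn


def linear_virt(phys):
    """Linear mapping: the virtual address is the physical one plus a constant."""
    return phys + LINEAR_MAP_OFFSET


if __name__ == "__main__":
    page_table = {0x10: 100}
    memory = {100: b"payload"}
    vaddr = (0x10 << PAGE_SHIFT) | 0x123
    print(hex(translate(page_table, vaddr)))
    migrate(page_table, memory, 0x10, 200)
    # Same virtual address, new physical address, same data.
    print(hex(translate(page_table, vaddr)), memory[200])
    # In the linear map, a different frame always means a different address.
    print(linear_virt(100 << PAGE_SHIFT) != linear_virt(200 << PAGE_SHIFT))
```

After `migrate`, the caller keeps using the same `vaddr` and reaches the same data in a different frame. No equivalent trick exists for `linear_virt`, because the physical address is the only input.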

The kernel defines multiple migration types, but we usually only need to pay attention to three: MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, and MIGRATE_RECLAIMABLE. These three are the real migration types; the others serve special purposes and are not covered here. The distribution of each migration type at each order can be viewed through /proc/pagetypeinfo.
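The output of /proc/pagetypeinfo is easier to compare over time once it is parsed. The following is a minimal sketch that extracts the per-order free block counts for each node, zone, and migration type. The `SAMPLE` text is made up for illustration and only mimics the layout of the "Free pages count per migrate type" section; the helper names are hypothetical, and the exact layout may differ between kernel versions.

```python
import re

# Matches lines such as:
# "Node    0, zone   Normal, type    Unmovable    120     80 ..."
LINE_RE = re.compile(r"^Node\s+(\d+), zone\s+(\S+), type\s+(\S+)\s+(.+)$")

# Illustrative sample, not captured from a real machine.
SAMPLE = """\
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone   Normal, type    Unmovable    120     80     40     10      4      2      1      0      0      0      0
Node    0, zone   Normal, type      Movable     30     25     20     15     10      8      6      4      2      1     50
"""


def parse_pagetypeinfo(text):
    """Return {(node, zone, migratetype): [free block count per order]}."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # headers and the "Number of blocks" section are skipped
        node, zone, migratetype, counts = match.groups()
        result[(int(node), zone, migratetype)] = [int(c) for c in counts.split()]
    return result


def free_pages(counts):
    """Total free pages: a block of a given order holds 2**order pages."""
    return sum(count << order for order, count in enumerate(counts))


if __name__ == "__main__":
    for key, counts in parse_pagetypeinfo(SAMPLE).items():
        print(key, "free pages:", free_pages(counts), "highest order count:", counts[-1])
```

On a live system the same function can be given the contents of /proc/pagetypeinfo, which may require root privileges to read. Many free pages concentrated at low orders with few or none at high orders is a rough sign of fragmentation for that migration type.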

The migration type that a page is allocated from is determined by the page allocation flags used when the page is requested. For example, user-space memory requests use __GFP_MOVABLE, and file pages use __GFP_RECLAIMABLE. When pages of one migration type run out, physical pages can be stolen from other migration types. Stealing starts from the largest page block (its size is determined by pageblock_order; the details are beyond the scope of this article) to avoid fragmentation. The fallback order for the three migration types above, from highest to lowest priority, is:

MIGRATE_UNMOVABLE: MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE

MIGRATE_RECLAIMABLE: MIGRATE_UNMOVABLE, MIGRATE_MOVABLE

MIGRATE_MOVABLE: MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE
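The fallback lists above, combined with the rule that stealing starts from the largest block, can be sketched as a small search. This is a simplified model, not the kernel's code: `find_fallback`, the `free_counts` structure, and `MAX_ORDER = 11` are assumptions for illustration, and the real allocator does additional work such as deciding whether to take over the whole page block.

```python
MAX_ORDER = 11  # orders 0..10, assumed for illustration

FALLBACKS = {
    "MIGRATE_UNMOVABLE": ["MIGRATE_RECLAIMABLE", "MIGRATE_MOVABLE"],
    "MIGRATE_RECLAIMABLE": ["MIGRATE_UNMOVABLE", "MIGRATE_MOVABLE"],
    "MIGRATE_MOVABLE": ["MIGRATE_RECLAIMABLE", "MIGRATE_UNMOVABLE"],
}


def find_fallback(free_counts, wanted, order):
    """Pick a block to steal when `wanted` has no free block of `order` or above.

    free_counts[migratetype][order] is the number of free blocks.
    Returns (migratetype, order) of the block to steal, or None.
    """
    # Largest blocks first, so that small blocks of other types stay intact.
    for current_order in range(MAX_ORDER - 1, order - 1, -1):
        # Within one order, follow the fallback priority list.
        for candidate in FALLBACKS[wanted]:
            if free_counts[candidate][current_order] > 0:
                return candidate, current_order
    return None


if __name__ == "__main__":
    free_counts = {name: [0] * MAX_ORDER for name in FALLBACKS}
    free_counts["MIGRATE_RECLAIMABLE"][3] = 5
    free_counts["MIGRATE_MOVABLE"][9] = 1
    print(find_fallback(free_counts, "MIGRATE_UNMOVABLE", 0))
```

In the usage example the movable order-9 block is chosen even though reclaimable comes first in the priority list, because the order is searched before the list. A stolen block smaller than a page block is the case that mixes migration types inside one page block.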

The kernel groups pages by migration type in order to resist fragmentation, so frequent stealing indicates that external memory fragmentation events are occurring, and these events create hidden dangers for the future. In the previous article, Why should we ban THP, I mentioned that the kernel provides an ftrace event for analyzing external memory fragmentation events. The steps are as follows:

echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc_extfrag/enable              
cat /sys/kernel/debug/tracing/trace_pipe > ~/extfrag.log

After a period of time, press Ctrl-C, then run

echo 0 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc_extfrag/enable

to stop collecting. This event contains many fields.

To count the external memory fragmentation events that occurred over a period of time, we only need to look at the events where fallback_order < pageblock_order (9 on x86_64).
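Counting those events in the collected log can be automated. The sketch below assumes each event line contains the marker `mm_page_alloc_extfrag:` followed by space-separated `key=value` fields that include `fallback_order`; the sample lines are made up for illustration, and the exact set of fields depends on the kernel version. The function names are hypothetical.

```python
import re

MARKER = "mm_page_alloc_extfrag:"
PAGEBLOCK_ORDER = 9  # value for x86_64, as stated in the text

# Illustrative lines, not taken from a real trace.
SAMPLE_LINES = [
    "  bash-1234  [002] ....  100.000001: mm_page_alloc_extfrag: pfn=4096 alloc_order=0 fallback_order=3 alloc_migratetype=0 fallback_migratetype=2",
    "  bash-1234  [002] ....  100.000002: mm_page_alloc_extfrag: pfn=8192 alloc_order=0 fallback_order=9 alloc_migratetype=0 fallback_migratetype=2",
    "  bash-1234  [002] ....  100.000003: some_other_event: order=0",
    "  sshd-4321  [001] ....  100.000004: mm_page_alloc_extfrag: pfn=512 alloc_order=1 fallback_order=0 alloc_migratetype=1 fallback_migratetype=0",
]


def parse_event(line):
    """Return the key=value fields of an extfrag event, or None for other lines."""
    if MARKER not in line:
        return None
    payload = line.split(MARKER, 1)[1]
    return dict(re.findall(r"(\w+)=(\S+)", payload))


def count_fragmentation_events(lines, pageblock_order=PAGEBLOCK_ORDER):
    """Count events whose stolen block is smaller than a page block."""
    count = 0
    for line in lines:
        event = parse_event(line)
        if event is None or "fallback_order" not in event:
            continue
        if int(event["fallback_order"]) < pageblock_order:
            count += 1
    return count


if __name__ == "__main__":
    print(count_fragmentation_events(SAMPLE_LINES))
```

To process the log collected earlier, pass an open file object for ~/extfrag.log instead of `SAMPLE_LINES`; iterating over the file yields one line at a time, so large logs do not need to fit in memory.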

As we can see, grouping by migration type only delays memory fragmentation and is not a fundamental solution. Over time, when memory is too fragmented to satisfy requests for contiguous physical memory, performance problems follow. Relying on this feature alone is therefore not enough, and the kernel also needs a way to clean up memory fragmentation.

To be continued …

Copyright notice
This article was written by [osc_ p1q9onsn]. Please include a link to the original when reposting. Thank you.
https://cdmana.com/2020/12/20201225111824266E.html

Scroll to Top