
Notes on C++ compilation (III): static linking


Compilation produces object files in .o format. Every .c or .cpp file generates one .o, that is, one compilation unit corresponds to one .o. To produce an executable, the compilation units have to be combined, and they refer to one another in various ways, for example by accessing variables or calling functions defined in another unit. Put simply, linking is the process of organizing the contents of the .o files into one executable. Static linking is called static because it is completed ahead of time, when the program is built. Its counterpart is dynamic linking, which a later article will cover.

Linkers today generally perform static linking with a method called two-pass linking, which splits the work into two phases: space and address allocation, followed by symbol resolution and relocation. The previous article covered the contents of an object file. This one looks at how the linker organizes object files into an executable.

The walkthrough below follows a simple example from the book 《程序员的自我修养》 (The Self-Cultivation of Programmers) to show how the linker performs static linking. There are two C files, that is, two compilation units:

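/* a.c */
extern int shared;
/* Defined in b.c. Declared here because C99 and later do not allow
   implicit function declarations. */
void swap(int *a, int *b);

int main() {
    int a = 100;
    swap(&a, &shared);
    return 0;
}

/* b.c */
int shared = 1;

void swap(int *a, int *b) {
    /* XOR swap kept exactly as in the book. It modifies *a twice without
       sequencing, which is undefined behavior in standard C, so treat it
       as illustration only. */
    *a ^= *b ^= *a ^= *b;
}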

Running gcc -c a.c b.c produces a.o and b.o, and ld a.o b.o -e main -o ab links them into the executable ab.

1. Space and address allocation

Merging several object files into one naturally starts with space and address allocation. Each object file has its own sections, and their contents must first be stored in one file and later loaded into memory for execution. Two kinds of allocation are therefore involved:
(1) Space and address allocation inside the executable file, that is, how the contents of each object file are organized and stored in the file.
(2) Allocation of the virtual address space used at load time, that is, how each part is laid out in memory at run time. Modern mainstream operating systems manage each program's memory through a virtual address space.

(1) is fairly straightforward: sections of the same kind from each object file are placed together and combined into one file:
[Figure: sections of the same kind from each object file merged into one output file]
(2) is more involved. A running program is loaded according to its virtual addresses, and code locates the functions and variables it uses by address. Once everything is merged into a new file, the address of every symbol naturally changes. So the linker must first determine the load address of each merged section, and second work out the address of every symbol after merging, so that the symbol resolution and relocation that follow can be done correctly. This second kind of allocation is what we mainly mean by space and address allocation.

1.1 Merging similar sections and determining the load address

We can use the objdump command to view the address allocation before and after linking:
[Figures: objdump section listings for a.o, b.o and ab]
As the listings show, the size of .text after linking is exactly the sum of the .text sizes of a.o and b.o: 0x2c + 0x4b = 0x77. Every section in the two .o files has a VMA (Virtual Memory Address) of 0, meaning no virtual address space has been allocated yet, whereas every section in the executable ab except .comment has been assigned a virtual address for loading. .comment is not loaded into memory, so it needs none. Note that the virtual address space does not start at 0: a 32-bit program usually starts at 0x08048000 and a 64-bit program at 0x400000, so the 0x401000 in the listing above is no surprise.
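The size arithmetic above can be sketched in a few lines of C. This is only an illustration of the idea, not how ld is implemented: the struct and function names are made up for this note, and alignment padding between inputs is ignored (the example happens not to need any).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One input section contributed by one object file (hypothetical type). */
struct input_section {
    const char *object;   /* e.g. "a.o" */
    uint64_t size;        /* section size in bytes */
    uint64_t out_offset;  /* filled in: offset inside the merged section */
};

/*
 * Lay the inputs out back to back, in command-line order, and return the
 * size of the merged output section. Alignment padding is ignored here.
 */
uint64_t merge_sections(struct input_section *in, size_t count)
{
    assert(in != NULL || count == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        in[i].out_offset = total;  /* this input starts where the last ended */
        total += in[i].size;
    }
    return total;
}
```

Feeding it the two .text sizes from the listings (0x2c for a.o, then 0x4b for b.o) gives a merged size of 0x77, with b.o's code starting at offset 0x2c inside the merged section.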

Here is a summary figure from the book. The book compiles for 32-bit, so its addresses differ slightly.
[Figure: summary diagram from the book]

1.2 Determining symbol addresses

Once the start address of each section in the output executable is fixed, the virtual address of every symbol can be determined too. A symbol's offset within its section does not change, so its virtual address is the section's start address plus the symbol's offset. The order inside a merged section follows the order of the object files on the ld command line: ld a.o b.o -e main -o ab places main before swap in the final code section. The example defines three global symbols of our own, and the calculation gives:

Symbol   Type                        Virtual address
main     function in .text section   0x401000 + 0x0 (offset in a.o's .text) = 0x401000
swap     function in .text section   0x401000 + 0x2c (size of main, which precedes it) + 0x0 (offset in b.o's .text) = 0x40102c
shared   variable in .data section   0x404000 + 0x0 (offset in b.o's .data) = 0x404000

We can check the result with the nm command:
[Figure: nm output for ab]
In this way we obtain the address of every symbol in all the object files and store them in a global symbol table for use in symbol resolution and relocation.
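As a small worked version of the address calculation, here is a sketch in C. The function name and parameter names are hypothetical, chosen only for this note; the formula itself is the one used in the table: section start, plus where the defining object's code or data landed inside the merged section, plus the symbol's offset in its own input section.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Final address of a symbol =
 *   load address of the merged output section
 * + offset of the defining object's section inside the merged section
 * + offset of the symbol inside its own input section.
 */
uint64_t symbol_address(uint64_t section_vma,
                        uint64_t input_offset,
                        uint64_t symbol_offset)
{
    uint64_t addr = section_vma + input_offset + symbol_offset;
    assert(addr >= section_vma);  /* catch wrap-around from bad inputs */
    return addr;
}
```

With the values from the example, symbol_address(0x401000, 0x2c, 0x0) yields 0x40102c for swap, and symbol_address(0x404000, 0x0, 0x0) yields 0x404000 for shared.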

2. Symbol resolution and relocation

With space and address allocation done, the linker moves on to relocation, the core of static linking. Relocation replaces the addresses that could not be determined while each compilation unit was compiled on its own with the real target addresses: the linker takes the address of the required symbol from the global symbol table and patches the reference according to its addressing mode. Doing so requires symbol resolution, that is, finding a symbol's virtual address from its name.

2.1 Symbol resolution

Symbol resolution can be understood as finding the target address of a symbol that is being used. In the example, a.o uses the external symbols shared and swap, and during relocation each reference must be replaced with the real virtual address. The linker finds a symbol by looking it up in the global symbol table, which is built from the symbol tables of all input object files.

For example, nm shows the two undefined symbols in a.o (nm marks them with U):
[Figure: nm output for a.o]
These two symbols are undefined because a.o uses external symbols, and each of them has relocation entries. During relocation the linker must be able to find them in the global symbol table. Otherwise it reports the familiar undefined reference error.
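The lookup step can be pictured with a minimal sketch in C. The type and function names are hypothetical, and a linear scan stands in for whatever data structure a real linker uses (typically a hash table); the point is only that a name either maps to an address or is reported as undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry of the global symbol table (hypothetical type). */
struct global_symbol {
    const char *name;
    uint64_t address;  /* final virtual address after allocation */
};

/* Return the entry for `name`, or NULL if no input object defines it. */
const struct global_symbol *resolve_symbol(const struct global_symbol *table,
                                           size_t count, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    }
    return NULL;  /* a real linker would report an undefined reference */
}
```

With a table holding main, swap and shared at the addresses computed earlier, resolving "swap" returns the entry with address 0x40102c, while a name that no object file defines returns NULL.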

2.2 Relocation

Let's first look at the code section of a.o. In the disassembly, the two instructions that need relocation are circled in red. They correspond to "pass the address of shared as the second argument" and "call the function swap". The target address on the right of each is all zeros and obviously needs to be corrected.
[Figure: disassembly of a.o with the two instructions to relocate circled in red]
In ab, both have been patched to the correct addresses.
[Figure: disassembly of ab]
The bytes are stored in little-endian order, so the operand for shared reads as 0x00404000, which matches the calculation above. The call instruction is a relative call whose displacement is measured from the address of the next instruction. The stored bytes 07 00 00 00 are the displacement 0x00000007, so the call target is the address of the mov that follows plus the displacement: 0x00401025 + 0x00000007 = 0x40102c, which also matches the address calculated above.
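To make the byte-order and relative-call arithmetic concrete, here is a short sketch in C. The function names are invented for this note. It decodes a 4-byte little-endian operand and then computes a call target as "address of the next instruction plus displacement", which is the rule described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode four little-endian bytes into a signed 32-bit value. */
int32_t read_le32(const uint8_t *bytes)
{
    assert(bytes != NULL);
    uint32_t v = (uint32_t)bytes[0]
               | ((uint32_t)bytes[1] << 8)
               | ((uint32_t)bytes[2] << 16)
               | ((uint32_t)bytes[3] << 24);
    return (int32_t)v;
}

/* Target of a relative call: next instruction address plus displacement. */
uint64_t call_target(uint64_t next_insn_addr, const uint8_t *disp_bytes)
{
    /* Sign-extend so that backward calls (negative displacements) work. */
    return next_insn_addr + (uint64_t)(int64_t)read_le32(disp_bytes);
}
```

Decoding the bytes 00 40 40 00 gives 0x00404000 for shared, and call_target(0x401025, ...) with the bytes 07 00 00 00 gives 0x40102c for swap.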

That raises a question: how does the linker know which instructions in the code section need relocation? This is where the relocation table comes in, which was mentioned in the earlier article on the contents of ELF files. The relocation table of a.o is as follows:
[Figure: relocation table of a.o]
The code section has two places that need relocation, exactly the two circled by hand above. Take shared as an example. In plain language, the entry says that offset 0x14 of .text uses the address of shared and needs relocation. Its relocation type is R_X86_64_32, an absolute type: the location is simply patched with the absolute address. swap uses R_X86_64_PLT32, which is patched as a relative address. Older toolchains produce R_X86_64_PC32 instead. We won't go into more detail here.
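The two patching rules can be sketched as follows. This is a simplified illustration, not ld's code. In the x86-64 psABI, R_X86_64_32 is computed as S + A and R_X86_64_PC32 as S + A - P, where S is the symbol's address, A the addend and P the address of the patched bytes. R_X86_64_PLT32 is defined as L + A - P, which for a direct call in a static link like this one comes out the same as S + A - P. The enum names below are made up. The addends (0 for shared, -4 for swap) and the patch offset 0x21 for the call are assumptions: they are the usual values for this code and are consistent with the disassembly discussed above, but they are not read from the relocation table shown in the article.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified relocation kinds; illustrative names, not the ELF constants. */
enum reloc_kind { RELOC_ABS32, RELOC_PC32 };

/*
 * Compute the 32-bit value to patch into the code.
 *   S = final address of the symbol
 *   A = addend stored in the relocation entry
 *   P = final address of the bytes being patched
 */
uint32_t reloc_value(enum reloc_kind kind, uint64_t S, int64_t A, uint64_t P)
{
    assert(kind == RELOC_ABS32 || kind == RELOC_PC32);
    uint64_t value = S + (uint64_t)A;  /* absolute: S + A */
    if (kind == RELOC_PC32)
        value -= P;                    /* relative: S + A - P */
    return (uint32_t)value;            /* keep the low 32 bits */
}
```

For shared, reloc_value(RELOC_ABS32, 0x404000, 0, 0x401014) gives 0x00404000. For swap, reloc_value(RELOC_PC32, 0x40102c, -4, 0x401021) gives 0x00000007, the displacement seen in the patched call. The addend of -4 compensates for the displacement being measured from the end of the 4-byte operand rather than from its start.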

3. Summary

This article has outlined the core of static linking. Many more details are involved, and interested readers can find them in the relevant chapters of 《程序员的自我修养》.

References
1. 《程序员的自我修养：链接、装载与库》 (The Self-Cultivation of Programmers: Linking, Loading and Libraries)
2. https://stackoverflow.com/questions/18296276/base-address-of-elf

Copyright notice
This article was written by [wxj1992]. Please include a link to the original when reposting. Thank you.
https://cdmana.com/2022/134/202205141334531675.html
