Compilation produces object files in .o format. Every .c or .cpp file generates one .o, so each compilation unit corresponds to one object file. To build an executable, the compilation units have to work together, because they reference each other in various ways, for example by accessing variables or calling functions defined in another unit. Put simply, linking is the process of organizing the contents of the .o files into an executable. Static linking is called static because it is done ahead of time, at build time; its counterpart is dynamic linking, which a later article will cover. Linkers today generally perform static linking with a method called two-pass linking, which divides the whole process into two phases: space and address allocation, and symbol resolution and relocation. The previous article covered the contents of an object file; this article looks at how the linker organizes object files into an executable.
The walkthrough uses a simple example from the book 《Self-Cultivation of Programmers》 to show how the linker performs static linking. There are two C files, that is, two compilation units:
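/* a.c */
extern int shared;

int main() {
    int a = 100;
    /* swap is defined in b.c; as in the book, it is not declared here */
    swap(&a, &shared);
    return 0;
}

/* b.c */
int shared = 1;

void swap(int *a, int *b) {
    /* XOR swap written as a single expression, as in the book */
    *a ^= *b ^= *a ^= *b;
}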
Running gcc -c a.c b.c produces a.o and b.o; ld a.o b.o -e main -o ab then links them into the executable ab.
1. Space and address allocation
Merging several object files naturally starts with allocating space and addresses. Each object file has its own segments; their contents must first be stored in a single file, and they must also be loaded into memory at execution time. So two kinds of space and address allocation are involved:
(1) Space and address allocation within the executable file, that is, how the contents of each object file are organized and stored in the file.
(2) Allocation of the virtual address space used at load time, that is, how each part is laid out in memory at run time. Modern mainstream operating systems manage the memory of each program through a virtual address space.
Case (1) is relatively simple and direct: the similar segments of each object file are stored together and combined into one file.
Case (2) is more complicated. At run time every part of the program is loaded at its virtual address, and the code finds the functions and variables it uses by address. Once everything is merged into one new file, the address of each symbol naturally changes. So we must first determine the load address of each merged segment, and second know the address of each symbol after merging; only then can the subsequent symbol resolution and relocation be carried out correctly. The space and address allocation this article focuses on is mainly this part.
1.1 Merging similar segments and determining the load address
The objdump command can be used to view the address allocation before and after linking.
The output shows that the size of the .text segment after linking is exactly the sum of the sizes of the corresponding segments in a.o and b.o: 0x2c + 0x4b = 0x77. The VMA (Virtual Memory Address) of every segment in the two .o files is 0, meaning no virtual address space has been allocated yet, whereas in the executable ab every segment except .comment has been assigned a virtual address for loading; .comment is not loaded into memory, so it does not need one. Note that the virtual address space does not start at 0: 32-bit programs usually start at 0x08048000 and 64-bit programs at 0x400000, so the address 0x401000 in the output is not surprising.
The book summarizes this step in a figure. Its example was compiled as a 32-bit program, so the addresses there are slightly different.
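The size arithmetic above can be sketched in C. This is only an illustration under stated assumptions, not how ld is implemented: the names `input_section` and `merge_sections` are hypothetical, alignment padding between inputs is ignored, and the sizes are the ones quoted in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one input section (illustration only). */
struct input_section {
    const char *file;     /* object file the section comes from */
    uint64_t    size;     /* section size in bytes */
    uint64_t    out_off;  /* filled in: offset inside the merged section */
};

/*
 * Lay the input sections out back to back, in command-line order,
 * and return the size of the merged output section.
 * Simplification: no alignment padding is inserted between inputs.
 */
uint64_t merge_sections(struct input_section *in, size_t n)
{
    uint64_t offset = 0;

    assert(in != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        in[i].out_off = offset;   /* this input starts where the previous one ended */
        offset += in[i].size;
    }
    return offset;
}
```

Called with the two .text inputs {"a.o", 0x2c} and {"b.o", 0x4b}, the function returns 0x77 and places the code from b.o at offset 0x2c of the merged segment.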
1.2 Determining symbol addresses
Once the start address of each segment in the output executable is known, the virtual address of every symbol can be determined as well. The offset of a symbol within its segment does not change, so the virtual address of a symbol is the start address of the segment plus the symbol's offset. The order inside a segment follows the order of the object files on the ld command line: with ld a.o b.o -e main -o ab, main ends up before swap in the final code segment. The example defines three global symbols, and the calculation gives:
| Symbol | Type | Virtual address |
|---|---|---|
| main | Function in the .text segment | 0x401000 + 0x0 (offset within the code segment of a.o) = 0x401000 |
| swap | Function in the .text segment | 0x401000 + 0x2c (size of main, which precedes it) + 0x0 (offset within the code segment of b.o) = 0x40102c |
| shared | Variable in the .data segment | 0x404000 + 0x0 (offset within the data segment of b.o) = 0x404000 |
The nm command can be used to verify these results.
In this way we obtain the address of every symbol in all the object files and save them in a global symbol table, which is then used for symbol resolution and relocation.
2. Symbol resolution and relocation
Once space and address allocation is complete, the linker performs relocation, which is the core of static linking. Relocation means replacing addresses that could not be determined while each compilation unit was compiled independently with the real target addresses: the linker gets the address of the required symbol from the global symbol table and patches the reference according to its addressing mode. Along the way it needs symbol resolution, that is, obtaining the virtual address of a symbol from its name.
2.1 Symbol resolution
Symbol resolution can be understood as finding the target address of a symbol that is being used. In the example, a.o uses the external symbols shared and swap, and during relocation these references must be replaced with the real target virtual addresses. The linker finds each symbol by looking it up in the global symbol table, which is built from the symbol tables of all the input object files.
For example, the nm command shows that a.o has two undefined symbols, shared and swap.
They are undefined because a.o uses the corresponding external symbols, and each has relocation entries. During relocation the linker must be able to find them in the global symbol table; otherwise it reports the all-too-familiar undefined symbol error.
2.2 Relocation
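A lookup of this kind can be sketched in C as follows. This is a minimal illustration with hypothetical names (`global_symbol`, `resolve_symbol`); it uses a linear scan for clarity, which is an assumption made for the example and not a claim about how ld stores its table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry of the global symbol table (illustration only). */
struct global_symbol {
    const char *name;  /* symbol name, e.g. "swap" */
    uint64_t    addr;  /* virtual address assigned during address allocation */
};

/*
 * Look a symbol up by name.
 * Returns the matching entry, or NULL when the symbol is undefined,
 * which is the case where a linker would report an undefined symbol.
 */
const struct global_symbol *resolve_symbol(const struct global_symbol *table,
                                           size_t n, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    }
    return NULL;
}
```

With a table holding main, swap and shared at the addresses computed earlier, resolving "swap" yields 0x40102c, while a name that no object file defines yields NULL.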
Let's first look at the code segment of a.o. The disassembly shows two instructions that need to be relocated: one passes the address of shared as the second argument, and the other calls the swap function. The target address in both instructions is 0, which obviously has to be corrected.
In ab, both have been corrected to the right addresses.
Because the byte order is little-endian, the operand for shared reads as 0x00404000, which matches the calculation above. The call instruction is a relative call whose displacement is measured from the instruction that follows it, so the target is the address of the following mov plus the displacement 0x00000007 (stored as the bytes 07 00 00 00): 0x00401025 + 0x00000007 = 0x40102c, which also matches the address calculated above.
This raises a question: how does the linker know which instructions in the code segment need to be relocated? It uses the relocation table mentioned in the previous article on the contents of ELF files, and a.o has one for its code segment.
The table contains two entries for the code segment, exactly the two places identified above. Take shared as an example. In plain language, the entry says that offset 0x14 of .text uses the address of shared and needs to be relocated. The relocation type is R_X86_64_32, which is absolute addressing: the field is simply replaced with the absolute address. swap uses R_X86_64_PLT32, which is corrected by relative addressing; older versions of gcc emit R_X86_64_PC32 instead. We will not go into more detail here.
3. Summary
This article has summarized the core of static linking. Many more details are involved in practice; interested readers can consult the relevant chapters of 《Self-Cultivation of Programmers》.
References
1. 《Self-Cultivation of Programmers: Linking, Loading and Libraries》
2. https://stackoverflow.com/questions/18296276/base-address-of-elf
Copyright notice
This article was written by [wxj1992]. Please include a link to the original when reposting. Thank you.
https://cdmana.com/2022/134/202205141334531675.html