
Notes on C++ compilation (II): Linux ELF file analysis


The file a compiler produces from source code is called an object file, i.e. a .o file. It contains executable machine code and data. Structurally, an object file is almost the same as an executable; the main difference is that it has not been linked yet, so some symbols and addresses have not been adjusted. This article takes one object file on Linux as a running example to show what such a file actually contains, and describes ELF (Executable and Linkable Format), the format it shares with executables.

1. ELF Format Overview

The executable file format on Linux is ELF, a variant of COFF (Common Object File Format, used by early UNIX-like systems). Despite the name, the format is not limited to executables: object files are stored in it too. In fact, four kinds of files on Linux follow the ELF format:

| ELF file type | Description | Example |
| --- | --- | --- |
| Relocatable file (Relocatable File) | Contains code and data and can be linked into an executable or a shared object file. Static libraries fall into this category. | .o object files on Linux |
| Executable file (Executable File) | Contains a program that can be run directly. ELF executables on Linux usually have no file extension. | /bin/bash |
| Shared object file (Shared Object File) | Contains code and data. The linker can link it with other object files and shared object files to produce a new object file. The dynamic linker can combine it with an executable and other shared object files and run them as part of a process image. | .so files, such as /lib/glibc-2.5.so |
| Core dump file (Core Dump File) | When a process terminates unexpectedly, the system can dump the contents of its address space, plus some other information from the moment of termination, into a core dump file. | core dump files |

An experienced C/C++ developer on Linux should be familiar with these file types. In addition, as mentioned above, a .a static library can be thought of as an archive of .o files, so it also counts as ELF.

Whatever the specific format, we already know that an object file holds what is needed to run the program, so it obviously has to contain the instructions and the data they operate on. Beyond these basics, it also carries information needed at link time, such as the symbol table, debugging information, and strings. Object files generally sort this information by attribute and type and store it in sections (section or segment), such as the code section and the data section.

The figure below uses a simple example to show the layout of an ELF object file after compilation (some sections are omitted):
[Figure: layout of a compiled ELF object file]
The file begins with a file header that describes the properties of the ELF file, including whether it is executable, whether it is big-endian or little-endian, and the target hardware architecture. The file header looks like this:
[Figure: ELF file header of the example file]
The file header also points to a section table, which holds the information about each section. The section table looks like this:
[Figure: section table of the example file]
There are 13 sections here; the objdump -h command used later omits some non-critical auxiliary sections.

The file header is followed by the contents of each section. The first section is reserved by the system, and applications can create custom sections under non-reserved names. In the figure, .text is the code section, which holds the compiled machine code; .data is the data section, which holds initialized global variables and local static variables; and .bss holds uninitialized global variables and local static variables. The somewhat confusing name bss exists mainly for historical reasons.

One more remark: a static variable here means a variable whose lifetime is the whole run of the program. In C/C++, all global variables and all local variables declared with static are of this kind. Whether a global variable is declared with the static keyword only affects its visibility: a static global is visible only within its own compilation unit, and each compilation unit can have its own static global with the same name. Do not confuse static variables with the static keyword.

2. Common sections and their uses

Besides the code, data, and .bss sections mentioned above, here is a summary table from the book 《程序员的自我修养》 (Programmer's Self-Cultivation) of common ELF sections and their uses:

| Section name | Content |
| --- | --- |
| .text | Compiled machine code |
| .rodata | Read-only data, typically read-only static variables and string constants |
| .data | Initialized global and local static variables |
| .bss | Global and local static variables that are uninitialized or initialized to 0 |
| .rodata1 | Also a read-only data section, holding string constants and global const variables; similar to .rodata |
| .comment | Compiler version information, such as "GCC: (GNU) 4.2.0" |
| .debug | Debugging information |
| .dynamic | Dynamic linking information |
| .hash | Symbol hash table |
| .line | Line number table for debugging, i.e. the mapping between source line numbers and compiled instructions |
| .note | Additional compiler information, such as the program's company name and release version number |
| .strtab | String table (String Table), which stores the various strings used in the ELF file |
| .symtab | Symbol table (Symbol Table), where the symbols in the file can be looked up |
| .shstrtab | Section name table, in effect an array of strings made up of the section names |
| .plt and .got | Jump table and global offset table for dynamic linking |
| .init and .fini | Program initialization and termination code |

3. Parsing the contents of an object file

Let's use a modified version of an example from 《程序员的自我修养》 to look at each section of an object file in practice:

int printf(const char* format, ...);

int global_init_var = 1;
int global_uninit_var;

void func1(int i) {
    printf("%d\n", i);
}

int main(void) {
    static int static_init_var = 2;
    static int static_uninit_var;
    static const int static_const_init_var = 3;
    static const int static_const_uninit_var;
    const int const_init_var = 4;
    int init_var = 5;
    int uninit_var;
    func1(static_init_var + static_uninit_var + static_const_uninit_var + static_const_uninit_var + init_var + uninit_var);
    return init_var;
}

Compiling with gcc -c test.c produces test.o. Viewing its sections with objdump -h gives:
[Figure: section list of test.o]
There are 6 sections, with ids 0 to 5 (non-critical sections are omitted). The command objdump -s -d test.o prints the contents of each section in hexadecimal:
[Figure: hexadecimal contents of each section of test.o]
Section 3 (.bss) has no actual content, so it does not appear in the dump. The sections from index 4 onward serve auxiliary purposes and are not discussed here. So let's look at the four sections 0, 1, 2, and 3: the code section .text, the read-only data section .rodata, the data section .data, and .bss.

3.1 Code section .text

The code section holds nothing but machine code. The objdump -s -d test.o command also prints the disassembled assembly code, as follows:
[Figure: disassembly of the .text section]
The assembly statements are not explained further here; you can see that the content corresponds to the two functions we wrote.

3.2 Read-only data section .rodata

As the name suggests, .rodata holds read-only data. It is similar to .data, except that it stores read-only static constants:
[Figure: contents of the .rodata section]
There are two read-only items. The bytes 25 64 0a 00 are the ASCII codes of "%d\n" followed by the terminating \0. The bytes 03 00 00 00 correspond to static const int static_const_init_var = 3; because of byte order (the platform is little-endian), the bytes of the integer appear in the reverse of the order we usually write them.

Only static variables and constants have to be laid out in a data section ahead of time. That is why const int const_init_var is not stored in a data section: its value is written directly into an instruction and the variable is allocated on the stack at run time, as the assembly code of the .text section shows.

3.3 Data section .data

The data section stores initialized global variables and local static variables. The bytes 01 00 00 00 and 02 00 00 00 correspond to int global_init_var = 1; and static int static_init_var = 2; respectively:
[Figure: contents of the .data section]

3.4 The .bss section

The .bss section (Block Started by Symbol) holds uninitialized global variables and local static variables. It is really just a placeholder: no actual content is stored. In other words, .bss reserves space for these variables without taking up space in the ELF file; the space is only occupied once the program is loaded into memory. In the example above, .bss appears in the section list, but the dump shows no content for it. One special case to note: variables initialized to 0 may also be treated by the compiler as uninitialized and placed in .bss to save space.

To summarize:

  • Uninitialized global/static data
  • “Block Started by Symbol”
  • “Better Save Space”
  • Has section header but occupies no space

The name is less intuitive than those of the other sections; readers who want to dig deeper can consult [Reference 4] and [Reference 5].

3.5 Relocation table (Relocation Table) sections .rela.xxx

The relocation table is used for relocation during the link phase. Because each compilation unit is compiled independently, the addresses of many variables and functions cannot be determined at that point and have to be fixed up at link time; the later notes on static linking explain this process in detail. For now, let's look at the structure of the relocation table. Every section that needs relocation has a corresponding relocation table section; for example, .text has .rela.text. They can be viewed with objdump -r. The example program has two relocation table sections, one for .text and one for .eh_frame.
[Figure: relocation tables of test.o shown by objdump -r]
Take the call to printf, the second row of the .text relocation table, as an example. The row says that the location at OFFSET 1b needs to be relocated in the later link phase. Offset 1b in .text is the address operand of the call instruction that targets printf; in other words, the address of printf needs to be relocated. The relocation type is R_X86_64_PC32, a PC-relative relocation type, which the notes on static linking cover later.

3.6 String tables .strtab and .shstrtab

An ELF file uses many strings, such as section names and variable names. They are usually stored in two sections, .strtab and .shstrtab, which are the string table (string table) and the section header string table (section header string table). The former holds ordinary strings, such as symbol names; the latter holds the strings used by the section table, such as section names. Because the strings vary in length, they are stored back to back, separated by \0, and each one is retrieved by its offset.

We can print them with the readelf command:
[Figures: readelf dumps of the two string tables]
Here is an ASCII table for comparison:
[Figure: ASCII table]
Take the string .strtab as an example: its content is the bytes 2e7374 72746162, with a \0 before and after it, so an offset of 9 is enough to retrieve it.

3.7 Symbol table .symtab

Linking several object files together essentially means merging their contents and making sure that variable accesses and function calls across them work correctly at run time, that is, that every access to an internal or external function or variable finds the right address. In linking, functions and variables are collectively called symbols, and function names and variable names are symbol names. The core of the whole linking process is determining the correct address for each symbol. Every object file has a symbol table (Symbol Table, the .symtab section) that records all the symbols used in the object file. Note that this means every symbol used, internal or external. Each defined symbol has a corresponding value, called the symbol value; for variables and functions, it is their address.
[Figure: symbol table of test.o]
Symbols come in different types. In the figure above:
(1) func1 and main have type T: they are in the .text section and globally visible.
(2) global_init_var has type D: it is globally visible and in the .data section.
(3) global_uninit_var has type C: it is globally visible and in a common block.
(4) printf has type U: it is undefined, meaning the symbol is defined outside this compilation unit.
(5) static_const_init_var has type r: it is in the .rodata section.
(6) static_const_uninit_var and static_uninit_var have type b: they are in the .bss section.
(7) static_init_var has type d: it is locally visible and in the .data section. The case of the letter indicates visibility: uppercase for global, lowercase for local.

[References]
1. 《程序员的自我修养：链接、装载与库》 (Programmer's Self-Cultivation: Linking, Loading and Libraries)
2. https://stackoverflow.com/questions/64626917/global-variables-and-the-data-section
3. https://stackoverflow.com/questions/1856599/when-to-use-static-keyword-before-global-variables
4. https://www.cnblogs.com/idorax/p/6400210.html
5. https://en.wikipedia.org/wiki/.bss

Copyright notice
This article was written by [wxj1992]. Please include a link to the original when reposting. Thank you.
https://cdmana.com/2022/134/202205141334531756.html
