A Deep Dive into the iOS Static Linker (Part 1): ld64

Preface

Static linking is an important step in building a program. It takes the outputs of the compiler and other modules (.o, .a, .dylib), performs symbol resolution, relocation, and aggregation, and assembles an executable for the runtime loader and the dynamic linker to execute. It is the bridge between compilation and execution.

For iOS projects, static linking is currently handled mainly by ld64. Apple has given ld64 some extra capabilities to suit iOS builds, for example:

  • In Xcode today, your project links successfully even if you do not explicitly manage the system dynamic libraries it depends on (such as UIKit)
  • It provides a switch to "force-load ObjC classes and categories from static libraries" (on by default) so that ObjC information is not lost from the output

Many features are also implemented at the static-linking step.

With component binarization, custom build systems, and other optimizations, incremental builds in large projects have become significantly faster. Static linking, however, must still run on every build and "contributes" most of the time. Understanding how ld64 works helps us deepen our understanding of the build process, find ways to speed up linking, and explore further possibilities for quality and experience optimization.

Contents

  • Historical Background
  • Background Concepts
  • ld64 Command-Line Arguments
  • ld64 Execution Flow
  • ld64 on iOS
  • Other

1. Historical Background

  • GNU ld: GNU ld, also called the GNU linker, is the GNU Project's implementation of the Unix ld command. It is part of GNU binutils and comes in two versions: the traditional BFD-based linker and the ELF-only gold. (gold was developed by a team at Google and added to GNU binutils in 2008. With Google now focusing on LLVM's lld, gold is barely maintained.) The name ld is said to come from "LoaDer" or "Link eDitor".
  • ld64: ld64 is the ld that Apple redesigned for Darwin. Its biggest difference from ld is that ld64 is atom-based rather than section-based (atoms are introduced later). On macOS, running ld (/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld) invokes ld64 by default. The version shipped with the system and Xcode can be queried with ld -version_details, for example 650.9. Apple publishes the ld64 source at https://opensource.apple.com/tarballs/ld64/, but the releases always lag behind the shipping version (as of 2021.8 the latest open-source release is 609, while Xcode 12.5.1 ships 650.9). zld and other ld64-based projects are forked from the open-source ld64.

2. Background Concepts

Before walking through how ld64 executes, a few concepts need to be understood.

2.1 Input: .o, .a, .dylib

ld64 mainly handles Mach-O inputs (the format used on the Mach kernel), including:
  • Object file (.o)
    • Generated by the compiler. Contains metadata (header, load commands, etc.), segments & sections (code, data, etc.), and the symbol table & relocation entries.
    • Object files may depend on each other (for example, A references a function defined in B). What the static linker essentially does is connect this information and output one combined, valid Mach-O.
  • Static library (.a)
    • Can be regarded as a collection of .o files, which lets project code be organized and reused in a modular way.
    • Its header also stores a symbol name -> .o offset mapping table so that the linker can quickly look up which .o a symbol belongs to.
    • A static library may contain multiple architectures (universal / fat Mach-O); the static linker selects the target architecture as needed. Tools such as lipo can show its architecture information.
  • Dynamic library (.dylib, .tbd)
    • Unlike a static library, a dynamic library is loaded by dyld at run time through processes such as rebase and binding. At link time, the static linker queries the symbols exported by each dynamic library in the input list only when it handles an undefined symbol.
    • Most of the dynamic libraries used in iOS projects are system libraries (UIKit etc.). A project can also provide its own dynamic libraries, for example as frameworks (the corresponding rpath must be specified so that dyld can load the custom library).
    • .tbd (text-based dylib stub) is a file format, introduced by Apple after Xcode 7, that describes a dylib: the supported architectures, the exported symbols, and so on. By parsing the .tbd, ld64 can quickly learn which symbols the dylib provides for linking and which other dynamic libraries it depends on, without parsing the whole dylib. Most system dylibs are currently provided this way.
    • For example, Foundation:
--- !tapi-tbd
tbd-version:     4
targets:         [ i386-ios-simulator, x86_64-ios-simulator, arm64-ios-simulator ]
uuids:
  - target:          i386-ios-simulator
    value:           A4A5325F-E813-3493-BAC8-76379097756A
  - target:          x86_64-ios-simulator
    value:           C2A18288-4AA2-3189-A1C6-5963E370DE4C
  - target:          arm64-ios-simulator
    value:           81DE1BE5-83FA-310A-9FB3-CF39C14CA977
install-name:    '/System/Library/Frameworks/Foundation.framework/Foundation'
current-version: 1775.118.101
compatibility-version: 300
reexported-libraries:
  - targets:         [ i386-ios-simulator, x86_64-ios-simulator, arm64-ios-simulator ]
    libraries:       [ '/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation',
                       '/usr/lib/libobjc.A.dylib' ]
exports:
  - targets:         [ arm64-ios-simulator, x86_64-ios-simulator, i386-ios-simulator ]
    symbols:         [ '$ld$hide$os10.0$_OBJC_CLASS_$_NSURLSessionStreamTask', '$ld$hide$os10.0$_OBJC_CLASS_$_NSURLSessionTaskMetrics',
                        ....
                       _NSLog, _NSLogPageSize, _NSLogv, _NSMachErrorDomain, _NSMallocZone, 
                       ....]

2.2 Symbol & Symbol Table

For the static linker, a symbol is the basic element that a Mach-O provides and that linking refers to.

A Mach-O has a dedicated area that stores all of its symbols: the symbol table.

Global functions, global variables, classes, and so on are each placed in the symbol table as an entry.

A symbol has the following properties:

  • Name: the exact generation rules are decided by the compiler, for example the C variable _someGlobalVar, the C function _someGlobalFunction, the ObjC class _OBJC_CLASS_$_SomeClass, or the ObjC method -[SomeClass foo]. Different compilers use different name-mangling strategies.
  • "Definition" or "reference": corresponds to the definition of, or a reference to, a function or variable.
  • Visibility: a definition also has a visibility that controls whether other files can see it (see "Visibility" below).
  • Strong / weak: a definition is also strong or weak, which controls the merge strategy when multiple definitions exist (see "Strong / Weak definition" below).

For the exact data structure of a Mach-O symbol table entry, see the documentation (https://github.com/aidansteele/osx-abi-macho-file-format-reference#nlist_64) or the source code (https://opensource.apple.com/source/xnu/xnu-4570.71.2/EXTERNAL_HEADERS/mach-o/nlist.h.auto.html).

2.3 Visibility

Mach-O divides symbols into three groups:

  • global / defined external symbol: a symbol definition that is available externally
  • local symbol: a symbol defined and referenced in this file and usable only in this file (for example, one marked static)
  • undefined external symbol: a reference to a symbol that depends on an external definition

Attribute | Description | Example
global / defined external symbol | Defined in this file, visible externally | int i = 1;
local symbol | Defined in this file, not visible externally | static int i = 1;
undefined external symbol | Reference to an external definition | extern int i;

The LC_DYSYMTAB load command of a Mach-O gives the offset and size of each of the three groups.

Visibility determines whether a symbol definition is visible to other files at link time. As described above, local symbols are invisible to other files and global symbols are visible.

Global symbols fall into two categories: normal and private external. If a symbol is private external (the N_PEXT field in Mach-O), the static linker converts it to a local symbol in the output. In other words, the symbol definition is visible only during this link; if the output is later linked again, the symbol is no longer visible externally (hence "private").

Whether a symbol is private external can be marked in the source code at compile time with __attribute__((visibility("xxx"))). The possible values are default (normal) and hidden (private external):

  • Without __attribute__((visibility("xxx"))), the visibility is default
    • -fvisibility changes this default (supported by both gcc and clang)
  • With __attribute__((visibility("xxx"))), the visibility is xxx

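For example:

// test.c

__attribute__((visibility("default"))) int i1Default = 101;
__attribute__((visibility("hidden"))) int i1Hidden = 102;
int i1Normal = 103;

Without -fvisibility: by the rules above, i1Default and i1Normal are normal global symbols, and i1Hidden is private external.

With -fvisibility=hidden: i1Default stays a normal global symbol, while i1Hidden and i1Normal are both private external.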

2.4 Strong / Weak definition

Symbol definitions are also either strong or weak. When the static linker finds multiple symbol definitions with the same name, it applies the following merge strategy based on their strong / weak type:

  1. Multiple strong definitions => illegal input, abort
  2. Exactly one strong definition => use that strong definition
  3. Multiple weak definitions and no strong one => use the first weak definition
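The three rules above can be sketched as a small function. This is only an illustration, not ld64's code: the Definition type, the Strength enum, and pickDefinition are hypothetical names, and the candidates are assumed to be listed in the order the linker loaded them.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified model of a symbol definition (not ld64's real types).
enum class Strength { Strong, Weak };

struct Definition {
    std::string file;   // object file that provides the definition
    Strength strength;  // strong or weak
};

// Picks the winning definition among same-named candidates given in load order.
// Throws std::runtime_error when more than one strong definition exists (rule 1).
const Definition& pickDefinition(const std::vector<Definition>& candidates) {
    assert(!candidates.empty());
    const Definition* strong = nullptr;
    for (const Definition& d : candidates) {
        if (d.strength != Strength::Strong) continue;
        if (strong != nullptr)
            throw std::runtime_error("duplicate symbol in " + strong->file + " and " + d.file);
        strong = &d;
    }
    // Rule 2: a single strong definition wins. Rule 3: otherwise the first weak one.
    return strong != nullptr ? *strong : candidates.front();
}
```

Calling pickDefinition with a weak definition from main.o followed by a strong one from strong_foo.o selects the strong one, regardless of the order in which the two were loaded.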

Symbol definitions are strong by default. A definition can be marked weak in the source code with __attribute__((weak)) or #pragma weak. For example:

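// main.c
#include <stdio.h>

// Weak definition: used only if no strong definition of foo is linked in.
void __attribute__((weak)) foo() {
  printf("weak foo called");
}

int main(int argc, char * argv[]) {
  foo();
  return 0;
}

// strong_foo.c
#include <stdio.h>

// Strong definition: overrides the weak foo at link time.
void foo() {
  printf("strong foo called");
}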

In the generated main.o, the symbol table entry for this function is marked N_WEAK_DEF, which is how the static linker tells strong from weak.

Output after execution:

strong foo called

Note that deciding which symbol definition ends up in the final output depends on the actual situation. For example, if a strong symbol is packaged in a static library that the static linker never loads, and a weak symbol with the same name has already been loaded, then strategy (2) above effectively becomes strategy (3). (The loading mechanism for symbols in static libraries is described below.)

2.5 Tentative definitions / Commons

A symbol definition may be a tentative definition (also called a common definition). This is actually very common, for example:

int i;

Such an uninitialized global variable is a tentative definition.

A more official definition is:

A declaration of an identifier for an object that has file scope without an initializer, and without a storage-class specifier or with the storage-class specifier static, constitutes a tentative definition.

The wording is convoluted, so do not get lost in it. It is enough to think of a tentative definition as an "uninitialized global variable definition". More examples help:

int i1 = 1; // regular definition, global symbol
static int i2 = 2; // regular definition, local symbol
extern int i3 = 3; // regular definition, global symbol
int i4; // tentative definition, global symbol
static int i5; // tentative definition, local symbol

int i1; // valid tentative definition, refers to line 1
int i2; // invalid tentative definition, visibility conflicts with the static on line 2
int i3; // valid tentative definition, refers to line 3
int i4; // valid tentative definition, refers to line 4
int i5; // invalid tentative definition, visibility conflicts with the static on line 5

Tentative definitions are placed in the __DATA,__common section of the Mach-O.

2.6 Relocation (Entries)

The compiler cannot determine the address of every symbol at compile time (for example, in a call to an external function), so it leaves a blank at the corresponding position in the Mach-O and generates a matching relocation entry. At link time, the static linker uses the relocation entries to learn which locations in each section need to be relocated, and how.

The LC_SEGMENT_64 load command describes the number and offset of the relocation entries for each section.

Mach-O uses relocation_info to represent one relocation entry:

  • r_address: the offset from the start of the section to the content that needs to be relocated
  • r_extern & r_symbolnum
    • r_extern == 1: read the information from the r_symbolnum-th symbol in the symbol table
    • r_extern == 0: read the information from the r_symbolnum-th section
  • r_type: the relocation type, for example X86_64_RELOC_BRANCH means that the relocated content belongs to a CALL/JMP instruction

For field details, see https://github.com/aidansteele/osx-abi-macho-file-format-reference#relocation_info.
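To make the fields concrete, here is a minimal sketch that mirrors the layout of relocation_info from <mach-o/reloc.h> and turns one entry into a readable description. The RelocationInfo struct is redeclared locally so that the sketch compiles on any platform, and describe is a hypothetical helper, not part of any Apple API. Scattered relocations are deliberately not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Local mirror of `struct relocation_info` (8 bytes per entry).
struct RelocationInfo {
    int32_t  r_address;         // offset from the start of the section to the bytes to patch
    uint32_t r_symbolnum : 24;  // symbol index if r_extern == 1, otherwise section ordinal
    uint32_t r_pcrel     : 1;   // 1 if the patched value is PC-relative
    uint32_t r_length    : 2;   // log2 of the width: 0 = 1, 1 = 2, 2 = 4, 3 = 8 bytes
    uint32_t r_extern    : 1;   // selects how r_symbolnum is interpreted
    uint32_t r_type      : 4;   // architecture-specific type, e.g. X86_64_RELOC_BRANCH (2)
};

// Hypothetical helper: explains what a single relocation entry asks the linker to do.
std::string describe(const RelocationInfo& r) {
    // A negative r_address means the high bit is set, which marks a scattered entry.
    assert(r.r_address >= 0 && "scattered relocations are not handled in this sketch");
    const unsigned width = 1u << static_cast<unsigned>(r.r_length);
    const unsigned index = static_cast<unsigned>(r.r_symbolnum);
    const std::string target = r.r_extern
        ? "symbol table entry #" + std::to_string(index)
        : "section #" + std::to_string(index);
    return "patch " + std::to_string(width) + " bytes at section offset " +
           std::to_string(r.r_address) + " using " + target;
}
```

For an entry with r_address = 21, r_length = 2, r_extern = 1, and r_symbolnum = 3, describe reports that 4 bytes at section offset 21 are patched using symbol table entry #3.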

2.7 ld64: Atom & Fixup

ld64 is an atom-based linker: the atom is its basic unit of processing. An atom can represent a symbol, and it can also represent other information, such as SectionBoundaryAtom. When parsing, ld64 abstracts the input files into various atoms and hands them to the Resolver for unified processing.

Compared with a section-based linker, an atom-based linker treats its input as an atom graph. The finer granularity makes it easier to apply graph algorithms and to implement features more directly.

An atom has the following properties:

  • name: corresponds to the symbol name described above
  • content
    • The content of a function is the machine instructions that implement it
    • The content of a global variable is its initial value
  • scope: corresponds to the symbol visibility described above
  • definition kind: one of four kinds, obtained from N_TYPE in the Mach-O symbol table entry
    • regular: most atoms are of this kind
    • absolute: corresponds to N_ABS; ld64 does not modify its value
    • tentative: N_UNDF; corresponds to the tentative definition described above
    • proxy: if ld64 finds during parsing that a symbol is provided by a dynamic library, it creates a proxy atom as a placeholder

An atom may have a group of fixups. As the name suggests, a fixup is a data structure that describes how to patch the atom's content at link time. The relocation entries of an object file provide the initial fixup information, and ld64 may generate additional fixups for an atom during execution.

Fixups describe the dependencies between atoms; they are the edges of the atom graph. Dead code stripping relies on these dependencies to decide which atoms are unneeded and can be removed.

A fixup has the following properties:

  • kind: the fixup type; there are dozens, such as kindStoreX86PCRel32
  • offset: corresponds to the relocation's offset
  • addend: corresponds to the relocation's addend
  • target atom: the atom being pointed to
  • binding type: the binding strategy (by-name, by-content, direct, indirect)

Type | Implementation | Description
direct | Records a pointer to the target atom | Usually generated for references within the same object file to anonymous, immutable target atoms, such as a call to a static function in the same object file
by-name | Records a pointer to the target atom's name (C string) | References a global symbol, such as a call to printf
indirect | Records a pointer to an index in the atom indirect table | Not provided by input files; generated only by the linker during the link phase, and usable once atoms have been merged

A simple example:

// Foo.h
extern const int someGlobalVar;

int someGlobalFunction(void);


// Foo.m
const int someGlobalVar = 100;

int someGlobalFunction() {
  return 123;
}


// main.m
#import "Foo.h"

int main(int argc, char * argv[]) {
  int i = someGlobalVar;
  someGlobalFunction();
}

In the code above, main.m uses the global variable someGlobalVar and the function someGlobalFunction declared in Foo.h. The compiler-generated main.o and Foo.o contain the following symbols:

[Figure: symbol tables of main.o and Foo.o]

At link time, ld64 converts them into the following atom graph:

[Figure: atom graph built from main.o and Foo.o]

The node information (atoms) comes from the symbol tables of main.o and Foo.o; the edge information (fixups) comes from the relocation entries of main.o.

When ObjC is involved, the reference relationships become more complex. The later section "The origin of -ObjC" covers this in detail.

2.8 ld64: Symbol Table

ld64 maintains an internal SymbolTable object that contains all processed symbols and provides interfaces for fast lookup.

Adding an atom to the SymbolTable triggers merging, of which there are two main kinds:

  1. by-name: atoms with the same name can be merged into one, as described in Strong / Weak & Tentative Definition above
  2. by-content: atoms with the same content can be merged into one, such as string constants

The core data structure of SymbolTable is _indirectBindingTable, which is simply an array of atoms. Each atom is appended to this array in the order it is parsed (unless it is merged).

SymbolTable also maintains several mappings to support looking up an atom by name, content, or references.

class SymbolTable : public ld::IndirectBindingTable
{
private:

// core vector
std::vector<const ld::Atom*>&        _indirectBindingTable;

// for by-name query
NameToSlot                           _byNameTable;

// for by-content query
ContentToSlot                        _literal4Table;
ContentToSlot                        _literal8Table;
ContentToSlot                        _literal16Table;
UTF16StringToSlot                    _utf16Table;
CStringToSlot                        _cstringTable;

// for by-reference query
ReferencesToSlot                     _nonLazyPointerTable;
ReferencesToSlot                     _threadPointerTable;
ReferencesToSlot                     _cfStringTable;
ReferencesToSlot                     _objc2ClassRefTable;
ReferencesToSlot                     _pointerToCStringTable;
};

During the Resolve phase, ld64 performs merging, handles undefined symbols, and does its other work on top of this SymbolTable.
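The value of the slot indirection is easier to see in a miniature. The sketch below is a hypothetical model, not ld64's implementation: MiniSymbolTable, Atom, and atomForSlot are illustrative names, only by-name merging is modeled, and duplicate strong definitions are silently ignored instead of being reported.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative atom: just a name and a strong/weak flag.
struct Atom {
    std::string name;
    bool weak;
};

class MiniSymbolTable {
public:
    using Slot = std::size_t;

    // Adds an atom by name and returns its slot. A later strong atom replaces an
    // earlier weak one in place, so every reference holding the slot stays valid.
    Slot add(const Atom* atom) {
        assert(atom != nullptr);
        auto it = byName_.find(atom->name);
        if (it == byName_.end()) {
            slots_.push_back(atom);
            byName_.emplace(atom->name, slots_.size() - 1);
            return slots_.size() - 1;
        }
        const Atom*& current = slots_[it->second];
        if (current->weak && !atom->weak)
            current = atom;  // by-name merge: the strong definition wins
        return it->second;
    }

    const Atom* atomForSlot(Slot slot) const { return slots_.at(slot); }

private:
    std::vector<const Atom*> slots_;                // plays the role of _indirectBindingTable
    std::unordered_map<std::string, Slot> byName_;  // plays the role of _byNameTable
};
```

Because references store a slot rather than an atom pointer, swapping the atom held in a slot redirects every reference at once. This is the reason a fixup can be bound "by slot" before the final winner of a name is known.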

3. ld64 Command-Line Arguments

iOS projects generally do not invoke ld64 directly. You can find the clang command for the linking step in the Xcode build log, copy it to a terminal, and add -v to print the ld command that clang invokes.

The ld64 command takes the form:

ld files...  [options] [-o outputfile]

For a simple project, the ld64 arguments look roughly like this:

ld -filelist xxx -framework Foundation -lobjc -o yyy

where:

  • -o specifies the output path
  • input files can be supplied in several ways
    • Passed directly as command-line arguments
    • Passed as a file via -filelist; the file lists the input files separated by newlines
    • Found via search paths
      • -lxxx tells ld64 to look in the lib search paths for libxxx.a or libxxx.dylib
        • The default lib search paths are /usr/lib and /usr/local/lib
        • -Lpath/to/your/lib adds an extra lib search path
      • -framework xxx tells ld64 to look in the framework search paths for xxx.framework/xxx
        • The default framework search paths are /Library/Frameworks and /System/Library/Frameworks
        • -Fpath/to/your/framework adds an extra framework search path
    • If -syslibroot /path/to/search is specified, the lib and framework search paths are prefixed with /path/to/search (for the iOS simulator, this is typically a path of the form /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.5.sdk)
  • other options
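As a rough sketch of how -lxxx, -L, and -syslibroot combine, the function below lists the file paths a linker could probe for a library name. It is a simplification under stated assumptions, not ld64's actual lookup: candidatePaths is a hypothetical name, directories from -L are used as given and tried before the defaults, the -syslibroot prefix is applied only to the two default directories, and within each directory the .dylib is tried before the .a. The real linker also probes .tbd stubs and has further rules for combining -L with -syslibroot.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the paths to probe for `-l<name>`, in order.
std::vector<std::string> candidatePaths(const std::string& name,
                                        const std::vector<std::string>& extraDirs,
                                        const std::string& sysLibRoot = "") {
    assert(!name.empty());
    std::vector<std::string> dirs = extraDirs;      // from -L, searched first
    dirs.push_back(sysLibRoot + "/usr/lib");        // default search paths,
    dirs.push_back(sysLibRoot + "/usr/local/lib");  // prefixed by -syslibroot
    std::vector<std::string> result;
    for (const std::string& dir : dirs) {
        const std::string base = dir + "/lib" + name;
        result.push_back(base + ".dylib");
        result.push_back(base + ".a");
    }
    return result;
}
```

For example, candidatePaths("objc", {"/opt/mylibs"}, "/sdk") starts with /opt/mylibs/libobjc.dylib and ends with /sdk/usr/local/lib/libobjc.a; the first path that exists on disk would be the one used.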

4. ld64 Execution Flow

From a top-level view, ld64 takes a set of input files and options and outputs an executable. (Note: ld64 also supports other output types such as dylib; the executable is used as the example below.)

The execution logic can be divided into five major phases:

  1. Command line processing
  2. Parsing input files
  3. Resolving
  4. Passes/Optimizations
  5. Generate output file

Command Line Processing

The first step is parsing the command-line arguments. This is straightforward: the argument strings are modeled as an in-memory Options object so that later logic can read them easily.

This step mainly does two things:

  1. Converts every input given on the command line into input file paths. The different ways of specifying input files described above (-filelist, the various search paths, and so on) are resolved into the absolute paths of the actual input files
  2. Stores the other command-line arguments (such as -dead_strip) in the corresponding fields of Options

For the implementation, see the Options constructor in Options.cpp:

// create object to track command line arguments
Options options(argc, argv);

Parsing input files

The second step is parsing the input files. ld64 iterates over the input file paths resolved in the first step, reads each file from the file system, and parses and converts its contents into atoms, fixups, sections, and so on for the Resolver to use later.

ld::tool::InputFiles inputFiles(options);

As mentioned above, input files are mainly of three types: .o, .a, and .dylib. When parsing each type, ld64 calls the parser for that type (for .o, mach_o::relocatable::parse) and returns the corresponding ld::File subclass (for .o, ld::relocatable::File), which resembles the factory pattern.

Parsing .o

A .o is the direct source from which ld64 obtains section and atom information, so it has to be scanned in depth.

mach_o::relocatable::parse

1. Read the header and load commands

  • LC_SEGMENT_64 provides information about each section (location, size, relocation location, number of relocation entries, etc.)
  • LC_SYMTAB provides symbol table information (location, size, number of entries)
  • LC_DYSYMTAB provides per-group statistics for the symbol table
    • number of local symbols (defined in this file, not visible externally)
    • number of global / defined external symbols (defined in this file and visible externally)
    • number of undefined external symbols (defined externally)
  • LC_LINKER_OPTION
    • The load command that Mach-O uses to carry linker options; the linker reads these options as a supplement
    • Features such as auto-linking rely on this load command (it injects arguments like -framework UIKit)
  • Other information, such as LC_BUILD_VERSION

2. Sort sections and symbols by address, because their order in the Mach-O may be arbitrary

3. makeSections: create the Section array from LC_SEGMENT_64 and store it in _sectionsArray

4. Process __compact_unwind and __eh_frame

5. Create _atomsArray: iterate over _sectionsArray and add each section's atoms to _atomsArray

6. makeFixups: create the fixups

    • Iterate over _sectionsArray and read each section's relocation entries
    • Convert them to FixupInAtom
    • Store them in _allFixups (vector<FixupInAtom>)

For the .o parsing logic, see ld::relocatable::File* Parser<A>::parse.

Parsing .a

At first, only the .a symbol table is processed (it stores symbol name -> .o offset entries, covering only the global symbols of each .o); the .o files inside do not all have to be parsed one by one. When the Resolver resolves an undefined symbol, it looks in the .a symbol table and loads the corresponding .o on demand.

archive::Parser<A>::parse

  1. Read the header to verify that the file is a .a
  2. Read the .a symbol table header to get the number of symbol table entries
  3. Store the symbol table mapping in _hashTable

Parsing .dylib / .tbd

mach_o::dylib::parse

1. Read the header and load commands (similar to .o)

  • LC_SEGMENT_64, LC_SYMTAB, LC_DYSYMTAB, etc. are similar to .o
  • LC_DYLD_INFO and LC_DYLD_INFO_ONLY provide dynamic loader info
    • rebase info
    • binding info
    • weak binding info
    • lazy binding info
    • export info
  • Other information, such as LC_RPATH and LC_VERSION_MIN_IPHONEOS

2. Build the atoms from the information provided by LC_DYLD_INFO, LC_DYLD_INFO_ONLY, and LC_DYLD_EXPORTS_TRIE and store them in _atoms

A later query about whether a dylib exports a given symbol is essentially a lookup in _atoms.

When handling a .tbd, the key is to obtain two pieces of information:

  1. Which exported symbols it provides (such as Foundation's _NSLog)
  2. Which other dynamic libraries it depends on (such as Foundation depending on CoreFoundation & libobjc)

ld64 uses TAPI (https://opensource.apple.com/source/tapi/tapi-1.30/Readme.md) to parse .tbd files. Once parsing is done (under the hood it just calls a YAML parsing library), the interface tapi::LinkerInterfaceFile gives direct access to the structured information.

Fat file

ld64 supports parsing fat (multi-architecture) Mach-O files.

The logic that extracts the target architecture is in InputFiles::makeFile.
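Selecting a slice from a fat file amounts to scanning a small big-endian table at the start of the file. The sketch below is not the ld64 code: findSlice, Slice, and readBE32 are hypothetical names, only the 32-bit fat format (magic 0xcafebabe) is handled, and the CPU subtype is ignored. The layout follows <mach-o/fat.h>: a fat_header {magic, nfat_arch} followed by nfat_arch entries of fat_arch {cputype, cpusubtype, offset, size, align}.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a big-endian 32-bit value; fat headers are always stored big-endian.
static uint32_t readBE32(const std::vector<uint8_t>& bytes, std::size_t offset) {
    assert(offset + 4 <= bytes.size());
    return (uint32_t(bytes[offset]) << 24) | (uint32_t(bytes[offset + 1]) << 16) |
           (uint32_t(bytes[offset + 2]) << 8) | uint32_t(bytes[offset + 3]);
}

struct Slice {
    uint32_t offset;  // file offset of the thin Mach-O
    uint32_t size;    // size of the thin Mach-O in bytes
    bool found;
};

// Hypothetical helper: locates the slice whose cputype equals `cpuType`.
Slice findSlice(const std::vector<uint8_t>& file, uint32_t cpuType) {
    const uint32_t kFatMagic = 0xcafebabe;
    if (file.size() < 8 || readBE32(file, 0) != kFatMagic) return {0, 0, false};
    const uint32_t count = readBE32(file, 4);
    for (uint32_t i = 0; i < count; ++i) {
        const std::size_t entry = 8 + std::size_t(i) * 20;  // each fat_arch is 20 bytes
        if (entry + 20 > file.size()) break;                // truncated table
        if (readBE32(file, entry) == cpuType)
            return {readBE32(file, entry + 8), readBE32(file, entry + 12), true};
    }
    return {0, 0, false};
}
```

With CPU_TYPE_ARM64 (0x0100000C) as the target, the returned offset and size delimit the arm64 Mach-O inside the fat file, which can then be handed to the regular .o, .a, or .dylib parser.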


pthread multithreading

  • Because parsing different input files is independent, ld64 uses pthread to implement a worker pool that processes input files concurrently (the number of workers equals the number of logical CPU cores)
  • For the pthread logic, see the InputFiles::InputFiles constructor

Resolving

The third step calls the Resolver to gather all atoms from the input files into an atom graph and process it. This is the core module of linking.

There is a lot of logic here, so we only pick out the core flow.

1. buildAtomList

This step extracts all initial atoms from the parsed input files and adds them to the global SymbolTable.

Iterate over inputFiles and parse

  • Check whether the input file was already parsed in the InputFiles::InputFiles phase
    • Already parsed: go to the next step
    • Not yet parsed: try to start a pthread worker to process the input file (same logic as "Parsing input files" above) and wait with pthread_cond_wait

Load the atoms of a .o

During parsing, ld64 has already abstracted _atoms from the object file's symbol table and relocation entries, so this step processes them one by one.

Resolver::doAtom handles a single atom as follows:

  1. SymbolTable::add (only for global symbols and undefined external symbols; local symbols are not handled)
    • If the name has not appeared before, append the atom to _indirectBindingTable (defined in "Background Concepts: Symbol Table")
    • If the name has appeared before, apply the conflict-resolution strategies for symbol definitions (strong / weak, etc.)
    • Update the auxiliary mapping tables NameToSlot, ContentToSlot, and ReferencesToSlot accordingly
  2. Iterate over the atom's fixups and try to turn by-name / by-content references into by-slot references (pointing directly at the corresponding atom in _indirectBindingTable)

Load the atoms of a .a

In theory, the buildAtomList phase does not need to handle static libraries at all, because the symbols in a static library can only be queried later, when undefined symbols are resolved. In two cases, however, the .o files in a static library have to be expanded in this step:

  1. If the .a is affected by -all_load or -force_load, all of its .o files are force-loaded
  2. If -ObjC is enabled for ld64, every .o that contains an ObjC class or category is force-loaded (the symbol name contains _OBJC_CLASS_ or .objc_c)

The load process is the same as the parse & load atoms process for object files described above.

The static library's File object also maintains a MemberToStateMap that records the load state of each .o.

Load the atoms of a .dylib

The buildAtomList phase does not add the atoms of dynamic libraries, but it does perform some extra processing and checks, including the bitcode bundle (__LLVM, __bundle), Swift framework dependency checks, and Swift version checks.

2. resolveUndefines

By now the SymbolTable has collected most of the atoms from the input files. The next step is to associate references to unknown symbols with the corresponding symbol definitions.

  1. Iterate over the undefined symbols in SymbolTable (symbols that are referenced but have no atom entity for their definition)

  2. For each undefined symbol, try to find it in the static libraries and dynamic libraries
    • Static library: as described earlier, a static library maintains a symbol name -> .o offset mapping, so checking whether a symbol definition belongs to the static library only requires a lookup in this mapping. If it is found, ld64 parses the corresponding .o and adds its atoms to SymbolTable (for the .o loading logic, see Parsing input files and buildAtomList above)
    • Dynamic library: if the symbol matches an exported symbol of a dynamic library, ld64 creates a proxy atom for the undefined atom to represent the reference into the dynamic library

  3. If the symbol is found in neither the static libraries nor the dynamic libraries, check whether it is a boundary atom such as section$ or segment$, and if so manually create the corresponding symbol definition

  4. Handle tentative symbols

  5. If -undefined is not error (this command-line argument controls whether an error is reported when an undefined symbol is found), or the symbol is matched by -U (this argument suppresses the error for specific undefined symbols), ld64 manually creates an UndefinedProxyAtom as its symbol definition

Searching static and dynamic libraries may introduce new undefined symbols, so after one pass ld64 checks the condition and iterates again as needed.
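The need to iterate is easiest to see with static libraries: loading one archive member can add new undefined symbols that only another member satisfies. The sketch below models just that part under simplifying assumptions. ObjectFile, Archive, and resolve are hypothetical names, an archive is reduced to a map from symbol name to the member that defines it, and dynamic libraries, boundary atoms, and tentative symbols are left out.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: an object file is just the names it defines and references.
struct ObjectFile {
    std::string name;
    std::vector<std::string> defines;
    std::vector<std::string> references;
};

// Stands in for the archive's symbol name -> .o offset table.
using Archive = std::map<std::string, ObjectFile>;

// Returns the names of the loaded object files, in load order.
std::vector<std::string> resolve(const std::vector<ObjectFile>& inputs, const Archive& archive) {
    std::set<std::string> defined, undefined, loadedNames;
    std::vector<std::string> loaded;
    auto load = [&](const ObjectFile& o) -> bool {
        assert(!o.name.empty());
        if (!loadedNames.insert(o.name).second) return false;  // already loaded
        loaded.push_back(o.name);
        for (const auto& d : o.defines) { defined.insert(d); undefined.erase(d); }
        for (const auto& r : o.references)
            if (!defined.count(r)) undefined.insert(r);
        return true;
    };
    for (const auto& o : inputs) load(o);

    bool progress = true;
    while (progress) {  // repeat: a newly loaded member may add undefined symbols
        progress = false;
        const std::set<std::string> snapshot = undefined;
        for (const auto& sym : snapshot) {
            if (!undefined.count(sym)) continue;  // satisfied earlier in this pass
            auto it = archive.find(sym);
            if (it == archive.end()) continue;    // the archive does not provide it
            if (load(it->second)) progress = true;
        }
    }
    return loaded;
}
```

If main.o references _foo, and the archive member foo.o defines _foo but references _bar, the first pass loads foo.o and the second pass loads bar.o. A member that nothing references is never loaded.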

3. deadStripOptimize

Next comes the logic that runs when -dead_strip is enabled. All atoms and the references between them are already recorded in SymbolTable, so the atoms can be abstracted into an atom graph and the unreferenced, useless atoms removed.

  1. Initialize the root atoms
    • The entry point atom (such as _main)
    • All atoms matched by -u (force-load a symbol, even from a static library), -exported_symbols_list, or -exported_symbol (output as a global symbol)
    • The stub atoms related to dyld
    • All atoms marked dont-dead-strip (the atom's section is marked S_ATTR_NO_DEAD_STRIP in the .o)
  2. Starting from the root atoms, traverse the atom graph through fixups and mark every atom that can be reached as live
  3. Remove the dead atoms
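The traversal described above is a standard reachability (mark) pass over a graph. A minimal sketch, assuming atoms are identified by index and fixups[i] lists the atoms that atom i references; markLive is a hypothetical name and not ld64's function:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Marks every atom reachable from the roots. fixups[i] holds the indices of the
// atoms referenced by atom i, i.e. the outgoing edges of the atom graph.
std::vector<bool> markLive(const std::vector<std::vector<std::size_t>>& fixups,
                           const std::vector<std::size_t>& roots) {
    std::vector<bool> live(fixups.size(), false);
    std::vector<std::size_t> worklist(roots.begin(), roots.end());
    while (!worklist.empty()) {
        const std::size_t atom = worklist.back();
        worklist.pop_back();
        assert(atom < fixups.size());
        if (live[atom]) continue;  // already visited
        live[atom] = true;
        for (std::size_t target : fixups[atom])
            if (!live[target]) worklist.push_back(target);
    }
    return live;
}
```

Atoms whose entry in the result is false are the dead atoms that the final step removes. Note that an atom referenced only by other dead atoms is dead as well, which is why a reachability pass is used rather than a simple reference count.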

4. removeCoalescedAwayAtoms

Iterate over the atoms and remove every atom that has been merged away.

(For symbol merging, see "Background Concepts: Symbol".)

5. fillInInternalState

Iterate over the atoms and store them grouped by the section they belong to.

Passes/Optimizations

At this point we have the complete, associated information needed to write the output (sections & their atoms). Before the output is written, several rounds of "passes" are run. Each pass corresponds to the code that implements one specific feature, such as:

  • ld::passes::objc
  • ld::passes::stubs
  • ld::passes::dylibs
  • ld::passes::dedup::doPass
  • ...

Passes run sequentially, and the order of certain passes is enforced to guarantee that the output is correct.

Projects can also adjust the passes to their actual needs.

Generate Output files

The last step is to output output files.ld64 The output of includes the main output Files and other auxiliary outputs such as link map、dependency info etc. .

Before the actual write, ld64 performs a few more operations, including:

  • ...
  • synthesizeDebugNotes
  • buildSymbolTable
  • generateLinkEditInfo
  • buildChainedFixupInfo
  • ...

Among these, buildSymbolTable builds the symbol table of the output file. As noted in the Symbol part of 「Concept bedding」, every symbol has a visibility during linking that controls whether other files can see it at link time. Likewise, in the Mach-O produced by the link these symbols belong to a new file, so ld64 readjusts their visibility according to several rules:

  1. Symbols marked private extern, as mentioned above, are converted to local symbols in this step
  2. ld64 also provides options that control this behavior, such as -reexport-lx, -reexport_library, and -reexport_framework (the global symbols of the specified library stay global in the output) and -hidden-lx (the symbols of the specified library become hidden in the output)

Once all of the above is finished, ld64 writes the FinalSections to the output file. The general logic is:

  • Allocate a block of memory and maintain a pointer to the current write offset
  • Traverse the FinalSection array
    • Traverse its atoms
      • If the atom is a proxy atom created for a dynamic library, skip it (it takes no space in the output file)
      • Write the atom content at the current offset
      • Traverse the fixups (applyFixUps) and, according to each fixup's type, patch the atom content at the corresponding position
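
The write loop can be sketched as follows. This is an illustrative model only: the `Atom`, `Fixup`, and `FinalSection` structs are hypothetical, and the single fixup kind assumed here (store the target atom's 64-bit address in host byte order) stands in for the many fixup types ld64 actually handles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixup: write the address of another atom at an offset in this atom.
struct Fixup {
    std::size_t offsetInAtom;
    std::size_t targetSection;
    std::size_t targetAtom;
};

struct Atom {
    std::vector<std::uint8_t> content;
    std::vector<Fixup> fixups;
    bool isDylibProxy;      // proxy atoms take no space in the output
    std::uint64_t address;  // assigned during layout
};

struct FinalSection {
    std::vector<Atom> atoms;
};

std::vector<std::uint8_t> writeOutput(std::vector<FinalSection>& sections,
                                      std::uint64_t baseAddress) {
    // Layout: give every non-proxy atom an address so fixups can refer to it.
    std::uint64_t size = 0;
    for (FinalSection& section : sections)
        for (Atom& atom : section.atoms) {
            if (atom.isDylibProxy) continue;
            atom.address = baseAddress + size;
            size += atom.content.size();
        }

    std::vector<std::uint8_t> buffer(size);
    std::size_t writeOffset = 0;  // current write position
    for (const FinalSection& section : sections)
        for (const Atom& atom : section.atoms) {
            if (atom.isDylibProxy) continue;  // skipped: nothing to write
            if (!atom.content.empty())
                std::memcpy(buffer.data() + writeOffset, atom.content.data(),
                            atom.content.size());
            for (const Fixup& fixup : atom.fixups) {
                std::uint64_t target =
                    sections[fixup.targetSection].atoms[fixup.targetAtom].address;
                assert(fixup.offsetInAtom + sizeof(target) <= atom.content.size());
                std::memcpy(buffer.data() + writeOffset + fixup.offsetInAtom,
                            &target, sizeof(target));
            }
            writeOffset += atom.content.size();
        }
    return buffer;
}
```

The separate layout step is a simplification chosen for this sketch: a fixup can only be applied once the target's final address is known, so addresses are assigned before any content is copied.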


Five 、 ld64 on iOS

Auto Linking

Auto linking is a mechanism that lets the linker work correctly even when library dependencies are not passed explicitly through -l, -framework, and similar options.

For example:

  • A source file declares a dependency with #import <AppKit/AppKit.h>
  • -framework AppKit is not specified at link time
  • The LC_LINKER_OPTION load command of the compiled .o carries -framework AppKit

Or:

  • A source file declares #import <zlib.h>
  • /usr/include/module.modulemap contains:
module zlib [system] [extern_c] {
 header "zlib.h"
 export *
 link "z"
}
  • -lz is not specified at link time
  • The LC_LINKER_OPTION load command of the compiled .o carries -lz

How it works: when the compiler produces a .o, it analyzes the imports and writes the frameworks they depend on into LC_LINKER_OPTION load commands in the resulting Mach-O (each stores the corresponding -framework XXX information).

Note that enabling Clang modules (-fmodules) turns auto linking on automatically. It can be turned off explicitly with -fno-autolink.
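
From the linker's side, auto linking can be pictured as merging the options recorded in each object file into the list given on the command line. The sketch below is a simplified model, not ld64's implementation: `LinkerOption`, `ObjectFile`, and `effectiveLibraries` are hypothetical names, and each option is represented as the small argv-style list of strings that one LC_LINKER_OPTION command carries.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One recorded option, e.g. {"-framework", "AppKit"} or {"-lz"}.
using LinkerOption = std::vector<std::string>;

// Hypothetical view of a parsed .o: only its recorded linker options.
struct ObjectFile {
    std::vector<LinkerOption> linkerOptions;
};

// Explicit command-line libraries come first; options found in object files
// are appended once each, so repeated imports do not add duplicates.
std::vector<LinkerOption> effectiveLibraries(const std::vector<LinkerOption>& commandLine,
                                             const std::vector<ObjectFile>& objects) {
    std::vector<LinkerOption> result = commandLine;
    for (const ObjectFile& object : objects)
        for (const LinkerOption& option : object.linkerOptions) {
            assert(!option.empty());
            if (std::find(result.begin(), result.end(), option) == result.end())
                result.push_back(option);
        }
    return result;
}
```

If two object files both import AppKit and one also imports zlib while `-lz` is already on the command line, the merged list in this model contains `-lz` and `-framework AppKit` exactly once each.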

The origin of -ObjC

As mentioned earlier, with -ObjC enabled ld64 loads, while searching libraries to resolve symbols, every .o in each static library that contains an ObjC class or category. Why is that needed?

Observe that:

  • The symbol for an ObjC class definition has global visibility (defined in this file and visible to other files at link time)
  • The symbol for an ObjC class reference is undefined external (defined elsewhere and fixed up at link time)
  • The symbol for an ObjC method definition has local visibility (invisible to other files)
  • An ObjC method call generates no symbol at all

Suppose there are two classes, ClassA and ClassB:

// ClassA.m

#import "ClassB.h"

@implementation ClassA

- (void)methodA
{
  [[ClassB new] methodB];
}

@end

// ClassB.m

@implementation ClassB

- (void)methodB
{
}

@end

After compilation, ClassA.o contains:

  • global symbol: ...
  • local symbol: ...
  • undefined external symbol: _OBJC_CLASS_$_ClassB

ClassB.o contains:

  • global symbol: _OBJC_CLASS_$_ClassB
  • local symbol: -[ClassB methodB]
  • undefined external:...

Although ClassA calls a method of ClassB, the symbol table of ClassA's object file holds only _OBJC_CLASS_$_ClassB, a reference to the ClassB class itself, and nothing for -[ClassB methodB]. Under ld64's normal resolution logic, the call to methodB in ClassA therefore gives ld64 no reason to look for the definition in ClassB.m (it is not an undefined external). Even if ld64 did look, ClassB does not expose a symbol for this method (local symbols are invisible to other files).

If so, why doesn't ld64 treat ObjC method definitions as dead code and strip them?

Because an ObjC class definition indirectly references its method definitions. For ClassB above, the atom dependencies are:

_OBJC_CLASS_$_ClassB -> __OBJC_CLASS_RO_$_ClassB -> __OBJC_$_INSTANCE_METHODS_ClassB -> -[ClassB methodB]

As long as the class definition is referenced, all of its method definitions are treated as live code and kept along with it.

Now consider what changes once a category is introduced:

  • Suppose B defines ClassB and methodB
  • C is a category on ClassB that defines methodBFromCategory
  • A references ClassB, methodB, and methodBFromCategory

In this case:

  • Because A references ClassB from B, ld64 loads B.
  • Although A references methodBFromCategory from C, A never needs to resolve a methodBFromCategory symbol (none is generated), so ld64 has no reason to load C.

For the program to run correctly, the definition of methodBFromCategory in C must still be linked in by ld64. There are two cases:

  1. If C is in the main project, ld64 parses the object file generated from C directly and produces the following atom dependencies:

objc-cat-list -> __OBJC_$_CATEGORY_ClassB_$_SomeCategory -> __OBJC_$_CATEGORY_INSTANCE_METHODS_ClassB_$_SomeCategory -> -[ClassB(SomeCategory) methodBFromCategory]

Here objc-cat-list represents all ObjC categories and is marked live at the start of dead code stripping, so methodBFromCategory is linked into the executable instead of being stripped.

  2. If C is packaged in a static library, ld64 has no reason to load C at link time, so methodBFromCategory is not linked into the executable. At runtime the category is never attached to ClassB, and the call fails.

That is why the -ObjC switch exists: it ensures that ObjC categories defined on their own in a static library are linked into the final output.

Current Xcode generally enables -ObjC by default. Still, brute-force loading every ObjC class and category in every static library just to accommodate categories is not an ideal solution, because it may load many ObjC classes at link time that are never needed. In theory, we can manually establish a reference between a category's definition and its use so that ld64 loads the category even without -ObjC. IGListKit, for example, tried manually injecting weak dummy variables (PR https://github.com/Instagram/IGListKit/pull/957). Keeping such a scheme from regressing brings its own maintenance cost, however, so it is a trade-off.

For ld64's handling of -ObjC, see src/ld/parsers/archive_file.cpp:
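
The difference between the two loading modes can be sketched with a small model. This is an assumption-laden illustration, not ld64's code: `Member` and `membersToLoad` are hypothetical, and the `hasObjCCategory` flag stands in for the deeper inspection the linker needs, since a .o that contains only a category exports no class symbol to match against.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical archive member.
struct Member {
    std::string name;
    std::set<std::string> globalSymbols;
    bool hasObjCCategory;
};

bool definesObjCClass(const Member& member) {
    const std::string prefix = "_OBJC_CLASS_$_";
    for (const std::string& symbol : member.globalSymbols)
        if (symbol.compare(0, prefix.size(), prefix) == 0) return true;
    return false;
}

// Default behavior: load a member only if it defines a currently undefined symbol.
// With forceLoadObjC (modeling -ObjC): also load members with ObjC classes or categories.
std::vector<std::string> membersToLoad(const std::vector<Member>& archive,
                                       const std::set<std::string>& undefined,
                                       bool forceLoadObjC) {
    std::vector<std::string> loaded;
    for (const Member& member : archive) {
        assert(!member.name.empty());
        bool needed = false;
        for (const std::string& symbol : member.globalSymbols)
            if (undefined.count(symbol)) { needed = true; break; }
        bool forced = forceLoadObjC &&
                      (definesObjCClass(member) || member.hasObjCCategory);
        if (needed || forced) loaded.push_back(member.name);
    }
    return loaded;
}
```

In this model a category-only member is never selected by symbol lookup alone, which mirrors the failure case described above; it is picked up only when the force-load flag is set.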

bool File<A>::forEachAtom(ld::File::AtomHandler& handler) const
{
    bool didSome = false;
    if ( _forceLoadAll || _forceLoadThis ) {
        // call handler on all .o files in this archive
        ...
    }
    else if ( _forceLoadObjC ) {
        // call handler on all .o files in this archive containing objc classes
        for (const auto& entry : _hashTable) {
            if ( (strncmp(entry.first, ".objc_c", 7) == 0) || (strncmp(entry.first, "_OBJC_CLASS_$_", 14) == 0) ) {
                const Entry* member = (Entry*)&_archiveFileContent[entry.second];
                MemberState& state = this->makeObjectFileForMember(member);
                char memberName[256];
                member->getName(memberName, sizeof(memberName));
                didSome |= loadMember(state, handler, "-ObjC forced load of %s(%s)\n", this->path(), memberName);
            }
        }
        // ObjC2 has no symbols in .o files with categories but not classes, look deeper for those
        const Entry* const start = (Entry*)&_archiveFileContent[8];
        const Entry* const end = (Entry*)&_archiveFileContent[_archiveFilelength];
        ...
    }
    ...    
}

Six 、 Other

Command-line options for debugging

ld64 also provides a rich set of options for inspecting its execution. On a Mac, see the section "Options for introspecting the linker" in man ld.

-print_statistics

Prints how ld64's time is distributed across its major steps.

      ld total time: 2.26 seconds
   option parsing time:  6.9 milliseconds (  0.3%)
 object file processing:  0.1 milliseconds (  0.0%)
     resolve symbols: 2.24 seconds
     build atom list:  0.0 milliseconds (  0.0%)
         passess:  6.2 milliseconds (  0.2%)
      write output:  10.4 milliseconds (  0.4%)

-t

Prints every .o, .a, and .dylib that ld64 loads.

-why_load

Prints why each .o in a .a was loaded (that is, which symbol required it).

-ObjC forced load of bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/TTHomeTab/libCommon.a(ArticleTabBarStyleNewsListScreenshotsProvider_IMP.o)
-ObjC forced load of bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/TTHomeTab/libCommon.a(TTExploreMainViewController.o)
-ObjC forced load of bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/TTHomeTab/libCommon.a(TTFeedCollectionViewController.o)
-ObjC forced load of bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/TTHomeTab/libCommon.a(TTFeedCollectionFollowListCell.o)
....
_dec_8i40_31bits forced load of external/TTAudio/Vendor/libopencore-amrnb.a(d8_31pf.o)
_decode_2i40_11bits forced load of external/TTAudio/Vendor/libopencore-amrnb.a(d2_11pf.o)
_decode_2i40_9bits forced load of external/TTAudio/Vendor/libopencore-amrnb.a(d2_9pf.o)

-why_live xxx

With -dead_strip enabled, prints the reference chain of a given symbol (that is, why it was not stripped).

For example, -why_live _OBJC_CLASS_$_TTNewUserHelper:

_OBJC_CLASS_$_TTNewUserHelper from external/TTVersionHelper/ios-arch-iphone/libTTVersionHelper_TTVersionHelper_awesome_ios.a(TTNewUserHelper.o)
 objc-class-ref from bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/TTPrivacyAlertManager/libNews.a(TTPrivacyAlertManager.swift.o)
  +[TTDetailLogManager createLogItemWithGroupID:] from bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/TTDetail/libCommon.a(TTDetailLogManager.o)
   __OBJC_$_CLASS_METHODS_TTDetailLogManager from bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/TTDetail/libCommon.a(TTDetailLogManager.o)
    __OBJC_METACLASS_RO_$_TTDetailLogManager from bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/TTDetail/libCommon.a(TTDetailLogManager.o)
     _OBJC_METACLASS_$_TTDetailLogManager from bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/TTDetail/libCommon.a(TTDetailLogManager.o)
      _OBJC_CLASS_$_TTDetailLogManager from bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/TTDetail/libCommon.a(TTDetailLogManager.o)
       objc-class-ref from bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/LMCoreKitTTAdapter/libNews.a(LMDetailTechnicalLoggerImpl.o)
        ___73-[TTDetailFetchContentManager fetchDetailForArticle:priority:completion:]_block_invoke from bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/TTDetail/libCommon.a(TTDetailFetchContentManager.o)
         -[TTDetailFetchContentManager fetchDetailForArticle:priority:completion:] from bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/TTDetail/libCommon.a(TTDetailFetchContentManager.o)
          __OBJC_$_INSTANCE_METHODS_TTDetailFetchContentManager from bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/TTDetail/libCommon.a(TTDetailFetchContentManager.o)
           __OBJC_CLASS_RO_$_TTDetailFetchContentManager from bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/TTDetail/libCommon.a(TTDetailFetchContentManager.o)
            _OBJC_CLASS_$_TTDetailFetchContentManager from bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/TTDetail/libCommon.a(TTDetailFetchContentManager.o)
             objc-class-ref from bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/BDAudioBizTTAdaptor/libNews.a(TTAudioFetchableImp.o)
              objc-class-ref from bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/BDAudioBizTTAdaptor/libNews.a(TTAudioFetchableImp.o)

-map (linkmap)

Writes the link map to the specified path. It lists every symbol together with its address.

# Path: /Users/bytedance/NewsInHouse_bin
# Arch: x86_64

# Object files:
...
[3203] bazel-out/ios-x86_64-min10.0-applebin_ios-ios_x86_64-dbg-ST-7bf874b56ea0/bin/Module/TTHomeTab/libCommon.a(TTFeedActivityView.o)
...

# Sections:
# Address        Size            Segment        Section
0x100004000        0x0D28B292        __TEXT        __text
0x10D28F292        0x00011586        __TEXT        __stubs
...
0x10D70B5E8        0x00346BE0        __DATA        __cfstring
0x10DA521C8        0x00032170        __DATA        __objc_classlist
...

# Symbols:
# Address        Size            File  Name
0x100004590        0x00000020        [  8] -[NSNull(Addition) boolValue]
...
0x1117EE0C6        0x00000027        [4282] literal string: -[TTFeedGeneralListView skipTopHeight]
...
0x1104B4430        0x00000028        [22685] _OBJC_METACLASS_$_MQPWebService
0x1104B4458        0x00000028        [22685] _OBJC_CLASS_$_APayH5WapViewToolbar
...
0x1114A9CD4        0x0000005C        [ 10] GCC_except_table0
0x1114A9D30        0x00000028        [ 14] GCC_except_table12
...
<<dead>>         0x00000008        [3269] _kCoverAcatarMargin
<<dead>>         0x00000008        [3269] _kCoverTitleMargin
...
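
Because the link map is plain text, it is straightforward to post-process. As a minimal sketch, assuming the layout shown above (a "# Symbols:" header followed by lines of address, size, and a bracketed file index), the hypothetical helper below sums symbol sizes per object-file index. It skips <<dead>> entries and is not a complete parser for every link map variant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Sums symbol sizes per object-file index from the "# Symbols:" part of a link map.
std::map<int, std::uint64_t> sizeByFileIndex(const std::string& linkMapText) {
    std::map<int, std::uint64_t> totals;
    std::istringstream lines(linkMapText);
    std::string line;
    bool inSymbols = false;
    while (std::getline(lines, line)) {
        if (line.compare(0, 10, "# Symbols:") == 0) { inSymbols = true; continue; }
        // Only lines that start with an address count; this skips headers,
        // "..." elisions, and <<dead>> entries.
        if (!inSymbols || line.compare(0, 2, "0x") != 0) continue;

        std::size_t open = line.find('[');
        std::size_t close = line.find(']', open);
        if (open == std::string::npos || close == std::string::npos) continue;
        assert(close > open);

        std::istringstream fields(line.substr(0, open));
        std::string address, size;
        if (!(fields >> address >> size)) continue;

        int fileIndex = std::stoi(line.substr(open + 1, close - open - 1));
        totals[fileIndex] += std::stoull(size, nullptr, 16);
    }
    return totals;
}
```

Joining the resulting indices with the "# Object files:" table would give a per-file size breakdown of the linked binary.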

LTO — Link Time Optimization

LTO is a link-time, whole-module code optimization technique. With LTO enabled, ld64 relies on libLTO to implement the related functionality. How ld64 handles LTO will be covered in a separate article.

Conclusion

This article analyzed the main working principles of ld64 from its source code. In practice, a project can customize ld64 to its own needs in order to fix specific problems or implement specific features. This is also the first article in a series; later ones will cover more static linkers, including zld, lld, and mold.

Reference material

  • https://opensource.apple.com/source/ld64/
  • https://opensource.apple.com/source/ld64/ld64-136/doc/design/linker.html
  • https://github.com/aidansteele/osx-abi-Mach-O-file-format-reference

Copyright notice
This article was created by [Byte hopping terminal technology]. Please include the original link when reposting. Thank you.
https://cdmana.com/2021/10/20211002145705493o.html
