Friday, June 28, 2013
I ended last time by noting that the dynamic linker has three main tasks: determine and load dependencies, relocate the application and its dependencies, and initialize the application and dependencies - and that the key to speeding up all of these is to have fewer dependencies in the application.
Now, we're going to look at the relocation process more thoroughly. First of all, what's going on? What does 'relocation' mean?
I'm by no means an expert in this, but I'm going to venture an attempt at an explanation: After an ELF object has been compiled, it has an entry point address - in other words, an address at which the object is assumed to reside in memory, such that if control is transferred to that address, the ELF object will start executing.
However, there are at least a couple of caveats here. First of all: even if your ELF object has a fixed entry point address, that doesn't mean it will be loaded into actual physical memory at this address. Each process gets its own virtual address space - a 'platonic' memory space that the operating system maps onto physical memory. So the application might get loaded at its entry point address in the virtual address space, but this address will correspond to another address entirely in physical memory.
The second point is that if we're not talking about an executable but rather a dynamic shared object, as we are here (or rather, we have one executable with a potentially high number of dynamic shared objects that are to be associated with it), the entry point address it was given at link time isn't even the address it will end up at in the final process - it will get shifted depending on what the linker determines is the best way to combine the addresses of all participating DSOs. This means that all 'internal' addresses in that object will be shifted by the same amount as well. This is what we mean when we use the term 'relocation' here.
So what we're going to talk about is how the linker accomplishes this relocation - especially the part where it has to reconcile all the load addresses. First, it must be noted that there are two types of dependencies a given DSO can have. For one, you can have dependencies that are located within the same object - which I imagine happens when you create an object file with two functions/subroutines where one depends on the other - and for another, you can have dependencies that come from a different object.
The first kind of dependency is easy to handle, given that you know the 'new' entry point address of the object in question. For each such dependency, you calculate its offset relative to the old entry point, and then simply add this offset to the new entry point.
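To make the arithmetic concrete, here is a minimal sketch in C. The names are illustrative only - these are not the dynamic linker's actual data structures - and it assumes we know both the base the object was linked for and the base it actually got loaded at:

    #include <stdint.h>
    #include <stdio.h>

    /* Rebase an address that lies inside a relocated object: the offset
     * from the object's base is invariant, so we simply recompute the
     * address against the new base. */
    static uintptr_t relocate(uintptr_t old_addr,  /* address recorded at link time */
                              uintptr_t old_base,  /* base assumed at link time */
                              uintptr_t new_base)  /* base actually assigned */
    {
        return new_base + (old_addr - old_base);
    }

    int main(void)
    {
        /* An object linked for base 0x0 ends up loaded at 0x7f0000000000;
         * the internal address 0x1234 then becomes 0x7f0000001234. */
        printf("%#lx\n", (unsigned long)relocate(0x1234, 0x0, 0x7f0000000000));
        return 0;
    }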
The second type of dependency resolution is more involved, and I'm going to talk
about that more the next time.
Tuesday, June 25, 2013
Shared library writeup: Part 3
I ended last time by talking about how the dynamic linker has to run before an ELF file can be executed. In other words: at this point, the dynamic linker has been loaded into the memory space of the process that we want to run, and we're going to run the linker before we can run our actual application.
We cannot run the linker quite yet, however. The linker has to know where it should transfer control to after it has completed whatever it is supposed to do. This is accomplished by the kernel - it puts an auxiliary vector on top of the process' stack. This vector is a name-value array (or in Python terms, a dictionary), and it already contains the values I talked about last time - the PT_INTERP value, which is contained in the p_offset field of the ELF program header. In addition to these, a number of other values are added to the auxiliary vector. The values that can be added like this are defined in the elf.h header file, and have the prefix AT_. After this vector has been set up, control is finally transferred to the dynamic linker. The linker's entry point is defined in the ELF header of the linker, in the e_entry field.
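As an aside, glibc exposes the auxiliary vector to the running program itself through the getauxval() function (declared in <sys/auxv.h>, available since glibc 2.16), so you can peek at a few of these AT_ values. A minimal sketch:

    #include <stdio.h>
    #include <sys/auxv.h>

    int main(void)
    {
        /* AT_ENTRY: entry point of the executable - where the dynamic
         *           linker transfers control once it is done.
         * AT_PHDR:  address of the executable's program header table.
         * AT_BASE:  base address of the dynamic linker itself. */
        printf("AT_ENTRY = %#lx\n", getauxval(AT_ENTRY));
        printf("AT_PHDR  = %#lx\n", getauxval(AT_PHDR));
        printf("AT_BASE  = %#lx\n", getauxval(AT_BASE));
        return 0;
    }

On a system with address space layout randomization you will typically see different values from run to run.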
Trusting the linker
At this point the (dynamic) linker is supposed to do its magic. It has three tasks:
- Determine and load dependencies
- Relocate the application and all dependencies
- Initialize the application and dependencies in the correct order
Monday, June 17, 2013
Shared library writeup: Part 2
I ended last time by talking about how the program header table contains references to the segments of an ELF file. Now, these segments can have various access permissions - some parts are executable but not writable, while others are writable but not executable.
Having a lot of non-writable segments is a good thing, since it means, in addition to data being protected from unintentional or malignant modification, that these segments can be shared if there are several applications that use them.
The way the kernel knows which type each segment has is by reading the program header table, where this information is located. This table is represented by C structs called Elf32_Phdr or Elf64_Phdr.
However, the program header table is not located at a fixed place in an ELF file. The only thing that is fixed is the ELF header, which is always put at offset zero - that is, at the very beginning of the file. (The offset is how many bytes from the beginning of the file something is located.) This header is also represented by a C struct, called Elf32_Ehdr or Elf64_Ehdr (the 32 or 64 refers to whether the computer architecture is 32-bit or 64-bit, respectively - i.e., whether its registers, memory addresses and buses are 32 or 64 bits wide).
Now, the ELF header struct contains several fields that are needed to determine where the program header table is. Writing them all down would essentially mean copy-pasting the article I'm reading, so I won't go down to that level of granularity.
Once the kernel has found the program header table, it can start reading information about each segment. The first thing it needs to know is the segment's type, which is represented by the p_type field of the program header struct. If this field has the value PT_LOAD, it means that the segment is 'loadable'. (Other values this field can have are PT_DYNAMIC, which means the segment contains dynamic linking information, PT_NOTE, which means the segment contains auxiliary notes, et cetera.) If the p_type field has the value PT_LOAD, the kernel must, in addition to knowing where the segment starts, also know how big it is, which is specified in the p_filesz field. There are also a couple of fields that describe where the segment is located in virtual memory. However, the actual address in virtual memory is irrelevant for DSOs that are not linked, since they haven't yet been assigned a specific place in the virtual address space. For executables and so-called 'prelinked' DSOs (DSOs that have been bound to an executable even though they're dynamic), the address is relevant.
However, even though the address in memory is irrelevant for unlinked DSOs, the virtual memory size of the segment (the p_memsz field) is relevant. This is because the memory the segment needs can be larger than the size of the segment in the file. When the kernel loads the segment into memory and the requested memory size is larger than the file size, the extra memory is initialized with zeroes. This is practical if there are so-called BSS sections in the segment. BSS is an old name for a section of data that contains only zero bits. As long as the extra memory is initialized with zeroes, this is a good way to save space - you only need to know how large the BSS section is, add that size to the in-file size of the segment, and the kernel handles the rest. An example of a BSS section is one containing uninitialized global or static variables in C code, since such variables are guaranteed to be zero-initialized in C anyway.
Finally, each segment has a logical set of permissions that is defined in the p_flags field of the program header struct - whether the segment is readable, writable, executable or any combination of the three.
After this, the virtual address space for the ELF executable is set up. However, the executable binary at this point only contains the segments that had the PT_LOAD value in the p_type field. The dynamically linked segments are not yet loaded - they only have an address in virtual memory. Therefore, before execution can start, another program must be executed - the dynamic linker.
The dynamic linker is a program just like the executable we're trying to run, so it has to go through all of the above steps. The difference is that the linker is a complete binary, and it should also be relocatable. Which linker is used is not specified by the kernel - it is contained in a special segment of the ELF file, one whose p_type field has the value PT_INTERP. This segment is just a null-terminated string that specifies which linker to use. And the load address of the linker must not conflict with that of the executable it is run on.
This ends the second part of the writeup. And there's plenty left..
Friday, June 14, 2013
Shared library writeup: Part 1
During my daily work this week, I found myself struggling with shared libraries, linking them, and the various compiler flags needed to make the type of library you want. I decided to actually learn this stuff once and for all, and so I am currently reading "How to write shared libraries" by Ulrich Drepper. I decided this was a perfect opportunity to multitask - both write stuff for the blog and learn something! Especially since you learn much better by writing about it. Hence, this will be the first part of my writeup of Drepper's paper.
In the most abstract, libraries are collections of code gathered into one file for easy reuse. They can be static, meaning that if you want to use the code in a program, the compiler must take the code contained in the library and bake it into the program upon compilation. Alternatively, they can be shared or dynamic, meaning that they are not included in the program at compile time; instead the program contains a reference to the libraries, so that at run time it loads each library and incorporates it.
Nowadays (on Unix-like systems), libraries are handled by the so-called ELF (Executable and Linking Format), a common file format that is used not just for libraries, but for executables and other types of files as well.
Earlier, other formats such as a.out and the Common Object File Format (COFF) were used. The disadvantage with these was that they did not support relocation.
When you have a piece of compiled code (typically in what's called an object file), this file will contain a relocation table. Such a table is a list of pointers to various addresses within that object file, and these addresses are typically given relative to the beginning of the file (which is typically zero). When combining several such object files into one large executable, this object-file-specific list must typically be changed, since the object file is no longer located at 'zero', but rather at some arbitrary point within the new executable. Then, when the executable is to be executed, the addresses are again modified to reflect the actual addresses in RAM. This last part is what is not supported by the old library formats.
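As a toy illustration of that idea - with hypothetical structures far simpler than real relocation records - think of the relocation table as a list of offsets into the object's image, each marking a stored address that must be shifted when the image moves:

    #include <stdint.h>
    #include <stdio.h>

    #define NRELOC 2

    int main(void)
    {
        /* Two stored addresses inside an object image, computed as if
         * the image started at address 0. */
        uint64_t image[NRELOC] = { 0x10, 0x28 };

        /* The toy relocation table: indices of the words that hold
         * addresses and therefore need patching. */
        size_t reloc_table[NRELOC] = { 0, 1 };

        /* The image actually ends up at load_base, so every recorded
         * address is shifted by the same delta. */
        uint64_t load_base = 0x400000;
        for (size_t i = 0; i < NRELOC; i++)
            image[reloc_table[i]] += load_base;

        for (size_t i = 0; i < NRELOC; i++)
            printf("patched[%zu] = %#llx\n", i, (unsigned long long)image[i]);
        return 0;
    }

Real ELF relocation entries (Elf64_Rel/Elf64_Rela) carry more information - a type, a symbol reference, and possibly an addend - but the rebasing principle is the same.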
This essentially means that each library must be given an absolute address in virtual memory upon creation, and that some central authority must keep track of where the various shared libraries are stored. In addition: when we make additions to a library that is supposed to be shared, we don't want to have to tell all the applications that used the old version that the library has changed - as long as the new version still contains all the stuff we need for our application, it should still work for that application without having to re-link the application with the new version of the library. This means that the table that points to where the various parts of the library are located must be kept separate from the actual library, and it must actually keep track of the pointer tables of all the old versions of that library - once a function had been added to a library, its address lasted forever. New additions to a library would just append to the existing table. In short, a.out and COFF were not very practical for use as shared libraries, although they did make the program run fast, since there is no relocation of table pointers at run time.
Enter ELF
ELF is, as mentioned before, a common file type for applications, object files, libraries and more. It is therefore very easy to make a library once you know how to make an application - you just pass an additional compiler flag. The only difference between them is that applications usually have a fixed load address, that is, the (virtual) memory address into which they are loaded upon execution. There is a special class of applications, called Position Independent Executables (PIEs), that don't even have a fixed load address, and for those, the difference between applications and shared libraries is even smaller.
For an application that contains no dynamic components (no shared libraries etc.), execution is straightforward: the application is loaded into memory, then the instruction at the 'entry point' memory address is executed, which should start a chain of events that ends with the termination of the program.
For applications that do contain dynamic components, it is less straightforward: There must be another program that can coordinate the application with the DSOs (Dynamic Shared Objects) before execution of the program starts.
The ELF file structure
ELF files usually contain the following:
- the file header
- the program header table
- the section header table
- sections
The section header table is a table with references to the various sections of the file. The program header table contains references to various groupings of the sections. So you might say that the section header table describes each 'atom' of the file, whereas the program header table collects these atoms into 'molecules' - sensible chunks, called segments, made of sections that work together to form a coherent whole.
End of part 1 of the writeup! And I'm only on page 3 of the paper!