Gestalt Shift: Shared library writeup: Part 1

During my daily work this week, I found myself struggling with shared libraries: linking them, and the various compiler flags needed to build the type of library you want. I decided to learn this stuff once and for all, so I am currently reading "How To Write Shared Libraries" by Ulrich Drepper. This seemed like a perfect opportunity to multitask: write something for the blog and learn something at the same time, especially since you learn much better by writing about a subject. Hence, this will be the first part of my writeup of Drepper's paper.

In the most abstract sense, libraries are collections of code gathered into one file for easy reuse. They can be static, meaning that if you want to use the code in a program, the linker (usually invoked through the compiler) must take the code contained in the library and bake it into the program at build time. Alternatively, they can be shared or dynamic, meaning that they are not included in the program at build time; instead, the program records which libraries it needs, so that at run time the libraries are loaded and incorporated into the running program.

Nowadays, on Linux and most other Unix-like systems, libraries use the so-called ELF (Executable and Linking Format), a common file format that is used not just for libraries, but for executables and other types of files as well.

Earlier, other formats, such as a.out and the Common Object File Format (COFF), were used. Their disadvantage was that shared libraries in these formats did not support relocation.

When you have a piece of compiled code (typically in what's called an object file), this file will contain a relocation table. Such a table is a list of pointers to various addresses within that object file, and these addresses are typically given relative to the beginning of the file (which is typically treated as address zero). When combining several such object files into one large executable, this object-file-specific list must typically be changed, since the object file is no longer located at 'zero', but rather at some arbitrary point within the new executable. Then, when the executable is to be executed, the addresses are modified again to reflect the actual addresses in RAM. This last part is what the old library formats do not support.
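To make the two rounds of address patching concrete, here is a toy sketch in Python. It is not a real object-file format: the `ObjectFile` class, the "one integer per memory word" layout, and the load address `0x1000` are all assumptions made up for illustration.

```python
# Toy model of relocation; not a real object-file format.
from dataclasses import dataclass


@dataclass
class ObjectFile:
    words: list        # file contents, one integer per "memory word"
    relocations: list  # indices of words that hold an address relative to 0


def relocate(obj, base):
    """Return a copy of obj.words as they look when placed at `base`."""
    patched = list(obj.words)
    for index in obj.relocations:
        patched[index] += base  # the stored address was relative to the file start
    return patched


def link(objects):
    """Concatenate object files, rebasing each one to its offset in the result."""
    image = []
    relocations = []
    for obj in objects:
        base = len(image)  # where this object file lands in the executable
        image.extend(relocate(obj, base))
        relocations.extend(base + index for index in obj.relocations)
    return ObjectFile(image, relocations)


# Word 1 of `a` holds the address of a's word 2; word 0 of `b` points at b's word 1.
a = ObjectFile(words=[7, 2, 99], relocations=[1])
b = ObjectFile(words=[1, 42], relocations=[0])

executable = link([a, b])              # link time: `b` now starts at offset 3
loaded = relocate(executable, 0x1000)  # load time: image placed at 0x1000

print(executable.words)  # [7, 2, 99, 4, 42]
print(loaded)            # [7, 4098, 99, 4100, 42]
```

The first patching step happens in `link`, the second in the final `relocate` call. In this toy picture, a format without load-time relocation would have to stop after `link` and always be loaded at the address it was built for.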

This essentially meant that each library had to be given an absolute address in virtual memory upon creation, and that some central authority had to keep track of where the various shared libraries were stored. In addition: when we make additions to a library that is supposed to be shared, we don't want to have to tell all the applications that used the old version that the library has changed. As long as the new version still contains all the stuff we need for our application, it should still work for that application without having to re-link the application with the new version of the library. This meant that the table that points to where the various parts of the library are located had to be kept separate from the actual library, and it had to keep track of the pointer tables of all the old versions of that library: once a function had been added to a library, its address lasted forever. New additions to a library would just be appended to the existing table. In short, a.out and COFF were not very practical for use as shared libraries, although they did make programs run fast, since there was no relocation of table pointers at run time.
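The "addresses last forever, new functions are appended" rule can be pictured with a small toy model. Everything here is hypothetical: the `FixedTable` class, the base address, the 8-byte slot size and the function names are invented for illustration and do not describe how any real a.out or COFF system laid out its tables.

```python
# Toy model of an append-only entry table; all names and numbers are hypothetical.
class FixedTable:
    def __init__(self, base_address, slot_size=8):
        self.base_address = base_address  # fixed when the library is created
        self.slot_size = slot_size
        self.names = []                   # slot order is frozen once published

    def add(self, name):
        """Append a function; existing slots never move."""
        self.names.append(name)
        return self.address_of(name)

    def address_of(self, name):
        return self.base_address + self.names.index(name) * self.slot_size


table = FixedTable(base_address=0x60000000)
v1_open = table.add("lib_open")  # library version 1
v1_read = table.add("lib_read")
v2_seek = table.add("lib_seek")  # version 2 only appends

# An application linked against version 1 still finds its functions.
print(hex(v1_open), hex(table.address_of("lib_open")))
```

Because old slots never move, an application built against version 1 keeps working, but the price is that the table can only ever grow and the base address must be coordinated globally.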

Enter ELF

ELF is, as mentioned before, a common file type for applications, object files, libraries and more. It is therefore very easy to make a library once you know how to make an application: you just pass in an additional compiler flag. The only difference between them is that applications usually have a fixed load address, that is, the (virtual) memory address at which they are loaded upon execution. There is a special class of applications, called Position Independent Executables (PIEs), that don't even have a fixed load address, and for those, the difference between applications and shared libraries is even smaller.
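One place where this similarity shows up is the `e_type` field of the ELF header: fixed-address executables are marked `ET_EXEC`, while shared libraries and PIEs both carry `ET_DYN`. The sketch below reads that field from the first 18 bytes of a file. The function name `elf_type` and the hand-built `sample` header are my own illustration; the sample is synthetic, not taken from a real binary.

```python
import struct

# e_type values from the ELF specification
ET_NAMES = {
    1: "ET_REL (object file)",
    2: "ET_EXEC (fixed-address executable)",
    3: "ET_DYN (shared object or PIE)",
    4: "ET_CORE (core dump)",
}


def elf_type(header):
    """Return the ELF type name, given at least the first 18 bytes of a file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] is EI_DATA: 1 = little-endian, 2 = big-endian
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field right after the 16-byte e_ident array
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return ET_NAMES.get(e_type, "unknown (%d)" % e_type)


# Synthetic header: magic, 64-bit class, little-endian, version 1, padding, e_type = 3
sample = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9) + struct.pack("<H", 3)
print(elf_type(sample))  # ET_DYN (shared object or PIE)
```

On a Linux machine, the same function should work on a real file by passing it `open(path, "rb").read(18)`.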

For an application that contains no dynamic components (no shared libraries etc.), execution is straightforward: the application is loaded into memory, then the instruction at the 'entry point' memory address is executed, which should start a chain of events that ends with the termination of the program.

For applications that do contain dynamic components, it is less straightforward: There must be another program that can coordinate the application with the DSOs (Dynamic Shared Objects) before execution of the program starts.

The ELF file structure

ELF files usually contain the following:

the file header
the program header table
the section header table
the sections

Sections are the meat of the ELF file - they contain all the information about what is actually going on. The other structures are just there to organize the sections.

The section header table is a table with references to the various sections of the file. The program header table contains references to various groupings of the sections. So you might say that the section header table describes each 'atom' of the file, whereas the program header table collects these atoms into 'molecules': sensible chunks, called segments, which are groups of sections that work together to form a coherent whole.
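The file header is what tells you where the two tables live. As a rough sketch, the following Python unpacks a 64-bit little-endian ELF header and computes the byte ranges of both tables. The field names follow the ELF specification, but the helper names (`ElfHeader`, `parse_elf64_header`) and all the numbers in the `sample` header are invented for illustration; 32-bit and big-endian files are deliberately not handled.

```python
import struct
from collections import namedtuple

ElfHeader = namedtuple(
    "ElfHeader",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)

# Layout of the ELF64 file header, little-endian, 64 bytes in total
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


def parse_elf64_header(data):
    """Unpack the file header of a little-endian ELF64 file."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    return ElfHeader(*ELF64_HEADER.unpack_from(data))


# Synthetic header with made-up but plausible values (not from a real binary)
ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
sample = ELF64_HEADER.pack(ident, 3, 62, 1, 0x1040, 64, 0x3000, 0, 64, 56, 13, 64, 30, 29)

hdr = parse_elf64_header(sample)
ph_end = hdr.e_phoff + hdr.e_phnum * hdr.e_phentsize
sh_end = hdr.e_shoff + hdr.e_shnum * hdr.e_shentsize

print("entry point:          ", hex(hdr.e_entry))
print("program header table: bytes", hdr.e_phoff, "to", ph_end)
print("section header table: bytes", hdr.e_shoff, "to", sh_end)
```

The `e_entry` field is the 'entry point' address mentioned earlier, while `e_phoff`/`e_phnum` and `e_shoff`/`e_shnum` locate the program header table and the section header table respectively.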

End of part 1 of the writeup! And I'm only on page 3 of the paper!


Friday, June 14, 2013
