Friday, June 28, 2013

Shared library writeup: Part 4

I ended last time by noting that the dynamic linker has three main tasks: determine and load dependencies, relocate the application and its dependencies, and initialize the application and its dependencies. The key to speeding up all of these is to have fewer dependencies in the application.

Now, we're going to look at the relocation process more thoroughly. First of all, what's going on? What does 'relocation' mean?

I'm by no means an expert in this, but I'm going to venture an attempt at an explanation: After an ELF object has been compiled and linked, it has an entry point address - in other words, the memory address at which execution begins. If control is transferred to that address, the ELF object will start executing.

However, there are at least a couple of caveats here. First of all: Even if your ELF object has a fixed entry point address, it doesn't mean it will be loaded into actual physical memory at this address. Each process gets its own virtual memory space, which is a 'platonic' address space that the operating system maps onto physical memory. So the application might get loaded at the entry point address in its virtual memory space, but this address will correspond to another address entirely in physical memory.
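To make the virtual-versus-physical distinction concrete, here is a toy sketch in Python. It is not how a real kernel or MMU is implemented: the page size, the page tables, and all the addresses are made-up values for illustration, and `translate` is a hypothetical helper. It only shows that the same virtual address can land in different physical locations for different processes.

```python
PAGE_SIZE = 4096  # assumed page size, purely for illustration


def translate(page_table, virtual_addr):
    """Map a virtual address to a physical one via a toy page table."""
    page, offset = divmod(virtual_addr, PAGE_SIZE)
    frame = page_table[page]  # which physical frame backs this virtual page
    return frame * PAGE_SIZE + offset


# Two hypothetical processes, both with virtual page 0x400 mapped,
# but each backed by a different physical frame.
process_a = {0x400: 0x12}
process_b = {0x400: 0x9A}

# The same virtual entry point address in both processes.
entry = 0x400 * PAGE_SIZE + 0x80

phys_a = translate(process_a, entry)
phys_b = translate(process_b, entry)
print(hex(entry), "->", hex(phys_a), "in process A")
print(hex(entry), "->", hex(phys_b), "in process B")
```

Both processes see the same entry point address, yet the bytes live in different places in physical memory.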

The second point is that if we're not talking about an executable, but rather a dynamic shared object, as we are here (or rather, we have one executable with a potentially high number of dynamic shared objects that are to be associated with it), the entry point address recorded in the file isn't even the address it will end up with in the running process - it will get shifted depending on where the dynamic linker decides to place each of the participating DSOs in the address space. This means that all 'internal' addresses in that object will be shifted by the same amount as well. This is what we're currently talking about when we use the term 'relocation'.

So what we're going to talk about is how the dynamic linker accomplishes this relocation - and especially the part where it has to coordinate all the load addresses. First, it must be noted that there are two types of dependencies a given DSO can have. For one, you can have dependencies that are located within the same object - which I imagine happens when you create an object file with two functions/subroutines and one of them depends on the other - and for another, you can have dependencies that come from a different object.

The first kind of dependency is easy to handle, given that you know the 'new' entry point address of the object in question. For each such dependency, you calculate its relative offset from the entry point, and then simply add this offset to the new entry point.
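The arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the offset calculation, not what a real dynamic linker does internally: the function name `relocate`, the symbol names, and all the addresses are hypothetical.

```python
def relocate(old_entry, new_entry, addresses):
    """Shift each internal address by the same amount as the entry point.

    For every address we compute its offset from the old entry point
    and add that offset to the new entry point.
    """
    return {
        name: new_entry + (addr - old_entry)
        for name, addr in addresses.items()
    }


# Hypothetical addresses as recorded in the object file.
old_entry = 0x1000
symbols = {"helper": 0x1040, "main_routine": 0x1100}

# Hypothetical address at which the object actually ends up.
new_entry = 0x7F0000

relocated = relocate(old_entry, new_entry, symbols)
for name, addr in relocated.items():
    print(f"{name}: {hex(symbols[name])} -> {hex(addr)}")
```

Note that the distance between `helper` and `main_routine` is the same before and after, which is why this kind of dependency only needs the new entry point to be known.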

The second type of dependency is more involved to resolve, and I'm going to talk about that next time.
