Supporting Shared Libraries

When an executable is added to the image, we want any required shared libraries to be added automatically. The SharedLibraries module determines which files are required. This section discusses the features of the kernel and compiler we need to be aware of in order to do this reliably.

Linux executables today are in ELF format; it is defined in Generic ELF Specification ELFVERSION, part of the Linux Standard Base. This is based on part of the System V ABI: Tool Interface Standard (TIS), Executable and Linking Format (ELF) Specification.

ELF has consequences in different parts of the system: in the link editor, which needs to merge ELF object files into ELF executables; in the kernel (fs/binfmt_elf.c), which has to place the executable in RAM and transfer control to it; and in the runtime loader, which is invoked when starting the application to load the necessary shared libraries into RAM.

The idea is as follows. Executables are in ELF format, with a type of either ET_EXEC (executable) or ET_DYN (shared library; yes, you can execute those). There are other types of ELF file (core files, for example), but you can't execute them. These files contain two kinds of headers: program headers and section headers. Program headers define segments of the file that the kernel should store consecutively in RAM; section headers define parts of the file that should be treated by the link editor as a single unit. Program headers normally point to a group of adjacent sections.

The program may be statically linked or dynamically linked (with shared libraries). If it's statically linked, the kernel loads the relevant segments, then transfers control to the program's entry point in userland, which ends up calling main(). If it's dynamically linked, one of the program headers has type PT_INTERP. It points to a segment that contains the name of a (static) executable, the interpreter; this executable is loaded in RAM together with the segments of the dynamic executable.
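To make the header layout concrete, here is a minimal sketch in C of how a program could look for PT_INTERP using only the definitions in elf.h. The function name elf_interp and its result codes are invented for this illustration. It assumes the file has the same word size and byte order as the machine running the code, and it is not meant to reproduce yaird's actual helper.

```c
#include <elf.h>
#include <link.h>   /* ElfW(): picks Elf32_* or Elf64_* for this machine */
#include <stdio.h>
#include <string.h>

/* Result codes; the names are invented for this sketch. */
enum { ELF_ERR = -1, ELF_STATIC = 0, ELF_DYNAMIC = 1 };

/*
 * Scan the program headers of `path` for PT_INTERP.
 * On ELF_DYNAMIC, `interp` receives the interpreter's path name.
 */
int elf_interp(const char *path, char *interp, size_t size)
{
    ElfW(Ehdr) eh;
    ElfW(Phdr) ph;
    int native = sizeof(void *) == 8 ? ELFCLASS64 : ELFCLASS32;
    int result = ELF_ERR;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return ELF_ERR;
    if (fread(&eh, sizeof eh, 1, f) != 1
        || memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0
        || eh.e_ident[EI_CLASS] != native
        || (eh.e_type != ET_EXEC && eh.e_type != ET_DYN)
        || eh.e_phentsize != sizeof ph)
        goto out;                       /* not an executable we understand */

    result = ELF_STATIC;                /* until a PT_INTERP shows up */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        long off = (long)(eh.e_phoff + (unsigned long)i * sizeof ph);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            result = ELF_ERR;
            break;
        }
        if (ph.p_type != PT_INTERP)
            continue;
        /* The segment holds a NUL-terminated path name. */
        result = ELF_ERR;
        if (ph.p_filesz > 0 && ph.p_filesz <= size
            && fseek(f, (long)ph.p_offset, SEEK_SET) == 0
            && fread(interp, 1, ph.p_filesz, f) == ph.p_filesz
            && interp[ph.p_filesz - 1] == '\0')
            result = ELF_DYNAMIC;
        break;
    }
out:
    fclose(f);
    return result;
}
```

Called on a dynamically linked program, elf_interp would be expected to return ELF_DYNAMIC and fill the buffer with the interpreter's path; the exact path varies by distribution and architecture.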
The kernel then transfers control to the userland interpreter, passing program headers and related info in what amounts to a fourth argument to main(), after envp (the auxiliary vector).

There's one interesting twist: one of the segments loaded into RAM (linux-gate.so) does not come from the executable, but is a piece of the kernel mapped into user space. It contains a subroutine that the kernel provides to do a system call; the idea is that this way, the C library does not have to know which calling convention for system calls is supported by the kernel and optimal for the current hardware. The link editor knows nothing about this; only the interpreter knows that the kernel can pass the address of this subroutine together with the program headers.

For more info on the kernel-supplied shared library for system calls, see LWN: How to speed up system calls; LWN: Patch: i386 vsyscall DSO implementation; and LKML: common name for the kernel DSO.

The interpreter interprets the .dynamic section of the dynamic executable. This is a table containing various types of info; if the type is DT_NEEDED, the info is the name of a shared library that is needed to run the executable. Normally, it's the basename. The interpreter searches LD_LIBRARY_PATH for the library and loads the first working version it finds, using a breadth-first search. Once everything is loaded, the interpreter hands over control to main in the executable.

Except that that's not how it really works: the path that glibc uses depends on whether threads are supported, and klibc can function as a PT_INTERP but will not load additional libraries.

The ldd command finds the pathnames of shared libraries used by an executable. This works only for glibc: it invokes the interpreter with the executable as argument plus an environment variable that tells it to print the pathnames rather than load them. For other C libraries, there's no guaranteed correct way to find the path of shared libraries.
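The DT_NEEDED names themselves can be read straight from the file without running the interpreter. The sketch below, in C, maps the file, locates the PT_DYNAMIC segment, and looks up each DT_NEEDED name in the string table that DT_STRTAB points to. The names needed_libs, scan_needed, va_to_off, and NAME_LEN are hypothetical. The code assumes a file with the native word size and byte order, and it yields only the names as recorded in the file, not the paths the interpreter would resolve them to.

```c
#include <elf.h>
#include <fcntl.h>
#include <link.h>   /* ElfW(): picks Elf32_* or Elf64_* for this machine */
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define NAME_LEN 256   /* arbitrary limit chosen for this sketch */

/* Map a virtual address to a file offset using the PT_LOAD headers. */
static size_t va_to_off(const ElfW(Phdr) *ph, size_t n, ElfW(Addr) va)
{
    for (size_t i = 0; i < n; i++)
        if (ph[i].p_type == PT_LOAD && va >= ph[i].p_vaddr
            && va - ph[i].p_vaddr < ph[i].p_filesz)
            return (size_t)(va - ph[i].p_vaddr + ph[i].p_offset);
    return (size_t)-1;
}

/* Collect DT_NEEDED names from an ELF image of `len` bytes at `base`. */
static int scan_needed(const unsigned char *base, size_t len,
                       char names[][NAME_LEN], int max)
{
    const ElfW(Ehdr) *eh = (const ElfW(Ehdr) *)base;
    const ElfW(Phdr) *ph;
    const ElfW(Dyn) *dyn = NULL;
    size_t ndyn = 0, str = (size_t)-1;
    int count = 0;

    if (len < sizeof *eh || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0
        || eh->e_ident[EI_CLASS] != (sizeof(void *) == 8 ? ELFCLASS64 : ELFCLASS32)
        || eh->e_phentsize != sizeof *ph
        || eh->e_phoff > len || eh->e_phnum > (len - eh->e_phoff) / sizeof *ph)
        return -1;
    ph = (const ElfW(Phdr) *)(base + eh->e_phoff);

    for (size_t i = 0; i < eh->e_phnum; i++)
        if (ph[i].p_type == PT_DYNAMIC) {
            if (ph[i].p_offset > len || ph[i].p_filesz > len - ph[i].p_offset)
                return -1;
            dyn = (const ElfW(Dyn) *)(base + ph[i].p_offset);
            ndyn = ph[i].p_filesz / sizeof *dyn;
        }
    /* DT_STRTAB holds the address of the string table the names live in. */
    for (size_t i = 0; i < ndyn && dyn[i].d_tag != DT_NULL; i++)
        if (dyn[i].d_tag == DT_STRTAB)
            str = va_to_off(ph, eh->e_phnum, dyn[i].d_un.d_ptr);

    for (size_t i = 0; i < ndyn && dyn[i].d_tag != DT_NULL; i++) {
        if (dyn[i].d_tag != DT_NEEDED)
            continue;
        /* Reject a missing string table, a bad offset, or a full array. */
        if (str >= len || dyn[i].d_un.d_val >= len - str || count >= max)
            return -1;
        const char *s = (const char *)base + str + dyn[i].d_un.d_val;
        size_t room = len - str - dyn[i].d_un.d_val;
        if (memchr(s, '\0', room) == NULL || strlen(s) >= NAME_LEN)
            return -1;
        strcpy(names[count++], s);
    }
    return count;
}

/* Returns the number of DT_NEEDED entries in `path`, or -1 on error. */
int needed_libs(const char *path, char names[][NAME_LEN], int max)
{
    struct stat st;
    int n = -1, fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        void *m = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (m != MAP_FAILED) {
            n = scan_needed(m, (size_t)st.st_size, names, max);
            munmap(m, (size_t)st.st_size);
        }
    }
    close(fd);
    return n;
}
```

A file without a PT_DYNAMIC segment gives a count of zero, which is what one would expect for a statically linked program.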
Update: ldd also works for another C library, uClibc, unless you disable that support while building the library by unsetting LDSO_LDD_SUPPORT.

Thus, to figure out what goes on the initial RAM image, first try ldd. If that gives an answer, good. Otherwise, use a helper program to find PT_INTERP and DT_NEEDED. If there's only PT_INTERP, good, add it to the image. If there are DT_NEEDED libraries as well, and they have relative rather than absolute pathnames, we can't determine the full path, so don't generate an image.

There are a number of options for building a helper to extract the relevant information from the executable:

Build it in perl. The problem here is that unpacking 64-bit integers is an optional part of the language.

Build a wrapper around objdump or readelf. The drawback is that these programs are not part of a minimal Linux distribution: depending on them in yaird would increase the footprint.

Build a C program using libbfd. This is a library intended to simplify working with object files. Drawbacks are that it adds complexity that is not necessary in our context, since it supports multiple executable formats; furthermore, at least in Debian it is treated as internal to the gcc tool chain, which complicates packaging the tool.

Build a C program based on elf.h. This turns out to be easy to do.

Yaird uses the last approach listed.
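The fallback rule for the case where ldd gives no answer can be written down as a small decision function. The following C sketch is only an illustration of that rule: the enum verdict type, its values, and the function fallback_verdict are hypothetical names, and the inputs are assumed to come from some helper that has already extracted the PT_INTERP string (or NULL) and the DT_NEEDED names.

```c
#include <stddef.h>

/* Possible outcomes; the names are invented for this sketch. */
enum verdict {
    ADD_NOTHING,   /* static executable: no extra files needed */
    ADD_FILES,     /* interpreter and libraries can be added as named */
    GIVE_UP        /* a library has no absolute path: do not build an image */
};

/*
 * Decide what to do with the helper's output when ldd gave no answer.
 * `interp` is the PT_INTERP path or NULL; `needed` holds `n` DT_NEEDED names.
 */
enum verdict fallback_verdict(const char *interp,
                              const char *const needed[], size_t n)
{
    /* A relative DT_NEEDED name cannot be resolved to a full path. */
    for (size_t i = 0; i < n; i++)
        if (needed[i][0] != '/')
            return GIVE_UP;
    return (interp != NULL || n > 0) ? ADD_FILES : ADD_NOTHING;
}
```

With this shape, an executable that names only an interpreter yields ADD_FILES, while one that also needs a library recorded by basename yields GIVE_UP.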