Supporting Shared Libraries
When an executable is added to the image, we want any required shared
libraries to be added automatically. The SharedLibraries
module determines which files are required. This section discusses
the features of kernel and compiler we need to be aware of in order
to do this reliably.
Linux executables today are in ELF format; it is defined in
Generic ELF Specification ELFVERSION,
part of the Linux Standard Base. This is based on part of the System
V ABI: Tool Interface Standard (TIS), Executable and Linking Format
(ELF) Specification.
ELF has consequences in different parts of the system: in
the link editor, which needs to merge ELF object files into ELF
executables; in the kernel (fs/binfmt_elf.c),
which has to place the executable in RAM and transfer control to it;
and in the runtime loader, which is invoked when starting the
application to load the necessary shared libraries into RAM.
The idea is as follows.
Executables are in ELF format, with a type of either
ET_EXEC (executable) or ET_DYN (shared
library; yes, you can execute those). There are other types of
ELF file (core files, for example), but you can't execute them.
These files contain two kinds of headers: program headers and
section headers. Program headers define segments of the file that
the kernel should store consecutively in RAM; section headers define
parts of the file that should be treated by the link editor
as a single unit. Program headers normally point to a group
of adjacent sections.
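
As an illustration of the file types mentioned above, the following
minimal sketch classifies an ELF image held in memory by its e_type
field. The function name elf_kind is hypothetical, and the sketch
assumes the file uses the byte order of the machine running the code.
It relies on the fact that e_type sits directly after the 16-byte
identification block in both 32-bit and 64-bit headers.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: name the ELF file type of the image in buf.
 * Returns NULL if buf does not start with a complete ELF identification.
 * Assumes the file has the same byte order as the host.
 */
const char *elf_kind(const unsigned char *buf, size_t len)
{
    Elf32_Half type;    /* same width and offset in Elf64_Ehdr */

    if (len < EI_NIDENT + sizeof type)
        return NULL;
    if (memcmp(buf, ELFMAG, SELFMAG) != 0)
        return NULL;

    /* e_type follows the e_ident array; copy to avoid alignment issues */
    memcpy(&type, buf + EI_NIDENT, sizeof type);

    switch (type) {
    case ET_REL:  return "ET_REL";   /* object file, input to link editor */
    case ET_EXEC: return "ET_EXEC";  /* executable */
    case ET_DYN:  return "ET_DYN";   /* shared library, also executable */
    case ET_CORE: return "ET_CORE";  /* core file, cannot be executed */
    default:      return "unknown";
    }
}
```

A caller would read at least the first 18 bytes of the file into a
buffer and pass that buffer to elf_kind.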
The program may be linked statically or dynamically (with shared
libraries).
If it's statically linked, the kernel loads relevant segments,
then transfers control to the program's entry point in userland,
which eventually calls main().
If it's dynamically linked, one of the program headers has type
PT_INTERP. It points to a segment that contains
the name of a (static) executable; this executable is loaded in
RAM together with the segments of the dynamic executable.
The kernel then transfers control to the userland
interpreter, passing program headers and related info in the
auxiliary vector: in effect a fourth argument to main(), after envp.
There's one interesting twist: one of the segments loaded
into RAM (linux-gate.so) does not
come from the executable, but is a piece of the kernel mapped
into user space. It contains a subroutine that the kernel
provides to do a system call; the idea is that this way,
the C library does not have to know which calling convention
for system calls is supported by the kernel and optimal for
the current hardware. The link editor knows nothing about
this, only the interpreter knows that the kernel can pass the
address of this subroutine together with the program headers.
For more info on the kernel-supplied shared library for
system calls, see
LWN: How to speed up system calls,
LWN: Patch: i386 vsyscall DSO implementation,
LKML: common name for the kernel DSO.
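
The information the kernel passes along with the program headers can
be inspected from inside a running process. The sketch below assumes
a C library that provides getauxval() (glibc has it since version
2.16); the function names count_own_phdrs and kernel_dso_address are
hypothetical. It counts the program headers of a given type in the
running program and returns the address of the kernel-supplied shared
library, if the kernel provided one.

```c
#include <elf.h>
#include <link.h>       /* ElfW(): picks Elf32_ or Elf64_ types */
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval() */

/*
 * Hypothetical helper: count program headers of the given type in the
 * running executable, using the table address and entry count that
 * the kernel passed along at program start.
 */
size_t count_own_phdrs(ElfW(Word) type)
{
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *) getauxval(AT_PHDR);
    size_t n = getauxval(AT_PHNUM);
    size_t count = 0;

    if (phdr == NULL)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (phdr[i].p_type == type)
            count++;
    return count;
}

/*
 * Hypothetical helper: address of the ELF header of the kernel-supplied
 * shared library, or 0 if the kernel did not map one.
 */
unsigned long kernel_dso_address(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}
```

In a dynamically linked program, count_own_phdrs(PT_INTERP) would be
expected to return 1; in a statically linked one, 0.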
The interpreter interprets the .dynamic section of
the dynamic executable. This is a table containing various types
of info; if the type is DT_NEEDED, the info is the
name of a shared library that is needed to run the executable.
Normally, it's the basename.
The interpreter searches LD_LIBRARY_PATH for the
library and loads the first working version it finds, using a
breadth-first search. Once everything is loaded, the interpreter
hands over control to main in the executable.
Except that that's not how it really works: the path that glibc
uses depends on whether threads are supported, and klibc can
function as a PT_INTERP but will not load additional
libraries.
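
The DT_NEEDED lookup described above can be sketched as follows. This
is a simplified illustration, not the loader's actual code: it assumes
a 64-bit dynamic table that is already in memory and terminated by a
DT_NULL entry, and a string table pointer that the caller has already
located. The function name collect_needed is hypothetical.

```c
#include <elf.h>
#include <stddef.h>

/*
 * Hypothetical helper: walk a DT_NULL-terminated dynamic table and
 * store pointers to the names of needed libraries in out[].
 * For DT_NEEDED entries, d_val is an offset into the string table.
 * Returns the number of names stored (at most max).
 */
size_t collect_needed(const Elf64_Dyn *dyn, const char *strtab,
                      const char **out, size_t max)
{
    size_t n = 0;

    for (; dyn->d_tag != DT_NULL; dyn++)
        if (dyn->d_tag == DT_NEEDED && n < max)
            out[n++] = strtab + dyn->d_un.d_val;
    return n;
}
```

The names found this way are normally basenames such as libc.so.6;
turning them into full paths is the part that depends on the C library
in use.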
The ldd command finds the pathnames
of shared libraries used by an executable. This works
only for glibc: it invokes the interpreter
with the executable as argument plus an environment variable that
tells it to print the pathnames rather than load them. For other
C libraries, there's no guaranteed correct way to find the path of
shared libraries.
Update: ldd also works for another
C library, uClibc, unless you disable that support while building
the library by unsetting LDSO_LDD_SUPPORT.
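
The mechanism behind ldd can be sketched in a few lines. With glibc,
the environment variable in question is LD_TRACE_LOADED_OBJECTS. The
sketch below is a simplification: it executes the program directly
and lets the kernel start the interpreter, rather than invoking the
interpreter explicitly as ldd does. The function name
trace_loaded_objects is hypothetical. Note that with an interpreter
that does not honour the variable, the program simply runs, so this
should only be tried on trusted executables.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start path with LD_TRACE_LOADED_OBJECTS=1 so
 * that the glibc interpreter prints the shared libraries on stdout
 * instead of starting the program.
 * Returns the child's exit status (127 if exec failed), or -1 on error.
 */
int trace_loaded_objects(const char *path)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        char *const argv[] = { (char *) path, NULL };
        char *const envp[] = { "LD_TRACE_LOADED_OBJECTS=1", NULL };

        execve(path, argv, envp);
        _exit(127);     /* only reached if execve() failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```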
Thus, to figure out what goes on the initial RAM image, first try
ldd. If that gives an answer, good.
Otherwise, use a helper program to find PT_INTERP and
DT_NEEDED. If there's only PT_INTERP, good,
add it to the image. If there are DT_NEEDED libraries
as well, and they have relative rather than absolute pathnames,
we can't determine the full path, so don't generate an image.
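
The decision procedure above can be written down as a small function.
This is a sketch under stated assumptions, not yaird's own code: the
names image_plan and plan_libraries are hypothetical, and two cases
the text leaves implicit are filled in as assumptions (absolute
DT_NEEDED paths are added as found, and an executable without
PT_INTERP or DT_NEEDED needs nothing extra).

```c
#include <stddef.h>

enum image_plan {
    PLAN_USE_LDD,    /* ldd gave an answer: use its pathnames */
    PLAN_NOTHING,    /* assumption: static executable, nothing to add */
    PLAN_ADD_FOUND,  /* add PT_INTERP and any absolute DT_NEEDED paths */
    PLAN_REFUSE      /* relative DT_NEEDED: do not generate an image */
};

/*
 * Hypothetical helper: decide what to do for one executable.
 * interp is the PT_INTERP name or NULL; needed[] holds the DT_NEEDED
 * names found by the helper program.
 */
enum image_plan plan_libraries(int ldd_answered, const char *interp,
                               const char *const *needed, size_t n_needed)
{
    if (ldd_answered)
        return PLAN_USE_LDD;

    /* A relative name cannot be resolved without knowing the search path */
    for (size_t i = 0; i < n_needed; i++)
        if (needed[i][0] != '/')
            return PLAN_REFUSE;

    if (interp == NULL && n_needed == 0)
        return PLAN_NOTHING;
    return PLAN_ADD_FOUND;
}
```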
There are a number of options to build a helper to extract the relevant
information from the executable:
Build it in Perl. The problem here is that unpacking 64-bit
integers is an optional part of the language.
Build a wrapper around objdump or
readelf. The drawback is that
these programs are not part of a minimal Linux distribution:
depending on them in yaird would
increase the footprint.
Build a C program using libbfd. This is a library
intended to simplify working with object files. Drawbacks
are that it adds complexity that is not necessary in our
context since it supports multiple executable formats;
furthermore, at least in Debian it is treated as internal
to the gcc tool chain, complicating packaging the tool.
Build a C program based on elf.h.
This turns out to be easy to do.
Yaird uses the last approach listed.
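
To give an impression of what an elf.h based helper involves, here is
a minimal sketch that finds the PT_INTERP name in an ELF image held in
memory. It is not yaird's actual helper: the function name find_interp
is hypothetical, and the sketch handles only 64-bit files in the byte
order of the host.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the interpreter name inside
 * img, or NULL if there is no valid PT_INTERP segment.
 * Limited to 64-bit, host-endian ELF files.
 */
const char *find_interp(const unsigned char *img, size_t len)
{
    Elf64_Ehdr eh;

    if (len < sizeof eh)
        return NULL;
    memcpy(&eh, img, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0
        || eh.e_ident[EI_CLASS] != ELFCLASS64
        || eh.e_phentsize != sizeof(Elf64_Phdr))
        return NULL;

    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        size_t off = eh.e_phoff + i * sizeof ph;

        /* Program header must lie completely inside the image */
        if (off > len || len - off < sizeof ph)
            return NULL;
        memcpy(&ph, img + off, sizeof ph);
        if (ph.p_type != PT_INTERP)
            continue;

        /* The name must lie inside the image and be NUL-terminated */
        if (ph.p_offset >= len || ph.p_filesz == 0
            || ph.p_filesz > len - ph.p_offset
            || img[ph.p_offset + ph.p_filesz - 1] != '\0')
            return NULL;
        return (const char *) (img + ph.p_offset);
    }
    return NULL;
}
```

A caller would read or mmap() the executable and pass the resulting
buffer to find_interp; finding DT_NEEDED entries takes a similar walk,
starting from the PT_DYNAMIC program header.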