Tool Chain

This section discusses which tools are used in implementing yaird and why.

The application is built as a collection of perl modules. Using a scripting language makes consistent error checking and building sane data structures a lot easier than shell scripting. Perl was chosen over python mainly because in Debian perl has 'required' status, while python is only 'standard'.

The code follows some conventions:

Where there are multiple items of a kind, say fstab entries, the perl module implements a class for individual items. All classes share a common base class, Obj, that handles constructor argument validation and offers a place to plug in debugging code. Object attributes are accessed via accessor methods, so that typos in attribute names are caught. Objects have a string method that returns a string version of the object. This string is not guaranteed to be free of binary data.

Where there are multiple items of a kind, the collection is implemented as a module that is not a class. A function all returns a list of all known items, and functions findByXxx retrieve an item whose Xxx attribute has a given value. An init function initializes the collection. It is called automatically upon the first invocation of all or findByXxx. Collections may also have convenience functions findXxxByYyy, which return attribute Xxx given a value for attribute Yyy.

The generated initrd image needs a command interpreter. The choice of command interpreter is determined exclusively by the image generation template. At this point, both the Debian and Fedora templates use the dash shell, for historical reasons only. Presumably busybox could be used to build a smaller image. However, support for initramfs requires a complicated construction involving a combination of mount, chroot and chdir. To do that reliably, nash as used in Fedora seems a more attractive option.
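The class and collection conventions described above can be sketched as follows. This is a Python translation for illustration only, since yaird itself is perl. All names are hypothetical. Perl's all becomes all_entries so that it does not shadow Python's builtin. A sample fstab is embedded rather than reading /etc/fstab, which keeps the sketch self-contained.

```python
"""Sketch of yaird's class/collection conventions, translated to Python.

yaird itself is perl; every name here (Obj, FstabEntry, all_entries,
find_by_mnt, find_dev_by_mnt) is hypothetical.
"""


class Obj:
    """Common base class: validates constructor arguments, hosts a debug hook."""

    ATTRS = ()  # subclasses list the attribute names they require

    def __init__(self, **kwargs):
        unknown = sorted(set(kwargs) - set(self.ATTRS))
        missing = sorted(set(self.ATTRS) - set(kwargs))
        if unknown or missing:
            raise TypeError(
                f"{type(self).__name__}: unknown {unknown}, missing {missing}")
        self._attrs = dict(kwargs)
        self.debug("created " + self.string())

    def debug(self, message):
        """Single place to plug in debugging output; silent by default."""

    def _get(self, name):
        # Every accessor funnels through here, so a misspelled attribute
        # name raises at once instead of quietly yielding None.
        if name not in self.ATTRS:
            raise AttributeError(f"{type(self).__name__} has no attribute {name!r}")
        return self._attrs[name]

    def string(self):
        """String version of the object (not guaranteed free of binary data)."""
        fields = ", ".join(f"{k}={self._attrs[k]}" for k in self.ATTRS)
        return f"{type(self).__name__}({fields})"


class FstabEntry(Obj):
    """One fstab line: the class for an individual item."""

    ATTRS = ("dev", "mnt", "fs_type")

    def dev(self):
        return self._get("dev")

    def mnt(self):
        return self._get("mnt")

    def fs_type(self):
        return self._get("fs_type")


# The collection: plain module-level functions, not a class.
SAMPLE_FSTAB = """\
# device    mount point  type  options   dump pass
/dev/sda1   /            ext3  defaults  0    1
/dev/sda2   /home        ext3  defaults  0    2
proc        /proc        proc  defaults  0    0
"""

_entries = None  # filled in lazily by init()


def init(text=SAMPLE_FSTAB):
    """Parse fstab text into FstabEntry objects."""
    global _entries
    _entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        dev, mnt, fs_type = fields[:3]
        _entries.append(FstabEntry(dev=dev, mnt=mnt, fs_type=fs_type))


def all_entries():
    """perl's 'all': every known item; runs init() on first use."""
    if _entries is None:
        init()
    return list(_entries)


def find_by_mnt(mnt):
    """findByXxx: the item whose mnt attribute equals mnt, or None."""
    for entry in all_entries():
        if entry.mnt() == mnt:
            return entry
    return None


def find_dev_by_mnt(mnt):
    """findXxxByYyy: return attribute dev, given a value for attribute mnt."""
    entry = find_by_mnt(mnt)
    return entry.dev() if entry is not None else None


print(find_dev_by_mnt("/home"))  # /dev/sda2
```

Keeping the lookup functions outside the class means callers never handle a collection object. The lazy init() call hides the moment at which the underlying file is read.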
Documentation is in docbook format, since it is widely supported, supports numerous output formats, has better separation between content and layout than texinfo, and provides better guarantees against malformed HTML than texinfo.

Autoconf

GNU automake is used to build and install the application. 'Building' is perhaps too big a word for adding the location of the underlying modules to the wrapper script. Automake is used for two reasons. It provides packagers with a well-known mechanism for changing installation directories. It also makes it easy for developers to produce a cruft-free and reproducible tarball from the tree extracted from version control.

C Library

The standard C library under Linux is glibc. This is big: 1.2 MB, whereas an alternative implementation, klibc, is only 28 KB. klibc can be so much smaller than glibc because a lot of glibc features, like NIS support, are not relevant for applications that need to do basic things like loading an IDE driver.

There are other small libc implementations: in the embedded world, dietlibc and uClibc are popular. However, klibc was developed specifically to support the initial image. It is intended to be included with the mainline kernel and to allow moving a lot of startup magic out of the kernel into the initial image. See LKML: [RFC] klibc requirements, round 2 for requirements on klibc. The mailing list is the most current source of information.

Recent versions of klibc (1.0 and later) include a wrapper around gcc, named klcc, that compiles a program against klibc. This means yaird does not need to include klibc, but can easily be configured to use klibc rather than glibc. Of course this only pays off if every executable on the initial image uses klibc. Yaird does not have to be extended to support klibc, but it must avoid assumptions about which shared libraries are used. This is discussed in .
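The last point, avoiding assumptions about which shared libraries are used, can be illustrated with a Python sketch. It derives the library list from ldd output instead of hard-coding glibc's libraries. Two details here are assumptions rather than facts about yaird: that the list is obtained via ldd, and the function name shared_libraries. The sample output is typical of a glibc system on x86-64.

```python
import re

# ldd prints either "name => /path (0xADDR)" or "/path (0xADDR)".
ARROW_LINE = re.compile(r"^\s*\S+\s+=>\s+(/\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")
PATH_LINE = re.compile(r"^\s*(/\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def shared_libraries(ldd_output):
    """Return every library path listed in ldd output (hypothetical helper).

    Nothing here is specific to glibc: whatever loader and libraries ldd
    reports are returned, and nothing is added behind the caller's back.
    """
    libraries = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            raise ValueError("missing shared library: " + line.strip())
        match = ARROW_LINE.match(line) or PATH_LINE.match(line)
        if match:
            libraries.append(match.group(1))
        # Other lines, such as the kernel's vDSO entry, name no file
        # that would need copying, so they are skipped.
    return libraries


SAMPLE = """\
\tlinux-vdso.so.1 (0x00007ffd3a5f0000)
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c2a000000)
\t/lib64/ld-linux-x86-64.so.2 (0x00007f1c2a400000)
"""
print(shared_libraries(SAMPLE))
```

Because the function copies only what ldd actually reports, an executable built with klcc would contribute its own library set without any klibc-specific code.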
Template Processing

This section discusses the templates used to transform high-level actions into lines of script in the generated image. These templates are intended to cope with small differences between distributions, for example a shell that is named dash in Debian and ash in Fedora. By processing the output of yaird through a template, we can confine the tuning of yaird for a specific distribution to the template, without having to touch the core code.

One important function of a template library is to enforce a clear separation between program logic and output formatting: there should be no way to put perl fragments inside a template. See StringTemplate for a discussion of what is needed in a templating system, plus a Java implementation.

Let's consider a number of possible templating solutions:

Template Toolkit: widely used, not in perl core distribution, does not prevent mixing of code and templates.

Text::Template: not in perl core distribution, does not prevent mixing of code and templates.

Some XSLT processor: not in core distribution, more suitable for file-to-file transformations than for expanding in-process data; overkill.

HTML-Template: not in perl core distribution, prevents mixing of code and templates, simple, no dependencies, dual GPL/Artistic license. Available in Debian as libhtml-template-perl and in Fedora 2 as perl-HTML-Template. Dropped from Fedora 3, but available via Fedora Extras.

A home-grown templating system: even a simple system such as the HTML-Template module is over 100 KB. We could cut that down by dropping functions we don't immediately need, but the effort to get a tested and documented implementation remains substantial.

The HTML-Template approach is the best match for our requirements, and is therefore used in yaird.

Configuration Parsing

Yaird has a fair number of configuration items: templates containing a list of files and trees, and named shell script fragments with values that span multiple lines.
If future versions of the application are going to be more flexible, the number of configuration items is only going to grow. Somehow this information has to be passed to the application. The options are outlined below.

Configuration as part of the program. Simply hard-code all configuration choices, and structure the program so that the configuration is a well-defined part of the program. The advantage is that no infrastructure is needed. There are two disadvantages: there is no clear boundary at which problems can be reported, and the user must be familiar with the programming language.

AppConfig. A mature perl module that parses configuration files in a format similar to Win32 "INI" files. Widely used, stable, flexible and well documented, with the added bonus that it unifies options given on the command line and in the configuration file. An ideal solution, except that we need a more complex configuration than can conveniently be expressed in INI-file format.

An XML-based configuration format. XML parsers for perl are readily available. The advantage is that it's an industry standard. The disadvantages are that the markup can get very verbose and that support for input validation is limited (XML::LibXML mentions a binding for RelaxNG, but the code is missing, and defining an input format in XML-Schema ... just say no).

YAML. A data serialisation format that is a lot more readable than XML. The disadvantages are that it's not as widely known as XML, that it's an indentation-based language (so confusion over tabs versus spaces can arise), and that support for input validation is completely missing.

A custom-made configuration language, based on Parse::RecDescent, a widely used, mature module for recursive descent parsing in perl.
Using a custom language means we can structure the language to minimise opportunities for mistakes, provide relevant error messages, support complex configuration structures, and easily parse the configuration file into a tree format that is suitable for further processing. The disadvantage is that a custom language is yet another syntax to learn.

Building a recursive descent parser seems the best match for this application.
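The following hand-written recursive descent parser, written in Python, illustrates the technique. Two things distinguish it from yaird's actual approach. It is not Parse::RecDescent, which builds a parser from a grammar description. The configuration syntax it accepts is also hypothetical, not yaird's. It shows only the properties claimed above: one function per grammar rule, error messages that carry a line number, and a tree as output.

```python
import re

# Hypothetical grammar (illustration only, not yaird's real syntax):
#   config    := statement* EOF
#   statement := WORD [WORD] ( STRING | "{" statement* "}" )
TOKEN_RE = re.compile(r'''
      (?P<newline>\n)
    | (?P<skip>[ \t\r]+|\#[^\n]*)
    | (?P<string>"[^"\n]*")
    | (?P<word>[A-Za-z_][\w-]*)
    | (?P<brace>[{}])
    | (?P<error>.)
''', re.VERBOSE)


class ConfigError(Exception):
    pass


def tokenize(text):
    """Return (kind, value, line) tuples; whitespace and comments are dropped."""
    tokens, line = [], 1
    for m in TOKEN_RE.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind == "newline":
            line += 1
        elif kind == "error":
            raise ConfigError(f"line {line}: unexpected character {value!r}")
        elif kind == "string":
            tokens.append(("string", value[1:-1], line))
        elif kind != "skip":
            # braces use the character itself as their kind
            tokens.append((value if kind == "brace" else kind, value, line))
    tokens.append(("eof", "end of file", line))
    return tokens


class Parser:
    """Recursive descent: one method per grammar rule."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def expect(self, kind):
        tok_kind, value, line = self.tokens[self.pos]
        if tok_kind != kind:
            raise ConfigError(f"line {line}: expected {kind}, found {value!r}")
        self.pos += 1
        return value

    def config(self):
        # config := statement* EOF
        body = self.statements(until="eof")
        self.expect("eof")
        return body

    def statements(self, until):
        body = []
        while self.peek() != until:
            body.append(self.statement())
        return body

    def statement(self):
        # statement := WORD [WORD] ( STRING | "{" statement* "}" )
        node = {"keyword": self.expect("word"), "label": None}
        if self.peek() == "word":
            node["label"] = self.expect("word")
        if self.peek() == "string":
            node["value"] = self.expect("string")
        else:
            self.expect("{")
            node["children"] = self.statements(until="}")
            self.expect("}")
        return node


EXAMPLE = """
# comments run to the end of the line
template debian {
    file "/bin/dash"
    tree "/lib/modules"
}
"""
print(Parser(EXAMPLE).config())
```

A mistake such as an unclosed block is reported against a specific line, for example `line 3: expected word, found 'end of file'`. This is the kind of error message that is hard to get from a generic INI or YAML loader.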