Supporting NFS Root

It is possible to use an NFS share rather than a local disk as the root device; this is (obviously) useful for diskless terminals, but it can also come in handy for recovery. Examples of projects using NFS root for diskless work are LTSP, Lessdisks and Stateless Linux. In these projects, the initial boot image comes with the distribution and must be sufficiently generic to support a wide range of hardware; in particular, it must probe for different network cards. For yaird, we'll focus on recovery use, where the initial boot image is tailored for a single computer.

Although in principle the kernel and initial boot image for an NFS root system can be stored on a local disk, it's more common to have them loaded over the network with TFTP. This means you'll need a boot loader that can work over the network, such as pxelinux. This takes place before the initial boot image takes over; we won't dive into the details here.

There are a number of issues that make it impossible to automatically determine exactly what is needed to do a network boot:

- Not all interfaces are suitable for booting: think of loopback devices, IPsec tunnels, 802.1Q endpoints.
- Interfaces may be renamed by udev; thus there is no link between the name while running yaird and the name while running the initial boot image.
- Once the system is running, there is no way to determine how an interface got its IP address: it could be RARP, DHCP or static.
- An NFS share in /etc/fstab contains a hostname and directory, with no portable indication of how that name is resolved to an IP address, whether that IP address will be unchanged during the next reboot, and whether the route to that IP address will stay unchanged.

This means we cannot determine how to mount the NFS root using only information that is readily available on the running system: we'll need a hint. Rather than give that hint in the form of yaird configuration options, we will use the kernel command line.
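
To make the idea of a kernel command line hint concrete, here is a minimal Python sketch that picks the NFS-related parameters out of a command line string. The parameter names ip=, nfsaddrs= and nfsroot= are the ones this document names; the function names, the sample address and the sample path are made up for illustration, and this is not how yaird or trynfs parse the command line.

```python
# Parameter names come from this document; everything else is illustrative.
NFS_PARAMS = ("ip", "nfsaddrs", "nfsroot")


def nfs_hints(cmdline):
    """Return the NFS-related name=value pairs found on a kernel command line."""
    hints = {}
    for word in cmdline.split():
        name, sep, value = word.partition("=")
        # Only keep words of the form name=value for the names we care about.
        if sep and name in NFS_PARAMS:
            hints[name] = value
    return hints


def wants_nfs_root(hints):
    """True if ip= or its alias nfsaddrs= was given.

    This mirrors the rule stated in this document for when the NFS step
    is attempted; it is not taken from the trynfs source.
    """
    return "ip" in hints or "nfsaddrs" in hints


# Hypothetical command line: the address and path are placeholders.
example = "ro ip=dhcp nfsroot=192.0.2.1:/srv/nfsroot quiet"
hints = nfs_hints(example)
print(hints)
print(wants_nfs_root(hints))
```

On a running Linux system, the string to parse could be read from /proc/cmdline instead of the hard-coded example.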

The NFS part of the boot process takes place after loading of keyboard drivers and before switching to the final root. It has the following phases:

1. Load device drivers for every interface that is backed by hardware: /sys/class/net/*/device.
2. Load protocols: nfs for file sharing (this implies lockd and sunrpc), and af_packet for raw Ethernet, needed for DHCP.
3. Configure interfaces: get an IP address, netmask, broadcast, gateway. As a side effect, get hostname, dns, rootserver, rootpath.
4. Mount the NFS root.

The last two steps are done by a single program, trynfs. This is based on the klibc components ipconfig and nfsmount. This program is invoked only if the kernel command line parameter ip= (or its alias nfsaddrs=) is set. The kernel parameters ip=, nfsaddrs=, nfsroot= are passed as arguments to trynfs.

Earlier versions of yaird had a command line option "--nfs" to enable NFS code generation. Starting with version 0.0.11, this option is no longer available. Instead, write a configuration file based on Default.cfg that uses the 'nfsstart' template to get an IP address and mount a root file system. The command line option was dropped because there are more ways to use NFS than can be expressed with a simple command line option: some people need only a driver for a specific card, others need lots of network drivers; you may or may not want to use a local drive as backup if no network is available. Using a configuration file makes it possible to tune the generated image exactly for the situation at hand.

NFS Pitfalls

Yaird can get the system to a state where init is running from an NFS mounted root device, but that is not always sufficient to get a reliable system: the init scripts will also need to be written to work well in an NFS mounted environment. This section discusses some potential problems.

The Linux version of NFSv4 (Working Group, Linux reference implementation) has a new channel of communication between the kernel and user space: rpc_pipefs.
This is normally mounted on /var/lib/nfs/rpc_pipefs, and is used to let a user space daemon do locking and Kerberos on behalf of the kernel.

The rpc_pipefs support on a machine can interfere with yaird. As an example, in Fedora, /etc/modprobe.conf.dist has an 'install' line for module 'sunrpc' that automatically mounts the rpc_pipefs filesystem when the module is loaded. This means the filesystem is not mounted if the sunrpc module happens to be compiled into the kernel; it also can't be mounted if sunrpc is loaded from the initial boot image, since there is no /var/lib/nfs/rpc_pipefs yet to mount it on. When yaird sees such an install line, it can no longer determine what should go on the initial boot image and terminates. The workaround is to remove the 'install' line from modprobe.conf and to do the mounting in an /etc/init.d script before the rpc.gssd and rpc.statd daemons are started.

Note that using Kerberos with an NFS mounted root is of questionable value: Kerberos relies on a secret file on the root file system to guarantee the security of NFS, and if that secret file is on an NFS file system that is itself not protected by Kerberos, the guarantee loses value.

Another potential problem is dhclient, a tool to configure a network interface with DHCP. This can call a user script to manage DHCP state changes, and on FC4, that script happens to stop and start the interface to get it to a known state. Since the script itself is accessed over NFS via the interface, the stopping works, but the starting doesn't ... By using a fixed IP address you avoid this problem, but that is not a generally applicable solution.
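
For the 'install' line problem described above, it can help to check in advance whether a modprobe configuration contains such a line for sunrpc. The following Python sketch shows one way to do that. It is a simplified illustration, not yaird's own check: the function name is made up, the sample configuration is invented rather than copied from Fedora, and the sketch ignores backslash line continuations and the equivalence of '-' and '_' in module names.

```python
def install_lines(config_text, module="sunrpc"):
    """Return the 'install' lines for `module` in modprobe-style config text.

    Simplified: comment lines start with '#'; continuation lines and
    dash/underscore aliases of the module name are not handled.
    """
    found = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        # An install line has the form: install <module> <command...>
        if len(words) >= 3 and words[0] == "install" and words[1] == module:
            found.append(line)
    return found


# Invented sample configuration, for illustration only.
sample = """\
# network card
alias eth0 e1000
install sunrpc /sbin/modprobe --ignore-install sunrpc && /bin/mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs
"""

for line in install_lines(sample):
    print("found:", line)
```

To check a real system, the text of the relevant configuration file (for example /etc/modprobe.conf.dist, as mentioned above) would be read and passed in place of the sample.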