Supporting NFS Root
It is possible to use an NFS share rather than a local disk
as root device; this is (obviously) useful for diskless terminals,
but it can also come in handy for recovery.
Examples of projects using NFS root for diskless work
are
LTSP,
Lessdisks and
Stateless Linux.
In these projects, the initial boot image comes with the distribution
and it must be sufficiently generic to support a wide range of
hardware; in particular it must probe for different network
cards. For yaird, we'll focus on recovery use, where the initial
boot image is tailored for a single computer.
Although in principle the kernel and initial boot image for an NFS root
system can be stored on a local disk, it's more common to have them
loaded over the network with TFTP. This means you'll need a boot loader
that can work over the network, such as pxelinux.
This takes place before the initial boot image takes over;
we won't dive into the details here.
There are a number of issues that make it impossible to automatically
determine exactly what is needed to do a network boot:
Not all interfaces are suitable for booting: think of
loopback devices, IPsec tunnels, 802.1Q endpoints.
Interfaces may be renamed by udev;
thus there is no link between the name while running
yaird and the name while
running the initial boot image.
Once the system is running, there is no way to determine
how an interface got its IP address: it could have come from
RARP, DHCP or a static configuration.
An NFS share in /etc/fstab contains
a hostname and directory, with no portable indication of how
that name is resolved to an IP address, whether that IP
address will be unchanged during the next reboot and whether
the route to that IP address will stay unchanged.
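To make the last point concrete, here is a minimal Python sketch (not part of yaird; the function name nfs_root_from_fstab is hypothetical) that extracts the NFS root entry from fstab text. All it can recover is a server name and an export path; nothing in the entry says how the name resolves or which route reaches it.

```python
def nfs_root_from_fstab(fstab_text):
    """Return (server, export) for an NFS root entry, or None if there is none.

    Simplification: assumes the "server:/path" form and does not handle
    bracketed IPv6 addresses such as "[fe80::1]:/path".
    """
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue  # malformed entry
        spec, mountpoint, fstype = fields[0], fields[1], fields[2]
        if mountpoint == "/" and fstype in ("nfs", "nfs4") and ":" in spec:
            server, export = spec.split(":", 1)
            # This is all fstab tells us: a name and a directory.
            return server, export
    return None


if __name__ == "__main__":
    example = "fileserver:/srv/roots/client1  /  nfs  defaults  0 0\n"
    print(nfs_root_from_fstab(example))  # ('fileserver', '/srv/roots/client1')
```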
This means we cannot determine how to mount the NFS root using
only information that is readily available on the running system:
we'll need a hint. Rather than giving that hint in the form of
yaird configuration options, we will
use the kernel command line.
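Presumably the hint follows the ip= syntax of the kernel's own NFS root support, which the klibc ipconfig component that trynfs is based on also accepts: colon-separated fields for client address, server address, gateway, netmask, hostname, device and autoconfiguration method, where the method may also appear on its own, as in ip=dhcp. The following is a minimal Python sketch of splitting such a value, assuming only those first seven fields matter; parse_ip_param is a hypothetical name, and this is not how trynfs itself is written.

```python
# Field order of the kernel's ip= parameter (first seven fields only).
IP_FIELDS = ("client_ip", "server_ip", "gw_ip", "netmask",
             "hostname", "device", "autoconf")

# Values that may appear alone, e.g. "ip=dhcp".
AUTOCONF_METHODS = {"off", "none", "on", "any", "dhcp", "bootp", "rarp", "both"}


def parse_ip_param(value):
    """Split an ip= value into named fields; missing fields are empty strings."""
    fields = dict.fromkeys(IP_FIELDS, "")
    if ":" not in value and value in AUTOCONF_METHODS:
        fields["autoconf"] = value  # shorthand form: only the method is given
        return fields
    # zip() stops at the end of IP_FIELDS, so extra trailing fields are ignored.
    for name, part in zip(IP_FIELDS, value.split(":")):
        fields[name] = part
    return fields


if __name__ == "__main__":
    static = parse_ip_param("192.168.1.10:192.168.1.1:192.168.1.254:"
                            "255.255.255.0:client1:eth0:off")
    print(static["device"], static["autoconf"])   # eth0 off
    print(parse_ip_param("dhcp")["autoconf"])     # dhcp
```

Note that the device field names the interface as the initial boot image sees it, which need not match its name on the running system.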
The NFS part of the boot process takes place after the keyboard
drivers are loaded and before switching to the
final root. It has the following phases:
Load device drivers for every interface that is backed
by hardware: /sys/class/net/*/device.
Load protocols:
nfs for file sharing (this implies lockd and sunrpc),
and af_packet for raw Ethernet frames, needed for DHCP.
Configure interfaces: get an IP address, netmask, broadcast address
and gateway. As a side effect, obtain the hostname, DNS server, root
server and root path.
Mount the NFS root.
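The first phase relies on interfaces backed by hardware having a device link under /sys/class/net, while loopback, tunnel and VLAN interfaces normally do not. A minimal Python sketch of that test, plus a lookup of the driver module through the device/driver/module link; the function names are hypothetical, and the sketch assumes the sysfs layout of 2.6-era and later kernels.

```python
import os

SYSFS_NET = "/sys/class/net"


def hardware_interfaces(sysfs_net=SYSFS_NET):
    """Names of interfaces with a 'device' link, i.e. backed by hardware."""
    return sorted(name for name in os.listdir(sysfs_net)
                  if os.path.exists(os.path.join(sysfs_net, name, "device")))


def driver_module(iface, sysfs_net=SYSFS_NET):
    """Name of the module driving iface, or None if sysfs shows none.

    A missing link can mean no driver is bound or the driver is built in.
    """
    link = os.path.join(sysfs_net, iface, "device", "driver", "module")
    if not os.path.islink(link):
        return None
    # The link points at /sys/module/<name>; its basename is the module name.
    return os.path.basename(os.path.realpath(link))
```

On a running system, iterating over hardware_interfaces() and calling driver_module() on each name gives the list of network driver modules an initial boot image would need to load.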
The last two steps are done by a single program,
trynfs. This is based on the klibc
components ipconfig and
nfsmount.
This program is only invoked if the kernel command line parameter
ip= (or its alias nfsaddrs=) is set. The kernel parameters ip=,
nfsaddrs=, nfsroot= are passed as arguments to
trynfs.
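The invocation rule can be sketched as: scan the kernel command line, keep the ip=, nfsaddrs= and nfsroot= words, and run trynfs only if an address parameter is present. This is a minimal Python sketch under stated assumptions: it splits on whitespace and ignores the quoting rules the kernel applies, the function name trynfs_args is hypothetical, and it keeps parameters in command line order, which may not be the order trynfs actually expects.

```python
NFS_PARAMS = ("ip", "nfsaddrs", "nfsroot")
ADDRESS_PARAMS = ("ip", "nfsaddrs")  # nfsaddrs= is an alias for ip=


def trynfs_args(cmdline):
    """Return NFS-related kernel parameters, or None if trynfs should not run."""
    args = []
    for word in cmdline.split():  # note: no handling of quoted values
        key, sep, _ = word.partition("=")
        if sep and key in NFS_PARAMS:
            args.append(word)
    if not any(a.partition("=")[0] in ADDRESS_PARAMS for a in args):
        return None  # neither ip= nor nfsaddrs= given: skip NFS entirely
    return args
```

On a running system the input would be the contents of /proc/cmdline; for example, "root=/dev/nfs nfsroot=10.0.0.1:/srv/root ip=dhcp ro" yields ["nfsroot=10.0.0.1:/srv/root", "ip=dhcp"].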
Earlier versions of yaird had a command
line option "--nfs" to enable NFS code generation. Starting with
version 0.0.11, this option is no longer available. Instead, write
a configuration file based on Default.cfg that
uses the 'nfsstart' template to get an IP address and mount a root
file system. The command line option was dropped because
there are more ways to use NFS than can be expressed with a simple
command line option: some people need only a driver for a specific
card, others need lots of network drivers; you may or may not want
to use a local drive as backup if no network is available; using
a configuration file makes it possible to tune the generated image
exactly for the situation at hand.
NFS Pitfalls
Yaird can get the system to a state
where init is running from an NFS mounted root device, but that
is not always sufficient to get a reliable system: the init
scripts will also need to be written to work well in an NFS
mounted environment. This section discusses some potential
problems.
The Linux version of NFSv4 (see the NFSv4
Working Group and the Linux
reference implementation)
has a new channel of communication between the kernel and user
space: rpc_pipefs. This is normally mounted on
/var/lib/nfs/rpc_pipefs, and is used to
let a user space daemon do locking and Kerberos on behalf of the
kernel.
The rpc_pipefs support on a machine can interfere with
yaird. As an example, in Fedora,
/etc/modprobe.conf.dist has an 'install'
line for module 'sunrpc' that automatically mounts the
rpc_pipefs filesystem when the module is loaded. This means
the filesystem is not mounted if the sunrpc module happens
to be compiled into the kernel; it also can't be mounted if
sunrpc is loaded from the initial boot image, since there is no
/var/lib/nfs/rpc_pipefs yet to mount it on.
When yaird sees such an install line,
it can no longer determine what should go on the initial boot
image and terminates.
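The condition that stops yaird can be illustrated with a small Python sketch that looks for install commands for a given module in modprobe configuration text. This is not yaird's own code; the function name install_commands is hypothetical, and the sample install line below is illustrative rather than copied from Fedora.

```python
def install_commands(conf_text, module):
    """Return the commands of all 'install <module> ...' lines in conf_text."""
    # modprobe treats '-' and '_' in module names as equivalent.
    wanted = module.replace("-", "_")
    # Join backslash-continued lines into single logical lines.
    logical = conf_text.replace("\\\n", " ")
    commands = []
    for line in logical.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        words = line.split(None, 2)
        if (len(words) >= 2 and words[0] == "install"
                and words[1].replace("-", "_") == wanted):
            commands.append(words[2] if len(words) == 3 else "")
    return commands


if __name__ == "__main__":
    conf = ("alias eth0 e1000\n"
            "install sunrpc /sbin/modprobe --ignore-install sunrpc && \\\n"
            "    mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs\n")
    if install_commands(conf, "sunrpc"):
        print("install line for sunrpc found; cannot predict its effect")
```

An install command is an arbitrary shell command, which is why a tool building a boot image cannot know in general which modules or files it will need.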
The workaround is to remove the 'install' line from
modprobe.conf and to do the mounting
in an /etc/init.d script before the
rpc.gssd and
rpc.statd daemons are started.
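The core of such an init script is: if rpc_pipefs is not already mounted, mount it, and only then start the daemons. It is sketched here in Python to keep one language for the examples in this section; a real /etc/init.d script would normally be a shell script doing the same check. The function names are hypothetical, and the sketch assumes root privileges, an available mount(8), and that sunrpc is already loaded or built in.

```python
import os
import subprocess

PIPEFS_DIR = "/var/lib/nfs/rpc_pipefs"


def is_mounted(mounts_text, mountpoint, fstype):
    """True if mounts_text (in /proc/mounts format) lists fstype at mountpoint."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[1] == mountpoint and fields[2] == fstype:
            return True
    return False


def ensure_rpc_pipefs(mountpoint=PIPEFS_DIR):
    """Mount rpc_pipefs on mountpoint unless it is already there.

    Returns True if a mount was performed. Needs root and mount(8).
    """
    with open("/proc/mounts") as f:
        if is_mounted(f.read(), mountpoint, "rpc_pipefs"):
            return False
    os.makedirs(mountpoint, exist_ok=True)
    subprocess.run(["mount", "-t", "rpc_pipefs", "rpc_pipefs", mountpoint],
                   check=True)
    return True
```

The script would call ensure_rpc_pipefs() first and start rpc.gssd and rpc.statd only after it returns; because the check is repeated on every run, the script is safe to invoke more than once.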
Note that using Kerberos with an NFS mounted root is of
questionable value: Kerberos relies on a secret file on the root
file system to guarantee the security of NFS, and if that secret
file is on an NFS file system that is itself not protected by
Kerberos, the guarantee loses value.
Another potential problem is dhclient, a tool to configure a
network interface with DHCP. It can call a user script
to handle DHCP state changes, and on Fedora Core 4 (FC4) that script
happens to stop and start the interface to bring it into a known state.
Since the script itself is accessed over NFS via that interface,
stopping the interface works, but starting it again does not: once the
interface is down, the commands needed to bring it back up can no longer
be read from the NFS root. Using a fixed IP address avoids this problem,
but that is not a generally applicable solution.