Node initialization process

This document describes a proposal for the initialization process of CONFINE nodes. The process supports storing several node images (which include CONFINE software) in a node, each with a different name (which may be autogenerated or reusable). One image is always selected as the default node image; it is expected to be stable and working, so the node boots it unless another image is explicitly requested for the next boot. Any other stored node image can be booted as well, but rebooting the node afterwards boots the default, safe node image again, until another node image is explicitly selected as the new default. Thus rebooting the node after booting a bad node image (explicitly, forcibly or after a system crash) brings the node back to a stable configuration.

The process requires an initial OpenWrt system which is only used as an intermediate and maintenance system. Its kernel includes as few hardware drivers as possible to reduce hardware initialization issues with the subsequent kernel loaded via kexec (see below), which is used to chainload the node image.

This specification is inspired by the paper Secure Remote Management and Software Distribution for Wireless Mesh Networks, and it applies to nodes of a platform supporting OpenWrt and kexec.

Storage

This mechanism relies on the existence of an init partition (and a boot partition if the boot loader requires it) as described in Node storage layout to host the initial OpenWrt system. It also relies on the existence of a system partition to host the different node images as indicated below:

confine/image-v1/
Contains the different stored node images, arranged according to the format defined in version 1 of this specification.
confine/image-v1/NAME/
confine/image-v1/NAME/kernel
confine/image-v1/NAME/rootfs
confine/image-v1/NAME/overlay/
The image directory, containing the kernel and read-only root filesystem (e.g. Squashfs) files that make up the stored node image called NAME, plus the directory used as a writable overlay for storing changes made to the node's root filesystem when using that node image. The special names none, default and next are not allowed.
confine/image-v1/default -> NAME1
The default image link, a symbolic link pointing to the image directory of the default node image. There must always be one stored node image selected as the default one.
confine/image-v1/next -> IMAGE
The next image link, a symbolic link pointing to the image directory of the node image to be used on the next boot. IMAGE may be either default or some image NAME.
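
The layout above can be illustrated with a small Python sketch. It is only an illustration, not part of the CONFINE software: the helper names (is_valid_image_name, image_paths) are hypothetical, and the rule that a name must be a single path component is an assumption, since the specification only lists the three reserved names.

```python
from pathlib import PurePosixPath

# Names reserved by the layout; they can never be used for a stored image.
RESERVED_NAMES = {"none", "default", "next"}

# Base directory of version 1 images, relative to the system filesystem root.
IMAGE_BASE = PurePosixPath("confine/image-v1")


def is_valid_image_name(name: str) -> bool:
    """Return True if `name` may be used as an image directory name."""
    # Assumption: besides the reserved names, a name must be a single,
    # non-empty path component. The specification only lists reserved names.
    if not name or name in RESERVED_NAMES:
        return False
    return "/" not in name and name not in (".", "..")


def image_paths(name: str) -> dict:
    """Return the paths that make up the stored node image `name`."""
    if not is_valid_image_name(name):
        raise ValueError(f"illegal image name: {name!r}")
    base = IMAGE_BASE / name
    return {
        "kernel": base / "kernel",
        "rootfs": base / "rootfs",
        "overlay": base / "overlay",
    }
```

For example, image_paths("stable-1")["kernel"] yields confine/image-v1/stable-1/kernel, while image_paths("next") raises ValueError.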

Normal boot procedure

In the normal state the next image link (confine/image-v1/next in the system filesystem) points to the default image link (default), which in turn points to the image directory of some stored node image (NAME). When booting:

  1. The boot loader boots the initial OpenWrt system.
  2. The initial system mounts the system filesystem and resolves the next image link to know which node image it has to boot next (i.e. NAME, pointed to by default).
  3. The initial system always makes the next image link point to the default image link (default) regardless of the node image resolved in step 2. This makes the default node image boot next in case of trouble.
  4. The initial system boots itself once again but using the kernel of the node image resolved in step 2 (NAME/kernel), which is loaded via kexec without rebooting. The node image to be used next (NAME) is passed as a kernel argument.
  5. The initial system mounts the system filesystem and looks up the node image passed in as a kernel argument (NAME). It then loop-mounts that image's root filesystem and bind-mounts its overlay directory (which is created if missing) somewhere under it, to be used by the node image to create an overlay filesystem, then continues boot.
  6. Afterwards, the node image's boot process may read persistent config and state files (see Node persistent data) and mount the data filesystem (see Node storage layout).

Filesystem labels can be used to locate the different partitions. See Directories and mount procedure for more details on the boot process.
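
Steps 2 and 3 of the procedure can be sketched in Python as follows. This is a minimal sketch and not the initial system's actual preinit code: the function names, the hop limit and the temporary link name are assumptions made for illustration, and links are assumed to hold a bare name such as default or NAME.

```python
import os

RESERVED = {"none", "default", "next"}


def resolve_image(base: str, link: str = "next", max_hops: int = 8):
    """Follow `link` under `base` until a real image name is found.

    Returns the image name, or None if resolution fails (missing link,
    dangling target, reserved name, or too many hops).
    """
    current = link
    for _ in range(max_hops):
        path = os.path.join(base, current)
        if os.path.islink(path):
            # Assumption: the link target is a bare name inside `base`.
            current = os.path.basename(os.readlink(path).rstrip("/"))
            continue
        if os.path.isdir(path) and current not in RESERVED:
            return current
        return None
    return None  # hop limit reached, e.g. a link loop


def reset_next_to_default(base: str) -> None:
    """Make the next image link point to the default image link (step 3)."""
    tmp = os.path.join(base, ".next.tmp")  # hypothetical temporary name
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink("default", tmp)
    # Renaming over the old link replaces it in a single step.
    os.replace(tmp, os.path.join(base, "next"))
```

Resolving before resetting matters: the image chosen for this boot is remembered first, and only then is the next image link returned to default so that any later reboot falls back to the safe image.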

Running a new node image

To install a new node image its kernel and root filesystem images are stored in the node's system filesystem. To boot the new image, the next image link (confine/image-v1/next) is pointed to its image directory (confine/image-v1/NAME) and the node is rebooted. The boot procedure is exactly the same as the normal one, with the exception that in step 2 the next image link points straight to the image directory of the new node image.

If the boot process or normal operation of the new image happens to crash or hang, on (forced) reboot the node will boot the old, default node image because the next image link is pointing to the default image link. If the node administrator is satisfied with the new node image, it can be made the default one by making the default image link point to the image directory of this image.
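
The two operations described here (trying an image once, then promoting it) amount to repointing one of the two links. The following Python sketch shows one possible way to do that; the function names and the temporary link name are hypothetical, and `base` stands for the mounted confine/image-v1 directory.

```python
import os


def _require_image(base: str, name: str) -> None:
    """Raise unless `name` is a real image directory (not a link)."""
    path = os.path.join(base, name)
    if os.path.islink(path) or not os.path.isdir(path):
        raise FileNotFoundError(f"no such node image: {name}")


def point_link(base: str, link: str, target: str) -> None:
    """Replace `base/link` with a symbolic link to `target` in one step."""
    tmp = os.path.join(base, f".{link}.tmp")  # hypothetical temporary name
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, os.path.join(base, link))


def try_image(base: str, name: str) -> None:
    """Boot `name` on the next boot only; the default is left untouched."""
    _require_image(base, name)
    point_link(base, "next", name)


def promote_image(base: str, name: str) -> None:
    """Make `name` the default image once it has proven to work."""
    _require_image(base, name)
    point_link(base, "default", name)
```

Because try_image never touches the default image link, a crash of the new image still leads back to the old default on the following reboot.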

To remove a node image that is not currently running, its image directory can be deleted. To discard the changes made to a node image's filesystem, its overlay directory can be emptied or removed.

Booting to the maintenance shell

After resolving the next or default image link in step 2 of the boot procedure, the node sets the next image link to default (step 3). If the link resolution in step 2 failed, the node does not chainload any image and instead boots the initial system to a maintenance root shell with default connectivity. Thus, to force booting to this environment one can simply remove the next or default image links, or make them point to a non-existing image directory. Conventionally, the next image link is pointed to none (which is an illegal image name) to signal that this setting has been explicitly set by a node administrator.

If the node goes into maintenance mode but no activity is detected after a while, the mode times out and the node is rebooted automatically (which results in it booting the default image).

Boot procedure flowchart

Fig. 1: The boot procedure of a CONFINE node

Figure 1 shows a flowchart of the different steps followed by a node during its boot. As a summary, after a node is powered on or rebooted it always loads the initial OpenWrt system. Then, according to the value of the confine_image kernel argument, it either:

  • (confine_image==(missing)|"next"|"default", left branches) loads a node image's kernel and runs the initial system again with the new kernel using kexec and confine_image=NAME.
  • (confine_image=="none", middle branch) runs the initial system's init and spawns a maintenance root shell with default connectivity. After a period of inactivity, the node is rebooted.
  • (confine_image==NAME, right branch) prepares and runs the normal boot procedure of the node image with the given NAME and starts normal node operation until it is rebooted.
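
The three branches above can be expressed as a small dispatch on the kernel command line. This Python sketch is illustrative only: the function names and the branch labels it returns are made up for the example, and treating an empty confine_image= value like a missing argument is an assumption.

```python
def parse_confine_image(cmdline: str) -> str:
    """Return the value of confine_image in a kernel command line.

    A missing argument is treated as "next", as in the flowchart.
    """
    for token in cmdline.split():
        key, _, value = token.partition("=")
        # Assumption: an empty value is handled like a missing argument.
        if key == "confine_image" and value:
            return value
    return "next"


def boot_branch(cmdline: str) -> str:
    """Name the flowchart branch taken for the given command line."""
    image = parse_confine_image(cmdline)
    if image in ("next", "default"):
        return "kexec"        # left branches: load the node image's kernel
    if image == "none":
        return "maintenance"  # middle branch: maintenance root shell
    return "node-image"       # right branch: boot the image called NAME
```

On a running system the command line would be read from /proc/cmdline; here it is passed as a string so the logic can be exercised anywhere.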

Programs

confine-node-image

This program manages the node images present in the node's storage. It offers the following commands:

help
Print program help to standard output and exit successfully.
list
List the stored node images.

For each node image stored in the node, a line is printed showing the name of the node image. A field containing the string default is appended to the line of the node image selected as the default one. Fields in the line are whitespace-separated.
install KERNEL ROOTFS
Install KERNEL and ROOTFS as a new node image and print its name.

KERNEL is a Linux kernel file to be copied as kernel, and ROOTFS is a root filesystem file to be copied as rootfs, both in the image directory confine/image-v1/NAME in the node's system filesystem. An empty overlay directory for the new node image is created as overlay in the same directory.

If any of the files fails to be copied, the files already copied are deleted and the program returns an error code.
copy [--reset] [--force] NAME
Create a copy of the node image called NAME and print its name.

The copy of the node image includes its kernel and root filesystem, as well as its overlay directory if the --reset option is not provided (otherwise an empty directory is created). The name of the new node image is different from that of the original one.

Copying the running node image may result in corrupt data in the new node image, so this is only allowed if the --force option is provided.

If there is no node image called NAME, the program returns an error code. If any file fails to be copied, the files already copied are deleted and the program returns an error code.
reset NAME
Reset the node image called NAME.

The overlay directory of the image called NAME is emptied. Resetting the running node image is not allowed, but a reset copy of it can be created instead.

If there is no node image called NAME, the program returns an error code.
delete NAME
Delete the node image called NAME.

Deleting the running node image or the default one is not allowed.
If there is no node image called NAME, the program returns an error code.
select-default NAME
Select the node image called NAME as the default one.

If there is no node image called NAME, the program returns an error code.
get-next-boot
Print what node image is to be used on the next boot.

The printed value is either default or the name of a node image.

The node image selected after booting a node is always the default one until explicitly set to a different one.
set-next-boot IMAGE
Select what node IMAGE to use on the next boot.

IMAGE must be either default, none or the name of a node image. If IMAGE is none the node is configured to boot to the maintenance root shell provided by the initial system.

If IMAGE is a name but there is no such node image, the program returns an error code.
This program works by mounting the system filesystem, operating on node images and overlays, and finally unmounting that filesystem.
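
As an illustration of the behaviour specified for two of the commands, list and install, the following Python sketch operates on an already mounted confine/image-v1 directory (`base`). It is not the real confine-node-image program: the function names are hypothetical, and the random naming scheme is an assumption, since the specification does not say how names are autogenerated.

```python
import os
import shutil
import uuid


def list_images(base: str) -> list:
    """Return the output lines of `list`, one per stored node image."""
    default_path = os.path.join(base, "default")
    default = None
    if os.path.islink(default_path):
        default = os.path.basename(os.readlink(default_path).rstrip("/"))
    lines = []
    for entry in sorted(os.listdir(base)):
        path = os.path.join(base, entry)
        if os.path.islink(path) or not os.path.isdir(path):
            continue  # skip the default/next links and stray files
        lines.append(f"{entry} default" if entry == default else entry)
    return lines


def install_image(base: str, kernel: str, rootfs: str) -> str:
    """Copy `kernel` and `rootfs` into a new image directory under `base`.

    Returns the new image name. On a copy failure the partially written
    image directory is removed and the error is re-raised.
    """
    # Assumption: names are derived from a random identifier.
    name = "img-" + uuid.uuid4().hex[:8]
    image_dir = os.path.join(base, name)
    os.makedirs(image_dir)
    try:
        shutil.copyfile(kernel, os.path.join(image_dir, "kernel"))
        shutil.copyfile(rootfs, os.path.join(image_dir, "rootfs"))
        os.mkdir(os.path.join(image_dir, "overlay"))
    except OSError:
        shutil.rmtree(image_dir, ignore_errors=True)
        raise
    return name
```

Removing the whole image directory on failure is one simple way to satisfy the requirement that no partially copied image is left behind.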

Directories and mount procedure

Initial OpenWrt system's preinit (first pass):

  1. /mnt/sys: The node's system filesystem is mounted here (read-only).
  2. The node image to boot next is obtained from the confine_image=IMAGE kernel argument in /proc/cmdline (missing equals next).
    1. If IMAGE == none, OpenWrt's init is run and the boot process finishes.
    2. If IMAGE == next|default, the symbolic link /mnt/sys/confine/image-v1/IMAGE is recursively resolved until the name of the node image /mnt/sys/confine/image-v1/NAME (where NAME != none|default|next) is found.
  3. The default image is selected as the next image:
    1. /mnt/sys is remounted read-write.
    2. /mnt/sys/confine/image-v1/next is changed to point to default.
    3. /mnt/sys is remounted read-only.
  4. If the resolution of the symbolic link in step 2 failed or any of the expected components of the resolved NAME node image is missing, OpenWrt's init is run and the boot process finishes.
  5. /mnt/sys/confine/image-v1/NAME/kernel is booted via kexec with kernel arguments root=CONFINE_INIT confine_image=NAME, where CONFINE_INIT is the root device of the initial OpenWrt system.
  6. If the previous boot fails, OpenWrt's init is run and the boot process finishes.
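
Step 5 of the first pass can be illustrated by building the kexec invocations it implies. The Python sketch below only constructs the argument lists and does not run them; it assumes the kexec tool from kexec-tools with its --load, --append and --exec options, and the function name and example device are hypothetical.

```python
def kexec_commands(sys_mount: str, name: str, init_root: str) -> list:
    """Return the two kexec invocations for chainloading image `name`.

    `init_root` stands for CONFINE_INIT, the root device of the initial
    OpenWrt system.
    """
    kernel = f"{sys_mount}/confine/image-v1/{name}/kernel"
    append = f"root={init_root} confine_image={name}"
    return [
        # Stage the node image's kernel with the new command line.
        ["kexec", "--load", kernel, f"--append={append}"],
        # Jump into the staged kernel without going through the firmware.
        ["kexec", "--exec"],
    ]
```

For instance, kexec_commands("/mnt/sys", "img-a", "/dev/sda2") produces a load command whose appended command line is root=/dev/sda2 confine_image=img-a. If either command fails, step 6 applies and OpenWrt's init is run instead.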

Initial OpenWrt system's preinit (second pass):

  1. /mnt/sys: The node's system filesystem is mounted here (read-only).
  2. The name of the node image to boot next is obtained from the confine_image=NAME kernel argument in /proc/cmdline.
  3. /mnt/root: The root filesystem file /mnt/sys/confine/image-v1/NAME/rootfs is loop-mounted here (read-only).
  4. /mnt/sys is unmounted.
  5. The current root is replaced by /mnt/root and the node image's OpenWrt preinit takes over.

Node image's preinit:

  1. /var/run/confine/sys: The node's system filesystem is mounted here (read-write).
  2. The name of the booted node image is obtained from the confine_image=NAME kernel argument in /proc/cmdline.
  3. /var/run/confine/overlay: The overlay directory /var/run/confine/sys/confine/image-v1/NAME/overlay is created if missing and then bind-mounted here.
  4. /var/run/confine/sys is unmounted.
  5. /overlay: The overlay directory /var/run/confine/overlay is move-mounted here (using the special entry in fstab).
  6. The node image's OpenWrt init takes over.

Node image's normal operation:

  • /var/run/confine/sys: The node's system filesystem is mounted here when needed (read-only or read-write as needed), then unmounted as soon as possible (using a disabled entry in fstab).
soft/node-init.txt · Last modified: 2014/12/29 12:39 by ivilata