Node administrator's guide

This guide describes how to register, install and maintain a node in a CONFINE testbed like Community-Lab. When interacting with a testbed controller, you will need to authenticate as a user with at least a node administrator role (also known as a technician) in a group which is allowed to manage nodes in the testbed. We recommend that you first get some experience by managing some virtual nodes under VCT before moving to real testbeds. See Using the Virtual CONFINE Testbed.

You can find screenshots and more detailed information on the administration of nodes in Node installation (which is tailored for the installation of nodes belonging to Guifi.net), in CONFINE Research Device HOWTO (for Ninux) or in the tutorial section Creating Nodes (which covers the management of VCT nodes). For more information on upgrading a node, see Node upgrade.

Registering a node

To register a new node for your group in the Community-Lab controller:

  1. Log in as a node administrator to the controller web interface. You are presented with the dashboard.
  2. Click on the Nodes icon to get the list of nodes in the testbed. Please note that only the nodes from your groups are listed (use the links in the Filter box to show other nodes).
  3. Click on the Add node button to register a new node.

Basic configuration

Fill in the configuration items on the new node page:

  • Enter a name for the new node. This is a free-form string, but please use informative and concise names that help testbed users know the location and nature of the node, like MyGroup-SystemsLab-BareBones04.
  • Use the description field to provide additional information on the node, like its hardware resources, its connectivity with community networks or whether it belongs to a cloud of neighbour nodes that may communicate directly.
  • If you belong to several groups, choose the one you want to associate this node with.
  • If you know it, select the network island where the node is to be located. This may be useful for locating the most convenient providers of geographically distributed services (like management network gateways and alternate registry API providers).

That should be enough in most cases for indoor nodes. Other items you may want to change:

  • If the architecture of your node differs from the default i686, you may change it under the Advanced section. Keep the default unless your processor is very low-power (Via C7 or AMD Geode). x86_64 is not yet directly supported, so choose i686 instead.
  • You may want to attach arbitrary string key/value pairs under Node properties for other uses.

Network configuration

It is also important to properly configure the network connectivity of the node under Firmware configuration:

  • The name of the local interface, a wired interface used by the research device (RD, i.e. the node) to connect to the local network where the community device (CD) is also connected (see Node architecture). Usually eth0.
  • The sliver public IPv4 support, if the CD can offer public community network addresses to the local network:
    • If you choose “DHCP” then the sliver public IPv4 range must be #N, where N is the number of DHCP addresses reserved for slivers.
    • If you choose “Range” then the sliver public IPv4 range must be BASE_IP#N, indicating that N addresses are reserved for slivers after and including an initial base IP address in the local network.
  • Sliver public IPv6 is still not supported, so keep “none”.
  • You may also want to change the private IPv4 prefix (see Addressing in CONFINE) if the provided default clashes with the prefix used in your local network.
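
The two sliver public IPv4 range formats above (#N for “DHCP” and BASE_IP#N for “Range”) can be checked before typing them into the controller. The following is only a minimal sketch: check_sliver_pub_ipv4_range is a hypothetical helper, not part of any CONFINE tool, and it only checks the shape of the string, not whether the addresses are actually free in your local network.

```shell
# Hypothetical helper (not a CONFINE command): check that a sliver public
# IPv4 range string matches the format expected for the chosen mode.
#   dhcp:  "#N"         (N addresses reserved via DHCP)
#   range: "BASE_IP#N"  (N addresses starting at BASE_IP)
check_sliver_pub_ipv4_range() {
    mode=$1
    range=$2
    case "$mode" in
        dhcp)  pattern='^#[0-9]+$' ;;
        range) pattern='^([0-9]{1,3}\.){3}[0-9]{1,3}#[0-9]+$' ;;
        *)     echo "unknown mode: $mode" >&2; return 2 ;;
    esac
    # Succeeds (exit status 0) only when the string has the expected shape.
    printf '%s\n' "$range" | grep -Eq "$pattern"
}
```

For instance, check_sliver_pub_ipv4_range range '10.1.2.100#8' succeeds, while check_sliver_pub_ipv4_range dhcp '10.1.2.100#8' fails because the DHCP mode takes no base address (the address here is just an example value).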

Finally, if your node has additional network interfaces which can be used to reach other nodes at the raw link layer:

  • Under Direct interfaces, add an entry for each interface name and remember to document the fact in the node description (e.g. “Direct interface wlan0 is configured in ad-hoc mode with BSSID XX:XX:XX:XX:XX:XX and channel N.”).

You may need to manually configure these interfaces in OpenWrt after installation.

Completing node configuration

Once the desired fields are filled, save the node. You are brought to the node list where it appears in set state “DEBUG”, which indicates that the node configuration is invalid or incomplete. If you visit the node page by clicking on its name you will see two warnings about the node missing some keys.

  • In the unusual case that you plan to use a generic node image and configure it yourself manually in your node (which is not covered by this guide), you may now provide your node's tinc public key under Tinc configuration and save. You will also need to use the Request certificate… button and upload a certificate signing request for your node's API service.
  • In the most usual case you will be generating a customized firmware for your node from the controller interface (see Generating node firmware). This will also generate the missing keys and assign them to your node's description.

Generating node firmware

The CONFINE controller includes an application that lets you build a firmware image already customized for your node so that you may install and start using it straight away. Of course, this implies that the controller gets to know all private data about your node (keys, certificates, passwords…), but on the other hand you can avoid struggling with the manual configuration of OpenWrt and the CONFINE node system.

Once you have registered the new node in the controller (see Registering a node), go to the node page (e.g. via Nodes / Nodes in the main menu, then click on the node) and click on the Download firmware… button. In the firmware generation screen you may fine-tune some aspects of the process:

  • Choose the most adequate base image if several are available. Please use master or default images rather than testing ones unless you are testing node developments. There may be several flavors for the same version of the image, e.g. for different hardware (i586, i686, virtual nodes…).
  • Enable the optional files to be generated. These are usually node keys and certificates, so leave the check boxes on unless you want to add them manually at a later moment, or the node may not be reachable by the controller or other testbed components.
  • You may provide a root password or disable it completely. For security reasons it is recommended to disable the root password and use SSH keys instead.
  • You may configure the SSH authorized keys for root access (please remind node administrators in your group to provide their public SSH keys as authentication tokens, see Changing user settings):
    • If you allow current node admins, the SSH keys of node administrators in the node's group will be allowed root access in the generated image.
    • Any additional keys added in the text box will also be allowed root access. The controller may provide some default keys so that testbed operators can perform remote maintenance of nodes, but you may decline this by deleting the keys if you do not trust them.
    • If you check synchronize node admins, the node will periodically update the set of allowed keys by asking the registry about node administrators in its group. Do not check this if you prefer to maintain the list of allowed keys yourself after node installation or if you do not trust the integrity of the registry.
  • Choose whether to generate a USB image or a raw one.

Please set the desired values, and remember to enable the USB image for the first node installation (not for node upgrades).

Now click on Build firmware. You will see a progress bar, and after a while you will be presented with a generation summary screen. Check that the new firmware is available (i.e. that its build did not fail), and fetch it to your computer using the image download link.

Installing a node via USB

The recommended way of installing a node for the first time is via a USB flash drive (unless you have a virtual node, in which case please check vnode instead). Please bear in mind that both the USB drive and the node's disk will be erased during the installation process.

To install the node for the first time:

  1. Use the controller to generate and download a customized USB image for the node (see Generating node firmware). Let NODE_IMAGE.img.gz be the name of the image file.
  2. Plug the USB drive in your computer.
  3. Dump the image to the USB drive. In a Linux box:
    1. Get the device name of the drive by running dmesg | tail. Look for a line like [sdb] Attached SCSI removable disk at the end: the sdX name between brackets is that of the drive.
    2. Unmount any filesystems in the device which were already mounted by running mount | cut -f1 -d' ' | grep ^/dev/sdX | xargs -L1 sudo umount (replace sdX with the drive's name).
    3. Uncompress and write the node image to the drive by running zcat NODE_IMAGE.img.gz | sudo dd of=/dev/sdX bs=1M; sync (replace sdX with the drive's name).
  4. Plug a keyboard/mouse and a screen in the node.
  5. Boot into its BIOS and configure it to power on automatically after a power failure. This will save you from moving to the physical location of the node to turn it on after such an incident.
  6. Plug the USB drive in the node.
  7. Boot the node and use the boot device selection menu (usually reachable by pressing one of F2, F5, F12, Space, Delete or Escape) to choose the USB drive. Please note that from this point on, if you leave the machine unattended for more than a minute it will automatically proceed to install the image to /dev/sda (first internal hard disk) and reboot.
  8. You will see the GRUB menu screen. Simply wait until it boots OpenWrt.
  9. You will see many kernel messages. When they stop for a little moment and you see the message Please press Enter to activate this console., press Enter.
  10. The installation program shows a screen with a list of existing devices and asks to confirm installation in /dev/sda.
    1. If the program is right, simply press y and Enter, or wait for a minute.
    2. If you want to install to another partition, press n and Enter. You will be left in a console shell, from there run /confine/install.sh /dev/sdX (where sdX is the name of the right disk) and accept the selection.

The installation will proceed and the node will reboot. At this point you may remove the USB drive. The node will still reboot several more times, so give it around 10 minutes to ensure it has finished. If the messages on the screen remain still for more than a couple of minutes, you may press Enter, run mount and check that /home is mounted to confirm that the installation went well.

If /home is not mounted, maybe your controller did not configure the generated image to partition the node's disk automatically (VCT and Community-Lab controllers do); in this case run confine.disk-parted to create the missing partitions and reboot if the script asks you to. The node may still reboot once more. Enter the console shell again when the messages stop.

Finally, shut the node down by running halt in the shell and wait for the message System halted. to turn it off. Now you may move the node to its final location, connect it to the local network and turn it on again.
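
Step 3 of the installation procedure above (finding the drive, unmounting it and writing the image) can be wrapped in two small functions. This is a sketch under assumptions: last_removable_disk and write_node_image are hypothetical names, the commands inside are the ones quoted in the steps, and the extra checks only refuse obviously wrong arguments. Always confirm the device name yourself, since dd erases the target.

```shell
# Hypothetical helpers built from the commands in step 3 above.

# Print the name (e.g. sdb) of the last removable disk reported in kernel
# messages read from standard input, e.g.: dmesg | last_removable_disk
last_removable_disk() {
    sed -n 's/.*\[\(sd[a-z]*\)\] Attached SCSI removable disk.*/\1/p' | tail -n 1
}

# Unmount and overwrite DEVICE (e.g. /dev/sdX) with the compressed IMAGE.
write_node_image() {
    image=$1
    device=$2
    [ -f "$image" ] || { echo "no such image: $image" >&2; return 1; }
    [ -b "$device" ] || { echo "not a block device: $device" >&2; return 1; }
    # Unmount any filesystems of the drive that are currently mounted.
    mount | cut -f1 -d' ' | grep "^$device" | while read -r part; do
        sudo umount "$part"
    done
    # Uncompress and write the image, then flush buffers to the drive.
    zcat "$image" | sudo dd of="$device" bs=1M && sync
}
```

A typical session would run dmesg | last_removable_disk right after plugging the drive, then write_node_image NODE_IMAGE.img.gz /dev/sdX with the reported name in place of sdX.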

Putting a node into production or maintenance

Once a node has been installed (see Installing a node via USB), it must be put into production for it to be able to deploy slivers.

  1. Log in as a node administrator to the controller web interface. You are presented with the dashboard.
  2. Click on the Nodes icon to get the list of nodes in the testbed. After installation, the node's current state should read SAFE.
    • If the state is OFFLINE there may be problems with network connectivity.
    • If the state is DEBUG the node may have an invalid configuration.
    • You may click on the current state to see the node state retrieved by the controller, which shows any errors reported by the node. The state can be queried again by clicking on the Refresh button in the state page.
  3. When you see current state SAFE reported by the node (either in the node's state page or in the node list), go to the node's page, change its set state to PRODUCTION and click on Save. This enables the node to deploy slivers.
  4. You are brought back to the node list. While the node's set state is PRODUCTION, its current state may still read SAFE. You may click on it to check in the node state whether it has already seen the change in the server.

Whenever you plan to perform some maintenance work on the node, or you plan to turn it off or expect bad connectivity for a while, it is strongly advised that you put the node's set state back to SAFE until the incident is over. If the node crashed due to some hardware failure, you may warn others by explicitly setting a FAILURE set state. See Node states for more information.

Monitoring a node

To know the state of a node you may either:

  • Use the testbed controller, which periodically queries the nodes' API to retrieve their state and makes it available via the State button of the node's page. The state page also contains some ping statistics (Pings button) and some historical data about node state (History button) along with some graphs.
  • Query its API directly (see CONFINE REST API) and look for the node resource description. It is usually located at https://NODE_ADDRESS/confine/api/node.
  • Look at the Community-Lab monitor to see details about the status of nodes in this testbed.
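
For the second option above, the usual location of the node resource can be built from the node address. The sketch below assumes the https://NODE_ADDRESS/confine/api/node location mentioned in the list; node_api_url is a hypothetical helper. It encloses IPv6 addresses in brackets, as URLs require, and leaves other addresses untouched.

```shell
# Hypothetical helper: print the usual URL of a node's resource description.
# IPv6 addresses (anything containing a colon) are enclosed in brackets.
node_api_url() {
    addr=$1
    case "$addr" in
        *:*) addr="[$addr]" ;;
    esac
    printf 'https://%s/confine/api/node\n' "$addr"
}

# Possible usage with curl (-s: silent, -g: do not treat brackets as globs):
#   curl -sg "$(node_api_url NODE_ADDRESS)"
```

Depending on how the node's certificate was issued, curl may need extra options (such as --cacert with the relevant CA certificate) to verify the connection.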

The controller also offers a testbed status report summarizing node availability and software version per group. It can be accessed via Nodes / Summary in the main menu.

Logging into a node

CONFINE nodes usually run an SSH service configured to accept some root logins according to the settings provided when their firmware was generated (see Generating node firmware). Please note that this kind of access is used for managing the node itself; for entering a sliver in a node, see Logging into a sliver.

Usually, as a node administrator in a node's group, you should be able to use the SSH public keys configured as authentication tokens for your testbed account (see Changing user settings) for logging into nodes.

You need a NODE_ADDRESS to provide to your SSH client to connect to the node as root (see Addressing in CONFINE). For instance, with OpenSSH you may run ssh root@NODE_ADDRESS. If you need to provide a different authentication key, you may use something like ssh -i /path/to/id_rsa root@NODE_ADDRESS.

  • If the node is working correctly you should be able to use its management network address (see The management network) as a NODE_ADDRESS, as long as your computer is also configured as a host in that network (see Adding a host to the management network). The node's address is available under the Management network section in the node's page in the controller.
  • If the node is connected to the same local link as your computer, you may use the node's recovery address (fdbd:e804:6aa9:1::2/64). First you need to add your computer to that network (e.g. with ip addr add fdbd:e804:6aa9:1:2000::1234/64 dev eth0).
  • If there are several testbed nodes in the same link, you may use the node's unique recovery address (see below).

Please note that you can get the relevant addresses of a node by looking at the /addrs member in the node's state (e.g. in the controller).

Using unique recovery addresses

To add your computer to the unique recovery network:

MY_IFACE=eth0
UREC_PFX=fdbd:e804:6aa9:2
MY_SFX=$(ip a s dev $MY_IFACE | sed -rn 's#.* fe80::([^/]+)/64 .*#\1#p')
ip addr add $UREC_PFX::$MY_SFX/64 dev $MY_IFACE

To detect the unique recovery addresses of nodes in your link (as long as their firmware is newer than 2014-05-20 and thus not affected by issue #444):

for host in $(ping6 -Lc2 ff02::1%$MY_IFACE | sed -ne "s/.* bytes from fe80::\([:0-9a-f]*\): .*/$UREC_PFX:\1/p" | sort -u); do
  ping6 -c2 -w2 $host > /dev/null && echo $host
done

To remove your computer from the unique recovery network when done:

ip addr del $UREC_PFX::$MY_SFX/64 dev $MY_IFACE

Exchanging files with a node

You may use an SCP client to upload files to a node and download files from it, e.g. with OpenSSH you may run scp FILES root@NODE_ADDRESS:/destination/path. However, please note that when using an IPv6 NODE_ADDRESS you should enclose it in brackets so that the colons do not confuse scp. For instance, to upload a new firmware image to a node's temporary directory via its recovery address you may run scp NODE_IMAGE.img.gz root@[fdbd:e804:6aa9:1::2]:/tmp.
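
The bracket rule above is easy to forget when switching between IPv4 and IPv6 addresses. As a minimal sketch, the hypothetical helper scp_target below builds the root@NODE_ADDRESS:/path argument and adds the brackets only when the address contains colons.

```shell
# Hypothetical helper: print an scp target for root on a node, enclosing
# IPv6 addresses in brackets so that their colons do not confuse scp.
scp_target() {
    addr=$1
    path=$2
    case "$addr" in
        *:*) addr="[$addr]" ;;
    esac
    printf 'root@%s:%s\n' "$addr" "$path"
}

# Possible usage, uploading an image via the recovery address:
#   scp NODE_IMAGE.img.gz "$(scp_target fdbd:e804:6aa9:1::2 /tmp)"
```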

Upgrading a node

The CONFINE node system software supports several ways of upgrading a node's firmware. Please remember that while the maintenance of the node is taking place, it is recommended to set its set state to SAFE in the controller (see Putting a node into production or maintenance). This will stop and probably undeploy slivers, so you may want to warn users running slivers in your node in advance.

The most common methods of upgrading a CONFINE node are summarized here, but you may want to see Node upgrade for further detail, additional possibilities and handling of special cases. Also, if you have a virtual node, please check vnode for alternative upgrade methods.

Package-only upgrade

The lightest upgrade method, it uses OpenWrt's package manager (OPKG) to install a newer version of the confine-system package, available at the CONFINE package repository. Both procedures below will stop the CONFINE daemon, install the package and resume the daemon without losing already deployed slivers:

  • If the node can safely reach the repository:
    1. Copy the URL of the package you want to upgrade.
    2. Enter the node as root (see Logging into a node).
    3. Run the command confine_daemon_update URL.
  • Otherwise:
    1. Download the package to your computer and copy it to the node's temporary directory /tmp (see Exchanging files with a node).
    2. Enter the node as root (see Logging into a node).
    3. Run the following commands:
confine_daemon_stop
opkg --force-depends install /tmp/confine-system_VERSION_x86.ipk
confine_daemon_continue
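
The three commands above can be grouped so that the daemon is resumed even when the installation fails. This is only a sketch: upgrade_confine_package is a hypothetical wrapper, and resuming the daemon after a failed install is an assumption about what you would want, not documented CONFINE behaviour.

```shell
# Hypothetical wrapper around the three commands above; run it on the node.
upgrade_confine_package() {
    pkg=$1
    [ -f "$pkg" ] || { echo "package not found: $pkg" >&2; return 1; }
    confine_daemon_stop || return 1
    opkg --force-depends install "$pkg"
    status=$?
    # Resume the daemon whether or not the installation succeeded
    # (an assumption of this sketch).
    confine_daemon_continue
    # Report the result of the installation itself.
    return $status
}
```

On the node, this would be called as upgrade_confine_package /tmp/confine-system_VERSION_x86.ipk after copying the package there.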

Automatic upgrade

The simplest method, it downloads a full generic, non-customized node image from the CONFINE image repository, installs it over the old image while keeping the current configuration and deployed slivers, and reboots the node:

  1. Enter the node as root (see Logging into a node).
  2. Run the command confine.remote-upgrade.
  3. Wait for the image to be downloaded and written, and the node to reboot and become stable.

Should you encounter any problems (e.g. lack of access to the repository or problems with partitions), try the manual method below.

Manual upgrade

A safer but slightly more complex method, it relies on having the new node image locally available in the node's file system.

  1. Enter the node as root (see Logging into a node).
  2. Copy the GENERIC_IMAGE file from the CONFINE image repository (usually CONFINE-owrt-master-atom-current.img.gz) to your node's temporary directory /tmp.
    • If the node can safely reach the repository, run wget -P /tmp GENERIC_IMAGE_URL in the node.
    • Otherwise, download the file at GENERIC_IMAGE_URL to your computer and upload it to the node (see Exchanging files with a node).
  3. Run confine.sysupgrade /tmp/GENERIC_IMAGE.img.gz in the node.
  4. Wait for the image to be written and the node to reboot and become stable.

This should preserve all configuration and deployed slivers. If there are problems with the upgrade, you will need to perform a harder upgrade using a customized image (see Generating node firmware), which will remove all data in the node:

  1. Copy the CUSTOM_IMAGE file to the node's temporary directory /tmp.
  2. Run confine.sysupgrade -n /tmp/CUSTOM_IMAGE.img.gz in the node.
  3. Wait for the image to be written and the node to reboot and become stable.

It is advisable to check that the /home and /overlay partitions have been created and mounted by running mount in the node. Otherwise run confine.disk-parted in the node and follow the instructions.
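
The mount check above can be reduced to a filter over the output of mount. The helper below is a hypothetical sketch that assumes the common "DEVICE on DIR type FSTYPE (OPTIONS)" output format; it prints the expected directories that do not appear as mount points.

```shell
# Hypothetical helper: read the output of mount on standard input and print
# each of /home and /overlay that is not listed as a mount point.
missing_partitions() {
    mounts=$(cat)
    for dir in /home /overlay; do
        printf '%s\n' "$mounts" | grep -q " on $dir " || echo "$dir"
    done
}

# Possible usage on the node:
#   mount | missing_partitions
```

If the command prints nothing, both partitions are mounted; otherwise run confine.disk-parted as described above.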

Clock synchronization via NTP

To keep testbed node clocks synchronized, we recommend that you run an NTP server in your management network and configure your node's ntpd via /etc/ntp.conf, if this has not already been done by the firmware generation process. Enter the node as root (see Logging into a node) and run the following two commands:

cat > /etc/ntp.conf << EOF
# Add as NTP server a host connected to the management network.
restrict default ignore
restrict 127.0.0.1
driftfile /var/lib/ntp/ntp.drift
server NTP_SERVER_MGMT_ADDR prefer iburst
EOF
/etc/init.d/ntpd restart

Where NTP_SERVER_MGMT_ADDR is the management address of the NTP server (fdf5:5351:1dfd:0:0:0:0:2 in Community-Lab).

usage/node-admin.txt · Last modified: 2016/12/20 11:47 by ivilata