This document presents a design for the architecture of CONFINE nodes based on several discussions held both by email and in meetings. The described node architecture consists mainly of a device1 used for running applications (like experiments and services), connected to the community network (CN) by another device which is part of the CN. In contrast with a boundary host, this last device can relay other CN traffic not related to CONFINE.
This design intends to facilitate the addition of CONFINE nodes at any location of a CN by connecting additional devices to existing CN ones with minimal or no CONFINE-specific changes to their configuration. At the same time, it provides in-sliver users with familiar Linux environments with root access and rich connectivity options (see Node and sliver connectivity).
Node architecture shows a diagram of the CONFINE node architecture explained in the following sections.
The architecture of a CONFINE node (source)
A CONFINE node (node) consists of a research device (RD) using a community device (CD) and an optional recovery device. This separation tries to preserve the stability of the CD (and thus of the CN) and maximises compatibility with whatever firmware it may run. All these devices are linked by a wired local network (LN), possibly shared with other non-CONFINE devices like CN clients using the CD as a gateway. For the values of addresses mentioned below, see Addressing in CONFINE.
The CD is a completely normal community network device running whatever firmware is required for it. It has at least two different network interfaces: a wired or wireless one connecting to the CN (a community interface with a community address) and a wired one to the LN (the gateway interface with its gateway address). It can act as a simple gateway for hosts connected to the LN.
The node is accessed by applications as well as node and slice administrators via the community interfaces of the CD, although node administration may occasionally proceed directly through the LN.
The CD may be able to allocate fixed addresses to certain hosts in the LN based solely on their MAC addresses, or it may set apart an address range for them which is not assigned to other devices via DHCP or similar. This allows the use of a uniform address scheme (see Addressing in CONFINE) for CONFINE elements in the node. Please note that IPv6 autoconfiguration and MAC-only-based static or predictable DHCP leases should work for fixed addresses, while other kinds of DUIDs or DHCP client IDs may yield unwanted results.
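As a minimal sketch, on an OpenWrt-based CD such a MAC-only-based fixed lease could be declared through UCI; the MAC and IP addresses below are purely hypothetical:

```shell
# Hypothetical sketch: reserve a fixed DHCP lease for the RD on an
# OpenWrt-based CD, keyed only on the RD's MAC address.
uci add dhcp host
uci set dhcp.@host[-1].name='rd'
uci set dhcp.@host[-1].mac='00:11:22:33:44:55'
uci set dhcp.@host[-1].ip='10.0.0.2'
uci commit dhcp
/etc/init.d/dnsmasq restart
```

A CD running other firmware would need the equivalent static-lease configuration for its own DHCP server.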
The RD is a relatively powerful2 device running a custom firmware (based on OpenWrt @OpenWrt@) provided by CONFINE which allows simultaneously running several slivers implemented as Linux containers @LXC@. Slivers have limited access to the device's resources and to one another, thus ensuring slice and network isolation. This is guaranteed by the control software run in the RD through tools like tc, Open vSwitch…
The RD implements an internal bridge. The internal address of the RD in this bridge is the same in all testbed nodes and it belongs to a private network which does not clash with CN or LN addresses. This and other predictable addresses mentioned below are computed according to a uniform address scheme (see Addressing in CONFINE). The RD offers some basic sliver services (see Sliver services) on the internal address, including NAT gateway access to the CN.
The RD also implements a local bridge which connects to the LN through a wired interface (the local interface). The bridge is used for simple network layer access to the CN through the CD's gateway address, and the local address of the RD in the bridge is fixed and may be used for remote administration. The management address of the RD in the bridge is easily predictable and belongs to the node's subnet of the testbed's management network; it may be used both for testbed management and remote administration. For easy RD setup and local administration, the local interface may also include a recovery address that is easily predictable or the same in all testbed nodes and that belongs to a private network which does not clash with CN or LN addresses (nor those of the internal bridge). A debug address which is also private and easily predictable can be used to access different RDs in the same local network for debugging purposes.
The RD may have additional direct interfaces, each one connected to its own direct bridge. These interfaces may be connected to the CN at the link layer and used for communication below the network level (see Node and sliver connectivity). The local interface and bridge can also be used as a direct interface and bridge, with certain limitations.
All the aforementioned bridges are managed by the control software in order to ensure network isolation between slices (i.e. between slivers running in the RD) as mentioned above, and to keep CN stability and privacy.
The node may include some simple recovery device whose purpose is to force a reboot of the RD in case of malfunction, using some direct hardware mechanism supported by both devices (like a GPIO port connected to the power supply of the RD), thus avoiding the need for physical presence when rebooting devices in places with difficult access.
The recovery device may get remote instructions from the CN (via the LN) or via different sensors, preferably based on wide-range technologies with low interference that differ from those used by the CN (like ham radio signals, GSM calls or SMS). It may also receive a heartbeat signal kept by control software via some direct link like a serial line; when the recovery device misses a number of heartbeats, it reboots the RD.
A more advanced version of this device may help the RD boot some recovery environment (e.g. via PXE) or collaborate in some techniques for safe device upgrade and recovery3 to allow restoring its firmware to a known state.
The LN may use public or private addresses (from the perspective of the CN). In the latter case, the CD forwards the necessary ports from one of its community addresses to the local addresses of the research and recovery devices for them to be reachable from the CN. In any case, the addresses of the relevant devices in the node's LN are fixed and no routing protocol software is needed (i.e. static configuration, stateless autoconfiguration, or DHCP is enough). Hosts in the LN simply use the CD's gateway address to reach the CN.
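When the LN uses private addresses, the port forwarding on the CD might be sketched with netfilter rules like the following; the interface name, ports and addresses are invented for illustration:

```shell
# Hypothetical sketch: the CD DNATs SSH traffic arriving at one of its
# community addresses to the RD's fixed local address in the private LN.
iptables -t nat -A PREROUTING -i wlan0 -d 172.16.5.1 \
    -p tcp --dport 2222 -j DNAT --to-destination 10.0.0.2:22
# Let the forwarded traffic through the filter table as well.
iptables -A FORWARD -d 10.0.0.2 -p tcp --dport 22 -j ACCEPT
```

A similar rule would be needed for each port of the research and recovery devices that must be reachable from the CN.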
The connectivity of a sliver is determined by the network interfaces it includes, which are requested by the slice administrator at sliver definition time and depend on the interfaces provided by the RD and their features. Although reasonable defaults can be provided, in-sliver users should be able to explicitly control the default routing configuration to avoid traffic unexpectedly flowing through unwanted interfaces. For the values of addresses mentioned below, see Addressing in CONFINE.
Every sliver has a private interface and address whose host side veth interface is placed in the internal bridge. The address is automatically assigned from the RD's private network, thus allowing access to the RD's internal address and services (see Sliver services). In-sliver users may choose the latter address as the default gateway, in which case traffic is routed by the RD through the local bridge to the CD's gateway address after performing NAT. This allows client access to the CN but neither connections from the CN to the sliver (similar to a home computer behind a NAT gateway on the Internet) nor traffic between slivers. Thus the sliver is guaranteed to receive no incoming connections on that interface, obviating the need for firewalls or access control.
All sliver containers in Node architecture have a private interface in the internal bridge.
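The plumbing of a private interface could be sketched as follows; the interface, bridge and subnet names are invented, not the actual values of the CONFINE address scheme:

```shell
# Hypothetical sketch of the private interface plumbing in the RD.
# Create a veth pair: one end stays in the RD attached to the internal
# bridge, the other becomes the sliver's private interface.
ip link add veth-slv42 type veth peer name priv0
ip link set veth-slv42 master br-internal up
ip link set priv0 netns "$SLIVER_PID"   # move the sliver end into the container
# NAT traffic from the private network leaving through the local bridge
# towards the CD's gateway address.
iptables -t nat -A POSTROUTING -s 192.168.241.0/24 -o br-local -j MASQUERADE
```

The control software performs the equivalent of these steps automatically when a sliver is deployed.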
The slice administrator may request a debug interface and address whose host side veth interface will be placed in the local bridge, thus allowing access to the debug network. The address is easily predictable and computed according to the address scheme, and no gateway is expected to exist in this network. This interface allows connections to other nodes and slivers in the same local network, which should be useful for debugging purposes.
Sliver container 1 in Node architecture has a debug interface in the local bridge.
The slice administrator may request a management interface and address whose host side veth interface will be placed in the local bridge, thus allowing access to the node's management subnet. The address is easily predictable and computed according to the address scheme, and in-sliver users may choose the RD's management address as the default gateway. This interface allows connections from the management network to the sliver and optional access to whatever other networks are routed by testbed gateways in the management network. These interfaces are not intended for transferring massive application traffic nor for performing traffic measurements, since the management network may not properly reflect the features of the CN.
Sliver container 2 in Node architecture has a management interface in the local bridge.
If the RD has been allocated some public addresses, the slice administrator may request a public interface and address whose host side veth interface will be placed in the local bridge, thus allowing access to the LN. The address is automatically assigned from the RD's pool of public addresses, and in-sliver users may choose the CD's gateway address as the default gateway. This allows connections from the CN to the sliver (similar to a computer directly connected to the Internet through a normal gateway).
Sliver container 2 in Node architecture has a public interface in the local bridge.
If the RD has a direct interface, the slice administrator may request an isolated interface (with no pre-assigned network address), i.e. a VLAN interface on the associated direct bridge using one of the VLAN tags allocated to the slice at creation time. Any kind of traffic can be transmitted and captured on such an interface at the cost of being isolated from the CN at the link layer and delivered only to neighbouring slivers of the same slice. This allows experimentation with network layer protocols and addresses (e.g. routing experiments) to operate safely on groups of nearby CONFINE nodes.
Sliver container 3 in Node architecture has an isolated interface on direct bridge X.
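An isolated interface could be set up roughly as below; the bridge name, slice number and VLAN tag are hypothetical:

```shell
# Hypothetical sketch: give a sliver of slice 7 (allocated VLAN tag 263)
# an isolated interface on direct bridge X.  The VLAN interface sits on
# top of the bridge and is moved into the sliver container.
ip link add link br-directX name isol-slc7 type vlan id 263
ip link set isol-slc7 netns "$SLIVER_PID"
# Frames sent by the sliver on this interface carry the slice's VLAN tag
# and are only delivered to neighbouring slivers of the same slice.
```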
(Not yet available.) If the RD has a direct interface, the slice administrator may request a passive interface (with no network address) whose host side veth interface is placed in the associated direct bridge, thus allowing direct access to the CN. Permission is granted only for traffic capture on the passive interface, which is anonymised by control software (e.g. an OpenFlow controller @OpenFlow@ on an Open vSwitch-based bridge @OvS@). This allows CN traffic analysis while respecting privacy.
Sliver containers 4 and 5 in Node architecture have passive interfaces in direct bridge X.
(Not yet available.) If the RD has a direct interface, the slice administrator may request raw access to the interface's network device. The raw interface is moved into the sliver container4 and the associated direct bridge is disabled while the sliver is running. Since the sliver has full physical control over the network device, network isolation cannot be guaranteed, so only that sliver is allowed to run in the RD. Moreover, this access can disrupt CN operation and privacy, so it should only be allowed under very particular circumstances (e.g. out of reach of the CN).
Sliver container 5 in Node architecture owns the RD's direct interface Y.
Besides the included private interface, a sliver (like #2 and #5 in Node architecture) may be granted several interfaces using the local bridge or any direct bridge, e.g. a sliver may include a private interface, two public ones, a passive interface in direct bridge X, and an isolated interface on bridge X. Conversely, the local bridge or a direct bridge may provide several slivers with shared access to the same interface (like direct bridge X in Node architecture). Also, node administrators may set limits on the number of interfaces requested for a sliver globally or per bridge (e.g. to avoid running out of public addresses).
When the node administrator includes the node's local interface among its direct interfaces, passive and isolated sliver interfaces can use the local interface and bridge as direct ones. Raw access to the local interface is not allowed, though.
The setup of a sliver's networking is accomplished by populating its image with appropriate configuration files for the setup of interfaces, routes, DNS domain and servers… (e.g. in a Debian-based sliver this implies generating an /etc/network/interfaces file). Static configuration can also help in setting up routing, filtering and traffic control rules in the RD.
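A minimal sketch of such a statically generated file for a Debian-based sliver, assuming a private and a public interface; all names and addresses are invented:

```shell
# Hypothetical /etc/network/interfaces generated for a sliver with a
# private interface and a public interface.  Here the in-sliver user has
# chosen the RD's internal address as the default (NAT) gateway.
auto priv0
iface priv0 inet static
    address 192.168.241.66
    netmask 255.255.255.0
    gateway 192.168.241.1      # the RD's internal address

auto pub0
iface pub0 inet static
    address 10.1.24.130
    netmask 255.255.255.0
```

Choosing the CD's gateway address on pub0 instead would route default traffic directly through the LN without NAT.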
The RD offers on its internal address some basic services which become available to slivers through their private interfaces. This setup intends to relieve in-sliver users from configuring those services themselves while providing some features tailored for CONFINE slivers (some of which are not feasible at the sliver level), all in a trivial and reliable manner by accessing an address which is always the same on an interface which is always available and closed to external access. Factoring services out of the slivers also saves resources, minimises the chances of breakage by accidental misconfiguration in the sliver and relieves developers from having to configure them in each sliver template. Please note that none of these services is mandatory for any sliver.
Some examples of sliver services, in no particular order, are:
DNS: The RD acts as a name server, possibly with additional domains for internal slice use (like a top-level
SMTP: The RD acts as a mail gateway, possibly rewriting mail addresses for easy sliver identification (e.g. root@localhost in the sliver becomes sliceNNNN+nodeNNNN@my-confine-testbed.net when sent to the outside).
A NAT gateway for slivers needing basic client-only connectivity to the network.
DHCP, which can be used by a sliver that has lost its network configuration as a last resort to regain minimum (NAT) connectivity to the network. Please note that normal network setup during sliver boot may use statically generated configuration files, which provide more flexibility than DHCP.
DLEP @DLEP@ is a working draft by the IETF for a protocol which provides link quality and speed information from network devices via IP-based communication. Slivers may use DLEP to access information about the RD's interfaces with no need for raw access. This should ease the deployment of routing experiments using cross-layer information.
Other CONFINE-specific services, e.g. for querying the testbed about slice parameters (e.g. list participating nodes or slivers' public addresses), controlling the slice (e.g. stopping all slivers) or running privileged operations in the RD (like PlanetLab's vSys @vSys@).
Regardless of whether a slice has network access at a given moment, a slice administrator can access any of its slivers through the RD's local address, e.g. at some SSH port (possibly forwarded via one of the CD's community addresses) giving access to the sliver's console5, terminal or SSH server. That connection or a similar one allows the slice administrator to retrieve files from the sliver (e.g. using SCP) to collect application results.
For more information on this method of interaction with slivers see Out-of-band remote access to CONFINE slivers.
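Such a session might look like the following; the host name, port and user are invented for illustration:

```shell
# Hypothetical sketch: reach a sliver's console through a port that the
# CD forwards from one of its community addresses to the RD's local
# address.
ssh -p 2242 slice-admin@cd.community.example.net
# Retrieve result files from the sliver over the same forwarded port.
scp -P 2242 slice-admin@cd.community.example.net:/var/log/experiment.log .
```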
This architecture also allows running the community device inside the RD as a community container (CC) thus saving on hardware at the price of stability. In this case each community interface is considered a direct interface in the RD, and the CC has a veth interface placed in its associated bridge6. The gateway interface of the CC is a veth interface placed in the local bridge, and the RD's local interface can be kept for wired node administration and for other non-CONFINE devices. The CC has few restrictions on the local and direct interfaces while slivers can still access them via passive or isolated interfaces (but not as raw interfaces). The CC may manage several direct interfaces in this way.
When the CD is a host with virtualisation capabilities and enough memory, this architecture opens yet another possibility: running the RD inside the CD as a virtual research device. The RD becomes a KVM virtual machine @KVM@ in the CD, and direct interfaces (which are physically plugged into the CD but not used by it) are logically moved into the VM for its exclusive use. The CD places its gateway interface into a bridge together with the RD's local interface. This setup also reduces the number of physical devices, but it better preserves the stability of the CD, even in the case of a node firmware update. Some nodes in Funkfeuer are already using this technique successfully (see KVM virtualized RD's for Boxes with hardware virtualization).
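A virtual RD of this kind could be launched roughly as follows; the image file, bridge name and memory size are hypothetical:

```shell
# Hypothetical sketch: boot the RD as a KVM guest inside the CD.  The
# bridge br-lan already contains the CD's gateway interface; QEMU's
# bridge helper attaches the guest's local interface to it via a tap.
qemu-system-x86_64 -enable-kvm -m 1024 \
    -drive file=confine-rd.img,format=raw \
    -netdev bridge,id=lan0,br=br-lan \
    -device virtio-net-pci,netdev=lan0
```

Direct interfaces would additionally be handed to the guest (e.g. via PCI passthrough or further taps) for its exclusive use.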
This model has the advantage of keeping the community side of CONFINE nodes stable while providing the testbed side with a great deal of flexibility regarding how slivers connect to the network. Research devices can have as few as one wired interface or as many interfaces as desired, and tests can reasonably be carried out using only research devices on a wired local network. The design also includes the option to force remote reboot and recovery of the research device for nodes with specially difficult access. This architecture is also flexible enough to allow alternative node arrangements which can help reduce hardware costs under certain circumstances.
The following questions remain open:
How should a direct interface be configured to get link layer access to the CN? How should dynamic changes to link layer configuration (e.g. caused by 802.11h/DFS) be handled? For the moment we may assume static configuration by the node administrator.
According to Christoph Barz, adding a Wi-Fi interface in client mode using WPA to a bridge causes problems like loss of association. This needs to be checked.
In the case of a Wi-Fi direct interface, which exact setup (mac80211 virtual interfaces (vifs) and modes, bridges, container interface types…) allows traffic capture besides sending VLAN-tagged and CC traffic?
In the case of Wi-Fi direct interfaces it is convenient that isolated traffic flows straight between interfaces. However in a cell using managed mode (AP + clients) the AP also relays isolated traffic, which decreases its performance and makes airtime control difficult (except when the AP interface is the direct interface itself). Which setup (interface modes like WDS or ad hoc, extensions like 802.11z/TDLS…) avoids this problem?
According to Markus Kittenberger, using several source MAC addresses on the same Wi-Fi interface may be problematic. Passive interfaces do not send data and isolated interfaces use the direct interface's MAC address. Which setup allows the CC to send frames using that same MAC address? Maybe Felix Fietkau's trelay module would help.
Axel Neumann suggests an additional access point interface for slivers implemented as a vif on top of a Wi-Fi direct interface, which would allow easy participation in experiments by joining a cell with a different ESSID. Compatibility with passive and isolated interface support could be problematic, and there may be difficulties in exercising traffic control and shaping, since neither the vif nor its traffic would be visible to the RD.
A raw Wi-Fi interface may interfere with the CD's community link, leading to its isolation. Shall some mechanism be put in place to keep raw interfaces away from certain channels? Shall the RD track raw interfaces' channels and stop offending applications? What about dynamic changes to link layer configuration?
What mechanisms for safe boot, update and recovery shall we implement? This would mainly affect the storage part of the architecture.
Here we will be using a PlanetLab-like terminology in contrast with the common node DB terminology where a node is a location consisting of several devices. ↩
Powerful in comparison with the typical embedded device used as the CD. For instance, an Alix (http://pcengines.ch/alix.htm) or Commel (http://www.commell.com.tw/product/sbc.htm) board with enough RAM and storage can be used. ↩
As commented in this presentation: https://archive.fosdem.org/2012/schedule/event/safe_upgrade_of_embedded_systems.html ↩
For instance, using iw phy WIRELESS_DEVICE set netns CONTAINER_PID for wireless interfaces or LXC's phys network type for others. ↩
Similar to Linode's Lish: http://library.linode.com/troubleshooting/using-lish-the-linode-shell ↩
Since the CC may lack full access to the community interface, CD firmware used for the CC may need some adaptations like leaving link layer configuration to the RD, which may make some firmwares (esp. proprietary ones) not amenable to conversion into a CC. ↩