Node states

For information on additional states reported by the controller state app, please see Definition of reported states.

Due to the raw PULL behaviour of a CONFINE testbed (which provides eventual consistency), there are two kinds of states which apply to a CONFINE node:

  • Set states are the states established at the registry on the node (node.set_state), available from the registry API. They are intended to both set the availability of the node for registry operations like slice instantiation, and to be fetched and applied by the node.
  • Current states or simply states are provided by the node itself as local state (node.state), available from the node API. They represent the actual state of a node at a given moment, derived from the set state it got from the registry and other local factors like the availability of resources or the node's hardware configuration and state.

The set of possible values is exactly the same for both set and current states. They are based on those used by @SFA@ and carry similar meanings:

debug

The node has an incomplete configuration [1] (e.g. lacking a valid certificate), or an invalid one that cannot be applied (e.g. referring to a non-existing interface). The latter case only applies to the current state, since the registry simply rejects invalid configurations when it detects them.

Changes to the configuration of a node are allowed in this state.

This state is entered automatically on node creation or because of certain configuration changes (like removing a certificate or key). It cannot be entered manually. It can only be exited by establishing or applying a complete and valid configuration.

safe

The node is executing a complete and valid configuration but it is not available for running slivers or accepting new ones (although already registered or deployed slivers may remain so). This state is useful to indicate that the node is undergoing maintenance, so it is only available to node administrators.

Changes to the configuration of a node are allowed in this state.

This state is entered automatically after establishing or applying a complete and valid configuration. This state can also be entered manually.

production

The node is working and available for registering, deploying and running slivers. This is the normal state of a node.

Changes to the configuration of a node are not allowed in this state. A node may transition automatically to the safe current state when configuration changes are detected.

This state can be entered manually.

failure

The node is experiencing unexpected software or hardware problems and it is not available for hosting slivers.

Changes to the configuration of a node are not allowed in this state. Of course, this applies to the configuration published by the registry API, not other system or hardware configuration.

As a set state, this state can be entered manually. This is to allow indicating that a node has problems that prevent it from reporting its own state (e.g. a broken network card). In this case it can also be exited manually.

As a current state, this state is entered automatically when a problem is detected. It can only be exited by repairing the problem.
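The per-state rules above can be condensed into a small lookup. The following is only a sketch of that summary: the names `NodeState`, `allows_config_changes`, `can_be_set_manually` and `runs_slivers` are hypothetical and are not part of any CONFINE API.

```python
from enum import Enum


class NodeState(Enum):
    """The four values shared by set states and current states."""
    DEBUG = "debug"
    SAFE = "safe"
    PRODUCTION = "production"
    FAILURE = "failure"


# Registry-published configuration may only change in these states.
CONFIG_CHANGES_ALLOWED = frozenset({NodeState.DEBUG, NodeState.SAFE})

# Set states that can be established manually at the registry.
# debug is excluded: it is only entered automatically.
MANUAL_SET_STATES = frozenset(
    {NodeState.SAFE, NodeState.PRODUCTION, NodeState.FAILURE}
)


def allows_config_changes(state: NodeState) -> bool:
    """True if the node configuration may be changed in this state."""
    return state in CONFIG_CHANGES_ALLOWED


def can_be_set_manually(state: NodeState) -> bool:
    """True if this value may be entered manually as a set state.

    This applies to set states only; as a current state, failure is
    entered only when the node detects a real problem.
    """
    return state in MANUAL_SET_STATES


def runs_slivers(state: NodeState) -> bool:
    """True if the node is available for registering, deploying and
    running slivers, which is the case only in production."""
    return state is NodeState.PRODUCTION
```

For instance, `allows_config_changes(NodeState.PRODUCTION)` evaluates to `False`, matching the rule that configuration changes are not allowed in production.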

Set states

Figure "Node set states": the set states of a CONFINE node.

The figure "Node set states" shows the finite-state machine for CONFINE node set states. The meanings of transitions are:

bad conf
An incomplete configuration (e.g. lacking a valid certificate) has been established.
good conf
A complete configuration has been established.
set STATE
Manual change of node.set_state to the given STATE.

As mentioned above, changes to the node configuration are only allowed while in the debug or safe set states, with the node automatically transitioning to one or the other depending on the completeness of the configuration. Once the configuration is complete, the safe, production and failure set states can be established freely, with the exception that the only exit from failure is the safe set state.

As shown, configuration changes and a set state change cannot happen in a single step, even while in the safe set state: the node must first be put into the safe set state, the configuration changes are then performed, and only afterwards can the set state be changed to a different one.
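The set state rules described above can be sketched as two registry-side operations, one per kind of transition. This is an illustrative model only: `SetStateError`, `apply_config_change` and `set_manually` are hypothetical names, and the rejection of a manual change while in debug is inferred from the statement that debug can only be exited through a complete configuration.

```python
class SetStateError(ValueError):
    """Raised when a requested change is not allowed (hypothetical)."""


def apply_config_change(set_state: str, config_complete: bool) -> str:
    """'bad conf' / 'good conf' transitions at the registry.

    Configuration changes are only allowed in the debug and safe set
    states; the result depends on the completeness of the configuration.
    """
    if set_state not in ("debug", "safe"):
        raise SetStateError(
            f"configuration changes are not allowed in {set_state!r}"
        )
    return "safe" if config_complete else "debug"


def set_manually(set_state: str, target: str) -> str:
    """'set STATE' transition: manual change of node.set_state."""
    if target not in ("safe", "production", "failure"):
        # debug cannot be entered manually
        raise SetStateError(f"{target!r} cannot be set manually")
    if set_state == "debug":
        # Assumption: debug is left only via a complete configuration.
        raise SetStateError("complete the configuration to leave debug")
    if set_state == "failure" and target != "safe":
        # The only exit from failure is the safe set state.
        raise SetStateError("failure can only be exited to safe")
    return target
```

With these two functions, changing the configuration of a production node takes three calls: `set_manually("production", "safe")`, then `apply_config_change("safe", True)`, then `set_manually("safe", "production")`.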

Current states

Figure "Node states": the states of a CONFINE node.

The figure "Node states" shows the finite-state machine for CONFINE node states. The meanings of transitions are:

bad conf
An incomplete configuration (e.g. lacking a valid certificate) or an invalid one that can not be applied (e.g. referring to a non-existing interface) has been received.
good conf
A complete and valid configuration has been received.
conf
A change to configuration (either good or bad) has been received.
set STATE
A change of node.set_state to the given STATE has been received.
error
A software or hardware error has been detected.
fix
All detected software or hardware errors have been fixed.

Although this FSM looks very similar to that shown in Node set states, some differences exist because i) the node may or may not be able to apply the received configuration, ii) the node's software and hardware can experience errors, and iii) the node may miss changes that happened at the registry between its last known status and the one it has just received.

The node generally heeds what the registry says regarding the node's set state, except when a configuration cannot be applied (it moves into debug state), when a configuration change is detected outside the safe state (it moves into the safe state) or when an error occurs (it moves into failure state). Regarding errors, the node ignores "fake failures" received from the registry: the only way of entering the failure state is detecting a real error.

Automatically moving the node into safe state when any configuration change is detected while in production allows a node administrator to perform a series of changes at the registry at once and have the node apply them in a safe manner without manual intervention, as shown in the example below. The same path through safe state is followed when fixing errors.
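The node-side rules above can be sketched as a single transition function that computes the next current state from what the node observes on one pull. This is a simplified model under stated assumptions, not the node's actual implementation: `next_state` and its parameters are hypothetical, errors are given priority over everything else, and a node in debug is assumed to try applying a changed configuration directly.

```python
def next_state(state: str, set_state: str, *, error: bool = False,
               conf_changed: bool = False, conf_ok: bool = True) -> str:
    """Return the node's next current state (illustrative sketch).

    state        -- current node.state
    set_state    -- node.set_state as fetched from the registry
    error        -- an unfixed software or hardware error is detected
    conf_changed -- the fetched configuration differs from the applied one
    conf_ok      -- the fetched configuration is complete and can be applied
    """
    if error:
        return "failure"              # 'error': only real errors count
    if state == "failure":
        return "safe"                 # 'fix': leave failure through safe
    if conf_changed:
        if state == "production":
            return "safe"             # 'conf': apply it on a later pull
        # In safe or debug the configuration is applied right away.
        return "safe" if conf_ok else "debug"
    if state == "debug":
        return "debug"                # only a good configuration exits debug
    if set_state in ("safe", "production"):
        return set_state              # 'set STATE': heed the registry
    return state                      # e.g. a "fake failure" is ignored
```

In this sketch, `next_state("production", "failure")` returns `"production"`, since a failure set state received from the registry does not by itself move the node into failure.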

Example

This shows how a series of configuration and state changes performed by a node administrator on the registry propagate to the affected node.

  1. The node starts with node.set_state == node.state == production.
  2. The node administrator (at the registry) sets node.set_state = safe, changes some configuration item (which leaves node.set_state == safe), and then sets node.set_state = production again.
  3. After a while, the node notices the configuration change and sets node.state = safe. The change of set state is present but ignored.
  4. While node.state == safe, the node sees the configuration change again and applies it. If successful, it remains in safe state (otherwise it moves to debug). The change of set state is present but ignored.
  5. While node.state == safe, the node sees the change of set state and sets node.state = production.

Effectively, the node managed to approximate the individual steps followed by the node administrator in the registry, without the need for manual intervention.
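The walk-through above can be replayed with a small simulation. It is only a sketch covering the path of this example: the `Registry` and `Node` classes, the `config_version` counter used to detect configuration changes, and the `poll` method are all hypothetical, the configuration is assumed to apply successfully, and error handling is left out.

```python
from dataclasses import dataclass


@dataclass
class Registry:
    set_state: str = "production"
    config_version: int = 0      # hypothetical stand-in for the configuration


@dataclass
class Node:
    state: str = "production"
    applied_version: int = 0     # configuration version the node has applied

    def poll(self, registry: Registry) -> str:
        """One pull from the registry; at most one transition per pull."""
        if registry.config_version != self.applied_version:
            if self.state == "production":
                # Step 3: configuration change noticed, set state ignored.
                self.state = "safe"
            else:
                # Step 4: apply the change (assumed to succeed here;
                # a failed application would lead to debug instead).
                self.applied_version = registry.config_version
                self.state = "safe"
        elif registry.set_state in ("safe", "production"):
            # Step 5: no pending configuration change, follow the set state.
            self.state = registry.set_state
        return self.state


# Step 1: both sides start in production.
registry, node = Registry(), Node()

# Step 2: the administrator's changes, all made before the node polls.
registry.set_state = "safe"
registry.config_version += 1
registry.set_state = "production"

# Steps 3 to 5: three successive pulls by the node.
trace = [node.poll(registry) for _ in range(3)]
print(trace)
```

The node only ever sees the final registry status (new configuration, set state production), yet the three pulls take it through safe, safe and production, mirroring the administrator's individual steps.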

References

@SFA@
Slice-based Federation Architecture: http://groups.geni.net/geni/wiki/SliceFedArch

  1. The configuration of a node as mentioned here includes most of its attributes as published by the registry API, with some exceptions like the node's set state and boot sequence number, and other attributes that do not affect the node's operation like its name, description, properties and CN web app URL. 

arch/node-states.txt · Last modified: 2015/04/21 17:05 by ivilata