Slice and sliver states

For information on additional states reported by the controller state app, please see Definition of reported states.

Summary

A slice administrator can configure a slice to be in one of several set states: register, deploy and start. Slivers in the slice also have a set state. The slice administrator can configure it explicitly for each sliver, or it can implicitly take its value from a sliver default provided by the slice (which is not the same as the slice's set state). The sliver's effective set state is the lower of the sliver's set state and the slice's set state. Once a node knows a slice and the sliver assigned to it, it tries to apply the sliver's effective set state. The result is the sliver's current state: registered, deployed, started or a failure state (fail_allocate, fail_deploy or fail_start).
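
The effective set state rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not CONFINE's implementation. The names SetState, TARGET_CURRENT_STATE and effective_set_state are hypothetical, and an undefined sliver set state is represented as None.

```python
from enum import IntEnum
from typing import Optional


class SetState(IntEnum):
    # IntEnum values encode the ordering register < deploy < start.
    REGISTER = 0
    DEPLOY = 1
    START = 2


# Current state a node reaches when it successfully applies each set state.
TARGET_CURRENT_STATE = {
    SetState.REGISTER: "registered",
    SetState.DEPLOY: "deployed",
    SetState.START: "started",
}


def effective_set_state(slice_set_state: SetState,
                        default_set_state: SetState,
                        sliver_set_state: Optional[SetState] = None) -> SetState:
    """Return the lower of slice.set_state and sliver.set_state.

    An undefined sliver set state (None) falls back to
    slice.sliver_defaults.set_state, passed here as default_set_state.
    """
    own = default_set_state if sliver_set_state is None else sliver_set_state
    return min(slice_set_state, own)
```

For instance, effective_set_state(SetState.DEPLOY, SetState.START) yields SetState.DEPLOY: the slice caps the sliver even though the sliver default asks for start.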

How transitions work

Note that transitions work as follows. The node checks the currently known slice or sliver descriptions against the transitions leaving the current state and follows the applicable transition. It repeats the process until no transition applies (end of round). No new descriptions are fetched during a round. The next time a change occurs in the registry or the node fetches a description, the process starts anew from the last state reached (start of round).
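
A round can be sketched as a loop over a transition table. The table layout (guard functions paired with target states), run_round and max_steps are assumptions made for illustration. In particular, this sketch takes the first applicable transition and ignores priorities.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: each state maps to a list of (guard, target)
# pairs, where the guard inspects the descriptions known to the node.
Guard = Callable[[dict], bool]
Transitions = Dict[str, List[Tuple[Guard, str]]]


def run_round(state: str, descriptions: dict, transitions: Transitions,
              max_steps: int = 100) -> List[str]:
    """Follow applicable transitions from `state` until none applies.

    `descriptions` is the snapshot known when the round starts; nothing new
    is fetched while the round runs. Returns the list of visited states.
    """
    visited = [state]
    for _ in range(max_steps):  # guard against a badly specified cyclic table
        targets = [target for guard, target in transitions.get(state, [])
                   if guard(descriptions)]
        if not targets:
            break  # end of round
        state = targets[0]  # this sketch simply takes the first match
        visited.append(state)
    return visited
```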

Actions

Actions can be triggered by following transitions or by entering states. In transition labels they are prefixed with a slash (e.g. the /store in add conf/store):

alloc
Allocates the sliver (i.e. reserves its resources) in the node according to its current sliver configuration if enough resources are available. The allocation of a slice in the registry reserves global testbed resources like VLAN tags if available. A sliver allocation is atomic to the node performing it: if it fails it is automatically undone, but it does not affect other nodes or the registry.
deploy
Deploys the sliver in the node (i.e. creates and prepares its filesystem) according to its current sliver configuration.
start
Starts the sliver in the node (i.e. boots its container).
dealloc
Deallocates the sliver in the node or the slice in the registry (i.e. frees its resources). It is assumed to always be successful.
undeploy
Undeploys the sliver in the node (i.e. erases its filesystem). It is assumed to always be successful.
stop
Stops the sliver in the node (i.e. shuts down its container). It is assumed to always be successful.
store
Saves the new sliver configuration as the current one. It is assumed to always be successful.
stsn
Saves the slice sequence number in the current sliver configuration. It is assumed to always be successful.
forget
Deletes the current sliver configuration. It is assumed to always be successful.
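
The atomicity of sliver allocation can be sketched as below: either all requested resources are reserved, or none are. The resource model, with named integer amounts such as "disk" and "ip", is a hypothetical stand-in for whatever the node actually manages. The functions alloc and dealloc are hypothetical too.

```python
from typing import Dict


class AllocationError(Exception):
    """A requested resource does not exist or is not available."""


def alloc(requested: Dict[str, int], available: Dict[str, int]) -> Dict[str, int]:
    """Reserve all requested resources or none of them (hypothetical model)."""
    reserved: Dict[str, int] = {}
    try:
        for name, amount in requested.items():
            free = available.get(name, 0)
            if free < amount:
                raise AllocationError(f"{name}: requested {amount}, {free} available")
            available[name] = free - amount
            reserved[name] = amount
    except AllocationError:
        # Undo the partial reservation so the node is left unchanged.
        for name, amount in reserved.items():
            available[name] += amount
        raise
    return reserved


def dealloc(reserved: Dict[str, int], available: Dict[str, int]) -> None:
    """Free previously reserved resources; assumed to always succeed."""
    for name, amount in reserved.items():
        available[name] = available.get(name, 0) + amount
```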

Transitions

Transition labels indicate the conditions that must be met to follow a transition. Their meanings are:

set X
X==(register|deploy|start) (shortened respectively as X==(r|d|s)) where X is the effective set state. For a slice this is the slice.set_state itself; for a sliver this is at any time min(slice.set_state, sliver.set_state) (with register < deploy < start), according to their last known values from the registry. See a description of set states further below.
add conf
The node fetched a description for a sliver that is currently unknown to it.
del conf
The sliver configuration ceased to be available at the registry, i.e. the sliver or slice description was deleted.
new conf
The new sliver configuration fetched by the node has some value differing from the current, known one. A sliver configuration is a merge of all slice and sliver attributes (including slice and sliver sequence numbers); however the following attributes are excluded so changing them is not considered a change in the sliver configuration: slice.name, slice.description, slice.expires_on, slice.set_state (handled by set X), slice.properties, slice.sliver_defaults.instance_sn, slice.sliver_defaults.set_state (handled by set X), sliver.description, sliver.set_state (handled by set X), sliver.properties.
ok_(alloc|deploy|start)
The last alloc, deploy or start action for the sliver was successful. The ok_alloc condition can also reflect the result of an alloc action on the whole slice for set states.
err_(alloc|deploy|start)
The last alloc, deploy or start action for the sliver failed because the sliver tried to allocate some resources in the node which do not exist or are not available, or some run-time error happened. The err_alloc condition can also reflect the result of an alloc action on the whole slice for set states.
err_run
The container of the sliver was unexpectedly shut down. This is checked at the beginning of the round and the condition is set accordingly.
(slice rst|sliver up)
An increment of slice.instance_sn or sliver.instance_sn from the one stored in the current sliver configuration, signaling a slice reset or a sliver update, respectively.
wait
A special condition which is only true right after the node starts a new round of checking transitions, and becomes false after following any transition (i.e. it is true once per round).
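
The add conf and new conf conditions can be sketched as follows. The slice and sliver descriptions are flattened into dotted attribute paths, and the excluded attributes listed under new conf are dropped. The dictionary layout of the descriptions and the function names are assumptions. The template attribute in the test data is invented for illustration.

```python
from typing import Any, Dict, Optional

# Attributes whose changes are not considered a change in the sliver configuration.
EXCLUDED = {
    "slice.name", "slice.description", "slice.expires_on", "slice.set_state",
    "slice.properties", "slice.sliver_defaults.instance_sn",
    "slice.sliver_defaults.set_state", "sliver.description",
    "sliver.set_state", "sliver.properties",
}

Conf = Dict[str, Any]


def _flatten(prefix: str, obj: Dict[str, Any], out: Conf) -> None:
    """Copy nested attributes into `out` as dotted paths, skipping EXCLUDED."""
    for key, value in obj.items():
        path = f"{prefix}.{key}"
        if path in EXCLUDED:
            continue
        if isinstance(value, dict):
            _flatten(path, value, out)
        else:
            out[path] = value


def sliver_configuration(slice_desc: Dict[str, Any], sliver_desc: Dict[str, Any]) -> Conf:
    """Merge slice and sliver attributes (sequence numbers included)."""
    conf: Conf = {}
    _flatten("slice", slice_desc, conf)
    _flatten("sliver", sliver_desc, conf)
    return conf


def add_conf(current: Optional[Conf]) -> bool:
    # The node had no configuration for this sliver yet.
    return current is None


def new_conf(current: Optional[Conf], fetched: Conf) -> bool:
    # A known configuration differs from the fetched one.
    return current is not None and current != fetched
```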

Since simultaneous changes could trigger several transitions, the one with the highest priority is followed: del conf > new conf > sliver up > slice rst > others.
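
Choosing among simultaneously true conditions can be sketched as a priority lookup. The condition strings and the pick_condition function are illustrative. The input is assumed to contain only conditions of transitions leaving the current state. The text gives no order among the remaining ("others") conditions, so this sketch simply picks one of them deterministically.

```python
from typing import Iterable, Optional

# Highest priority first, as listed in the text.
PRIORITY = ["del conf", "new conf", "sliver up", "slice rst"]


def pick_condition(conditions: Iterable[str]) -> Optional[str]:
    """Return the condition whose transition should be followed, if any."""
    present = set(conditions)
    for condition in PRIORITY:
        if condition in present:
            return condition
    # "others": no order is given among them, so pick one deterministically.
    others = sorted(present)
    return others[0] if others else None
```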

Set states

The states selectable by a slice administrator are based on those of SFA: register, deploy (SFA's "instantiate"), start (SFA's "activate"). Slice set states indicate the desired future state for slivers in the slice, although any sliver may try to override its slice's set state (slice.set_state) with its own sliver set state (sliver.set_state), if the effective set state computation allows it. A sliver set state which is undefined gets its default value at any time from the last known value of slice.sliver_defaults.set_state from the registry.

A set state implies the previous ones, so a slice or sliver set to be started is also set to be deployed, and one set to be deployed is also set to be registered, thus register < deploy < start. This also means that the only allowed combinations of slice.set_state and sliver.set_state are those where sliver.set_state <= slice.set_state.

register
The slice and corresponding sliver descriptions are known by the registry and they are correct.
deploy
The slice and corresponding slivers are to have their requested resources allocated and their data installed.
start
The slice and corresponding slivers are to have their components started (e.g. containers booted).
(allocating)
A transitory state that runs the alloc action on entry.

As an example, to have all slivers in a slice running except a special one, slice.set_state can be set to start, slice.sliver_defaults.set_state left in the default start value, sliver.set_state left in the default undefined value in all slivers, and sliver.set_state set to register in the special sliver.

In a similar way, to only have one sliver in a slice running, slice.set_state can be set to start, slice.sliver_defaults.set_state set to register, sliver.set_state left in the default undefined value in all slivers, and sliver.set_state set to start in the special sliver.
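
The two configurations above can be checked with a short sketch that computes every sliver's effective set state. SetState and sliver_effective_states are hypothetical names. None stands for an undefined sliver.set_state, and the sliver names are invented.

```python
from enum import IntEnum
from typing import Dict, Optional


class SetState(IntEnum):
    # Ordering register < deploy < start.
    REGISTER = 0
    DEPLOY = 1
    START = 2


def sliver_effective_states(slice_set_state: SetState,
                            default_set_state: SetState,
                            sliver_set_states: Dict[str, Optional[SetState]]
                            ) -> Dict[str, SetState]:
    """Map each sliver to min(slice.set_state, sliver.set_state or default)."""
    return {
        name: min(slice_set_state, default_set_state if own is None else own)
        for name, own in sliver_set_states.items()
    }


# First example: everything runs except the special sliver.
print(sliver_effective_states(SetState.START, SetState.START,
                              {"a": None, "special": SetState.REGISTER}))
# Second example: only the special sliver runs.
print(sliver_effective_states(SetState.START, SetState.REGISTER,
                              {"a": None, "special": SetState.START}))
```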

Current states

Sliver current states indicate that the state has been successfully reached by the sliver in its node. A state implies the previous ones, so a started sliver is also deployed, and a deployed one is also registered, thus registered < deployed < started.

Sliver current states also include states indicating several kinds of failure, and transitory states (in parentheses) which are never kept between rounds.

registered
The sliver and corresponding slice descriptions are known by the node.
(allocated)
The resources requested by the sliver have been successfully allocated. Transitory state.
deployed
The data associated with the sliver has been installed.
started
The sliver's components have been started (e.g. containers booted).
((allocating|deploying|starting))
Transitory states that run the alloc, deploy or start actions on entry.
fail_allocate
A run-time failure occurred while allocating the sliver, or the sliver tried to allocate some resources in the node which do not exist or are not available.
fail_deploy
A run-time failure occurred while deploying the sliver.
fail_start
A run-time failure occurred while starting the sliver, or the sliver unexpectedly stopped running.

Set states DFA

Figure: Slice set states (the set states of a CONFINE slice)

This DFA is quite straightforward. It allows the slice administrator to directly set a state that is not adjacent to the current one, although the definition of the state machine makes the registry go through the intermediate states. For example:

  1. A slice administrator creates some descriptions for a slice and its slivers, which makes slice.set_state==register.
  2. The slice administrator later sets slice.set_state=start and the registry starts a new round.
  3. Since the set state is start, the registry follows the set (d|s) transition to (allocating), which runs the alloc action successfully, so ok_alloc is followed to deploy.
  4. Since the set state is start, the registry follows the set s transition to start. No more transitions are applicable, so the round ends here.
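
The example above can be replayed with a small sketch of the slice set-state DFA. Only the transitions mentioned in the text are modelled: set (d|s), ok_alloc and set s. The diagram itself is not reproduced here. The handling of err_alloc (ending the round back in register) is therefore an assumption, as is the slice_round function.

```python
from typing import Callable, List


def slice_round(state: str, set_state: str, alloc: Callable[[], bool]) -> List[str]:
    """Run one registry round for a slice; return the states visited."""
    visited = [state]
    while True:
        if state == "register" and set_state in ("deploy", "start"):
            visited.append("(allocating)")      # set (d|s)
            if not alloc():                      # alloc runs on entry
                # ASSUMPTION: err_alloc ends the round back in register.
                visited.append("register")
                break
            state = "deploy"                     # ok_alloc
        elif state == "deploy" and set_state == "start":
            state = "start"                      # set s
        else:
            break                                # no transition applies
        visited.append(state)
    return visited
```

Calling slice_round("register", "start", lambda: True) walks through steps 3 and 4 and stops in start.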

Current states DFA

Figure: Sliver current states (the current states of a CONFINE sliver)

Although this DFA may look complex, it can be split into three parts. The first is a simple, step-by-step forward/backward chain of states (registered, (allocated), deployed, started) for the normal operation of slices and slivers. The second is a set of interspersed (ACTIONing) states that run an action and check its result. The third is a set of (ACTIONing) -> fail_ACTION -> STATE loops for handling and recovering from failures.

The DFA ensures that the sliver configuration can only be changed while in the registered state (because of the transition with a store action). In the other states, changes to the sliver configuration are ignored, and a sliver update is required to force the application of changes to an already deployed sliver.

The sliver state moves along the main chain of states in response to changes in slice.set_state and sliver.set_state. However, a slice reset can force a sliver step by step back to the (allocated) state, getting it stopped, redeployed and restarted (while still heeding changes in the set state). Similarly, a sliver update forces it back to the registered state, getting it stopped, undeployed, deallocated, reconfigured, reallocated, redeployed and restarted. Of course, the reallocation may fail if the new set of requested resources is not available. To allow the chain of reset or update transitions to complete, the slice and sliver sequence numbers (whose change triggers the reset or update) are only stored when the operation is completed. They are stored either separately, by the stsn action on the last reset transition that reaches the (allocated) state, or together with the rest of the sliver configuration, by the normal store action in the registered state.
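
The deferred storing of sequence numbers can be sketched as follows. The stored value only changes on stsn (or store), so the slice rst and sliver up conditions keep holding across every step of the chain. The flat dictionary keys and the helper functions are hypothetical.

```python
from typing import Dict

# Hypothetical flat view of a sliver configuration holding both sequence numbers.
Conf = Dict[str, int]


def slice_rst(current: Conf, fetched: Conf) -> bool:
    """Slice reset: the fetched slice.instance_sn exceeds the stored one."""
    return fetched["slice.instance_sn"] > current["slice.instance_sn"]


def sliver_up(current: Conf, fetched: Conf) -> bool:
    """Sliver update: the fetched sliver.instance_sn exceeds the stored one."""
    return fetched["sliver.instance_sn"] > current["sliver.instance_sn"]


def stsn(current: Conf, fetched: Conf) -> None:
    """Store only the slice sequence number (last reset transition)."""
    current["slice.instance_sn"] = fetched["slice.instance_sn"]
```

Until stsn runs, slice_rst keeps returning True on each step back along the chain. This is what lets the reset reach (allocated) before the condition clears.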

Run-time failures in allocation, deployment, startup or execution of the sliver move its state to the proper fail_ACTION one. To allow the sliver to recover from such failures, on the start of the next round the state is forced by a wait transition back to a state in the main chain, from where it can retry. In particular, a sliver whose container gets shut down while in the started state will follow the err_run transition and use the fail_start -> deployed -> (starting) -> started loop to get the container started again (rebooted) on the next round.
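
The checks at the start of a round can be sketched as below. The recovery targets for fail_start (deployed) and fail_deploy ((allocated)) come from the text. The target for fail_allocate (registered) is an assumption, and start_of_round is a hypothetical helper that follows at most one transition.

```python
# Recovery targets of the wait transition. fail_start and fail_deploy come
# from the text; fail_allocate -> registered is an ASSUMPTION.
WAIT_RECOVERY = {
    "fail_start": "deployed",
    "fail_deploy": "(allocated)",
    "fail_allocate": "registered",
}


def start_of_round(state: str, container_running: bool) -> str:
    """Follow at most one start-of-round transition (hypothetical helper)."""
    # err_run: a started sliver whose container is down moves to fail_start.
    # Following it makes wait false, so recovery happens on the next round.
    if state == "started" and not container_running:
        return "fail_start"
    # wait: brings a failed sliver back to the main chain; other states stay.
    return WAIT_RECOVERY.get(state, state)
```

Two consecutive rounds for a crashed container thus go started -> fail_start, then fail_start -> deployed, from where (starting) and started follow.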

Finally, if the slice or sliver description is deleted from the registry, del conf transitions are followed and the sliver is stopped, deallocated, and its current configuration deleted.

An example:

  1. A slice administrator creates a slice and sliver description and sets slice.set_state=start in the registry. By default and unless explicitly changed, slice.sliver_defaults.set_state==start.
  2. The node on which the sliver is intended to run fetches both descriptions for the previously unknown configuration. It therefore follows the add conf/store transition to the registered state and stores the new sliver configuration as the current one.
  3. Since the effective set state is start, the node follows the set (d|s) transition to (allocating), which runs the alloc action successfully, so ok_alloc is followed to (allocated).
  4. Since the effective set state is start, the node follows the set (d|s) transition to (deploying), which runs the deploy action. The action fails (e.g. due to a temporary network outage), so err_deploy is followed to fail_deploy. No more transitions are applicable, so the round ends here.
  5. On the beginning of the next round no changes are fetched and wait is true, so it is followed back to (allocated). Since the effective set state is still start, the node follows the set (d|s) transition once again to (deploying), which runs the deploy action successfully this time, so ok_deploy is followed to deployed.
  6. Since the effective set state is start, the node follows the set s transition to (starting), which runs the start action successfully, so ok_start is followed to started. The sliver is running and operative now.
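
The walkthrough above can be replayed with a sketch of the forward chain. Only the forward and wait transitions described in the text are modelled. There are no backward transitions for a lowered set state and no del conf. The fail_allocate recovery target is assumed, and sliver_round and the actions mapping are hypothetical.

```python
from typing import Callable, Dict, List

ORDER = {"register": 0, "deploy": 1, "start": 2}

# Forward steps of the main chain:
# state -> (lowest effective set state needed, transitory state, action,
#           target on success, target on failure)
FORWARD = {
    "registered": ("deploy", "(allocating)", "alloc", "(allocated)", "fail_allocate"),
    "(allocated)": ("deploy", "(deploying)", "deploy", "deployed", "fail_deploy"),
    "deployed": ("start", "(starting)", "start", "started", "fail_start"),
}

# wait targets; fail_allocate -> registered is an ASSUMPTION.
WAIT_RECOVERY = {
    "fail_allocate": "registered",
    "fail_deploy": "(allocated)",
    "fail_start": "deployed",
}


def sliver_round(state: str, effective: str,
                 actions: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run one round from `state`; return every state visited."""
    visited = [state]
    wait = True  # true only until the first transition of the round
    while True:
        if wait and state in WAIT_RECOVERY:
            state = WAIT_RECOVERY[state]
        elif state in FORWARD and ORDER[effective] >= ORDER[FORWARD[state][0]]:
            _, transitory, action, ok_state, fail_state = FORWARD[state]
            visited.append(transitory)  # the action runs on entry
            state = ok_state if actions[action]() else fail_state
        else:
            break  # no transition applies: end of round
        wait = False
        visited.append(state)
    return visited
```

With a deploy action that fails once and then succeeds, the first round ends in fail_deploy (steps 3 and 4). The second round ends in started (steps 5 and 6).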

Now let us imagine that the slice administrator wants to force a particular sliver to apply its latest configuration.

  1. The slice administrator commands a sliver update in the registry.
  2. The node fetches both slice and sliver descriptions, notices the greater sliver sequence number in the new sliver configuration and follows the sliver up/stop transition to deployed, stopping the sliver.
  3. Since the sliver sequence number in the current sliver configuration is still the same, the node still notices the increment and follows the sliver up/undeploy transition to (allocated), erasing the sliver's filesystem.
  4. Since the sliver sequence number in the current sliver configuration is still the same, the node still notices the increment and follows the sliver up/dealloc transition to registered, deallocating the sliver.
  5. Since the sliver sequence number in the current sliver configuration is still the same, the node still notices a change in the configuration and follows the usual new conf/store transition back to registered, saving the new sliver configuration (which includes the sliver sequence number) as the current one.
  6. Now the sliver sequence numbers in the new and current sliver configurations are the same, so the sliver update no longer holds and normal transition conditions apply, e.g. if the effective set state is still start, the sliver follows the steps described above until it gets running once again.
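
Steps 2 to 5 can be replayed with the following sketch. It follows the sliver up transitions back down the chain, then the new conf/store transition in registered. The configuration keys (including the invented image attribute) and update_round are hypothetical. The forward transitions of step 6 are not modelled.

```python
from typing import Any, Dict, List, Tuple

# Backward steps while the sliver update condition holds:
# state -> (action, next state), as in steps 2 to 4 above.
SLIVER_UP = {
    "started": ("stop", "deployed"),
    "deployed": ("undeploy", "(allocated)"),
    "(allocated)": ("dealloc", "registered"),
}


def update_round(state: str, current: Dict[str, Any],
                 fetched: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Follow sliver up transitions, then new conf/store in registered."""
    actions: List[str] = []
    while True:
        sliver_up = fetched["sliver.instance_sn"] > current["sliver.instance_sn"]
        if state == "registered" and current != fetched:
            # new conf/store: only available in registered; saves the whole
            # configuration, sequence numbers included.
            current.clear()
            current.update(fetched)
            actions.append("store")
        elif sliver_up and state in SLIVER_UP:
            action, state = SLIVER_UP[state]
            actions.append(action)
        else:
            break  # the update no longer holds; normal transitions would follow
    return state, actions
```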
arch/slice-sliver-states.txt · Last modified: 2014/11/27 11:55 by ivilata