
Slice activation/deactivation

Code: SRSM-5
Responsible: Davide Vega
Components: testbed server*

* The main work for this requirement will be done on the testbed server, but the testbed node needs to provide an API to perform the actions managed by the server.

Description

When the experiment needs to be run, the server in charge* of the selected resources needs to perform the necessary operations so that the slice is allocated and all the steps required to start running the experiment are completed on the nodes.

*NOTE: I had not realized until now, but this phrasing assumes that Confine has more than one server. Consistency and information replication protocols will therefore be required (and have not yet been analyzed in the requirements list).

Comments

This requirement covers the steps needed to provide a slice that is ready to be used, starting from the resource allocation decided previously, and to free the resources at the end of the slice lifecycle. It involves creating the slivers on each of the nodes involved in the slice deployment and removing them when the slice is decommissioned.

It involves all the necessary operations until the slice is ready to be used. For instance, if the slice request consists of a set of VMs over a set of nodes, the creation of those VMs on the nodes belongs to this phase, which therefore also implies the resource allocation.

This requirement is closely related to requirement SRSM-6, because the mechanism to activate/deactivate slices might be provided by SRSM-6.

Analysis

Details

Researchers need a way to start, stop, or schedule an experiment (that is, to activate, deactivate, or put a slice on hold). For this analysis, it is assumed that this mechanism exists. Likewise, it is assumed that there is a set of valid resources R, announced to the server over a set of N nodes, and a valid user U. The steps that a server should follow to check that the requested operation completes successfully are described below.

SFA describes the operations it considers part of this mechanism:

  • StopSlice(Credential)
  • StartSlice(Credential)
  • ResetSlice(Credential)
  • DeleteSlice(Credential)

“The first two operations stop and start the execution of any active slivers within an existing slice. The slice retains any resources it holds, although a component that uses work-conserving schedulers is free to utilize those resources for the duration of the suspension. The slice should not expect threads running in the slice to resume at the point the slice was suspended, as the implementation of StopSlice is free to kill all running threads, in which case, StartSlice effectively reboots the slice. However, the slice’s on-disk state should remain unaffected by the operations. The third operation resets a slice to its initial state. This includes clearing any on-disk state associated with the slice. Thus, ResetSlice is effectively equivalent to deleting and re-creating the slice, but without freeing the slice’s resources. The fourth operation removes the slice from the aggregate and releases all of its resources.”

Finally, SFA explains: “Note that these operations might be invoked by a user responsible for the slice (e.g., a researcher associated with the slice or a suitably authorized administrative entity responding to unexpected behavior in the slice), or by a user responsible for the component or aggregate (e.g., an operator affiliated with the MA). In the latter case, the operator might not know that the slice exists on the component, but is terminating or suspending the slice on all components it manages. This permits an operator to control a slice on all of the components it manages without the cooperation of a slice manager that knows all the components on which the slice has been embedded. These four control operations affect the slice state on a particular aggregate, but not on other aggregates where the slice may also have a presence.”
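
The semantics of the four SFA operations quoted above can be illustrated with a minimal Python sketch. It is not part of SFA or of any Confine code: the Slice class, its fields and the method names are hypothetical, the Credential argument is left out, and the assumption that a reset slice ends up stopped is ours (the quoted text only says it returns to its initial state).

```python
from dataclasses import dataclass, field
from enum import Enum


class SliceState(Enum):
    RUNNING = "running"
    STOPPED = "stopped"
    DELETED = "deleted"


@dataclass
class Slice:
    """Hypothetical model of one slice on a single aggregate."""
    resources: set = field(default_factory=set)   # resources held by the slice
    disk: dict = field(default_factory=dict)      # on-disk state
    threads: list = field(default_factory=list)   # running threads
    state: SliceState = SliceState.STOPPED

    def _check_alive(self):
        if self.state is SliceState.DELETED:
            raise RuntimeError("slice was removed from this aggregate")

    def start_slice(self):
        self._check_alive()
        # Threads are not resumed where they stopped: starting acts as a reboot.
        self.threads = ["init"]
        self.state = SliceState.RUNNING

    def stop_slice(self):
        self._check_alive()
        # All threads may be killed; resources and on-disk state are retained.
        self.threads = []
        self.state = SliceState.STOPPED

    def reset_slice(self):
        self._check_alive()
        # On-disk state is cleared, but the resources stay with the slice.
        self.threads = []
        self.disk = {}
        self.state = SliceState.STOPPED  # assumption: initial state is stopped

    def delete_slice(self):
        self._check_alive()
        # The slice leaves the aggregate and releases all of its resources.
        self.threads = []
        self.disk = {}
        self.resources = set()
        self.state = SliceState.DELETED
```

The sketch only tracks state on one aggregate, in line with the remark that these operations do not affect other aggregates where the slice may be present.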

Activation (StartSlice) { sliceUID, UserName, AuthenticationData }

  1. The user requests the start of a slice through the Global-Server web page
  2. The Server Location Service finds the right server to handle the request (this will be a load-balancing policy on the server; an advanced feature)
  3. The user must be authenticated on the Global-Server
  4. The Server Location Service finds the list of nodes registered in the slice and sends a Reply-request to their Local-Servers (the servers that control those nodes)
  5. The Local-Servers ask each node to create the VM with the registered resources
  6. The nodes create a new VM and reserve the necessary resources
  7. The nodes update the resource status file
  8. When the last request succeeds, the nodes send their response to the Local-Server
  9. The Local-Server notifies the User Server that it is ready and starts the slice
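
The activation steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: Node, LocalServer, start_slice, the "units" used to count resources and the shape of the registry are all hypothetical, the server selection of step 2 is omitted, and authentication is reduced to a dictionary lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free: int                                  # free resource units (hypothetical)
    vms: dict = field(default_factory=dict)    # stands in for the status file

    def create_vm(self, slice_uid, units):
        # Steps 6-7: create the VM, reserve resources, update the status record.
        if units > self.free:
            return False
        self.free -= units
        self.vms[slice_uid] = units
        return True


@dataclass
class LocalServer:
    nodes: dict                                # node name -> Node

    def activate(self, slice_uid, requests):
        # Step 5: ask every node; step 8: collect all of their responses.
        replies = [self.nodes[n].create_vm(slice_uid, u) for n, u in requests.items()]
        return all(replies)


def start_slice(slice_uid, user, auth, users, registry, local_servers):
    # Step 3: the user must be authenticated (assumed: a plain lookup).
    if users.get(user) != auth:
        raise PermissionError("authentication failed")
    # Step 4: nodes registered in the slice, grouped by their Local-Server.
    per_server = registry[slice_uid]
    results = {s: local_servers[s].activate(slice_uid, req)
               for s, req in per_server.items()}
    # Step 9: the slice starts only if every Local-Server reports success.
    return all(results.values())
```

Note that the sketch does not undo partial allocations when one node fails; what to do in that case is one of the open points listed under "Open discussions".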

Deactivation

  1. This is the same process as before, but the resources have to be deallocated. Our policy has to ensure that this happens within a maximum time. It is also important to consider that, if the VM is performing any work (an experiment), the deallocation has to save the statistical information before stopping the sliver.
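
A minimal Python sketch of the deactivation of one sliver, under assumptions: the sliver is represented as a plain dictionary, deactivate_sliver and its parameters are hypothetical names, and the maximum time is only checked after the fact rather than enforced (how to enforce it is not defined here).

```python
import time


def deactivate_sliver(sliver, stats_store, deadline_s, clock=time.monotonic):
    """Stop one sliver and release its resources.

    Returns (released_units, finished_in_time).
    """
    started = clock()
    if sliver["running"]:
        # Save the statistical information before the sliver is stopped.
        stats_store[sliver["id"]] = dict(sliver["stats"])
    sliver["running"] = False
    # Deallocate whatever the sliver had reserved.
    released = sliver["reserved"]
    sliver["reserved"] = 0
    elapsed = clock() - started
    return released, elapsed <= deadline_s
```

The clock is passed as a parameter so that the deadline check can be exercised with a fake clock in tests.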

Open discussions

  1. What profiles exist in Confine? What are their roles?
  2. A stricter policy on slice deactivation could be offered as an option when the slice is created. With it, when a user with low privileges tries to deactivate a slice that is in use, they receive a warning or the action is unavailable.
  3. The concrete communication protocol between Server and Slices, and between User and Server, to perform the activation/deactivation needs to be defined.
  4. Think about the policies to apply when some of the operations fail.
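
As a starting point for the discussion of point 2, the stricter deactivation policy could look like the Python sketch below. Nothing here is decided: Decision, check_deactivation and the on_violation parameter are hypothetical, and "privileged" stands in for whatever profiles point 1 ends up defining.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    WARN = "warn"    # the user receives a warning
    DENY = "deny"    # the action is unavailable


def check_deactivation(strict, in_use, privileged, on_violation=Decision.DENY):
    """Decide whether a user may deactivate a slice.

    strict: the option chosen when the slice was created.
    on_violation: what a low-privilege user gets for a slice in use.
    """
    if strict and in_use and not privileged:
        return on_violation
    return Decision.ALLOW
```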

Recommendations

I'm not sure about the expected recommendation… To Do: discuss at the next weekly meeting (2012.01.25)

requirements/slice-activation.txt · Last modified: 2012/01/24 19:57 by dvladek