
Slice operation

Code: SRSM-6
Responsible: Xavi León
Components: testbed server, testbed node

Description

Some basic operations (e.g. start/stop) over the slices must be provided.

Comments

Slice operation covers the set of operations a researcher is allowed to perform on a slice. Some possible operations are:

  • Start/stop.
  • Cancel/Extend slice reservation (see the open discussions below).
  • Add/remove slice user.

Unless suggested otherwise, we can borrow most of the terminology from PlanetLab (do we have a definitive terminology for CONFINE?).

Some of the details may overlap with other requirements, but I have added them here for the sake of clarity.

Analysis

Details

Researchers need a way to manage their slices. They are assumed to be registered on the central server beforehand with a unique id (e-mail?). The list of high-level management functions they can perform through the main web server interface would be as follows (TODO: look closely at the SFA interface, but it should be something like this); a rough sketch of such an interface is given after the list:

  • Add/Delete/Modify slices: give a name and a description for the slice.
  • Add/Delete/Modify users: bind an existing user/researcher to a slice so they have access.
  • Add/Delete/Modify slivers: bind an existing slice to a node. This will trigger the creation of the container (sliver) on the given node and grant access to users of that particular slice.
  • Start/Stop slivers: start and stop the associated sliver (virtual machine). This may be useful to simulate failures or to avoid unnecessary resource consumption.
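
As an illustration only, here is a minimal sketch of what such a management interface could look like on the testbed server, assuming a simple in-memory Python registry. The class and method names (TestbedRegistry, add_slice, etc.) are hypothetical and are not taken from SFA, PlanetLab or any existing CONFINE component.

  # Illustrative sketch only: all names are hypothetical, not existing CONFINE/SFA code.
  class TestbedRegistry:
      """Server-side registry of slices, their users and their slivers."""

      def __init__(self):
          # slice name -> description, set of user ids, set of node ids hosting a sliver
          self.slices = {}

      def add_slice(self, name, description):
          self.slices[name] = {'description': description, 'users': set(), 'slivers': set()}

      def delete_slice(self, name):
          # would also trigger the removal of every sliver of this slice on the nodes
          del self.slices[name]

      def add_user(self, slice_name, user_id):
          # user_id is the researcher's unique id (e.g. e-mail address)
          self.slices[slice_name]['users'].add(user_id)

      def remove_user(self, slice_name, user_id):
          self.slices[slice_name]['users'].discard(user_id)

      def add_sliver(self, slice_name, node_id):
          # binding a slice to a node triggers the instantiation of a sliver on that node
          self.slices[slice_name]['slivers'].add(node_id)

      def start_sliver(self, slice_name, node_id):
          pass  # would send a "start" request to the node's API

      def stop_sliver(self, slice_name, node_id):
          pass  # would send a "stop" request to the node's API

A Start/Stop or Delete request issued through the web interface would then be forwarded to the corresponding node operation described below.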

A separate point is which operations (API) the node must support. The operations on the node would be as follows:

  • Instantiate sliver: when a node is added to a slice for the first time, it will need to deploy the corresponding image, set up the sliver (virtual machine on the node), add users to the sliver (maybe ssh keys), etc.
  • Remove sliver: when a node is removed from a slice, it will stop the sliver and remove the associated filesystem (the image file + user data).
  • Start sliver: the sliver MUST be previously instantiated. It will boot up the sliver (boot the virtual machine) to allow its users to access it.
  • Stop sliver: as its name suggests, it will shut down the sliver (virtual machine) and free any used resources (except the associated filesystem).
  • Reconfigure sliver: when a slice is modified (e.g. user added), we need to update the sliver with the new configuration. In the case of a new user added to the slice, it may be necessary to include the credentials to grant the new user access to the sliver.

Why are the instantiation and the start of a sliver separated? Consider the case of a node reboot: if the sliver is already instantiated on that node, it is only necessary to start the virtual machine again.
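
A minimal sketch of these node-side operations follows, assuming a single Python class keeps the per-sliver state on the node. The names (NodeSliverManager, instantiate, etc.) are illustrative and do not correspond to the PlanetLab NodeManager or to any existing CONFINE code.

  # Illustrative sketch only: names and state layout are assumptions, not existing code.
  class NodeSliverManager:
      def __init__(self):
          # slice name -> {'running': bool, 'users': set of user ids}
          self.slivers = {}

      def instantiate(self, slice_name, users):
          # deploy the image, create the container/VM and install user credentials (e.g. ssh keys)
          self.slivers[slice_name] = {'running': False, 'users': set(users)}

      def remove(self, slice_name):
          # stop the sliver and delete its filesystem (image file + user data)
          self.stop(slice_name)
          del self.slivers[slice_name]

      def start(self, slice_name):
          # the sliver MUST already be instantiated; after a node reboot only this step is needed
          if slice_name not in self.slivers:
              raise RuntimeError("sliver must be instantiated before it can be started")
          self.slivers[slice_name]['running'] = True   # boot the virtual machine

      def stop(self, slice_name):
          # shut down the virtual machine but keep the filesystem
          self.slivers[slice_name]['running'] = False

      def reconfigure(self, slice_name, users):
          # e.g. a new user was added to the slice: update the credentials on the sliver
          self.slivers[slice_name]['users'] = set(users)

After a reboot, the node would only call start() again for the slivers it already recorded as instantiated, without re-deploying any image.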

Open discussions

  1. We need to specify the list of configuration parameters and the exact protocol (e.g. configuration file format) between the nodes and the CONFINE server.
  2. How is the configuration data spread across nodes?
    • Polling the server every now and then (PlanetLab polls the configuration every 30 minutes with a random deviation to avoid flash crowds). They use polling because some nodes are behind NATs/firewalls and do not allow direct access. Is that our case? A rough sketch of such a polling loop is given after this list.
    • Would something similar to the sms plugin of bmx be suitable?
  3. We need to discuss further how the sliver images are deployed. I may have missed some previous discussions about this, but I still don't understand why researchers need to “pre-configure” a sliver image and send it to the “portal” for distribution. I'd go for the simple solution of having the same image for every sliver and letting researchers configure it as they want after instantiation (like PlanetLab). We can provide the distribution of “diff images” as a separate service, implemented on top of the simple solution.
  4. There have been some discussions about node reservation and calendar services. They are not at the heart of SFA. I'd suggest leaving this out because I understand reservation as exclusive access, whereas the objective should be a shared infrastructure. Reservations can be implemented on top of the basic operations described here by a higher-level component or service. For points 2 and 3, look at unbundled management (section 2.2).
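
To make point 2 more concrete, here is a rough sketch of a PlanetLab-style polling loop with a random deviation. The endpoint URL, the 30-minute interval and the JSON payload are assumptions made for illustration, not agreed parameters.

  # Rough sketch of a node-side configuration polling loop (see point 2 above).
  # CONFIG_URL, the interval and the JSON format are assumptions, not decisions.
  import json
  import random
  import time
  import urllib.request

  CONFIG_URL = "https://testbed-server.example/node/config"   # hypothetical endpoint
  BASE_INTERVAL = 30 * 60                                      # 30 minutes, as in PlanetLab

  def poll_forever(apply_config):
      while True:
          try:
              with urllib.request.urlopen(CONFIG_URL) as response:
                  config = json.loads(response.read().decode("utf-8"))
              apply_config(config)   # e.g. instantiate/remove/reconfigure slivers accordingly
          except Exception as exc:
              print("configuration poll failed:", exc)
          # random deviation of up to +/- 5 minutes to avoid flash crowds on the server
          time.sleep(BASE_INTERVAL + random.uniform(-300, 300))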

Recommendations

CONFINE is quite similar to PlanetLab in the sense that its main advantage over other approaches is running concurrent experiments over a set of nodes inside community networks. I'd suggest looking at the NodeManager code of PlanetLab, as it already supports all these requirements. I'm not suggesting we use the actual code (it's a set of Python files with a lot of 'magic stuff' implemented over the last 5-6 years), but we may reuse some of it or get some ideas from there.
