
Sliver isolation

Code SRSM-9
Responsible Xavi León
Components testbed node

Description

We need to define to what extent a slice needs to be isolated from the others.

Comments

Throughout this page, I'm referring to the “research device”, not the whole CONFINE node (which is under discussion and may have another “control device” to access the research device).

Analysis

Details

A slice is composed of a set of node resources (CPU, memory, disk) and network resources (links and bandwidth). Isolation among experiments means that experiments running on the same node but belonging to different slices cannot affect each other's performance (or access each other's data). In other words, a slice should behave as if it were the only one running on a given node.

To achieve such isolation, we consider two different aspects for each type of resource (node and network):

  • Resource isolation: a slice does not interfere with the operation of other slices. Each slice is completely separated from the others: it cannot access other slices' data, cannot kill their processes, and cannot access the core management system of the testbed (the actual host). As stated in discussions on the mailing lists, we should provide enough isolation that researchers do not harm the testbed or other slices by mistake. This aspect is mainly about security.
  • Performance isolation: there must be mechanisms to provide performance guarantees to slices. In other words, each slice should have a predictable (up to a certain point) amount of available resources, so that experiments are repeatable and measurable regardless of the number of concurrent experiments.

Proposal

Here is a table summarizing the two isolation properties with respect to the two types of resources we consider (node and network), and the proposed mechanisms to provide such isolation:

                          Node (CPU/mem/disk)      Network
  Resource isolation      LXC containers           Virtual interfaces + Linux bridge / Click modular router / Open vSwitch / OpenFlow
  Performance isolation   cgroups + disk quota     BW management through ebtables, iptables, tc (qdisc, filters)

Comments:

- LXC provides basic isolation of processes, memory and the network stack, and allows processes to execute in a chrooted environment.
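
As a sketch of what this looks like in practice (container names and the template are illustrative, not part of the CONFINE design):

```shell
# Create a chrooted container for one sliver from a distribution template,
# start it in the background, and run a command inside it.
lxc-create -n sliver01 -t debian   # build the container's root filesystem
lxc-start  -n sliver01 -d          # start it detached
lxc-attach -n sliver01 -- ps aux   # only this sliver's processes are visible
```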

- cgroups are the basic interface for managing the resources allocated to LXC containers (CPU shares, amount of memory, etc.).
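
For example, a container's CPU weight and memory cap can be set through the cgroup interface LXC exposes (the container name and values below are examples only):

```shell
# Give the sliver half the default CPU weight and cap its memory.
lxc-cgroup -n sliver01 cpu.shares 512
lxc-cgroup -n sliver01 memory.limit_in_bytes 256M
```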

- Standard Linux networking tools (Linux bridge, tc, ebtables) can be used to allocate bandwidth to virtual interfaces.
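
A minimal sketch of per-sliver bandwidth allocation with tc, assuming each sliver's container is attached to the bridge through a virtual interface (the interface name and rates are examples):

```shell
# Cap what the sliver can send through its virtual interface with an
# HTB qdisc: all traffic falls into class 1:10, limited to 2 Mbit/s.
tc qdisc add dev veth-sliver01 root handle 1: htb default 10
tc class add dev veth-sliver01 parent 1: classid 1:10 htb rate 2mbit ceil 2mbit
```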

Finally, a proposal on how everything fits together on the research node (high-level diagram not reproduced here).

Open discussions

It's far from complete, but it's a starting point for discussing the software stack architecture of the research node and how we isolate slices.

So far, the proposal covers node resource and performance isolation, and network performance isolation. What about network resource isolation?

  • For L4 and above experiments, using the underlying IP routing protocol and managing the bandwidth each sliver can produce is enough.
  • For L3 experiments, we need to use separate IP networks (so they don't interfere with each other?)?
    • Is that enough to allow experimentation at L3?
    • Should we consider flowvisor/openvswitch?
    • What about the solution discussed on the list based on click?
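
One possible direction for the separate-IP-networks question, as a sketch only (interface names and addresses are examples, and this is not a settled design): give each sliver its own subnet and filter cross-sliver traffic on the bridge with ebtables.

```shell
# Each sliver gets its own subnet on its virtual interface.
ip addr add 10.1.0.1/24 dev veth-sliver01   # sliver 1 network
ip addr add 10.2.0.1/24 dev veth-sliver02   # sliver 2 network

# Drop bridged traffic between the two slivers' interfaces.
ebtables -A FORWARD -i veth-sliver01 -o veth-sliver02 -j DROP
ebtables -A FORWARD -i veth-sliver02 -o veth-sliver01 -j DROP
```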
requirements/slice-isolation.txt · Last modified: 2012/02/02 18:13 by xleon