Resource allocation

Code: SRSM-2
Responsible: Davide Vega
Components: testbed server, testbed node

Description

There must be a mechanism that allows the selection and allocation of a subset of resources available on the testbed for a slice.

Comments

“Resource allocation” is the mechanism that maps the researcher's requirements onto concrete physical nodes (and interfaces). Allocation can be done either directly by the researcher or by the testbed administrator. In the former case, the researcher selects, among the physical nodes that satisfy the requirements, the ones that will make up the slice, and this selection becomes part of the slice description. In the latter case, the testbed administrator receives the slice description from the researcher and selects available nodes that satisfy the requirements expressed in it.
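As an illustration only, the following Python sketch shows how a single matching helper could serve both workflows; all names are hypothetical and not part of any requirement. The researcher would run the selection locally and embed it in the slice description, while the administrator would run the same selection server-side on a description that carries only the requirements.

  # Hypothetical node records; in the testbed these would come from the server.
  testbed_nodes = [
      {"uid": 1, "cpu_ghz": 2.2, "ram_mb": 1024},
      {"uid": 2, "cpu_ghz": 1.6, "ram_mb": 512},
      {"uid": 3, "cpu_ghz": 2.4, "ram_mb": 2048},
  ]

  def satisfies(node, requirements):
      # True if the node meets every minimum stated in the requirements.
      return all(node.get(key, 0) >= minimum
                 for key, minimum in requirements.items())

  def select_nodes(nodes, requirements, count):
      # Pick up to `count` nodes that satisfy the slice requirements.
      return [n for n in nodes if satisfies(n, requirements)][:count]

  # Researcher-driven: the selection is embedded in the slice description.
  # Administrator-driven: the server runs the same selection on reception.
  requirements = {"cpu_ghz": 2.0, "ram_mb": 512}
  slice_description = {
      "requirements": requirements,
      "allocated_nodes": [n["uid"] for n in
                          select_nodes(testbed_nodes, requirements, count=2)],
  }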

Resource allocation is intrinsically related to SRSM-1: depending on the slice description, the allocation will either be specified directly in the slice description, or an additional process will be needed. Some details overlap with other requirements and can complement their analysis.

Analysis

This requirement addresses the resource discovery and allocation problem in distributed systems from both the user's (researcher's) and the system administrator's (server's) point of view. In other words, the requirement has to solve three different problems:

  • How are resources allocated on a given node? How does the system reserve a portion of a physical resource, or the whole of it, for a defined sliver?
  • What server allocation API do we provide? (This will be solved by SRRM-5.)
  • What user allocation API do we provide?

In addition, this requirement should take into account that resource description information can change (as SRRM-6 shows). In the details below we use CPU, memory and disk as common examples of resources, but the mechanism can be extended to other kinds of resources.
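A minimal sketch of the first problem (reserving a portion or the whole of a physical resource for a sliver), assuming a simple capacity-accounting model; the class and field names are invented for illustration:

  class Resource:
      # A physical resource on a node, with portion-based reservation.
      def __init__(self, uid, rtype, capacity):
          self.uid, self.rtype, self.capacity = uid, rtype, capacity
          self.reservations = {}            # sliver UID -> reserved amount

      def available(self):
          return self.capacity - sum(self.reservations.values())

      def reserve(self, sliver_uid, amount):
          # Reserve a portion (or all) of the resource for a sliver.
          if amount > self.available():
              raise ValueError("not enough capacity on resource %d" % self.uid)
          self.reservations[sliver_uid] = \
              self.reservations.get(sliver_uid, 0) + amount

      def release(self, sliver_uid):
          self.reservations.pop(sliver_uid, None)

  disk = Resource(uid=7, rtype="HD", capacity=340)   # e.g. 340 MiB of disk
  disk.reserve(sliver_uid=42, amount=100)
  print(disk.available())                            # -> 240

Since resource descriptions can change (SRRM-6), the capacity would have to be refreshed from the registered information rather than cached forever.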

Details

Resource descriptions are assumed to be registered on the central server (as SRRM-6 points out). This information might include (a data-structure sketch follows the list):

  • Type [string code]: a brief categorization of the resource that serves as a minimal description of what it is, like: CPU, RAM, HD, NetworkInterface, etc.
  • UID [numerical]: unique identifier of the resource on the system.
  • Node [numerical]: unique identifier of the node where it resides.
  • Description [string]: a one-line description of the resource (e.g. Intel Core 2 Duo 2.2GHz).
  • Value [numerical]: amount of this resource in its own units (e.g. 2.2G for a 2.2 GHz CPU, 340M for 340 MiB of disk space).
  • Value [vector]: list of other important information to consider (e.g. for memory: { latency: 2ms, max-read: Xus, max-write: Xus }).
  • Owners [vector]: list of UIDs of the slivers that hold a portion of this resource.
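Put together, a registered resource record could look like the following sketch. This is a plain illustration mirroring the fields above, not a fixed schema; the second value field is named "attributes" here only to keep the two apart.

  cpu_record = {
      "type": "CPU",                        # string code: CPU, RAM, HD, ...
      "uid": 101,                           # unique resource identifier
      "node": 12,                           # node where the resource resides
      "description": "Intel Core 2 Duo 2.2GHz",
      "value": 2.2,                         # scalar amount in the resource's units (GHz)
      "attributes": {"cores": 2},           # the vector-valued field above
      "owners": [42, 57],                   # sliver UIDs holding a portion
  }

Several existing systems already solve parts of this problem and are worth reviewing: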
  1. The open source Globus® Toolkit is a fundamental enabling technology for the “Grid,” letting people share computing power, databases, and other tools securely online across corporate, institutional, and geographic boundaries without sacrificing local autonomy. The toolkit includes software services and libraries for resource monitoring, discovery, and management, plus security and file management. (See Globus Toolkit) This tool offers a resource allocation management system through its Web Services Resource Framework, which uses Virtual Organizations (similar to the sliver concept in PlanetLab) as the structure to manage resource information.
  2. The goal of the Condor® Project is to develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources. Guided by both the technological and sociological challenges of such a computing environment, the Condor Team has been building software tools that enable scientists and engineers to increase their computing throughput. (See Condor) As part of the project, the Condor group offers a set of tools and C++/Java code (mainly Classified Advertisements) to manage resources and allocate tasks. Different projects support or are supported by Condor, for example Globus or RedLine (a resource ontology language for expressing constraints associated with resource consumers, i.e. requests, and resource providers).
  3. PlanetLab is a group of computers available as a testbed for computer networking and distributed systems research. It provides a collection of machines distributed over the globe and an experiment deployment platform for researchers around the world. While a commercial variant of PlanetLab might have sufficient resources (and cost recovery mechanisms) to ensure that each slice can be guaranteed all the resources it needs, PlanetLab must operate in an under-provisioned environment. (See PlanetLab Architecture: An Overview) In this context, PlanetLab adopts a two-pronged resource allocation strategy. First, it decouples slice creation and resource allocation: all slices are given only best-effort promises when they are first created, and they then acquire and release resources over time, assisted by available brokerage services. It is not the case that a slice is bound to a guaranteed set of resources during its entire lifetime. Second, even though some slices can acquire guaranteed resources for a period of time, overbooking is expected to be the norm, so PlanetLab must provide mechanisms that recover from thrashing due to heavy resource utilization. Each VM is specified (abstractly represented) by a set of attributes, called a resource specification (RSpec). An RSpec defines how much of the node's resources are allocated to the VM; it also specifies the VM's type. PlanetLab currently supports a single Linux-based VMM, and so defines a single VM type (linux-vserver-x86); however, other VM types are possible (e.g. linux-xen-x86). In short, PlanetLab allocates resources through RSpec information, using Linux-VServers beneath a higher-level sliver structure.
  4. Finally, there is the possibility of creating our own toolkit to allocate and manage resources through Linux containers (see lxc Linux Containers), which implement resource management via “process control groups” (the cgroup file system), as sketched below. Some of the tools commented on above, and others discussed these days on the CONFINE BSCW, can be useful to build this system.
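A rough sketch of the cgroup-based approach mentioned in the last point, assuming cgroup v1 controllers mounted under /sys/fs/cgroup (paths, PIDs and limits are illustrative, and root privileges are required):

  import os

  CPU_CG = "/sys/fs/cgroup/cpu/sliver42"
  MEM_CG = "/sys/fs/cgroup/memory/sliver42"

  def write_value(path, value):
      with open(path, "w") as f:
          f.write(str(value))

  # Create one control group per sliver and cap its CPU share and memory.
  os.makedirs(CPU_CG)
  os.makedirs(MEM_CG)
  write_value(os.path.join(CPU_CG, "cpu.shares"), 512)    # half of the 1024 default
  write_value(os.path.join(MEM_CG, "memory.limit_in_bytes"), 256 * 1024 * 1024)

  # Attach the container's init process (PID obtained from lxc) to both groups.
  pid = 12345                                             # hypothetical PID
  write_value(os.path.join(CPU_CG, "tasks"), pid)
  write_value(os.path.join(MEM_CG, "tasks"), pid)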

For more useful information on resource allocation, node architecture and slice management in PlanetLab, three readings are highly recommended: Globus and PlanetLab Resource Management Solutions Compared, Understanding and Characterizing PlanetLab Resource Usage for Federated Network Testbeds, and Operating System Support for Planetary-Scale Network Services.

Open discussions

Firstly, Condor is discarded because it does not provide a complete set of tools to manage resources directly, only a higher-level language to describe them and communicate their attributes. The main discussion between the remaining systems would therefore focus on whether it is useful and practical to build our own system, or better to adopt an existing approach. If we decide to build our own system, an extended discussion about its functionality and its interaction with other components/requirements will be necessary.

If the recommendation is to use an external tool to allocate (and manage) resources, we have two different approaches: Globus or a Linux-based VMM (PlanetLab).

  • Globus. The main advantage of Globus is that it provides a complete toolkit and is open source. Its installation guide and instructions are well written, and it seems feasible to base our system on it. In addition, it supports *UNIX-based platforms other than Linux. Its two main disadvantages are that (a) it implies a strong dependency on Globus technology, and (b) it is based on Grids, so its resource management and execution model assumes high-performance computing rather than computing on spread-out network nodes.
  • PlanetLab. Compared with Globus, this solution is much more open to changes, and our development team has more knowledge about how it works. In addition, the whole system is better prepared to be used on a spread-out network topology where HPC is not so important.

Finally, the two technologies perform resource allocation at different phases of the workflow: while Globus can do it independently of sliver and virtual machine creation, PlanetLab cannot.

Recommendations

My recommendation is to build the resource allocation system from scratch, based on virtual containers or Linux-based VMs. The pros are:

  1. Better knowledge of its operation and a lower dependency on other technologies.
  2. We can use whichever API we prefer inside it (XML, RPC, REST), making the server development easier (see the sketch at the end of this section).
  3. We have base software provided by Axel, so we do not start from zero.
  4. It may let us avoid one level of abstraction (slivers).

The main cons are:

  1. It increases the development time.
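To make pro number 2 concrete, here is a minimal hypothetical server exposing a single allocation call over XML-RPC, one of the options mentioned above, using Python's standard xmlrpc.server module; the endpoint name and return format are invented.

  from xmlrpc.server import SimpleXMLRPCServer

  def allocate(slice_description):
      # Hypothetical entry point: match the requirements in the slice
      # description to nodes and reserve the resources on them.
      return {"slice": slice_description.get("name"),
              "allocated_nodes": [1, 3]}    # placeholder result

  server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
  server.register_function(allocate, "allocate")
  server.serve_forever()

A REST interface could replace or complement this later, since the allocation logic stays behind the API boundary.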