OML

OML (OML wiki) is an instrumentation tool to support monitoring systems in testbeds. It allows application writers to define customisable measurement points (MP) inside new or pre-existing applications. The measurement points define the metrics that will be monitored; the data is collected for each measurement point in the OML stream format. Experimenters running the applications can then direct the measurement streams (MS) from these MPs to remote collection points, for storage in measurement databases.

In more detail, the OML framework works as follows:

  1. Measurement points (MP) are defined for the applications/experiments running on the resources of the testbed, or for those resources themselves. These MPs define the metrics that will be collected as monitoring data.
  2. The MPs collect monitoring data, which is converted to a stream format, becoming measurement streams (MS).
  3. The MSs are exported to a local or remote OML server. The OML server receives the monitoring data as streams and stores it in an SQL database.
  4. Any entity (experimenter or service) that wants the monitoring data can access the OML server and query the back-end database to obtain the monitoring information that has been collected.
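
For illustration, the following minimal sketch shows this client-side flow using the oml4py library mentioned later on this page. The application name, experiment identifier, server endpoint and MP schema are placeholders chosen for the example, not part of any actual setup.

  # Minimal sketch of the OML client-side flow, assuming the oml4py library.
  # Names, schema and server endpoint are illustrative placeholders.
  from oml4py import OMLBase

  # Connect the application to an OML server (collection point).
  oml = OMLBase("example_app", "example_experiment", "example_sender",
                "tcp:oml.example.org:3003")

  # Define a measurement point (MP): its name and the metrics it reports.
  oml.addmp("link_quality", "node_id:string rtt_ms:double packets_lost:long")

  # Start the measurement streams (MS) towards the server.
  oml.start()

  # Inject measured values; each injection becomes a row in the server-side database.
  oml.inject("link_quality", ["node-17", 12.4, 0])

  # Close the streams when the application finishes.
  oml.close()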

OML, Fed4FIRE and Community-Lab

The usage of OML in the Community-Lab testbed is related to the Fed4FIRE project (OML in F4F), to which our testbed contributes. The idea is to use OML to standardize the monitoring of the different testbeds in the federation scenario.

According to the Fed4FIRE model, each testbed locally deploys a monitoring tool, which can be any existing standard tool (like Zabbix, Nagios, Zenoss …) or a specific tool designed for the testbed (as in Community-Lab, see Monitoring). This local monitoring tool collects a set of measurements for the testbed. It is assumed that the testbeds contributing to the project will probably already have their own monitoring system to check the health of the testbed and its resources. Therefore, the idea is to take advantage of these already existing tools and exploit their capabilities to provide data to the monitoring system of Fed4FIRE.

In this scenario, each federated testbed in Fed4FIRE has its own monitoring system. The standardization of the monitoring procedures is needed to allow Fed4FIRE users to obtain monitoring information in a homogeneous way, regardless of which testbed is accessed. This standardization consists of storing the monitoring data collected by each testbed in a central database of the Fed4FIRE authority, so that the data of any testbed can be retrieved with a query to that database. Thus, all the federated testbeds provide the Fed4FIRE authority with the necessary monitoring data (usually a subset of the monitoring data collected by their local monitoring system). OML is the framework used to send this monitoring data.

The Fed4FIRE authority has deployed an OML server that acts as an endpoint to receive data according to the OML protocol. The OML server is deployed with an SQL database backend in which the monitoring data received by the server is stored. The OML server creates a new database in the SQL backend for each testbed reporting monitoring data, with a table for each monitoring metric reported. On the testbed side, an OML client library (available for C, Python and Ruby) is used to define the monitoring data of each testbed and send it, as required, to the Fed4FIRE OML server. Additionally, the testbeds can deploy their own local OML server. This allows a testbed to report the monitoring data to both the Fed4FIRE server and the local OML server, so that it keeps a local database with its own monitoring data.

There is also a standardization effort regarding which information (metrics) is provided by the monitoring systems of the testbeds and which format is used. In this sense, the Fed4FIRE project distinguishes different categories of monitoring:

  • Facility monitoring: the status of the testbed facility as a whole is monitored. The information is provided to the OML server and displayed as GAR (Green/Amber/Red) status in the First Level Support (FLS) dashboard (FLS Dashboard)
  • Infrastructure monitoring: contains monitoring information about the infrastructure of the testbed, that is, the nodes and resources that make up the testbed. Such monitoring information can be used by different stakeholders: experimenters might be interested in this information to evaluate and understand the results of their experiments; other federation services (SLA, reputation, reservation) might also use this information to provide their specific service.
  • Experiment monitoring: consists of providing the means for experimenters to define their own measurement points in their experiments. In other words, the OML libraries are provided in the resources so that experimenters can define, in the code of their application/experiment, the new measurement points they are interested in.

In summary, OML is used as a protocol to standardize monitoring across the Fed4FIRE testbeds. It allows the subset of monitoring data collected by each testbed that is relevant for the Fed4FIRE project to be reported and stored in a standard way. Such data is used by two different stakeholders: the users/experimenters (who may be interested in the monitoring data of the testbeds in which their experiments are performed) and the federation services, such as SLA, reputation and reservation, which might need the monitoring data of each testbed. The monitoring data of all the testbeds can thus be accessed in a standard way and in a standard format.

OML in Community-Lab

The next sections detail how the monitoring levels described above are designed and implemented in the Community-Lab testbed.

Facility Monitoring

The facility monitoring for Community-Lab consists of monitoring the status of the Controller to see if the testbed is working properly. The status of the Controller is checked with a ping operation to the IPv4 address of the Controller host, and with an HTTP GET request to the website of the Controller to see if the web server is working well. The result of each check is an integer value that acts as a boolean (1 means OK, 0 means KO). These values are exported as OML streams and sent to the central OML server of Fed4FIRE using the OML Python library (oml4py), together with a timestamp indicating when the check was performed.
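
A simplified sketch of what such a check could look like is shown below. It assumes the oml4py library mentioned above; the host name, URL, OML endpoint and MP schema are illustrative placeholders, not the actual Community-Lab script.

  # Sketch of the facility-monitoring checks, assuming oml4py.
  # Host, URL and OML endpoint are illustrative placeholders.
  import subprocess, time, urllib.request
  from oml4py import OMLBase

  def ping_ok(host):
      # 1 means OK, 0 means KO, as described above.
      rc = subprocess.call(["ping", "-c", "1", "-W", "2", host],
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
      return 1 if rc == 0 else 0

  def http_ok(url):
      # Check that the web server answers the HTTP GET request.
      try:
          return 1 if urllib.request.urlopen(url, timeout=5).getcode() == 200 else 0
      except Exception:
          return 0

  oml = OMLBase("facility_monitoring", "community-lab", "clab_fls",
                "tcp:oml.example.org:3003")
  oml.addmp("controller_status", "timestamp:long ping_ok:long http_ok:long")
  oml.start()
  oml.inject("controller_status",
             [int(time.time()), ping_ok("controller.example.org"),
              http_ok("https://controller.example.org/")])
  oml.close()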

On the Fed4FIRE server side, the GAR (Green/Amber/Red) status for each testbed is calculated as the logical AND of the two exported values. The Community-Lab testbed therefore has only two possible states:

  • Green (both ping and web server are OK)
  • Red (ping and/or web server check fails)

The following picture shows how the Facility Monitoring component for Community-Lab works.

 Facility Monitoring for Community-Lab

Every 10 minutes, the Facility monitoring component pings the Community-Lab Controller machine and checks the Dashboard website, obtaining the two integer values described above. These values are exported as OML streams to the OML server of Fed4FIRE that uses them to calculate the GAR status for Community-Lab and update the First Level Support (FLS) dashboard ( FLS Dashboard).

The Facility Monitoring in Community-Lab is implemented as a Python script whose periodic execution is controlled by crontab. The script is executed on the Monitoring server of Community-Lab (the virtual machine that hosts the local Monitoring system).

Infrastructure Monitoring

The infrastructure monitoring uses the already deployed local Monitoring system of Community-Lab (see CONFINE monitor). The Community-Lab Monitoring system periodically monitors the status of all the testbed nodes and stores the data in a CouchBase database backend.

The component that implements the Infrastructure Monitoring exploits this local monitoring system. It gets a set of metrics from the CouchBase database and converts them into OML streams using the OML Python library. These OML streams are then exported to the central OML server of Fed4FIRE, where they are stored in a database. Both users and other federation services can access the database of the central server to get the monitoring data.

The OML central server of Fed4FIRE for Infrastructure Monitoring metrics is not available yet. Its deployment is planned for Cycle 3 of the project (year 2015).

As part of the Infrastructure Monitoring for Community-Lab, we have deployed our own local OML server with a PostgreSQL database backend. The infrastructure monitoring metrics are also exported to this OML server, so we keep a local replica of the infrastructure monitoring data exported to the Fed4FIRE server. The endpoint of the C-Lab OML server is tcp:84.88.85.24:3003.

The node metrics monitored by the Infrastructure Monitoring component are:

  • Availability: boolean variable for each node that tells whether the node is available.
  • CPU: variables indicating the total, free and available CPU (%) for each node.
  • Memory: variables indicating the total, free and available RAM memory (MB) for each node.
  • Running VMs: a variable that indicates the number of VMs (slivers) running on each node.
  • Storage: a variable that indicates the amount of disk space (MB) on each node.

The Infrastructure Monitoring component of Community-Lab consists of two different subcomponents.

The first is the CouchBase Retriever, which, as its name says, is responsible for retrieving the required monitoring data from the CouchBase database. This component uses the CouchBase Python library (couchbase-python-client) to access the CouchBase server of the Community-Lab monitoring system and get the information. Different views in CouchBase (one per metric) have been defined to get the most recent data of the nodes.

The other subcomponent is the OML Wrapper, which is responsible for converting the retrieved monitoring data into an OML stream and sending it to an OML server. The OML server that the streams are sent to can be configured. For the Infrastructure Monitoring component, the data is sent to both the central OML server of Fed4FIRE and the local OML server of Community-Lab.

The CouchBase Retriever and OML Wrapper subcomponents together form the Infrastructure Monitoring component of Community-Lab. This component is invoked periodically through a Python script executed on the Monitoring server of Community-Lab, as sketched below.
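
The sketch below outlines how the two subcomponents could fit together. The CouchBase access is reduced to a hypothetical get_latest_metrics() placeholder (the real CouchBase Retriever uses per-metric views through couchbase-python-client), and the Fed4FIRE endpoint and MP schema are illustrative rather than the actual configuration; the local endpoint is the one given above.

  # Sketch of the Infrastructure Monitoring flow, assuming oml4py.
  # get_latest_metrics() is a hypothetical stand-in for the CouchBase Retriever;
  # in the real component the data comes from per-metric CouchBase views.
  from oml4py import OMLBase

  def get_latest_metrics():
      # Hypothetical placeholder returning one record per node.
      return [{"node": "node-17", "available": 1, "cpu_free": 63.5,
               "mem_free": 812, "running_vms": 3, "storage": 14500}]

  # One OMLBase instance per destination: the data is exported both to the
  # Fed4FIRE central server (placeholder address) and to the local C-Lab server.
  destinations = ["tcp:oml.fed4fire.example.org:3003", "tcp:84.88.85.24:3003"]
  for uri in destinations:
      oml = OMLBase("infrastructure_monitoring", "community-lab", "clab_infra", uri)
      oml.addmp("node_status",
                "node:string available:long cpu_free:double "
                "mem_free:long running_vms:long storage:long")
      oml.start()
      for m in get_latest_metrics():
          oml.inject("node_status", [m["node"], m["available"], m["cpu_free"],
                                     m["mem_free"], m["running_vms"], m["storage"]])
      oml.close()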

The following picture shows a summary of how the Infrastructure Monitoring component for Community-Lab works.

 Infrastructure Monitoring for Community-Lab

Experiment Monitoring

In Experiment Monitoring, it is left to the experimenters to deploy or use whatever measurement tools/frameworks they prefer to instrument their experiments. Moreover, the OML library will also be provided by default in the resources of the testbeds. This allows the experimenters to use the framework to perform measurements in their experiments. The measurement framework can be deployed by the experimenter in the testbed resources that have been assigned to their experiment. Optionally, Community-Lab could provide by default a measurement framework/tool pre-installed in the slivers. Using whichever measurement framework they choose, experimenters can collect the monitoring data of the sliver during the execution of the experiment. Then, using the OML client library, the monitoring data can be exported to any OML server. The experimenters can deploy their own OML server (for instance, in another sliver) so that the monitoring data can be sent to it.

For example, imagine an experimenter who has been assigned three slivers to perform an experiment. The experimenter can deploy an instance of the Zabbix Monitoring System on the slivers to monitor the information of interest during the execution of the experiment. The monitoring information retrieved can then be exported, using the OML measurement library, to an OML server deployed in a fourth sliver. The experimenter will then be able to access the sliver with the OML server and retrieve the monitoring information from the database backend.

Another example is to use the OML library directly in the definition of the experiment, defining software measurement points and exporting the measured values to an OML server, as sketched below.
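
As a rough illustration of this second option, the experimenter's code could embed a measurement point directly, for example recording the time taken by each iteration of the experiment. The OML server address (assumed to be another sliver), the MP schema and the workload function below are all hypothetical.

  # Sketch: defining a software measurement point inside experiment code (oml4py).
  # The OML server address (another sliver) and the schema are hypothetical.
  import time
  from oml4py import OMLBase

  def run_experiment_step():
      # Hypothetical workload; in a real experiment this is the code under test.
      time.sleep(0.1)

  oml = OMLBase("my_experiment", "expA", "sliver-1", "tcp:10.0.0.4:3003")
  oml.addmp("iteration_time", "iteration:long elapsed_s:double")
  oml.start()
  for i in range(10):
      t0 = time.time()
      run_experiment_step()
      oml.inject("iteration_time", [i, time.time() - t0])
  oml.close()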

To support this feature in Community-Lab, the OML client library needs to be provided by default in the different templates that are available for the slivers.

Experiment Monitoring is in an immature phase and some aspects are still under discussion. What is explained in this section is a first, provisional approximation of how this functionality might look. No OML library is provided by default in the slivers yet.
