
Queries & Open Data

In this section, we explain the process that any CONFINE user can follow to make personalized queries to the monitoring server and to use the information retrieved to generate open data sets.

How to make queries

To make queries you need access to the CouchBase server. You can request a username and password by sending an email to the confine-devel mailing list. You will then have full access to the CouchBase server through the following URL: http://monitor.confine-project.eu:8091/index.html.

The following explains how to query the CouchBase server database and retrieve information about specific monitored metrics.

Views are the method CouchBase uses to process the information stored in the CouchBase Server database so that it can be indexed and queried. A view creates an index over the stored information, which makes it possible to search and select documents within the CouchBase Server. In our case, views process the different JSON document types mentioned previously to obtain specific monitored metrics for a particular node or set of nodes and a given time range. This facilitates and automates data querying for later representation or analysis. To process the information, views perform MapReduce operations over the JSON documents and return key-value pairs.

Figure 19 shows an example of the basic operation of views in CouchBase. As we can see, the output of a view call is a set of key-value pairs, where the key is the name of the JSON file and the value is the file itself. In this specific example, the output keys correspond to the node identifier and the values are null; therefore, the resulting rows carry no content of their own and only point to the original files.

                         Figure 19: Example of the operation of views in CouchBase 

The general format of a query using views is the following: View_name (arguments)

In order to query a view, the view definition must include a map operation that uses the emit() function to generate each row of information. The content of the key that is generated by the emit() provides the information on which you can select the data from your view.
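The role of emit() can be illustrated with a small simulation. CouchBase views are not written in Python, so the sketch below only mimics the behaviour described above: a map function is applied to every document and each emit() call produces one row. The document ids, the "type" field and the helper names run_map and node_id_map are assumptions made for illustration.

```python
def run_map(documents, map_fn):
    """Apply map_fn to every document and collect the emitted rows, sorted by key."""
    rows = []
    for doc_id, doc in documents.items():
        # Bind the current document id so each emitted row points back to its source.
        def emit(key, value, _id=doc_id):
            rows.append({"id": _id, "key": key, "value": value})
        map_fn(doc, emit)
    return sorted(rows, key=lambda row: row["key"])


def node_id_map(doc, emit):
    # Emit one row per "most_recent" document, keyed by node id, with a null value.
    if doc.get("type") == "most_recent":
        emit(doc["nodeid"], None)


# Hypothetical documents; ids and the "type" field are illustrative only.
documents = {
    "doc-b": {"type": "most_recent", "nodeid": "[fdf5:5351:1dfd:2::2]"},
    "doc-a": {"type": "most_recent", "nodeid": "[fdf5:5351:1dfd:1::2]"},
    "doc-c": {"type": "node", "nodeid": "[fdf5:5351:1dfd:3::2]"},
}

rows = run_map(documents, node_id_map)
for row in rows:
    print(row)
```

Only the two "most_recent" documents produce rows, and each row keeps the id of the document it came from, which is why a null value is enough to locate the original file.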

When querying a view, the key can be used as the selection mechanism. We can use any of the following options:

  1. Explicit key: (key) — show all the records matching the exact structure of the supplied key.
  2. List of keys: (key_a, key_b, key_c) — show all the records matching the exact structure of each of the supplied keys (effectively showing key_a or key_b or key_c).
  3. Range of keys: (key_a, key_b) — show all the records starting with key_a and stopping on the last instance of key_b.
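The three selection mechanisms can be sketched over an in-memory list of rows. This is a simplified model, not the server's implementation: the function name select and the sample rows are hypothetical, and Python's list comparison is used as a stand-in for the server's key ordering, which may differ in detail.

```python
def select(rows, key=None, keys=None, startkey=None, endkey=None):
    """Filter (key, value) rows by explicit key, list of keys, or inclusive key range."""
    if key is not None:
        return [row for row in rows if row[0] == key]
    if keys is not None:
        return [row for row in rows if row[0] in keys]
    return [
        row for row in rows
        if (startkey is None or row[0] >= startkey)
        and (endkey is None or row[0] <= endkey)
    ]


# Hypothetical rows keyed by [node id, year, month, day].
rows = [
    (["node-a", "2014", "06", "23"], 11.0),
    (["node-a", "2014", "06", "24"], 12.5),
    (["node-a", "2014", "06", "25"], 9.8),
    (["node-b", "2014", "06", "24"], 40.1),
]

explicit = select(rows, key=["node-a", "2014", "06", "24"])
listed = select(rows, keys=[["node-a", "2014", "06", "23"], ["node-b", "2014", "06", "24"]])
ranged = select(rows, startkey=["node-a", "2014", "06", "24"], endkey=["node-a", "2014", "06", "25"])
```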

Some of the most relevant query arguments are the following:

  • key (used for the “explicit key” selection mechanism): an optional string. Returns only documents that match the specified key. The key must be specified as a JSON value.
  • keys (used for the “list of keys” selection mechanism): an optional array. Returns only documents that match one of the keys specified within the given array. Each key must be specified as a JSON value. Sorting is not applied when using this option.
  • startkey (corresponds to “key_a” in the “range of keys” selection mechanism): an optional string. Returns records with a key equal to or greater than the specified key. The key must be specified as a JSON value.
  • endkey (corresponds to “key_b” in the “range of keys” selection mechanism): an optional string. If included in the query, records stop being returned when the specified key is reached. The key must be specified as a JSON value.
  • reduce: an optional boolean. If it is set to “true”, the query uses the reduce function.
  • group: an optional boolean. It is used to group the results produced by the reduce function.
  • group_level: an optional number that specifies the group level to be used.
  • limit: an optional number that limits the number of returned documents to the specified value.
  • descending: an optional boolean used to return the documents in descending key order.
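Since the key arguments must be given as JSON values, it can help to build the query string programmatically. The sketch below assumes the usual CouchBase view REST layout (/bucket/_design/design_doc/_view/view_name on the view port); the host, port, bucket name and design document name are hypothetical placeholders and should be checked against the actual deployment.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse


def build_view_url(base_url, bucket, design_doc, view, **params):
    """Return a view query URL with every argument serialized as a JSON value."""
    encoded = {
        name: json.dumps(value, separators=(",", ":"))
        for name, value in params.items()
    }
    path = f"{base_url}/{bucket}/_design/{design_doc}/_view/{view}"
    return f"{path}?{urlencode(encoded)}" if encoded else path


url = build_view_url(
    "http://couchbase.example:8092",  # hypothetical host and port
    "monitoring",                     # hypothetical bucket name
    "node_metrics",                   # hypothetical design document name
    "get_node-cpu_usage_percentage",
    startkey=["[fdf5:5351:1dfd:1::2]", "2014", "06", " "],
    endkey=["[fdf5:5351:1dfd:1::2]", "2014", "06", "{}"],
    limit=10,
    descending=False,
)

# Decode the query string again to confirm that the key survives the round trip.
query = parse_qs(urlparse(url).query)
decoded_startkey = json.loads(query["startkey"][0])
print(url)
```

Serializing with json.dumps turns Python lists into JSON arrays and True/False into true/false, which matches the requirement that keys and booleans are passed as JSON values.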

The output from a view will be a JSON file containing information about the number of rows in the view, and the specific view information.
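As an illustration of that output, the following sketch parses a response body shaped like a typical view result, with a total_rows count and a rows array of id/key/value entries. The response text is invented for the example and does not come from the CONFINE server.

```python
import json

# Hypothetical response body: three rows exist in the view, two were returned.
response_text = """
{"total_rows": 3, "rows": [
  {"id": "doc-a", "key": "[fdf5:5351:1dfd:1::2]", "value": null},
  {"id": "doc-b", "key": "[fdf5:5351:1dfd:2::2]", "value": null}
]}
"""

result = json.loads(response_text)
total_rows = result["total_rows"]                   # number of rows in the view
node_ids = [row["key"] for row in result["rows"]]   # keys of the returned rows

print(total_rows, node_ids)
```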

When a view is called without specifying any parameters, the view will produce results according to the following criteria:

  • Results match the full view specification (i.e., all documents are output according to the view definition).
  • Are limited to 10 items within the Admin Console, unlimited through the REST API.
  • A reduce function is used (if defined in the view).
  • Output items are sorted in ascending order.

It is important to notice that if we use a reduce function in a view definition, we will not have a key value in the resulting output unless we also use grouping.

The general format of a key used in a query to obtain all the information related to a specific node would be:

[“node id”,“year”,“month”,“day”,“hour”,“minute”,“second”]

An example of a view call using this type of key would be:

                 view_name(["[fdf5:5351:1dfd:1::2]","2014","06","24","10","18","25"])

The result of this call would be the JSON file containing all the metrics of the node with node id [fdf5:5351:1dfd:1::2], monitored on 24 June 2014 at 10:18:25.

If we use grouping, the group level will be determined by the order of each one of the fields of the key. According to the previous format, group level 1 corresponds to the node id, group level 2 corresponds to the year, group level 3 corresponds to the month and so on.
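The effect of the group level on a reduced view can be sketched as follows. This is a simplified model: the averaging reduce, the function name reduce_by_group_level and the sample rows are assumptions, and group level 0 stands for a reduce without grouping, which yields a single result with no key.

```python
from collections import defaultdict


def reduce_by_group_level(rows, group_level):
    """Average the values of rows that share the first group_level key fields."""
    groups = defaultdict(list)
    for key, value in rows:
        # With group_level 0 every row falls into one group that has no key.
        group_key = tuple(key[:group_level]) if group_level else None
        groups[group_key].append(value)
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Hypothetical rows keyed by [node id, year, month, day, hour].
rows = [
    (["node-a", "2014", "06", "24", "10"], 10.0),
    (["node-a", "2014", "06", "24", "11"], 20.0),
    (["node-b", "2014", "06", "24", "10"], 30.0),
]

ungrouped = reduce_by_group_level(rows, 0)  # one result, no key
per_node = reduce_by_group_level(rows, 1)   # grouped by node id
per_hour = reduce_by_group_level(rows, 5)   # grouped down to the hour field
```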

It is also important to notice that we can omit some fields of the key by using " " and, as a result, the output will include all possible values from the lowest to the highest. We can also use "{}" to specify the highest value. To get any range of values, we always need two keys, the startkey and the endkey. So if we want the whole range of values, the startkey is the empty value (" ") and the endkey is "{}".

An example of a view call to get a range of values would be the following:

  view_name(startkey=["[fdf5:5351:1dfd:1::2]","2013","06"," "], endkey=["[fdf5:5351:1dfd:1::2]","2014", "{}"])

In this case, the result of this call would be a set of JSON files containing all the metrics of node [fdf5:5351:1dfd:1::2] monitored from June 2013 onwards (starting at the lowest key recorded in that month) through 2014 (up to the highest key recorded in that year).
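The " " and "{}" convention can be wrapped in a small helper that appends the lowest and highest markers to a pair of partial keys. The helper name range_keys is hypothetical, and the membership check uses Python's list ordering only as an approximation of how the server orders keys.

```python
def range_keys(node_id, start_fields, end_fields):
    """Build a (startkey, endkey) pair: " " marks the lowest value, "{}" the highest."""
    startkey = [node_id, *start_fields, " "]
    endkey = [node_id, *end_fields, "{}"]
    return startkey, endkey


node = "[fdf5:5351:1dfd:1::2]"
startkey, endkey = range_keys(node, ["2013", "06"], ["2014"])

# Approximate range check with Python list ordering (an assumption, see above).
inside = startkey <= [node, "2014", "12", "31"] <= endkey
outside = startkey <= [node, "2013", "05", "30"] <= endkey
```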

To explain the specific querying method used by the CONFINE monitor, we differentiate between single and set metrics. Single metrics are those whose components are static and do not change from node to node; examples are CPU utilization and load. Set metrics, on the other hand, are those that can have a different number of components on different nodes. Examples are network and disk metrics, because different nodes can have a different number and kind of network interfaces and, similarly, a different number and type of disk partitions. For more information about the monitor metrics, please visit the Node Metrics section.

Single metrics

The general format of a key used for querying a single metric is the following:

[“node id”,“single metric”,“year”,“month”,“day”,“hour”,“minute”,“second”]

An example of a view call using this type of key would be:

                         view_name(["[fdf5:5351:1dfd:1::2]","cpu_usage","{}"])

The output of this view call will be all the CPU usage values of node [fdf5:5351:1dfd:1::2], starting from the first year monitored up to the last one.

Figure 20 shows an example of how views work for this kind of metric. The example shows a view called get_node-id. As we can observe, the emit function returns as key the nodeid field of each of the most_recent JSON documents. In this case, the values are null and no reduce function is used.

                               Figure 20: Example of the get_node-id view   

Figure 21 shows another example of a view call. In this case we are using the view get_node-cpu_usage_percentage. As we can observe in the figure, the view returns the node id and the timestamp as the key. The values returned are the CPU usage percentages of the node at the time given by each timestamp.

                            Figure 21: Example of the get_node-cpu_usage_percentage view

Figure 22 shows an example of a view using a reduce function to calculate CPU usage statistics from all the nodes. It is important to notice that in this case the output of the view does not have any key. We only have a JSON file containing the values returned after applying the view, which correspond to all the historical statistics of CPU usage percentage.

                     Figure 22: Example of the view get_all_nodes-cpu_usage_statistics    

Figure 23 shows an example of the previous view where we are using a reduce function but we are also using a level 1 grouping. In this case the output of the view has keys even though we are using a reduce function. According to the format of the keys for single metrics presented above, using a level 1 grouping means that the output of the view is grouped according to the node id (corresponding to the first field of the keys for single metrics).

             Figure 23: Example of the view get_all_nodes-cpu_usage_statistics using group level 1    

Figure 24 shows a similar example but using level 5 grouping. In this case, the output of the view will be grouped according to the hour field of the keys.

             Figure 24: Example of the view get_all_nodes-cpu_usage_statistics using group level 5

Set metrics

The general format of a key used for querying a set metric is the following:

[“node id”,“set metric”,“single metric”,“year”,“month”,“day”,“hour”,“minute”,“second”]

Where, “set metric” can be of two different types:

  1. Network interface: the field set metric can be the name of any of the node's network interfaces (e.g., wlan0, tunl0, bond0, etc.)
  2. Disk partitions: here set metric can be the name of any of the node's disk partitions (e.g., /dev/loop1, /dev/sda3, rootfs, etc.)

An example of a view call using this type of key would be:

               view_name(["[fdf5:5351:1dfd:1::2]","bond0","bytes_recv_last_sec","{}"])

In this case the output of the view will be the number of bytes received in the last second by the network interface bond0 of node [fdf5:5351:1dfd:1::2].
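Both key formats, for single and for set metrics, can be generated from a timestamp with a small helper. This is only a sketch: the function name metric_key is hypothetical, and the zero-padded string fields are an assumption based on the example keys shown in this section.

```python
from datetime import datetime


def metric_key(node_id, metric, when, set_member=None):
    """Build [node id, (set metric,) single metric, year, month, day, hour, minute, second]."""
    parts = [node_id]
    if set_member is not None:
        # For set metrics, the interface or partition name comes right after the node id.
        parts.append(set_member)
    parts.append(metric)
    # Zero-padded date and time fields, as in the example keys above (assumption).
    parts += when.strftime("%Y %m %d %H %M %S").split()
    return parts


node = "[fdf5:5351:1dfd:1::2]"
when = datetime(2014, 6, 24, 10, 18, 25)

single_key = metric_key(node, "cpu_usage", when)
set_key = metric_key(node, "bytes_recv_last_sec", when, set_member="bond0")
```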

Figure 25 shows an example of the output of a query of a set metric using views.

                         Figure 25: Example of the view output for set metrics

How to get open data sets

As explained in the previous section, the result of making a query in CouchBase using views is a set of key-value pairs, where the keys correspond to identifiers of the resulting JSON files and the values are the content of those files.

Because the output of a CouchBase query is presented in JSON format, it is very easy to download data from the server, publish it in an open data format and delete it from the server database afterwards. JSON is a readable format in which all the data is presented as key-value pairs and, therefore, it is very easy to convert to any desired format.
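As one possible conversion, the sketch below turns view rows into CSV text using only the Python standard library. The function name rows_to_csv, the column names and the sample rows are hypothetical; CSV is used here simply as an example of an open format.

```python
import csv
import io


def rows_to_csv(rows, key_fields):
    """Flatten view rows (list key plus value) into CSV text with one column per key field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([*key_fields, "value"])
    for row in rows:
        writer.writerow([*row["key"], row["value"]])
    return buffer.getvalue()


# Hypothetical rows, shaped like the "rows" array of a view result.
rows = [
    {"key": ["[fdf5:5351:1dfd:1::2]", "2014", "06", "24"], "value": 12.5},
    {"key": ["[fdf5:5351:1dfd:1::2]", "2014", "06", "25"], "value": 9.8},
]

csv_text = rows_to_csv(rows, ["node_id", "year", "month", "day"])
print(csv_text)
```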

monitor/queries.txt · Last modified: 2014/07/22 18:18 by esunly