Introduction
This article explains how to integrate metrics from Prometheus into Icinga checks using the check_prometheus plugin. There are several reasons why this might be desirable: maybe you have different teams with their own monitoring systems and need to bridge the gap, or you want to perform queries that are better expressed in Prometheus than in plain Icinga check plugins. The latter can be the case if you want to aggregate data from multiple sources or take historic data into account. For this, we will take a look at the check_prometheus plugin developed and maintained by NETWAYS.
Installation
For using check_prometheus with Icinga, two parts are needed: the check binary itself as well as a CheckCommand definition making the command known to Icinga.
A check_prometheus binary can be obtained in multiple ways; pick the method that best fits your needs:
- The GitHub Releases of check_prometheus include pre-built binaries for different operating systems and architectures. Download and install the desired file to an appropriate location, for example /usr/lib/nagios/plugins/check_prometheus.
- NETWAYS provides software repositories for various deb- and rpm-based Linux distributions from which their check plugins can be installed. check_prometheus is provided in the package netways-plugins-prometheus.
- If a Go compiler toolchain is available, check_prometheus can easily be downloaded and built from source using the following command:
GOBIN=$(pwd) go install github.com/NETWAYS/check_prometheus@latest
This will create a binary called check_prometheus that can then be installed to an appropriate location, for example /usr/lib/nagios/plugins/check_prometheus.
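Whichever method you choose, a quick smoke test confirms that the binary works and can reach the local Prometheus server. A minimal sketch, assuming the default connection details; the exact subcommand names for your plugin version can be verified via --help:
# Should report the status of the Prometheus server at http://localhost:9090.
/usr/lib/nagios/plugins/check_prometheus health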
The source code repository of check_prometheus helpfully provides ready-made CheckCommand definitions for Icinga 2. You can just download that file and include it in your Icinga configuration, for example by placing it in /etc/icinga2/zones.d/global-templates/check_prometheus.conf.
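After placing the file, validating the configuration and reloading the daemon makes the new CheckCommand available:
# Validate the Icinga 2 configuration, then reload the daemon to pick up the new file.
icinga2 daemon -C
systemctl reload icinga2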
Example: Check if node_exporter is running
All examples in this post will make use of metrics provided by node_exporter, which is probably the most common Prometheus exporter. For those less familiar with Prometheus: it exports all kinds of metrics about the local system to Prometheus, like CPU usage, memory usage, network usage, and so on. If you make use of that data in other checks, it’s a good idea to have a check that node_exporter itself is up and running, and to add it as a dependency of those other checks to prevent false alarms (a sketch of such a dependency follows the configuration example below). Luckily, Prometheus provides an up metric for every target it scrapes, which makes this check straightforward.
apply Service "node_exporter" {
# Add the service to all hosts with the custom variable "vars.node_exporter = true" set.
assign where host.vars.node_exporter
# Use the check command defined in the check_prometheus.conf file added in the installation section.
check_command = "prometheus-query"
# How often to execute the check and retry it after a failure.
check_interval = 5m
retry_interval = check_interval
# Query the "up" metric of the "node" job belonging to the same host name this service object belongs to.
vars.prometheus_query = "up{job=\"node\", instance=\"$host.name$\"}"
# Set the thresholds so that everything except 1 is treated as a problem.
vars.prometheus_query_warning = "1:1"
vars.prometheus_query_critical = "1:1"
}
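As mentioned above, this service is a natural dependency for every other check that relies on node_exporter metrics. The following is a minimal sketch of such a dependency, assuming that all dependent services use the prometheus-query check command:
apply Dependency "node_exporter" to Service {
# Apply to all other prometheus-query services on hosts running node_exporter.
assign where host.vars.node_exporter && service.check_command == "prometheus-query" && service.name != "node_exporter"
# The parent is the node_exporter service defined above.
parent_service_name = "node_exporter"
# Suppress notifications for dependent services while node_exporter is down.
disable_notifications = true
}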
Please note that all examples on this page assume that the default connection details work to reach the Prometheus server, i.e. it can be reached on http://localhost:9090 without requiring authentication. If that’s not the case for your setup, you need to include additional custom variables like prometheus_hostname, prometheus_port, or prometheus_user.
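For example, the following lines could be added to a service definition to point it at a remote Prometheus server (a sketch; the values are placeholders, and the variable names are assumed to match the CheckCommand definition added in the installation section):
vars.prometheus_hostname = "prometheus.example.com"
vars.prometheus_port = 9090
# If authentication is required, a matching password variable is likely needed as
# well; check the CheckCommand definition for the exact name.
vars.prometheus_user = "icinga"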
Example: CPU usage
Prometheus provides its own query language, PromQL. The previous example already used a rather simple PromQL expression, but expressions can get much more elaborate. The following is a slightly more complex example for querying the average CPU utilization of a host. It sums the rates of the different types of non-idle CPU time (user, system, etc.) per core and then takes the average over all cores.
apply Service "cpu" {
# The same as before.
assign where host.vars.node_exporter
check_command = "prometheus-query"
check_interval = 5m
retry_interval = check_interval
# Different query and thresholds.
vars.prometheus_query = "100*avg(sum by (cpu) (rate(node_cpu_seconds_total{instance=\"$host.name$\",mode!=\"idle\"}[1m])))"
vars.prometheus_query_warning = "50"
vars.prometheus_query_critical = "90"
}
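To make the query easier to follow, here it is built up stage by stage (example.com stands in for the actual host name):
# Stage 1: per-core, per-mode fraction of CPU time spent busy over the last minute.
rate(node_cpu_seconds_total{instance="example.com",mode!="idle"}[1m])
# Stage 2: sum all non-idle modes, leaving one utilization value per core.
sum by (cpu) (rate(node_cpu_seconds_total{instance="example.com",mode!="idle"}[1m]))
# Stage 3: average across all cores and scale to a percentage between 0 and 100.
100*avg(sum by (cpu) (rate(node_cpu_seconds_total{instance="example.com",mode!="idle"}[1m])))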
Example: Disk usage prediction
Given that Prometheus is a time series database, it really shines when it comes to using historic data. Something that isn’t really possible with the plain old check_disk is to take into account how fast a disk is filling up. That’s simply because conceptually, check_disk can only look at the current state of the disk. Prometheus, on the other hand, can execute a query over all the historic data it has stored. This allows using a query function like predict_linear() to extrapolate the graph into the future. The following example query derives the rate at which the available disk space changed over the last hour and uses this to predict how much space will be available in 48 hours. The thresholds are set such that anything less than zero results in a problem state.
apply Service "filesystems-48h" {
# The same as before.
assign where host.vars.node_exporter
check_command = "prometheus-query"
check_interval = 5m
retry_interval = check_interval
# Different query and thresholds.
vars.prometheus_query = "predict_linear(node_filesystem_avail_bytes{instance=\"$host.name$\"}[1h], 48h)"
vars.prometheus_query_warning = "0:"
vars.prometheus_query_critical = "0:"
}
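Before relying on such a prediction for alerting, it can be useful to run the query directly against the Prometheus HTTP API and inspect the result. A quick sketch using curl, assuming the default connection details and with example.com standing in for the host name:
# Returns a JSON document with one predicted value (in bytes) per filesystem.
curl -s 'http://localhost:9090/api/v1/query' \
--data-urlencode 'query=predict_linear(node_filesystem_avail_bytes{instance="example.com"}[1h], 48h)'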
Conclusion
In this article, we focused on executing queries from Icinga using check_prometheus. However, this is only one aspect of what the plugin can do. It can also be used to monitor the health of the Prometheus server itself and to evaluate the status of Prometheus alerts. For these advanced use cases and additional configuration options, refer to the official documentation of check_prometheus.
If your use case works the other way around and you want to expose Icinga check results to Prometheus, you might want to check our article Icinga Event Streams, which describes how to forward Icinga events and metrics into external systems for further processing.