Making Use of Previous State in Icinga 2 Check Commands

by | May 22, 2024

When writing a custom check plugin for Icinga 2, it can be helpful to take the past into account in addition to the current state of a system. A common case is a data source that provides counter values, i.e. values that increase over time, where you are less interested in the current value than in how it changes. The network interface counters on Linux are an example of this: if you want to know the data rate on an interface, you need to read a byte counter at two different times and compute the rate from the difference and the elapsed time.
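To make the rate computation concrete, here is a minimal sketch of it as a shell function. The function name counter_rate and the interface name eth0 in the usage comment are assumptions made for illustration; the statistics/rx_bytes file is the standard Linux sysfs byte counter for received data.

```shell
#!/usr/bin/env bash

# counter_rate PREVIOUS CURRENT SECONDS
# Prints the average increase per second between two counter readings.
# Uses integer arithmetic, so the result is rounded down.
counter_rate() {
    local previous=$1 current=$2 seconds=$3
    if [ "$seconds" -le 0 ] || [ "$current" -lt "$previous" ]; then
        # No time has elapsed, or the counter went backwards
        # (for example because it was reset by a reboot).
        return 1
    fi
    echo $(( (current - previous) / seconds ))
}

# Usage sketch (eth0 is an assumed interface name):
#   before=$(</sys/class/net/eth0/statistics/rx_bytes)
#   sleep 10
#   after=$(</sys/class/net/eth0/statistics/rx_bytes)
#   counter_rate "$before" "$after" 10    # bytes per second
```

Returning a non-zero status instead of a negative or undefined rate leaves it to the caller to decide how a counter reset should be reported.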

A related and very simple counter will serve as an example for this blog post: Linux provides a virtual file carrier_changes for each network interface, in which it counts how often the link state has changed between up and down. With this information, one can write a check that returns a critical state when this value increases, as this could be a sign of an unstable connection. If we assume that we can pass the previous value as an argument to the check command, the following Bash script could be used for this purpose. In lines 5 to 14, it simply reads the command line arguments -i and -p into the shell variables INTERFACE and PREVIOUS_CARRIER_CHANGES respectively. Line 16 reads the current counter value, and the rest of the script compares both values and generates a corresponding message and exit code.

#!/usr/bin/env bash

set -eu

while getopts hi:p: OPT; do
    case "$OPT" in
        i) INTERFACE=$OPTARG ;;
        p) PREVIOUS_CARRIER_CHANGES=$OPTARG ;;
        h|*)
            echo "Usage: $0 -i <interface> -p <previous carrier_changes value>" >&2
            exit 3
            ;;
    esac
done

CARRIER_CHANGES=$(<"/sys/class/net/$INTERFACE/carrier_changes") || exit 3

if [ "$CARRIER_CHANGES" -gt "$PREVIOUS_CARRIER_CHANGES" ]; then
    msg="CRITICAL: $((CARRIER_CHANGES - PREVIOUS_CARRIER_CHANGES)) interface carrier changes on $INTERFACE since last check"
    result=2
else
    msg="OK: no interface carrier changes on $INTERFACE since last check"
    result=0
fi

echo "$msg | carrier_changes=${CARRIER_CHANGES}c"
exit "$result"

So when the current carrier_changes value is 4 and the script is called with a previous value of -p 1, for example, it reports CRITICAL. If it’s called with a matching -p 4 instead, it reports OK:

root@my-host:~# cat /sys/class/net/eth0/carrier_changes 
4
root@my-host:~# ./check_linux_carrier_changes.sh -i eth0 -p 1
CRITICAL: 3 interface carrier changes on eth0 since last check | carrier_changes=4c
root@my-host:~# ./check_linux_carrier_changes.sh -i eth0 -p 4
OK: no interface carrier changes on eth0 since last check | carrier_changes=4c
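One case the script above does not distinguish is a counter that is lower than the previous value, which can happen when the counter starts over, for example after a reboot. The following is a hedged sketch of how the comparison could be separated into a function that also recognizes this case and rejects non-numeric input. The function name compare_counter and its output words are made up for this illustration and are not part of the original script.

```shell
#!/usr/bin/env bash

# compare_counter PREVIOUS CURRENT
# Prints "changed N", "unchanged" or "reset" and returns 0.
# Prints "invalid" and returns 1 if an argument is not a
# non-negative integer.
compare_counter() {
    local previous=$1 current=$2
    # Reject empty strings and anything containing a non-digit.
    case $previous in ''|*[!0-9]*) echo "invalid"; return 1 ;; esac
    case $current in ''|*[!0-9]*) echo "invalid"; return 1 ;; esac

    if [ "$current" -gt "$previous" ]; then
        echo "changed $((current - previous))"
    elif [ "$current" -lt "$previous" ]; then
        # The counter went backwards, so it was most likely reset.
        echo "reset"
    else
        echo "unchanged"
    fi
}
```

A check script could map "changed" to CRITICAL and "invalid" to UNKNOWN (exit code 3), and decide separately whether "reset" should be OK or WARNING.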

The only remaining question is how the -p argument gets set for the check command. As you may have noticed, the check script also returns the raw counter value as a performance data value. This can be combined with the Icinga 2 feature that allows check command arguments to be generated dynamically. The following CheckCommand definition makes use of this by defining a lambda function that extracts the corresponding performance data value from the last check result and falls back to 0 if there is none:

object CheckCommand "linux-carrier-changes" {
	command = ["/path/to/check_linux_carrier_changes.sh"]  
	arguments = {
		"-i" = "$linux_carrier_changes_interface$"
		"-p" = "$linux_carrier_changes_previous$"
	}

	vars.linux_carrier_changes_previous = {{
		var last = macro("$last_check_result$")
		if (last && last.performance_data) {
			for (var p in last.performance_data.map(parse_performance_data)) {
				if (p.label == "carrier_changes") {
					return p.value
				}
			}
		}
		return 0
	}}
}
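To illustrate what the lambda function above extracts, here is a rough shell equivalent that pulls the value for one label out of a performance data string. It is only a sketch and not how Icinga 2 implements parse_performance_data: the function name perfdata_value is hypothetical, and it assumes simple labels without spaces, quotes or glob characters.

```shell
#!/usr/bin/env bash

# perfdata_value LABEL PERFDATA
# Prints the numeric value for LABEL from a space-separated performance
# data string such as "carrier_changes=4c". Returns 1 if LABEL is absent.
# Assumption: labels contain no spaces, quotes or glob characters.
perfdata_value() {
    local label=$1 perfdata=$2 item value
    # Intentionally unquoted: split the string into items on whitespace.
    for item in $perfdata; do
        case $item in
            "$label"=*)
                value=${item#*=}              # drop "label="
                value=${value%%;*}            # drop ";warn;crit;min;max"
                value=${value%%[!0-9.-]*}     # drop the unit, e.g. "c" or "ms"
                echo "$value"
                return 0
                ;;
        esac
    done
    return 1
}

# Usage sketch:
#   perfdata_value carrier_changes "carrier_changes=4c"
```

For the output of the check script shown earlier, this yields the raw counter value 4, which is the value the CheckCommand passes back via -p on the next execution.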

This can then be used without much extra work in a Service object. The only somewhat unusual setting is volatile = true. It is added because this specific check script only reports critical for one execution and then automatically resets to OK. Setting max_check_attempts = 1 additionally makes the first critical result a hard state right away.

object Service "carrier-changes-eth0" {
    host_name = "my-host"
    check_command = "linux-carrier-changes"
    volatile = true
    max_check_attempts = 1
    vars.linux_carrier_changes_interface = "eth0"
}

And there we have it: a check that reports an error when a network link reconnects:

Screenshot of a carrier changes service in Icinga Web that shows a critical check result due to 2 interface carrier changes

This technique is not limited to performance data: other information from the previous check can be accessed using macro strings as well. There are also some macro shortcuts like $last_state$ for the previous state, $last_check$ for the time the last check was executed, and $output$ and $perfdata$ for the full output and performance data of the last check execution.
