Calculating a state over multiple services

by | Apr 29, 2021

These days many setups have a lot of redundancy, and you may not want to send notifications during the night just because one of several HTTP servers has a problem. This blog post shows how to set up a single service whose state combines the states of multiple other services.

Preparation

Before we begin, we need a few services whose states we can combine. For this example we simply create 20 hosts (http-host-0 to http-host-19) and apply a service to each of them. Note the http_cluster variable: it is used later to assign our services to the combined service.

for (id in range(20)) { 
    object Host "http-host-" + id {
        check_command = "dummy" 
        vars.dummy_state = 0 
        vars.check_http = true 

        // Needed for our combined service
        vars.http_cluster = "http-cluster-1" 
    } 
} 

apply Service "http" {
    check_command = "random"
    assign where host.vars.check_http
}

We will also need a dummy host to assign our combined service to.

object Host "combined-host" { 
   check_command = "dummy" 
   vars.dummy_state = 0 
}

 

Functions

Before we write our service, we also need some helper functions. stateToString() converts our state integers (0-3) into the corresponding state names (OK, WARNING, CRITICAL and UNKNOWN).

function stateToString(state) {
    if (state == 0) {
        return "OK"
    } else if (state == 1) {
        return "WARNING"
    } else if (state == 2) {
        return "CRITICAL"
    } else if (state == 3) {
        return "UNKNOWN"
    }
}

Our second function, getServiceStatesByHttpCluster(), collects every service whose host belongs to our HTTP cluster (vars.http_cluster) and returns the total number of services along with the service names grouped by state.

function getServiceStatesByHttpCluster(cluster) {
    // Prepare a dictionary for counting every service in our cluster and sorting them by state
    var services = {
        count = 0
        serviceStates = {
            "0" = [],
            "1" = [],
            "2" = [],
            "3" = [],
        }
    }

    // Iterate over every service object
    for (var service in (get_objects(Service))) {
        // Check if the http_cluster of the service's host matches our cluster
        if (service.host.vars.http_cluster == cluster) {
            // Increase our service count by one
            services.count += 1

            // Get the service's current state
            var state = service.last_check_result.state

            // Add the full service name ("host!service") to the corresponding array in services.serviceStates
            services.serviceStates[state].add(service.host.name + "!" + service.name)
        }
    }

    // Return our "services" dictionary
    return services
}
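
Note that a service which has not been checked yet has no check result, and therefore no state to sort it under. A possible variant that skips such pending services could look like the following sketch. The function name getCheckedServiceStatesByHttpCluster() is hypothetical, and treating a missing last_check_result as "skip this service" is an assumption about the desired behavior, not something the original configuration does.

```
// Hypothetical variant of getServiceStatesByHttpCluster() that ignores pending services
function getCheckedServiceStatesByHttpCluster(cluster) {
    var services = {
        count = 0
        serviceStates = {
            "0" = [],
            "1" = [],
            "2" = [],
            "3" = [],
        }
    }

    for (var service in get_objects(Service)) {
        // Skip services whose host is not part of the requested cluster
        if (service.host.vars.http_cluster != cluster) {
            continue
        }

        // Assumption: a service that was never checked has no last_check_result
        var result = service.last_check_result
        if (result == null) {
            continue
        }

        // Count the service and file its full name under its current state
        services.count += 1
        services.serviceStates[result.state].add(service.host.name + "!" + service.name)
    }

    return services
}
```

With this variant, pending services are left out of both the total count and the per-state lists, so they do not influence the ratio calculated later.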

Service Object

Now that we have prepared our functions and the objects to calculate our state from, we can finally create our service. Here we’re using the internal check command dummy and its variables dummy_state and dummy_text. This allows us to assign functions that are evaluated on every execution of our check.

object Service "combined-http" {
    check_command = "dummy"
    check_interval = 1m
    retry_interval = 30s
    host_name = "combined-host"

    // The http cluster we want to have combined states from (we've also set this variable on our hosts)
    vars.http_cluster = "http-cluster-1"
    // The minimum ratio of services that have to be in state OK (0.5 means at least 50% need to be OK)
    vars.ok_min_ratio = 0.5

    // Store our current service object in a variable to use it in the function scope below
    var service = this

    // Functions stored in the variables dummy_state and dummy_text are evaluated on every execution of the check.
    vars.dummy_state = function() use (service) {
        // Get our services dictionary by calling our previously defined function
        var services = getServiceStatesByHttpCluster(service.vars.http_cluster)

        // Calculate the ratio of services with state OK compared to the total amount of services
        var ratio = services.serviceStates[0].len() / services.count

        // If the ratio is less than what we defined as our minimum, return CRITICAL as state, OK otherwise
        if (ratio < service.vars.ok_min_ratio) {
            return 2
        } else {
            return 0
        }
    }
    vars.dummy_text = function() use (service) {
        // Get our services dictionary by calling our previously defined function
        var services = getServiceStatesByHttpCluster(service.vars.http_cluster)
        
        // Calculate the ratio of services with state OK compared to the total amount of services
        var ratio = services.serviceStates[0].len() / services.count

        // Define an empty string variable which will later contain our status output
        var text = ""

        // If the ratio is less than what we defined as our minimum, add "CRITICAL: " to our output, "OK: " otherwise
        if (ratio < service.vars.ok_min_ratio) {
            text = "CRITICAL: "
        } else {
            text = "OK: "
        }

        // Add the amount of services in state OK and the total amount of services to our output (e.g. "5/20")
        text += services.serviceStates[0].len() + "/" + services.count + " OK\n"
        
        // Iterate over all state types and print the services with those states
        for (state in range(0, 4)) {
            // Check if we even have services with this state
            if (services.serviceStates[state].len() > 0) {
                // Add the state name to our output
                text += stateToString(state) + ":\n"

                // Iterate over all the services with this state and output their name
                for (serviceName in services.serviceStates[state]) {
                    text += serviceName + "\n"
                }
                text += "\n"
            }
        }

        // Return the final output
        return text
    }
}
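
One case the ratio calculation does not cover is a cluster without any matching services, where the total count is zero and the division has no meaningful result. A minimal sketch of a guard follows, assuming such a cluster should be reported as UNKNOWN. The helper name combinedState() is hypothetical and not part of the configuration above.

```
// Hypothetical helper: calculates the combined state from the dictionary
// returned by getServiceStatesByHttpCluster() and a minimum OK ratio
function combinedState(services, okMinRatio) {
    // Assumption: a cluster without any services is reported as UNKNOWN (3)
    if (services.count == 0) {
        return 3
    }

    // Ratio of services in state OK compared to the total amount of services
    var ratio = services.serviceStates[0].len() / services.count

    if (ratio < okMinRatio) {
        return 2
    }

    return 0
}

// Possible usage inside the "combined-http" service:
//
// vars.dummy_state = function() use (service) {
//     var services = getServiceStatesByHttpCluster(service.vars.http_cluster)
//     return combinedState(services, service.vars.ok_min_ratio)
// }
```

The same check would be needed in dummy_text so that the output text and the state stay consistent.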

 

Result

And finally, we have a service that combines the states of multiple services into one, and it can even be tuned by changing the minimum OK ratio (vars.ok_min_ratio).

This is a basic example of how to combine multiple service states into one service, and it shows what is possible with the Icinga config language. It can be extended by adding custom filters and conditions for when the combined service should be in a specific state.
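
As one possible extension, a second threshold could report WARNING before the cluster turns CRITICAL. The sketch below assumes a hypothetical helper ratioToState() and a hypothetical custom variable vars.warn_min_ratio. Neither is part of the configuration above.

```
// Hypothetical helper: maps the OK ratio to a state using two thresholds
function ratioToState(ratio, warnMinRatio, critMinRatio) {
    if (ratio < critMinRatio) {
        // Fewer services are OK than the critical minimum allows
        return 2
    } else if (ratio < warnMinRatio) {
        // Above the critical minimum, but below the warning minimum
        return 1
    }

    return 0
}

// Possible usage inside the "combined-http" service, assuming
// vars.warn_min_ratio = 0.8 has been added next to vars.ok_min_ratio:
//
// vars.dummy_state = function() use (service) {
//     var services = getServiceStatesByHttpCluster(service.vars.http_cluster)
//     var ratio = services.serviceStates[0].len() / services.count
//     return ratioToState(ratio, service.vars.warn_min_ratio, service.vars.ok_min_ratio)
// }
```

With 12 of 20 services in state OK, the ratio is 0.6. Given a warning minimum of 0.8 and a critical minimum of 0.5, the helper would return 1 (WARNING). The output text in dummy_text would need the same thresholds to stay consistent with the state.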

The approach shown here is ideal for users who configure Icinga 2 through config files. If that’s something you generally don’t want to do, take a look at the Icinga Web 2 Business Process Module. It provides similar options, plus a dashboard and a graphical interface to help you configure your dependencies.
