Icinga 2 API and debug console

by | Feb 28, 2024

Have you ever experienced configuration issues, such as notifications not being sent as expected or apply rules not matching all expected objects, probably due to an incorrectly set custom variable? Icinga 2 has several options to assist you in such situations. Last time, Julian demonstrated how to analyse such problems using the icinga2 object list command. Today I will show you how to interactively investigate your problem using the mighty Icinga 2 debug console.

Prerequisites

The debug console lets you gain live access to the entire monitoring infrastructure thus maliciously paralyse everything. Therefore, you will need an authorised API user with permission to access the console. In addition, you need some experience with Icinga 2 DSL as well, to be able to solve your problem in a non-frustrating way and also not to unwittingly mess up your Icinga 2 daemon.

Next, start the icinga2 console with the connect parameter. You can specify the API credentials via the shell environments instead of providing them directly as part of the url string. You can also specify the --sandboxed parameter at the end if you don’t intend to change anything live or don’t want to alter something accidentally. Doing so will prevent you from intentionally or unintentionally modifying things in your running Icinga 2 process.

$ ICINGA2_API_USERNAME=root ICINGA2_API_PASSWORD=icinga icinga2 console --connect 'https://localhost:5665/'

Service stuck in a pending state?

Have you ever had a strange Icinga 2 problem, be it a single node or HA setup, where one or more services got stuck in a pending state? There are several factors that can lead to such a situation, e.g. a synchronisation issue between your two masters, or your IDO is too slow to keep up with the number of checks. However, using the debug console, you can identify the problem fairly easily in most cases. You first need to check whether the service has ever been checked by Icinga 2.

<15> => var s = get_service("example-1", "realtime-load")
null
<16> => s.last_check_result
null
<17> => s.last_check
-1.000000

You can see from above that the ‘realtime-load’ service has never been checked and therefore has no check result to look at. This can only happen if either the checker has not yet been enabled or the service is within a different child zone. We can easily check this by querying the zone attribute of the service.

<18> => s.zone
"agent"

In my case, the service is in “agent” zone. We can now have a look at the logs to see why the agent is no longer performing checks or retrieve the status of the endpoint directly here.

<19> => var endpoint = get_object(Endpoint, "agent")
null
<20> => endpoint.connected
false

Well, obviously my endpoint isn’t even connected yet. However, that is just a trivial example, and if you’re using the Icinga check, you should have already noticed that the endpoint is gone.

Overdue checks

To determine how many hosts or services in your Icinga 2 setup are currently in overdue, you can do the following. It looks a bit complex and overly technical, but if you are familiar with Icinga 2 DSL it should be fairly straightforward.

<21> => var hosts = get_objects(Host).filter(h => h.last_check < get_time() - 2 * h.check_interval).map(h => h.name)
null
<22> => hosts.len()
277.000000
<23> => hosts
[ "example-248", "example-373", "example-74", "example-148", ...

The above command searches for all hosts in your environment known by the master endpoint and whose last_check exceeds the configured check_interval by at least a factor of two. Next, we map these by their names and store the result in hosts variable. Note, these are all hosts (including those that aren’t in the master zone) that have surpassed their configured check_interval without being checked.

Alternatively, if you want to filter for hosts that are only in a specific zone, use the following expression.

<24> => var hosts = get_objects(Host).filter(h => h.last_check < get_time() - 2 * h.check_interval && h.zone == "satellite").map(h => h.name)
null
<25> => hosts.len()
277.000000

It is now additionally filtered using the && h.zone == "satellite" check. In my example, all overdue hosts are in the satellite zone.

Conclusion

Although, I’ve just shown you trivial examples, my intention with this blog is to give you an impression of what it is capable of. Getting familiar with it could possibly pay off at some point and save you some headaches. Here you can find all the things that you can use in the context of the Icinga 2 debug console + API.

You May Also Like…

Code Reviews – How do they work?

Code Reviews – How do they work?

We at Icinga / NETWAYS (yes, that’s the order) held an internal event recently. It’s name was Knowledge Days and I got...

Subscribe to our Newsletter

A monthly digest of the latest Icinga news, releases, articles and community topics.