The fundamentals of Icinga 2 are check plugins. They are executed, and their return value is mapped to either Host or Service objects. Everything else builds on top.
These check plugins can be either from the Monitoring Plugins or custom. While their origin does not matter, they are the building blocks of an Icinga monitoring stack. If a plugin goes CRITICAL, Icinga 2 alerts the sysadmin.
When Icinga is monitoring another machine, some check plugins need to run on that machine, e.g., check_disk. Typically, this remote execution works by having the Icinga Agent installed on the machine being monitored. A connection is established between the Icinga 2 Master and the Icinga 2 Agent, and the scheduled check is executed by the Agent.
But then there are situations where installing the Icinga 2 Agent on each machine does not work. This can be due to the unavailability of Icinga 2 for the target platform, limited permissions, company policy, or a general refusal to install additional software. Whatever the reason, there are plenty of distributed setups without Icinga 2 installed on every machine.
When deciding not to use the Icinga 2 Agent, there are alternatives. The most prominent one for Unix-like operating systems is SSH.
check_by_ssh
SSH is the default remote shell for most modern Unix-like operating systems. An SSH daemon runs on the server and allows remote connections to either use a shell or just execute certain commands. This service can also be used by Icinga 2 to execute check plugins.
While no remote Icinga 2 Agent is required, an SSH daemon like OpenSSH is still needed. Whether this is truly “agentless” is up to everyone to decide for themselves.
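To illustrate (the host name and thresholds are made up for this example), the check_by_ssh plugin can also be invoked manually from a shell, wrapping the remote command passed via -C:

```
$ # Hypothetical manual invocation: run check_disk on the endpoint via SSH.
$ check_by_ssh -H endpoint.example.com -l icinga -C "check_disk -w 20% -c 10%"
```

The plugin reports the wrapped command’s output and exit code as its own, which is exactly what Icinga 2 consumes.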
In any case, the Monitoring Plugins ship a check_by_ssh plugin, and there is an Icinga Template Library (ITL) CheckCommand named by_ssh. Creating a Service and setting check_command = "by_ssh" is enough to get started.
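A minimal sketch of such a Service might look like this; the host name, plugin path, and wrapped check are assumptions for illustration:

```
object Host "endpoint.example.com" {
  check_command = "hostalive"
  address = "192.0.2.1"
}

object Service "users-by-ssh" {
  host_name = "endpoint.example.com"
  check_command = "by_ssh"

  # Full path to the check plugin on the endpoint, not on the Master.
  vars.by_ssh_command = [ "/usr/lib/nagios/plugins/check_users", "-w", "5", "-c", "10" ]
  vars.by_ssh_logname = "icinga"
}
```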
Compared to using the Icinga 2 Agent, there is a limitation when planning the network architecture. While an Icinga 2 Agent can connect to the Icinga 2 Master or the other way around, check_by_ssh only works with a connection established from the Master to the endpoint. So the host and its SSH port must be reachable from the Icinga 2 Master. If it is not possible to expose this host to a common network (the largest common network being the Internet), a VPN could be used, which also avoids exposing the SSH daemon to a wider network and potential attackers.
Preparing Both The Icinga Master and The Endpoints
Before getting started, an SSH key pair needs to be generated on the Icinga 2 Master. This SSH key is used to authenticate the Icinga 2 Master to the endpoints. Therefore, it should be both kept secure and backed up.
Following the SSH-related Icinga 2 documentation, a new key pair can be generated by the icinga user (might be named nagios on Debian). Contrary to the documentation, I would create an ed25519 key. However, if you are monitoring legacy devices – mostly older routers or switches – you may want to use RSA 4096 instead and potentially even need to alter the SSH configuration to allow outdated HostKeyAlgorithms or PubkeyAcceptedAlgorithms again.
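For instance, a per-host override in the icinga user’s ~/.ssh/config could re-enable RSA-based algorithms for a single legacy device; the host name is a placeholder, and this should be limited to devices that cannot be upgraded:

```
Host legacy-switch.example.com
    # Re-enable outdated ssh-rsa signatures for this one device only.
    HostKeyAlgorithms +ssh-rsa
    PubkeyAcceptedAlgorithms +ssh-rsa
```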
Please do not add a passphrase to the SSH private key, as otherwise it would be harder to let Icinga 2 automatically use it. In theory, ssh-agent(1) could be used, but this would require additional steps not covered in this blog post.
$ # Use a shell as the icinga user, might be nagios on Debian instead.
$ whoami
icinga
$ # Create a keypair w/o a passphrase.
$ ssh-keygen -t ed25519
Generating public/private ed25519 key pair.
Enter passphrase for ".ssh/id_ed25519" (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in .ssh/id_ed25519
Your public key has been saved in .ssh/id_ed25519.pub
The key fingerprint is:
SHA256:cNOipaM/fpgYmsz7g/7ftp7Ix76Wdy+GvpIQubWeF8Y icinga@icinga.example.com
The key's randomart image is:
+--[ED25519 256]--+
|                 |
|        .        |
|      ..= .      |
|      o*.o       |
|      ++So       |
|     ..o.. E     |
|   o +.o * = o   |
|  * oo+o%.+ +    |
| .o+o+*OBBo+ o.  |
+----[SHA256]-----+
On the endpoint to be monitored, a new system user to be used by Icinga 2 must be created, let’s call it icinga. The created public key ~icinga/.ssh/id_ed25519.pub from the Icinga 2 Master needs to be present on the endpoint in the monitoring user’s ~icinga/.ssh/authorized_keys file. This will allow the Icinga 2 Master to log in to the endpoint as the icinga user via SSH.
The other way around, the endpoint’s SSH server public key must be stored on the Icinga 2 Master in order to authenticate the endpoint and to ensure that no machine-in-the-middle (MITM) attack can occur. This key goes into the ~icinga/.ssh/known_hosts file.
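One way to do this is ssh-keyscan, sketched below with a placeholder host name; the fetched key should be verified against the endpoint’s actual fingerprint out-of-band before it is trusted:

```
$ # As the icinga user on the Icinga 2 Master.
$ ssh-keyscan -t ed25519 endpoint.example.com >> ~/.ssh/known_hosts
```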
There are several ways to populate the endpoint’s ~icinga/.ssh/authorized_keys and the Icinga 2 Master’s ~icinga/.ssh/known_hosts file. A manual one would be by using the ssh-copy-id helper script that is often packaged together with OpenSSH. This is well described in our documentation. Alternatively, configuration management or infrastructure as code tools like Ansible or Puppet can be used to deploy the keys. In the end, the method used will depend on the number of endpoints and the tools the sysadmin is most comfortable with.
After exchanging keys, verify that the connection works from a shell as the icinga user on the Icinga 2 Master host.
$ ssh icinga@endpoint.example.com whoami
icinga
Changes In The Icinga 2 Configuration
The Distributed Monitoring documentation emphasizes the concept of Endpoint objects and setting the command_endpoint attribute accordingly for Host and Service objects. When using the Icinga 2 Agent, this is necessary as an Endpoint represents a remote Icinga 2 instance and the command_endpoint schedules a check to be executed on that Endpoint.
This is not necessary when using SSH. In fact, it must not be done.
For Icinga 2, executing check plugins via SSH is not a remote task to be scheduled on another Icinga 2 instance; it is just running check_by_ssh on itself. The check plugin contains the logic to execute another check plugin remotely, but that happens outside of Icinga 2.
Thus, when executing checks via SSH, no Endpoint objects are required and the command_endpoint should be omitted, except when using a satellite-based setup, which is not covered in this post.
Custom CheckCommands
Icinga 2 comes with the Icinga Template Library, which contains many predefined CheckCommands. Using checks from the ITL does not work by default when using SSH.
The check_by_ssh check plugin mentioned above is also part of the ITL as by_ssh. This CheckCommand supports the by_ssh_command and by_ssh_arguments custom variables, allowing another check to be wrapped within by_ssh.
But first, start with the Host objects and unify the common parts with a template. This ssh-agent-host template sets agent_type = "ssh" next to other custom variables for by_ssh. The additional by_ssh_path will be used later to specify where the check plugins reside on the endpoint, as this might differ between operating systems.
The two hosts are quite similar, but the second uses a non-standard SSH port and does not set a by_ssh_path, expecting the check plugins to be available in the $PATH variable.
template Host "ssh-agent-host" {
  import "generic-host"

  vars.agent_type = "ssh"
  vars.by_ssh_address = name
  vars.by_ssh_logname = "icinga"
  vars.by_ssh_path = ""
}

object Host "endpoint1.example.com" {
  import "ssh-agent-host"

  address6 = "2001:db8::23"

  vars += {
    "os" = "OpenBSD"
    "by_ssh_path" = "/usr/local/libexec/nagios/"
  }
}

object Host "endpoint2.example.com" {
  import "ssh-agent-host"

  address = "192.0.2.42"
  address6 = "2001:db8::42"

  vars += {
    "os" = "Linux"
    "by_ssh_port" = 4223
  }
}
Following this Host definition, a trivial Service can be created that calls check_disk via by_ssh on every ssh-agent-host.
apply Service "disk" {
  check_command = "by_ssh"

  vars.by_ssh_command = [ "$by_ssh_path$check_disk", "-c", "10%", "-w", "20%" ]

  assign where host.vars.agent_type == "ssh"
}
However, this uses hardcoded parameters and misses the configurability through custom variables. The parameters can be extracted into by_ssh_arguments as follows.
apply Service "disk" {
  check_command = "by_ssh"

  vars.by_ssh_command = [ "$by_ssh_path$check_disk" ]
  vars.by_ssh_arguments = {
    "-w" = { value = "$disk_wfree$" }
    "-c" = { value = "$disk_cfree$" }
  }

  vars.disk_wfree = "20%"
  vars.disk_cfree = "10%"

  vars += host.vars

  assign where host.vars.agent_type == "ssh"
}
This also allows setting a custom disk_wfree or disk_cfree value for a Host.
object Host "endpoint2.example.com" {
  # [ . . . ]

  vars += {
    "disk_wfree" = "50%"
  }
}
But it is possible to go further and reuse predefined ITL definitions by using the get_check_command function. This version uses both the original command – after substituting its path with by_ssh_path – and its arguments.
apply Service "disk" {
  check_command = "by_ssh"

  vars.by_ssh_command = {{
    var cmd = get_check_command("disk").command
    cmd[0] = macro("$by_ssh_path$") + basename(cmd[0])
    return cmd
  }}
  vars.by_ssh_arguments = {{ get_check_command("disk").arguments }}

  vars.disk_wfree = "20%"
  vars.disk_cfree = "10%"

  vars += host.vars

  assign where host.vars.agent_type == "ssh"
}
Now it is possible to generalize this into a template, allowing easier SSH-based Service creation inspired by the ITL. Note that within the Service, the check_command must be defined before the import, as this command will be backed up in the template.
template Service "generic-ssh-service" {
  vars.by_ssh_check_command = check_command
  check_command = "by_ssh"

  vars.by_ssh_command = {{
    var cmd = get_check_command(macro("$by_ssh_check_command$")).command
    cmd[0] = macro("$by_ssh_path$") + basename(cmd[0])
    return cmd
  }}
  vars.by_ssh_arguments = {{ get_check_command(macro("$by_ssh_check_command$")).arguments }}
}

apply Service "disk" {
  check_command = "disk"
  import "generic-ssh-service"

  vars.disk_wfree = "20%"
  vars.disk_cfree = "10%"

  vars += host.vars

  assign where host.vars.agent_type == "ssh"
}
For demonstration purposes, repeat this for other check plugins from the ITL.
apply Service "swap" {
  check_command = "swap"
  import "generic-ssh-service"

  vars.swap_wfree = 50
  vars.swap_cfree = 25

  vars += host.vars

  assign where host.vars.agent_type == "ssh"
}

apply Service "load" {
  check_command = "load"
  import "generic-ssh-service"

  vars.load_wload1 = 5.0
  vars.load_wload5 = 4.0
  vars.load_wload15 = 3.0
  vars.load_cload1 = 10.0
  vars.load_cload5 = 6.0
  vars.load_cload15 = 4.0

  vars += host.vars

  assign where host.vars.agent_type == "ssh"
}
Persistent SSH Connection
One downside of using check_by_ssh is that for every check Icinga 2 performs, another SSH connection is established. With a large number of services to check, this adds up quickly, possibly even resulting in too many open connections and lots of login log entries on the endpoints.
Fortunately, OpenSSH allows sharing multiple sessions over a single network connection. By setting ControlMaster, ControlPath, and ControlPersist accordingly, only one SSH connection will be used per host. In short, a connection is kept open in the background and can be referenced via the socket at ControlPath.
These parameters can be specified via by_ssh_options. The ControlPath allows variables or tokens; using %C includes a hash of the remote hostname, port, and other connection parameters.
vars.by_ssh_options = [ "ControlMaster=auto", "ControlPath=/tmp/.icinga_ssh_%C", "ControlPersist=30m" ]
Another useful by_ssh custom variable is by_ssh_timeout, which one might want to increase if long-running checks are used.
The final template might look like this.
template Service "generic-ssh-service" {
  vars.by_ssh_check_command = check_command
  check_command = "by_ssh"

  vars.by_ssh_command = {{
    var cmd = get_check_command(macro("$by_ssh_check_command$")).command
    cmd[0] = macro("$by_ssh_path$") + basename(cmd[0])
    return cmd
  }}
  vars.by_ssh_arguments = {{ get_check_command(macro("$by_ssh_check_command$")).arguments }}

  vars.by_ssh_timeout = "600"
  vars.by_ssh_options = [
    "ControlMaster=auto",
    "ControlPath=/tmp/.icinga_ssh_%C",
    "ControlPersist=30m"
  ]
}
Closing
Whether or not to use SSH-based checks should be evaluated on a case-by-case basis. Doing so means losing some of the benefits of the Icinga 2 Agent, from remote scheduling to direct use of the ITL. On the other hand, it allows monitoring devices not directly supported by Icinga 2, or monitoring without having to install additional software.
Since there are many setups that use this approach out there, this blog post will hopefully provide some additional ideas on how to improve such a setup.