How to do Agentless Monitoring with check_by_ssh

Feb 19, 2025

The fundamentals of Icinga 2 are check plugins. They are executed, and their return value is mapped to the state of a Host or Service object. Everything else builds on top of that.

These check plugins can be either from the Monitoring Plugins or custom. While their origin does not matter, they are the building blocks of an Icinga monitoring stack. If a plugin goes CRITICAL, Icinga 2 alerts the sysadmin.

When Icinga is monitoring another machine, there are check plugins that need to run on that machine, e.g., check_disk. Typically, this remote execution works by having the Icinga Agent installed on the machine being monitored. A connection will be established between the Icinga 2 Master and the Icinga 2 Agent, resulting in the scheduled check being executed by the Agent.

But then there are situations where installing the Icinga 2 Agent on each machine does not work. This can be due to the unavailability of Icinga 2 for the target platform, limited permissions, company policy, or a general refusal to install additional software. Whatever the reason, there are plenty of distributed setups without Icinga 2 installed on every machine.

If the Icinga 2 Agent is not an option, there are alternatives. The most prominent one for Unix-like operating systems is SSH.

check_by_ssh

SSH is the default remote shell for most modern Unix-like operating systems. An SSH daemon runs on the server and allows remote connections to either use a shell or just execute certain commands. This service can also be used by Icinga 2 to execute check plugins.

While no remote Icinga 2 Agent is required, an SSH daemon like OpenSSH is still needed. Whether this is truly “agentless” is up to everyone to decide for themselves.

Either way, the Monitoring Plugins ship a check_by_ssh plugin, and there is an Icinga Template Library (ITL) CheckCommand named by_ssh. Creating a Service and setting check_command = "by_ssh" is enough to get started.

Compared to using the Icinga 2 Agent, there is a limitation when planning the network architecture. While an Icinga 2 Agent can connect to the Icinga 2 Master or the other way around, check_by_ssh only works with a connection established from the Master to the monitored machine (the endpoint). The host and its SSH port must therefore be reachable from the Icinga 2 Master. If it is not possible to expose this host to a common network (the largest common network might be the Internet), a VPN could be used, which also avoids exposing the SSH daemon to a wider network and potential attackers.

Preparing Both The Icinga Master and The Endpoints

Before getting started, an SSH key pair needs to be generated on the Icinga 2 Master. This SSH key is used to authenticate the Icinga 2 Master to the endpoints. Therefore, it should be both kept secure and backed up.

Following the SSH-related Icinga 2 documentation, a new key pair can be generated by the icinga user (might be named nagios on Debian). Contrary to the documentation, I would create an ed25519 key. However, if you are monitoring legacy devices, mostly older routers or switches, you may want to use RSA 4096 instead and potentially even need to alter the SSH configuration to allow outdated HostKeyAlgorithms or PubkeyAcceptedAlgorithms again.
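For such a legacy device, the SSH client configuration of the icinga user on the Icinga 2 Master could be extended as sketched below. This is only an illustration under assumptions: the host name legacy-switch.example.com and the helper add_legacy_stanza are made up, and +ssh-rsa is just one example of an outdated algorithm a device might require. Older OpenSSH releases call the second option PubkeyAcceptedKeyTypes instead.

```shell
#!/bin/sh
# Sketch: re-enable the outdated ssh-rsa algorithm for one legacy device only,
# instead of weakening the SSH client configuration for every host.
# add_legacy_stanza is a hypothetical helper, not part of OpenSSH or Icinga.
add_legacy_stanza() {
  config_file=$1   # e.g. ~icinga/.ssh/config on the Icinga 2 Master
  legacy_host=$2   # e.g. legacy-switch.example.com (made-up name)

  # Do nothing if a stanza for this host already exists.
  if grep -qsxF "Host $legacy_host" "$config_file"; then
    return 0
  fi

  # The leading "+" appends ssh-rsa to the defaults instead of replacing them.
  cat >> "$config_file" <<EOF
Host $legacy_host
    HostKeyAlgorithms +ssh-rsa
    PubkeyAcceptedAlgorithms +ssh-rsa
EOF
  chmod 600 "$config_file"
}

# Only act when called with arguments, e.g.:
#   sh legacy.sh "$HOME/.ssh/config" legacy-switch.example.com
if [ "$#" -eq 2 ]; then
  add_legacy_stanza "$1" "$2"
fi
```

Scoping the exception to a single Host block keeps modern algorithms enforced for all other endpoints.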

Please do not add a passphrase to the SSH private key, as otherwise it would be harder to let Icinga 2 automatically use it. In theory, ssh-agent(1) could be used, but this would require additional steps not covered in this blog post.

$ # Use a shell as the icinga user, might be nagios on Debian instead.
$ whoami
icinga

$ # Create a keypair w/o a passphrase.
$ ssh-keygen -t ed25519
Generating public/private ed25519 key pair.
Enter passphrase for ".ssh/id_ed25519" (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in .ssh/id_ed25519
Your public key has been saved in .ssh/id_ed25519.pub
The key fingerprint is:
SHA256:cNOipaM/fpgYmsz7g/7ftp7Ix76Wdy+GvpIQubWeF8Y icinga@icinga.example.com
The key's randomart image is:
+--[ED25519 256]--+
|                 |
|         .       |
|      ..= .      |
|      o*.o       |
|      ++So       |
|    ..o.. E      |
| o +.o * = o     |
|  * oo+o%.+ +    |
| .o+o+*OBBo+ o.  |
+----[SHA256]-----+

On the endpoint to be monitored, a new system user to be used by Icinga 2 must be created, let’s call it icinga. The created public key ~icinga/.ssh/id_ed25519.pub from the Icinga 2 Master needs to be present on the endpoint for the monitoring user icinga in their ~icinga/.ssh/authorized_keys file. This will allow the Icinga 2 Master to log in to the endpoint as the icinga user via SSH.

The other way around, the endpoint’s SSH server public key must be stored on the Icinga 2 Master server in order to authenticate the endpoint and to ensure that no machine-in-the-middle (MITM) attack can occur. This key must be stored in the ~icinga/.ssh/known_hosts file.

There are several ways to populate the endpoint’s ~icinga/.ssh/authorized_keys and Icinga 2 Master’s ~icinga/.ssh/known_hosts file. A manual one would be by using the ssh-copy-id helper script that is often packaged together with OpenSSH. This is well described in our documentation. Alternatively, configuration management or infrastructure as code tools like Ansible or Puppet can be used to deploy the keys. In the end, the method used will depend on the number of endpoints and the tools the sysadmin is most comfortable with.
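If neither ssh-copy-id nor a configuration management tool is at hand, the endpoint side can be handled with a few lines of shell. The following is a minimal sketch, assuming the public key was already copied to the endpoint by some other means. The helper name add_authorized_key and its arguments are made up for this example. It also sets the restrictive permissions that sshd expects when StrictModes is enabled.

```shell
#!/bin/sh
# Sketch: append a public key to authorized_keys without creating duplicates.
# Meant to run on the endpoint as the icinga user.
# add_authorized_key is a hypothetical helper, not part of OpenSSH or Icinga.
add_authorized_key() {
  pubkey_file=$1   # copy of id_ed25519.pub from the Icinga 2 Master
  ssh_dir=$2       # usually ~icinga/.ssh on the endpoint

  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
  touch "$ssh_dir/authorized_keys"
  chmod 600 "$ssh_dir/authorized_keys"

  # -F compares literally, -x matches whole lines, -f reads the key from a file.
  if ! grep -qxF -f "$pubkey_file" "$ssh_dir/authorized_keys"; then
    cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
  fi
}

# Only act when called with arguments, e.g.:
#   sh add_key.sh /tmp/id_ed25519.pub "$HOME/.ssh"
if [ "$#" -eq 2 ]; then
  add_authorized_key "$1" "$2"
fi
```

For the known_hosts file on the Icinga 2 Master, ssh-keyscan can fetch the endpoint's host key, as long as its fingerprint is compared against the real one through a trusted channel before it is stored.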

After exchanging keys, verify that the connection works from a shell as the icinga user on the Icinga 2 Master host.

$ ssh icinga@endpoint.example.com whoami
icinga

Changes In The Icinga 2 Configuration

The Distributed Monitoring documentation emphasizes the concept of Endpoint objects and setting the command_endpoint attribute accordingly for Host and Service objects. When using the Icinga 2 Agent, this is necessary as an Endpoint represents a remote Icinga 2 instance and the command_endpoint schedules a check to be executed on that Endpoint.

This is not necessary when using SSH. In fact, it must not be done.

For Icinga 2, executing check plugins via SSH is not a remote task to be scheduled on another Icinga 2 instance; it is just running check_by_ssh locally. The check plugin contains the logic to execute another check plugin remotely, but that happens outside of Icinga 2.

Thus, when executing checks via SSH, no Endpoint objects are required and the command_endpoint should be omitted, except when using a satellite-based setup, which is not covered in this post.

Custom CheckCommands

Icinga 2 comes with the Icinga Template Library which contains many predefined CheckCommands. Using checks from the ITL does not work by default when using SSH.

The check_by_ssh check plugin mentioned above is also part of the ITL as by_ssh. Mirroring the command and arguments attributes of a regular CheckCommand, it supports the by_ssh_command and by_ssh_arguments custom variables, allowing another check to be wrapped within by_ssh.

But first start with the Host objects and unify the common parts with a template. This ssh-agent-host template sets agent_type = "ssh" next to other custom variables for by_ssh. The additional by_ssh_path will be used later to specify where the check plugins reside on the endpoint, as this might differ between operating systems.

The two hosts are quite similar, but the second uses a non-standard SSH port and does not set a by_ssh_path, expecting the check plugins to be available in the $PATH variable.

template Host "ssh-agent-host" {
  import "generic-host"

  vars.agent_type = "ssh"

  vars.by_ssh_address = name
  vars.by_ssh_logname = "icinga"
  vars.by_ssh_path = ""
}

object Host "endpoint1.example.com" {
  import "ssh-agent-host"

  address6 = "2001:db8::23"

  vars += {
    "os" = "OpenBSD"
    "by_ssh_path" = "/usr/local/libexec/nagios/"
  }
}

object Host "endpoint2.example.com" {
  import "ssh-agent-host"

  address = "192.0.2.42"
  address6 = "2001:db8::42"

  vars += {
    "os" = "Linux"
    "by_ssh_port" = 4223
  }
}

Following this Host definition, a trivial Service can be created that calls check_disk via by_ssh on every ssh-agent-host.

apply Service "disk" {
  check_command = "by_ssh"

  vars.by_ssh_command = [ "$by_ssh_path$check_disk", "-c", "10%", "-w", "20%" ]

  assign where host.vars.agent_type == "ssh"
}
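When such a Service misbehaves, it can help to run the plugin by hand as the icinga user on the Icinga 2 Master. The sketch below only prints a check_by_ssh call that roughly corresponds to the Service above for endpoint1.example.com; it does not execute anything. The local plugin directory in PLUGIN_DIR is an assumption and differs between distributions, and by_ssh_cmdline is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print (not run) a check_by_ssh call resembling the "disk" Service.
# PLUGIN_DIR is an assumed location of the plugins on the Icinga 2 Master.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/lib/nagios/plugins}"

# by_ssh_cmdline is a hypothetical helper.
#   $1 = endpoint address     (by_ssh_address)
#   $2 = remote login name    (by_ssh_logname)
#   $3 = remote plugin prefix (by_ssh_path, may be empty)
by_ssh_cmdline() {
  # -H, -l and -C are check_by_ssh options for host, login name and command.
  printf '%s/check_by_ssh -H %s -l %s -C "%scheck_disk -c 10%% -w 20%%"\n' \
    "$PLUGIN_DIR" "$1" "$2" "$3"
}

by_ssh_cmdline endpoint1.example.com icinga /usr/local/libexec/nagios/
```

The printed line can be pasted into a shell of the icinga user. If it fails there as well, the problem lies in the SSH setup or the remote plugin path and not in the Icinga 2 configuration.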

However, this Service uses hardcoded parameters and misses the configurability through custom vars. The parameters can be extracted into by_ssh_arguments as follows.

apply Service "disk" {
  check_command = "by_ssh"

  vars.by_ssh_command = [ "$by_ssh_path$check_disk" ]
  vars.by_ssh_arguments = {
    "-w" = { value = "$disk_wfree$" }
    "-c" = { value = "$disk_cfree$" }
  }

  vars.disk_wfree = "20%"
  vars.disk_cfree = "10%"
  vars += host.vars

  assign where host.vars.agent_type == "ssh"
}

This also allows setting a custom disk_wfree or disk_cfree value for a Host.

object Host "endpoint2.example.com" {
  # [ . . . ]
  vars += {
    "disk_wfree" = "50%"
  }
}

But it is possible to go further and reuse predefined ITL definitions by using the get_check_command function. This version uses both the original command – after substituting its path with by_ssh_path – and its arguments.

apply Service "disk" {
  check_command = "by_ssh"

  vars.by_ssh_command = {{
    var cmd = get_check_command("disk").command
    cmd[0] = macro("$by_ssh_path$") + basename(cmd[0])
    return cmd
  }}
  vars.by_ssh_arguments = {{ get_check_command("disk").arguments }}

  vars.disk_wfree = "20%"
  vars.disk_cfree = "10%"
  vars += host.vars

  assign where host.vars.agent_type == "ssh"
}

Now it is possible to generalize this into a template, allowing easier SSH-based Service creation inspired by the ITL. Note that within the Service, the check_command must be defined before the import, as this command will be backed up in the template.

template Service "generic-ssh-service" {
  vars.by_ssh_check_command = check_command
  check_command = "by_ssh"

  vars.by_ssh_command = {{
    var cmd = get_check_command(macro("$by_ssh_check_command$")).command
    cmd[0] = macro("$by_ssh_path$") + basename(cmd[0])
    return cmd
  }}
  vars.by_ssh_arguments = {{ get_check_command(macro("$by_ssh_check_command$")).arguments }}
}

apply Service "disk" {
  check_command = "disk"
  import "generic-ssh-service"

  vars.disk_wfree = "20%"
  vars.disk_cfree = "10%"
  vars += host.vars

  assign where host.vars.agent_type == "ssh"
}

For demonstration purposes, repeat this for other check plugins from the ITL.

apply Service "swap" {
  check_command = "swap"
  import "generic-ssh-service"

  vars.swap_wfree = 50
  vars.swap_cfree = 25
  vars += host.vars

  assign where host.vars.agent_type == "ssh"
}

apply Service "load" {
  check_command = "load"
  import "generic-ssh-service"

  vars.load_wload1 = 5.0
  vars.load_wload5 = 4.0
  vars.load_wload15 = 3.0
  vars.load_cload1 = 10.0
  vars.load_cload5 = 6.0
  vars.load_cload15 = 4.0
  vars += host.vars

  assign where host.vars.agent_type == "ssh"
}

Persistent SSH Connection

One downside of using check_by_ssh is that for every check Icinga 2 performs, another SSH connection will be established. With a large number of services to check, this number can be impressive, even resulting in too many connections and lots of login log entries on the endpoints.

Fortunately, OpenSSH allows sharing multiple sessions over a single network connection. By setting ControlMaster, ControlPath, and ControlPersist accordingly, only one SSH connection will be used per host. In short, a connection is kept open in the background and can be referenced via the socket at ControlPath.

These parameters can be specified via by_ssh_options. The ControlPath allows variables or tokens. Using %C includes a hash of the remote hostname, port and other useful arguments.

vars.by_ssh_options = [
  "ControlMaster=auto",
  "ControlPath=/tmp/.icinga_ssh_%C",
  "ControlPersist=30m"
]
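To see whether a shared connection is actually alive, the control socket can be queried with the -O option of ssh. The following is a small sketch under assumptions: CONTROL_PATH has to be identical to the ControlPath used in by_ssh_options, SSH_USER has to match by_ssh_logname, and mux is a made-up helper name. Since the %C hash depends on the local hostname, the remote hostname, the port, and the user, the helper must be called as the icinga user on the Icinga 2 Master with exactly the address and port that Icinga 2 uses.

```shell
#!/bin/sh
# Sketch: inspect or close the shared SSH connection of an endpoint.
# Both values are assumptions and must match the Icinga 2 configuration.
CONTROL_PATH='/tmp/.icinga_ssh_%C'
SSH_USER='icinga'

# mux is a hypothetical helper.
#   $1 = control command: "check" reports on the master, "exit" closes it
#   $2 = endpoint address as used in by_ssh_address
#   $3 = SSH port, defaults to 22
mux() {
  ssh -o ControlPath="$CONTROL_PATH" -l "$SSH_USER" -p "${3:-22}" -O "$1" "$2"
}

# Only act when called with arguments, e.g.:
#   sh mux.sh check endpoint2.example.com 4223
if [ "$#" -ge 2 ]; then
  mux "$1" "$2" "${3:-22}"
fi
```

Closing a connection with exit is harmless: with ControlMaster=auto, the next check simply opens a new master connection.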

Another useful by_ssh custom variable is by_ssh_timeout, which one might want to increase if long-running checks are being used.

The final template might look like this.

template Service "generic-ssh-service" {
  vars.by_ssh_check_command = check_command
  check_command = "by_ssh"

  vars.by_ssh_command = {{
    var cmd = get_check_command(macro("$by_ssh_check_command$")).command
    cmd[0] = macro("$by_ssh_path$") + basename(cmd[0])
    return cmd
  }}
  vars.by_ssh_arguments = {{ get_check_command(macro("$by_ssh_check_command$")).arguments }}

  vars.by_ssh_timeout = "600"
  vars.by_ssh_options = [
    "ControlMaster=auto",
    "ControlPath=/tmp/.icinga_ssh_%C",
    "ControlPersist=30m"
  ]
}

Closing

Whether or not to use SSH-based checks should be evaluated on a case-by-case basis. Doing so will result in losing some of the benefits of the Icinga 2 Agent, from remote scheduling to the ITL. On the other hand, it allows monitoring devices not directly supported by Icinga or without having to install another service.

Since there are many setups that use this approach out there, this blog post will hopefully provide some additional ideas on how to improve such a setup.
