New features in Icinga 2.3

The next major version of Icinga 2 will introduce a bunch of interesting features which should make it even easier to define exceptions for services (as in, all services have a check_interval of 5 minutes except for…). Version 2.3 won’t be available for another couple of weeks, but here’s a quick introduction to some of its features:

Conditional Statements

More commonly known as if/else: In 2.3 it’s possible to set attributes based on whether some condition is true. Here’s an example:

object Host "localhost" {
  check_command = "hostalive"
  address = "127.0.0.1"
  vars.http_vhosts["icinga.org"] = {
    http_address = "icinga.org"
    interval = 1m
  }
  vars.http_vhosts["dev.icinga.com"] = {
    http_address = "dev.icinga.com"
  }
}
apply Service "vhost " for (vhost => config in host.vars.http_vhosts) {
  host_name = "localhost"
  check_command = "http"
  if (config.interval) {
    check_interval = config.interval
  } else {
    check_interval = 5m
  }
  assign where host.vars.http_vhosts
}

Debug Console

To make it easier to test filter rules for “apply” as well as other expressions, we have implemented a CLI-based console which can be used to evaluate arbitrary expressions and show their results:

$ icinga2 console
Icinga (version: v2.2.0-282-g9898971)
<1> => config = { http_address = "icinga.org", interval = 1m }
null
<2> => if (config.interval) { check_interval = config.interval } else { check_interval = 5m }
null
<3> => check_interval
60.0

Prototypes

All built-in data types (i.e. strings, numbers, arrays and dictionaries) now have their own methods. Here’s an example of how we can use these methods to manipulate dictionaries:

<1> => vhosts = { "icinga.org" = { http_address = "icinga.org" },
"dev.icinga.com" = { http_address = "dev.icinga.com" } }
null
<2> => vhosts.remove("icinga.org")
null
<3> => vhosts
{"dev.icinga.com":{"http_address":"dev.icinga.com"}}
<4> => vhosts.len()
1.0

Using Dictionary#remove we can remove specific dictionary items, which is rather useful when certain dictionary keys should not be set for some hosts or services.
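For instance, a host importing a template with predefined vhosts could drop one of the keys before the apply rule shown earlier generates services for it. A minimal sketch (the template and host names are made up):

template Host "generic-webserver" {
  check_command = "hostalive"

  vars.http_vhosts["icinga.org"] = { http_address = "icinga.org" }
  vars.http_vhosts["dev.icinga.com"] = { http_address = "dev.icinga.com" }
}

object Host "webserver2" {
  import "generic-webserver"
  address = "192.0.2.12"

  // this host doesn't serve dev.icinga.com, so drop the key –
  // no service gets generated for that vhost
  vars.http_vhosts.remove("dev.icinga.com")
}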

A Preview of Icinga Web 2

As we tinker away at Icinga Web 2, we’re finding it harder to keep it secret. So we thought: Why not show you what we have cooking under the hood?
Icinga Web 2 – Dashboard

Effortless Dashboards

When logging into Icinga Web 2, the first port of call will be your dashboard. It can be configured to show as many modules as you like – views, filters, or even multiple dashboards. Timestamps and status updates are refreshed automatically every second, rolling along smoothly and saving your eyes the strain of flickering displays.
Icinga Web 2 – Colors

Problems and Problems Only

The way we see it, as sysadmins we’re only ever interested in seeing where the problems are and, more importantly, which ones are still unhandled. To help you make sense of them, Icinga Web 2 will sort all views in order of urgency by default. Unhandled, new issues will be listed first with full colouring in strong shades, while acknowledged problems follow with partial colouring in mild shades. This way you know at a glance if something needs your attention.
Icinga Web 2 – PDF Export

Export of All Sorts

With just a click, you will be able to export all the data in a particular view in CSV, JSON or PDF format. Similarly, a view can be filtered and sorted as you need, and quickly added to a new dashboard from the same, easy-to-find menu.

Welcome to the Matrix

Always wanted to see your hosts and services on one page? Welcome to Icinga Web 2’s new matrix view. Filterable and scrollable in any direction, the matrix is particularly useful in large environments. Be it all hosts and services, just those with problems, or just problems yet to be acknowledged, the matrix can be filtered as you please.

Icinga Web 2 - Filtered Service Matrix

Write History

Want to see events from multiple months or an entire year at a glance? Not a problem. The screenshot below was generated from an environment that had a few million events in its database. This particular overview presents only critical events – which amounted to a good 160,000 in the past four months. Thanks to the graduated intensity of colour, it is easy to distinguish especially busy days from the rest.

Icinga Web 2 – Historical Overview

Neither cache nor buffer comes into play here, as Icinga Web 2 handpicks its data straight from the IDODB. A click on a particular day calls up a list of filtered events to the right of the overview. From here, they can be exported or further filtered.

Icinga Web 2 – Sending Commands

At Your Command

All the usual commands from Icinga Classic and Icinga Web will also be available in Icinga Web 2. However, instead of searching through a long, static list, Icinga Web 2 will offer you command links exactly where you need them, so you can run the next check right away with a single click.
What about downtimes, comments or acknowledgements? Create and remove them with a click when viewing a host or service. The same goes for enabling and disabling checks and notifications. The detail view also compares the current host and service settings to your configuration and shows you the history of a related object, so you can, for example, restore an accidentally deleted comment.
Icinga Web 2 – Login Screen

Interface On Speed

Icinga Web 2 is fast, impressively fast. This is evident right from the login screen.
With an empty cache, just four requests are sent from the browser, amounting to a crazy total of 60KB to be loaded from the server: the HTML code, a large Icinga logo, the JavaScript for the application and various style sheets.
From this point on, not a single additional JS or CSS file needs to be loaded.

Icinga Web 2 Performance – Chrome

Just because it is so cool, here it is again – this time in a different browser:

Icinga Web 2 Performance – Firefox

Everything that follows the username and password entry is just lean HTML snippets and a few icons. This means it takes mere milliseconds from clicking “login” to seeing a fully loaded dashboard. Even when all the icons still have to be loaded, as little as 20KB needs to flow through the cables.

Icinga Web 2 is not only frugal in terms of bandwidth, but also when it comes to your browser’s memory – something not to be taken for granted in web applications that constantly refresh data. We want to be sure that there are no memory leaks or lost elements in the DOM. Though we may not be immune to browser bugs, we are doing our best to ensure that you can enjoy your future Icinga Web 2 dashboard on your TV on the wall for months on end, without needing to restart your browser daily.

Curious?

Then have a play with our latest development versions on our Git master. Your feedback is welcome.

Icinga 2 – Current state

Long time no see – or, from an Icinga developer’s view: too busy developing to write a blog post 🙂 Anyways, since this blog post isn’t about the upcoming Icinga 1.10 release (which is due at OSMC this year, including live demos during our presentation) I’ll skip the 1.x part here. Just a personal note to all those “2.x means 1.x development is dead” opinion makers – you’re wrong. Look at the roadmap, get in touch with your ideas for 1.x, tell the team about your concerns and even help out!
If you remember the 0.0.2 technology preview release of Icinga 2, the overall idea was to provide a feature-equivalent version of what Icinga Core 1.x currently provides for your monitoring. That included a compat component writing status.dat/objects.cache/icinga.log and serving the necessary data for the Classic UI to run standalone with Icinga 2. Livestatus had been prototyped but wasn’t ready yet, and neither was the IDO component as a compatible database backend.
We have regular development meetings on the Icinga 2 status where we also discuss the progress and to-dos. After 0.0.2 we decided to split the to-dos into finishing the data providers for the LivestatusListener while designing and implementing a completely new IDO database component – keeping only the existing database schema (1.10 to be exact). We’ve also decided to keep the name “IDO” for now – even if some people tend to blame it for bad performance in general. The Icinga 2 IdoMysqlConnection works as a single connection firing database queries from received framework events. It’s backed by a generic library named “db_ido” which knows about the database table relationships and objects. That way it will be easier to add support for databases other than MySQL in the future. Please keep in mind that we might introduce our own database schema at some point in the future – that’s just not designed or prototyped yet. Having IDO schema compatibility on board enables us to offer support for our very own tool stack (Web, Reporting) as well as other addons (NagVis, etc.).
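To give you an idea, enabling the new component currently looks roughly like this (a sketch from the development tree – attribute names may still change before the release):

library "db_ido_mysql"

object IdoMysqlConnection "ido-mysql" {
  // a single connection firing queries from received framework events
  host = "127.0.0.1"
  port = 3306
  user = "icinga"
  password = "icinga"
  database = "icinga"
}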
Once we had finished the config and status data providers for the LivestatusListener and IdoMysqlConnection in late summer, we bashed our heads against ‘historical’ issues. While it seems reasonable to just subscribe to all the event signals and stash them into logs or a database backend, there were two problems with that: First, the native Livestatus module in 1.x loads icinga.log and its archives into core memory and filters them to serve log and statehistory data. That method is pretty ugly, but given no other backend (a database) it seems reasonable – at least for the moment. The second problem was more generic – Icinga 2 will have cluster and distributed features built in. This seriously changes the way data is processed and synchronized, compared to just a single standalone instance.
Furthermore, we weren’t satisfied with the built-in replication in cluster setups, which requires all instances to ‘see’ each other. In an ideal world that would work, but in real environments with different network segments, policies and firewalls it quickly becomes “does not work ™”. Since the historical data depends on the events a cluster with more than one node generates, we’ve had several meetings and lots of discussions ever since. The most significant change to the existing implementation is that we now have a) a sort of bin-log replay of messages to keep nodes in sync and b) optionally, a master node that syncs config parts to a slave node (which must accept configs from that specific node). The nodes trust each other based on SSL certificates and local configuration. It’s even possible to have a “slave” node act as a dumb checker (like a mod_gearman worker) sending data back to more than one “master” node. Since that’s a special case, the more interesting part is that on connection loss the slave instance will continue to do what it’s got to do (check services), and on connection restore it will sync all the data back to the master.
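To illustrate the idea, a two-node setup could look something like the following. Note that this is purely illustrative – the object types and attribute names are placeholders while the implementation settles:

// purely illustrative – names and attributes are not final
object Endpoint "master" {
  host = "192.0.2.10"
  port = 7777
}

object Endpoint "slave-dmz" {
  host = "192.0.2.20"
  port = 7778

  // config parts the master syncs to this node; the slave must
  // explicitly accept configs from the master, trust being based
  // on SSL certificates and local configuration
  config_files = [ "/etc/icinga2/cluster.d/dmz/*.conf" ]
}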
You may find similar approaches in the LConf distributed mechanism for 1.x or in mod_gearman workers, but this is now built into Icinga 2. At the time of writing, the code is running in our test labs in pre-production, standalone as well as clustered, using live config converted by the script provided with Icinga 2. Regarding check delegation – the instances calculate among themselves who’s responsible for the current check. That is to say, there’s only an internal cluster heartbeat keeping track of which nodes are alive, but no broadcast message asking “who could do the check?”. This can be stacked into more levels of clustering, leaving an instance in the middle as a proxy node.
In case you’re asking yourself – the connection between nodes can be established in either direction. A slave may connect from the local DMZ to the master listening on e.g. port 7777, or a master may connect to a slave listening on e.g. port 7778. And it’s not really master/slave by default – those identifiers originate from the setup you’re designing; they may just be Icinga 2 instances working in a distributed cluster setup. Or, and that’s one of the “oh, this is fucking great” ideas from our discussions, an Icinga 2 agent shall be an Icinga 2 mini-instance with local configuration and its own check scheduler, either sending back the data or being polled by the Icinga 2 “master” node. The main difference from a clustered instance would be the non-persistent connection. That sounds great but isn’t implemented yet 🙂
Under normal circumstances you won’t need the distributed cluster magic, because Icinga 2 will use all available CPU cores to fire checks, notifications and event handlers in a non-blocking, multi-threaded way. So if you’re running something like 100k checks now, you should try the config conversion script and fire up a single Icinga 2 instance (and please don’t use a VM with 1 CPU and 1 GB RAM then 😉 ).
Given that the way checks and data are distributed now works as redesigned, we’ve again evaluated our to-dos. We’re planning to release 0.0.3 at OSMC in ~3.5 weeks, and therefore we need to bring everything into shape while finishing up the rest. The IDO database backend has now received all the historical data required for Web/Reporting, plus a timer-based cleanup mechanism to delete table entries older than a configured age (e.g. notifications, downtimehistory). Documentation will be provided using Markdown, and packages (deb, rpm) are in the making as well. The installation is being revamped and updated – notably, there will be “i2enfeature compat” to automatically enable/disable feature components within Icinga 2 (thanks to Debian and Apache for the idea). Furthermore, we’ve done a final review of the configuration itself and corrected some parts with duplicated information or wrong naming (e.g. a host’s hostcheck is now a host’s check, and a notification requires user_groups, not just groups).
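The cleanup mechanism will be configurable per backend, along these lines (again a sketch – the exact attribute names may still change):

object IdoMysqlConnection "ido-mysql" {
  host = "127.0.0.1"
  user = "icinga"
  password = "icinga"
  database = "icinga"

  // timer-based cleanup: purge table entries older than the given age
  cleanup = {
    notifications_age = 31d
    downtimehistory_age = 31d
  }
}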
In the end, Icinga 2 0.0.3 will feature

  • StatusDataWriter and ExternalCommandListener (former Compat) and CompatLogger (former CompatLog) for status.dat/objects.cache/icinga2.cmd/icinga.log for Icinga 1.x Classic UI
  • IdoMysqlConnection and ExternalCommandListener for Icinga 1.x Web
  • IdoMysqlConnection for Icinga 1.x Reporting, NagVis
  • LivestatusListener for addons using the livestatus interface (history tables tbd)
  • PerfDataWriter for graphing addons such as PNP/inGraph/Graphite (can be loaded multiple times – see the sketch below!)
  • CheckResultReader to collect Icinga 1.x slave checkresults (migrate your distributed setup step-by-step)
  • template-focused configuration language, backed by the conversion script for Icinga 1.x configuration
  • base feature-set of Icinga 2 as a monitoring framework itself
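
For example, loading the PerfDataWriter twice with different spool directories could look like this. A sketch only – the attribute names follow the current development tree and may still change before the release:

object PerfDataWriter "pnp" {
  // rotate spool files for PNP every 15 seconds
  host_perfdata_path = "/var/spool/icinga2/perfdata/host-perfdata"
  service_perfdata_path = "/var/spool/icinga2/perfdata/service-perfdata"
  rotation_interval = 15s
}

object PerfDataWriter "graphite-spool" {
  // a second writer feeding a different spool directory
  host_perfdata_path = "/var/spool/carbon/host-perfdata"
  service_perfdata_path = "/var/spool/carbon/service-perfdata"
  rotation_interval = 30s
}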

We need feedback from your tests with

  • Installation (Source, Packages tba)
  • Config conversion script
  • Configuration syntax and handling (create your own from scratch)
  • Features available in Icinga 1.x we might have missed

Our presentation at OSMC will include a live hands-on with all the mentioned features, most notably the cluster functionality. Join us there for discussion & beer, or hop onto the community channels where the developers lurk as well 🙂
Thanks for testing in advance!

Feature Preview: IcingaMQ for High Performance Large Scale Monitoring

In the lead-up to version 1.6, we have been busy working on a new feature to improve Icinga’s performance in monitoring larger environments – IcingaMQ. Based on the ZeroMQ messaging library, IcingaMQ will improve core functionality in three areas.

At the moment, Icinga can monitor thousands of hosts and services with the help of add-ons such as mod_gearman or DNX. IcingaMQ will remove the need for such external add-ons, integrating service check distribution across multiple instances into the Core project.

Secondly, IcingaMQ will make NEB (Nagios Event Broker) events available to external clients. What previously required multiple NEB modules docked onto the core, IcingaMQ will offer in a single module – improving the performance of Icinga and, in particular, IDOUtils.

In addition, API queries will no longer need to be run through the command pipe, as IcingaMQ will enable Icinga Core to handle API commands itself. This will also allow information such as confirmation of check execution to be gathered.

All in all, IcingaMQ will make distributed environments faster, more efficient and easier to manage. It will be offered as an optional NEB module to add to the core, so you will be able to configure which checks should run via IcingaMQ, as well as user access to the event and API interfaces.
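Since IcingaMQ will be a NEB module, enabling it should boil down to a broker_module line in icinga.cfg. A purely hypothetical sketch – the module path and its options are made up, as the prototype isn’t finished yet:

# icinga.cfg – hypothetical example, for illustration only
event_broker_options=-1
broker_module=/usr/lib/icinga/icingamq.o endpoint=tcp://*:5555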

Cheers to our Core team for the ongoing work, and thanks must also go to the active community behind ZeroMQ for the compact library that is helping us build IcingaMQ.

Fingers crossed, we may have a prototype out as early as the next release. We hope you look forward to the coming developments as much as we do.