Icinga 1.10.2 Bug Fix Release

Icinga 1.10.2 is out for download and is our prompt response to potential security issues. In particular, this release is recommended for users who allow public access to their Classic UI.

Aside from this, Icinga 1.10.2 irons out Oracle compile and upgrade issues in IDOUtils and adds a few minor config-related fixes to the Core. See our change log for more details.

Thanks to all users who have contributed their patches and bug reports, and special kudos goes to DTAG Group Information Security for alerting us to the security threats. Our development tracker is always open and we look forward to receiving your continued feedback.

CHANGE LOG

CORE

  • Add an Icinga syntax plugin for Vim #4150 – LE/MF
  • Document dropped options log_external_commands_user and event_profiling_enabled #4957 – BA
  • Typo in spec file on ido2db startup #5000 – MF
  • Build fails: xdata/xodtemplate.c requires stdint.h #5021 – SH

CLASSIC UI

  • Fix status output in JSON format not including short and long plugin output properly #5217 – RB
  • Fix possible buffer overflows #5250 – RB
  • Fix off-by-one memory access in process_cgivars() #5251 – RB

IDOUTILS

  • IDOUtils Oracle compile error #5059 – TD
  • Oracle update script 1.10.0 fails while trying to drop a non-existing index #5256 – RB

Thanks to ImmobilienScout24

Recently Team Icinga released a new core version 1.9.2 with significantly increased performance in the IDO component. All team members invest a lot of their time to make Icinga better and better, but sometimes we are really glad to receive some external support. In this particular case, ImmobilienScout24 helped us as a project supporter by sponsoring the necessary changes to the code base.

In detail, Icinga 1.9.2 features a number of changes to the IDO database module, which should greatly improve performance when using MySQL or PostgreSQL as a database backend.

  • First of all, support for database transactions is now enabled by default. This should reduce I/O on the database server
  • Where possible, multiple queries are merged into one query (e.g. when updating the is_active column for more than one object) – see the sketch after this list
  • Thanks to improvements to the object cache, a significant number of queries can be eliminated – thereby reducing the number of required database round-trips
  • Reduced network round-trips by removing redundant pings before each query
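
For illustration, here is a minimal C sketch of the query-merging idea from the list above. It is not the actual ido2db code and it only builds the SQL text, but it shows how several per-object UPDATE statements for the is_active column can collapse into one statement that gets committed in a single transaction. The table and column names follow the is_active example; everything else is invented for this sketch.

/* Hypothetical sketch of the "merge many small queries into one" idea.
 * Not the real ido2db implementation; it only prints the SQL it builds. */
#include <stdio.h>

int main(void) {
    unsigned long object_ids[] = { 101, 102, 103, 104 };
    size_t count = sizeof(object_ids) / sizeof(object_ids[0]);
    char query[1024];
    size_t len = 0;

    /* One UPDATE for all objects instead of one UPDATE per object_id. */
    len += snprintf(query + len, sizeof(query) - len,
                    "UPDATE icinga_objects SET is_active = 1 WHERE object_id IN (");
    for (size_t i = 0; i < count; i++) {
        len += snprintf(query + len, sizeof(query) - len,
                        "%lu%s", object_ids[i], (i + 1 < count) ? ", " : ")");
    }

    /* With transactions enabled, the whole batch is committed in one go,
     * which is what reduces I/O on the database server. */
    printf("BEGIN;\n%s;\nCOMMIT;\n", query);
    return 0;
}

Fewer statements and a single commit per batch are exactly what cuts down the I/O and database round-trips mentioned above.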

With about 10 million unique visitors each month, ImmobilienScout24 is the biggest and most popular German web marketplace for real estate. The company has its headquarters in Berlin and about 600 employees. ImmobilienScout24 has been on the market for about 15 years and supports Open Source solutions.

Again, we are grateful for their support in helping us make Icinga the best monitoring tool on the planet!

Icinga Core Reload Problems addressed in 1.9

There are many reports about the core reload/restart taking ages. This mostly happens when you have IDOUtils and a database backend enabled for Icinga Web and/or Reporting. You may ask “How about dropping the database and using something else?”. Well, that’s not really the point – it won’t solve the problem for everyone out there, and even Icinga 2 is not yet production-ready enough to act as a drop-in replacement.

So, what exactly is the problem? The core doesn’t know about config diffs – newly added or deleted objects. When idomod detects a core (re)start or reload, it dumps all the config information to the ido socket. ido2db reads from there and pushes the database inserts/updates for the configuration objects. This amount of data may get huge in large setups and takes a while to process.

The configuration dump needs to be finished before any other updates (status, check history) for data integrity reasons (check #1934 for some deeper thoughts). Rewriting the core for config diffs was an idea, but it would cost too many resources right now (the configuration format and parsing is one of the major reasons to develop Icinga 2 from scratch).

During Icinga 2 development, we discussed an idomod connector (Compat IDO) and reusing ido2db from Icinga 1.x. That prototyping revealed these bottlenecks even more clearly, as Icinga 2 is designed for large-scale systems and may generate 100k service checks in a 5 minute interval – ido2db did not have much fun with that.

We’ve decided to drop that idea (Icinga 2 will add its own ido compatible layer), but the prototyping added 2 nice enhancements for Icinga IDOUtils 1.9:

  • a socket queue (which does not use a kernel message queue, but a thread to proxy the socket data – see the sketch after this list) #3533
  • transactions around large objects (e.g. a service with groups, contacts, dependencies, etc wrapped as single transaction) #3527
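
To make the socket queue idea more concrete, here is a heavily simplified C sketch: one thread only reads incoming data and appends it to an in-memory queue, while a worker thread drains that queue and would normally hand the data over to the database layer. This is not the real ido2db implementation – the socket reader is faked with a small loop and all names are invented – but it shows the thread-proxy pattern used instead of a kernel message queue.

/* Minimal sketch of a thread-proxied queue: a reader thread pushes incoming
 * chunks into a linked list, a worker thread pops and "processes" them.
 * Purely illustrative; not the actual ido2db socket queue code. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct item { char *data; struct item *next; };

static struct item *head = NULL, *tail = NULL;
static int done = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

static void enqueue(const char *data) {
    struct item *it = malloc(sizeof(*it));
    it->data = strdup(data);
    it->next = NULL;
    pthread_mutex_lock(&lock);
    if (tail) tail->next = it; else head = it;
    tail = it;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

/* In ido2db this thread would read from the IDO socket; here it is faked. */
static void *reader_thread(void *arg) {
    (void)arg;
    for (int i = 0; i < 5; i++) {
        char buf[64];
        snprintf(buf, sizeof(buf), "config dump chunk %d", i);
        enqueue(buf);
    }
    pthread_mutex_lock(&lock);
    done = 1;                      /* no more input */
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* The worker drains the queue; in ido2db it would issue the DB statements. */
static void *worker_thread(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (head == NULL && !done)
            pthread_cond_wait(&cond, &lock);
        struct item *it = head;
        if (it) { head = it->next; if (head == NULL) tail = NULL; }
        pthread_mutex_unlock(&lock);
        if (it == NULL) break;     /* queue empty and reader finished */
        printf("processing: %s\n", it->data);
        free(it->data);
        free(it);
    }
    return NULL;
}

int main(void) {
    pthread_t reader, worker;
    pthread_create(&reader, NULL, reader_thread, NULL);
    pthread_create(&worker, NULL, worker_thread, NULL);
    pthread_join(reader, NULL);
    pthread_join(worker, NULL);
    return 0;
}

The second enhancement from the list, transactions around large objects, then roughly means wrapping everything that gets emitted for one object (the service plus its groups, contacts, dependencies and so on) between a BEGIN and a COMMIT.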

Check module/idoutils/config/updates/ido2db.cfg_added_1.8_to_1.9.cfg in Icinga 1.9 for details. These features were originally disabled by default (and tagged experimental) so as not to harm existing installations while still allowing everyone else to test and use them – as of the update at the end of this post, they are now enabled by default :-)

Known caveats:

  • ido2db requires more CPU and RAM in order to cache and process data (socket queue only)
  • your database must allow transactions for the database user (transactions only)
  • the insert/update performance still depends on your database – database tuning still required

Below is a small comparison using a 4k services test config on a Debian 6.0.7 VM with 4 cores, 2 GB RAM, and MySQL 5.1.66 without tuning. Icinga writes “Event loop started...” to its logs, but there’s also a dedicated service check for it in the sample configuration.

Core Startup with the 1.9 pre-release, no options enabled (short log):

Apr 15 18:01:32 sol icinga: Icinga 1.9.0 starting... (PID=4699)
Apr 15 18:01:32 sol icinga: Event broker module '/usr/lib/idomod.so' initialized successfully.
Apr 15 18:01:32 sol ido2db: Client connected, data available.
Apr 15 18:04:22 sol icinga: Event loop started...


Core Startup with the 1.9 pre-release and both options enabled (short log):

Apr 15 18:07:35 sol icinga: Icinga 1.9.0 starting... (PID=5336)
Apr 15 18:07:35 sol icinga: Event broker module 'IDOMOD' version '1.9.0' from '/usr/lib/idomod.so' initialized successfully.
Apr 15 18:07:35 sol ido2db: Client connected, data available.
Apr 15 18:07:38 sol icinga: Event loop started...

Apr 15 18:07:52 sol ido2db: IDO2DB buffer sizes: left=5946260, right=0

Apr 15 18:10:04 sol ido2db: IDO2DB buffer sizes: left=10586, right=0

Tip: The buffer size output is logged every ~15 seconds if there is data waiting; “left” is the queued socket input, “right” is the output towards the database. If there are no more log entries, the queue is idle and data is falling straight through.

Memory and CPU consumption is pretty moderate in exchange for having the core check hosts/services directly after the event loop has started :-)

[Chart: Icinga 1.9 ido2db socket queue – memory and CPU consumption]

Please test those options in your setup (git next snapshot, or wait until 1.9 on 25.4.2013), and provide feedback to our community support channels! Thanks in advance for helping make Icinga better :-)

Update 4.5.2013: The core release team decided to mark another milestone with 1.9 and make those enhancements the default, without any configuration required. They’ve been running for months now on our test platforms and we do not want to miss them. The latest GIT release branch reflects those changes.

SLA Reporting with Added Precision

If you have upgraded to Icinga Web 1.6 you may already be familiar with the new SLA extension in IDOUtils. The optional module is our response to the old niggle from the community that the data written to the database could be put to better use. So we have taken the opportunity to add a table to the database model and fiddle with IDO2DB. The end result is ‘enable_sla’ in your IDOUtils configuration files, which takes events and identifies the periods of scheduled downtime and acknowledgement for more accurate SLA reporting.

You could say it improves SLA results too, by making it clearer to what extent a critical event is actually critical or already being resolved ;)

Coding and coordination aside, the concept behind the SLA extension is actually quite simple. We added an SLA history table to the database model, which organizes event start and end times, object id, state and state type, as well as acknowledgement and scheduled downtime. Then in IDO2DB we added extra logic to correctly write data from the core to the aforementioned table.
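
As a rough illustration of the data involved, here is a small C model of what one record in such a history table carries. The struct and member names are invented for this sketch and do not claim to match the actual IDOUtils schema; the field list simply follows the description above.

/* Hypothetical model of one SLA history record, following the fields named
 * above (start/end time, object id, state, state type, acknowledgement,
 * scheduled downtime). Not the actual IDOUtils table definition. */
#include <stdio.h>
#include <time.h>

struct sla_history_entry {
    unsigned long object_id;    /* host/service object the period belongs to */
    time_t        start_time;   /* when this state period began */
    time_t        end_time;     /* when it ended (0 while still open) */
    int           state;        /* e.g. OK, WARNING, CRITICAL, UNKNOWN */
    int           state_type;   /* SOFT or HARD */
    int           acknowledged; /* 1 if the problem was acknowledged */
    int           in_downtime;  /* 1 if inside a scheduled downtime */
};

int main(void) {
    /* A CRITICAL period that was acknowledged: a report can decide whether
     * such a stretch should count against the SLA or not. */
    struct sla_history_entry e = { 42, 1370000000, 1370003600, 2, 1, 1, 0 };
    printf("object %lu was in state %d for %ld seconds (acknowledged: %d)\n",
           e.object_id, e.state, (long)(e.end_time - e.start_time),
           e.acknowledged);
    return 0;
}

Because acknowledgement and downtime are stored per period, a reporting layer can later decide how much a critical stretch should actually count against the SLA – which is exactly the point made above.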

At the moment you can view SLA data in the Icinga Web interface’s new tackle cronk, in the form of a pie chart. This is just the beginning though. We hope to integrate SLA history into Icinga Reporting with even more refined metrics.


So perhaps in the (not too far) future, you may be able to open up Icinga Reporting and call up a diagram that shows service availability for the year, though only from Monday – Friday, 9 am – 5 pm, discounting scheduled downtimes and acknowledgement periods. That may be something worth waiting for – or even better, contributing to.

Icinga vs Nagios – a developer's comparison

It’s been nearly 2.5 years (906 days) full of enhancements and refreshing development. Many things happen(ed) in the background that are not visible to everyone. Especially when it comes to comparing Icinga with its predecessor Nagios, it’s always hard to show Icinga in its best light and avoid bias at the same time.

As you might be well aware, I lead the development of Icinga Core and its related sub-projects. So, I am very active in all spaces, seeking new ideas but also patches from the community – in many worlds, not only Icinga, but Nagios, Opsview, OP5, Shinken and many other projects with similar origins in Nagios.

Once in a while people ask for a different kind of comparison. Not a fancy feature comparison designed for managers, but one which takes lost patches from the Nagios developer lists, from Nagios Portal and other Nagios community sources into account and tells them exactly how Icinga is different – from a core developer’s point of view.

You may also be interested to hear another side to the story – patches actually developed in the Icinga space, which have been backported to Nagios. Because we want to give something back to the Nagios community, work side-by-side and share knowledge.

And to stay fair – you can view yet another table which lists work done by Nagios developers that we have ported into Icinga. On a personal note – it’s always a pleasure reworking patches from Andreas Ericsson. I’ve learned a lot about actual core development in the past year :-)

Last but not least, we are trying to add more noticeable configs and also configure options to make life easier for packagers. You are welcome to slip into that part of the project too – packagers (especially for Ubuntu and Fedora) are always needed!

So here it is – the bug and feature comparison table is divided into the following sections:

  • Core
  • Classic UI
  • *DOUtils
  • Docs
  • Configure Options and Configs
  • Backported to Nagios
  • Ported from Nagios and variants

URLs to both the worlds of Nagios and Icinga have been added where available, so you may take a deeper look into the details…

Icinga 1.5.1 released

As you may have noticed, the web developers have already released a bug-fix Icinga Web 1.5.1 (and 1.5.2 is to be announced soon). Now it’s time to fix some Core, Classic UI and IDOUtils related issues – so the core team is releasing 1.5.1 too :-)

Changelog

* core: free memory allocated for notification macros right after sending the notification, not in the next notification

* classic ui: fix Localization: Form validation message could be improved (thx Mario Rimann) #1849
* classic ui: fix wrong titles in list of scheduled downtimes (thx Mario Rimann) #1848
* classic ui: fix host and service names are not allowed to have a ‘+’ included #1843

* idoutils: idomod: change stacked memory allocation for broker_data IDO_MAX_BUFLEN #1879
* idoutils: fix idomod should log more verbose on errors, asking for a running ido2db process #1885

* spec file: re-add processing headers

As usual, please download from sourceforge and report any bugs or feature requests to our dev tracker and/or support channels.