If fiddling with email-to-SMS gateways, managing on-call schedules and keeping track of monitoring issues sounds like a nightmare, you may be happy to learn that you can outsource it all.
A web-based notification management SaaS, PagerDuty offers an all-inclusive solution that provides alerts in all forms, on-call duty scheduling and ticketing. Icinga integration is simple and secure thanks to SSL communication between Icinga and PagerDuty’s cloud servers.
PagerDuty is a fee-based service – good for users who have little hardware onsite or no desire to set up their own SMS infrastructure. For peace of mind, they boast 24/7 support as well as a distributed, failsafe infrastructure that is replicated across multiple data centers.
Requisites: Perl and a relatively up-to-date web browser
- Alerts of all sorts (SMS, email, phone – automatic retries, multiple and international numbers possible) with automated escalation
- User-friendly web interface to manage notification and escalation settings as well as on-call rosters
- Incident tracking for an overview of monitoring issues
- Mobile incident management apps (iOS, Android) to acknowledge, resolve or reassign issues, also with push notifications
- Integration with popular ticketing, chat and other monitoring related systems
Version compatibility: Both Icinga 1.x and Icinga 2 integrate into PagerDuty through a plugin – a simple integration guide is provided. A community-contributed Puppet module and Chef recipe are also available.
More info & documentation: www.pagerduty.com
While surfing around for ideas to improve business monitoring in Icinga, we stumbled upon Bischeck and its creator Anders Haal. So we thought we’d share what we learned about teaming Icinga up with Bischeck for dynamic and adaptive thresholds – straight from the maker’s mouth:
What is Bischeck?
Bischeck is an open source project with the goal to provide dynamic and adaptive threshold logic for Nagios based monitoring solutions and forks such as Icinga.
Until now, Nagios based monitoring has only supported static thresholds. With static thresholds we are limited to defining one maximum or one minimum value as the threshold that is valid in every situation for the monitored service. It is not very likely that one single value is correct for each day of the week and for every hour of the day. The risk is that we will get too many or too few alarms, and there are even some service metrics for which we cannot set a threshold at all due to their dynamic behavior. This is especially true when monitoring application and business related services that follow the dynamics of business load.
Dynamic thresholds suited to business load visualised in PNP4Nagios
What can you do with Bischeck?
With Bischeck you have a solution that allows for dynamic and adaptive thresholds to complement the traditional static threshold solution. Dynamic and adaptive thresholds give you the ability to:
- Define different threshold profiles depending on the time of the day and day of the week or month: We can set thresholds for any service where we expect some increase and/or decrease in the metric during the day.
- Define thresholds based on historical data: This enables us to express different kinds of threshold baselines. For example, we can specify that the value measured at 12:00 should not be more than 5% higher or lower than the calculated average of the metrics measured at the same time on the previous 5 days. Bischeck supports several mathematical functions to calculate thresholds at run-time.
- Set multiple threshold rules for the same service: E.g. for a file system utilization service we can combine the classic 90% file system utilization threshold with one that checks how quickly the utilization changes, using historical data to calculate a utilization delta over some time period.
- Use data collected for one or multiple services as an input in the calculation of the threshold for a different service: This adaptiveness is excellent when you have some service metrics that drive the business process. For instance, the number of visits to your web shop is likely to have some effect on the number of expected orders, CPU utilization, application threads, etc. This means we can set the thresholds in relation to data that matters and not just a single value.
- Create virtual services: A virtual service is a metric that cannot be measured at a single source, but can only be calculated from other metrics. Typical examples are ratios, aggregations, etc.
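To make the idea of a historical baseline and a combined rule more concrete, here is a minimal Python sketch. It is not Bischeck code or Bischeck configuration syntax; the function names, the 5% tolerance default and the delta limit of 5 percentage points are illustrative assumptions.

```python
from statistics import mean


def within_baseline(current, history, tolerance=0.05):
    """Return True if `current` lies within +/- `tolerance` of the
    average of `history` (e.g. the 12:00 values of the previous 5 days)."""
    baseline = mean(history)
    lower = baseline * (1 - tolerance)
    upper = baseline * (1 + tolerance)
    return lower <= current <= upper


def filesystem_state(usage_pct, previous_pct, max_pct=90, max_delta=5):
    """Combine a static limit with a rate-of-change rule.

    `max_delta` (hypothetical) is the largest increase in percentage
    points tolerated between the two measurements.
    """
    if usage_pct >= max_pct or (usage_pct - previous_pct) > max_delta:
        return "CRITICAL"
    return "OK"


# Values measured at 12:00 on the previous five days (made-up numbers).
noon_history = [100, 104, 98, 102, 96]
print(within_baseline(103, noon_history))
print(filesystem_state(85, 70))
```

The second function shows why the combination is useful: a file system at 85% would pass the static 90% rule, but a jump from 70% to 85% within one period is still flagged.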
How does it work?
Bischeck can collect metrics in several ways, e.g. by executing SQL queries, querying Icinga/Nagios data over Livestatus, or executing normal Icinga/Nagios check commands while bypassing the state and just retrieving the performance data. Both collection and threshold classes are simple to extend and customize.
Bischeck integrates with Icinga and Nagios by sending passive checks. Passive checks are supported over NSCA, NRDP and Livestatus. Bischeck data can also be sent to Graphite and OpenTSDB for graphing.
Bischeck is written in Java and runs as a standalone daemon. It is “supported” on all major Linux distributions. It has also been tested on Windows, but installation scripts are currently not supplied for Windows. For more on how Bischeck works and its architecture see our documentation.
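For readers unfamiliar with passive checks, the sketch below formats a service result as a PROCESS_SERVICE_CHECK_RESULT external command line, the format Nagios and Icinga 1.x accept for passive results. It only illustrates the kind of data being submitted; how Bischeck builds and transports its results internally is not shown here, and the host and service names are made up.

```python
import time


def passive_service_result(host, service, return_code, output, timestamp=None):
    """Format a passive service check result as an external command line.

    return_code follows the plugin convention:
    0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    """
    if timestamp is None:
        timestamp = int(time.time())
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}"
    )


# Hypothetical host/service names, for illustration only.
print(passive_service_result("webshop01", "orders", 2, "CRITICAL - 12 orders"))
```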
Where have you seen Bischeck in production environments?
DHL Freight in Sweden was our first “user”. It has been in production at DHL for over 2 years. They use monitored data such as shipment orders to calculate the threshold for the next step of the process, e.g. monitoring how many of these shipment orders are geographically coded for delivery and truck loading.
DHL has been a great sponsor to the project and you can read more about how they use it on our testimony page. We are now starting to see more companies testing it, and hopefully we can disclose some more interesting production cases in the near future.
Why did you decide to create Bischeck?
Like so many developers, especially in the open source space, you develop solutions because you need some functionality and you cannot find it. The pleasure, of course, is seeing that other people with the same need can gain from what you have done.
Any future development plans?
Absolutely. We will soon release 0.4.3 with just some minor fixes and improvements. At the same time we are working on the next major release, which we think will be our 1.0.0. What we are currently targeting as the major feature is threshold baselining. With threshold baselining you will use the historical data that Bischeck collects and apply mathematical filters to the data to get a comparative threshold baseline. This will minimize configuration and hopefully produce a threshold that is very adaptive to the production environment. The benefit is of course less configuration but, more importantly, better threshold management that only triggers adequate alarms. This feature will demand some changes to our historical cache storage, and currently we are leaning towards Redis, which seems to work well for our time series data model. Feedback and ideas are of course appreciated.
What’s the coolest thing about Bischeck for you?
I think the coolest thing is that it solves the problem that it was meant to solve. Hopefully the rest of the world will find dynamic and adaptive thresholds as cool and useful as we do.
Requisites for installation: See Bischeck quick start guide and documentation.
Version compatibility: All Bischeck versions (0.4.2 at the time of writing) with all Icinga versions
More information: www.bischeck.org
Though Icinga is great for monitoring the availability and status of hosts and services, it’s often good to reflect on the monitored performance and plan ahead. Being able to view performance data in the form of graphs allows trends and potential problems to be detected early. The following tools are just a handful of popular open source graphing addons compatible with Icinga.
Providing data collection and display, PNP4Nagios stores plugin output in a Round Robin database via RRDtool and features a user interface based on Kohana and jQuery. Once set up, it is very user friendly. To reduce storage load, it consolidates old data by averaging values. However, this results in lower resolution graphs for time intervals further in the past.
Requisites: Perl, RRDtool and PHP.
Features: Template-based graphs; view all services of a host; define graph objects and time intervals freely; zooming; mouse-over thumbnail graphs in Icinga Classic & Icinga Web; CSV, JSON, XML export.
Version compatibility: All PNP4Nagios versions (v0.6 at time of writing) with all Icinga versions, Classic and Web.
More info & documentation: www.pnp4nagios.org
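The consolidation by averaging described for PNP4Nagios can be illustrated with a few lines of Python. This is a simplified sketch of the principle only, not how RRDtool is implemented: the function name and the `factor` parameter are assumptions, and a trailing incomplete group is simply averaged on its own.

```python
def consolidate(values, factor):
    """Replace every `factor` consecutive samples with their average,
    trading resolution for storage space."""
    chunks = (values[i:i + factor] for i in range(0, len(values), factor))
    return [sum(chunk) / len(chunk) for chunk in chunks]


# Five one-minute samples reduced to two-minute averages (made-up values).
print(consolidate([1, 3, 5, 7, 10], 2))
```

A short spike in the original samples is flattened by the average, which is why older graphs appear smoother and less detailed.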
Graphite stores any type of time-series data and generates real-time graphs from it, making it ideal for performance trending. Carbon, a Twisted daemon, receives data and stores it in a Whisper database. Similar to RRDtool, older data loses resolution. On the flip side, it offers high resolution, per-second precision for new data and allows for irregular data intervals. Finally, graphs are generated via Cairo on the fly and displayed in a Django based web application. Data collection is achieved through third party tools.
Requisites: Python and Pycairo; Django and django-tagging; Twisted and zope-interface; fontconfig and a font package; a WSGI server and web server.
Features: Scalable system, generate graphs on demand – metrics need not be preconfigured; define graph objects and time intervals freely; URL API with JSON, CSV and PNG output.
Version compatibility: All Graphite versions (v0.9.10 at time of writing) with all Icinga versions. However, a 3rd party tool is necessary to transport Icinga performance data to Graphite, e.g. a script or a forwarder such as Metricinga or icinga-to-graphite. A tool to assist Graphite integration into PNP4Nagios is also available, though unstable.
More info & documentation: http://graphite.wikidot.com
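As a rough idea of what such a forwarder does, the sketch below turns plugin performance data into lines in Carbon's plaintext format (`metric.path value timestamp`). It is not taken from Metricinga or icinga-to-graphite: the `icinga.` prefix, the function names and the simplified parsing (no quoted labels, units and thresholds dropped) are all assumptions.

```python
import re

# Matches "label=value" at the start of each perfdata item; the unit and
# the warn/crit/min/max fields that may follow are ignored in this sketch.
PERF_RE = re.compile(r"(?P<label>[^=\s]+)=(?P<value>-?\d+(?:\.\d+)?)")


def sanitize(name):
    """Replace characters that would break a Graphite metric path."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", name)


def perfdata_to_graphite(host, service, perfdata, timestamp):
    """Return one plaintext-protocol line per perfdata item."""
    lines = []
    for match in PERF_RE.finditer(perfdata):
        path = ".".join(
            ["icinga", sanitize(host), sanitize(service), sanitize(match.group("label"))]
        )
        lines.append(f"{path} {match.group('value')} {timestamp}")
    return lines


for line in perfdata_to_graphite(
    "web01", "http", "time=0.05s;1;2;0 size=1024B;;;0", 1700000000
):
    print(line)
```

Each resulting line, terminated by a newline, would then be written to Carbon's plaintext listener (TCP port 2003 by default).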
Unlike most graphing tools, inGraph stores performance data in a relational database, supporting MySQL, PostgreSQL and SQLite. It comes with a check_ingraph plugin to retrieve relevant data for graphs and displays them with the help of NodeJS. Compared to the RRD based tools, inGraph offers detailed graphs regardless of the age of the performance data, and intervals can still be defined after monitoring has started.
Requisites: Curl and xmlrpc enabled PHP, Apache2, Python with python-devel and python-setuptools, SQLAlchemy, MySQL or PostgreSQL python drivers.
Features: Comments; template-based graphs; view all services of a host; define graph objects and time intervals freely; zooming; mouse-over thumbnail graphs in Icinga Classic & Icinga Web; CSV and XML export.
Version compatibility: All versions of inGraph (v1.0.1 at time of writing) and Icinga Classic; Icinga Web 1.5.0 or newer.
More info & documentation: www.netways.org/projects/ingraph
NagiosGraph offers self-contained data collection and display. Performance data is collected from plugin output and stored as RRD files. Graphs are then generated and managed mostly through CGI scripts via the RRDtool Perl interface, RRDs.
Requisites: RRDtool recommended; CGI and RRDs Perl modules; GD Perl module recommended.
Features: Parameter-based graph generation; view all services of a host and vice versa; define graph objects and time intervals freely; zooming within graph; mouse-over thumbnail graphs in Icinga Classic & Icinga Web; CSV, XML export.
Version compatibility: All NagiosGraph versions (v1.4.4 at time of writing) with all Icinga versions.
More info & documentation: http://nagiosgraph.sourceforge.net
Every now and again, admins can make their lives easier. Below are just a few configuration helpers that we know to work well with Icinga. All are open source and free to download!
This PHP based tool saves data to a MySQL database, from which it exports the finished configuration files via local file or remote access. NagiosQL offers a structured interface with a side menu of configuration objects grouped by supervision, alerting, commands, specialties and other tools.
Requisites: Web server, a PEAR module, MySQL database, PHP and a few extensions.
NagiosQL also comes with an installation assistant to check for missing packages.
Features: Host and service templates, cloning, auto backup of configuration files, consistency checks, syntax verification, user group view restrictions, configuration importer, translations in various languages, support for large and distributed environments… and more.
Version compatibility: All NagiosQL versions (latest v3.1.1) with all Icinga versions.
More info & demo system: www.nagiosql.org
Similarly, NConf is based on PHP and saves data into a MySQL database. It was designed with large-scale distributed environments in mind, and presents configuration objects in a menu broken down into basic, additional, server and administration groups.
Requisites: Apache web server, PHP, MySQL and Perl.
Just to be sure, NConf even comes with a web-based pre-installation checker.
Features: Host and service templates, cloning, multi-modification, host dependency viewer (define and view parent-child relationships), multiple authentication modes, CSV file importer, configuration importer… and more.
Version compatibility: All NConf versions (latest v1.2.6) with all Icinga versions.
More info & demo system: www.nconf.org
LConf stores configuration objects on an LDAP server and exports text config files, so Icinga runs independently of the LDAP server when in operation. It uses any LDAP browser as an interface, which is then structured by the user. Configuration objects can thus be organised into a tree view by host or application groups, location or instances, for example.
Requisites: OpenLDAP server, Perl with LDAP libraries, and LDAP utils for the corresponding operating system.
A small shell script installer is included to help with installation and to distribute the LConf scripts to the desired directory.
Features: Drag’n’drop configuration (if used with LDAP Admin browser), search/replace function, host and service templates, host and service dependencies, inheritance, custom variables, configuration import script, support for large distributed environments… and more.
Version compatibility: All LConf versions (latest v1.1.1) with all Icinga versions.
This addon also includes an ‘LConf for Icinga’ sub-project which offers a special module for integration into the new interface, Icinga Web 1.3+.
More info: www.netways.org/projects/lconf
[NOTE (added 21.10.13): The Icingen project is no longer active and links may be outdated.]
A relatively new config tool, Icingen is a bash script configuration generator built just for Icinga. It applies existing Icinga service templates to hosts and host groups.
Requisites: SNMP plugins, Catdoc
Features: host and service templates, passive clusters, supports distributed environments.
Version compatibility: All Icingen versions (latest v0.2.1) with all Icinga versions.
More info: http://icingen.opendoc.net
English language docs are available at http://opendoc.net/icingen/dev/README.txt
If you have a favourite configuration interface or tool for Icinga we haven’t covered, please let us know in the comments below.