Today we will be talking to Noah Hilverling, or “N-o-X” on GitHub.
He’s been with the Icinga team for almost 4 years now and is responsible for a whole lot of bug fixes and shiny features in Icinga 2.
So tell me something about yourself…
Could you just quickly describe your role in the Icinga DB project?
I started on the project as a programmer with small tasks and eventually took over the maintainer role.
I mostly managed tasks, and I planned and implemented ninety percent of the Golang daemon.
Which tasks were you responsible for in Icinga DB?
When it comes to coding, mostly implementing the daemon, so Icinga DB itself, of which the configuration sync was the biggest part.
Apart from that, a lot of work went into implementing the feature in Icinga 2.
I also took care of managing the issues, reviews and milestones as the maintainer.
What do you need to get going?
To work effectively I need a quiet place and some good music.
And I really appreciate my second screen, which holds my to-do list, a browser for research, and my mail client.
Let’s talk Icinga DB!
Which technologies did you use?
We used Golang, the database systems Redis and MySQL, and JSON for the communication between Icinga, Redis and Icinga DB.
How would you describe the communication and management over the course of the project?
Communication flow started with Eric, our team lead, who had the vision for the Icinga DB.
Then Alex and I split up the tasks we got among ourselves and pinballed issues and reviews.
Altogether I would say it worked quite well.
So, how does it work?
What exactly is it that Icinga DB does?
The main purpose is to read all data from Redis and write the important stuff into MySQL.
First Icinga 2 fills Redis with data, which is then handled by Icinga DB.
The Icinga DB uses checksums to identify updated values in order to put those into the MySQL database.
This applies to configuration, history, and current object states – so all data you want to use or display in Icinga Web 2.
Best practice would be to run the daemon on your Icinga 2 host, because Redis is an in-memory database and you don’t want to limit its performance with network latency.
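To illustrate the checksum idea from this answer, here is a minimal Go sketch. It is not taken from the Icinga DB code: the function names, the choice of SHA-1, and the map-based inputs are all assumptions made for the example. It flags the objects whose checksum in Redis differs from the one stored in MySQL.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sort"
)

// checksum returns a hex-encoded SHA-1 digest of a serialized object.
// (Assumption: the real daemon may use a different hash or encoding.)
func checksum(serialized string) string {
	sum := sha1.Sum([]byte(serialized))
	return hex.EncodeToString(sum[:])
}

// changedIDs returns, in sorted order, the IDs that exist on both sides
// but whose checksums differ, i.e. the rows that need an UPDATE.
func changedIDs(redisSums, mysqlSums map[string]string) []string {
	var changed []string
	for id, sum := range redisSums {
		if old, ok := mysqlSums[id]; ok && old != sum {
			changed = append(changed, id)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	// Hypothetical data: host-2 was modified since the last sync.
	redisSums := map[string]string{
		"host-1": checksum(`{"name":"web"}`),
		"host-2": checksum(`{"name":"db","address":"10.0.0.2"}`),
	}
	mysqlSums := map[string]string{
		"host-1": checksum(`{"name":"web"}`),
		"host-2": checksum(`{"name":"db","address":"10.0.0.1"}`),
	}
	fmt.Println(changedIDs(redisSums, mysqlSums)) // prints [host-2]
}
```

Comparing short digests instead of full objects means unchanged rows can be skipped without reading or writing their contents.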
What was the initial plan for the task?
We didn’t have a lot of technical concepts on what we wanted it to look like.
So we had the idea that we have Redis on the Icinga side and we have MySQL on the Icinga Web side – and we need to synchronise those two with our Golang daemon.
We also wanted to make it super dynamic and have code that works in every scenario and with different kinds of inputs.
How does the final version differ from the original plan?
It turned out that when you have everything super dynamic, you start to have overcomplicated code, which we did not need in the sort of fixed environment we have in our Icingaverse – because we simply don’t have changing types of configuration.
We have our hosts and services, we have check commands, and this bit of Icinga will not really change anyway – so we did not need such a dynamic way of synchronising that.
In the end we went with more static, hard-coded object types, which is more stable and performant.
Which challenges did you have to overcome?
I think the biggest challenge was to synchronise everything in parallel. The reason why we needed the Icinga DB in the first place was performance, after all.
In our prototypes the performance increase didn’t turn out to be as much as we hoped it would be, so we went and tweaked it as much as possible. A lot of work went into that.
What we did was separate what we could. Not every configuration object needs to wait for, say, a host. Not every config type needs to be loaded in order.
We are now trying to run things in parallel wherever we can. For example, we have a special worker that does nothing but encode JSON. Another one that handles nothing but config objects. And yet another one for history. A special worker for object states. …you see where this is going.
So every worker is just doing its thing in parallel with the others, which is a huge boost.
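The "one dedicated worker per kind of work" idea can be sketched in Go with goroutines and channels. This is only an illustration under assumed names (item, runWorkers, and the three kinds are made up for the example), not the daemon's actual code. Each worker drains its own queue independently of the others.

```go
package main

import (
	"fmt"
	"sync"
)

// item is a hypothetical unit of work routed to a worker by its Kind.
type item struct {
	Kind    string // "config", "history" or "state"
	Payload string
}

// runWorkers starts one goroutine per kind, routes every item to the
// matching worker, and returns how many items each worker handled.
func runWorkers(items []item) map[string]int {
	kinds := []string{"config", "history", "state"}
	queues := make(map[string]chan item, len(kinds))
	counts := make(map[string]int, len(kinds))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for _, kind := range kinds {
		q := make(chan item)
		queues[kind] = q
		wg.Add(1)
		go func(kind string, q <-chan item) {
			defer wg.Done()
			for range q {
				// A real worker would process the payload here.
				mu.Lock()
				counts[kind]++
				mu.Unlock()
			}
		}(kind, q)
	}

	// Dispatch: items of unknown kinds are silently dropped in this sketch.
	for _, it := range items {
		if q, ok := queues[it.Kind]; ok {
			q <- it
		}
	}
	for _, q := range queues {
		close(q)
	}
	wg.Wait()
	return counts
}

func main() {
	counts := runWorkers([]item{
		{"config", "host web"},
		{"history", "state change"},
		{"config", "service http"},
		{"state", "web is UP"},
	})
	fmt.Println(counts["config"], counts["history"], counts["state"]) // prints 2 1 1
}
```

Because each kind has its own queue, a slow history write does not hold up config or state processing once its item has been handed over.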
If Icinga DB were a house, which part would the daemon be?
The foundation. It’s the biggest part and the base of the project – the rest of the house rests on it.
You said something about the config sync?
Tell me a bit about this, what’s the general use for it in the project? Which features does it add?
The config sync is the main task of the daemon. When Icinga 2 starts, it renders all available config into in-memory objects, which then get written into Redis.
Those objects are the ones that need to be displayed in Icinga Web 2.
The reason why we need the daemon – specifically the config sync – is to transfer and synchronise the data from Redis to MySQL.
This is all so that you can see what your infrastructure looks like and visualise it properly.
Can you go into a bit more detail about how it works?
So for a quick overview, it looks something like this…
[ starts scribbling ]
Sync Operator
    Insert Worker
        Preparation Worker
        Execution Worker
    Update Worker
        Comparison Worker
        Preparation Worker
        Execution Worker
    Deletion Worker
        Execution Worker
From the top down, the “Sync Operator” controls the three workers directly below.
It fetches IDs from Redis and MySQL and generates a delta.
Depending on the delta it feeds the IDs to the respective worker.
The sub workers insert, update and delete, equivalent to the operations in MySQL.
All of these workers have more sub workers (or children) as well.
The insert worker’s children would be the preparation worker and the execution worker.
The preparation worker reads from Redis, the insert execution worker inserts into the DB.
Those two work in parallel.
The update worker’s children are the comparison worker, the preparation worker and the execution worker.
The comparison worker is responsible for comparing the checksums from Redis and MySQL. This way it figures out whether anything needs to be updated.
The preparation worker does the reading from MySQL.
If it finds something that needs updating, this data is sent to the execution worker.
The deletion worker only has an execution worker as a child.
It doesn’t need to read any data, since we’re not writing any data. It just takes the IDs from the operator and deletes them.
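The delta step of the Sync Operator can be sketched in a few lines of Go. The names (delta, computeDelta) and the plain string IDs are assumptions for the illustration, not the real implementation. IDs only in Redis go to the insert worker, IDs only in MySQL go to the deletion worker, and IDs on both sides are candidates for the update worker, whose comparison worker then checks the checksums.

```go
package main

import "fmt"

// delta groups object IDs by the worker that should receive them.
type delta struct {
	Insert []string // in Redis, not yet in MySQL
	Update []string // on both sides; checksums still need comparing
	Delete []string // in MySQL, no longer in Redis
}

// computeDelta compares the ID lists fetched from Redis and MySQL.
// Input order is preserved within each resulting group.
func computeDelta(redisIDs, mysqlIDs []string) delta {
	inMySQL := make(map[string]bool, len(mysqlIDs))
	for _, id := range mysqlIDs {
		inMySQL[id] = true
	}

	inRedis := make(map[string]bool, len(redisIDs))
	var d delta
	for _, id := range redisIDs {
		inRedis[id] = true
		if inMySQL[id] {
			d.Update = append(d.Update, id)
		} else {
			d.Insert = append(d.Insert, id)
		}
	}
	for _, id := range mysqlIDs {
		if !inRedis[id] {
			d.Delete = append(d.Delete, id)
		}
	}
	return d
}

func main() {
	d := computeDelta(
		[]string{"a", "b", "c"}, // hypothetical IDs currently in Redis
		[]string{"b", "c", "d"}, // hypothetical IDs currently in MySQL
	)
	fmt.Printf("%v %v %v\n", d.Insert, d.Update, d.Delete) // prints [a] [b c] [d]
}
```

In this sketch the deletion group carries nothing but IDs, which matches the point above that the deletion worker never has to read object data.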
What were the things you struggled with the most?
Like everywhere else in Icinga DB, performance was the biggest issue.
Another thing was that we wanted to create a codebase that isn’t too complex. It’s a big task and it can get out of control very fast.
All in all the goal was to make it simple but still as performant as possible. This took lots of planning and the reworking of those plans.
If Icinga DB were a house, which part would the config sync be?
The living room.
It’s a big part of the house, most family members spend time there and the daemon spends a lot of time on this.
A few final questions!
If you had the time and resources, what else would you add or improve?
Write more tests. Tests are good. And add more user and developer documentation.
What we have is not bad, but documentation is always something you can improve on, no matter how much you invested in it already.
Did you enjoy working on Icinga DB?
Yes, absolutely! It was really fun. It was a new project that we started completely from scratch.
For me it was also a new experience to fill the maintainer role and do the planning myself. I would love to do that again!
What I love about this job is that you face a lot of challenges you don’t know how to tackle at first. The process of figuring them out, and the feeling when you do, is amazing.
What did you learn that could be of use in future projects?
It’s useful to write tests early on.
Starting early with taking notes and having design and concept documents ready proved to be very helpful as well.
It’s important to write a lot – and by that I do not mean code but everything surrounding it!
What will the future of Icinga DB look like in your opinion?
Currently it’s in a pretty good state. There are still a few aspects we want to improve, and we are still missing Postgres support.
As long as Icinga doesn’t change the way it handles config objects, I’d say it’s pretty much finished and there is not too much new stuff to come for now.
Do you recall your first / last line of code?
Haha, well, the answer to that question is probably not that exciting:
One of the first lines I wrote was for the configuration of the logging library.
The last change I made was … uhmmm … I think I did that yesterday: fixing a small issue with an ‘int’ to ‘float’ data type conversion.