This is going to be my last post in the series. There are a few loose ends to tie up and some more questions to answer. I’ll also explain some of the missing pieces of our puzzle.
Our Bacula deployment is actually really simple. We are only using the most basic features that Bacula has to offer. It has a ton of advanced functionality so please bear in mind that we currently are only using the basics. I know some rather small deployments that are far more complex than ours. What sets us apart from most others is sheer volume.
If nothing else, my series of posts should be seen as a testimony to the power, performance and scalability that those basics provide, not as an example of what is possible in terms of functionality.
Some people have questioned my motivation for writing these posts, insinuating that I am in some way affiliated with Bacula Systems or perhaps am a “hired gun”. Well, I am neither. I work for a Bacula Systems customer where I have been chief bat, and my fascination with what Bacula has to offer in comparison with its competition is my sole reason for sharing my experiences.
How many enterprise software vendors do you know where you can drop an email to the lead developer and get an authoritative answer within a few hours? How many times have you stood with a critical problem at hand and battled imbecile first-level support drones? How many truly enterprise-grade backup systems are open source, with all the goodness that comes with it?
We have a lot of support contracts at work, including some with the largest software companies in existence and very few match the level of skill, understanding, support and general helpfulness that we have gotten from Bacula Systems. The company may be small but they rock when it comes to backup in general and Bacula in particular. Period.
I have also been asked why I did not simply post my paper instead of blogging bits and pieces at a time. Well, the paper was written during work hours so I would need permission from my employer to do so … and since I have resigned my position with the company that would most likely have gotten me nowhere. The blog posts were all written in my spare time and I have been cautious not to release too much specific information, such as entire config files, test results, graphs, and so on to avoid possible confrontations.
But let’s get on with the interesting stuff …
Now that we have completed the migration to Bacula and have seen it work very reliably for a couple of months, we are ready to move forward and embrace the more advanced features it has to offer. Our backup storage needs have an annual growth rate of ~25%, so something has to be done if we want to avoid drowning in disk arrays. The new 48-core Dell R815 storage servers that will replace our current storage heads have enough excess horsepower to allow us to enable compression on our ZFS filesystems. Testing has shown that good results can be achieved, but the results vary from host to host. There will most definitely be changes in our schedule as well, since there really is no need to do weekly full backups any longer.
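To put the ~25% annual growth into perspective, here is a small back-of-the-envelope sketch. The 100 TB starting point is purely an illustrative assumption, not our actual capacity, and the function names are made up for this example.

```python
import math


def projected_capacity(current_tb, annual_growth, years):
    """Capacity needed after `years` of compound growth."""
    return current_tb * (1 + annual_growth) ** years


def years_to_double(annual_growth):
    """Number of years until storage needs double at a fixed growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    start_tb = 100.0  # hypothetical starting capacity, for illustration only
    growth = 0.25     # the ~25% annual growth mentioned above

    for year in range(1, 6):
        needed = projected_capacity(start_tb, growth, year)
        print(f"year {year}: {needed:.1f} TB")

    print(f"doubling time: {years_to_double(growth):.1f} years")
```

At a steady 25% per year, storage needs double roughly every three years, which is why compression and fewer full backups look attractive.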
There have been some questions from readers about restore speed. Well, a bare metal restore of a Windows 2003 DC with 50 GB of data takes less than an hour from start to finish. A Windows 2008 DC takes slightly longer. Single file restores take minutes. We have a special 10 Gbit drop which is reserved for restore purposes should the need arise to restore a really fat client. And yes, our storage servers can deliver that speed.
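For a rough feel of what those numbers mean, the arithmetic can be sketched as below. It uses decimal units and ignores protocol overhead and everything that is not raw data transfer, and the 2 TB "fat client" is a hypothetical size picked for illustration.

```python
def average_throughput_mbit(data_gb, seconds):
    """Average throughput in Mbit/s for moving `data_gb` (decimal GB) in `seconds`."""
    return data_gb * 8 * 1000 / seconds


def transfer_time_seconds(data_gb, link_gbit):
    """Best-case time to move `data_gb` over a link of `link_gbit` Gbit/s."""
    return data_gb * 8 / link_gbit


if __name__ == "__main__":
    # 50 GB in at most one hour gives a lower bound on the average rate.
    print(f"50 GB in 1 h: {average_throughput_mbit(50, 3600):.0f} Mbit/s or better")

    # Hypothetical 2 TB client over the 10 Gbit restore drop, at line rate.
    minutes = transfer_time_seconds(2000, 10) / 60
    print(f"2 TB over 10 Gbit: about {minutes:.0f} minutes at best")
```

A bare metal restore is obviously more than a plain data transfer, so treat these figures as bounds rather than predictions.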
Others have asked what we use for monitoring. We use Webacula and our own control panel to do all Bacula monitoring. The main reason for choosing Webacula over Bweb was the simple fact that we have a lot more people in-house who know PHP than Perl.
Monitoring is done by our Windows team. The majority of servers we have are running Windows, so it seemed to be a logical course of action. It is still a mystery to me why Windows VSS can work flawlessly one day and then fail for obscure reasons the next. In practice this approach has worked well. Even though the Windows team only has a very limited knowledge of UNIX / Linux, they have no problems adding new clients, modifying configuration files, monitoring jobs and fixing most problems by themselves. Heck, they even prefer bconsole over Webacula for certain tasks.
The only drawback so far is that Windows people, after a few years, apparently lose the ability to read and understand more than a couple of lines of text, which can be a challenge when the answer to your question is “buried” below a whole 70-line job report.
I have also been asked whether we are missing anything in Bacula, and the answer is a resounding yes.
Off the top of my head:
We really, really, really need a deduplication-friendly volume format in Bacula, and preferably one that works well with ZFS. The potential savings in both space and speed (yes, dedup can act as an I/O accelerator) are enormous in installations as large as ours.
Yes, I know about base jobs and we have some very good reasons to prefer doing dedup directly on our storage servers instead.
Our setup requires us to add devices to our storage servers all the time. Currently there is no way to simply reload the storage daemon, and restarting it can be disruptive. We have worked our way around it, but it is a hack. A proper implementation would be preferable.
Increment the FD non-fatal error counter from a runscript
It is often desirable to indicate that some runscript (SQL backups, etc.) has failed without either failing the entire job or resorting to grepping for certain output in the job logs. I would love to have the ability to increment the non-fatal FD error counter from a runscript to indicate that an error has occurred, for example by using a predetermined exit code. This would further ease our job management.
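Until something like that exists, the grep-style workaround looks roughly like the sketch below. This is not our production code. The marker string and both function names are made up for illustration, and it assumes that whatever the runscript prints ends up in the job report. The idea is that a wrapper runs the real command, swallows a non-zero exit status so the job itself is not failed, and leaves a marker line behind that can be counted later.

```python
import subprocess

# Hypothetical marker; any string that is unlikely to appear in normal output will do.
MARKER = "RUNSCRIPT-NONFATAL"


def run_soft(command):
    """Run `command` (a list of arguments).

    Returns a marker line if the command exits non-zero, otherwise None.
    The caller is expected to print the marker and still exit with status 0.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        return f"{MARKER}: {' '.join(command)} exited with {result.returncode}"
    return None


def count_soft_failures(report_text):
    """Count marker lines in the text of a job report."""
    return sum(1 for line in report_text.splitlines() if MARKER in line)
```

It works, but it is exactly the kind of log scraping that a proper non-fatal error counter would make unnecessary.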
All in all we are more than happy with Bacula and Bacula Systems. We ended up with a very solid and scalable platform that was surprisingly easy to integrate into our existing business tools and outperformed our expectations. But most importantly, it solved the problem at hand to our own and our customers’ satisfaction. Saving a six-digit amount in euros was not exactly bad, either.