Over the past year I have been deeply involved in the nitty-gritty details of choosing, designing, building, deploying and managing a new backup infrastructure at work. It has been a very educational experience.
Our old backup platform consisted of various tools and technologies, and the resulting spaghetti bowl got more and more difficult and time-consuming to manage. So we set out to find a suitable replacement: something that was both manageable and scalable while providing optimal data protection for us and our customers.
As you may have guessed from the title of this post, we chose Bacula. After building a demo setup and running a two-day onsite proof of concept, we closed a deal with Bacula Systems in early 2010.
We did investigate other products (IBM TSM, Symantec NetBackup and CommVault Simpana), but they could not compete with what Bacula Systems could deliver.
Our Bacula infrastructure is somewhat extensive and consists of 9 servers and over 25 disk arrays. All our backups go offsite, and the uplink between our primary datacenter and our backup location consists of 3 x 10 Gbit fiber links. Backup data is stored on 3 large ZFS storage pools spanning 375 individual disks.
Let me cough up some facts:
We are currently backing up close to a four-digit number of machines, running a multitude of operating systems (Windows, Linux, Solaris, BSD) and over 300 MS-SQL instances. In terms of size, a standard 2-week rotation amounts to ~190 TB of data containing ~490,000,000 files.
In daily use we are seeing network-to-disk speeds in excess of 600 MB/s per Bacula storage server, totalling ~2 GB/s of aggregate throughput when running 300 simultaneous jobs, while maintaining an acceptable level of CPU utilisation on our storage servers.
So far we have run 96,594 jobs. Over 90% of all jobs have run without issue, and of the remaining ~10 percent only 3 (!) jobs failed because of an error in Bacula itself. The rest failed due to operator or client-side errors.
To facilitate the backup of MS-SQL instances we wrote our own tool, which Bacula starts before running the file-level backup. This also enabled us to apply compression before storing the databases on disk for Bacula to pick up. The average compression ratio is well over 80%, so it has been worth the effort.
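As a rough sketch of how such a pre-backup hook can be wired up: Bacula's `ClientRunBeforeJob` directive runs a command on the client before the file-level backup starts. The directive and resource names below are standard Bacula configuration, but the job name, script path and dump directory are hypothetical examples, not our actual setup:

```
# Hypothetical Job resource: dump and compress the MS-SQL databases
# on the client before the file-level backup runs.
Job {
  Name = "sqlserver01-mssql"
  Type = Backup
  Client = sqlserver01-fd
  FileSet = "mssql-dumps"
  # Run the dump/compress tool on the client before the backup starts;
  # if the script exits non-zero, the job is not run.
  ClientRunBeforeJob = "C:/bacula/scripts/dump-mssql.cmd"
}

FileSet {
  Name = "mssql-dumps"
  Include {
    Options { signature = MD5 }
    # The compressed dumps land here for Bacula to pick up.
    File = "C:/bacula/dumps"
  }
}
```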
We spend a mere ~30 minutes a day overseeing all Bacula operations, including checking for and restarting all failed jobs. Integration of Bacula into our control panel, billing system, etc. has been a very straightforward process and consisted mainly of writing a simple XML API against our Bacula Catalog servers.
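A minimal sketch of what such an API layer can look like, assuming rows are fetched from the catalog's Job table (the column names `JobId`, `Name`, `JobStatus` and `JobBytes` are standard Bacula catalog schema; the XML shape and function are hypothetical examples):

```python
# Sketch: render Bacula catalog Job rows as XML for a control panel.
# Column names match the Bacula catalog's Job table; the API shape
# itself is a hypothetical example, not the actual integration.
import xml.etree.ElementTree as ET

def jobs_to_xml(rows):
    """Render catalog Job rows (a list of dicts) as an XML string."""
    root = ET.Element("jobs")
    for row in rows:
        job = ET.SubElement(root, "job", jobid=str(row["JobId"]))
        for field in ("Name", "JobStatus", "JobBytes"):
            ET.SubElement(job, field.lower()).text = str(row[field])
    return ET.tostring(root, encoding="unicode")

# In production the rows would come from a database query against the
# catalog, e.g.: SELECT JobId, Name, JobStatus, JobBytes FROM Job;
rows = [{"JobId": 1, "Name": "web01-fd", "JobStatus": "T", "JobBytes": 123456}]
print(jobs_to_xml(rows))
```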
Admittedly, Bacula lacks some of the bells and whistles of its competition, but for us that has not been a problem. It is reliable, scalable and makes overall management a breeze. The advice and support that we have received from Bacula Systems has been more than worth the money we have spent.
IBM, Symantec and CommVault should start looking over their shoulder … the bat is coming to get them!
Part II of the series can be found here