Large scale disk-to-disk backups using Bacula, Part III

My last two posts have sparked a lot of interest; my mailbox has been overflowing with questions from all over the world. This post will provide more details about our Bacula infrastructure, while the next post will discuss our Bacula configuration.

Let me give the impatient readers a summary: we have applied absolutely no tuning to Bacula itself. A lot of time and effort has been spent on actually trying to understand how it works and how this relates to the resources that Bacula was given in our environment. This has been key to our success with Bacula.

Nevertheless, it is kind of amazing that Bacula is able to orchestrate such vast resources in a very efficient manner. Kern and his crew know their stuff, that's for sure.

I recommend that you go and fetch your favourite beverage before continuing; it's a rather long post.

Now, we have a total of 9 servers powering our Bacula infrastructure: three Directors, three Storage servers and three MySQL Catalog servers. Each Director thus "owns" a dedicated Storage and Catalog server. Our Directors run Linux, while both our Storage and Catalog servers run Solaris.

Let me give you an overview of how our Bacula environment is constituted.

Backup network

All our machines have access to a special network which is physically separated from our various production networks and 100% dedicated to backup and restore operations. We are talking 100+ switches, dedicated NICs, fibers, cables, etc. Each rack has a dedicated switch for backup operations. These switches each have a 2 Gbit uplink to a 30 Gbit network backbone (3 core switches with redundant 10 Gbit uplinks each), which is also entirely dedicated to backup and restore operations. The backbone is connected to our remote backup location via a number of 10 Gbit fiber links that in turn provide 10 Gbit network connectivity to each of our Bacula storage servers.

Network operations are highly efficient and are continuously monitored for both throughput and latency. We treat this network exactly like our production networks; solid design, a high degree of redundancy, good quality equipment and usable monitoring are essential to its operation. Both network throughput and latency are crucial to our Bacula performance.

Storage Servers

All our Bacula storage servers run Solaris and all Bacula volumes are stored on three large ZFS storage pools. I have to admit that I am a Solaris fanboy. I could rant for hours about the wonders of ZFS, DTrace, COMSTAR, Crossbow and so on, but I will reserve that for a future post. For those amongst you who are unfamiliar with ZFS, you really should check it out. ZFS is a modern filesystem invented by Sun that allows you to turn your hardware into extremely efficient and highly scalable storage machines. It offers unprecedented data protection and a wealth of advanced features.

ZFS

  • is always consistent on disk (repeat after me: No more fsck. No more fsck. No more fsck.)
  • checksums every block written to disk to prevent silent data corruption
  • has self-healing capabilities
  • utilises a pooled storage model
  • has a pipelined I/O engine (massive scalability)
  • has no arbitrary limits (supports unlimited filesystems, files, links, directory entries, filesize and so on)
  • offers advanced features (caching, inline compression, inline encryption, zero cost snapshots, clones, replication, etc.)
  • offers a very simple administration model

I have seen a lot of bad, bad things happen to our ZFS storage servers in the last three years. We encountered defective backplanes, bad cables and memory sticks, faulty controllers and lost a lot of disks, all without losing as much as a single byte of data. If you care about your data you want ZFS.

Now, every storage server has a 120+ disk RAID-Z2 pool comprised of multiple 7 disk wide vdevs, built entirely of 7200 RPM nearline SAS disks. We have three PERC 6/E controllers per Storage server, which allows us to attach up to 9 MD1000 disk arrays. All arrays have dual I/O modules and multipathing enabled.
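To make that layout concrete, here is a minimal sketch of creating such a pool from 7 disk wide RAID-Z2 vdevs. The pool name and device names are placeholders, not our actual multipath device IDs:

    # Build a pool from 7-disk RAID-Z2 vdevs; repeat the raidz2 groups
    # until all 120+ disks are in the pool.
    zpool create backup \
        raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 \
        raidz2 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0 c1t12d0 c1t13d0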

Every Bac­ula client has its own ded­i­cated ZFS filesys­tem, allow­ing us to con­trol com­pres­sion, quota, reser­va­tion, encryp­tion and so on on a per client basis.
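As an illustration (the pool and client names below are made up), setting per client properties boils down to a handful of commands:

    # One filesystem per Bacula client, each with its own settings.
    zfs create -p backup/clients/web01
    zfs set compression=lzjb backup/clients/web01
    zfs set quota=2T backup/clients/web01
    zfs set reservation=500G backup/clients/web01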

I spent a considerable amount of time profiling the Bacula storage daemon under production load. DTrace and Solaris in general were a phenomenal aid in discovering things like I/O patterns and I/O sizes, checking for signs of possible lock contention and so on. Solaris provides a lot of transparency compared to other operating systems.
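To give a flavour of what that looks like in practice, here is the kind of one-liner we leaned on. Treat it as a sketch rather than one of our actual scripts; it simply histograms the write sizes issued by the storage daemon process:

    # Power-of-two histogram of write() sizes issued by bacula-sd.
    dtrace -n 'syscall::write:entry /execname == "bacula-sd"/ { @["bytes per write"] = quantize(arg2); }'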

We used a lot of this information to run simulations, establish baselines for testing, etc., and discovered a number of things:

  1. Adding devices to the storage daemon increased parallelism and thus performance.
  2. No major locking issues were detected when running with 100 active devices (Kern had some concerns, which prompted our investigation).
  3. The Bacula storage daemon generates asynchronous sequential I/O.
  4. A recordsize (blocksize) of 64k per ZFS filesystem was the most efficient for both backup and restore operations (see the example after this list).
  5. 16 GB of DRAM per storage server was a sufficient ZFS ARC size.
  6. CPU utilisation was acceptable.
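The recordsize tuning from point 4 is a single property per filesystem. Using the hypothetical per client dataset from earlier:

    # Only newly written blocks pick up the new recordsize.
    zfs set recordsize=64k backup/clients/web01
    zfs get recordsize backup/clients/web01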

The Solaris version of the Bacula storage daemon is stable and performs extremely well. We run 100 jobs at a time per Director/Storage/Catalog pair, pumping ~2 GB/s into our storage servers. CPU utilisation is < 90% (2 x 3 GHz quad-core Xeons) and is shared almost evenly between Bacula and ZFS / Solaris.
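For those who want to watch that kind of ingest themselves, pool level throughput is one command away; assuming a pool named backup:

    # Per-vdev bandwidth and IOPS, refreshed every 5 seconds.
    zpool iostat -v backup 5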

All three storage servers are scheduled to be replaced with 48 core Dell R815s. Once we have them in place and have converted our aforementioned backup network to utilise jumbo frames, we should get really close to saturating those 10 Gbit links.
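For reference, on a Crossbow-enabled Solaris release the server side of that jumbo frame conversion is roughly a one-liner (the link name below is a placeholder, and the switches obviously need a matching MTU):

    # Raise the MTU to 9000 on the backup-facing interface.
    dladm set-linkprop -p mtu=9000 ixgbe0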

MySQL Catalog Servers

Our Catalog servers run MySQL on Solaris for much the same reasons that our Storage servers do. ZFS provides us with hybrid storage pools capable of enormous IOPS rates, low I/O latency, easy and efficient snapshotting of our MySQL data and the ability to match the underlying ZFS filesystem recordsize to the actual I/O size used by InnoDB. DTrace provided us with an insane amount of information about exactly what our databases are doing and how configuration changes influence things like query response times.
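Matching the recordsize to InnoDB is again just a ZFS property. Assuming InnoDB's default 16 KB page size and hypothetical dataset names, it looks something like this:

    # Align ZFS blocks with InnoDB pages; larger blocks for the sequential logs.
    zfs set recordsize=16k catalog/mysql/innodb
    zfs set recordsize=128k catalog/mysql/logs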

There has been a lot of talk about Bacula and MySQL recently, and if we cut away all the crap, religion and personal feelings, we are left with two certainties:

  1. The Bacula DB schema and associated queries are not written specifically for MySQL performance.
  2. The Bacula developers have more experience with PostgreSQL when it comes to larger installations.

However, this does not mean that MySQL and Bacula are incompatible. We have some very large jobs (15+ million files) which behave fine during both backup and restore operations. But hey, perhaps the MySQL force is particularly strong in us? A search of the bacula-users mailing list will provide the curious reader with more information about our InnoDB and ZFS configuration.

All our MySQL Catalog servers have 2 quad core CPUs and 32 GB DRAM. The underlying ZFS storage is comprised of 30 x 750 GB nearline SAS disks in a mirror configuration (mirror beats RAID-Z when it comes to reading data) and a couple of enterprise grade SSDs to facilitate the offloading of synchronous write operations in ZFS. Since almost all MySQL write I/O is synchronous, this effectively means that all writes generated by MySQL land on our SSDs instead of our spinning rust.
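A stripped down sketch of such a pool, with placeholder device names, a few of the mirror pairs and the SSDs attached as a mirrored log device, would look like this:

    # Mirror pairs for the data disks (repeated up to 15 pairs / 30 disks),
    # plus a mirrored SSD pair as a dedicated ZFS intent log (slog).
    zpool create catalog \
        mirror c2t0d0 c2t1d0 \
        mirror c2t2d0 c2t3d0 \
        mirror c2t4d0 c2t5d0 \
        log mirror c3t0d0 c3t1d0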

We eagerly await the GA release of MySQL 5.5, which will do away with most of the performance and scalability related challenges. So far MySQL has served us well, but YMMV.

This concludes the tour of our Bacula infrastructure; next up will be a post about our Bacula configuration.

Update

Part IV is now online
