Large scale disk-to-disk backups using Bacula, Part IV

This post provides more insight into our current Bacula configuration and the underlying methodologies. Our aim is to keep the time we spend on configuration to an absolute minimum while still maintaining a high degree of flexibility.

We have come up with a way of scheduling that is extremely simple, yet flexible. Our dirty secret is Bacula's “Max Full Interval” configuration parameter.

The only backup level that we schedule is incremental. The first job a client runs is always a full backup, and once the “Max Full Interval” time has expired Bacula automagically upgrades the next incremental to a full backup. Rescheduling a full backup is as easy as running a full backup manually on the desired day; Bacula's “Max Full Interval” will take it from there.

Nifty, if you ask me.
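The upgrade rule described above can be modelled in a few lines of Python. This is only a simplified sketch of the behaviour as described in this post, not Bacula's actual code; the function name `effective_level` is made up for illustration, and how Bacula treats the exact boundary (elapsed time equal to the interval) is an assumption here.

```python
from datetime import datetime, timedelta
from typing import Optional


def effective_level(now: datetime,
                    last_full: Optional[datetime],
                    max_full_interval: timedelta) -> str:
    """Return the level a scheduled Incremental would effectively run at.

    Simplified model: a client without any full backup gets a Full, and
    so does a client whose last Full is older than the interval.
    """
    if last_full is None:
        # First job for this client: nothing to be incremental against.
        return "Full"
    if now - last_full > max_full_interval:
        # Interval expired: upgrade the incremental to a full backup.
        return "Full"
    return "Incremental"


if __name__ == "__main__":
    interval = timedelta(days=6)
    last_full = datetime(2010, 1, 3, 2, 0)
    for day in range(4, 11):
        run = datetime(2010, 1, day, 2, 0)
        print(run.date(), effective_level(run, last_full, interval))
```

Running a manual full backup simply moves `last_full`, which is why rescheduling needs no configuration change.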

We keep one configuration file per client which contains everything that is needed. All configuration files are stored in a special directory on the director that “owns” the client and are included by the director upon start / reload. Need to disable a client? Simply rename the config file and issue the ‘reload’ command from bconsole. This approach also lends itself very well to templating. We have coded a simple templating mechanism that does a search and replace on a few tags and writes the file as %CLIENT_IP%.conf. Storing the data that is unique for a client in, let's say, MySQL and applying it to a template is extremely simple and makes automation and large-scale changes easy.

Everything in our configuration refers to the IP address of a client, since this lets us talk to our customer database, billing system, etc. using one common identifier. We have tight integration into our customer control panel, which relies heavily on this.
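A search-and-replace templating step like the one described above could look roughly like this in Python. It is a minimal sketch, not our actual tool: the function names, the target directory and the idea of passing values as a dictionary (for example a row fetched from MySQL) are assumptions made for illustration. Only the `%CLIENT_IP%` tag and the `%CLIENT_IP%.conf` file name come from the text.

```python
from pathlib import Path
from typing import Mapping


def render(template: str, values: Mapping[str, str]) -> str:
    """Replace every %TAG% in the template with its value."""
    for tag, value in values.items():
        template = template.replace(f"%{tag}%", value)
    return template


def write_client_config(template: str,
                        values: Mapping[str, str],
                        conf_dir: Path) -> Path:
    """Render the template and store it as <client ip>.conf."""
    target = conf_dir / f"{values['CLIENT_IP']}.conf"
    target.write_text(render(template, values))
    return target


if __name__ == "__main__":
    # Hypothetical template fragment and a documentation-range IP address.
    template = "Job {\n  Name = %CLIENT_IP%\n  Client = %CLIENT_IP%\n}\n"
    path = write_client_config(template, {"CLIENT_IP": "192.0.2.10"}, Path("."))
    print(path.read_text())
```

Because each client lives in its own file, regenerating all files from a changed template and issuing a single reload is enough to roll out a change everywhere.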

Scheduling and Jobs

Every client runs a full backup once a week and incrementals in between. The main reason for this is legacy; our old backup platform did this, and we tried to move as many of the existing methodologies as possible to Bacula in order to avoid delays and confusion while migrating.

We have defined two schedules:

Schedule {
  Name = "Regular"
  Run = Incremental mon-sun at 02:00
}

Schedule {
  Name = "Late"
  Run = Incremental mon-sun at 03:00
}

All clients (except a few special cases) run one of the above. The main reason for the “Late” schedule is to allow certain Windows machines to update and potentially reboot before the backup runs.

Our Job resource looks somewhat like this:

Job {
  Name = %CLIENT_IP%
  Client = %CLIENT_IP%
  Type = Backup
  Schedule = Regular
  FileSet = %CLIENT_IP%
  Max Full Interval = 6 days
  Pool = %CLIENT_IP%
  Messages = Standard
  Write Bootstrap = "/var/bacula/working/%CLIENT_IP%.bsr"
}

Pools and Volumes

Every client has its own pool and storage device. We keep every job in a separate volume. Each storage device points to a separate ZFS filesystem, one per client.

This gives us easy control over job retention times, number of reserve volumes, recycling policies, etc. on a per-client basis.

Pool {
  Name = %CLIENT_IP%
  Storage = %CLIENT_IP%
  Pool Type = Backup
  Recycle = yes
  Recycle Oldest Volume = yes
  AutoPrune = yes
  Volume Retention = 14 days
  Maximum Volume Jobs = 1
  Maximum Volumes = 16
  Label Format = "${Pool}-${JobId}-vol"
}

Want to enable longer retention for a client? Change the Volume Retention and Maximum Volumes parameters, issue a reload from bconsole, run the update command so that existing volumes pick up the new retention period, and you are good to go.

Keeping a “one-job-per-volume” relationship has several advantages: bscan and bextract are easier to handle in case of emergencies, and benchmarking clearly showed both faster backup and restore times compared to the “single-volume-per-client” model.
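With one job per volume and one job per day, the two parameters move together. The following back-of-the-envelope helper illustrates the relationship; the function name and the reserve of two volumes are assumptions chosen because they reproduce the 14 days / 16 volumes pairing in the Pool above, not a rule taken from Bacula.

```python
def volumes_needed(retention_days: int,
                   jobs_per_day: int = 1,
                   reserve: int = 2) -> int:
    """Estimate Maximum Volumes when every job gets its own volume.

    One volume per job for every day inside the retention window,
    plus a few reserve volumes (assumed: 2) for manual or rerun jobs.
    """
    return retention_days * jobs_per_day + reserve


if __name__ == "__main__":
    for days in (14, 30, 90):
        print(f"Volume Retention = {days} days -> "
              f"Maximum Volumes = {volumes_needed(days)}")
```

If Maximum Volumes is set lower than this, Recycle Oldest Volume will reuse volumes before their retention period is up.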

Storage

Storage {
  Name = %CLIENT_IP%
  Address = sd03.xxxxxx.dk
  SDPort = 9103
  Password = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  Device = %CLIENT_IP%
  Media Type = %CLIENT_IP%
}

As mentioned earlier, we have a device per client. There has been some discussion about this approach, but more than six months of production and a lot of testing have shown no issues with it. Benchmarking clearly showed that we get a lot more performance from running 100 concurrent jobs against 100 separate devices than from running 100 jobs against one or more shared devices. The only drawback to this approach is that we have to add the device configuration to our SDs, but this is also an easy target for templating and automation.

Bacula Systems were very helpful in working out the details of our configuration; kudos to both Arno & Kern ;-)

That's about it. The next post will be about monitoring and integration of Bacula into our company, and the final post will talk about some of the missing pieces of our puzzle.

Update

Part V is now online

14 Comments

  • You said:

    ###
    Want to enable longer retention for a client ? Change the Volume Retention and Maximum Volumes parameters, issue a reload from bconsole and you are good to go.
    ###

    This will update the Pool. This won't change the retention periods on existing Volumes. For that, you need to issue the update command in bconsole.

  • admin wrote:

    Hello Dan.

    Correct. I just checked our management software, and we do automagically issue the update command after changing the volume retention.

  • Gregory wrote:

    You set Label Format = "${Pool}-${JobId}-vol"

    but as you're setting maximum volumes to 16, in fact the volume's name on disk won't reflect the job id past job 16 right?

  • admin wrote:

    The job id will increment past 16, but only upon creation. Recycling does not rename a volume. We use this only to guarantee that each volume has a unique name which is identifiable via the job logs.

    Job id 63510 for example uses x.x.x.x-19475-vol.

  • Gregory wrote:

    hmm I think I understand

    as far as I understood the documentation:

    “If no variable expansion characters are found in the string, the Volume name will be formed from the format string appended with a unique number that increases”

    you could have used

    Label Format = "%CLIENT_IP%-vol-"

    which would have appended a 4 digit number to the volume name

    did I understand your initial intent correctly?

  • admin wrote:

    Correct, that would be another way of achieving the same thing.

  • Gregory wrote:

    For the record, about “Max Full Interval” that's exactly how I'm using duplicity's --full-if-older-than time option

    really flexible

  • admin wrote:

    Yeah, it is. And so is duplicity … I have been using it for a couple of years ;-)

  • Gregory wrote:

    Btw, why don't you enable Accurate Mode?

    Is it that demanding on the fd? (the doc has a note about memory consumption)

    Also, why avoid differential backups in the first place? It feels like they don't bring anything when backing up to disk compared to backing up to tape (which I never did)

  • admin wrote:

    Well, our primary focus was the migration to Bacula. We will probably enable accurate mode when we have had some time to test & benchmark.

    Differentials are interesting for long retention times; most of our machines have 2 weeks.

  • Mikael Fridh wrote:

    » - you may be starting and stopping the SD often to do device management. This
    » could disrupt operations, and we are not planning to add code to the SD to
    » reload its conf file while running.

    > I already scripted my way out of this one ;-)

    Care to hint at how you scripted this?

    Awesome article series, I just revisited it again after some time …

  • Using a simple (scripted) combination of templates for generating the required config files and bconsole to check whether the SD is actually in use you can do your own little “write-my-conf-files-and-reload” thing.

  • Filip wrote:

    I just started with Bacula, so chances are that I overlooked something. But, with “Maximum Volume Jobs = 1” and “Maximum Volumes = 16”, don't you end up with Incrementals that lost their Full (a.k.a. reference) backup, rendering those completely useless?

  • Yes, that is possible but the incrementals can still be used since you can extract files from them without having the full backup around.
