Why ZFS rocks for databases …

My series of posts regarding Bacula has resulted in a number of questions about why we run large MySQL databases on ZFS. This post gives you a bird's-eye view of exactly why ZFS is so cool for database deployments.

If you do not know what ZFS is you should read this.

Data Integrity

Data on ZFS is always consistent on disk: ZFS is a copy-on-write, transactional filesystem that checksums every block. No more worrying about physical corruption of your database files, even if your server should crash.

Snapshots

Besides physical corruption, ZFS also gives you ways to combat logical errors. ZFS has a fast, near-zero-cost snapshot mechanism that, combined with a little SQL, can ensure consistent snapshots of your database. ZFS snapshots can also be used to drive the replication of your database files via ZFS send / recv.
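The "little SQL" usually means holding a global read lock while the snapshot is taken. Here is a minimal Python sketch, assuming MySQL's FLUSH TABLES WITH READ LOCK / UNLOCK TABLES statements and the standard zfs snapshot and zfs send -i subcommands. The run_sql and run_cmd hooks, the dataset name and the snapshot labels are hypothetical placeholders, and both SQL statements have to go over the same database session for the lock to hold:

```python
from typing import Callable, List


def consistent_snapshot(
    run_sql: Callable[[str], None],
    run_cmd: Callable[[List[str]], None],
    dataset: str,
    label: str,
) -> str:
    """Snapshot `dataset` while MySQL holds a global read lock.

    run_sql and run_cmd are hypothetical hooks: run_sql must send every
    statement over the same MySQL session, run_cmd executes an argv list.
    """
    snapshot = f"{dataset}@{label}"
    run_sql("FLUSH TABLES WITH READ LOCK")  # block writers and flush tables
    try:
        run_cmd(["zfs", "snapshot", snapshot])  # takes effect almost instantly
    finally:
        run_sql("UNLOCK TABLES")  # always release the lock, even on failure
    return snapshot


def incremental_send_argv(previous: str, current: str) -> List[str]:
    """argv that streams only the changes between two snapshots."""
    return ["zfs", "send", "-i", previous, current]
```

The stream produced by the send command would then be piped into zfs receive on the replica, typically over ssh.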

ZFS also supports read/write snapshots (clones). If you combine this with something like Zones, you can clone your entire DB zone in a very short time and apply resource controls if needed. We use this in production and we absolutely love it. We clone for testing and development, and also to isolate DB instances from one another.
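Creating a clone is a two-step operation: snapshot the dataset, then clone that snapshot into a new writable dataset. Here is a sketch of the commands involved, built in Python as argv lists. The dataset, label and clone names are made-up examples, and the Zones and resource-control setup is deliberately left out:

```python
from typing import List


def clone_argvs(dataset: str, label: str, clone: str) -> List[List[str]]:
    """Commands that create a writable copy of `dataset` named `clone`."""
    snapshot = f"{dataset}@{label}"
    return [
        ["zfs", "snapshot", snapshot],      # read-only point-in-time image
        ["zfs", "clone", snapshot, clone],  # writable dataset sharing its blocks
    ]
```

Each list can be handed to subprocess.run(argv, check=True). The clone only consumes extra space for blocks that change after it is created.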

Recordsize

ZFS allows you to specify and change the recordsize (blocksize) of any given ZFS filesystem, even on the fly; a changed value applies only to files written afterwards, while existing files keep their current block size. Aligning the ZFS filesystem blocksize to the logical block / page size of the database can give you a real performance benefit since you can avoid I/O splitting. Having multiple ZFS filesystems with different recordsizes enables you to do this individually for your raw database files, logs, redo logs, etc.
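To see why alignment matters, here is a small Python sketch that counts how many ZFS records a single database page read touches and how many bytes have to come off disk. The 16K page size is an assumption (it is InnoDB's default), and the model ignores caching and compression:

```python
def records_touched(offset: int, length: int, recordsize: int) -> int:
    """Number of ZFS records spanned by a read of `length` bytes at `offset`."""
    first = offset // recordsize
    last = (offset + length - 1) // recordsize
    return last - first + 1


def bytes_fetched(offset: int, length: int, recordsize: int) -> int:
    """Bytes read from disk, given that ZFS reads whole records."""
    return records_touched(offset, length, recordsize) * recordsize


PAGE = 16 * 1024  # assumed database page size (InnoDB's default)

for recordsize in (8 * 1024, 16 * 1024, 128 * 1024):
    print(recordsize,
          records_touched(0, PAGE, recordsize),
          bytes_fetched(0, PAGE, recordsize))
```

With 8K records the page is split across two records, with 128K records one read drags in eight times more data than the database asked for, and with 16K records it is exactly one record per page. Setting the property would look like zfs set recordsize=16K on the dataset that holds the data files.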

Compression

ZFS is capable of transparent, on-the-fly compression of your data. If your database workload is IOPS bound and you have CPU cycles to spare, you can use compression as an I/O accelerator. Why? Quite simply because you can read and write more data with each I/O request, thus saving valuable IOPS. Database files usually compress quite well.
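The IOPS argument can be put into numbers with a deliberately idealized model: if every I/O request carries a fixed number of physical bytes, a higher compression ratio means fewer requests for the same logical data. The ratio and sizes in the test values are illustrative assumptions, not measurements:

```python
import math


def physical_ios(logical_bytes: int, io_size: int, compress_ratio: float) -> int:
    """I/O requests needed to move `logical_bytes` when the data shrinks by
    `compress_ratio` on disk and each request carries `io_size` bytes."""
    if compress_ratio <= 0:
        raise ValueError("compress_ratio must be positive")
    physical_bytes = logical_bytes / compress_ratio
    return math.ceil(physical_bytes / io_size)
```

For example, physical_ios(1 << 30, 128 * 1024, 2.0) needs half as many requests as the same call with a ratio of 1.0. On a real system, zfs set compression=on enables compression for a dataset and zfs get compressratio reports the ratio actually achieved.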

Caching

The best part about filesystem caching in ZFS is that it can be tuned on a per-filesystem basis. Imagine you have 64GB of DRAM in your database box and you are using a filesystem that populates the OS page cache. Now, when your database requests a block from disk, it gets cached twice: once in the OS page cache and once in the DB cache, reducing the amount of usable DRAM by a factor of 2 or more. You can avoid this with ZFS, since it lets you control what any given filesystem caches and where.

Furthermore, ZFS supports cache tiering. You can add SSDs for both read caching, which extends the ZFS ARC, and write caching. For very little money you can extend the 64GB of ultra-fast DRAM with a 160GB way-faster-than-disk SSD. You could tell the filesystem holding your DB files not to cache in DRAM, thus saving that space for the database's own cache, but still allow ZFS to cache blocks on your SSD. This could considerably accelerate all reads that are not cached by the database. Besides that, blocks that are evicted from DRAM also go to SSD before vanishing. Imagine being able to cache your entire workload on SSDs. We do and it rocks!
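The per-filesystem knobs behind this are the primarycache (DRAM, the ARC) and secondarycache (SSD, the L2ARC) dataset properties, each of which accepts all, metadata or none. Here is a hedged Python sketch that builds the corresponding commands. The pool, dataset and device names used with it are hypothetical, and which combination of values matches the setup described above is an assumption, not the author's documented configuration:

```python
from typing import List

CACHE_MODES = ("all", "metadata", "none")  # values accepted by both properties


def cache_policy_argvs(dataset: str, dram: str, ssd: str) -> List[List[str]]:
    """`zfs set` commands choosing what `dataset` may keep in the ARC (DRAM)
    and in the L2ARC (SSD cache devices)."""
    for mode in (dram, ssd):
        if mode not in CACHE_MODES:
            raise ValueError(f"unknown cache mode: {mode!r}")
    return [
        ["zfs", "set", f"primarycache={dram}", dataset],
        ["zfs", "set", f"secondarycache={ssd}", dataset],
    ]


def add_cache_device_argv(pool: str, device: str) -> List[str]:
    """`zpool add` command attaching an SSD to `pool` as an L2ARC device."""
    return ["zpool", "add", pool, "cache", device]
```

A call such as cache_policy_argvs("tank/mysql/data", "metadata", "all") keeps only metadata in DRAM while allowing everything on SSD. It is worth verifying on your ZFS version that the SSD cache actually fills with a restricted primarycache, since the L2ARC is fed from the ARC.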

Prefetch

Database I/O is usually highly random and hard to predict. ZFS allows you to disable file-level prefetching since it makes little sense in a DB environment. No need to spend resources on something that has little value.

Write acceleration

ZFS allows you to direct synchronous writes to SSDs. Since most database write I/O is synchronous, this means you can offload most of your database write I/O to SSD. This considerably reduces latency and increases throughput. Perfect for databases.
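In ZFS terms this means giving the pool a separate log device for the intent log (ZIL). The Python sketch below builds the zpool add command and adds a back-of-the-envelope bound on how many back-to-back synchronous commits a single thread can issue at a given write latency. The pool and device names, and any latency figures you feed in, are illustrative assumptions rather than measurements:

```python
from typing import List


def add_log_device_argv(pool: str, *devices: str) -> List[str]:
    """`zpool add` command attaching a separate intent-log device.
    Two or more devices are added as a mirror."""
    if not devices:
        raise ValueError("at least one device is required")
    vdev = list(devices) if len(devices) == 1 else ["mirror", *devices]
    return ["zpool", "add", pool, "log", *vdev]


def max_serial_commits_per_sec(sync_latency_us: int) -> int:
    """Upper bound on back-to-back synchronous commits from one thread,
    given the latency of one synchronous write in microseconds."""
    return 1_000_000 // sync_latency_us
```

Assuming 5 ms per synchronous write on spinning disk versus 100 microseconds on SSD, the bound moves from 200 to 10,000 commits per second for a single writer, which is where the latency and throughput gains come from.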

I hope this has provided some insight into the benefits of ZFS as a filesystem for databases. For the technically inclined, there is a ton of low-level information about ZFS out there … feel free to indulge.
