My series of posts regarding Bacula has resulted in a number of questions about why we run large MySQL databases on ZFS. This post will give you a bird's-eye view of exactly why ZFS is so cool for database deployments.
If you do not know what ZFS is, you should read this.
Data on ZFS is always consistent on disk. No more worrying about physical corruption of your database files, even if your server crashes.
Besides guarding against physical corruption, ZFS also gives you ways to combat logical errors. ZFS has a fast, zero-cost snapshot mechanism that, combined with a little SQL, can ensure consistent snapshots of your database. ZFS snapshots can also drive replication of your database files via ZFS send/recv.
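As a sketch of how the little SQL and the snapshot fit together (the dataset name `tank/mysql`, the snapshot name, and the standby host are made-up placeholders; adjust to your layout):

```shell
# Quiesce MySQL just long enough to take the snapshot.
# "system" runs a shell command from inside the mysql client,
# so the read lock is still held while the snapshot is created.
mysql -u root <<'EOF'
FLUSH TABLES WITH READ LOCK;
system zfs snapshot tank/mysql@backup
UNLOCK TABLES;
EOF

# Ship the snapshot to a standby machine via send/recv:
zfs send tank/mysql@backup | ssh standby zfs recv -F tank/mysql
```

Since ZFS snapshots are constant-time, the lock is only held for an instant, regardless of database size.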
ZFS also supports read/write snapshots (clones). If you combine this with something like Zones, you can clone your entire DB zone in a very short time and apply resource controls if needed. We use this in production and we absolutely love it. We clone for testing, for development, and to isolate DB instances from one another.
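The cloning step itself is just two commands (dataset names are hypothetical):

```shell
# Clones are writable and initially share all blocks with their snapshot,
# so creating one is near-instant and takes almost no extra space.
zfs snapshot tank/mysql@for-test
zfs clone tank/mysql@for-test tank/mysql-test

# Point a test zone or a second MySQL instance at tank/mysql-test,
# then throw it away when you are done:
zfs destroy tank/mysql-test
```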
ZFS allows you to specify and change the recordsize (blocksize) of any given ZFS filesystem, even on-the-fly. Aligning the ZFS filesystem blocksize to the logical block / page size of the database can give you a real performance benefit, since you avoid I/O splitting. Having multiple ZFS filesystems with different recordsizes enables you to do this individually for your raw database files, logs, redo logs, etc.
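For example, InnoDB uses 16K pages by default, so the data filesystem can be matched to that (dataset names are made up):

```shell
# Match the data filesystem to InnoDB's 16K page size.
zfs set recordsize=16K tank/mysql/data
# Logs are written sequentially; the large default record works well there.
zfs set recordsize=128K tank/mysql/logs
# Note: recordsize only applies to files written after the change.
```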
ZFS is capable of transparent, in-flight compression of your data. If your database workload is IOPS-bound and you have CPU cycles to spare, you can use compression as an I/O accelerator. Why? Quite simply because you can read and write more data with each I/O request and thus save valuable IOPS. Database files usually compress quite well.
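Turning it on is a one-liner per filesystem (dataset name is a placeholder):

```shell
# Enable transparent compression (lzjb by default) on the data filesystem:
zfs set compression=on tank/mysql/data
# See what it is actually buying you:
zfs get compressratio tank/mysql/data
```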
The best part about filesystem caching in ZFS is that it can be tuned on a per-filesystem basis. Imagine you have 64GB of DRAM in your database box and you are using a filesystem that populates the OS page cache. When your database requests a block from disk, it gets cached twice: once in the OS page cache and once in the DB cache, reducing the amount of usable DRAM by a factor of two or more. This is what you can avoid with ZFS, since it allows you to control what, and where, any given filesystem is allowed to cache.
Furthermore, ZFS supports cache tiering. You can extend the ZFS ARC with SSDs for both read and write caching. For very little money you can back the 64GB of ultra-fast DRAM with a 160GB way-faster-than-disk SSD. You could tell the filesystem holding your DB files not to cache in DRAM, saving that space for the database's own cache, while still allowing ZFS to cache blocks on your SSD. This can considerably accelerate all reads that have not been cached by the database. Besides that, blocks evicted from DRAM will also land on the SSD before vanishing. Imagine being able to cache your entire working set on SSDs. We do, and it rocks!
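Setting that up looks roughly like this (pool, dataset, and device names are placeholders for your own):

```shell
# Add an SSD to the pool as a level-2 ARC (read cache) device:
zpool add tank cache c1t2d0
# Data blocks skip the DRAM cache but remain eligible for the SSD:
zfs set primarycache=metadata tank/mysql/data
zfs set secondarycache=all tank/mysql/data
```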
Database I/O is usually highly random and hard to predict, so file-level prefetching makes little sense in a DB environment. ZFS allows you to disable it, so there is no need to spend resources on something that has little value.
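On Solaris this is a kernel tunable rather than a dataset property; a sketch (takes effect at the next boot):

```shell
# Disable ZFS file-level prefetch system-wide:
echo "set zfs:zfs_prefetch_disable = 1" >> /etc/system
```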
ZFS allows you to direct synchronous I/O to SSDs by placing the ZFS intent log (ZIL) on a dedicated log device. Since most database write I/O is synchronous, this means you can offload most of your database I/O to SSD. This considerably reduces latency and increases throughput. Perfect for databases.
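Adding a log device is again a single command (the device name is a placeholder):

```shell
# Dedicate an SSD as a separate ZIL (log) device for the pool.
# Synchronous writes are committed to the fast log device first,
# then flushed to the main disks asynchronously.
zpool add tank log c1t3d0
```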
I hope this has provided some insight into the benefits of ZFS as a filesystem for databases. For the technically inclined, there is a ton of low-level information about ZFS out there … feel free to indulge.