On the topic of disk (block) caching

Or how I learned to love BCACHE / LVM-CACHE

I love all kinds of caching. Mostly I love having a 2TB SSD without the costs of a 2TB SSD.

So let's talk about disk caching! Well, more specifically, block caching. It seems like some kind of black magic that will end up causing nothing but heartbreak and your data being eaten. It's that same kind of thinking that leads people to believe you can only run ZFS if you have ECC memory. The fear mostly comes down to the fact that if you run ANY caching in write-back mode you could lose data, just like a random bit flip in your memory could cost you all your data during a ZFS scrub.

Let's look at this a different way: you could be killed by a falling object from space. More likely, you could be killed the next time you get into a car or cross the street. Provided you keep good backups, there's really nothing to worry about when it comes to block caching, or for that matter running ZFS without ECC memory.

There are lots of benchmarks and reasons for choosing bcache or dm-cache (lvm-cache). In my experience bcache feels faster, and it's also the easiest to set up. If all you're looking for is something to make your computer/laptop load faster, or you're going to try something crazy like caching iSCSI, then bcache is where it's at. On the other hand, if you need flexibility, then lvm-cache (dm-cache) is what you want. I'll save my rants about ZFS and L2ARC for another post.

So what is this caching stuff anyway? That's both a simple and a complicated question to answer. The simple answer is that it lets you store the data you're using in a faster place and the data you're not using in a slower place. There are huge benefits to caching; the modern web would be broken without it. There are lots of different technologies that go into it, so rather than trying to understand them all, let's boil it down to this: you need/want speed and storage, but each time there's a Steam sale you feel like your wallet has been beat like a rented mule, so buying multi-terabyte SSDs is out of your budget. Let's talk about your options.

Not all SSDs are created equal. There are a few different technologies to choose from, each with its own advantages and disadvantages. What it boils down to is that the vast majority of people looking to make their systems go faster should avoid the low-end SSDs and aim for anything more than 120GB and less than 1TB. I've found that for desktops/laptops 250GB is the sweet spot. The other thing to consider is the kind of flash in the drives. Your low-end SSDs have TLC chips, which means they're cheap, faster than your average USB flash drive, and that's about it; don't use these drives. The next kind of drives are MLC/eMLC. These are the drives you'll want. I highly recommend the Samsung Evo line of drives. However, on a server you'll want to avoid them and go with the Samsung Pro or Intel drives, as the turbo write cache will really screw with your IOPS. There is a third kind that uses SLC chips, but that's enterprise level and very expensive, so we'll avoid those.

So how do I set up caching? Well, this mostly has to do with what you're trying to accomplish, so I'll go over how to set up bcache first as it's the easiest. These directions are going to be Arch Linux centric as that's my distro of choice. I do know that Ubuntu and Fedora have the bcache tools on their live images, and I assume that once bcache is set up the installer will run with it. Having not tested it myself, ymmv aka caveat emptor. Now on to the good stuff!

BCACHE - Bcache stands for block cache. Like lvm-cache aka dm-cache, it caches blocks. It works by letting you choose a backing device aka your HDD and a caching device aka your SSD. You can put any file system you want on it and you've got your multi-terabyte SSD. The default options are sane and there's almost no risk of losing data even if the power gets yanked; in my experience bcache handles power loss a lot better than lvm-cache. You can set it to write-back mode and get SSD write speeds too, with very little danger of actually losing data unless your SSD dies. It's built into the kernel and the vast majority of distros have the tools to use it right out of the box.
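Once you have a bcache device (setup is covered next), the kernel exposes its state through sysfs, so you can sanity-check what mode it's in and how it's doing. A rough sketch, assuming your device ended up as bcache0:

cat /sys/block/bcache0/bcache/state                        # "clean", "dirty", or "no cache"
cat /sys/block/bcache0/bcache/cache_mode                   # the current mode is shown in [brackets]
cat /sys/block/bcache0/bcache/stats_total/cache_hit_ratio  # rough feel for how much the cache is actually helping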

You need the bcache tools installed from the AUR to create the storage pair, and you only need one command: make-bcache -B /dev/sdx1 -C /dev/sdy2. This creates the backing device, creates the caching device, and registers them. From this point you can treat it like any other block device. I recommend placing it in write-back mode unless you're paranoid about the possibility of losing data. To place it in write-back mode you'll need to run echo writeback > /sys/block/bcache0/bcache/cache_mode. To use it as your root device you can either compile the tools in the live image environment or set up the storage on another system prior to booting into the live image environment. Either way the choice is yours; personally I use the Antergos live image to do everything. If you're choosing to use it as your root device there are a few caveats to be aware of. Firstly, you cannot have your /boot on the bcache device; secondly, you should not put bcache on top of other block layers; thirdly, it doesn't work well with other caching. So LUKS should sit on top of bcache if you're using encryption, and cachefilesd will cause a complete system lock. That being said, while it technically works (I've had no problems with my setup), I can't say I recommend caching iSCSI via bcache.
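Putting that all together, here's a rough sketch of a from-scratch setup. The device names are placeholders (sdx1 as the HDD backing partition, sdy2 as the SSD caching partition), and wipefs/mkfs are destructive, so double-check them against your own disks:

wipefs -a /dev/sdx1                                    # clear old signatures on the HDD partition (destroys data!)
wipefs -a /dev/sdy2                                    # same for the SSD partition
make-bcache -B /dev/sdx1 -C /dev/sdy2                  # create the backing + caching devices and attach them
ls /dev/bcache0                                        # udev should have registered it automatically
echo /dev/sdx1 > /sys/fs/bcache/register               # only needed if bcache0 didn't show up on its own
mkfs.ext4 /dev/bcache0                                 # any filesystem you like
echo writeback > /sys/block/bcache0/bcache/cache_mode  # optional: write-back for SSD write speeds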

For more information on bcache check out https://bcache.evilpiepirate.org and https://wiki.archlinux.org/index.php/Bcache

LVM-CACHE - LVM-cache aka dm-cache, like bcache, caches blocks; lvm-cache is a wrapper around dm-cache that makes it easier to use. It's harder to set up since you need to carve out space on the SSD for the logical volumes to be cached. Most people looking to accelerate their laptops/desktops might be better off avoiding this. That being said, since most distros like Fedora and Ubuntu use LVM by default to manage storage, this might be easier to implement than bcache for some people, and it does have some real advantages over bcache. The main advantage is the same reason you would use LVM in the first place: to add/remove disks, resize your volumes, etc. You have A LOT more control. I use lvm-cache for servers. Being able to dedicate cache to VM storage, LUNs, etc. is huge!
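Before touching anything it's worth taking stock of the LVM layout you already have. The commands below only report, they don't change anything; the names they print are whatever your distro or you created:

pvs                          # physical volumes and which volume group they belong to
vgs                          # volume groups and how much free space they have
lvs -a -o +devices,segtype   # logical volumes, the devices backing them, and their segment type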

Assuming that you already have LVM set up, adding caching to it is fairly simple. First add the SSD to your pool with pvcreate /dev/sdy, then extend your volume group with vgextend dataVG /dev/sdy, and lastly create your cache with lvcreate --type cache -L 200G -n dataLV_cachepool dataVG/dataLV /dev/sdy. Lvm-cache differs from bcache here because it needs space for the metadata as well as the data, whereas bcache doesn't give a crap and just looks at blocks. While you can manually specify metadata sizes, the default and recommended setting is 10%.
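For those following along, here's the same thing laid out as a sketch you can adapt; dataVG, dataLV, and /dev/sdy are stand-ins for whatever your setup actually uses, and write-back mode carries the same caveats it does with bcache:

pvcreate /dev/sdy                                                                               # turn the SSD into a physical volume
vgextend dataVG /dev/sdy                                                                        # add it to the existing volume group
lvcreate --type cache --cachemode writeback -L 200G -n dataLV_cachepool dataVG/dataLV /dev/sdy  # carve a 200G cache out of the SSD and attach it to dataLV
lvs -a dataVG                                                                                   # the cache pool and its metadata LV should now show up
lvconvert --uncache dataVG/dataLV                                                               # if you ever need to remove the cache cleanly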

For more information check out https://wiki.archlinux.org/index.php/LVM and https://rwmj.wordpress.com/2014/05/22/using-lvms-new-cache-feature/

L2ARC / ZIL - My best recommendation is to avoid this. If you're running ZFS and need more IOPS, give it more RAM and disks. The ZIL is helpful for databases, and anything in the L2ARC still has to be tracked by the ARC. L2ARC could be helpful for VMs and a few other things, but not as helpful as just adding more RAM. Another consideration is that at the time of writing only OpenSolaris has TRIM support for ZFS, meaning SSDs in your pool aren't the best long-term storage idea and a few 15k SAS drives will just run better. Also, if you're using ZFS, use mirrored vdevs; you'll thank me later.
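For what it's worth, here's roughly what I mean by mirrored vdevs. The pool name and device names are made up, and zpool create will happily destroy whatever disks you hand it, so treat this as a sketch:

zpool create tank mirror sda sdb mirror sdc sdd   # striped mirrors: good IOPS, and resilvers only touch one mirror
zpool add tank mirror sde sdf                     # growing the pool later is just another mirror pair
zpool status tank                                 # sanity check the layout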

All things considered, don't take my word as gospel; there are plenty of people out there much smarter than I am. This is just my experience with caching, and I hope this post gives you some ideas and encourages you to try them. If you're simply looking for the best performance, use bcache; while lvm-cache is just as fast for most things, bcache still edges it out.