I will definitely be looking at it, with 16.04 LTS offering ZFS nearly
out of the box. I've done performance testing with LZ4 compression, and
it's impressive.
I finally had some time to tackle the topic of ZFS as external storage for ganeti and got it working on Debian 8 with Ganeti 2.12 (official Debian packages). For that purpose I used the ZFS extstorage provider mentioned in the ganeti wiki (https://github.com/ffzg/ganeti-extstorage-zfs), but unfortunately that provider is experimental and unmaintained, and my many attempts to contact the author have failed. So I decided to fork it and will maintain it in my own fork, which is available here:
https://github.com/hostingnuggets/ganeti-extstorage-zfs
For now I added and fixed the following:
- detailed logging of all commands (useful for debugging and dev/test)
- fixed "gnt-instance replace-disks" and "gnt-instance activate-disks", which were not working at all
So in case you would also like to use ZFS with ganeti, it is possible, and I am hoping to put my first ganeti-on-ZFS cluster into production very soon. Feel free to use this extstorage provider, test it, and comment back if you find any issues. If I remember correctly, a few other people in this group were, like me, very interested in using ZFS with ganeti.
Hi insrc,
Glad this can also help others.
So yes, I am using my fork of ganeti-extstorage-zfs with a small production cluster of two nodes running Debian 8 with around 15 instances, and have been for a few months now. I find the ext storage backend robust, and you can do pretty much everything you can with the other backends; it's just up to you to implement it. Below I have listed my gotchas and experience from these few months in production.
1) Two DRBD resources suddenly got disconnected for 2 seconds for no apparent reason; on one node I got:
2) ZFS L2ARC does not bring any performance advantage in my specific case of using exclusively SSD disks. I bought an additional expensive NVMe SSD just for the L2ARC, but it turned out to be a waste of money. In fact I believe I even had slightly worse performance, due to all the context switches and the "l2arc_feed" kernel threads running all the time to populate the L2ARC. Furthermore, I first made the mistake of having a huge L2ARC of 100 GB; in that case my ARC was rendered useless, as it filled up with pointers to the L2ARC and nothing else. Take great care when this happens: your whole system starts to get very slow and you have a high load on the server. Once it was so bad I had to reboot the server. So if you still want to use an L2ARC, make sure it is not much bigger than your ARC; I would say the L2ARC should be no larger than 2-3x your ARC. I had around 8 GB reserved for the ARC, and having an L2ARC of 100 GB was just plain naive. Finally, I simply recommend not using an L2ARC at all, which is what I do now.
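A rough back-of-the-envelope calculation shows why a 100 GB L2ARC starves a small ARC: every record stored in the L2ARC needs a header kept in the ARC itself. The header size below (~70 bytes per record) and the 8K zvol volblocksize are assumptions that vary between ZFS versions, so treat this only as an order-of-magnitude sketch:

```shell
# Estimate ARC memory consumed by L2ARC headers.
# Assumptions: 8K volblocksize (a common zvol default) and roughly
# 70 bytes of in-ARC header per L2ARC record (version-dependent).
L2ARC_BYTES=$((100 * 1024 * 1024 * 1024))   # 100 GB L2ARC
RECORD_SIZE=8192                            # 8K volblocksize
HDR_BYTES=70                                # per-record header (approx.)
RECORDS=$((L2ARC_BYTES / RECORD_SIZE))
OVERHEAD_MB=$((RECORDS * HDR_BYTES / 1024 / 1024))
echo "L2ARC records:       $RECORDS"
echo "ARC header overhead: ${OVERHEAD_MB} MiB"
```

Under these assumptions the headers alone eat around 875 MiB, a tenth of an 8 GB ARC just for bookkeeping, and smaller block sizes make it proportionally worse.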
3) If you are running Xen and ZFS together, there is a rule of thumb (I think I found it in the ZFS on Linux wiki) which says you should allocate at most 40% of your hypervisor's RAM to the ARC. My hypervisor now has 16 GB of RAM and I have reserved 6 GB of it for the ARC. It seems a bit of a waste, but memory is cheap, and with that rule of thumb I am on the safe side.
4) Around 3 months ago there was a Debian upgrade of the LVM package. This upgrade was a nasty one for me, as for some reason it deleted all the symbolic links to the /dev/zd* devices from the /dev/ffzgvg directory. Luckily I found a gnt-instance or gnt-node command which recreated them for me (maybe it was repair-disks, can't remember exactly). So now whenever I see a new package upgrade for LVM I am more alert.
5) I never really got to understand why the zvol device numbers increment by 16, such as /dev/zd0, /dev/zd16, /dev/zd32...
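As far as I know, the reason is that the zvol block driver reserves a block of 16 minor numbers per volume, so that partitions created inside a zvol (/dev/zd32p1, /dev/zd32p2, ...) can get their own device nodes. The zvol index is therefore the minor number divided by 16:

```shell
# Map /dev/zd* minor numbers back to zvol indexes:
# 16 minors are reserved per zvol for its partition nodes.
for minor in 0 16 32; do
  echo "/dev/zd${minor} -> zvol #$((minor / 16))"
done
```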
6) I have the general feeling that ZFS generates a lot of context switches: on a node with 8 instances I see a daily average of 2k context switches, and it is not uncommon to see spikes of 30k. I am not really sure what generates these huge spikes, but I don't see that many context switches on an equivalent ganeti cluster using hardware RAID.
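If you want to watch this on your own nodes, the kernel exposes the cumulative context-switch counter in /proc/stat; sampling it twice gives the per-second rate that vmstat's "cs" column reports:

```shell
# Per-second context-switch rate from /proc/stat (Linux only).
# The "ctxt" line is the total number of context switches since boot.
cs1=$(awk '/^ctxt/ {print $2}' /proc/stat)
sleep 1
cs2=$(awk '/^ctxt/ {print $2}' /proc/stat)
echo "context switches/s: $((cs2 - cs1))"
```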
7) Again regarding the ARC, I would like to try switching the ZFS primarycache parameter for a zvol to "metadata" instead of "all". The ARC would then contain only metadata and not the data itself. I think this makes sense, as the data is already cached inside each instance by its own OS. I was even thinking of maybe not having any ARC at all, but I never tested it.
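For anyone who wants to try the same experiment: primarycache is a standard per-dataset ZFS property with the values all / metadata / none, set with zfs set. The pool/zvol name below is just a placeholder, not from my setup:

```shell
# Cache only metadata in the ARC for one zvol; guest page caches keep
# holding the data. "none" would disable ARC caching entirely.
# "tank/vm-disk0" is a hypothetical zvol name -- substitute your own.
zfs set primarycache=metadata tank/vm-disk0
zfs get primarycache tank/vm-disk0
```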
I would be interested to read about your experience with ganeti and ZFS as storage using my fork of ganeti-extstorage-zfs. Let me know how it works for you once you set it up.
Oh no, implement things? I'm not good at that :-/ Did you have to implement things that were available by default on the classic storage backends?
Good, that means DRBD over Zvol is actually supported :)
Oh ? Now that you mention it, even without an all flash storage, it seems that ZFS L2ARC is pretty much useless according to this discussion on the zfs ML as most of the caching is done at the guest level: http://list.zfsonlinux.org/pipermail/zfs-discuss/2016-September/026318.html
As for the benefit of an SSD SLOG device for a Ganeti cluster with mechanical drives as primary storage, I guess it would only help guest write performance in the really narrow use case of a cluster with VMs doing a lot of synchronous writes... which is only a tiny fraction of the guests I'm running now (basically database VMs).
Hmm, thanks to your remarks I see I was really naive in thinking that ZFS would magically boost the performance of my current cluster and its all-mechanical storage, and I'm now wondering if building a new cluster based on the ZFS ext-storage backend is worth the time and effort it would require :-/
I'm no performance analyst, but I guess the workload of a virtualization node is mostly random reads & writes, and if so, ZFS's tiered storage features won't make any difference performance-wise; the best and easiest way to boost guest disk I/O is to invest in flash drives like you did.
I'll be running KVM, but the ZFS folks using Xen as a hypervisor in the previously linked discussion agree with you.
Well, as it was the performance gains (I thought) more than the manageability features (like snapshotting) that made me consider ZFS on Linux for my new Ganeti cluster, I'm not really sure I'll be brave enough to run it in production now that you've made me realize ZFS won't be of any help on that front :-/ Running nodes with hybrid storage (one big VG with all mechanical drives and a small second VG with SSD drives dedicated to the VMs requiring solid I/O performance) would make more sense in my case, I think, if I can't afford an all-flash setup.
Anyway, thanks a lot for your work on the ZFS ext-storage backend and your precious feedback on it.
Hi John, Phil,
this is great. Would you be interested in adding this into the stable-2.16
branch in the main ganeti repo rather than maintaining a separate repo?
We don't have the engineering bandwidth to officially support it at the moment,
but as a starting point we could create a contribs/extstorage/zfs dir, and put
all these scripts there.
Cheers,
Brian.
I'm curious - how do you go about creating a drbd type instance when
the storage is ZFS ? I'm sure the answer is in the documentation, so I'll
go and read it (again - it was a while since I last tried zfs ext storage),
and report here unless someone beats me to it :)
Because the ganeti-extstorage-zfs provider actually "hijacks" the LVM commands, you simply create an instance as you normally would (for example with the DRBD template). Nothing changes in the gnt-instance commands. As an example, have a look at the lvcreate shell script, which wraps LVM's lvcreate binary:
https://github.com/hostingnuggets/ganeti-extstorage-zfs/blob/master/sbin/lvcreate
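To give a feel for the idea without reading the repo, here is a deliberately stripped-down sketch of such a wrapper (not the real script: option parsing is reduced to the bare minimum, and the translated command is echoed instead of executed; the real provider handles many more flags and runs the zfs command for real):

```shell
#!/bin/sh
# Toy illustration of the "hijack" approach: a wrapper installed in
# PATH ahead of LVM's lvcreate that logs its arguments and translates
# "lvcreate -L <size> -n <name> <vg>" into a zvol creation.
echo "lvcreate called with: $*" >&2   # the real wrapper logs in detail

while [ $# -gt 0 ]; do
  case "$1" in
    -L) SIZE=$2; shift 2 ;;   # volume size, e.g. 10g
    -n) NAME=$2; shift 2 ;;   # logical volume name
    *)  VG=$1;  shift ;;      # remaining bare argument: the VG name
  esac
done

# The real wrapper would exec this; echoed here for illustration only.
echo zfs create -V "$SIZE" "$VG/$NAME"
```

Run as `lvcreate -L 10g -n disk0 ffzgvg`, it would print `zfs create -V 10g ffzgvg/disk0`, which is how a gnt-instance disk ends up as a zvol without ganeti noticing anything unusual.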