However, as you can see at the link, thin_trim can only be invoked against a *deactivated* thin pool. That is not possible during normal Qubes operation, because dom0 lives *within* the thin pool alongside the other VMs**, so the pool cannot be deactivated.
There seems to be no other tool available to issue discards directly against a thin pool's unallocated space***. `lvremove` does not do so, no matter what the documentation states. Effectively this means that some contents of past VMs, including disposable VMs, are sitting around in the unallocated storage space.
I have a feature request open for an explicit blkdiscard call against LVs before lvremove is invoked, which addresses many but not all cases of remnant data. It would also be good hygiene to opportunistically issue discards against the unallocated thin pool space on a regular basis (e.g. weekly, or perhaps at each boot).
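The behavior that feature request asks for could be sketched roughly as below. Everything here is hypothetical: the volume name, and the `run()` helper, which (with DRY_RUN=1, the default) just echoes the commands so the sequence can be inspected without touching real volumes.

```shell
#!/bin/sh
# Sketch only: discard an LV's mapped blocks *before* removing it, so the
# data does not linger in the pool's unallocated space.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

LV=/dev/qubes_dom0/vm-disp123-private   # hypothetical disposable-VM volume
run blkdiscard "$LV"                    # return the LV's chunks to the pool
run lvremove -f "$LV"                   # then remove the (now-empty) LV
```

Run with DRY_RUN=0 (as root, with the real LV name) to execute instead of echo.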
If I were to brainstorm a bit: there is presumably a point during boot (before pivoting away from the initial ramdisk) or during shutdown (after unmounting dom0's root) where one could potentially invoke the thin_trim command (provided that it and its associated libraries were still accessible at that point). Any guidance on how one would do so?
Brendan
** might be an argument for dom0 to live in a separate pool?
*** other than explicitly filling the thin pool up to ~99.9% with random data (directly or via a VM-attached LV), then issuing discards against that before/during removal...which is not an efficient approach for time and wear reasons. I also tried adding a linear LV to the VG to run blkdiscard against, but the linear LV cannot encroach on the thin pool's allocation, so that wasn't helpful.
On Monday, June 17, 2019 at 11:16:05 AM UTC-4, Chris Laprise wrote:
> I would fully expect lvremove to issue discards, if lvm is configured
> for it. Did you try changing /etc/lvm/lvm.conf so that "issue_discards =
> 1" ?
I've got that set (in dom0, and discards are enabled at the LUKS layer as well). As per my comments near the end of this GitHub issue [ https://github.com/QubesOS/qubes-issues/issues/5077 ], I ran some experiments: put /dev/urandom data into files in VMs, blkdiscard them, and watch the discards occur down at the hardware layer using the one-liner below running in dom0; then put more data into the VMs, shut the VMs down, delete them using the Qubes tools, and watch for discards again. I don't get any in the latter case.
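For reference, the in-VM half of that experiment looks roughly like this (the file name and size are arbitrary; the fstrim step needs root inside the VM, so it is left commented):

```shell
#!/bin/sh
# Sketch of the in-VM side of the experiment: write fresh /dev/urandom
# data (forcing the pool to allocate new chunks), then delete and trim it
# while watching the discard counters in dom0.
f=$(mktemp)
dd if=/dev/urandom of="$f" bs=1M count=16 conv=fsync 2>/dev/null
rm -f "$f"
# fstrim -v /   # as root inside the VM; dom0's discard counters should jump
echo "wrote and removed 16 MiB of random data"
```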
The one-ish liner to run in dom0, watching for jumps in the discarded-blocks counts (it also works inside non-dom0 VMs, but is less useful there for this purpose):
#!/bin/bash
# Fields 12-15 of /sys/block/*/stat are the discard statistics:
# discard I/Os, discard merges, discard sectors, discard ticks (ms).
watch -n 1 -d \
"if [ -d /sys/block/sda ] ; then pat=sd ; else pat=xvd ; fi ; sync;echo --DISCARD TOTALS--;cat /sys/block/\$pat*/stat|awk 'BEGIN {print \"DRIVE IOS QMERGES SECTORS MSWAITS\"} {printf \"%5i %-8s %s %15s %11s\\n\",NR,\$12,\$13,\$14,\$15}'"
> > ** might be an argument for dom0 to live in a separate pool?
>
> I think Marek wanted in the future to move dom0 root to a regular
> (static) lv in the same volume group.
This makes sense to me.
> But also, it should be possible
> for a user to perform this switch manually by creating a static lv,
> copying dom0 root contents to new root, and then changing the necessary
> grub entries. If you don't rename the old thin lv and name the static lv
> 'root', then you'll have to change fstab as well.
Yeah, I'm playing a bit fast and loose with my daily Qubes system, so I'm not yet ready to do that. :) Might experiment a bit on another build.
As it is now, thin pools can't be shrunk, so one would need to do some gymnastics to get this working on an existing system.
> At that point, you might be able to deactivate any configured pools from
> /lib/systemd/system-shutdown as this gets executed right after
> filesystems become read-only. Although a systemd unit might get you
> closer to the point you want to be.
I will investigate, thanks.
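A first sketch of what such a shutdown hook might look like, for the record. Every detail here is an assumption: the hook path, the VG/pool names, whether the pool can actually be deactivated that late, whether component-LV activation works on the installed lvm2 version, and whether thin_trim and its libraries are still reachable. DRY_RUN=1 (the default) echoes the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical /usr/lib/systemd/system-shutdown/thin-trim.shutdown sketch.
# thin_trim needs the pool inactive and its _tmeta/_tdata components
# exposed; exact lvchange flags and device paths may vary by lvm2 version.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

VG=qubes_dom0   # assumed volume group name
POOL=pool00     # assumed thin pool name
run lvchange -an "$VG/$POOL"                                 # pool must be inactive
run lvchange -ay -y "$VG/${POOL}_tmeta" "$VG/${POOL}_tdata"  # expose components
run thin_trim --metadata-dev "/dev/$VG/${POOL}_tmeta" \
              --data-dev "/dev/$VG/${POOL}_tdata"            # discard unmapped space
```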
> But definitely try lvm.conf "issue_discards = 1" first. ;) There is also
> the "thin_pool_discards" setting.
As above, I'm already doing both. Let me know if discards happen on VM deletion for you. If they *do*, then color me confused, as I've examined the stack a few times and I *DO* get discards out to the hardware from large file deletes inside VMs, but not during deletion of a VM that had large amounts of new data added to it.
> > *** other than explicitly filling the thin pool up to ~99.9% with random data (directly or via a VM-attached LV), then issuing discards against that before/during removal...which is not an efficient approach for time and wear reasons. I also tried adding a linear LV to the VG, to run blkdiscard again... but the linear LV cannot encroach on the thin pool's allocation, so that wasn't helpful.
>
> The way to zero-fill would be with a thin lv. If you think of thin pool
> as a filesystem, you would zero-fill an fs by creating a file. Just
> don't max out the pool completely, or you may end up with an un-bootable
> system.
Yeah, with all that you've said about thin-pool fragility (e.g. filling up the metadata), I'm a little skittish there.
Also, in one experiment zero-fill did not seem to increase the pool's data usage, whereas /dev/urandom did...though more slowly than I'd like, and I don't want to get bored and overshoot past 99% full. At least my recollection is that urandom and zeroes gave me different results last week. Let me double check.
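The fill experiment could be scripted roughly like this. All names are hypothetical, and DRY_RUN=1 (the default) only echoes the commands; whoever runs it for real needs to watch data_percent and stop the dd well before the pool fills.

```shell
#!/bin/sh
# Sketch: create a throwaway thin LV, fill it with urandom while watching
# the pool's data_percent, then discard and remove it. Do NOT let the
# pool reach 100% -- that can leave the system unbootable.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

VG=qubes_dom0   # assumed names throughout
POOL=pool00
run lvcreate -V 10G -T "$VG/$POOL" -n scrubber      # throwaway thin LV
run dd if=/dev/urandom of="/dev/$VG/scrubber" bs=1M # stop well before 100%!
run lvs -o lv_name,data_percent "$VG"               # watch pool usage
run blkdiscard "/dev/$VG/scrubber"                  # hand chunks back
run lvremove -f "$VG/scrubber"
```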
Brendan
Yes, this is why turning on discards in fstab inside VM templates really only impacts large file deletes inside the VMs, as the pool chunk size is >= 64 KiB while the filesystem block size inside the VMs is much smaller. Ensuring that one deletes all of the directories/files (big and small) and *then* performs fstrim before shutdown can partially address this.
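The arithmetic behind that, with assumed sizes (64 KiB chunk, 4 KiB filesystem block; the real chunk size can be checked with `lvs -o chunk_size`): a chunk is only returned to the pool once *all* of its blocks have been discarded.

```shell
#!/bin/sh
# Assumed sizes: 64 KiB pool chunk, 4 KiB ext4 block. Every filesystem
# block sharing a chunk must be trimmed before the chunk can be unmapped.
chunk_kib=64
fs_block_kib=4
echo $(( chunk_kib / fs_block_kib ))   # 16 blocks per chunk
```

So a single deleted 4 KiB file almost never frees pool space on its own; its 15 neighbors in the chunk have to go too.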
Having the Qubes tooling explicitly issue a blkdiscard against a VM's persistent LVs *before* performing lvremove would also partially address this (for deleted and disposable VMs).
Partial solutions aren't great for this topic, though.
> If I wanted to solve this issue relatively quickly, I'd first consider
> moving to btrfs. Its unified approach is more likely to process discards
> completely.
I keep getting mixed messages on the stability of btrfs. :)
Brendan