probable lvm thin_pool exhaustion


mai...@maiski.net

Mar 9, 2020, 5:14:13 PM
to qubes-users
Hello folks,

I have a standard Qubes 4.0 Release install with LUKS + LVM thin pool.
After a sudden reboot and entering the encryption passphrase, the dracut emergency shell comes up:
"Check for pool qubes-dom/pool00 failed (status:1). Manual repair required!"
The only active LV is qubes_dom0/swap.
All the others are inactive.

step 1:
from https://github.com/QubesOS/qubes-issues/issues/5160:
lvm vgscan
lvm vgchange -ay
lvm lvconvert --repair qubes_dom0/pool00

Result:
using default stripesize 64.00 KiB.
Terminate called after throwing an instance of 'std::runtime_error'
what(): transaction_manager::new_block() couldn't allocate new block
Child 7212 exited abnormally
Repair of thin metadata volume of thin pool qubes_dom0/pool00 failed (status:1). Manual repair required!
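(For context: that error appears to mean the repair ran out of room in the spare metadata area it writes into. From the same dracut shell, the VG free space and the pool's data/metadata usage can be checked with something like the following; a sketch, assuming the volume group is qubes_dom0 and untested on this exact system:)

# Check how much free space the VG has for a repaired metadata copy,
# and how full the pool's data and metadata are (-a also shows hidden sub-LVs like pool00_tmeta)
lvm vgs -o vg_name,vg_size,vg_free qubes_dom0
lvm lvs -a -o lv_name,lv_size,data_percent,metadata_percent qubes_dom0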


step 2:
Since I suspect that my LVM thin pool is full (though it does mark about 15 GiB as free),
I tried the following changes in /etc/lvm/lvm.conf:
thin_pool_autoextend_threshold = 80
thin_pool_autoextend_percent = 2 (since the pvs output gives PSize 465.56g and PFree 15.78g, I set this to 2% to be overly cautious not to extend beyond the ~15 GiB marked as free, since I don't know better)
auto_activation_volume_list = set to hold the group, root, pool00, swap, and a VM that I would like to delete to free some space
volume_list = the same as auto_activation_volume_list

Then I tried step 1 again; it did not work, and I got the same result as above, with qubes_dom0/swap as the only active LV.
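(For reference, those settings live in the activation section of /etc/lvm/lvm.conf; a sketch of what the edit might look like, with illustrative LV names in the lists. Note that the initramfs carries its own copy of lvm.conf, so a change meant for early boot may also need a rebuilt initramfs:)

activation {
    # once the pool is 80% full, autoextend it by 2% of its current size,
    # provided the VG still has free extents
    thin_pool_autoextend_threshold = 80
    thin_pool_autoextend_percent = 2
    # restrict which LVs may be (auto-)activated at boot
    auto_activation_volume_list = [ "qubes_dom0/root", "qubes_dom0/pool00", "qubes_dom0/swap" ]
    volume_list = [ "qubes_dom0/root", "qubes_dom0/pool00", "qubes_dom0/swap" ]
}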

step 3:
tried lvextend -L+1G qubes_dom0/pool00_tmeta
Result:
metadata reference count differ for block xxxxxx, expected 0, but got 1 ...
Check for pool qubes-dom/pool00 failed (status:1). Manual repair required!
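(For what it's worth, the documented way to grow a thin pool's metadata is via the pool itself rather than the hidden _tmeta LV, and it can only proceed once the metadata passes its check, which is exactly what fails here. Once the pool is repairable, something like the following should add metadata space, assuming free extents in the VG; a sketch, untested here:)

# Grow the pool's metadata LV by 1 GiB (arbitrary illustrative amount)
lvm lvextend --poolmetadatasize +1G qubes_dom0/pool00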



Since I do not know my way around LVM, what do you think would be the best way out of this?
Adding another external PV? Migrating to a bigger PV?
I did not play with backup or archive out of fear of losing any unbacked-up data, which happens to be quite a bit :|
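(Extending the VG onto an external PV would look roughly like the following; a sketch, where /dev/sdb1 is a hypothetical spare partition. Note that such a PV would sit outside the LUKS container, so anything the pool places there would be unencrypted:)

# Initialize the spare partition as a PV and add it to the volume group
lvm pvcreate /dev/sdb1
lvm vgextend qubes_dom0 /dev/sdb1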

Thanks in advance,
m

Ulrich Windl

Mar 10, 2020, 3:24:06 AM
to qubes...@googlegroups.com, mai...@maiski.net
>>> <mai...@maiski.net> wrote on 09.03.2020 at 22:14:
> [...]
> Since I do not know my way around LVM, what do you think would be the best
> way out of this?
> Adding another external PV? Migrating to a bigger PV?
> I did not play with backup or archive out of fear of losing any unbacked-up
> data, which happens to be quite a bit :|

For some reason I have a "watch -n30 lvs" running in a big terminal. On one of the top lines I see the usage of the thin pool. Of course this only helps before the problem...

But I thought some app is monitoring the VG; wasn't there some space warning before the actual problem?
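(Roughly, such a watch line might look like this; a sketch with explicit columns, though the plain "watch -n30 lvs" already shows the Data% and Meta% columns for thin pools:)

# refresh the thin pool usage columns every 30 seconds (run in dom0)
watch -n30 'sudo lvs -o lv_name,data_percent,metadata_percent qubes_dom0'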


mai...@maiski.net

Mar 10, 2020, 7:49:58 AM
to Ulrich Windl, qubes...@googlegroups.com

Quoting Ulrich Windl <Ulrich...@rz.uni-regensburg.de>:
> But I thought some app is monitoring the VG; wasn't there some space
> warning before the actual problem?

Of course there was. But at the moment of failure none was visible, which
does not excuse that beforehand I had created 3 new VMs and downloaded a
minimal template for fun, so... why should it be simple when it can be
complicated.





brenda...@gmail.com

Mar 10, 2020, 1:34:11 PM
to qubes-users
On Tuesday, March 10, 2020 at 11:49:58 AM UTC, maiski wrote:
> Of course there was. But at the moment of failure none was visible, which
> does not excuse that beforehand I had created 3 new VMs and downloaded a
> minimal template for fun, so... why should it be simple when it can be
> complicated.


Qubes 4.1 (in development) has added a warning (in addition to the current LVM space usage warning) for LVM metadata usage above a threshold. 4.0 doesn't have the metadata-nearing-full warning, and that is what tends to cause these types of thin pool issues.

In addition to the warning, Qubes 4.1 is also doubling (vs. the LVM default value) the amount of space set aside for LVM thin pool metadata, which will substantially reduce the chances of ever hitting this issue under 4.1.
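(On a still-healthy 4.0 pool, the metadata reserve can be checked and enlarged by hand in a similar spirit; a sketch from dom0, with +1G as an arbitrary illustrative amount:)

# Show the current metadata size and usage of the pool
sudo lvs -a -o lv_name,lv_metadata_size,metadata_percent qubes_dom0
# Enlarge the pool's metadata LV, provided the VG still has free extents
sudo lvextend --poolmetadatasize +1G qubes_dom0/pool00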

Brendan

PS - the above is not helpful for recovering this machine, of course. However, recovery from this can be very difficult, and even a successful recovery is not guaranteed to bring back all the data. The Qubes devs are aware of this and very much want to avoid these issues in the next release.

mai...@maiski.net

Mar 10, 2020, 9:34:17 PM
to brenda...@gmail.com, qubes-users

Quoting brenda...@gmail.com:
> [...]
> PS - the above is not helpful for recovering this machine, of course. However,
> recovery from this can be very difficult, and even a successful recovery is
> not guaranteed to bring back all the data.

Hm, yes, this does not help :/
What about running fstrim on the SSD and trying to boot again?
@brendan: I've seen that you had some thoughts about LVM in some postings,
so would you care to elaborate/brainstorm on the situation I
described? You know, every input is valuable right now :)
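(For what it's worth, fstrim discards unused blocks of mounted filesystems, so it could only shrink the pool's data usage after the pool activates and the filesystems are mounted again; a sketch of what that would look like at that point:)

# Trim all mounted filesystems that support discard (run once the system boots again)
sudo fstrim -av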



brenda...@gmail.com

Mar 10, 2020, 9:44:27 PM
to qubes-users
On Wednesday, March 11, 2020 at 1:34:17 AM UTC, maiski wrote:
> Hm, yes, this does not help :/
> What about running fstrim on the SSD and trying to boot again?
> @brendan: I've seen that you had some thoughts about LVM in some postings,
> so would you care to elaborate/brainstorm on the situation I
> described? You know, every input is valuable right now :)


TBH, I wouldn't know what to do. I ran into a similar problem with 4.0 a long while back and just reinstalled, because it seemed insurmountable at the time.

I've been reducing my main pool usage and manually monitoring the metadata to avoid the situation with my current install, waiting for 4.1 to become stable before moving to it.

Chris Laprise (tasket) would be a better resource, if he's willing to jump in.

Brendan

mai...@maiski.net

Mar 10, 2020, 9:50:56 PM
to brenda...@gmail.com, qubes-users
I also remember running into a similar issue waaaay back; I adjusted a
parameter in grub/xen.cfg (I do not remember which) and told LVM to surpass
its set threshold for maximum filled pool size so it could boot, but yeah,
that is not the issue here... Nonetheless, thank you for the quick
answer!



