fixing LVM corruption, question about LVM locking type in Qubes

qtpie

Feb 13, 2018, 1:36:50 PM2/13/18
to qubes...@googlegroups.com
Summary: fixing LVM corruption, question about LVM locking

My laptop battery went to 0%. The lid was closed, but the laptop may not
have been suspended, which could have been the cause of an LVM
corruption that I did manage to fix with the help of numerous obscure
sources.

I'm posting the fix here since I couldn't find a description of this
exact issue anywhere else. I didn't write down the exact wording of the
errors when they happened; I hope the keywords make it clear enough. I'm
not an LVM expert, so please correct any errors.

I also have a question at the end about LVM locking and Qubes which I
hope somebody can answer.

Problem description
- At boot I enter the LUKS password
- Error messages (unsure about the order in which they appeared):
'dracut initqueue timeout - starting timeout scripts' (repeated 30 times)
'you may have to regenerate initramfs'
'/dev/mapper/qubes_dom0-root does not exist'
- The computer drops into a dracut shell

Regenerating the initramfs did NOT solve the problem. The following did
(a command sketch follows this list):

- In the dracut shell, start lvm
- Enter the command 'vgchange -ay'. The response should be 'Volume group
x activated' or something similar
- Enter the command 'lvconvert --repair <logical volume name>' (in my
case qubes_dom0/pool00)
- Error message: 'cannot repair, lvm locking type = 4 (read only)'
- Change the LVM locking type to 1 in /etc/lvm/lvm.conf
- Retry 'lvconvert --repair <logical volume name>' (in my case
qubes_dom0/pool00)
- After the process finishes, exit the dracut shell and continue the
boot process.
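
For reference, a minimal sketch of the above as commands, assuming the
default Qubes layout (qubes_dom0/pool00); exact prompts and paths may
differ in your dracut environment:

    # inside the dracut emergency shell
    lvm                                   # start the interactive lvm shell
    vgchange -ay                          # activate all volume groups
    lvconvert --repair qubes_dom0/pool00  # repair the thin pool metadata
    # if this fails with 'locking type = 4 (read only)', edit
    # /etc/lvm/lvm.conf, set 'locking_type = 1', and retry
    exit                                  # leave lvm, then exit the dracut shell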

Questions:
- what is the standard LVM locking type for a Qubes install? (in Fedora
it is 1).
- if not 1 should I change it back?
- if 1 what caused it to be changed?
- is powerfailure the probable cause of this issue?

awokd

Feb 13, 2018, 3:05:28 PM2/13/18
to qtpie, qubes...@googlegroups.com
On Tue, February 13, 2018 6:36 pm, qtpie wrote:
> Summary: fixing LVM corruption, question about LVM locking


> Questions:
> - what is the standard LVM locking type for a Qubes install? (in Fedora
> it is 1).
> - if not 1 should I change it back?

On my fresh install of rc4, it's set to 1. Per the comment in there,
that's the "standard mode".

> - if 1 what caused it to be changed?

Possibly when something detected LVM corruption?

> - is powerfailure the probable cause of this issue?

I think that's a reasonable guess, but I haven't seen a power failure
cause LVM corruption before. Not like I have a fleet of servers running
it, though.


thomas...@gmail.com

Jul 28, 2019, 9:03:03 PM7/28/19
to qubes-users
I've just encountered this issue, and I thought my problems were over once I found this post..

FYI, previously lvscan on my system showed root, pool00, and every volume but swap as inactive.

I followed your instructions, but the system still fails to boot. I ran 'vgchange -ay' and saw the following printed a number of times:

device-mapper: table 253:6: thin: Couldn't open thin internal device
device-mapper: reload ioctl on (253:6) failed: no data available


I ran 'lvscan' again, and this time some VMs were marked active, but a number were not (root, various -back volumes, several -root volumes, etc.).

Really terrified everything is gone as I had just recovered from a backup while my hardware got fixed, but I don't have the backup anymore.

awokd

Jul 28, 2019, 9:34:09 PM7/28/19
to qubes...@googlegroups.com
thomas...@gmail.com:
Can't tell which post you're replying to, but I get the idea. The
volumes you are most concerned about all end in --private. If you've
gotten them to the point where they show as active, you can make a
subdir and "sudo mount /dev/mapper/qubes_dom0-vm--work--private subdir"
for example, copy out the contents, umount subdir and move on to the
next. You can ignore --root volumes, since installing the default
templates will recreate them. If you can't get the --private volumes you
want to show
as active, I'm afraid recovering those is beyond me.
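
For reference, a minimal sketch of that copy-out procedure, using
awokd's example volume name (vm--work--private); substitute your own
volume names and destination path:

    mkdir /mnt/recover
    sudo mount /dev/mapper/qubes_dom0-vm--work--private /mnt/recover  # mount one private volume
    cp -a /mnt/recover/. /path/to/safe/storage/work/                  # copy its contents out
    sudo umount /mnt/recover                                          # unmount, move on to the next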

awokd

Jul 28, 2019, 9:48:57 PM7/28/19
to qubes...@googlegroups.com
'awokd' via qubes-users:
Also, if you can't get a --private volume active, try its
--private--######--back equivalent.

Chris Laprise

Jul 29, 2019, 10:18:54 AM7/29/19
to thomas...@gmail.com, qubes...@googlegroups.com
Did you run "lvm lvconvert --repair qubes_dom0/pool00"? I think that
would be one of the first things you do when the underlying thin device
fails.

If it needs additional space, you could delete the swap lv, then re-add
it later.
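
A minimal sketch of that approach, assuming the default Qubes volume
names (qubes_dom0/pool00, qubes_dom0/swap); treat it as illustrative
rather than a tested recipe:

    sudo lvm lvconvert --repair qubes_dom0/pool00  # rebuild the thin pool metadata
    # if the repair needs free space in the volume group:
    sudo lvremove qubes_dom0/swap                  # delete the swap LV to free space, then repair again
    # later, recreate and re-enable swap, e.g.:
    sudo lvcreate -n swap -L 4G qubes_dom0
    sudo mkswap /dev/qubes_dom0/swap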

--

Chris Laprise, tas...@posteo.net
https://github.com/tasket
https://twitter.com/ttaskett
PGP: BEE2 20C5 356E 764A 73EB 4AB3 1DC4 D106 F07F 1886

thomas...@gmail.com

Jul 29, 2019, 10:19:42 AM7/29/19
to qubes-users
Thanks for your response.

I have some which don't show as active - it's looking like some data loss..

Something I am getting when I run 'lvconvert --repair qubes_dom0/pool00':


WARNING: Sum of all thin volume sizes (2.67TiB) exceeds the size of thin pools and the size of whole volume group (931.02GiB)

Is this something I can fix perhaps?

Also, I have some large volumes which are present. I've considered trying to remove them, but I might hold off until I get data off the active volumes first..

I've run across the thin_dump / thin_check / thin_repair commands. It seems they're used under the hood by lvconvert --repair to check thin volumes.

Is there a way to relate those dev_ids back to the thin volumes lvm can't seem to find?
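
For context, a sketch of where those dev_ids show up, assuming a
metadata dump already exists; the report field name below may vary
between lvm versions:

    # thin_dump output contains one entry per thin device, e.g.:
    #   <device dev_id="747" mapped_blocks="..." transaction="..." ...>
    # lvm records the same id for each thin LV and can report it:
    sudo lvs -o +thin_id qubes_dom0
    # matching dev_id values from thin_dump against that column links
    # metadata entries back to named thin volumes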

Thomas Kerin

Jul 29, 2019, 10:42:23 AM7/29/19
to qubes-users
Sorry, didn't send to list. See my response to Chris.

---------- Forwarded message ---------
From: Thomas Kerin <thomas...@gmail.com>
Date: Mon, 29 Jul 2019, 3:40 pm
Subject: Re: [qubes-users] fixing LVM corruption, question about LVM locking type in Qubes
To: Chris Laprise <tas...@posteo.net>


Hi Chris,

Yes, I think I tried that once last night.

I notice it creates a qubes_dom0/pool00_meta$N volume each time.

Note my earlier post (before I saw yours!) had a weird warning about the sum of thin volume sizes (2.67 TiB) exceeding the size of the pools and the volume group.

Output this time was:
WARNING: recovery of pools without pool metadata spare LV is not automated
WARNING: if everything works, remove qubes_dom0/pool00_meta2 volume
WARNING: Use pvmove command to move qubes_dom0/pool00_meta2 on the best fitting PV


Currently I have qubes_dom0/pool00_meta0, 1, and 2.

Chris Laprise

Jul 29, 2019, 11:25:50 AM7/29/19
to thomas...@gmail.com, qubes-users
On 7/29/19 10:19 AM, thomas...@gmail.com wrote:
> Thanks for your response.
>
> I have some which don't show as active - it's looking like some data loss..
>
> Something I am getting when I run
> Lvconvert --repair qubes_dom0/pool00
>
>
> WARNING: Sum of all thin volume sizes (2.67TiB) exceeds the size of thin pools and the size of whole volume group (931.02GiB)
>
> Is this something I can fix perhaps?

This is normal. Thin provisioning usually involves over-provisioning,
and that's what you're seeing. Most of our Qubes systems display this
warning when using lvm commands.

>
> Also, I have some large volumes which are present. I've considered trying to remove them, but I might hold off until I get data off the active volumes first..
>
> I've run across the thin_dump / thin_check / thin_repair commands. It seems they're used under the hood by lvconvert --repair to check thin volumes.
>
> Is there a way to relate those dev_ids back to the thin volumes lvm can't seem to find?

If 'lvs' won't show them, then I don't know precisely how. A long time
ago, I think I used 'vgcfgrestore /etc/lvm/archive/<latest-file>' to
resolve this kind of issue.
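
A minimal sketch of that restore path, assuming an archive file that
predates the corruption; note that with thin pools vgcfgrestore may
refuse to run without --force, so treat it as a last resort:

    ls -lt /etc/lvm/archive/   # pick an archive written while the system was still healthy
    sudo vgcfgrestore --file /etc/lvm/archive/<archive-file> qubes_dom0
    # <archive-file> is a placeholder; this restores the VG layout only,
    # not the thin pool's internal metadata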

I also recommend seeking help from the wider Linux community, since this
is a basic Linux storage issue.

And of course, a reminder that these mishaps are a good reason to do the
following:

1. After installation, at least double the size of your pool00 tmeta volume.

2. Perform regular backups (I'm working on a tool that can backup lvs
much quicker than the Qubes backup tool).
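
For point 1, a minimal sketch of growing the pool metadata, assuming the
default pool name and free space in the volume group; the size is only
illustrative:

    sudo lvextend --poolmetadatasize +1G qubes_dom0/pool00  # grow pool00's tmeta by 1 GiB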

Thomas Kerin

Jul 29, 2019, 12:18:50 PM7/29/19
to Chris Laprise, qubes-users
Thanks Chris for your response!


On Mon, 29 Jul 2019, 4:25 pm Chris Laprise, <tas...@posteo.net> wrote:
On 7/29/19 10:19 AM, thomas...@gmail.com wrote:
> Thanks for your response.
>
> I have some which don't show as active - it's looking like some data loss..
>
> Something I am getting when I run
> Lvconvert --repair qubes_dom0/pool00
>
>
> WARNING: Sum of all thin volume sizes (2.67TiB) exceeds the size of thin pools and the size of whole volume group (931.02GiB)
>
> Is this something I can fix perhaps?

This is normal. Thin provisioning usually involves over-provisioning,
and that's what you're seeing. Most of our Qubes systems display this
warning when using lvm commands.


Understood. Thanks!


>
> Also, I have some large volumes which are present. I've considered trying to remove them, but I might hold off until I get data off the active volumes first..
>
> I've run across the thin_dump / thin_check / thin_repair commands. It seems they're used under the hood by lvconvert --repair to check thin volumes.
>
> Is there a way to relate those dev_ids back to the thin volumes lvm can't seem to find?

If 'lvs' won't show them, then I don't know precisely how. A long time
ago, I think I used 'vgcfgrestore /etc/lvm/archive/<latest-file>' to
resolve this kind of issue.


Sorry, I mean, lvs does show them, I'm just wondering what it'll take to show them as active again.

That directory seems to just have files from today!

I also recommend seeking help from the wider Linux community, since this
is a basic Linux storage issue.

I have spent the morning researching, and found a few posts on
redhat.com and some other sites describing how to repair the metadata.

The most common reason seems to be overflowing the pool metadata, though
mine is currently only around 37% used.

Others (one Qubes user) encountered this after a power failure. I shut
down cleanly as far as I can tell; this was a routine reboot..
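
For anyone checking the same things, a sketch of the usual inspection
commands, assuming the default qubes_dom0 names:

    sudo lvs -a -o +data_percent,metadata_percent qubes_dom0/pool00  # pool data and metadata usage
    sudo vgs qubes_dom0                                              # VFree shows unallocated space in the VG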

And of course, a reminder there mishaps are a good reason to do the
following:

1. After installation, at least double the size of your pool00 tmeta volume.

2. Perform regular backups (I'm working on a tool that can backup lvs
much quicker than the Qubes backup tool).

I definitely agree with both, although point one seems unlikely to have been the cause in this case.

I'm fairly sure the main disk has about 50% free also

Backups are evidently a must.. I've screwed up Qubes installs before, but never lost data until maybe now. I know LVM was only adopted in R4.0, and everything else has been going so well with this install, but I had only just recovered and organized several old disks' worth of data, so I'll be gutted if I've lost it and won't know why :/

I see a few people posting on the GitHub qubes-issues repo; one says three people in the past month have had this issue (or at least the same symptoms).

thomas...@gmail.com

Jul 29, 2019, 12:37:26 PM7/29/19
to qubes-users
Oh, forgive me, no, not all VMs are present in lvs.

thomas...@gmail.com

Jul 29, 2019, 2:54:36 PM7/29/19
to qubes-users

My current problem is this error (triggerable by running `lvm lvchange -ay -v qubes_dom0/vm-vault-personal-docs-private`). It happens with some other affected VMs as well.

    Activating logical volume qubes_dom0/vm-vault-personal-docs-private exclusively.
    activation/volume_list configuration setting not defined: Checking only host tags for qubes_dom0/vm-vault-personal-docs-private.
    Loading qubes_dom0-pool00_tdata table (253:2)
    Suppressed qubes_dom0-pool00_tdata (253:2) identical table reload.
    Loading qubes_dom0-pool00_tmeta table (253:1)
    Suppressed qubes_dom0-pool00_tmeta (253:1) identical table reload.
    Loading qubes_dom0-pool00-tpool table (253:3)
    Suppressed qubes_dom0-pool00-tpool (253:3) identical table reload.
    Creating qubes_dom0-vm--vault--personal--docs--private
    Loading qubes_dom0-vm--vault--personal--docs--private table (253:52)
  device-mapper: reload ioctl on (253:52) failed: No data available
    Removing qubes_dom0-vm--vault--personal--docs--private (253:52)


By the way, I've scanned through the files in /etc/lvm/archive. I wasn't sure if I should follow your advice there, as that command requires --force to work with thin provisioning and I've seen warnings about this online. I see the files contain references to the volumes that lvs doesn't show.

I tested thin_check on the qubes_dom0/pool00_meta0 volume (created by lvconvert --repair qubes_dom0/pool00) and got the following:
examining superblock
examining devices tree
   missing devices: [1, 747]
      too few entries in btree_node: 41, expected at least 42(max entries=126)

Running thin_check on meta1 and meta2 (created by running lvconvert --repair qubes_dom0/pool00 a further two times) doesn't yield anything major:
examining superblock
examining devices tree
examining mapping tree
checking space map counts

I've followed a procedure to get a metadata snapshot (I couldn't directly access _tmeta using normal tools): https://serverfault.com/a/971620
and used thin_dump on the _meta0, _meta1, and _meta2 volumes created by `lvconvert --repair`.

I diff'd the tmeta file against the others, and it seems only the first line is different?
1c1
< <superblock uuid="" time="640" transaction="2054" data_block_size="1024" nr_data_blocks="0">
---
> <superblock uuid="" time="640" transaction="2054" data_block_size="1024" nr_data_blocks="1842544">

So the pool's tmeta has nr_data_blocks = 0.

Maybe my data is still there but the metadata is wrong?
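
If that turns out to be the case, one possible path (sketched here only
as an illustration, not something tested in this thread; work on copies
and keep the original metadata LVs) is to fix the XML dump, restore it
into a spare LV, and swap that LV in as the pool's metadata:

    # 1. correct the dump by hand, e.g. copy nr_data_blocks from a good _metaN dump
    # 2. write the corrected dump into a new LV (name and size are hypothetical):
    sudo lvcreate -n pool00_fixedmeta -L 1G qubes_dom0
    sudo thin_restore -i fixed-tmeta.xml -o /dev/qubes_dom0/pool00_fixedmeta
    # 3. swap the repaired metadata into the (inactive) pool:
    sudo lvconvert --thinpool qubes_dom0/pool00 --poolmetadata qubes_dom0/pool00_fixedmeta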

Chris Laprise

Jul 29, 2019, 4:01:42 PM7/29/19
to Thomas Kerin, qubes-users
On 7/29/19 12:17 PM, Thomas Kerin wrote:
> Sorry, I mean, lvs does show them, I'm just wondering what it'll take to
> show them as active again.

Normally, the following command should force a volume to become active:

lvchange -kn -ay qubes_dom0/volume


>
> That directory seems to just have files from today!

Avoid them unless the date-time was from when the system was (recently)
still working.