Bug#1028541: lvm2: LVM filters render server unbootable

Bastian Blank

Jan 14, 2023, 7:10:04 AM

Hi

On Thu, Jan 12, 2023 at 03:18:55PM +0100, Christian Herzog wrote:
> on our storage servers, we employ LVM filters to hide data partitions
> from the OS (since they're iSCSI exported to the frontend
> fileserver). With bookworm, lvm does not activate the root VG when
> filters are in place. So far we have been able to establish the
> following facts:
> - with the default global_filter settings, it does boot

Okay.

> - with global_filter = [ "a|pci-0000:04.*|", "r|.*|" ] (to only
> activate the root VG) bookworm drops into busybox (no root fs
> found)

So it could be that the filter does not apply that early.

> - manually activating the root VG in busybox allows us to boot
> (by copy/pasting the IMPORT{program} lines from the udev rule)

Which one? "pvscan"? That one does not activate anything.

> - replacing /usr/sbin/lvm and /lib/udev/rules.d/69-lvm.rules on
> bookworm with the bullseye versions fixes the problem

What are you replacing exactly? The bullseye version did not include
/lib/udev/rules.d/69-lvm.rules at all, see
https://packages.debian.org/bullseye/amd64/lvm2/filelist.

> - the problem seems to be related (but not identical) to #1018730

This one is about partial VG.

> We've already spent 2 days trying to narrow down the underlying cause as
> much as possible and we'd be happy to provide any additional information
> since for us this is a bookworm deal breaker.

Please provide the output of "pvs", "vgs", "lvs" and the kernel log.

Bastian

--
I'm a soldier, not a diplomat. I can only tell the truth.
-- Kirk, "Errand of Mercy", stardate 3198.9

Christian Herzog

Jan 16, 2023, 4:40:03 AM

Dear Bastian,

thanks for picking up on this. We've done some more research, and we now
believe the issue to be upstream, so we've opened a bug report directly with
lvm: https://github.com/lvmteam/lvm2/issues/104
If you check the lvm debug log we posted there, you'll see that it correctly
picks up the filter, finds and scans the right device (sda3), but then rejects
it since at the time of scanning,
/dev/disk/by-path/pci-0000:04:00.0-sas-phy0-lun-0-part3 (the one in the
filter) doesn't exist. This might be a race condition, since on some reboots
it sees part1 and part2, on some only part1, but never part3.
I could also reproduce the problem on Arch (Fedora, surprisingly, ships an
LVM version that is too old).

To your questions:

> > - manually activating the root VG in busybox allows us to boot
> > (by copy/pasting the IMPORT{program} lines from the udev rule)
>
> Which one? "pvscan"? That one does not activate anything.
Correct, but I don't think that's relevant any longer.

> > - replacing /usr/sbin/lvm and /lib/udev/rules.d/69-lvm.rules on
> > bookworm with the bullseye versions fixes the problem
>
> What are you replacing exactly? The bullseye version did not include
> /lib/udev/rules.d/69-lvm.rules at all, see
> https://packages.debian.org/bullseye/amd64/lvm2/filelist.
Correct, I used bullseye's 69-lvm-metad.rules and renamed it to 69-lvm.rules
on bookworm.

> Please provide the output of "pvs", "vgs", "lvs" and the kernel log.
Again, I don't think it's relevant, but to help understand the situation
better:

  PV                                                      VG               Fmt  Attr PSize  PFree
  /dev/disk/by-path/pci-0000:04:00.0-sas-phy0-lun-0-part3 test-bookworm-vg lvm2 a--  <2.73t <2.45t

  VG               #PV #LV #SN Attr   VSize  VFree
  test-bookworm-vg   1  10   4 wz--n- <2.73t <2.45t

  LV     VG               Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home   test-bookworm-vg owi-aos---  10.00g
  root   test-bookworm-vg owi-aos---  23.28g
  swap_1 test-bookworm-vg -wi-ao---- 976.00m
  var    test-bookworm-vg owi-aos---   9.31g

and

pci-0000:04:00.0-sas-phy0-lun-0 -> ../../sda
pci-0000:04:00.0-sas-phy0-lun-0-part1 -> ../../sda1
pci-0000:04:00.0-sas-phy0-lun-0-part2 -> ../../sda2
pci-0000:04:00.0-sas-phy0-lun-0-part3 -> ../../sda3

Device       Start        End    Sectors  Size Type
/dev/sda1     2048       4095       2048    1M BIOS boot
/dev/sda2     4096    1003519     999424  488M Linux filesystem
/dev/sda3  1003520 5860532223 5859528704  2.7T Linux LVM


thanks and kind regards,
-Christian



--
Dr. Christian Herzog <her...@phys.ethz.ch> support: +41 44 633 26 68
Head, IT Services Group, HPT H 8 voice: +41 44 633 39 50
Department of Physics, ETH Zurich
8093 Zurich, Switzerland http://isg.phys.ethz.ch/

Christian Herzog

Jan 17, 2023, 2:20:04 AM

Dear Bastian,

update: we were told by upstream that there is a known instability between lvm
and udev-generated symlinks and a devices file should be used instead. So
that's what we're going to do.
In related news, I'll create another bug report shortly, but it's a small one.

thanks,

Bastian Blank

Jan 17, 2023, 4:30:05 AM

On Tue, Jan 17, 2023 at 08:13:33AM +0100, Christian Herzog wrote:
> update: we were told by upstream that there is a known instability between lvm
> and udev-generated symlinks and a devices file should be used instead. So
> that's what we're going to do.

I think I actually know what the problem is. pvscan is run during the
udev event handling, especially in the initramfs, where no systemd is
available to move that out. Modifications to devices and symlinks are
only applied at the end of the event. So symlinks will always be missing
on the first event.

If you have systemd running, pvscan is started via systemd-run instead,
and then it is just a race between udev and systemd over which one
finishes first.

The only way to fix this is to provide the symlink information to pvscan
in addition to the device itself and let it figure that out.
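
As a rough illustration of why the missing symlink matters, here is a
minimal Python sketch of the filter decision. It is not LVM's code; it only
assumes the semantics described in the lvm.conf comments (for each path
name the first matching regex decides, any name accepted by an "a" pattern
accepts the device, and a device with no matching name is accepted). The
helper names parse_filter, first_match and device_accepted are made up for
this example.

```python
import re

def parse_filter(entries):
    """Turn lvm.conf-style entries such as 'a|regex|' into (action, regex) pairs."""
    rules = []
    for entry in entries:
        action, delim = entry[0], entry[1]
        # The pattern sits between the first and the last delimiter.
        pattern = entry[2:entry.rindex(delim)]
        rules.append((action, re.compile(pattern)))
    return rules

def first_match(rules, path):
    """Return 'a' or 'r' from the first rule matching path, or None if none match."""
    for action, regex in rules:
        if regex.search(path):
            return action
    return None

def device_accepted(rules, names):
    """Accept if any name hits an 'a' rule first, or if no name matches any rule."""
    results = [first_match(rules, name) for name in names]
    if "a" in results:
        return True
    return all(result is None for result in results)

rules = parse_filter(["a|pci-0000:04.*|", "r|.*|"])

# First udev event: only the kernel name is known, so "r|.*|" wins.
early = device_accepted(rules, ["/dev/sda3"])

# Once the by-path symlink is known as an alias, the "a" pattern matches.
late = device_accepted(rules, [
    "/dev/sda3",
    "/dev/disk/by-path/pci-0000:04:00.0-sas-phy0-lun-0-part3",
])
print(early, late)
```

With only /dev/sda3 known the device is rejected; with the by-path name
supplied alongside it, the same filter accepts it.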

Regards,
Bastian

--
No problem is insoluble.
-- Dr. Janet Wallace, "The Deadly Years", stardate 3479.4

Friedrich Weber

Aug 29, 2023, 11:40:05 AM

Hi,

I'm seeing this bug in a different use case on Debian Bookworm with LVM
2.03.16-2: multipath is set up, the multipath device is an LVM
physical volume in a volume group with a thin pool. To prevent LVM from
picking up on the multipath components, /etc/lvm/lvm.conf has a
global_filter that rejects the multipath components by matching on their
/dev/disk/by-id symlink paths.

I have replicated this setup in a VM, with the following global_filter
in /etc/lvm/lvm.conf:

devices {
    global_filter=["r|/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1|","r|/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi2|"]
}

The relevant portion of /dev/disk/by-id:

lrwxrwxrwx 1 root root 9 Aug 29 16:31 scsi-0QEMU_QEMU_HARDDISK_drive-scsi1 -> ../../sdb
lrwxrwxrwx 1 root root 9 Aug 29 16:31 scsi-0QEMU_QEMU_HARDDISK_drive-scsi2 -> ../../sdc

After running update-initramfs and rebooting, pvs and other LVM
tools report the following warning:

# pvs
  WARNING: Device mismatch detected for somegroup/somethinpool_tmeta which is accessing /dev/sdb instead of /dev/mapper/mpatha.
  WARNING: Device mismatch detected for somegroup/somethinpool_tdata which is accessing /dev/sdb instead of /dev/mapper/mpatha.
  PV                 VG        Fmt  Attr PSize  PFree
  /dev/mapper/mpatha somegroup lvm2 a--  <4.00g <2.99g

From reading this report and the now-resolved upstream report, this
seems to happen because the /dev/disk/by-id symlinks are not available
by the time the LVM udev hooks run, so the r|...| filters do not have
any effect. Indeed, if I use r|/dev/sdb| and r|/dev/sdc| instead, run
update-initramfs and reboot, the warning does not appear anymore.
However, being able to use the /dev/disk/by-id paths would be preferable.

With the following four patches applied, I can use /dev/disk/by-id in
the filters and the warning does not appear:

https://sourceware.org/git/?p=lvm2.git;a=commit;h=17a3585cbb55d9a15ced9775a18b50c53a50ee8e
https://sourceware.org/git/?p=lvm2.git;a=commit;h=c9fdc828ff0504bc2e57f65862bc382f7663a8a2
https://sourceware.org/git/?p=lvm2.git;a=commit;h=6d14144d311fb347e4225ad6a48d4900b39445c4
https://sourceware.org/git/?p=lvm2.git;a=commit;h=bd05318ba2fc588be6339f5dc61f09195996b0e9

The first three patches are mentioned in the upstream bug report [1] and
cause pvscan to read symlink names from udev's DEVLINKS environment
variable under certain conditions. One of the conditions is that at
least one of the filter regexes refers to a symlink. However, this check
only considers a|...| filters [2], so it doesn't trigger if only r|...|
filters are used as above. Hence, in my case the fourth patch is also
needed, as it removes the filter regex check altogether.
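
To make the DEVLINKS idea concrete, here is a small Python sketch, not the
patched pvscan code. It assumes the udev event environment carries the
device node in DEVNAME and a space-separated list of symlinks in DEVLINKS,
and it only models a reject-only filter like the one above. The function
names and the example environment are hypothetical.

```python
import re

def names_from_udev_env(env):
    """Collect the device node plus any symlinks udev lists in DEVLINKS."""
    names = []
    if env.get("DEVNAME"):
        names.append(env["DEVNAME"])
    # DEVLINKS is assumed to be a space-separated list of symlink paths.
    names.extend(env.get("DEVLINKS", "").split())
    return names

def rejected(reject_patterns, names):
    """For a reject-only filter: True if any regex matches any known name."""
    return any(re.search(p, n) for p in reject_patterns for n in names)

REJECT = [
    "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1",
    "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi2",
]

# Hypothetical event environment for /dev/sdb.
env = {
    "DEVNAME": "/dev/sdb",
    "DEVLINKS": "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1",
}

without_links = rejected(REJECT, ["/dev/sdb"])             # symlink not yet visible
with_links = rejected(REJECT, names_from_udev_env(env))    # symlink taken from DEVLINKS
print(without_links, with_links)
```

Without the symlink name the reject pattern has nothing to match, so
/dev/sdb slips through; with the names taken from DEVLINKS it is rejected.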

Is there a chance the patches could be backported? All four patches seem
to be included in upstream release 2.03.19 [3].

Happy to provide any more information if needed!

Thanks and best wishes,

Friedrich

[1] https://github.com/lvmteam/lvm2/issues/104
[2] https://sourceware.org/git/?p=lvm2.git;a=blob;f=lib/filters/filter-regex.c;h=ecc32914b0e15ba9cbac5c101cffddf25eddd8ad;hb=6d14144d311fb347e4225ad6a48d4900b39445c4#l272
[3] https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/tags/v2_03_19

Friedrich Weber

Jan 10, 2024, 3:40:05 AM

Hi,

On Tue, 29 Aug 2023 17:25:23 +0200 Friedrich Weber <f.w...@proxmox.com>
wrote:
> I'm seeing this bug in a different usecase on Debian Bookworm with LVM
> 2.03.16-2: multipath is set up, the multipath device is an LVM
> physical volume in a volume group with a thin pool. To prevent LVM from
> picking up on the multipath components, /etc/lvm/lvm.conf has a
> global_filter that rejects the multipath components by matching on their
> /dev/disk/by-id symlink paths.

FWIW, for this use case there seems to be a viable workaround: Instead of
manually adding a global_filter that ignores multipath components, rely
on LVM's own multipath component detection (available since LVM 2.03.13
[1]) that reads /etc/multipath/wwids. Installing multipath-tools-boot
makes this file available in initramfs, and then detection also works in
early boot. The description of multipath-tools-boot [2] states that it
should not be installed if not booting from a multipath device, but
currently I don't see any downside of installing it here (not booting
from a multipath device).
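
For illustration only, a Python sketch of the kind of lookup this
workaround relies on. It is not LVM's implementation. It assumes the wwids
file holds one WWID per line wrapped in slashes, with "#" comment lines,
and the sample WWIDs below are invented. The function names are
hypothetical.

```python
def parse_wwids(text):
    """Return the set of WWIDs listed in multipath wwids-file text."""
    wwids = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        wwids.add(line.strip("/"))  # entries are assumed to look like /<wwid>/
    return wwids

def is_multipath_component(device_wwid, known_wwids):
    """A device whose WWID is listed is treated as a multipath component."""
    return device_wwid in known_wwids

# Invented sample content; a real file is maintained by multipath itself.
SAMPLE = """\
# Valid WWIDs:
/0QEMU_QEMU_HARDDISK_drive-scsi1/
"""

known = parse_wwids(SAMPLE)
print(is_multipath_component("0QEMU_QEMU_HARDDISK_drive-scsi1", known))
print(is_multipath_component("0QEMU_QEMU_HARDDISK_drive-scsi0", known))
```

The point of the sketch is that this check needs only the file contents
and the device's own WWID, not any /dev/disk symlink, which is why it can
work in early boot once the file is present in the initramfs.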

Still, is there a chance the mentioned patches could be backported?
Without them, global_filter is not functioning as expected.

Best,

Friedrich

[1] https://gitlab.com/lvmteam/lvm2/-/commit/90485650931d3fc04d00c92a729050c8743969e5
[2] https://packages.debian.org/bookworm/multipath-tools-boot