Intermittent kernel zombie dm entries after lazy unmount - LVM images on EL9


Matt Tolle

Feb 5, 2026, 12:41:43 PM
to kiwi
Hi all,

I'm running into an issue with KIWI 10.2.33 on a Rocky 9 build host where intermittently the cleanup phase leaves behind device mapper entries that can't be removed without rebooting the host. I'm hoping someone can point me in the right direction or tell me if this is a known issue.

My setup: I'm building LVM-based GCE/Azure/AWS disk images for Rocky 9, RHEL 9, and OEL 9. The builds run sequentially on the same host. Each image has a rootvg volume group with lv_root, lv_var, lv_tmp, lv_home, and lv_opt logical volumes.

The problem shows up during KIWI's internal cleanup. When KIWI tries to unmount /app/tmp/kiwi_volumes.XXXXX, it sometimes gets "target is busy" errors. After 5 retries it falls back to lazy unmount. Here's what that looks like in the logs:

    [ WARNING ]: 01:06:00 | 0 umount of /app/tmp/kiwi_volumes.qw595_hg failed with: target is busy
    [ WARNING ]: 01:06:01 | 1 umount of /app/tmp/kiwi_volumes.qw595_hg failed with: target is busy
    [ WARNING ]: 01:06:02 | 2 umount of /app/tmp/kiwi_volumes.qw595_hg failed with: target is busy
    [ WARNING ]: 01:06:03 | 3 umount of /app/tmp/kiwi_volumes.qw595_hg failed with: target is busy
    [ WARNING ]: 01:06:04 | 4 umount of /app/tmp/kiwi_volumes.qw595_hg failed with: target is busy
    [ DEBUG   ]: 01:06:05 | EXEC: [umount --lazy /app/tmp/kiwi_volumes.qw595_hg]

The lazy unmount "succeeds" but leaves behind a kernel reference on the dm device. After KIWI exits, I'm left with this:

    $ dmsetup info rootvg-lv_root
    Name:              rootvg-lv_root
    State:             ACTIVE
    Open count:        1

    $ fuser -v /dev/mapper/rootvg-lv_root
    (nothing - no userspace process has it open)

    $ dmsetup remove rootvg-lv_root
    device-mapper: remove ioctl on rootvg-lv_root failed: Device or resource busy

So there's a kernel-level hold with open count 1, but nothing in userspace is using it. I've tried dmsetup remove --force, vgchange -an, suspending the device first - nothing works. Only a reboot clears it.
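
In case it helps anyone else debugging this, here's the small helper I've been using to spot leftover dm devices between builds. It's only a sketch: the function names are mine, and it assumes the columnar output of `dmsetup info -c --noheadings -o name,open --separator :`. The parser is a pure function, so it can be exercised without root or real devices.

```python
"""Sketch: list device mapper devices the kernel still holds open."""
import subprocess


def parse_open_counts(output):
    """Parse "name:open_count" lines into a {name: count} dict."""
    counts = {}
    for line in output.strip().splitlines():
        name, sep, count = line.strip().rpartition(":")
        if sep:
            counts[name] = int(count)
    return counts


def busy_devices():
    """Return dm devices with a nonzero open count (needs root)."""
    out = subprocess.run(
        ["dmsetup", "info", "-c", "--noheadings",
         "-o", "name,open", "--separator", ":"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {name: n for name, n in parse_open_counts(out).items() if n > 0}
```

Running `busy_devices()` right before a build starts would have caught the zombie rootvg-lv_root entry (open count 1) from the previous run.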

This blocks subsequent builds because they can't create a new rootvg volume group while the old dm entries exist.

I dug into the debug logs to figure out what's causing the "target is busy" and found something interesting. At the exact moment KIWI is trying to unmount kiwi_volumes, it's also mounting a NEW kiwi_mount_manager on the same LVM volume:

    [ DEBUG   ]: 01:06:00 | EXEC: [umount /app/tmp/kiwi_volumes.qw595_hg]
    [ DEBUG   ]: 01:06:00 | EXEC: Failed with stderr: umount: target is busy
    [ DEBUG   ]: 01:06:00 | EXEC: [mount /dev/rootvg/lv_root /app/tmp/kiwi_mount_manager.28_k0mus]

Same second, same LVM volume. So KIWI seems to be racing with itself - it's got concurrent operations trying to mount and unmount the same underlying block device. That would explain the "target is busy" error.

What's weird is that it doesn't happen every time. In my last run:
- Rocky 9: zero "target is busy" errors, built fine
- RHEL 9: zero "target is busy" errors, built fine
- OEL 9: 21 "target is busy" errors, fell back to lazy unmount, left zombie dm entry

All three use the same KIWI config and profiles, and they run one after another on the same host. Rocky and RHEL had no issues at all; OEL hit the race condition hard. It's not always the third build that fails, either - sometimes it's the first, sometimes the second.

I looked at mount_manager.py and saw that the umount() method does 5 retries with a 1-second sleep, then falls back to umount --lazy. I tried patching it locally to do 10 retries and add fuser -km calls to kill any holders, but that didn't fully solve it, because the "holder" is apparently KIWI's own concurrent mount operation, not an external process.
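
For what it's worth, the shape of my local patch is roughly this - a simplified sketch of the retry loop, not KIWI's actual mount_manager.py code. The command runner is injected so the logic can be tested without real mounts:

```python
import time


def umount_with_retry(run, path, attempts=10, delay=1.0, allow_lazy=False):
    """Try a clean umount `attempts` times; optionally fall back to lazy.

    `run(cmd)` should return True on success, False on "target is busy".
    With allow_lazy=False the caller sees a hard failure instead of a
    lazy unmount that can leave a kernel reference on the dm device.
    """
    for _ in range(attempts):
        if run(["umount", path]):
            return True
        time.sleep(delay)
    if allow_lazy:
        # Lazy unmount "succeeds" immediately but may leave the
        # device open in the kernel - exactly the zombie case above.
        return run(["umount", "--lazy", path])
    return False
```

With allow_lazy=False the build would fail loudly instead of leaving a zombie entry, which is the behavior I'd like to be able to opt into (question 2 below).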

A few questions:

1. Is the concurrent mount/unmount on the same LV expected? It seems like there might be missing serialization between different KIWI subsystems.

2. Is there a way to disable the lazy unmount fallback? I'd rather have the build fail than leave zombie entries that require a reboot. Or at least have an option to control this behavior.

3. Has anyone else seen this on EL9? We recently updated to lvm2-2.03.32-2.el9_7.1 and device-mapper-1.02.206-2.el9_7.1 (Feb 3rd), and I'm wondering if something changed in the LVM/dm layer that made this more likely to happen.

4. Any suggestions for workarounds? Right now I'm adding a 60 second delay between builds to give the kernel more time to settle, but that's just a guess.
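
On point 4: instead of a fixed sleep, I'm experimenting with polling until the open count actually drops. A sketch of the idea - `get_open_count` is a placeholder for whatever reads the count (e.g. parsed from dmsetup info), injected here so the timing logic itself is testable:

```python
import time


def wait_for_dm_release(get_open_count, timeout=120.0, interval=2.0):
    """Poll until the dm open count reaches zero or the timeout expires.

    `get_open_count()` returns the device's current kernel open count.
    Returns True once the device is free, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_open_count() == 0:
            return True
        time.sleep(interval)
    return get_open_count() == 0
```

That at least avoids guessing at a delay, though it obviously can't release a reference the kernel never drops.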

For reference, my environment:
- KIWI: 10.2.33
- Host: Rocky Linux 9
- Kernel: 5.14.0-611.24.1.el9_7.x86_64
- LVM2: 2.03.32-2.el9_7.1
- device-mapper: 1.02.206-2.el9_7.1

Happy to provide more debug logs or test patches if that would help.

Thanks,

Matt Tolle

Feb 5, 2026, 1:37:48 PM
to kiwi
Not sure if this helps, but I had AI review all the other posts in this group to see whether someone else has already resolved this issue. Here's what it found:

  Summary of what I found in the KIWI mailing list:
  1. 2020 thread (lvmetad) - Marcus explained that device mapper cleanup issues on LVM builds are often caused by
     lvmetad holding locks. Solution: disable lvmetad. However, we already have use_lvmetad = 0, so this isn't our
     issue.
  2. 2019 thread (device busy) - Users saw "device-mapper: remove ioctl failed: Device or resource busy" and solved it
     with dmsetup remove_all -f. Marcus noted that if KIWI isn't killed with SIGKILL, it should clean up properly.
  3. 2012 thread (vgchange -a n needs retry) - Very similar timing issue. The solution was adding udevadm settle
     before vgchange calls. This was incorporated into KIWI.

  The key difference with our issue:
  • Those older issues were about external processes (lvmetad, udev) holding locks
  • Our issue is KIWI racing with itself - it's simultaneously mounting a new kiwi_mount_manager while trying to
    unmount kiwi_volumes on the same LVM volume

  This appears to be a new behavior pattern, possibly related to:
  • Recent LVM/device-mapper update (Feb 3, 2026)
  • Or changes in how KIWI orchestrates its internal operations

  Your post to the list is the first to report this specific concurrent mount/unmount race condition. Hopefully Marcus
  or others will respond with insight.

-Matt

Marcus Schäfer

Feb 17, 2026, 4:37:06 AM
to kiwi-...@googlegroups.com
Hi Matt,

> I'm running into an issue with KIWI 10.2.33 on a Rocky 9 build host
> where intermittently the cleanup phase leaves behind device mapper
> entries that can't be removed without rebooting the host. I'm hoping
> someone can point me in the right direction or tell me if this is a
> known issue.
> My setup: I'm building LVM-based GCE/Azure/AWS disk images for Rocky 9,

I saw this issue with LVM images if the host has lvmetad running which
goes wild when creating new volume groups and volumes in a loop device.
The usual way for me to solve this is:

systemctl stop lvm2-lvmetad
systemctl disable lvm2-lvmetad
systemctl mask lvm2-lvmetad

systemctl stop lvm2-lvmetad.socket
systemctl disable lvm2-lvmetad.socket
systemctl mask lvm2-lvmetad.socket

Of course this is only an option if the host itself is not LVM based.

Hope this helps

Regards,
Marcus
--
Public Key available via: https://keybase.io/marcus_schaefer/key.asc
keybase search marcus_schaefer

Matt Tolle

Feb 19, 2026, 11:16:32 AM
to kiwi
Hi Marcus,
Thank you so much for the quick response. That's really helpful context.

You mentioned that the solution (disabling lvmetad) is "only an option if the host itself is not LVM based." I'm currently in exactly that situation - my build host uses LVM for its root filesystem (root_vg with lv_root, lv_var, lv_tmp, etc.), so I can't safely disable the LVM services.

On EL9, lvmetad has been deprecated, but I do have lvm2-monitor.service and lvm2-lvmpolld.socket active, and I'm still seeing the intermittent "target is busy" issues that lead to zombie dm entries requiring host reboots.

I'm considering rebuilding my build host to use XFS directly (no LVM) while continuing to build LVM-based images for deployment. My understanding is that KIWI can build LVM images on a non-LVM host, since it creates the LVM structures inside loop devices, independently of the host's filesystem.

Would this approach eliminate the race condition? Or are there other considerations I should be aware of when running KIWI on a non-LVM host to build LVM images?

I really appreciate your work on KIWI - it's been a game-changer for our multi-cloud image pipeline. Just trying to get past this last stability hurdle.

Thanks again,
-Matt

Marcus Schäfer

Feb 20, 2026, 2:21:06 PM
to kiwi-...@googlegroups.com
Hi Matt,

> You mentioned that the solution (disabling lvmetad) is "only an
> option if the host itself is not LVM based."

Yeah, because in this case lvmetad manages your host's LVM, and for some reason creating and deleting new volumes and volume groups causes conflicts when kiwi tries to deactivate them, even though from a device perspective there is no conflict. It leads to the resources staying busy, and only a reboot can solve it.

> I'm
> currently running into exactly that situation - my build host uses
> LVM for its root filesystem (root_vg with lv_root,
> lv_var, lv_tmp, etc.), so I can't safely disable the LVM services.
> On EL9, lvmetad has been deprecated, but I do have
> lvm2-monitor.service and lvm2-lvmpolld.socket active, and I'm
> still seeing the intermittent "target is busy" issues that lead to
> zombie dm entries requiring host reboots.

I understand

> I'm considering rebuilding my build host to use XFS directly (no LVM)
> while continuing to build LVM-based images for

There is another solution available with the kiwi boxbuild plugin.
For details see:

https://osinside.github.io/kiwi/plugins/self_contained.html

If you are willing to run the build process in an isolated
environment like a VM or a container, you don't have to refactor
your build host.

Please also note that kiwi tries to interfere with the host as
little as possible, but because kiwi uses kernel interfaces
(mount, loop), some host infrastructure is not prepared for this
use case. LVM is unfortunately one of them. If it turns out to be
a big issue for you, you could also take a look at mkosi, an image
builder that avoids using kernel interfaces, though it comes with
other restrictions.

> deployment. My understanding is that KIWI can build LVM images on a
> non-LVM host since it creates the LVM structures
> inside loop devices independently of the host's filesystem.
> Would this approach eliminate the race condition?

Yes, I myself build LVM images on XFS-based host systems.

> Or are there other
> considerations I should be aware of when running
> KIWI on a non-LVM host to build LVM images?

Not to my knowledge

> I really appreciate your work on KIWI - it's been a game-changer for
> our multi-cloud image pipeline. Just trying to
> get past this last stability hurdle.

Thanks much, we try to make it useful for many use cases.
Let's hope we find an acceptable solution for this problem as well.

Best regards,
Marcus