Intermittent kernel zombie dm entries after lazy unmount - LVM images on EL9

Matt Tolle

Feb 5, 2026, 12:41:43 PM
to kiwi
Hi all,

I'm running into an issue with KIWI 10.2.33 on a Rocky 9 build host: intermittently, the cleanup phase leaves behind device-mapper entries that can't be removed without rebooting the host. I'm hoping someone can point me in the right direction or tell me whether this is a known issue.

My setup: I'm building LVM-based GCE/Azure/AWS disk images for Rocky 9, RHEL 9, and OEL 9. The builds run sequentially on the same host. Each image has a rootvg volume group with lv_root, lv_var, lv_tmp, lv_home, and lv_opt logical volumes.

The problem shows up during KIWI's internal cleanup. When KIWI tries to unmount /app/tmp/kiwi_volumes.XXXXX, it sometimes gets "target is busy" errors. After 5 retries it falls back to lazy unmount. Here's what that looks like in the logs:

    [ WARNING ]: 01:06:00 | 0 umount of /app/tmp/kiwi_volumes.qw595_hg failed with: target is busy
    [ WARNING ]: 01:06:01 | 1 umount of /app/tmp/kiwi_volumes.qw595_hg failed with: target is busy
    [ WARNING ]: 01:06:02 | 2 umount of /app/tmp/kiwi_volumes.qw595_hg failed with: target is busy
    [ WARNING ]: 01:06:03 | 3 umount of /app/tmp/kiwi_volumes.qw595_hg failed with: target is busy
    [ WARNING ]: 01:06:04 | 4 umount of /app/tmp/kiwi_volumes.qw595_hg failed with: target is busy
    [ DEBUG   ]: 01:06:05 | EXEC: [umount --lazy /app/tmp/kiwi_volumes.qw595_hg]

The lazy unmount "succeeds" but leaves behind a kernel reference on the dm device. After KIWI exits, I'm left with this:

    $ dmsetup info rootvg-lv_root
    Name:              rootvg-lv_root
    State:             ACTIVE
    Open count:        1

    $ fuser -v /dev/mapper/rootvg-lv_root
    (nothing - no userspace process has it open)

    $ dmsetup remove rootvg-lv_root
    device-mapper: remove ioctl on rootvg-lv_root failed: Device or resource busy

So there's a kernel-level hold with open count 1, but nothing in userspace is using it. I've tried dmsetup remove --force, vgchange -an, suspending the device first - nothing works. Only a reboot clears it.
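
In case it helps anyone reproduce this, here's the quick check I run after a failed build to spot the zombie devices. It's just a sketch wrapping dmsetup and sysfs, and the "rootvg" name is specific to my images:

    #!/usr/bin/env python3
    # Post-mortem check after a failed build: list every dm device that belongs
    # to the image volume group, its kernel open count, and any stacked block
    # devices still holding it in sysfs. The zombie state described above shows
    # up as open_count > 0 with no holders and nothing reported by fuser.
    # The volume group name "rootvg" is specific to my images.
    import glob
    import os
    import subprocess

    def image_dm_devices(vg="rootvg"):
        out = subprocess.run(
            ["dmsetup", "info", "-c", "--noheadings", "--separator", ":",
             "-o", "name,open,blkdevname"],
            capture_output=True, text=True, check=True,
        ).stdout
        for line in out.splitlines():
            fields = [f.strip() for f in line.split(":")]
            if len(fields) < 3 or not fields[0].startswith(vg + "-"):
                continue
            name, open_count, blkdev = fields[0], int(fields[1]), fields[2]
            holders = [os.path.basename(h)
                       for h in glob.glob(f"/sys/block/{blkdev}/holders/*")]
            yield name, open_count, holders

    if __name__ == "__main__":
        for name, count, holders in image_dm_devices():
            print(f"{name}: open_count={count} holders={holders or 'none'}")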

This blocks subsequent builds because they can't create a new rootvg volume group while the old dm entries exist.

I dug into the debug logs to figure out what's causing the "target is busy" errors and found something interesting. At the exact moment KIWI is trying to unmount kiwi_volumes, it's also mounting a NEW kiwi_mount_manager on the same LVM volume:

    [ DEBUG   ]: 01:06:00 | EXEC: [umount /app/tmp/kiwi_volumes.qw595_hg]
    [ DEBUG   ]: 01:06:00 | EXEC: Failed with stderr: umount: target is busy
    [ DEBUG   ]: 01:06:00 | EXEC: [mount /dev/rootvg/lv_root /app/tmp/kiwi_mount_manager.28_k0mus]

Same second, same LVM volume. So KIWI seems to be racing with itself - it's got concurrent operations trying to mount and unmount the same underlying block device. That would explain the "target is busy" error.
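
To make concrete what I mean by serialization (question 1 below): something like a per-device lock around the mount and umount calls, so cleanup of one mount point can't collide with a fresh mount of the same LV. None of these names exist in KIWI; this is purely my own illustration:

    import fcntl
    import subprocess
    from contextlib import contextmanager

    # Illustration only, not KIWI code: serialize mount/umount per underlying
    # block device with a flock, so an unmount of /dev/rootvg/lv_root cannot
    # run concurrently with a new mount of the same LV.
    @contextmanager
    def device_lock(device):
        lock_path = "/run/lock/kiwi-" + device.replace("/", "_") + ".lock"
        with open(lock_path, "w") as lock_file:
            fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until device is free
            try:
                yield
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)

    def serialized_mount(device, mountpoint):
        with device_lock(device):
            subprocess.run(["mount", device, mountpoint], check=True)

    def serialized_umount(device, mountpoint):
        with device_lock(device):
            subprocess.run(["umount", mountpoint], check=True)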

What's weird is that it doesn't happen every time. In my last run:
- Rocky 9: zero "target is busy" errors, built fine
- RHEL 9: zero "target is busy" errors, built fine
- OEL 9: 21 "target is busy" errors, fell back to lazy unmount, left zombie dm entry

All three use the same KIWI config and profiles. They run one after another on the same host. Rocky and RHEL had no issues at all, while OEL hit the race condition hard. It's not always the third build that fails; sometimes it's the first, sometimes the second.

I looked at mount_manager.py and see that the umount() method does 5 retries with a 1-second sleep, then falls back to umount --lazy. I tried patching it locally to do 10 retries and add fuser -km calls to kill any holders (sketched below), but that didn't fully solve it, because the "holder" is apparently KIWI's own concurrent mount operation, not an external process.
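
For reference, my local patch behaves roughly like this standalone sketch. This is not KIWI's actual code, just the same shape: more retries, a fuser -km attempt, and the option to fail hard instead of going lazy:

    import subprocess
    import time

    # Standalone sketch of the patched retry loop; the real logic lives in
    # KIWI's mount_manager.py. Differences from stock behavior: more retries,
    # a fuser -km attempt to kill userspace holders, and the option to fail
    # hard instead of falling back to umount --lazy.
    def umount_with_retries(mountpoint, retries=10, delay=1.0, allow_lazy=False):
        for attempt in range(retries):
            result = subprocess.run(
                ["umount", mountpoint], capture_output=True, text=True
            )
            if result.returncode == 0:
                return
            print(f"{attempt} umount of {mountpoint} failed with: "
                  f"{result.stderr.strip()}")
            # Kill userspace holders; this did not help here, because the
            # holder turned out to be KIWI's own concurrent mount operation.
            subprocess.run(["fuser", "-km", mountpoint])
            time.sleep(delay)
        if allow_lazy:
            subprocess.run(["umount", "--lazy", mountpoint], check=True)
            return
        raise RuntimeError(
            f"{mountpoint} still busy after {retries} umount attempts"
        )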

A few questions:

1. Is the concurrent mount/unmount on the same LV expected? It seems like there might be missing serialization between different KIWI subsystems.

2. Is there a way to disable the lazy unmount fallback? I'd rather have the build fail than leave zombie entries that require a reboot. Or at least have an option to control this behavior.

3. Has anyone else seen this on EL9? We recently updated to lvm2-2.03.32-2.el9_7.1 and device-mapper-1.02.206-2.el9_7.1 (Feb 3rd), and I'm wondering if something changed in the LVM/dm layer that made this more likely to happen.

4. Any suggestions for workarounds? Right now I'm adding a 60-second delay between builds to give the kernel more time to settle, but that's just a guess; I've sketched a more targeted gate right after this list.
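
Along the lines of question 4, this is the gate I've started running between builds instead of a fixed sleep. Again just a sketch around dmsetup and udevadm, with the "rootvg" name assumed:

    import subprocess
    import time

    # Gate the next build on the dm table actually being clean, rather than
    # sleeping a fixed 60 seconds. Returns False if stale entries remain after
    # the timeout, in which case the host needs a reboot anyway.
    # The volume group name "rootvg" is specific to my images.
    def wait_for_clean_dm(vg="rootvg", timeout=300, poll=5):
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            out = subprocess.run(
                ["dmsetup", "ls"], capture_output=True, text=True, check=True
            ).stdout
            leftovers = [line.split()[0] for line in out.splitlines()
                         if line.startswith(vg + "-")]
            if not leftovers:
                return True
            # Let udev finish processing remove events before checking again.
            subprocess.run(["udevadm", "settle"])
            time.sleep(poll)
        return False

    if __name__ == "__main__":
        if not wait_for_clean_dm():
            raise SystemExit("stale rootvg dm entries remain; reboot required")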

For reference, my environment:
- KIWI: 10.2.33
- Host: Rocky Linux 9
- Kernel: 5.14.0-611.24.1.el9_7.x86_64
- LVM2: 2.03.32-2.el9_7.1
- device-mapper: 1.02.206-2.el9_7.1

Happy to provide more debug logs or test patches if that would help.

Thanks,

Matt Tolle

Feb 5, 2026, 1:37:48 PM
to kiwi
Not sure if this helps. I had AI review all the other posts in this group to see if someone else has already resolved this issue. Here is what AI found: 

  Summary of what I found in the KIWI mailing list:
  1. 2020 thread (lvmetad) - Marcus explained that device mapper cleanup issues on LVM builds are often caused by
     lvmetad holding locks. Solution: disable lvmetad. However, we already have use_lvmetad = 0, so this isn't our
     issue.
  2. 2019 thread (device busy) - Users saw "device-mapper: remove ioctl failed: Device or resource busy" and solved it
     with dmsetup remove_all -f. Marcus noted that if KIWI isn't killed with SIGKILL, it should clean up properly.
  3. 2012 thread (vgchange -a n needs retry) - Very similar timing issue. The solution was adding udevadm settle
     before vgchange calls. This was incorporated into KIWI.

  The key difference with our issue:
  • Those older issues were about external processes (lvmetad, udev) holding locks
  • Our issue is KIWI racing with itself - it's simultaneously mounting a new kiwi_mount_manager while trying to
    unmount kiwi_volumes on the same LVM volume

  This appears to be a new behavior pattern, possibly related to:
  • Recent LVM/device-mapper update (Feb 3, 2026)
  • Or changes in how KIWI orchestrates its internal operations

  Your post to the list is the first to report this specific concurrent mount/unmount race condition. Hopefully Marcus
  or others will respond with insight.

-Matt
