Bug#986230: davfs2: After silent crash of mount.davfs background process, accessing user space program hangs busy

Andreas Feldner

Apr 1, 2021, 3:20:03 AM

Package: davfs2
Version: 1.6.0-1
Severity: important

Dear Maintainer,

a move command was issued to move a directory from a local directory onto a mounted webdav resource.
The mount is system-wide, configured in /etc/fstab and handled by systemd.
After creating the target directory and successfully copying one file, mv hangs busy (100% of one CPU core).
Analysing the system showed that the corresponding mount.davfs background process was gone. No message as to
the cause could be found in /var/log/daemon.log, and dmesg shows no message at all from that time. The mv
process cannot be killed (neither with -KILL nor with -STOP); trying to attach with strace or gdb
just hangs strace/gdb (which can themselves be killed). Any command accessing the stale webdav mount point also
hangs unkillable, but without consuming CPU.
The mv command does not hold a file descriptor on the stale mount, so I conclude that it hangs in an "open"
system call. Its process status is R (running), not D (uninterruptible sleep)! The process is not consuming
more memory, not generating I/O, and not producing any diagnostics (neither itself nor via the kernel). It
remains a mystery what it is doing.
Unmounting the volume is not possible (device busy). After a lazy unmount, mounting the webdav resource again
is possible and the mount point is usable for any future access. Moving the same directory again worked without
problems, so the presumed crash of mount.davfs was not related to the content.
The original mv command is still running after 24 hours, constantly consuming one CPU core.
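
For anyone checking a similar hang: the process state mentioned above (R versus D) is the third field
of /proc/<pid>/stat. The following is only a minimal sketch of how that field can be read; the
function names are hypothetical and not part of davfs2 or any existing tool.

```python
from pathlib import Path


def parse_stat_state(stat_line: str) -> tuple[str, str]:
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split at the LAST closing parenthesis instead of
    # splitting the whole line on whitespace.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    comm = stat_line[open_idx + 1:close_idx]
    # The first field after the command name is the one-letter state,
    # e.g. R (running) or D (uninterruptible sleep).
    state = stat_line[close_idx + 1:].split()[0]
    return comm, state


def process_state(pid: int) -> tuple[str, str]:
    """Hypothetical helper: read and parse the stat file of one process."""
    return parse_stat_state(Path(f"/proc/{pid}/stat").read_text())
```

Calling process_state() with the PID of the hanging mv would return its command name and state letter.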

I'd expect to see a diagnostic log entry from the crashing mount.davfs process. I'd expect the user space
programs accessing the stale mount point to receive an I/O error.

Yours,
Andreas.

-- System Information:
Debian Release: bullseye/sid
APT prefers testing
APT policy: (990, 'testing'), (500, 'testing-security'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.10.0-4-amd64 (SMP w/4 CPU threads)
Kernel taint flags: TAINT_FIRMWARE_WORKAROUND
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8), LANGUAGE=de:en_US
Shell: /bin/sh linked to /bin/bash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages davfs2 depends on:
ii adduser 3.118
ii debconf [debconf-2.0] 1.5.75
ii libc6 2.31-10
ii libneon27 0.31.2-1

davfs2 recommends no packages.

davfs2 suggests no packages.

-- Configuration Files:
/etc/davfs2/davfs2.conf changed:
connect_timeout 10
read_timeout 30
retry 5
max_retry 300
# httpauth, locks, ssl, httpbody, secrets, most

/etc/davfs2/secrets [Errno 13] Permission denied: '/etc/davfs2/secrets'

-- debconf information:
davfs2/new_user: true
davfs2/group_name: davfs2
davfs2/non_root_users_confimed:
davfs2/user_name: davfs2
* davfs2/suid_file: true
davfs2/new_group: true

Werner Baumann

Apr 9, 2021, 5:50:03 AM

When the userspace daemon mount.davfs dies before the file system is
unmounted, the kernel file system fuse will hang forever and a normal
unmount is not possible. I think this is a bug in fuse, and I filed a
report in the kernel bug tracker long ago.
https://bugzilla.kernel.org/show_bug.cgi?id=115971
There is always a chance that a userspace process crashes. So the kernel
should be able to handle this gracefully.

The other question is why mount.davfs crashed.

- Is the problem reproducible and can you test whether it only happens
when managed by systemd?

- mount.davfs may crash silently when it runs out of memory. For every
file (or directory) it learns about, it requires about 200 to 300
bytes of working memory, and it does not free any of it before it
terminates.

- When mount.davfs is stopped with signal SIGTERM or SIGHUP there
should be a message in daemon.log.

- You could use the debug options to get a lot of debug messages in
daemon.log, which might show at what action the daemon dies. These
logs may get very lengthy. The options "debug kernel" and "debug cache"
could be a start.
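
To put the memory figure from the list above into perspective, here is a rough back-of-the-envelope
sketch. It assumes nothing beyond the quoted 200 to 300 bytes per known file or directory; the
function name is made up for illustration, and real usage will depend on name lengths and other
details.

```python
def davfs_memory_estimate(entries: int, low: int = 200, high: int = 300) -> tuple[int, int]:
    """Return a (lower, upper) estimate in bytes of working memory held
    by mount.davfs after it has learned about `entries` files or
    directories, using the 200-300 bytes per entry quoted above."""
    return entries * low, entries * high


# One million known files and directories:
lower, upper = davfs_memory_estimate(1_000_000)
print(f"{lower / 1e6:.0f} to {upper / 1e6:.0f} MB")
```

By this estimate, a tree of one million entries would tie up roughly 200 to 300 MB until the daemon terminates.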

Werner

Pelzi

Sep 18, 2021, 8:20:03 AM

I cannot reproduce that behaviour, nor has it happened since. My speculation is that some trouble on the WebDAV server side (Nextcloud) may have caused the crash, but I was unable to pin down the exact operation during which the crash happened.

I still believe it would be worth the effort to stabilize the kernel (fuse module) against crashed user processes. Looking at the existing report (https://bugzilla.kernel.org/show_bug.cgi?id=115971), I’d like to add that the impact I observed was even more severe:
- a user space program (mv) accessing the affected mount was shown as hanging busy, consuming CPU
- it was *not* possible to kill this process
- the status of the process did not seem to match its actual state
- it was *not* possible to unmount or force unmount the mount point
- the only means to recover the whole machine from this state was to reboot, which is quite drastic IMHO.
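
As a side note for readers hitting a similar state: the kernel exposes a FUSE control filesystem,
normally at /sys/fs/fuse/connections, where each connection has a numbered directory containing a
"waiting" file (requests not yet answered by the daemon) and an "abort" file (writing anything to it
aborts the connection). The sketch below only illustrates that interface. The function names are
hypothetical, the files are accessible only to the owner of the mount, and whether an abort would
have released the busy-looping mv in this incident is unknown.

```python
from pathlib import Path

# Usual location of the FUSE control filesystem (assumed to be mounted).
FUSE_CTL = Path("/sys/fs/fuse/connections")


def waiting_requests(ctl_root: Path = FUSE_CTL) -> dict[str, int]:
    """Map each FUSE connection id to its number of waiting requests."""
    result = {}
    for conn in sorted(ctl_root.iterdir()):
        waiting = conn / "waiting"
        if waiting.is_file():
            result[conn.name] = int(waiting.read_text().strip())
    return result


def abort_connection(conn_id: str, ctl_root: Path = FUSE_CTL) -> None:
    """Abort one FUSE connection by writing to its 'abort' file."""
    (ctl_root / conn_id / "abort").write_text("1")
```

A connection whose waiting count stays above zero while its daemon is gone would be the candidate for abort_connection().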

I am far from being an expert, but to me this looks like broken error handling in the fuse kernel module. Instead of returning an I/O error to the calling program (mv), the syscall seems to spin in a busy loop.

I’m not sure how to deal with this issue - it seems to be less an issue of davfs2 than of the fuse kernel module?