INFO: task hung in fuse_reverse_inval_entry

syzbot

Jul 23, 2018, 3:59:02 AM
to linux-...@vger.kernel.org, linux-...@vger.kernel.org, mik...@szeredi.hu, syzkall...@googlegroups.com
Hello,

syzbot found the following crash on:

HEAD commit: d72e90f33aa4 Linux 4.18-rc6
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1324f794400000
kernel config: https://syzkaller.appspot.com/x/.config?x=68af3495408deac5
dashboard link: https://syzkaller.appspot.com/bug?extid=bb6d800770577a083f8c
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro: https://syzkaller.appspot.com/x/repro.syz?x=11564d1c400000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16fc570c400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+bb6d80...@syzkaller.appspotmail.com

random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
INFO: task syz-executor842:4559 blocked for more than 140 seconds.
Not tainted 4.18.0-rc6+ #160
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor842 D23528 4559 4556 0x00000004
Call Trace:
context_switch kernel/sched/core.c:2853 [inline]
__schedule+0x87c/0x1ed0 kernel/sched/core.c:3501
schedule+0xfb/0x450 kernel/sched/core.c:3545
__rwsem_down_write_failed_common+0x95d/0x1630 kernel/locking/rwsem-xadd.c:566
rwsem_down_write_failed+0xe/0x10 kernel/locking/rwsem-xadd.c:595
call_rwsem_down_write_failed+0x17/0x30 arch/x86/lib/rwsem.S:117
__down_write arch/x86/include/asm/rwsem.h:142 [inline]
down_write+0xaa/0x130 kernel/locking/rwsem.c:72
inode_lock include/linux/fs.h:715 [inline]
fuse_reverse_inval_entry+0xae/0x6d0 fs/fuse/dir.c:969
fuse_notify_inval_entry fs/fuse/dev.c:1491 [inline]
fuse_notify fs/fuse/dev.c:1764 [inline]
fuse_dev_do_write+0x2b97/0x3700 fs/fuse/dev.c:1848
fuse_dev_write+0x19a/0x240 fs/fuse/dev.c:1928
call_write_iter include/linux/fs.h:1793 [inline]
new_sync_write fs/read_write.c:474 [inline]
__vfs_write+0x6c6/0x9f0 fs/read_write.c:487
vfs_write+0x1f8/0x560 fs/read_write.c:549
ksys_write+0x101/0x260 fs/read_write.c:598
__do_sys_write fs/read_write.c:610 [inline]
__se_sys_write fs/read_write.c:607 [inline]
__x64_sys_write+0x73/0xb0 fs/read_write.c:607
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x445869
Code: Bad RIP value.
RSP: 002b:00007ffa2ef7fda8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000006dac24 RCX: 0000000000445869
RDX: 0000000000000029 RSI: 00000000200000c0 RDI: 0000000000000003
RBP: 00000000006dac20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0030656c69662f2e
R13: 64695f70756f7267 R14: 2f30656c69662f2e R15: 0000000000000001
INFO: task syz-executor842:4560 blocked for more than 140 seconds.
Not tainted 4.18.0-rc6+ #160
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor842 D26008 4560 4556 0x00000004
Call Trace:
context_switch kernel/sched/core.c:2853 [inline]
__schedule+0x87c/0x1ed0 kernel/sched/core.c:3501
schedule+0xfb/0x450 kernel/sched/core.c:3545
request_wait_answer+0x4c8/0x920 fs/fuse/dev.c:463
__fuse_request_send+0x12a/0x1d0 fs/fuse/dev.c:483
fuse_request_send+0x62/0xa0 fs/fuse/dev.c:496
fuse_simple_request+0x33d/0x730 fs/fuse/dev.c:554
fuse_lookup_name+0x3ee/0x830 fs/fuse/dir.c:323
fuse_lookup+0xf9/0x4c0 fs/fuse/dir.c:360
__lookup_hash+0x12e/0x190 fs/namei.c:1505
filename_create+0x1e5/0x5b0 fs/namei.c:3646
user_path_create fs/namei.c:3703 [inline]
do_mkdirat+0xda/0x310 fs/namei.c:3842
__do_sys_mkdirat fs/namei.c:3861 [inline]
__se_sys_mkdirat fs/namei.c:3859 [inline]
__x64_sys_mkdirat+0x76/0xb0 fs/namei.c:3859
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x445869
Code: Bad RIP value.
RSP: 002b:00007ffa2ef5eda8 EFLAGS: 00000297 ORIG_RAX: 0000000000000102
RAX: ffffffffffffffda RBX: 00000000006dac3c RCX: 0000000000445869
RDX: 0000000000000000 RSI: 0000000020000500 RDI: 00000000ffffff9c
RBP: 00000000006dac38 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000297 R12: 0030656c69662f2e
R13: 64695f70756f7267 R14: 2f30656c69662f2e R15: 0000000000000001

Showing all locks held in the system:
1 lock held by khungtaskd/901:
#0: (____ptrval____) (rcu_read_lock){....}, at:
debug_show_all_locks+0xd0/0x428 kernel/locking/lockdep.c:4461
1 lock held by rsyslogd/4441:
#0: (____ptrval____) (&f->f_pos_lock){+.+.}, at: __fdget_pos+0x1bb/0x200
fs/file.c:766
2 locks held by getty/4531:
#0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (____ptrval____) (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4532:
#0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (____ptrval____) (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4533:
#0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (____ptrval____) (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4534:
#0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (____ptrval____) (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4535:
#0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (____ptrval____) (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4536:
#0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (____ptrval____) (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by getty/4537:
#0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (____ptrval____) (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x335/0x1ce0 drivers/tty/n_tty.c:2140
2 locks held by syz-executor842/4559:
#0: (____ptrval____) (&fc->killsb){.+.+}, at: fuse_notify_inval_entry
fs/fuse/dev.c:1488 [inline]
#0: (____ptrval____) (&fc->killsb){.+.+}, at: fuse_notify
fs/fuse/dev.c:1764 [inline]
#0: (____ptrval____) (&fc->killsb){.+.+}, at:
fuse_dev_do_write+0x2b2d/0x3700 fs/fuse/dev.c:1848
#1: (____ptrval____) (&type->i_mutex_dir_key#4){+.+.}, at: inode_lock
include/linux/fs.h:715 [inline]
#1: (____ptrval____) (&type->i_mutex_dir_key#4){+.+.}, at:
fuse_reverse_inval_entry+0xae/0x6d0 fs/fuse/dir.c:969
3 locks held by syz-executor842/4560:
#0: (____ptrval____) (sb_writers#9){.+.+}, at: sb_start_write
include/linux/fs.h:1554 [inline]
#0: (____ptrval____) (sb_writers#9){.+.+}, at: mnt_want_write+0x3f/0xc0
fs/namespace.c:386
#1: (____ptrval____) (&type->i_mutex_dir_key#3/1){+.+.}, at:
inode_lock_nested include/linux/fs.h:750 [inline]
#1: (____ptrval____) (&type->i_mutex_dir_key#3/1){+.+.}, at:
filename_create+0x1b2/0x5b0 fs/namei.c:3645
#2: (____ptrval____) (&fi->mutex){+.+.}, at: fuse_lock_inode+0xaf/0xe0
fs/fuse/inode.c:363

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 901 Comm: khungtaskd Not tainted 4.18.0-rc6+ #160
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:196 [inline]
watchdog+0x9c4/0xf80 kernel/hung_task.c:252
kthread+0x345/0x410 kernel/kthread.c:246
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0x6/0x10
arch/x86/include/asm/irqflags.h:54


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

Dmitry Vyukov

Jul 23, 2018, 4:12:16 AM
to linux-fsdevel, Miklos Szeredi, LKML, syzkaller-bugs, syzbot
On Mon, Jul 23, 2018 at 9:59 AM, syzbot
<syzbot+bb6d80...@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: d72e90f33aa4 Linux 4.18-rc6
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1324f794400000
> kernel config: https://syzkaller.appspot.com/x/.config?x=68af3495408deac5
> dashboard link: https://syzkaller.appspot.com/bug?extid=bb6d800770577a083f8c
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=11564d1c400000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16fc570c400000


Hi fuse maintainers,

We are seeing a bunch of such deadlocks in fuse on syzbot. As far as I
understand this is mostly working-as-intended (parts about deadlocks
in Documentation/filesystems/fuse.txt). The intended way to resolve
this is aborting connections via fusectl, right? The doc says "Under
the fuse control filesystem each connection has a directory named by a
unique number". The question is: if I start a process and this process
can mount fuse, how do I kill it? I mean: totally and certainly get
rid of it right away? How do I find these unique numbers for the
mounts it created? Taking into account that there is usually no
operator attached to each server, I wonder if kernel could somehow
auto-abort fuse on kill? E.g. if all processes holding the fuse fd are
killed, it would be reasonable to abort the fuse conn and auto-resolve
deadlocks. Is it possible?

Miklos Szeredi

Jul 23, 2018, 8:12:47 AM
to Dmitry Vyukov, linux-fsdevel, LKML, syzkaller-bugs, syzbot
On Mon, Jul 23, 2018 at 10:11 AM, Dmitry Vyukov <dvy...@google.com> wrote:
> On Mon, Jul 23, 2018 at 9:59 AM, syzbot
> <syzbot+bb6d80...@syzkaller.appspotmail.com> wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit: d72e90f33aa4 Linux 4.18-rc6
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1324f794400000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=68af3495408deac5
>> dashboard link: https://syzkaller.appspot.com/bug?extid=bb6d800770577a083f8c
>> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=11564d1c400000
>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16fc570c400000
>
>
> Hi fuse maintainers,
>
> We are seeing a bunch of such deadlocks in fuse on syzbot. As far as I
> understand this is mostly working-as-intended (parts about deadlocks
> in Documentation/filesystems/fuse.txt). The intended way to resolve
> this is aborting connections via fusectl, right?

Yes. The alternative is "umount -f".

> The doc says "Under
> the fuse control filesystem each connection has a directory named by a
> unique number". The question is: if I start a process and this process
> can mount fuse, how do I kill it? I mean: totally and certainly get
> rid of it right away? How do I find these unique numbers for the
> mounts it created?

It is the device number found in st_dev for the mount. Other than
doing stat(2) it is possible to find out the device number by reading
/proc/$PID/mountinfo (third field).
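As a rough illustration of the mountinfo route described above, the helper below pulls the major:minor pair out of one mountinfo line and reports whether the line belongs to a fuse mount. The function names and the filesystem-type matching (fuse, fuseblk, and dotted subtypes such as fuse.sshfs) are assumptions made for this sketch, not something taken from the kernel or libfuse. For an ordinary fuse mount the major number should be 0, in which case, as far as I can tell, the minor is the number that names the connection directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if a mountinfo filesystem-type field names a fuse mount.
 * "fusectl" (the control filesystem itself) must not match. */
static int is_fuse_fstype(const char *type)
{
	/* Stop at the end of the field or at the subtype dot. */
	size_t len = strcspn(type, " .\n");

	return (len == 4 && strncmp(type, "fuse", 4) == 0) ||
	       (len == 7 && strncmp(type, "fuseblk", 7) == 0);
}

/* Parse one line of /proc/PID/mountinfo. Returns 1 and stores the device
 * numbers if the line describes a fuse mount, 0 otherwise. */
static int fuse_dev_from_mountinfo(const char *line,
				   unsigned int *major, unsigned int *minor)
{
	unsigned int maj, min;
	const char *sep;

	assert(line && major && minor);

	/* Fields 1-3: mount ID, parent ID, major:minor. */
	if (sscanf(line, "%*d %*d %u:%u", &maj, &min) != 2)
		return 0;

	/* The filesystem type is the first field after the " - " separator. */
	sep = strstr(line, " - ");
	if (!sep || !is_fuse_fstype(sep + 3))
		return 0;

	*major = maj;
	*minor = min;
	return 1;
}
```

A caller would read /proc/$PID/mountinfo line by line with fgets() and pass each line to fuse_dev_from_mountinfo().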

> Taking into account that there is usually no
> operator attached to each server, I wonder if kernel could somehow
> auto-abort fuse on kill?

Depends on what the fuse server is sleeping on. If it's trying to
acquire an inode lock (e.g. unlink(2)), which is the classical way to
deadlock a fuse filesystem, then it will go into an uninterruptible
sleep. There's no way in which that process can be killed except to
force a release of the offending lock, which can only be done by
aborting the request that is being performed while holding that lock.

Thanks,
Miklos

Dmitry Vyukov

Jul 23, 2018, 8:22:32 AM
to Miklos Szeredi, linux-fsdevel, LKML, syzkaller-bugs, syzbot
Thanks. I will try to figure out fusectl connection numbers and see if
it's possible to integrate aborting into syzkaller.
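For reference, the abort step itself could be automated roughly as follows. Documentation/filesystems/fuse.txt describes an "abort" file in each connection directory and says that writing anything into it aborts the connection. The path /sys/fs/fuse/connections used below is the conventional mount point for the control filesystem and is an assumption here, as are the helper names; this is a sketch, not what syzkaller actually does.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Assumed mount point of the fuse control filesystem (fusectl). */
#define FUSECTL_DIR "/sys/fs/fuse/connections"

/* Build the path of the "abort" file for connection number conn.
 * Returns the path length, or -1 if buf is too small. */
static int fuse_abort_path(char *buf, size_t len, unsigned int conn)
{
	int n;

	assert(buf && len > 0);
	n = snprintf(buf, len, FUSECTL_DIR "/%u/abort", conn);
	return (n > 0 && (size_t)n < len) ? n : -1;
}

/* Abort one fuse connection by writing to its "abort" file.
 * Returns 0 on success, -1 on any failure. */
static int fuse_abort_conn(unsigned int conn)
{
	char path[64];
	int fd, ret = 0;

	if (fuse_abort_path(path, sizeof(path), conn) < 0)
		return -1;
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	/* The documentation says any write aborts; the byte value is arbitrary. */
	if (write(fd, "1", 1) != 1)
		ret = -1;
	close(fd);
	return ret;
}
```

A test harness could call fuse_abort_conn() for every connection number it finds before killing the processes under test, so that tasks blocked on those connections get unstuck first.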

>> Taking into account that there is usually no
>> operator attached to each server, I wonder if kernel could somehow
>> auto-abort fuse on kill?
>
> Depends on what the fuse server is sleeping on. If it's trying to
> acquire an inode lock (e.g. unlink(2)), which is the classical way to
> deadlock a fuse filesystem, then it will go into an uninterruptible
> sleep. There's no way in which that process can be killed except to
> force a release of the offending lock, which can only be done by
> aborting the request that is being performed while holding that lock.

I understand that it is not killed today, but I am asking if we can
make it killable. It's all code that we can change, and if a human
operator can do it, it can be done purely programmatically on kill too,
right?

Miklos Szeredi

Jul 23, 2018, 8:33:10 AM
to Dmitry Vyukov, linux-fsdevel, LKML, syzkaller-bugs, syzbot
Hmm, you mean if a process is in an uninterruptible sleep trying to
acquire a lock on a fuse filesystem and is killed, then the fuse
filesystem should be aborted?

Even if we'd manage to implement that, it's a large backward
incompatibility risk.

I'm not arguing that it can't be done, but I would definitely question
*whether* it should be done.

Thanks,
Miklos

Dmitry Vyukov

Jul 23, 2018, 8:47:06 AM
to Miklos Szeredi, linux-fsdevel, LKML, syzkaller-bugs, syzbot
I understand that we should abort only if we are sure that it's
actually deadlocked and there is no other way.
So if a fuse-user process is blocked on a fuse lock, then we probably
should do nothing. However, if the fuse server is killed, then perhaps
we could abort the connection at that point. Namely, if a process that
has a fuse fd open is killed and it is the only process that shares
this fd, then we could abort the connection on arrival of the kill
signal (rather than wait until all its threads finish and then start
closing all fds; this is where we get the deadlock, since some of its
threads won't finish). I don't know if such a synchronous kill hook is
available, though. If several processes share the same fuse fd, then
we could close the fd in each process on SIGKILL arrival; then, when
all of these processes are killed, the fuse fd will be closed and we can
abort the connection, which will un-deadlock all of these processes.
Does this look at all reasonable?

Miklos Szeredi

Jul 23, 2018, 9:05:39 AM
to Dmitry Vyukov, linux-fsdevel, LKML, syzkaller-bugs, syzbot
Biggest conceptual problem: your definition of fuse-server is weak.
Take the following example: process A is holding the fuse device fd
and is forwarding requests and replies to/from process B via a pipe.
So basically A is just a proxy that does nothing interesting, the
"real" server is B. But according to your definition B is not a
server, only A is.

And this is just a simple example, parts of the server might be on
different machines, etc... It's impossible to automatically detect if
a process is acting as a fuse server or not.

We could let the fuse server itself notify the kernel that it's a fuse
server. That might help in the cases where the deadlock is
accidental, but obviously not in the case when done by a malicious
agent. I'm not sure it's worth the effort. Also I have no idea how
the respective maintainers would take the idea of "kill hooks"... It
would probably be a lot of work for little gain.

Thanks,
Miklos

Dmitry Vyukov

Jul 23, 2018, 9:37:40 AM
to Miklos Szeredi, linux-fsdevel, LKML, syzkaller-bugs, syzbot
I proposed to abort fuse conn when all fuse device fd's are "killed"
(all processes having the fd opened are killed). So if _only_ process
B is killed, then, yes, it will still hang. However if A is killed or
both A and B (say, process group, everything inside of pid namespace,
etc) then the deadlock will be autoresolved without human
intervention.

> And this is just a simple example, parts of the server might be on
> different machines, etc... It's impossible to automatically detect if
> a process is acting as a fuse server or not.

It does not seem we need the precise definition. If no one ever can
write anything into the fd, we can safely abort the connection (?). If
we don't, we can either get that the process exits normally and the
connection is doomed anyway, so no difference in behavior, or we can
get a deadlock.

> We could let the fuse server itself notify the kernel that it's a fuse
> server. That might help in the cases where the deadlock is
> accidental, but obviously not in the case when done by a malicious
> agent. I'm not sure it's worth the effort. Also I have no idea how
> the respective maintainers would take the idea of "kill hooks"... It
> would probably be a lot of work for little gain.

What looks wrong to me here is that fuse is the only (?) subsystem in
the kernel that stops SIGKILL from working and requires a complex custom
dance performed by a human operator (who is not necessarily there at
all). Say, if a process has opened a socket, whatever, I don't need to
locate and abort something in socketctl fs, just SIGKILL. If a
process has opened a file, I don't need to locate the fd in /proc
and abort it, just SIGKILL. If a process has created an ipc object, I
don't need to do any special dance, just SIGKILL. fuse is somehow very
special; if we have more such cases, it definitely won't scale.
I understand that there can be implementation difficulties, but
fundamentally that's how things should work -- choose target
processes, kill, done, right?

Miklos Szeredi

Jul 23, 2018, 11:09:17 AM
to Dmitry Vyukov, linux-fsdevel, LKML, syzkaller-bugs, syzbot
On Mon, Jul 23, 2018 at 3:37 PM, Dmitry Vyukov <dvy...@google.com> wrote:
> On Mon, Jul 23, 2018 at 3:05 PM, Miklos Szeredi <mik...@szeredi.hu> wrote:

>> Biggest conceptual problem: your definition of fuse-server is weak.
>> Take the following example: process A is holding the fuse device fd
>> and is forwarding requests and replies to/from process B via a pipe.
>> So basically A is just a proxy that does nothing interesting, the
>> "real" server is B. But according to your definition B is not a
>> server, only A is.
>
> I proposed to abort fuse conn when all fuse device fd's are "killed"
> (all processes having the fd opened are killed). So if _only_ process
> B is killed, then, yes, it will still hang. However if A is killed or
> both A and B (say, process group, everything inside of pid namespace,
> etc) then the deadlock will be autoresolved without human
> intervention.

Okay, so you're saying:

1) when a process gets SIGKILL and is in uninterruptible sleep, mark the process as doomed
2) for a particular fuse instance, find the set of fuse device fd
references that are in non-doomed tasks; if there are none, then abort
the fuse instance

Right?

The above is not an implementation proposal, just to get us on the
same page regarding the concept.
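To make the two steps concrete, here is a toy userspace model of the rule, using the proxy example from earlier in the thread (A holds the fuse device fd, B does not). The struct and function names are invented for illustration and nothing here corresponds to real kernel data structures; it also ignores fork and fd passing entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a task for one fuse instance. */
struct toy_task {
	bool doomed;		/* step 1: task has been marked as doomed */
	bool holds_fuse_fd;	/* task references the instance's device fd */
};

/* Step 2: the instance may be aborted only if no non-doomed task
 * still holds a reference to its fuse device fd. */
static bool should_abort_fuse(const struct toy_task *tasks, size_t n)
{
	size_t i;

	assert(tasks || n == 0);
	for (i = 0; i < n; i++)
		if (tasks[i].holds_fuse_fd && !tasks[i].doomed)
			return false;
	return true;
}
```

With A = { .doomed = false, .holds_fuse_fd = true } and B = { .doomed = false, .holds_fuse_fd = false }, dooming only B leaves the instance alone, while dooming A makes should_abort_fuse() return true.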

>> And this is just a simple example, parts of the server might be on
>> different machines, etc... It's impossible to automatically detect if
>> a process is acting as a fuse server or not.
>
> It does not seem we need the precise definition. If no one ever can
> write anything into the fd, we can safely abort the connection (?).

Seems to me so.

> If
> we don't, we can either get that the process exits normally and the
> connection is doomed anyway, so no difference in behavior, or we can
> get a deadlock.
>
>> We could let the fuse server itself notify the kernel that it's a fuse
>> server. That might help in the cases where the deadlock is
>> accidental, but obviously not in the case when done by a malicious
>> agent. I'm not sure it's worth the effort. Also I have no idea how
>> the respective maintainers would take the idea of "kill hooks"... It
>> would probably be a lot of work for little gain.
>
> What looks wrong to me here is that fuse is the only (?) subsystem in
> the kernel that stops SIGKILL from working and requires a complex custom
> dance performed by a human operator (who is not necessarily there at
> all). Say, if a process has opened a socket, whatever, I don't need to
> locate and abort something in socketctl fs, just SIGKILL. If a
> process has opened a file, I don't need to locate the fd in /proc
> and abort it, just SIGKILL. If a process has created an ipc object, I
> don't need to do any special dance, just SIGKILL. fuse is somehow very
> special; if we have more such cases, it definitely won't scale.
> I understand that there can be implementation difficulties, but
> fundamentally that's how things should work -- choose target
> processes, kill, done, right?

Yes, it would be nice.

But I'm not sure it will fly due to implementation difficulties. It's
definitely not a high prio feature currently for me, but I'll happily
accept patches.

Thanks,
Miklos

Dmitry Vyukov

Jul 23, 2018, 11:19:22 AM
to Miklos Szeredi, linux-fsdevel, LKML, syzkaller-bugs, syzbot
On Mon, Jul 23, 2018 at 5:09 PM, Miklos Szeredi <mik...@szeredi.hu> wrote:
> On Mon, Jul 23, 2018 at 3:37 PM, Dmitry Vyukov <dvy...@google.com> wrote:
>> On Mon, Jul 23, 2018 at 3:05 PM, Miklos Szeredi <mik...@szeredi.hu> wrote:
>
>>> Biggest conceptual problem: your definition of fuse-server is weak.
>>> Take the following example: process A is holding the fuse device fd
>>> and is forwarding requests and replies to/from process B via a pipe.
>>> So basically A is just a proxy that does nothing interesting, the
>>> "real" server is B. But according to your definition B is not a
>>> server, only A is.
>>
>> I proposed to abort fuse conn when all fuse device fd's are "killed"
>> (all processes having the fd opened are killed). So if _only_ process
>> B is killed, then, yes, it will still hang. However if A is killed or
>> both A and B (say, process group, everything inside of pid namespace,
>> etc) then the deadlock will be autoresolved without human
>> intervention.
>
> Okay, so you're saying:
>
> 1) when a process gets SIGKILL and is in uninterruptible sleep, mark the process as doomed
> 2) for a particular fuse instance, find the set of fuse device fd
> references that are in non-doomed tasks; if there are none, then abort
> the fuse instance
>
> Right?


Yes, something like this.
Perhaps checking for "uninterruptible sleep" is excessive. If it has
SIGKILL pending it's pretty much doomed already. This info should be
already available for tasks.
Not saying that it's better, but what I described was the other way
around: when a task is killed it drops its references to all open fuse
fds; when the last fd is dropped, the connection can be aborted.
I see. Thanks for bearing with me.

Miklos Szeredi

Jul 24, 2018, 11:18:02 AM
to Dmitry Vyukov, linux-fsdevel, LKML, syzkaller-bugs, syzbot
struct task_struct {
	[...]
	struct files_struct *files;
	[...]
};

struct files_struct {
	[...]
	struct fdtable __rcu *fdt;
	[...]
};

struct fdtable {
	[...]
	struct file __rcu **fd; /* current fd array */
	[...]
};

So there we have an array of pointers to struct file. Suppose we'd
magically be able to find files that point to fuse devices upon
receiving SIGKILL, what would we do with them? We can't close them:
other tasks might still be pointing to the same files_struct.

We could do a global search for non-doomed tasks referencing the same
fuse device, but I have no clue how we'd go about doing that without
racing with forks, fd sending, etc...

Thanks,
Miklos

Dmitry Vyukov

Jul 25, 2018, 5:12:22 AM
to Miklos Szeredi, linux-fsdevel, LKML, syzkaller-bugs, syzbot
Good questions for which I don't have answers.

Maybe more waits in fuse need to be interruptible? E.g. request_wait_answer?

Miklos Szeredi

Jul 26, 2018, 4:44:27 AM
to Dmitry Vyukov, linux-fsdevel, LKML, syzkaller-bugs, syzbot
That's an interesting aspect. Making request_wait_answer always
killable would help with the issue you raise (killing the set of processes
taking part in the deadlock should resolve the deadlock), but it breaks
another aspect of the interface.

Namely that userspace filesystems expect some serialization from
the kernel when performing operations. If we allow killing of a process
in the middle of an fs operation, then that serialization is no longer
there, which can break the server.

One solution to that is to duplicate all locking in the server
(libfuse normally), but it would not solve the issue for legacy
libfuse or legacy non-libfuse servers. It would also be difficult to
test. Also it doesn't solve the problem of killing the server, as
that alone doesn't resolve the deadlock.

Thanks,
Miklos

Miklos Szeredi

Jul 26, 2018, 5:13:00 AM
to Dmitry Vyukov, linux-fsdevel, LKML, syzkaller-bugs, syzbot
Umm, we can actually do better. Duplicate all vfs locking in the
fuse kernel implementation: when killing a task that has an
outstanding request, return immediately (which results in releasing
the VFS level lock and hence resolving the deadlock) but hold onto our
own lock until the reply from the userspace server comes back.

Need to think about the details; it might not be easy to do this
properly. Notably memory management locks (page->lock, mmap_sem,
etc) are notoriously tricky.

Thanks,
Miklos

Dmitry Vyukov

Nov 2, 2018, 3:31:29 PM
to Miklos Szeredi, linux-fsdevel, LKML, syzkaller-bugs, syzbot
Hi Miklos,

Any updates on this?

syzbot recently found this hang in fuse, which looks real (totally unkillable):
https://syzkaller.appspot.com/bug?id=0d08132d6dac82ae63b7b8d4a9d027d30b46167d

but this one still happens, and it's hard to tell if it's real or not:
https://syzkaller.appspot.com/bug?id=76f8203fef423375d230f14b8f5b45617ab945e2

syzbot

Nov 7, 2019, 8:42:06 AM
to dvy...@google.com, ktk...@virtuozzo.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, mik...@szeredi.hu, msze...@redhat.com, syzkall...@googlegroups.com
syzbot suspects this bug was fixed by commit:

commit c59fd85e4fd07fdf0ab523a5e9734f5338d6aa19
Author: Kirill Tkhai <ktk...@virtuozzo.com>
Date: Tue Sep 11 10:11:56 2018 +0000

fuse: change interrupt requests allocation algorithm

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15518db2600000
start commit: d72e90f3 Linux 4.18-rc6
git tree: upstream
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11564d1c400000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16fc570c400000

If the result looks correct, please mark the bug fixed by replying with:

#syz fix: fuse: change interrupt requests allocation algorithm

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

syzbot

Nov 10, 2019, 9:15:09 AM
to Tetsuo Handa, penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
> Bisection log is clear.

> #syz fix: fuse: change interrupt requests allocation algorithm

Your 'fix:' command is accepted, but please keep
syzkall...@googlegroups.com mailing list in CC next time. It serves as
a history of what happened with each bug report. Thank you.
