fanotify: unkillable hanged processes

517 views
Skip to first unread message

Dmitry Vyukov

unread,
Sep 1, 2016, 1:54:05 PM9/1/16
to epa...@redhat.com, LKML, Peter Zijlstra, Ingo Molnar, syzkaller, Kostya Serebryany, Alexander Potapenko
Hello,

The following program:

https://gist.githubusercontent.com/dvyukov/0952eeac71069b46b3fe0e28bd1a02bf/raw/396b9dcce2636cecab1a4161c15d3f066e6ef639/gistfile1.txt

if run in a parallel loop creates unkillable hanged processes:

-bash-4.3# ps afxu | grep a.out
root 4489 0.0 0.0 8868 340 ttyp1 S+ 17:19 0:00
\_ grep a.out
root 4423 0.0 0.0 0 0 ttyp0 D 17:16 0:00 [a.out]
root 4424 0.0 0.0 0 0 ttyp0 D 17:16 0:00 [a.out]
root 4425 0.0 0.0 0 0 ttyp0 D 17:16 0:00 [a.out]
root 4470 0.0 0.1 7016 2316 ttyp0 D 17:16 0:00
./stress ./a.out

This looks like a classical deadlock, but LOCKDEP is silent so
+LOCKDEP maintainers.

Most of the processes are hanged at (from /proc/pid/stack):

[<ffffffff814dd6c8>] __synchronize_srcu+0x248/0x380 kernel/rcu/srcu.c:448
[<ffffffff814dd81e>] synchronize_srcu+0x1e/0x40 kernel/rcu/srcu.c:492
[<ffffffff819448d7>] fsnotify_mark_destroy_list+0x107/0x370 fs/notify/mark.c:551
[<ffffffff8194163e>] fsnotify_destroy_group+0x1e/0xc0 fs/notify/group.c:57
[<ffffffff8194a84b>] fanotify_release+0x20b/0x2d0
fs/notify/fanotify/fanotify_user.c:392
[<ffffffff81868bbc>] __fput+0x28c/0x780 fs/file_table.c:208
[<ffffffff81869135>] ____fput+0x15/0x20 fs/file_table.c:244
[<ffffffff813ebf63>] task_work_run+0xf3/0x170 kernel/task_work.c:116
[< inline >] exit_task_work ./include/linux/task_work.h:21
[<ffffffff81394218>] do_exit+0x868/0x2e70 kernel/exit.c:828
[<ffffffff81396998>] do_group_exit+0x108/0x330 kernel/exit.c:958
[< inline >] SYSC_exit_group kernel/exit.c:969
[<ffffffff81396bdd>] SyS_exit_group+0x1d/0x20 kernel/exit.c:967
[<ffffffff86e10700>] entry_SYSCALL_64_fastpath+0x23/0xc1
arch/x86/entry/entry_64.S:208
[<ffffffffffffffff>] 0xffffffffffffffff

One process holds the srcu lock:

stress D ffffffff86dfe09a 28024 4470 1 0x00000004
ffff880064d9c440 0000000000000000 ffff8800686ab0c0 ffff88006cecc080
ffff88006d322bd8 ffff8800653f7630 ffffffff86dfe09a ffff8800653f76e0
0000000000000282 ffff88006d323568 ffff88006d323540 ffff880064d9c448
Call Trace:
[<ffffffff86dff497>] schedule+0x97/0x1c0 kernel/sched/core.c:3414
[< inline >] fanotify_get_response fs/notify/fanotify/fanotify.c:70
[<ffffffff8194a347>] fanotify_handle_event+0x537/0x830
fs/notify/fanotify/fanotify.c:233
[< inline >] send_to_group fs/notify/fsnotify.c:179
[<ffffffff8193fc8d>] fsnotify+0x73d/0x1020 fs/notify/fsnotify.c:275
[< inline >] fsnotify_perm ./include/linux/fsnotify.h:55
[<ffffffff82acefb1>] security_file_open+0x151/0x190 security/security.c:887
[<ffffffff8185a8db>] do_dentry_open+0x2ab/0xd30 fs/open.c:736
[<ffffffff8185e445>] vfs_open+0x105/0x220 fs/open.c:860
[< inline >] do_last fs/namei.c:3374
[<ffffffff81895909>] path_openat+0x12f9/0x2ab0 fs/namei.c:3497
[<ffffffff8189a7ac>] do_filp_open+0x18c/0x250 fs/namei.c:3532
[<ffffffff81873528>] do_open_execat+0xe8/0x4d0 fs/exec.c:818
[<ffffffff8187a18f>] do_execveat_common.isra.35+0x71f/0x1d80 fs/exec.c:1679
[< inline >] do_execve fs/exec.c:1783
[< inline >] SYSC_execve fs/exec.c:1864
[<ffffffff8187c262>] SyS_execve+0x42/0x50 fs/exec.c:1859
[<ffffffff810088ff>] do_syscall_64+0x1df/0x640 arch/x86/entry/common.c:288
[<ffffffff86e107c3>] entry_SYSCALL64_slow_path+0x25/0x25
arch/x86/entry/entry_64.S:249


[ 467.548485] Showing all locks held in the system:
[ 467.548981] 2 locks held by bash/4044:
[ 467.549313] #0: (&tty->ldisc_sem){.+.+.+}, at:
[<ffffffff86e0f7f7>] ldsem_down_read+0x37/0x40
[ 467.550076] #1: (&ldata->atomic_read_lock){+.+...}, at:
[<ffffffff8322ec95>] n_tty_read+0x1e5/0x1860
[ 467.550897] 3 locks held by bash/4062:
[ 467.551210] #0: (sb_writers#5){.+.+.+}, at: [<ffffffff81863c04>]
vfs_write+0x3a4/0x4e0
[ 467.551923] #1: (rcu_read_lock){......}, at: [<ffffffff83245c30>]
__handle_sysrq+0x0/0x4d0
[ 467.552655] #2: (tasklist_lock){.+.+..}, at: [<ffffffff81496fc4>]
debug_show_all_locks+0x74/0x290
[ 467.553454] 2 locks held by stress/4470:
[ 467.553754] #0: (&sig->cred_guard_mutex){+.+.+.}, at:
[<ffffffff818799b3>] prepare_bprm_creds+0x53/0x110
[ 467.554590] #1: (&fsnotify_mark_srcu){......}, at:
[<ffffffff8193f71d>] fsnotify+0x1cd/0x1020

On 0f98f121e1670eaa2a2fbb675e07d6ba7f0e146f of linux-next.

Peter Zijlstra

unread,
Sep 1, 2016, 2:04:27 PM9/1/16
to Dmitry Vyukov, epa...@redhat.com, LKML, Ingo Molnar, syzkaller, Kostya Serebryany, Alexander Potapenko
On Thu, Sep 01, 2016 at 07:53:44PM +0200, Dmitry Vyukov wrote:
> Hello,
>
> The following program:
>
> https://gist.githubusercontent.com/dvyukov/0952eeac71069b46b3fe0e28bd1a02bf/raw/396b9dcce2636cecab1a4161c15d3f066e6ef639/gistfile1.txt
>
> if run in a parallel loop creates unkillable hanged processes:
>
> -bash-4.3# ps afxu | grep a.out
> root 4489 0.0 0.0 8868 340 ttyp1 S+ 17:19 0:00
> \_ grep a.out
> root 4423 0.0 0.0 0 0 ttyp0 D 17:16 0:00 [a.out]
> root 4424 0.0 0.0 0 0 ttyp0 D 17:16 0:00 [a.out]
> root 4425 0.0 0.0 0 0 ttyp0 D 17:16 0:00 [a.out]
> root 4470 0.0 0.1 7016 2316 ttyp0 D 17:16 0:00
> ./stress ./a.out
>
> This looks like a classical deadlock, but LOCKDEP is silent so
> +LOCKDEP maintainers.

fanotify_get_response() uses wait_event(), that's an asymmetric API and
therefore one cannot include it in block-on chains. That is, while it
blocks, it doesn't know who it blocks on etc..
So not quite enough information, what is bash/4044's stack trace?
Because that is holding the fanotify srcu reference.

But it looks like a scenario where everyone is waiting for SRCU to
complete, while the task holding up SRCU completion is waiting for
something else.

But I'm not at all familiar with fanotify.

Dmitry Vyukov

unread,
Sep 1, 2016, 2:15:39 PM9/1/16
to syzkaller, epa...@redhat.com, LKML, Ingo Molnar, Kostya Serebryany, Alexander Potapenko
On Thu, Sep 1, 2016 at 8:04 PM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Thu, Sep 01, 2016 at 07:53:44PM +0200, Dmitry Vyukov wrote:
>> Hello,
>>
>> The following program:
>>
>> https://gist.githubusercontent.com/dvyukov/0952eeac71069b46b3fe0e28bd1a02bf/raw/396b9dcce2636cecab1a4161c15d3f066e6ef639/gistfile1.txt
>>
>> if run in a parallel loop creates unkillable hanged processes:
>>
>> -bash-4.3# ps afxu | grep a.out
>> root 4489 0.0 0.0 8868 340 ttyp1 S+ 17:19 0:00
>> \_ grep a.out
>> root 4423 0.0 0.0 0 0 ttyp0 D 17:16 0:00 [a.out]
>> root 4424 0.0 0.0 0 0 ttyp0 D 17:16 0:00 [a.out]
>> root 4425 0.0 0.0 0 0 ttyp0 D 17:16 0:00 [a.out]
>> root 4470 0.0 0.1 7016 2316 ttyp0 D 17:16 0:00
>> ./stress ./a.out
>>
>> This looks like a classical deadlock, but LOCKDEP is silent so
>> +LOCKDEP maintainers.
>
> fanotify_get_response() uses wait_event(), that's an asymmetric API and
> therefore one cannot include it in block-on chains. That is, while it
> blocks, it doesn't know who it blocks on etc..


Ah, I see. So that's deadlock on semaphores, that's not easily detectable.
4044 just holds a normal rcu_read_lock lock during sysrq processing.
I've sent the sysrq after the deadlock happened.
The bad guy is 4470, he holds fsnotify_mark_srcu.
Reply all
Reply to author
Forward
0 new messages