[syzbot] possible deadlock in register_for_each_vma


syzbot

Mar 26, 2021, 6:29:20 AM
to ac...@kernel.org, alexander...@linux.intel.com, jo...@redhat.com, linux-...@vger.kernel.org, mark.r...@arm.com, mi...@redhat.com, namh...@kernel.org, pet...@infradead.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 0d02ec6b Linux 5.12-rc4
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1719e4aad00000
kernel config: https://syzkaller.appspot.com/x/.config?x=5adab0bdee099d7a
dashboard link: https://syzkaller.appspot.com/bug?extid=b804f902bbb6bcf290cb

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b804f9...@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
5.12.0-rc4-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor.3/23522 is trying to acquire lock:
ffffffff8c03e530 (dup_mmap_sem){++++}-{0:0}, at: register_for_each_vma+0x2c/0xc10 kernel/events/uprobes.c:1040

but task is already holding lock:
ffff8880624a8c90 (&uprobe->register_rwsem){+.+.}-{3:3}, at: __uprobe_register+0x531/0x850 kernel/events/uprobes.c:1177

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&uprobe->register_rwsem){+.+.}-{3:3}:
down_write+0x92/0x150 kernel/locking/rwsem.c:1406
__uprobe_register+0x531/0x850 kernel/events/uprobes.c:1177
trace_uprobe_enable kernel/trace/trace_uprobe.c:1065 [inline]
probe_event_enable+0x357/0xa00 kernel/trace/trace_uprobe.c:1134
trace_uprobe_register+0x443/0x880 kernel/trace/trace_uprobe.c:1461
perf_trace_event_reg kernel/trace/trace_event_perf.c:129 [inline]
perf_trace_event_init+0x549/0xa20 kernel/trace/trace_event_perf.c:204
perf_uprobe_init+0x16f/0x210 kernel/trace/trace_event_perf.c:336
perf_uprobe_event_init+0xff/0x1c0 kernel/events/core.c:9754
perf_try_init_event+0x12a/0x560 kernel/events/core.c:11071
perf_init_event kernel/events/core.c:11123 [inline]
perf_event_alloc.part.0+0xe3b/0x3960 kernel/events/core.c:11403
perf_event_alloc kernel/events/core.c:11785 [inline]
__do_sys_perf_event_open+0x647/0x2e60 kernel/events/core.c:11883
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (event_mutex){+.+.}-{3:3}:
__mutex_lock_common kernel/locking/mutex.c:949 [inline]
__mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1096
perf_trace_destroy+0x23/0xf0 kernel/trace/trace_event_perf.c:241
_free_event+0x2ee/0x1380 kernel/events/core.c:4863
put_event kernel/events/core.c:4957 [inline]
perf_mmap_close+0x572/0xe10 kernel/events/core.c:6002
remove_vma+0xae/0x170 mm/mmap.c:180
remove_vma_list mm/mmap.c:2653 [inline]
__do_munmap+0x74f/0x11a0 mm/mmap.c:2909
do_munmap mm/mmap.c:2917 [inline]
munmap_vma_range mm/mmap.c:598 [inline]
mmap_region+0x85a/0x1730 mm/mmap.c:1750
do_mmap+0xcff/0x11d0 mm/mmap.c:1581
vm_mmap_pgoff+0x1b7/0x290 mm/util.c:519
ksys_mmap_pgoff+0x49c/0x620 mm/mmap.c:1632
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #1 (&mm->mmap_lock#2){++++}-{3:3}:
down_write_killable+0x95/0x170 kernel/locking/rwsem.c:1417
mmap_write_lock_killable include/linux/mmap_lock.h:87 [inline]
dup_mmap kernel/fork.c:480 [inline]
dup_mm+0x12e/0x1380 kernel/fork.c:1368
copy_mm kernel/fork.c:1424 [inline]
copy_process+0x2b99/0x7150 kernel/fork.c:2107
kernel_clone+0xe7/0xab0 kernel/fork.c:2500
__do_sys_clone+0xc8/0x110 kernel/fork.c:2617
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #0 (dup_mmap_sem){++++}-{0:0}:
check_prev_add kernel/locking/lockdep.c:2936 [inline]
check_prevs_add kernel/locking/lockdep.c:3059 [inline]
validate_chain kernel/locking/lockdep.c:3674 [inline]
__lock_acquire+0x2b14/0x54c0 kernel/locking/lockdep.c:4900
lock_acquire kernel/locking/lockdep.c:5510 [inline]
lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
percpu_down_write+0x95/0x440 kernel/locking/percpu-rwsem.c:217
register_for_each_vma+0x2c/0xc10 kernel/events/uprobes.c:1040
__uprobe_register+0x5c2/0x850 kernel/events/uprobes.c:1181
trace_uprobe_enable kernel/trace/trace_uprobe.c:1065 [inline]
probe_event_enable+0x357/0xa00 kernel/trace/trace_uprobe.c:1134
trace_uprobe_register+0x443/0x880 kernel/trace/trace_uprobe.c:1461
perf_trace_event_reg kernel/trace/trace_event_perf.c:129 [inline]
perf_trace_event_init+0x549/0xa20 kernel/trace/trace_event_perf.c:204
perf_uprobe_init+0x16f/0x210 kernel/trace/trace_event_perf.c:336
perf_uprobe_event_init+0xff/0x1c0 kernel/events/core.c:9754
perf_try_init_event+0x12a/0x560 kernel/events/core.c:11071
perf_init_event kernel/events/core.c:11123 [inline]
perf_event_alloc.part.0+0xe3b/0x3960 kernel/events/core.c:11403
perf_event_alloc kernel/events/core.c:11785 [inline]
__do_sys_perf_event_open+0x647/0x2e60 kernel/events/core.c:11883
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
dup_mmap_sem --> event_mutex --> &uprobe->register_rwsem

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&uprobe->register_rwsem);
                               lock(event_mutex);
                               lock(&uprobe->register_rwsem);
  lock(dup_mmap_sem);

*** DEADLOCK ***

3 locks held by syz-executor.3/23522:
#0: ffffffff8fe4fcd8 (&pmus_srcu){....}-{0:0}, at: perf_event_alloc.part.0+0xc8e/0x3960 kernel/events/core.c:11401
#1: ffffffff8bfe5688 (event_mutex){+.+.}-{3:3}, at: perf_uprobe_init+0x164/0x210 kernel/trace/trace_event_perf.c:335
#2: ffff8880624a8c90 (&uprobe->register_rwsem){+.+.}-{3:3}, at: __uprobe_register+0x531/0x850 kernel/events/uprobes.c:1177

stack backtrace:
CPU: 0 PID: 23522 Comm: syz-executor.3 Not tainted 5.12.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2127
check_prev_add kernel/locking/lockdep.c:2936 [inline]
check_prevs_add kernel/locking/lockdep.c:3059 [inline]
validate_chain kernel/locking/lockdep.c:3674 [inline]
__lock_acquire+0x2b14/0x54c0 kernel/locking/lockdep.c:4900
lock_acquire kernel/locking/lockdep.c:5510 [inline]
lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
percpu_down_write+0x95/0x440 kernel/locking/percpu-rwsem.c:217
register_for_each_vma+0x2c/0xc10 kernel/events/uprobes.c:1040
__uprobe_register+0x5c2/0x850 kernel/events/uprobes.c:1181
trace_uprobe_enable kernel/trace/trace_uprobe.c:1065 [inline]
probe_event_enable+0x357/0xa00 kernel/trace/trace_uprobe.c:1134
trace_uprobe_register+0x443/0x880 kernel/trace/trace_uprobe.c:1461
perf_trace_event_reg kernel/trace/trace_event_perf.c:129 [inline]
perf_trace_event_init+0x549/0xa20 kernel/trace/trace_event_perf.c:204
perf_uprobe_init+0x16f/0x210 kernel/trace/trace_event_perf.c:336
perf_uprobe_event_init+0xff/0x1c0 kernel/events/core.c:9754
perf_try_init_event+0x12a/0x560 kernel/events/core.c:11071
perf_init_event kernel/events/core.c:11123 [inline]
perf_event_alloc.part.0+0xe3b/0x3960 kernel/events/core.c:11403
perf_event_alloc kernel/events/core.c:11785 [inline]
__do_sys_perf_event_open+0x647/0x2e60 kernel/events/core.c:11883
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x466459
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f30c08b8188 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
RAX: ffffffffffffffda RBX: 000000000056bf60 RCX: 0000000000466459
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000180
RBP: 00000000004bf9fb R08: 0000000000000000 R09: 0000000000000000
R10: ffffffffffffffff R11: 0000000000000246 R12: 000000000056bf60
R13: 00007ffedcea592f R14: 00007f30c08b8300 R15: 0000000000022000


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Hillf Danton

Mar 27, 2021, 12:22:10 AM
to syzbot, Srikar Dronamraju, pet...@infradead.org, Oleg Nesterov, Greg KH, Hillf Danton, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Fri, 26 Mar 2021 03:29:19
Add a flag and check it to avoid double registration of the uprobe.

--- x/kernel/events/uprobes.c
+++ y/kernel/events/uprobes.c
@@ -51,6 +51,7 @@ DEFINE_STATIC_PERCPU_RWSEM(dup_mmap_sem)

/* Have a copy of original instruction */
#define UPROBE_COPY_INSN 0
+#define UPROBE_REGISTERING 1

struct uprobe {
struct rb_node rb_node; /* node in the rb tree */
@@ -1170,6 +1171,9 @@ static int __uprobe_register(struct inod
if (IS_ERR(uprobe))
return PTR_ERR(uprobe);

+ /* no point to register twice at the cost of deadlock */
+ if (test_and_set_bit(UPROBE_REGISTERING, &uprobe->flags))
+ return 0;
/*
* We can race with uprobe_unregister()->delete_uprobe().
* Check uprobe_is_active() and retry if it is false.
@@ -1183,6 +1187,7 @@ static int __uprobe_register(struct inod
__uprobe_unregister(uprobe, uc);
}
up_write(&uprobe->register_rwsem);
+ clear_bit(UPROBE_REGISTERING, &uprobe->flags);
put_uprobe(uprobe);

if (unlikely(ret == -EAGAIN))

Oleg Nesterov

Mar 27, 2021, 1:53:18 PM
to Hillf Danton, syzbot, Srikar Dronamraju, pet...@infradead.org, Greg KH, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hi Hillf,

it seems that you already understand the problem ;) I don't.

Could you explain in details how double __register is possible ? and how
it connects to this lockdep report?

Hillf Danton

Mar 27, 2021, 10:04:20 PM
to syzbot, Srikar Dronamraju, pet...@infradead.org, Oleg Nesterov, Greg KH, Dan Carpenter, Hillf Danton, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Fri, 26 Mar 2021 03:29:19
Add a flag and check it to detect double registration, i.e. registering A
embedded inside registering A, as syzbot reported.

Add a mark and check it to see if registering A is embedded inside
registering B.

--- x/kernel/events/uprobes.c
+++ y/kernel/events/uprobes.c
@@ -51,6 +51,7 @@ DEFINE_STATIC_PERCPU_RWSEM(dup_mmap_sem)

/* Have a copy of original instruction */
#define UPROBE_COPY_INSN 0
+#define UPROBE_REGISTERING 1

struct uprobe {
struct rb_node rb_node; /* node in the rb tree */
@@ -1033,11 +1034,17 @@ build_map_info(struct address_space *map
static int
register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
{
+ static struct task_struct *holder;
bool is_register = !!new;
struct map_info *info;
int err = 0;

+ /* bail out if registering A embeds inside B */
+ if (WARN_ON_ONCE(holder == current))
+ return -EDEADLK;
+
percpu_down_write(&dup_mmap_sem);
+ holder = current;
info = build_map_info(uprobe->inode->i_mapping,
uprobe->offset, is_register);
if (IS_ERR(info)) {
@@ -1080,6 +1087,7 @@ register_for_each_vma(struct uprobe *upr
info = free_map_info(info);
}
out:
+ holder = NULL;
percpu_up_write(&dup_mmap_sem);
return err;
}
@@ -1170,6 +1178,12 @@ static int __uprobe_register(struct inod
if (IS_ERR(uprobe))
return PTR_ERR(uprobe);

+ /* bail out if registering A embeds inside A */
+ if (test_and_set_bit(UPROBE_REGISTERING, &uprobe->flags)) {
+ put_uprobe(uprobe);
+ return -EDEADLK;
+ }
+
/*
* We can race with uprobe_unregister()->delete_uprobe().
* Check uprobe_is_active() and retry if it is false.
@@ -1183,6 +1197,7 @@ static int __uprobe_register(struct inod

Hillf Danton

Mar 27, 2021, 10:52:40 PM
to Oleg Nesterov, Hillf Danton, syzbot, Srikar Dronamraju, pet...@infradead.org, Greg KH, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Sat, 27 Mar 2021 18:53:08 Oleg Nesterov wrote:
>Hi Hillf,
>
>it seems that you already understand the problem ;) I don't.

It is simpler than you thought - I always blindly believe what syzbot
reported is true before it turns out false as I am not smarter than it.
Feel free to laugh loud.
>
>Could you explain in details how double __register is possible ? and how

Taking another look at the report over five minutes may help more?

>it connects to this lockdep report?

Feel free to show the report is false and ignore my noise.

Hillf Danton

Mar 29, 2021, 1:08:15 AM
to syzbot, Srikar Dronamraju, Oleg Nesterov, pet...@infradead.org, Greg KH, Dan Carpenter, Hillf Danton, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Fri, 26 Mar 2021 03:29:19
In a bid to break the lockdep chain &mm->mmap_lock --> event_mutex, ask a
kworker to free the perf_event.

--- x/include/linux/perf_event.h
+++ y/include/linux/perf_event.h
@@ -735,6 +735,7 @@ struct perf_event {

void (*destroy)(struct perf_event *);
struct rcu_head rcu_head;
+ struct list_head exit_item;

struct pid_namespace *ns;
u64 id;
--- x/kernel/events/core.c
+++ y/kernel/events/core.c
@@ -4949,12 +4949,45 @@ static void perf_remove_from_owner(struc
}
}

+static LIST_HEAD(pe_exit_list);
+static DEFINE_SPINLOCK(pe_exit_lock);
+
+static void pe_exit_work_fn(struct work_struct *unused)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&pe_exit_lock, flags);
+
+ while (!list_empty(&pe_exit_list)) {
+ struct perf_event *e;
+
+ e = list_first_entry(&pe_exit_list, struct perf_event,
+ exit_item);
+ list_del(&e->exit_item);
+ spin_unlock_irqrestore(&pe_exit_lock, flags);
+ _free_event(e);
+ cond_resched();
+ spin_lock_irqsave(&pe_exit_lock, flags);
+ }
+ spin_unlock_irqrestore(&pe_exit_lock, flags);
+}
+static DECLARE_WORK(pe_exit_work, pe_exit_work_fn);
+
static void put_event(struct perf_event *event)
{
+ unsigned long flags;
+ bool empty;
+
if (!atomic_long_dec_and_test(&event->refcount))
return;

- _free_event(event);
+ spin_lock_irqsave(&pe_exit_lock, flags);
+ empty = list_empty(&pe_exit_list);
+ list_add(&event->exit_item, &pe_exit_list);
+ spin_unlock_irqrestore(&pe_exit_lock, flags);
+
+ if (empty)
+ queue_work(system_unbound_wq, &pe_exit_work);
}

/*

Oleg Nesterov

Mar 31, 2021, 12:59:31 PM
to Hillf Danton, Namhyung Kim, Song Liu, Peter Zijlstra, syzbot, Srikar Dronamraju, pet...@infradead.org, Greg KH, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 03/28, Hillf Danton wrote:
>
> On Sat, 27 Mar 2021 18:53:08 Oleg Nesterov wrote:
> >Hi Hillf,
> >
> >it seems that you already understand the problem ;) I don't.
>
> It is simpler than you thought - I always blindly believe what syzbot
> reported is true before it turns out false as I am not smarter than it.
> Feel free to laugh loud.

I am not going to laugh. I too think that lockdep is more clever than me.

> >Could you explain in details how double __register is possible ? and how
>
> Taking another look at the report over five minutes may help more?

No. I spent much, much more time and I still can't understand your
patch which adds UPROBE_REGISTERING. Quite possibly your patch is fine,
just I am not smart enough.

And I am a bit surprised you refused to help me.

> >it connects to this lockdep report?
>
> Feel free to show the report is false and ignore my noise.

Well, this particular report looks correct but false-positive to me:
_free_event() is not possible here. But I can easily be wrong, and we need
to shut up lockdep anyway...


-------------------------------------------------------------------------------
Add more CC's. So, we have the following trace

-> #0 (dup_mmap_sem){++++}-{0:0}:
check_prev_add kernel/locking/lockdep.c:2936 [inline]
check_prevs_add kernel/locking/lockdep.c:3059 [inline]
validate_chain kernel/locking/lockdep.c:3674 [inline]
__lock_acquire+0x2b14/0x54c0 kernel/locking/lockdep.c:4900
lock_acquire kernel/locking/lockdep.c:5510 [inline]
lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
percpu_down_write+0x95/0x440 kernel/locking/percpu-rwsem.c:217
register_for_each_vma+0x2c/0xc10 kernel/events/uprobes.c:1040
__uprobe_register+0x5c2/0x850 kernel/events/uprobes.c:1181
trace_uprobe_enable kernel/trace/trace_uprobe.c:1065 [inline]
probe_event_enable+0x357/0xa00 kernel/trace/trace_uprobe.c:1134
trace_uprobe_register+0x443/0x880 kernel/trace/trace_uprobe.c:1461
perf_trace_event_reg kernel/trace/trace_event_perf.c:129 [inline]
perf_trace_event_init+0x549/0xa20 kernel/trace/trace_event_perf.c:204
perf_uprobe_init+0x16f/0x210 kernel/trace/trace_event_perf.c:336
perf_uprobe_event_init+0xff/0x1c0 kernel/events/core.c:9754
perf_try_init_event+0x12a/0x560 kernel/events/core.c:11071
perf_init_event kernel/events/core.c:11123 [inline]
perf_event_alloc.part.0+0xe3b/0x3960 kernel/events/core.c:11403
perf_event_alloc kernel/events/core.c:11785 [inline]
__do_sys_perf_event_open+0x647/0x2e60 kernel/events/core.c:11883
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae


which shows that this path takes

event_mutex -> uprobe.register_rwsem -> dup_mmap_sem -> mm.mmap_lock

Not good. If nothing else, perf_mmap_close() path can take event_mutex under
mm.mmap_lock, so lockdep complains correctly.
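
To spell the cycle out (a rough sketch, nothing new; the call sites are
taken from the stacks above):

    this path, i.e. perf_event_open() of a perf_uprobe event:

        mutex_lock(&event_mutex)                 perf_uprobe_init()
          down_write(&uprobe->register_rwsem)    __uprobe_register()
            percpu_down_write(&dup_mmap_sem)     register_for_each_vma()

    dependencies lockdep has already recorded:

        dup_mmap_sem  --> mm->mmap_lock          dup_mmap() at fork time (#1)
        mm->mmap_lock --> event_mutex            perf_mmap_close() -> put_event() ->
                                                 _free_event() -> perf_trace_destroy() (#2)

which closes the cycle

    event_mutex -> register_rwsem -> dup_mmap_sem -> mmap_lock -> event_mutex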

But why does perf_uprobe_init() take event_mutex? The comment mentions
uprobe_buffer_enable().

If this is the only reason, then why do uprobe_buffer_enable/disable abuse
event_mutex?

IOW, can something like the stupid patch below work? (Just in case... yes
it is very suboptimal, I am just trying to understand the problem).

Song, Namhyung, Peter, what do you think?

Oleg.


--- x/kernel/trace/trace_event_perf.c
+++ x/kernel/trace/trace_event_perf.c
@@ -327,16 +327,9 @@ int perf_uprobe_init(struct perf_event *p_event,
goto out;
}

- /*
- * local trace_uprobe need to hold event_mutex to call
- * uprobe_buffer_enable() and uprobe_buffer_disable().
- * event_mutex is not required for local trace_kprobes.
- */
- mutex_lock(&event_mutex);
ret = perf_trace_event_init(tp_event, p_event);
if (ret)
destroy_local_trace_uprobe(tp_event);
- mutex_unlock(&event_mutex);
out:
kfree(path);
return ret;
--- x/kernel/trace/trace_uprobe.c
+++ x/kernel/trace/trace_uprobe.c
@@ -857,6 +857,7 @@ struct uprobe_cpu_buffer {
};
static struct uprobe_cpu_buffer __percpu *uprobe_cpu_buffer;
static int uprobe_buffer_refcnt;
+static DEFINE_MUTEX(uprobe_buffer_mutex);

static int uprobe_buffer_init(void)
{
@@ -894,13 +895,13 @@ static int uprobe_buffer_enable(void)
{
int ret = 0;

- BUG_ON(!mutex_is_locked(&event_mutex));
-
+ mutex_lock(&uprobe_buffer_mutex);
if (uprobe_buffer_refcnt++ == 0) {
ret = uprobe_buffer_init();
if (ret < 0)
uprobe_buffer_refcnt--;
}
+ mutex_unlock(&uprobe_buffer_mutex);

return ret;
}
@@ -909,8 +910,7 @@ static void uprobe_buffer_disable(void)
{
int cpu;

- BUG_ON(!mutex_is_locked(&event_mutex));
-
+ mutex_lock(&uprobe_buffer_mutex);
if (--uprobe_buffer_refcnt == 0) {
for_each_possible_cpu(cpu)
free_page((unsigned long)per_cpu_ptr(uprobe_cpu_buffer,
@@ -919,6 +919,7 @@ static void uprobe_buffer_disable(void)
free_percpu(uprobe_cpu_buffer);
uprobe_cpu_buffer = NULL;
}
+ mutex_unlock(&uprobe_buffer_mutex);
}

static struct uprobe_cpu_buffer *uprobe_buffer_get(void)

Song Liu

Mar 31, 2021, 4:18:34 PM
to Oleg Nesterov, Hillf Danton, Namhyung Kim, Peter Zijlstra, syzbot, Srikar Dronamraju, Greg KH, linux-...@vger.kernel.org, syzkall...@googlegroups.com
I think the following patch works well. I haven't tested it though.

Thanks,
Song

Hillf Danton

Apr 1, 2021, 5:29:26 AM
to Oleg Nesterov, Hillf Danton, Namhyung Kim, Song Liu, Peter Zijlstra, syzbot, Srikar Dronamraju, Greg KH, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Wed, 31 Mar 2021 18:59:18 Oleg Nesterov wrote:
> On 03/28, Hillf Danton wrote:
> >
> > On Sat, 27 Mar 2021 18:53:08 Oleg Nesterov wrote:
> > >Hi Hillf,
> > >
> > >it seems that you already understand the problem ;) I don't.
> >
> > It is simpler than you thought - I always blindly believe what syzbot
> > reported is true before it turns out false as I am not smarter than it.
> > Feel free to laugh loud.
>
> I am not going to laugh. I too think that lockdep is more clever than me.
>
> > >Could you explain in details how double __register is possible ? and how
> >
> > Taking another look at the report over five minutes may help more?
>
> No. I spent much, much more time time and I still can't understand your
> patch which adds UPROBE_REGISTERING. Quite possibly your patch is fine,
> just I am not smart enough.
>
> And I am a bit surprised you refused to help me.

The explanation is, the UPROBE_REGISTERING approach is completely incorrect,
and I doubt it could survive your scan, so the more steps I was running down
that road the more noise I made, sigh.
If I don't misread it, the lockdep chain will likely evolve from

event_mutex -> uprobe.register_rwsem -> dup_mmap_sem -> mm.mmap_lock ->
event_mutex
to
dup_mmap_sem -> mm.mmap_lock -> dup_mmap_sem

after this patch as both uprobe_register() and uprobe_unregister() would take
dup_mmap_sem.

Oleg Nesterov

Apr 1, 2021, 6:53:41 AM
to Hillf Danton, Namhyung Kim, Song Liu, Peter Zijlstra, syzbot, Srikar Dronamraju, Greg KH, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 04/01, Hillf Danton wrote:
>
> If I dont misread it, the lockdep chain will likely evolve from
>
> event_mutex -> uprobe.register_rwsem -> dup_mmap_sem -> mm.mmap_lock ->
> event_mutex
> to
> dup_mmap_sem -> mm.mmap_lock -> dup_mmap_sem
>
> after this patch as both uprobe_register() and uprobe_unregister() would take
> dup_mmap_sem.

Hmm, please correct me, but I don't think so. I think mmap_lock -> dup_mmap_sem
is not possible.

Oleg.

Hillf Danton

Apr 2, 2021, 3:46:45 AM
to Oleg Nesterov, Namhyung Kim, Song Liu, Peter Zijlstra, syzbot, Srikar Dronamraju, Greg KH, Hillf Danton, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Given perf_trace_destroy() in the report, it likely takes only a couple of
steps further down the call chain to reach uprobe_unregister():

perf_trace_destroy()
perf_trace_event_unreg(p_event)
tp_event->class->reg(tp_event, TRACE_REG_PERF_UNREGISTER, NULL);
trace_uprobe_register()
probe_event_disable()
__probe_event_disable()
uprobe_unregister()

Oleg Nesterov

Apr 6, 2021, 1:23:36 PM
to Hillf Danton, Song Liu, Peter Zijlstra, Namhyung Kim, syzbot, Srikar Dronamraju, Greg KH, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Song, Peter, please take a look.
Well, this is not possible, perf_trace_destroy() is only used by perf_tracepoint
pmu. But you are right anyway, event.destroy == perf_uprobe_destroy can lead to
uprobe_unregister(). Thanks.

-------------------------------------------------------------------------------

So. perf_mmap_close() does put_event(event) with mm->mmap_lock held. This can
deadlock if event->destroy == perf_uprobe_destroy: perf_trace_event_close/unreg
takes dup_mmap_sem.
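
In sketch form (my reading only; the middle steps are your chain above with
perf_uprobe_destroy() in place of perf_trace_destroy(), not re-checked on
top of -rc4):

    munmap()
      mmap_write_lock(mm)                        __do_munmap()
        perf_mmap_close()
          put_event() -> _free_event()
            event->destroy()                     == perf_uprobe_destroy()
              mutex_lock(&event_mutex)
              perf_trace_event_unreg()
                probe_event_disable() -> uprobe_unregister()
                  ... -> register_for_each_vma()
                    percpu_down_write(&dup_mmap_sem)

so both event_mutex and dup_mmap_sem end up nested inside mm->mmap_lock,
the reverse of the registration side.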


perf_mmap_close() was added by 9bb5d40cd93c9 ("perf: Fix mmap() accounting hole")
and this commit doesn't look right anyway, I'll write another email. However, it
seems that this particular problem was added later by 33ea4b24277b0 ("perf/core:
Implement the 'perf_uprobe' PMU").

Oleg.

Oleg Nesterov

Apr 6, 2021, 1:44:03 PM
to Hillf Danton, Song Liu, Peter Zijlstra, Namhyung Kim, syzbot, Srikar Dronamraju, Greg KH, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 04/06, Oleg Nesterov wrote:
>
> perf_mmap_close() was added by 9bb5d40cd93c9 ("perf: Fix mmap() accounting hole")

I meant perf_mmap_close() -> put_event()

> and this commit doesn't look right anyway

It seems there is another problem, or I am totally confused. I do not
understand how we can use list_for_each_entry_rcu(event, rb->event_list)
if this can race with perf_event_set_output(event), which can move "event"
to another list; in that case list_for_each_entry_rcu() can loop forever.

perf_mmap_close() even mentions this race and restarts the iteration to
avoid it but I don't think this is enough,

rcu_read_lock();
list_for_each_entry_rcu(event, &rb->event_list, rb_entry) {
if (!atomic_long_inc_not_zero(&event->refcount)) {
/*
* This event is en-route to free_event() which will
* detach it and remove it from the list.
*/
continue;
}

just suppose that "this event" is moved to another list first and after
that it goes away so that atomic_long_inc_not_zero() fails; in this case
the next iteration will play with event->rb_entry.next, and this is not
necessarily a "struct perf_event", it can be the "list_head event_list".

Don't we need rb->event_lock ?

Oleg.

Peter Zijlstra

Apr 7, 2021, 3:51:11 AM
to Oleg Nesterov, Hillf Danton, Song Liu, Namhyung Kim, syzbot, Srikar Dronamraju, Greg KH, linux-...@vger.kernel.org, syzkall...@googlegroups.com
We observe an RCU GP in ring_buffer_attach(), between detach and attach,
no?

Normally, when we attach to a rb for the first time, or when we remove
it first, no GP is required and everything is fine. But when we remove
it and then attach it again to another rb, we must observe a GP because
of that list_rcu, agreed?

The cond_synchronize_rcu() in ring_buffer_attach() should capture
exactly that case.
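
Roughly this pattern (a from-memory sketch of ring_buffer_attach(); the
rcu_batches/rcu_pending bookkeeping lives in struct perf_event if I
remember the field names right, and the list ops are done under
rb->event_lock in the real code):

	/* detach: event leaves old_rb->event_list (an RCU list) */
	list_del_rcu(&event->rb_entry);
	event->rcu_batches = get_state_synchronize_rcu();
	event->rcu_pending = 1;

	/* attach: before adding the event to the new rb->event_list */
	if (event->rcu_pending) {
		/* blocks only if a full GP has not elapsed since the detach */
		cond_synchronize_rcu(event->rcu_batches);
		event->rcu_pending = 0;
	}
	list_add_rcu(&event->rb_entry, &rb->event_list);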

Oleg Nesterov

Apr 7, 2021, 8:30:49 AM
to Peter Zijlstra, Hillf Danton, Song Liu, Namhyung Kim, syzbot, Srikar Dronamraju, Greg KH, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Aaah yes, I didn't notice cond_synchronize_rcu() in ring_buffer_attach().

Thanks!

Oleg.

syzbot

Jul 20, 2021, 6:25:18 AM
to syzkall...@googlegroups.com
Auto-closing this bug as obsolete.
Crashes did not happen for a while, no reproducer and no activity.