[syzbot] WARNING in exit_tasks_rcu

syzbot

unread,

Jun 12, 2022, 11:39:26 AM6/12/22

to bra...@kernel.org, ebie...@xmission.com, kees...@chromium.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

Hello,

syzbot found the following issue on:

HEAD commit: 6d0c80680317 Add linux-next specific files for 20220610
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=13b52c2ff00000
kernel config: https://syzkaller.appspot.com/x/.config?x=a30d6e3e814e5931
dashboard link: https://syzkaller.appspot.com/bug?extid=9bb26e7c5e8e4fa7e641
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+9bb26e...@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 1 PID: 28639 at kernel/rcu/tasks.h:1664 exit_tasks_rcu_finish_trace kernel/rcu/tasks.h:1664 [inline]
WARNING: CPU: 1 PID: 28639 at kernel/rcu/tasks.h:1664 exit_tasks_rcu_finish+0x122/0x1b0 kernel/rcu/tasks.h:1006
Modules linked in:
CPU: 1 PID: 28639 Comm: syz-executor.3 Not tainted 5.19.0-rc1-next-20220610-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:exit_tasks_rcu_finish_trace kernel/rcu/tasks.h:1664 [inline]
RIP: 0010:exit_tasks_rcu_finish+0x122/0x1b0 kernel/rcu/tasks.h:1006
Code: fc ff df 48 89 fa 48 c1 ea 03 0f b6 04 02 48 89 fa 83 e2 07 38 d0 7f 04 84 c0 75 62 0f b6 85 89 04 00 00 83 e0 01 08 d8 74 0e <0f> 0b 5b 48 89 ef 5d 41 5c e9 00 d2 ff ff 48 b8 00 00 00 00 00 fc
RSP: 0018:ffffc9000a497b58 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000201 RCX: 1ffff11004213b89
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88802109dc49
RBP: ffff88802109d7c0 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88802109dc40
R13: dffffc0000000000 R14: ffff88802109d7c0 R15: ffff88802109ddd0
FS: 0000000000000000(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f771379d090 CR3: 0000000051fd2000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
do_exit+0x18f2/0x2a00 kernel/exit.c:844
do_group_exit+0xd2/0x2f0 kernel/exit.c:925
get_signal+0x2542/0x2600 kernel/signal.c:2857
arch_do_signal_or_restart+0x82/0x2300 arch/x86/kernel/signal.c:869
exit_to_user_mode_loop kernel/entry/common.c:166 [inline]
exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:201
__syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f7713689109
Code: Unable to access opcode bytes at RIP 0x7f77136890df.
RSP: 002b:00007f77125fe218 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: 0000000000000001 RBX: 00007f771379bf68 RCX: 00007f7713689109
RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00007f771379bf6c
RBP: 00007f771379bf60 R08: 00007fff2278d080 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f771379bf6c
R13: 00007fff22724c8f R14: 00007f77125fe300 R15: 0000000000022000
</TASK>

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Eric W. Biederman

unread,

Jun 12, 2022, 3:16:16 PM6/12/22

to syzbot, bra...@kernel.org, kees...@chromium.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, Paul E. McKenney

syzbot <syzbot+9bb26e...@syzkaller.appspotmail.com> writes:

> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 6d0c80680317 Add linux-next specific files for 20220610
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=13b52c2ff00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=a30d6e3e814e5931
> dashboard link: https://syzkaller.appspot.com/bug?extid=9bb26e7c5e8e4fa7e641
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
>
> Unfortunately, I don't have any reproducer for this issue yet.

I don't understand what is going on in linux-next kernel/rcu/tasks.h
looks different than in Linus's tree. Paul does that mean you have
some staged rcu changes?

Eric

Paul E. McKenney

unread,

Jun 12, 2022, 4:22:06 PM6/12/22

to Eric W. Biederman, syzbot, bra...@kernel.org, kees...@chromium.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

On Sun, Jun 12, 2022 at 02:15:43PM -0500, Eric W. Biederman wrote:
> syzbot <syzbot+9bb26e...@syzkaller.appspotmail.com> writes:
>
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 6d0c80680317 Add linux-next specific files for 20220610
> > git tree: linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=13b52c2ff00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=a30d6e3e814e5931
> > dashboard link: https://syzkaller.appspot.com/bug?extid=9bb26e7c5e8e4fa7e641
> > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
>
> I don't understand what is going on in linux-next kernel/rcu/tasks.h
> looks different than in Linus's tree. Paul does that mean you have
> some staged rcu changes?

Less than 100 RCU-related patches in -rcu, so not all that bad. ;-)

But yes, this could possibly be an issue in one of those patches.

> Eric
>
>
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+9bb26e...@syzkaller.appspotmail.com
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 1 PID: 28639 at kernel/rcu/tasks.h:1664 exit_tasks_rcu_finish_trace kernel/rcu/tasks.h:1664 [inline]
> > WARNING: CPU: 1 PID: 28639 at kernel/rcu/tasks.h:1664 exit_tasks_rcu_finish+0x122/0x1b0 kernel/rcu/tasks.h:1006

The usual way for this warning to trigger is for these a task to exit
while in an RCU Tasks Trace read-side critical section:

rcu_read_lock_trace();
do_something_that_causes_task_exit();

> > Modules linked in:
> > CPU: 1 PID: 28639 Comm: syz-executor.3 Not tainted 5.19.0-rc1-next-20220610-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > RIP: 0010:exit_tasks_rcu_finish_trace kernel/rcu/tasks.h:1664 [inline]
> > RIP: 0010:exit_tasks_rcu_finish+0x122/0x1b0 kernel/rcu/tasks.h:1006
> > Code: fc ff df 48 89 fa 48 c1 ea 03 0f b6 04 02 48 89 fa 83 e2 07 38 d0 7f 04 84 c0 75 62 0f b6 85 89 04 00 00 83 e0 01 08 d8 74 0e <0f> 0b 5b 48 89 ef 5d 41 5c e9 00 d2 ff ff 48 b8 00 00 00 00 00 fc
> > RSP: 0018:ffffc9000a497b58 EFLAGS: 00010202
> > RAX: 0000000000000001 RBX: 0000000000000201 RCX: 1ffff11004213b89
> > RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88802109dc49
> > RBP: ffff88802109d7c0 R08: 0000000000000005 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000001 R12: ffff88802109dc40
> > R13: dffffc0000000000 R14: ffff88802109d7c0 R15: ffff88802109ddd0
> > FS: 0000000000000000(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007f771379d090 CR3: 0000000051fd2000 CR4: 00000000003506e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > <TASK>
> > do_exit+0x18f2/0x2a00 kernel/exit.c:844
> > do_group_exit+0xd2/0x2f0 kernel/exit.c:925
> > get_signal+0x2542/0x2600 kernel/signal.c:2857

And in this case, the do_something_that_causes_task_exit() might
be reception of a fatal signal...

> > arch_do_signal_or_restart+0x82/0x2300 arch/x86/kernel/signal.c:869
> > exit_to_user_mode_loop kernel/entry/common.c:166 [inline]
> > exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:201
> > __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
> > syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294

...but it is also impolite to exit to usermode while in an RCU Tasks
read-side critical section.

It looks like this kernel was built with CONFIG_PROVE_LOCKING=y,
which should have complained if someone was exiting to userspace
while in an RCU Tasks Trace read-side critical section.

So, was there tracing of any kind in use during this test?

Thanx, Paul

syzbot

unread,

Jun 12, 2022, 8:16:21 PM6/12/22

to bra...@kernel.org, ebie...@xmission.com, kees...@chromium.org, linux-...@vger.kernel.org, pau...@kernel.org, syzkall...@googlegroups.com

syzbot has found a reproducer for the following issue on:

HEAD commit: 6d0c80680317 Add linux-next specific files for 20220610
git tree: linux-next

console+strace: https://syzkaller.appspot.com/x/log.txt?x=106611bff00000

kernel config: https://syzkaller.appspot.com/x/.config?x=a30d6e3e814e5931
dashboard link: https://syzkaller.appspot.com/bug?extid=9bb26e7c5e8e4fa7e641
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

syz repro: https://syzkaller.appspot.com/x/repro.syz?x=177b6230080000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=148207bff00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+9bb26e...@syzkaller.appspotmail.com

------------[ cut here ]------------

WARNING: CPU: 1 PID: 4198 at kernel/rcu/tasks.h:1664 exit_tasks_rcu_finish_trace kernel/rcu/tasks.h:1664 [inline]
WARNING: CPU: 1 PID: 4198 at kernel/rcu/tasks.h:1664 exit_tasks_rcu_finish+0x122/0x1b0 kernel/rcu/tasks.h:1006
Modules linked in:
CPU: 1 PID: 4198 Comm: syz-executor403 Not tainted 5.19.0-rc1-next-20220610-syzkaller #0

Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:exit_tasks_rcu_finish_trace kernel/rcu/tasks.h:1664 [inline]
RIP: 0010:exit_tasks_rcu_finish+0x122/0x1b0 kernel/rcu/tasks.h:1006
Code: fc ff df 48 89 fa 48 c1 ea 03 0f b6 04 02 48 89 fa 83 e2 07 38 d0 7f 04 84 c0 75 62 0f b6 85 89 04 00 00 83 e0 01 08 d8 74 0e <0f> 0b 5b 48 89 ef 5d 41 5c e9 00 d2 ff ff 48 b8 00 00 00 00 00 fc

RSP: 0018:ffffc90003a2fb58 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000201 RCX: 1ffff1100ebf1091
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff888075f88489
RBP: ffff888075f88000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff888075f88480
R13: dffffc0000000000 R14: ffff888075f88000 R15: ffffc90003a2fbe8

FS: 0000000000000000(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

CR2: 0000555556d916d0 CR3: 000000000ba8e000 CR4: 00000000003506e0

DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
do_exit+0x18f2/0x2a00 kernel/exit.c:844
do_group_exit+0xd2/0x2f0 kernel/exit.c:925
get_signal+0x2542/0x2600 kernel/signal.c:2857
arch_do_signal_or_restart+0x82/0x2300 arch/x86/kernel/signal.c:869
exit_to_user_mode_loop kernel/entry/common.c:166 [inline]
exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:201
__syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x46/0xb0

RIP: 0033:0x7fb21689cc79
Code: Unable to access opcode bytes at RIP 0x7fb21689cc4f.
RSP: 002b:00007fb21684d218 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00007fb216924408 RCX: 00007fb21689cc79
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007fb216924408
RBP: 00007fb216924400 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb21692440c
R13: 00007ffe134b471f R14: 00007fb21684d300 R15: 0000000000022000
</TASK>

syzbot

unread,

Jun 13, 2022, 6:44:08 AM6/13/22

to and...@kernel.org, a...@kernel.org, b...@vger.kernel.org, bra...@kernel.org, dan...@iogearbox.net, da...@davemloft.net, ebie...@xmission.com, fred...@kernel.org, ha...@kernel.org, jiangs...@gmail.com, jo...@joelfernandes.org, john.fa...@gmail.com, jo...@joshtriplett.org, ka...@fb.com, kees...@chromium.org, kps...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, mathieu....@efficios.com, net...@vger.kernel.org, pab...@redhat.com, pau...@kernel.org, quic_n...@quicinc.com, r...@vger.kernel.org, ros...@goodmis.org, songliu...@fb.com, syzkall...@googlegroups.com, y...@fb.com

syzbot has bisected this issue to:

commit 09f110d4a1597185a5ed177da8573eec997b7227
Author: Paul E. McKenney <pau...@kernel.org>
Date: Tue May 17 18:30:32 2022 +0000

rcu-tasks: Track blocked RCU Tasks Trace readers

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=170f6ee7f00000
start commit: 6d0c80680317 Add linux-next specific files for 20220610
git tree: linux-next
final oops: https://syzkaller.appspot.com/x/report.txt?x=148f6ee7f00000
console output: https://syzkaller.appspot.com/x/log.txt?x=108f6ee7f00000

kernel config: https://syzkaller.appspot.com/x/.config?x=a30d6e3e814e5931
dashboard link: https://syzkaller.appspot.com/bug?extid=9bb26e7c5e8e4fa7e641

syz repro: https://syzkaller.appspot.com/x/repro.syz?x=177b6230080000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=148207bff00000

Reported-by: syzbot+9bb26e...@syzkaller.appspotmail.com
Fixes: 09f110d4a159 ("rcu-tasks: Track blocked RCU Tasks Trace readers")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Zhang, Qiang1

unread,

Jun 13, 2022, 9:55:40 AM6/13/22

to pau...@kernel.org, Eric W. Biederman, syzbot, bra...@kernel.org, kees...@chromium.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

Hi Paul, wether the following scenarios be considered

rcu_read_unlock_trace_special
->if (trs.b.blocked)
->raw_spin_lock_irqsave_rcu_node
->list_del_init(&t->trc_blkd_node)
->WRITE_ONCE(t->trc_reader_special.b.blocked, false)
->raw_spin_unlock_irqrestore_rcu_node
->Inerrrupt
->schedule
->rcu_note_context_switch
->rcu_tasks_trace_qs
If (___rttq_nesting && !READ_ONCE((t)->trc_reader_special.b.blocked)
/*___rttq_nesting ==1 && (t)->trc_reader_special.b.blocked =false*/
rcu_tasks_trace_qs_blkd(t)
->WRITE_ONCE(t->trc_reader_nesting, 0)
.......
-> exit_tasks_rcu_finish

Whether the following patch can fix it, or what am I missing?
Any thoughts?

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index f1209ce621c5..c607e4c914d3 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1247,6 +1247,7 @@ void rcu_read_unlock_trace_special(struct task_struct *t)
struct rcu_tasks_percpu *rtpcp;
union rcu_special trs;

+ WRITE_ONCE(t->trc_reader_nesting, 0);
// Open-coded full-word version of rcu_ld_need_qs().
smp_mb(); // Enforce full grace-period ordering.
trs = smp_load_acquire(&t->trc_reader_special);
@@ -1267,7 +1268,6 @@ void rcu_read_unlock_trace_special(struct task_struct *t)
WRITE_ONCE(t->trc_reader_special.b.blocked, false);
raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
}
- WRITE_ONCE(t->trc_reader_nesting, 0);
}
EXPORT_SYMBOL_GPL(rcu_read_unlock_trace_special);

Thanks
Zqiang

Paul E. McKenney

unread,

Jun 13, 2022, 11:23:39 AM6/13/22

to Zhang, Qiang1, Eric W. Biederman, syzbot, bra...@kernel.org, kees...@chromium.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

Thank you for looking into this!

You do have what I believe to be the correct failure scenario, but the
above fix would break nested RCU Tasks Trace read-side critical sections.

But would you be willing to try out the patch shown below?

Thanx, Paul

------------------------------------------------------------------------

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 08059d8d4f5a7..937a58b3266bf 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -184,7 +184,7 @@ void rcu_tasks_trace_qs_blkd(struct task_struct *t);
if (likely(!READ_ONCE((t)->trc_reader_special.b.need_qs)) && \
likely(!___rttq_nesting)) { \
rcu_trc_cmpxchg_need_qs((t), 0, TRC_NEED_QS_CHECKED); \
- } else if (___rttq_nesting && \
+ } else if (___rttq_nesting && ___rttq_nesting != INT_MIN && \
!READ_ONCE((t)->trc_reader_special.b.blocked)) { \
rcu_tasks_trace_qs_blkd(t); \
} \
diff --git a/include/linux/rcupdate_trace.h b/include/linux/rcupdate_trace.h
index 6f9c358173989..9bc8cbb33340b 100644
--- a/include/linux/rcupdate_trace.h
+++ b/include/linux/rcupdate_trace.h
@@ -75,7 +75,7 @@ static inline void rcu_read_unlock_trace(void)
nesting = READ_ONCE(t->trc_reader_nesting) - 1;
barrier(); // Critical section before disabling.
// Disable IPI-based setting of .need_qs.
- WRITE_ONCE(t->trc_reader_nesting, INT_MIN);
+ WRITE_ONCE(t->trc_reader_nesting, INT_MIN + nesting);
if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
WRITE_ONCE(t->trc_reader_nesting, nesting);
return; // We assume shallow reader nesting.

syzbot

unread,

Jun 13, 2022, 1:36:08 PM6/13/22

to kees...@chromium.org, syzkall...@googlegroups.com

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+9bb26e...@syzkaller.appspotmail.com

Tested on:

commit: 6d0c8068 Add linux-next specific files for 20220610
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git next-20220610
console output: https://syzkaller.appspot.com/x/log.txt?x=12d068eff00000

kernel config: https://syzkaller.appspot.com/x/.config?x=a30d6e3e814e5931
dashboard link: https://syzkaller.appspot.com/bug?extid=9bb26e7c5e8e4fa7e641
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

patch: https://syzkaller.appspot.com/x/patch.diff?x=147f47c8080000

Note: testing is done by a robot and is best-effort only.

Zhang, Qiang1

unread,

Jun 13, 2022, 6:26:53 PM6/13/22

to pau...@kernel.org, Eric W. Biederman, syzbot, bra...@kernel.org, kees...@chromium.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

Hi Paul

Break nested RCU Tasks Trace read-side critical sections?
Does it mean the following?

rcu_read_unlock_trace
-> WRITE_ONCE(t->trc_reader_nesting, INT_MIN);
/* t->trc_reader_special.s == 0*/
-> if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting)
-> Interrupt

-> schedule
-> rcu_note_context_switch
-> rcu_tasks_trace_qs

/*___rttq_nesting == INT_MIN && (t)->trc_reader_special.b.blocked == false*/
->rcu_tasks_trace_qs_blkd(t)
/*nesting == 0*/
-> WRITE_ONCE(t->trc_reader_nesting, nesting);
-> return;
.........
exit_tasks_rcu_finish
trigger Warnings

Or where am I misunderstanding?

Thanks
Zqiang

>
>But would you be willing to try out the patch shown below?

I will try to test it.

Paul E. McKenney

unread,

Jun 13, 2022, 7:37:57 PM6/13/22

to Zhang, Qiang1, Eric W. Biederman, syzbot, bra...@kernel.org, kees...@chromium.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

I suspect that you do in fact understand it. Let's walk through the
failure scenario again and see.

If that READ_ONCE(t->trc_reader_special.s) return zero as you suggest,
and then the interrupt and schedule happen as you suggest, then
rcu_tasks_trace_qs_blkd() will see a non-zero ->trc_reader_nesting and
a zero .b.blocked. It queues the task, but rcu_read_unlock_trace()
won't check again. It will set ->trc_reader_nesting to zero and
continue. As you noted, if the task exits in that state, then
exit_tasks_rcu_finish_trace() will trigger its WARN_ON_ONCE().

Your change won't affect this because rcu_read_unlock_trace_special()
never gets called.

Hence the approach in the patch below.

Do you see any failure modes given the below patch?

> Thanks
> Zqiang
>
> >
> >But would you be willing to try out the patch shown below?
>
> I will try to test it.

Thank you very much!

Kees Cook

unread,

Jun 13, 2022, 7:42:09 PM6/13/22

to Paul E. McKenney, Zhang, Qiang1, Eric W. Biederman, syzbot, bra...@kernel.org, kees...@chromium.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

On Mon, Jun 13, 2022 at 10:36:07AM -0700, syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch and the reproducer did not trigger any issue:
>
> Reported-and-tested-by: syzbot+9bb26e...@syzkaller.appspotmail.com

Looks like the patch solved the reproducer. :)

> Tested on:
>
> commit: 6d0c8068 Add linux-next specific files for 20220610
> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git next-20220610
> console output: https://syzkaller.appspot.com/x/log.txt?x=12d068eff00000

> kernel config: https://syzkaller.appspot.com/x/.config?x=a30d6e3e814e5931
> dashboard link: https://syzkaller.appspot.com/bug?extid=9bb26e7c5e8e4fa7e641
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

> patch: https://syzkaller.appspot.com/x/patch.diff?x=147f47c8080000
>
> Note: testing is done by a robot and is best-effort only.

You can see the dashboard link above for testing details...

--
Kees Cook

Zhang, Qiang1

unread,

Jun 13, 2022, 7:54:50 PM6/13/22

to pau...@kernel.org, Eric W. Biederman, syzbot, bra...@kernel.org, kees...@chromium.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

I understand the problem you describe, Just like the call flow I described above,
rcu_read_unlock_trace_special() may not be called.

Thanks
Zqiang

syzbot

unread,

Jun 13, 2022, 8:00:11 PM6/13/22

to pau...@kernel.org, syzkall...@googlegroups.com

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+9bb26e...@syzkaller.appspotmail.com

Tested on:

commit: 6d0c8068 Add linux-next specific files for 20220610
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git next-20220610

console output: https://syzkaller.appspot.com/x/log.txt?x=134c4d30080000

kernel config: https://syzkaller.appspot.com/x/.config?x=a30d6e3e814e5931
dashboard link: https://syzkaller.appspot.com/bug?extid=9bb26e7c5e8e4fa7e641
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

patch: https://syzkaller.appspot.com/x/patch.diff?x=153b22abf00000

Zhang, Qiang1

unread,

Jun 13, 2022, 8:02:07 PM6/13/22

to Zhang, Qiang1, pau...@kernel.org, Eric W. Biederman, syzbot, bra...@kernel.org, kees...@chromium.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

Directly return.

> .........
> exit_tasks_rcu_finish
> trigger Warnings
>

My change ignores this, so I described the above scenario, the rcu_read_unlock_trace_special() is not be called.

Paul E. McKenney

unread,

Jun 13, 2022, 8:03:33 PM6/13/22

to Kees Cook, Zhang, Qiang1, Eric W. Biederman, syzbot, bra...@kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

On Mon, Jun 13, 2022 at 04:42:07PM -0700, Kees Cook wrote:
> On Mon, Jun 13, 2022 at 10:36:07AM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot has tested the proposed patch and the reproducer did not trigger any issue:
> >
> > Reported-and-tested-by: syzbot+9bb26e...@syzkaller.appspotmail.com
>
> Looks like the patch solved the reproducer. :)

For one of the two possible ways to make the bug happen. ;-)

Still, good on Qiang for identifying one of the paths! Not just anyone
can do that.

Thanx, Paul

Paul E. McKenney

unread,

Jun 13, 2022, 8:04:18 PM6/13/22

to Zhang, Qiang1, Eric W. Biederman, syzbot, bra...@kernel.org, kees...@chromium.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

Very good!

Thanx, Paul

Zhang, Qiang1

unread,

Jun 13, 2022, 8:52:29 PM6/13/22

to pau...@kernel.org, Kees Cook, Eric W. Biederman, syzbot, bra...@kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

On Mon, Jun 13, 2022 at 04:42:07PM -0700, Kees Cook wrote:
> On Mon, Jun 13, 2022 at 10:36:07AM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot has tested the proposed patch and the reproducer did not trigger any issue:
> >
> > Reported-and-tested-by:
> > syzbot+9bb26e...@syzkaller.appspotmail.com
>
> Looks like the patch solved the reproducer. :)

>For one of the two possible ways to make the bug happen. ;-)
>
>Still, good on Qiang for identifying one of the paths! Not just anyone can do that.

Thanks Paul, yes this patch can solved the reproducer and I also test it.

I compiled "bpf_test" according to the c file in the following link

C reproducer: https://syzkaller.appspot.com/x/repro.c?x=148207bff00000

Not apply this patch, this calltrace reproduce

root@qemux86-64:/# ./bpf_test
^C
root@qemux86-64:/# dmesg
[ 147.017232] ------------[ cut here ]------------
[ 147.017234] WARNING: CPU: 3 PID: 21586 at kernel/rcu/tasks.h:1664 exit_tasks_rcu_finish+0xbf/0x100
[ 147.017245] Modules linked in:
[ 147.017249] CPU: 3 PID: 21586 Comm: bpf_test Not tainted 5.19.0-rc1-next-20220610-yoctodev-standard #204
[ 147.017253] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 147.017256] RIP: 0010:exit_tasks_rcu_finish+0xbf/0x100
[ 147.017261] Code: 03 00 00 85 c0 75 59 f0 83 44 24 fc 00 49 8d bc 24 41 03 00 00 e8 21 09 2b 00 41 0f b6 84 24 41 03 00 00 83 e0 01 08 d8 74 11 <0f> 8
[ 147.017264] RSP: 0018:ffff88810ccd7af0 EFLAGS: 00010202
[ 147.017268] RAX: 0000000000000001 RBX: 0000000000000201 RCX: ffffffff897ab52f
[ 147.017271] RDX: 1ffff1102181d7c8 RSI: 0000000000000000 RDI: ffff88810c0ebe41
[ 147.017274] RBP: ffff88810ccd7b08 R08: ffffffff897ab510 R09: 0000000000000000
[ 147.017277] R10: ffffffff8c407bbf R11: 0000000000000000 R12: ffff88810c0ebb00
[ 147.017279] R13: ffff88810c0ebe38 R14: ffff88810ccd7b78 R15: ffff88810ccd76b0
[ 147.017282] FS: 0000000000000000(0000) GS:ffff88815a000000(0000) knlGS:0000000000000000
[ 147.017287] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 147.017289] CR2: 0000000000000000 CR3: 000000011ac16000 CR4: 00000000001506e0
[ 147.017292] Call Trace:
[ 147.017294] <TASK>
[ 147.017297] do_exit+0xe9a/0x13a0
[ 147.017305] ? mm_update_next_owner+0x450/0x450
[ 147.017309] ? lock_downgrade+0x410/0x410
[ 147.017313] ? lock_contended+0x790/0x790
[ 147.017317] ? rcu_read_lock_held_common+0x1e/0x60
[ 147.017324] do_group_exit+0x71/0x150
[ 147.017329] get_signal+0x1319/0x1340
[ 147.017334] ? futex_wait_setup+0x180/0x180
[ 147.017341] ? exit_signals+0x4c0/0x4c0
[ 147.017345] ? futex_wake+0x29e/0x2e0
[ 147.017351] arch_do_signal_or_restart+0x9c/0xc60
[ 147.017357] ? rcu_read_lock_sched_held+0x60/0xd0
[ 147.017361] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 147.017367] ? get_sigframe_size+0x20/0x20
[ 147.017371] ? do_futex+0x1a2/0x240
[ 147.017375] ? __ia32_sys_get_robust_list+0x240/0x240
[ 147.017378] ? __context_tracking_exit+0x91/0xa0
[ 147.017385] ? lock_downgrade+0x410/0x410
[ 147.017390] ? __x64_sys_futex+0x125/0x2b0
[ 147.017393] ? rcu_read_lock_held_common+0x1e/0x60
[ 147.017399] ? lockdep_hardirqs_on_prepare+0x13/0x230
[ 147.017403] ? exit_to_user_mode_prepare+0x151/0x240
[ 147.017409] exit_to_user_mode_prepare+0x16c/0x240
[ 147.017415] syscall_exit_to_user_mode+0x1e/0x70
[ 147.017419] do_syscall_64+0x50/0x90
[ 147.017423] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 147.017427] RIP: 0033:0x7f780e668fbd
[ 147.017431] Code: Unable to access opcode bytes at RIP 0x7f780e668f93.
[ 147.017433] RSP: 002b:00007f780e55bd48 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[ 147.017437] RAX: fffffffffffffe00 RBX: 00007f780e55c640 RCX: 00007f780e668fbd
[ 147.017440] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000055ed9f3a8088
[ 147.017442] RBP: 00007f780e55bd60 R08: 0000000100000001 R09: 0000000100000001
[ 147.017445] R10: 0000000000000000 R11: 0000000000000246 R12: fffffffffffffeb0
[ 147.017447] R13: 0000000000000000 R14: 00007ffe5dcf20a0 R15: 00007f780e53c000
[ 147.017455] </TASK>
[ 147.017458] irq event stamp: 0
[ 147.017460] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 147.017463] hardirqs last disabled at (0): [<ffffffff896b5990>] copy_process+0xe30/0x35f0
[ 147.017467] softirqs last enabled at (0): [<ffffffff896b5990>] copy_process+0xe30/0x35f0
[ 147.017470] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 147.017473] ---[ end trace 0000000000000000 ]---

Apply this patch , this problem not reproduce.

Thanks
Zqiang

Paul E. McKenney

unread,

Jun 13, 2022, 10:42:02 PM6/13/22

to Zhang, Qiang1, Kees Cook, Eric W. Biederman, syzbot, bra...@kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

Thank you very much! May I apply your Tested-by?

Thanx, Paul

Zhang, Qiang1

unread,

Jun 13, 2022, 10:48:01 PM6/13/22

to pau...@kernel.org, Kees Cook, Eric W. Biederman, syzbot, bra...@kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

Thanks, I am very willing

Reply all

Reply to author

Forward

[syzbot] WARNING in exit_tasks_rcu_finish

syzbot

Eric W. Biederman

Paul E. McKenney

syzbot

syzbot

Zhang, Qiang1

Paul E. McKenney

syzbot

Zhang, Qiang1

Paul E. McKenney

Kees Cook

Zhang, Qiang1

syzbot

Zhang, Qiang1

Paul E. McKenney

Paul E. McKenney

Zhang, Qiang1

Paul E. McKenney

Zhang, Qiang1