[syzbot] upstream boot error: BUG: unable to handle kernel NULL pointer dereference in psi_task_switch

8 views
Skip to first unread message

syzbot

unread,
May 1, 2023, 12:43:05 PM5/1/23
to bri...@redhat.com, bse...@google.com, dietmar....@arm.com, han...@cmpxchg.org, juri....@redhat.com, linux-...@vger.kernel.org, mgo...@suse.de, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, sur...@google.com, syzkall...@googlegroups.com, vincent...@linaro.org, vsch...@redhat.com
Hello,

syzbot found the following issue on:

HEAD commit: 89d77f71f493 Merge tag 'riscv-for-linus-6.4-mw1' of git://..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1113550c280000
kernel config: https://syzkaller.appspot.com/x/.config?x=4cc65ccad523b604
dashboard link: https://syzkaller.appspot.com/bug?extid=0827f43974813b74e6db
compiler: arm-linux-gnueabi-gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+0827f4...@syzkaller.appspotmail.com

Movable zone start for each node
Early memory node ranges
node 0: [mem 0x0000000080000000-0x00000000ffffffff]
Initmem setup node 0 [mem 0x0000000080000000-0x00000000ffffffff]
percpu: Embedded 19 pages/cpu s47048 r8192 d22584 u77824
Kernel command line: root=/dev/vda console=ttyAMA0 earlyprintk=serial net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1 ima_policy=tcb nf-conntrack-ftp.ports=20000 nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000 nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000 binder.debug_mask=0 rcupdate.rcu_expedited=1 rcupdate.rcu_cpu_stall_cputime=1 no_hash_pointers page_owner=on sysctl.vm.nr_hugepages=4 sysctl.vm.nr_overcommit_hugepages=4 secretmem.enable=1 sysctl.max_rcu_stall_to_panic=1 msr.allow_writes=off coredump_filter=0xffff root=/dev/vda console=ttyAMA0 vmalloc=512M smp.csd_lock_timeout=300000 watchdog_thresh=165 workqueue.watchdog_thresh=420 sysctl.net.core.netdev_unregister_timeout_secs=420 dummy_hcd.num=2 panic_on_warn=1
Unknown kernel command line parameters "earlyprintk=serial page_owner=on", will be passed to user space.
Dentry cache hash table entries: 262144 (order: 8, 1048576 bytes, linear)
Inode-cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
Built 1 zonelists, mobility grouping on. Total pages: 520868
allocated 2097152 bytes of page_ext
mem auto-init: stack:off, heap alloc:on, heap free:off
software IO TLB: area num 2.
software IO TLB: mapped [mem 0x00000000d9a47000-0x00000000dda47000] (64MB)
Memory: 1952320K/2097152K available (24576K kernel code, 2362K rwdata, 8400K rodata, 2048K init, 867K bss, 128448K reserved, 16384K cma-reserved, 524288K highmem)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
trace event string verifier disabled
rcu: Preemptible hierarchical RCU implementation.
rcu: RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
rcu: RCU callback double-/use-after-free debug is enabled.
All grace periods are expedited (rcu_expedited).
Trampoline variant of Tasks RCU enabled.
Tracing variant of Tasks RCU enabled.
rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
GIC physical location is 0x2c001000
rcu: srcu_init: Setting srcu_struct sizes based on contention.
sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 89478484971ns
clocksource: arm,sp804: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275 ns
arch_timer: cp15 timer(s) running at 62.50MHz (virt).
clocksource: arch_sys_counter: mask: 0x1ffffffffffffff max_cycles: 0x1cd42e208c, max_idle_ns: 881590405314 ns
sched_clock: 57 bits at 63MHz, resolution 16ns, wraps every 4398046511096ns
Switching to timer-based delay loop, resolution 16ns
Console: colour dummy device 80x30
Calibrating delay loop (skipped), value calculated using timer frequency.. 125.00 BogoMIPS (lpj=625000)
pid_max: default: 32768 minimum: 301
LSM: initializing lsm=lockdown,capability,landlock,yama,safesetid,tomoyo,selinux,bpf,integrity
landlock: Up and running.
Yama: becoming mindful.
TOMOYO Linux initialized
SELinux: Initializing.
LSM support for eBPF active
stackdepot: allocating hash table of 131072 entries via kvcalloc
Mount-cache hash table entries: 4096 (order: 2, 16384 bytes, linear)
Mountpoint-cache hash table entries: 4096 (order: 2, 16384 bytes, linear)
CPU: Testing write buffer coherency: ok
CPU0: Spectre BHB: enabling loop workaround for all CPUs
CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address 0000001c when read
[0000001c] *pgd=80000080004003, *pmd=00000000
Internal error: Oops: 206 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.3.0-syzkaller #0
Hardware name: ARM-Versatile Express
PC is at psi_task_switch+0x1e0/0x204 kernel/sched/psi.c:940
LR is at psi_task_switch+0x190/0x204 kernel/sched/psi.c:932
pc : [<802a3c58>] lr : [<802a3c08>] psr: a0000193
sp : 82601e70 ip : 5b923000 fp : 82601ebc
r10: 8261ae40 r9 : 00000001 r8 : 00000002
r7 : 1b5aa6f1 r6 : 00000000 r5 : 8260c964 r4 : 00000000
r3 : 826f6748 r2 : 8309d13c r1 : 00000000 r0 : 00000000
Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 30c5387d Table: 80003000 DAC: fffffffd
Register r0 information:
8<--- cut here ---
Unable to handle kernel paging request at virtual address 00802158 when read
[00802158] *pgd=80000080004003, *pmd=00000000
Internal error: Oops: 206 [#2] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.3.0-syzkaller #0
Hardware name: ARM-Versatile Express
PC is at __find_vmap_area mm/vmalloc.c:841 [inline]
PC is at find_vmap_area mm/vmalloc.c:1862 [inline]
PC is at find_vm_area mm/vmalloc.c:2623 [inline]
PC is at vmalloc_dump_obj+0x38/0xb4 mm/vmalloc.c:4221
LR is at __raw_spin_lock include/linux/spinlock_api_smp.h:132 [inline]
LR is at _raw_spin_lock+0x18/0x58 kernel/locking/spinlock.c:154
pc : [<80479e58>] lr : [<81800154>] psr: 20000193
sp : 82601cd8 ip : 82601cc0 fp : 82601cec
r10: 8261ae40 r9 : 8261c9a4 r8 : 8284f41c
r7 : 60000193 r6 : 00000001 r5 : 00000000 r4 : 00802160
r3 : f0830b42 r2 : 00001ef5 r1 : 00000000 r0 : 00000001
Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 30c5387d Table: 80003000 DAC: fffffffd


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to change bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Dmitry Vyukov

unread,
May 2, 2023, 2:37:30 AM5/2/23
to syzbot, bri...@redhat.com, bse...@google.com, dietmar....@arm.com, han...@cmpxchg.org, juri....@redhat.com, linux-...@vger.kernel.org, mgo...@suse.de, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, sur...@google.com, syzkall...@googlegroups.com, vincent...@linaro.org, vsch...@redhat.com, Linux ARM
On Mon, 1 May 2023 at 18:43, syzbot
<syzbot+0827f4...@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 89d77f71f493 Merge tag 'riscv-for-linus-6.4-mw1' of git://..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1113550c280000
> kernel config: https://syzkaller.appspot.com/x/.config?x=4cc65ccad523b604
> dashboard link: https://syzkaller.appspot.com/bug?extid=0827f43974813b74e6db
> compiler: arm-linux-gnueabi-gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> userspace arch: arm
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+0827f4...@syzkaller.appspotmail.com

+arm mailing list

Kernel started falling apart on arm during boot in various strange ways.

#syz set subsystems: arm
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bug...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/000000000000281ae805faa4844e%40google.com.

Dmitry Vyukov

unread,
May 3, 2023, 6:38:34 AM5/3/23
to Russell King (Oracle), syzbot, bri...@redhat.com, bse...@google.com, dietmar....@arm.com, han...@cmpxchg.org, juri....@redhat.com, linux-...@vger.kernel.org, mgo...@suse.de, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, sur...@google.com, syzkall...@googlegroups.com, vincent...@linaro.org, vsch...@redhat.com, Linux ARM
On Wed, 3 May 2023 at 12:34, Russell King (Oracle)
<li...@armlinux.org.uk> wrote:
>
> On Tue, May 02, 2023 at 08:37:15AM +0200, Dmitry Vyukov wrote:
> > On Mon, 1 May 2023 at 18:43, syzbot
> > <syzbot+0827f4...@syzkaller.appspotmail.com> wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: 89d77f71f493 Merge tag 'riscv-for-linus-6.4-mw1' of git://..
> > > git tree: upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=1113550c280000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=4cc65ccad523b604
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=0827f43974813b74e6db
> > > compiler: arm-linux-gnueabi-gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > > userspace arch: arm
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+0827f4...@syzkaller.appspotmail.com
> >
> > +arm mailing list
> >
> > Kernel started falling apart on arm during boot in various strange ways.
>
> Are these all the same hardware platform? If so, please check the
> hardware with a known working kernel. These look like spurious failts
> caused possibly the hardware platform failing.

It happens in the same qemu that used to work before. So "hardware" is
presumably working.

Russell King (Oracle)

unread,
May 3, 2023, 6:46:06 AM5/3/23
to Dmitry Vyukov, syzbot, bri...@redhat.com, bse...@google.com, dietmar....@arm.com, han...@cmpxchg.org, juri....@redhat.com, linux-...@vger.kernel.org, mgo...@suse.de, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, sur...@google.com, syzkall...@googlegroups.com, vincent...@linaro.org, vsch...@redhat.com, Linux ARM
On Tue, May 02, 2023 at 08:37:15AM +0200, Dmitry Vyukov wrote:
> On Mon, 1 May 2023 at 18:43, syzbot
> <syzbot+0827f4...@syzkaller.appspotmail.com> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 89d77f71f493 Merge tag 'riscv-for-linus-6.4-mw1' of git://..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1113550c280000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=4cc65ccad523b604
> > dashboard link: https://syzkaller.appspot.com/bug?extid=0827f43974813b74e6db
> > compiler: arm-linux-gnueabi-gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > userspace arch: arm
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+0827f4...@syzkaller.appspotmail.com
>
> +arm mailing list
>
> Kernel started falling apart on arm during boot in various strange ways.

Are these all the same hardware platform? If so, please check the
hardware with a known working kernel. These look like spurious failts
caused possibly the hardware platform failing.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

Russell King (Oracle)

unread,
May 3, 2023, 6:54:33 AM5/3/23
to Dmitry Vyukov, syzbot, bri...@redhat.com, bse...@google.com, dietmar....@arm.com, han...@cmpxchg.org, juri....@redhat.com, linux-...@vger.kernel.org, mgo...@suse.de, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, sur...@google.com, syzkall...@googlegroups.com, vincent...@linaro.org, vsch...@redhat.com, Linux ARM
I have no other ideas based on the incomplete information provided
(due to the initial oops triggering a subsequent oops while trying to
print the register "information".) And I have no suggestions how to
debug it further. Sorry.

Peter Zijlstra

unread,
May 3, 2023, 9:33:05 AM5/3/23
to syzbot, bri...@redhat.com, bse...@google.com, dietmar....@arm.com, han...@cmpxchg.org, juri....@redhat.com, linux-...@vger.kernel.org, mgo...@suse.de, mi...@redhat.com, ros...@goodmis.org, sur...@google.com, syzkall...@googlegroups.com, vincent...@linaro.org, vsch...@redhat.com
Johannes, your commit 71dbdde7914d ("sched/psi: Remove NR_ONCPU task
accounting") was the last to touch psi.c:640. Any ideas?

Johannes Weiner

unread,
May 3, 2023, 12:35:02 PM5/3/23
to Peter Zijlstra, syzbot, bri...@redhat.com, bse...@google.com, dietmar....@arm.com, juri....@redhat.com, linux-...@vger.kernel.org, mgo...@suse.de, mi...@redhat.com, ros...@goodmis.org, sur...@google.com, syzkall...@googlegroups.com, vincent...@linaro.org, vsch...@redhat.com, Chengming Zhou
PC is at :940, which is the group->pcpu->state_mask derefs. It's a bit
hard to say without the Code line, but 0x1c looks like the group->pcpu
offset on 32 bit, so presumably group is NULL.

Is this reproducible with 6.1? And 6.0?

Looking through git, the most recent change to how the group is looked
up is in 6.1, dc86aba751e2 ("sched/psi: Cache parent psi_group to
speed up group iteration"). It does this:

-static struct psi_group *iterate_groups(struct task_struct *task, void **iter)
+static inline struct psi_group *task_psi_group(struct task_struct *task)
{
- if (*iter == &psi_system)
- return NULL;
-
#ifdef CONFIG_CGROUPS
- if (static_branch_likely(&psi_cgroups_enabled)) {
- struct cgroup *cgroup = NULL;
-
- if (!*iter)
- cgroup = task->cgroups->dfl_cgrp;
- else
- cgroup = cgroup_parent(*iter);
-
- if (cgroup && cgroup_parent(cgroup)) {
- *iter = cgroup;
- return cgroup_psi(cgroup);
- }
- }
+ if (static_branch_likely(&psi_cgroups_enabled))
+ return cgroup_psi(task_dfl_cgroup(task));
#endif
- *iter = &psi_system;
return &psi_system;
}

Notably, it got rid of the if (cgroup) and now calls cgroup_psi()
unconditionally.

If this is easily reproducible, does the following fix it?

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index e072f6b31bf3..21268e67ae34 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -881,8 +881,11 @@ static void psi_group_change(struct psi_group *group, int cpu,
static inline struct psi_group *task_psi_group(struct task_struct *task)
{
#ifdef CONFIG_CGROUPS
- if (static_branch_likely(&psi_cgroups_enabled))
- return cgroup_psi(task_dfl_cgroup(task));
+ if (static_branch_likely(&psi_cgroups_enabled)) {
+ struct cgroup *cgroup = task_dfl_cgroup(task);
+
+ if (cgroup)
+ return cgroup_psi(cgroup);
#endif
return &psi_system;
}

[remaining email quoted below for Chengming]

Chengming Zhou

unread,
May 4, 2023, 5:57:49 AM5/4/23
to Johannes Weiner, Peter Zijlstra, syzbot, bri...@redhat.com, bse...@google.com, dietmar....@arm.com, juri....@redhat.com, linux-...@vger.kernel.org, mgo...@suse.de, mi...@redhat.com, ros...@goodmis.org, sur...@google.com, syzkall...@googlegroups.com, vincent...@linaro.org, vsch...@redhat.com, Chengming Zhou
Yes, I think your analysis is correct.

In psi_task_switch(), cgroup_psi(task_dfl_cgroup(next)) return NULL.

```
static inline struct psi_group *cgroup_psi(struct cgroup *cgrp)
{
return cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
}
```

So task_dfl_cgroup(next) shouldn't be NULL, or it should panic here.

If cgrp is correct, cgrp->psi shouldn't be NULL, cgrp->psi is alloc
with cgroup in cgroup_create().

But here cgrp->psi == NULL, I don't know why...

>
> Is this reproducible with 6.1? And 6.0?
>

I tried today, but can't reproduce it.

My local qemu setup maybe not the very same with syzbot. I run the
versatilepb machine on my x86_64 server and use busybox as the rootfs.
All css_set is created from init_css_set as the ancestor, which has dfl_cgrp set,
so task->cgroups->dfl_cgrp shouldn't be NULL?

Will continue to look into this issue.

Thanks.

Dmitry Vyukov

unread,
May 8, 2023, 2:23:44 AM5/8/23
to Chengming Zhou, Johannes Weiner, Peter Zijlstra, syzbot, bri...@redhat.com, bse...@google.com, dietmar....@arm.com, juri....@redhat.com, linux-...@vger.kernel.org, mgo...@suse.de, mi...@redhat.com, ros...@goodmis.org, sur...@google.com, syzkall...@googlegroups.com, vincent...@linaro.org, vsch...@redhat.com, Chengming Zhou, Aleksandr Nogikh
arm32 kernel started falling apart in multiple random ways. Each
individual crash site is most likely red herring, see the following
for more info:
https://lore.kernel.org/all/CANp29Y5Bw5mrvJDzQ1AJ3eYj3bvaTzjYDVOFC-cWXb0=PEQ...@mail.gmail.com/
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bug...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/2907c99f-6456-3147-f43d-a65c2cdaf574%40linux.dev.

syzbot

unread,
Aug 23, 2023, 5:02:49 AM8/23/23
to syzkall...@googlegroups.com
Auto-closing this bug as obsolete.
Crashes did not happen for a while, no reproducer and no activity.
Reply all
Reply to author
Forward
0 new messages