[syzbot] possible deadlock in tty_port_tty_get


syzbot

Sep 30, 2022, 6:05:39 PM
to andy.sh...@gmail.com, etre...@distech-controls.com, gre...@linuxfoundation.org, ilpo.j...@linux.intel.com, jiri...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, u.klein...@pengutronix.de, wan...@redhat.com
Hello,

syzbot found the following issue on:

HEAD commit: 5a77386984b5 Merge tag 'drm-fixes-2022-09-30-1' of git://a..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=132b8504880000
kernel config: https://syzkaller.appspot.com/x/.config?x=a1992c90769e07
dashboard link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+223c74...@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
6.0.0-rc7-syzkaller-00162-g5a77386984b5 #0 Not tainted
------------------------------------------------------
syz-executor.0/4926 is trying to acquire lock:
ffffffff8c0be788 (zonelist_update_seq.seqcount){...-}-{0:0}, at: __alloc_pages+0x43d/0x510 mm/page_alloc.c:5562

but task is already holding lock:
ffff88802624c958 (&port->lock){-.-.}-{2:2}, at: tty_insert_flip_string_and_push_buffer+0x2b/0x160 drivers/tty/tty_buffer.c:628

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&port->lock){-.-.}-{2:2}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x39/0x50 kernel/locking/spinlock.c:162
tty_port_tty_get+0x1f/0x100 drivers/tty/tty_port.c:327
tty_port_default_wakeup+0x11/0x40 drivers/tty/tty_port.c:68
serial8250_tx_chars+0x4f6/0xd90 drivers/tty/serial/8250/8250_port.c:1851
serial8250_handle_irq.part.0+0x440/0x820 drivers/tty/serial/8250/8250_port.c:1938
serial8250_handle_irq drivers/tty/serial/8250/8250_port.c:1911 [inline]
serial8250_default_handle_irq+0xb2/0x220 drivers/tty/serial/8250/8250_port.c:1958
serial8250_interrupt+0xf8/0x200 drivers/tty/serial/8250/8250_core.c:126
__handle_irq_event_percpu+0x227/0x870 kernel/irq/handle.c:158
handle_irq_event_percpu kernel/irq/handle.c:193 [inline]
handle_irq_event+0xa7/0x1e0 kernel/irq/handle.c:210
handle_edge_irq+0x25f/0xd00 kernel/irq/chip.c:819
generic_handle_irq_desc include/linux/irqdesc.h:158 [inline]
handle_irq arch/x86/kernel/irq.c:231 [inline]
__common_interrupt+0x9d/0x210 arch/x86/kernel/irq.c:250
common_interrupt+0xa4/0xc0 arch/x86/kernel/irq.c:240
asm_common_interrupt+0x22/0x40 arch/x86/include/asm/idtentry.h:640
native_safe_halt arch/x86/include/asm/irqflags.h:51 [inline]
arch_safe_halt arch/x86/include/asm/irqflags.h:89 [inline]
default_idle+0xb/0x10 arch/x86/kernel/process.c:730
default_idle_call+0x80/0xc0 kernel/sched/idle.c:109
cpuidle_idle_call kernel/sched/idle.c:191 [inline]
do_idle+0x401/0x590 kernel/sched/idle.c:303
cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:400
start_secondary+0x21d/0x2b0 arch/x86/kernel/smpboot.c:262
secondary_startup_64_no_verify+0xce/0xdb

-> #2 (&port_lock_key){-.-.}-{2:2}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x39/0x50 kernel/locking/spinlock.c:162
serial8250_console_write+0x975/0xfe0 drivers/tty/serial/8250/8250_port.c:3380
call_console_driver kernel/printk/printk.c:1945 [inline]
console_emit_next_record.constprop.0+0x3de/0x840 kernel/printk/printk.c:2732
console_flush_all kernel/printk/printk.c:2794 [inline]
console_unlock+0x37a/0x5a0 kernel/printk/printk.c:2861
vprintk_emit+0x1b9/0x5f0 kernel/printk/printk.c:2271
vprintk+0x80/0x90 kernel/printk/printk_safe.c:50
_printk+0xba/0xed kernel/printk/printk.c:2292
register_console kernel/printk/printk.c:3212 [inline]
register_console+0x482/0x840 kernel/printk/printk.c:3104
univ8250_console_init+0x3a/0x46 drivers/tty/serial/8250/8250_core.c:681
console_init+0x3b7/0x57e kernel/printk/printk.c:3308
start_kernel+0x2fa/0x48f init/main.c:1066
secondary_startup_64_no_verify+0xce/0xdb

-> #1 (console_owner){-...}-{0:0}:
console_lock_spinning_enable kernel/printk/printk.c:1808 [inline]
console_emit_next_record.constprop.0+0x2dd/0x840 kernel/printk/printk.c:2729
console_flush_all kernel/printk/printk.c:2794 [inline]
console_unlock+0x37a/0x5a0 kernel/printk/printk.c:2861
vprintk_emit+0x1b9/0x5f0 kernel/printk/printk.c:2271
vprintk+0x80/0x90 kernel/printk/printk_safe.c:50
_printk+0xba/0xed kernel/printk/printk.c:2292
build_zonelists.cold+0xe5/0x11f mm/page_alloc.c:6471
__build_all_zonelists+0x111/0x180 mm/page_alloc.c:6584
build_all_zonelists_init+0x2f/0x104 mm/page_alloc.c:6609
build_all_zonelists+0x11f/0x140 mm/page_alloc.c:6642
start_kernel+0xb9/0x48f init/main.c:960
secondary_startup_64_no_verify+0xce/0xdb

-> #0 (zonelist_update_seq.seqcount){...-}-{0:0}:
check_prev_add kernel/locking/lockdep.c:3095 [inline]
check_prevs_add kernel/locking/lockdep.c:3214 [inline]
validate_chain kernel/locking/lockdep.c:3829 [inline]
__lock_acquire+0x2a43/0x56d0 kernel/locking/lockdep.c:5053
lock_acquire kernel/locking/lockdep.c:5666 [inline]
lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5631
seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
read_seqbegin include/linux/seqlock.h:836 [inline]
zonelist_iter_begin mm/page_alloc.c:4722 [inline]
__alloc_pages_slowpath.constprop.0+0x1a5/0x2300 mm/page_alloc.c:5044
__alloc_pages+0x43d/0x510 mm/page_alloc.c:5562
__alloc_pages_node include/linux/gfp.h:243 [inline]
kmem_getpages mm/slab.c:1363 [inline]
cache_grow_begin+0x75/0x360 mm/slab.c:2569
cache_alloc_refill+0x27f/0x380 mm/slab.c:2942
____cache_alloc mm/slab.c:3018 [inline]
____cache_alloc mm/slab.c:3001 [inline]
__do_cache_alloc mm/slab.c:3246 [inline]
slab_alloc mm/slab.c:3287 [inline]
__do_kmalloc mm/slab.c:3684 [inline]
__kmalloc+0x3a1/0x4a0 mm/slab.c:3695
kmalloc include/linux/slab.h:605 [inline]
tty_buffer_alloc+0x27b/0x2f0 drivers/tty/tty_buffer.c:180
__tty_buffer_request_room+0x15f/0x2b0 drivers/tty/tty_buffer.c:278
tty_insert_flip_string_fixed_flag+0x8c/0x250 drivers/tty/tty_buffer.c:327
tty_insert_flip_string include/linux/tty_flip.h:41 [inline]
tty_insert_flip_string_and_push_buffer+0x3e/0x160 drivers/tty/tty_buffer.c:629
pty_write+0xd6/0x100 drivers/tty/pty.c:118
process_output_block drivers/tty/n_tty.c:586 [inline]
n_tty_write+0x4ca/0xfd0 drivers/tty/n_tty.c:2350
do_tty_write drivers/tty/tty_io.c:1024 [inline]
file_tty_write.constprop.0+0x499/0x8f0 drivers/tty/tty_io.c:1095
call_write_iter include/linux/fs.h:2187 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x9e9/0xdd0 fs/read_write.c:584
ksys_write+0x127/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

other info that might help us debug this:

Chain exists of:
zonelist_update_seq.seqcount --> &port_lock_key --> &port->lock

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&port->lock);
                               lock(&port_lock_key);
                               lock(&port->lock);
  lock(zonelist_update_seq.seqcount);

*** DEADLOCK ***

5 locks held by syz-executor.0/4926:
#0: ffff88806b3c3098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:244
#1: ffff88806b3c3130 (&tty->atomic_write_lock){+.+.}-{3:3}, at: tty_write_lock drivers/tty/tty_io.c:950 [inline]
#1: ffff88806b3c3130 (&tty->atomic_write_lock){+.+.}-{3:3}, at: do_tty_write drivers/tty/tty_io.c:973 [inline]
#1: ffff88806b3c3130 (&tty->atomic_write_lock){+.+.}-{3:3}, at: file_tty_write.constprop.0+0x296/0x8f0 drivers/tty/tty_io.c:1095
#2: ffff88806b3c32e8 (&o_tty->termios_rwsem/1){++++}-{3:3}, at: n_tty_write+0x1bf/0xfd0 drivers/tty/n_tty.c:2333
#3: ffffc90000cf4380 (&ldata->output_lock){+.+.}-{3:3}, at: process_output_block drivers/tty/n_tty.c:541 [inline]
#3: ffffc90000cf4380 (&ldata->output_lock){+.+.}-{3:3}, at: n_tty_write+0x5f6/0xfd0 drivers/tty/n_tty.c:2350
#4: ffff88802624c958 (&port->lock){-.-.}-{2:2}, at: tty_insert_flip_string_and_push_buffer+0x2b/0x160 drivers/tty/tty_buffer.c:628

stack backtrace:
CPU: 1 PID: 4926 Comm: syz-executor.0 Not tainted 6.0.0-rc7-syzkaller-00162-g5a77386984b5 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2175
check_prev_add kernel/locking/lockdep.c:3095 [inline]
check_prevs_add kernel/locking/lockdep.c:3214 [inline]
validate_chain kernel/locking/lockdep.c:3829 [inline]
__lock_acquire+0x2a43/0x56d0 kernel/locking/lockdep.c:5053
lock_acquire kernel/locking/lockdep.c:5666 [inline]
lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5631
seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
read_seqbegin include/linux/seqlock.h:836 [inline]
zonelist_iter_begin mm/page_alloc.c:4722 [inline]
__alloc_pages_slowpath.constprop.0+0x1a5/0x2300 mm/page_alloc.c:5044
__alloc_pages+0x43d/0x510 mm/page_alloc.c:5562
__alloc_pages_node include/linux/gfp.h:243 [inline]
kmem_getpages mm/slab.c:1363 [inline]
cache_grow_begin+0x75/0x360 mm/slab.c:2569
cache_alloc_refill+0x27f/0x380 mm/slab.c:2942
____cache_alloc mm/slab.c:3018 [inline]
____cache_alloc mm/slab.c:3001 [inline]
__do_cache_alloc mm/slab.c:3246 [inline]
slab_alloc mm/slab.c:3287 [inline]
__do_kmalloc mm/slab.c:3684 [inline]
__kmalloc+0x3a1/0x4a0 mm/slab.c:3695
kmalloc include/linux/slab.h:605 [inline]
tty_buffer_alloc+0x27b/0x2f0 drivers/tty/tty_buffer.c:180
__tty_buffer_request_room+0x15f/0x2b0 drivers/tty/tty_buffer.c:278
tty_insert_flip_string_fixed_flag+0x8c/0x250 drivers/tty/tty_buffer.c:327
tty_insert_flip_string include/linux/tty_flip.h:41 [inline]
tty_insert_flip_string_and_push_buffer+0x3e/0x160 drivers/tty/tty_buffer.c:629
pty_write+0xd6/0x100 drivers/tty/pty.c:118
process_output_block drivers/tty/n_tty.c:586 [inline]
n_tty_write+0x4ca/0xfd0 drivers/tty/n_tty.c:2350
do_tty_write drivers/tty/tty_io.c:1024 [inline]
file_tty_write.constprop.0+0x499/0x8f0 drivers/tty/tty_io.c:1095
call_write_iter include/linux/fs.h:2187 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x9e9/0xdd0 fs/read_write.c:584
ksys_write+0x127/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7feffde8a5a9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007feffcdfe168 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007feffdfabf80 RCX: 00007feffde8a5a9
RDX: 00000000fffffedf RSI: 0000000020000000 RDI: 0000000000000004
RBP: 00007feffdee5580 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fffc3837d3f R14: 00007feffcdfe300 R15: 0000000000022000
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Ilpo Järvinen

Oct 3, 2022, 4:44:27 AM
to syzbot, andy.sh...@gmail.com, etre...@distech-controls.com, Greg Kroah-Hartman, Jiri Slaby, LKML, linux-serial, syzkall...@googlegroups.com, u.klein...@pengutronix.de, wan...@redhat.com, Mel Gorman
+ Cc Mel Gorman.

--
i.

syzbot

Mar 25, 2023, 7:02:48 PM
to andy.sh...@gmail.com, etre...@distech-controls.com, gre...@linuxfoundation.org, ilpo.j...@linux.intel.com, jiri...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, u.klein...@pengutronix.de, wan...@redhat.com
syzbot has found a reproducer for the following issue on:

HEAD commit: 65aca32efdcb Merge tag 'mm-hotfixes-stable-2023-03-24-17-0..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=17b21b0ec80000
kernel config: https://syzkaller.appspot.com/x/.config?x=8ef27c4ff127cda5
dashboard link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1591ba51c80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12ac6789c80000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+223c74...@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
6.3.0-rc3-syzkaller-00317-g65aca32efdcb #0 Not tainted
------------------------------------------------------
syz-executor930/5168 is trying to acquire lock:
ffffffff8c8e11e8 (zonelist_update_seq.seqcount){...-}-{0:0}, at: __alloc_pages+0x408/0x4a0 mm/page_alloc.c:5605

but task is already holding lock:
ffff888014325958 (&port->lock){-...}-{2:2}, at: tty_insert_flip_string_and_push_buffer+0x2f/0x160 drivers/tty/tty_buffer.c:628

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&port->lock){-...}-{2:2}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
tty_port_tty_get+0x21/0xf0 drivers/tty/tty_port.c:327
tty_port_default_wakeup+0x15/0x40 drivers/tty/tty_port.c:68
serial8250_tx_chars+0x53e/0xdf0 drivers/tty/serial/8250/8250_port.c:1865
serial8250_handle_irq.part.0+0x460/0x870 drivers/tty/serial/8250/8250_port.c:1955
serial8250_handle_irq drivers/tty/serial/8250/8250_port.c:1928 [inline]
serial8250_default_handle_irq+0xb6/0x230 drivers/tty/serial/8250/8250_port.c:1975
serial8250_interrupt+0xfc/0x200 drivers/tty/serial/8250/8250_core.c:127
__handle_irq_event_percpu+0x22b/0x730 kernel/irq/handle.c:158
handle_irq_event_percpu kernel/irq/handle.c:193 [inline]
handle_irq_event+0xab/0x1e0 kernel/irq/handle.c:210
handle_edge_irq+0x263/0xd00 kernel/irq/chip.c:819
generic_handle_irq_desc include/linux/irqdesc.h:158 [inline]
handle_irq arch/x86/kernel/irq.c:231 [inline]
__common_interrupt+0xa1/0x220 arch/x86/kernel/irq.c:250
common_interrupt+0xa8/0xd0 arch/x86/kernel/irq.c:240
asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:636
native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline]
arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline]
default_idle+0xf/0x20 arch/x86/kernel/process.c:703
default_idle_call+0x67/0xa0 kernel/sched/idle.c:97
cpuidle_idle_call kernel/sched/idle.c:170 [inline]
do_idle+0x31e/0x3e0 kernel/sched/idle.c:282
cpu_startup_entry+0x18/0x20 kernel/sched/idle.c:379
start_secondary+0x221/0x2b0 arch/x86/kernel/smpboot.c:264
secondary_startup_64_no_verify+0xce/0xdb

-> #2 (&port_lock_key){-...}-{2:2}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
serial8250_console_write+0x4ba/0x1010 drivers/tty/serial/8250/8250_port.c:3401
console_emit_next_record kernel/printk/printk.c:2886 [inline]
console_flush_all+0x49c/0xcc0 kernel/printk/printk.c:2942
console_unlock+0xb8/0x1f0 kernel/printk/printk.c:3016
vprintk_emit+0x1bd/0x600 kernel/printk/printk.c:2316
vprintk+0x84/0xa0 kernel/printk/printk_safe.c:50
_printk+0xbf/0xf0 kernel/printk/printk.c:2337
register_console+0x7ef/0x10e0 kernel/printk/printk.c:3467
univ8250_console_init+0x38/0x50 drivers/tty/serial/8250/8250_core.c:688
console_init+0xba/0x5c0 kernel/printk/printk.c:3610
start_kernel+0x273/0x4d0 init/main.c:1081
secondary_startup_64_no_verify+0xce/0xdb

-> #1 (console_owner){....}-{0:0}:
console_lock_spinning_enable kernel/printk/printk.c:1867 [inline]
console_emit_next_record kernel/printk/printk.c:2880 [inline]
console_flush_all+0x472/0xcc0 kernel/printk/printk.c:2942
console_unlock+0xb8/0x1f0 kernel/printk/printk.c:3016
vprintk_emit+0x1bd/0x600 kernel/printk/printk.c:2316
vprintk+0x84/0xa0 kernel/printk/printk_safe.c:50
_printk+0xbf/0xf0 kernel/printk/printk.c:2337
build_zonelists+0x357/0x560 mm/page_alloc.c:6543
__build_all_zonelists+0x122/0x180 mm/page_alloc.c:6656
build_all_zonelists_init+0x1c/0x190 mm/page_alloc.c:6681
build_all_zonelists+0x4a/0x150 mm/page_alloc.c:6714
start_kernel+0xb7/0x4d0 init/main.c:975
secondary_startup_64_no_verify+0xce/0xdb

-> #0 (zonelist_update_seq.seqcount){...-}-{0:0}:
check_prev_add kernel/locking/lockdep.c:3098 [inline]
check_prevs_add kernel/locking/lockdep.c:3217 [inline]
validate_chain kernel/locking/lockdep.c:3832 [inline]
__lock_acquire+0x2ec7/0x5d40 kernel/locking/lockdep.c:5056
lock_acquire kernel/locking/lockdep.c:5669 [inline]
lock_acquire+0x1af/0x520 kernel/locking/lockdep.c:5634
seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
read_seqbegin include/linux/seqlock.h:836 [inline]
zonelist_iter_begin mm/page_alloc.c:4765 [inline]
__alloc_pages_slowpath.constprop.0+0x15a/0x2170 mm/page_alloc.c:5086
__alloc_pages+0x408/0x4a0 mm/page_alloc.c:5605
__alloc_pages_node include/linux/gfp.h:237 [inline]
kmem_getpages mm/slab.c:1360 [inline]
cache_grow_begin+0x9b/0x3b0 mm/slab.c:2570
cache_alloc_refill+0x27f/0x380 mm/slab.c:2943
____cache_alloc mm/slab.c:3019 [inline]
____cache_alloc mm/slab.c:3002 [inline]
__do_cache_alloc mm/slab.c:3202 [inline]
slab_alloc_node mm/slab.c:3250 [inline]
__kmem_cache_alloc_node+0x360/0x3f0 mm/slab.c:3541
__do_kmalloc_node mm/slab_common.c:966 [inline]
__kmalloc+0x4e/0x190 mm/slab_common.c:980
kmalloc include/linux/slab.h:584 [inline]
tty_buffer_alloc+0x275/0x2f0 drivers/tty/tty_buffer.c:180
__tty_buffer_request_room+0x15b/0x2d0 drivers/tty/tty_buffer.c:278
tty_insert_flip_string_fixed_flag+0x90/0x250 drivers/tty/tty_buffer.c:327
tty_insert_flip_string include/linux/tty_flip.h:41 [inline]
tty_insert_flip_string_and_push_buffer+0x42/0x160 drivers/tty/tty_buffer.c:629
pty_write+0xda/0x100 drivers/tty/pty.c:118
ppp_async_push+0x5cb/0x1660 drivers/net/ppp/ppp_async.c:670
ppp_async_send+0xe2/0x110 drivers/net/ppp/ppp_async.c:638
__ppp_channel_push+0xe3/0x230 drivers/net/ppp/ppp_generic.c:2150
ppp_channel_push+0x1c2/0x250 drivers/net/ppp/ppp_generic.c:2177
ppp_write+0x22b/0x2d0 drivers/net/ppp/ppp_generic.c:523
do_loop_readv_writev fs/read_write.c:759 [inline]
do_loop_readv_writev fs/read_write.c:743 [inline]
do_iter_write+0x4ef/0x700 fs/read_write.c:863
vfs_writev+0x1aa/0x670 fs/read_write.c:934
do_pwritev fs/read_write.c:1031 [inline]
__do_sys_pwritev fs/read_write.c:1078 [inline]
__se_sys_pwritev fs/read_write.c:1073 [inline]
__x64_sys_pwritev+0x22f/0x310 fs/read_write.c:1073
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

other info that might help us debug this:

Chain exists of:
zonelist_update_seq.seqcount --> &port_lock_key --> &port->lock

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&port->lock);
                               lock(&port_lock_key);
                               lock(&port->lock);
  lock(zonelist_update_seq.seqcount);

*** DEADLOCK ***

4 locks held by syz-executor930/5168:
#0: ffff888000630a48 (&pch->upl){.+..}-{2:2}, at: ppp_channel_push+0x28/0x250 drivers/net/ppp/ppp_generic.c:2171
#1: ffff8880006309e0 (&pch->downl){+...}-{2:2}, at: spin_lock include/linux/spinlock.h:350 [inline]
#1: ffff8880006309e0 (&pch->downl){+...}-{2:2}, at: __ppp_channel_push+0x2a/0x230 drivers/net/ppp/ppp_generic.c:2146
#2: ffff8880283b8030 (&ap->xmit_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:355 [inline]
#2: ffff8880283b8030 (&ap->xmit_lock){+...}-{2:2}, at: ppp_async_push+0xb2/0x1660 drivers/net/ppp/ppp_async.c:663
#3: ffff888014325958 (&port->lock){-...}-{2:2}, at: tty_insert_flip_string_and_push_buffer+0x2f/0x160 drivers/tty/tty_buffer.c:628

stack backtrace:
CPU: 0 PID: 5168 Comm: syz-executor930 Not tainted 6.3.0-rc3-syzkaller-00317-g65aca32efdcb #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2178
check_prev_add kernel/locking/lockdep.c:3098 [inline]
check_prevs_add kernel/locking/lockdep.c:3217 [inline]
validate_chain kernel/locking/lockdep.c:3832 [inline]
__lock_acquire+0x2ec7/0x5d40 kernel/locking/lockdep.c:5056
lock_acquire kernel/locking/lockdep.c:5669 [inline]
lock_acquire+0x1af/0x520 kernel/locking/lockdep.c:5634
seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
read_seqbegin include/linux/seqlock.h:836 [inline]
zonelist_iter_begin mm/page_alloc.c:4765 [inline]
__alloc_pages_slowpath.constprop.0+0x15a/0x2170 mm/page_alloc.c:5086
__alloc_pages+0x408/0x4a0 mm/page_alloc.c:5605
__alloc_pages_node include/linux/gfp.h:237 [inline]
kmem_getpages mm/slab.c:1360 [inline]
cache_grow_begin+0x9b/0x3b0 mm/slab.c:2570
cache_alloc_refill+0x27f/0x380 mm/slab.c:2943
____cache_alloc mm/slab.c:3019 [inline]
____cache_alloc mm/slab.c:3002 [inline]
__do_cache_alloc mm/slab.c:3202 [inline]
slab_alloc_node mm/slab.c:3250 [inline]
__kmem_cache_alloc_node+0x360/0x3f0 mm/slab.c:3541
__do_kmalloc_node mm/slab_common.c:966 [inline]
__kmalloc+0x4e/0x190 mm/slab_common.c:980
kmalloc include/linux/slab.h:584 [inline]
tty_buffer_alloc+0x275/0x2f0 drivers/tty/tty_buffer.c:180
__tty_buffer_request_room+0x15b/0x2d0 drivers/tty/tty_buffer.c:278
tty_insert_flip_string_fixed_flag+0x90/0x250 drivers/tty/tty_buffer.c:327
tty_insert_flip_string include/linux/tty_flip.h:41 [inline]
tty_insert_flip_string_and_push_buffer+0x42/0x160 drivers/tty/tty_buffer.c:629
pty_write+0xda/0x100 drivers/tty/pty.c:118
ppp_async_push+0x5cb/0x1660 drivers/net/ppp/ppp_async.c:670
ppp_async_send+0xe2/0x110 drivers/net/ppp/ppp_async.c:638
__ppp_channel_push+0xe3/0x230 drivers/net/ppp/ppp_generic.c:2150
ppp_channel_push+0x1c2/0x250 drivers/net/ppp/ppp_generic.c:2177
ppp_write+0x22b/0x2d0 drivers/net/ppp/ppp_generic.c:523
do_loop_readv_writev fs/read_write.c:759 [inline]
do_loop_readv_writev fs/read_write.c:743 [inline]
do_iter_write+0x4ef/0x700 fs/read_write.c:863
vfs_writev+0x1aa/0x670 fs/read_write.c:934
do_pwritev fs/read_write.c:1031 [inline]
__do_sys_pwritev fs/read_write.c:1078 [inline]
__se_sys_pwritev fs/read_write.c:1073 [inline]
__x64_sys_pwritev+0x22f/0x310 fs/read_write.c:1073
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f76be0b97d9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffcac24418 EFLAGS: 00000246 ORIG_RAX: 0000000000000128
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f76be0b97d9
RDX: 0000000000000003 RSI: 0000000020000380 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000000000004 R09: 000000000000000d
R10: 00000000000000f1 R11: 0000000000000246 R12: 00007fffcac24430
R13: 00000000000f4240 R14: 0000000000000000 R15: 0000000000000000
</TASK>

Hillf Danton

Mar 26, 2023, 9:59:55 AM
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 25 Mar 2023 16:02:47 -0700
> HEAD commit: 65aca32efdcb Merge tag 'mm-hotfixes-stable-2023-03-24-17-0..
> git tree: upstream
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12ac6789c80000

Print node info without zonelist_update_seq held.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 65aca32efdcb

--- x/mm/page_alloc.c
+++ y/mm/page_alloc.c
@@ -6537,10 +6537,12 @@ static void build_zonelists(pg_data_t *p

build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
build_thisnode_zonelists(pgdat);
+ write_sequnlock(&zonelist_update_seq);
pr_info("Fallback order for Node %d: ", local_node);
for (node = 0; node < nr_nodes; node++)
pr_cont("%d ", node_order[node]);
pr_cont("\n");
+ write_seqlock(&zonelist_update_seq);
}

#ifdef CONFIG_HAVE_MEMORYLESS_NODES
--

syzbot

Mar 26, 2023, 10:35:24 AM
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+223c74...@syzkaller.appspotmail.com

Tested on:

commit: 65aca32e Merge tag 'mm-hotfixes-stable-2023-03-24-17-0..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=12790be5c80000
kernel config: https://syzkaller.appspot.com/x/.config?x=8ef27c4ff127cda5
dashboard link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=11270cd5c80000

Note: testing is done by a robot and is best-effort only.

Tetsuo Handa

Apr 2, 2023, 6:48:39 AM
to Mel Gorman, Patrick Daly, Michal Hocko, David Hildenbrand, Andrew Morton, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
syzbot is reporting a circular locking dependency which involves the
zonelist_update_seq seqlock, because zonelist_update_seq is checked
while doing GFP_ATOMIC allocations.

Since zonelist_update_seq is checked only in order to avoid an unnecessary
OOM kill, there is no need to check zonelist_update_seq for memory
allocations which will fail without an OOM kill.

Therefore, let's keep the locking dependency simple, by not checking
zonelist_update_seq from !__GFP_DIRECT_RECLAIM allocations.

Reported-by: syzbot <syzbot+223c74...@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10
Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
---
mm/page_alloc.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7136c36c5d01..80ef79b23865 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4759,17 +4759,17 @@ EXPORT_SYMBOL_GPL(fs_reclaim_release);
*/
static DEFINE_SEQLOCK(zonelist_update_seq);

-static unsigned int zonelist_iter_begin(void)
+static unsigned int zonelist_iter_begin(gfp_t gfp_mask)
{
- if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+ if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE) && (gfp_mask & __GFP_DIRECT_RECLAIM))
return read_seqbegin(&zonelist_update_seq);

return 0;
}

-static unsigned int check_retry_zonelist(unsigned int seq)
+static unsigned int check_retry_zonelist(unsigned int seq, gfp_t gfp_mask)
{
- if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+ if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE) && (gfp_mask & __GFP_DIRECT_RECLAIM))
return read_seqretry(&zonelist_update_seq, seq);

return seq;
@@ -5083,7 +5083,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
no_progress_loops = 0;
compact_priority = DEF_COMPACT_PRIORITY;
cpuset_mems_cookie = read_mems_allowed_begin();
- zonelist_iter_cookie = zonelist_iter_begin();
+ zonelist_iter_cookie = zonelist_iter_begin(gfp_mask);

/*
* The fast path uses conservative alloc_flags to succeed only until
@@ -5261,7 +5261,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* a unnecessary OOM kill.
*/
if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
- check_retry_zonelist(zonelist_iter_cookie))
+ check_retry_zonelist(zonelist_iter_cookie, gfp_mask))
goto restart;

/* Reclaim has failed us, start killing things */
@@ -5287,7 +5287,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* a unnecessary OOM kill.
*/
if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
- check_retry_zonelist(zonelist_iter_cookie))
+ check_retry_zonelist(zonelist_iter_cookie, gfp_mask))
goto restart;

/*
--
2.34.1

Michal Hocko

Apr 3, 2023, 4:15:59 AM
to Tetsuo Handa, Mel Gorman, Patrick Daly, David Hildenbrand, Andrew Morton, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
On Sun 02-04-23 19:48:29, Tetsuo Handa wrote:
> syzbot is reporting circular locking dependency which involves
> zonelist_update_seq seqlock, for zonelist_update_seq is checked
> while doing GFP_ATOMIC allocation.

Could you explain the deadlock scenario?

> Since zonelist_update_seq is checked for avoiding unnecessary OOM kill,
> there is no need to check zonelist_update_seq for memory allocations
> which will fail without OOM kill.
>
> Therefore, let's keep locking dependency simple, by not checking
> zonelist_update_seq from !__GFP_DIRECT_RECLAIM allocations.
>
> Reported-by: syzbot <syzbot+223c74...@syzkaller.appspotmail.com>

Is this
https://lore.kernel.org/all/0000000000001d...@google.com/
the underlying report?

> Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10
> Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
> Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
--
Michal Hocko
SUSE Labs

Tetsuo Handa

Apr 3, 2023, 7:14:40 AM
to Michal Hocko, Mel Gorman, Patrick Daly, David Hildenbrand, Andrew Morton, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, John Ogness
On 2023/04/03 17:15, Michal Hocko wrote:
> Is this
> https://lore.kernel.org/all/0000000000001d...@google.com/ the
> underlying report ?

Yes.

> Could you explain the deadlock scenario?

build_zonelists() from __build_all_zonelists() calls printk() with
zonelist_update_seq held.

printk() holds the console_owner lock for synchronous printing, and while
printing to the serial console it also takes port_lock_key and port->lock.

tty_insert_flip_string_and_push_buffer() from pty_write() conditionally calls
kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held. But since commit 3d36424b3b58,
zonelist_update_seq is checked by GFP_ATOMIC allocations (i.e. a new locking
dependency was added by that commit).

CPU0                                          CPU1
pty_write() {
  tty_insert_flip_string_and_push_buffer() {
                                              __build_all_zonelists() {
    spin_lock_irqsave(&port->lock, flags);
    tty_insert_flip_string() {
      tty_insert_flip_string_fixed_flag() {
        __tty_buffer_request_room() {
          tty_buffer_alloc() {
            kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
              __alloc_pages_slowpath() {
                                                write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
                                                // interrupt handler starts
                                                handle_irq() {
                                                  serial8250_interrupt() {
                                                    serial8250_tx_chars() {
                                                      tty_port_tty_get() {
                                                        spin_lock_irqsave(&port->lock, flags); // spins here waiting for kmalloc() from tty_insert_flip_string() to complete
                zonelist_iter_begin() {
                  read_seqbegin(&zonelist_update_seq) {
                    // spins here waiting for interrupt handler to complete if zonelist_update_seq.seqcount is odd
                                                        tty = tty_kref_get(port->tty);
                                                        spin_unlock_irqrestore(&port->lock, flags);
                                                      }
                                                    }
                                                  }
                                                }
                                                // interrupt handler ends
                                                write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
                                              }
                  }
                }
              }
            }
          }
        }
      }
    }
    spin_unlock_irqrestore(&port->lock, flags);
  }
}

Well, it seems that read_mems_allowed_begin() is protected by calling
local_irq_save(flags) before write_seqcount_begin(&current->mems_allowed_seq).

Can zonelist_iter_begin() be protected as well (i.e. call local_irq_save(flags)
before write_seqlock(&zonelist_update_seq))?

But even if write_seqlock(&zonelist_update_seq) is called with local irqs
disabled, wouldn't port_lock_key after all trigger this warning again?



This bug report might be a suggestion that we want two versions of
__alloc_pages_slowpath(): one for atomic context, geared towards smaller
kernel stack usage and a simplified locking dependency (because atomic
allocations can happen from subtle contexts, including interrupt handlers),
and a noinline version for schedulable context, geared towards larger kernel
stack usage and the complicated locking dependencies needed for implementing
rich retry paths, including direct reclaim and OOM kill...

Michal Hocko

Apr 3, 2023, 8:09:29 AM
to Tetsuo Handa, Mel Gorman, Patrick Daly, David Hildenbrand, Andrew Morton, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, John Ogness
Thank you! IIUC this can only happen when there is a race with the
memory hotplug. So pretty much a very rare event. Also I am not really
sure this really requires any changes at the allocator level. I would
much rather sacrifice the printk in build_zonelists or pull it out of
the locked section. Or would printk_deferred help in this case?

Tetsuo Handa

Apr 3, 2023, 8:51:43 AM
to Michal Hocko, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, John Ogness, Mel Gorman, Patrick Daly, David Hildenbrand, Andrew Morton, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
On 2023/04/03 21:09, Michal Hocko wrote:
> On Mon 03-04-23 20:14:28, Tetsuo Handa wrote:
>> Well, it seems that read_mems_allowed_begin() is protected by calling
>> local_irq_save(flags) before write_seqcount_begin(&current->mems_allowed_seq).
>>
>> Can zonelist_iter_begin() be protected as well (i.e. call local_irq_save(flags)
>> before write_seqlock(&zonelist_update_seq)) ?
>>
>> But even if write_seqlock(&zonelist_update_seq) is called with local irq disabled,
>> port_lock_key after all makes this warning again?

Hmm, local_irq_save(flags) before write_seqlock(&zonelist_update_seq) won't help.
Synchronous printk() will try to hold port->lock from process context even if local
irq is disabled, won't it? Not limited to interrupt handler but any synchronous printk()
inside write_seqlock(&zonelist_update_seq) / write_sequnlock(&zonelist_update_seq)
section is not safe.

> Thank you! IIUC this can only happen when there is a race with the
> memory hotplug. So pretty much a very rare event.

Right.

> Also I am not really
> sure this really requires any changes at the allocator level. I would
> much rather sacrifice the printk in build_zonelists or pull it out of
> the locked section. Or would printk_deferred help in this case?

Just moving printk() out of write_seqlock(&zonelist_update_seq) / write_sequnlock(&zonelist_update_seq)
section is not sufficient. This problem will happen as long as interrupt handler tries to hold port->lock.
Also disabling local irq will be needed.

By the way, is this case qualified as a user of printk_deferred(), for printk_deferred() says

/*
* Special printk facility for scheduler/timekeeping use only, _DO_NOT_USE_ !
*/
__printf(1, 2) __cold int _printk_deferred(const char *fmt, ...);

?

Since this is a problem introduced by mm change, I think fixing this problem on the
mm side is the cleaner. Can't there be a different approach? For example, can't we
replace

cpuset_mems_cookie = read_mems_allowed_begin();
zonelist_iter_cookie = zonelist_iter_begin();

and

if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
check_retry_zonelist(zonelist_iter_cookie))

with different conditions, like recalculate cpuset/zonelist in the last second and
check immediately before giving up allocation or OOM kill whether they have changed?

Michal Hocko

Apr 3, 2023, 9:44:37 AM
to Tetsuo Handa, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, John Ogness, Mel Gorman, Patrick Daly, David Hildenbrand, Andrew Morton, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
On Mon 03-04-23 21:51:29, Tetsuo Handa wrote:
> On 2023/04/03 21:09, Michal Hocko wrote:
> > On Mon 03-04-23 20:14:28, Tetsuo Handa wrote:
> >> Well, it seems that read_mems_allowed_begin() is protected by calling
> >> local_irq_save(flags) before write_seqcount_begin(&current->mems_allowed_seq).
> >>
> >> Can zonelist_iter_begin() be protected as well (i.e. call local_irq_save(flags)
> >> before write_seqlock(&zonelist_update_seq)) ?
> >>
> >> But even if write_seqlock(&zonelist_update_seq) is called with local irq disabled,
> >> port_lock_key after all makes this warning again?
>
> Hmm, local_irq_save(flags) before write_seqlock(&zonelist_update_seq) won't help.
> Synchronous printk() will try to hold port->lock from process context even if local
> irq is disabled, won't it? Not limited to interrupt handler but any synchronous printk()
> inside write_seqlock(&zonelist_update_seq) / write_sequnlock(&zonelist_update_seq)
> section is not safe.
>
> > Thank you! IIUC this can only happen when there is a race with the
> > memory hotplug. So pretty much a very rare event.
>
> Right.
>
> > Also I am not really
> > sure this really requires any changes at the allocator level. I would
> > much rather sacrifice the printk in build_zonelists or pull it out of
> > the locked section. Or would printk_deferred help in this case?
>
> Just moving printk() out of write_seqlock(&zonelist_update_seq) / write_sequnlock(&zonelist_update_seq)
> section is not sufficient. This problem will happen as long as interrupt handler tries to hold port->lock.

I do not follow. How is a printk outside of zonelist_update_seq going to
cause a dead/live lock? There shouldn't be any other locks (apart from
hotplug) taken in that path IIRC.

> Also disabling local irq will be needed.

Why?

> By the way, is this case qualified as a user of printk_deferred(), for printk_deferred() says
>
> /*
> * Special printk facility for scheduler/timekeeping use only, _DO_NOT_USE_ !
> */
> __printf(1, 2) __cold int _printk_deferred(const char *fmt, ...);
>
> ?

Dunno, question for printk maintainers. I know they want to limit the
usage. Maybe this qualifies as an exception-worthy case as well.


> Since this is a problem introduced by mm change, I think fixing this problem on the
> mm side is the cleaner.

Agreed. That would be one of the options I have mentioned. I do not
think the printk information serves such a big role we couldn't live
without it.

> Can't there be a different approach? For example, can't we
> replace
>
> cpuset_mems_cookie = read_mems_allowed_begin();
> zonelist_iter_cookie = zonelist_iter_begin();
>
> and
>
> if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
> check_retry_zonelist(zonelist_iter_cookie))
>
> with different conditions, like recalculate cpuset/zonelist in the last second and
> check immediately before giving up allocation or OOM kill whether they have changed?

Dunno and honestly that is a subtle piece of code and I would rather not
touch it just because we have limitations in printk usage. Especially
considering the above.

Petr Mladek

Apr 3, 2023, 11:15:35 AM
to Michal Hocko, Tetsuo Handa, Sergey Senozhatsky, Steven Rostedt, John Ogness, Mel Gorman, Patrick Daly, David Hildenbrand, Andrew Morton, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
Sigh, I am afraid that printk_deferred() is the best option at this
moment.

I see that there are more pr_info()/pr_cont() calls in build_zonelists.
You might want to use printk_deferred_enter()/exit() around them.

These problems should disappear once printk() messages get processed
in a kthread. And pr_info() is not critical information by definition,
so it is a perfect candidate for deferring.

Unfortunately, introducing the kthreads takes ages. So, we will have
to live with printk_deferred() for some time.


Important:

printk_deferred() still should be used with care. The messages will
be pushed to the console in another random context that might be
atomic. The more messages we defer, the bigger the risk of softlockups
we create.

Rules of thumb:

+ printk_deferred() would make sense for printing few lines
in a code where the dependency against the console driver
code is really complicated. It seems to be this case.

+ printk_deferred() is not a good solution for long reports. It
would just increase the risk of softlockups and could break
the system.

Best Regards,
Petr

Tetsuo Handa

Apr 3, 2023, 8:37:37 PM
to Petr Mladek, Michal Hocko, Patrick Daly, Mel Gorman, David Hildenbrand, Andrew Morton, Sergey Senozhatsky, Steven Rostedt, John Ogness, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
syzbot is reporting circular locking dependency which involves
zonelist_update_seq seqlock [1], for this lock is checked by memory
allocation requests which do not need to be retried.

We somehow need to prevent __alloc_pages_slowpath() from checking
this lock. Since Petr Mladek thinks that __build_all_zonelists() can
become a candidate for deferring printk() [2], let's make sure that
current CPU/thread won't reach __alloc_pages_slowpath() while this lock
is in use.

Reported-by: syzbot <syzbot+223c74...@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mho...@suse.com>
Cc: Petr Mladek <pml...@suse.com>
---
mm/page_alloc.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7136c36c5d01..64fa77b8d24a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6632,7 +6632,21 @@ static void __build_all_zonelists(void *data)
 	int nid;
 	int __maybe_unused cpu;
 	pg_data_t *self = data;
+	unsigned long flags;
 
+	/*
+	 * Since __alloc_pages_slowpath() spins if zonelist_update_seq.seqcount
+	 * is odd, any memory allocation while zonelist_update_seq.seqcount is
+	 * odd have to be avoided.
+	 *
+	 * Explicitly disable local irqs in order to avoid calling
+	 * kmalloc(GFP_ATOMIC) from e.g. timer interrupt handler.
+	 * Also, explicitly prevent printk() from synchronously waiting for
+	 * port->lock because tty_insert_flip_string_and_push_buffer() might
+	 * call kmalloc(GFP_ATOMIC | __GFP_NOWARN) while holding port->lock.
+	 */
+	local_irq_save(flags);
+	printk_deferred_enter();
 	write_seqlock(&zonelist_update_seq);
 
 #ifdef CONFIG_NUMA
@@ -6671,6 +6685,8 @@ static void __build_all_zonelists(void *data)
 	}
 
 	write_sequnlock(&zonelist_update_seq);
+	printk_deferred_exit();
+	local_irq_restore(flags);
 }
 
 static noinline void __init
--
2.34.1

Sergey Senozhatsky

Apr 4, 2023, 3:25:14 AM
to Tetsuo Handa, Petr Mladek, Michal Hocko, Patrick Daly, Mel Gorman, David Hildenbrand, Andrew Morton, Sergey Senozhatsky, Steven Rostedt, John Ogness, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
On (23/04/04 09:37), Tetsuo Handa wrote:
> syzbot is reporting circular locking dependency which involves
> zonelist_update_seq seqlock [1], for this lock is checked by memory
> allocation requests which do not need to be retried.
>
> We somehow need to prevent __alloc_pages_slowpath() from checking
> this lock. Since Petr Mladek thinks that __build_all_zonelists() can
> become a candidate for deferring printk() [2], let's make sure that
> current CPU/thread won't reach __alloc_pages_slowpath() while this lock
> is in use.
>
> Reported-by: syzbot <syzbot+223c74...@syzkaller.appspotmail.com>
> Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
> Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
> Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
> Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
> Cc: Michal Hocko <mho...@suse.com>
> Cc: Petr Mladek <pml...@suse.com>

Yeah that looks like one of those printk_deferred() cases.
FWIW
Reviewed-by: Sergey Senozhatsky <senoz...@chromium.org>

Petr Mladek

Apr 4, 2023, 3:43:21 AM
to Tetsuo Handa, Michal Hocko, Patrick Daly, Mel Gorman, David Hildenbrand, Andrew Morton, Sergey Senozhatsky, Steven Rostedt, John Ogness, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
On Tue 2023-04-04 09:37:25, Tetsuo Handa wrote:
> syzbot is reporting circular locking dependency which involves
> zonelist_update_seq seqlock [1], for this lock is checked by memory
> allocation requests which do not need to be retried.
>
> We somehow need to prevent __alloc_pages_slowpath() from checking
> this lock. Since Petr Mladek thinks that __build_all_zonelists() can
> become a candidate for deferring printk() [2], let's make sure that
> current CPU/thread won't reach __alloc_pages_slowpath() while this lock
> is in use.
>
> Reported-by: syzbot <syzbot+223c74...@syzkaller.appspotmail.com>
> Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
> Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
> Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]

From the description is far from obvious how printk() is involved.
It might make sense to paste the entire lockdep splat. The links
are not guaranteed to stay around.

> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6632,7 +6632,21 @@ static void __build_all_zonelists(void *data)
> int nid;
> int __maybe_unused cpu;
> pg_data_t *self = data;
> + unsigned long flags;
>
> + /*
> + * Since __alloc_pages_slowpath() spins if zonelist_update_seq.seqcount
> + * is odd, any memory allocation while zonelist_update_seq.seqcount is
> + * odd have to be avoided.
> + *
> + * Explicitly disable local irqs in order to avoid calling
> + * kmalloc(GFP_ATOMIC) from e.g. timer interrupt handler.
> + * Also, explicitly prevent printk() from synchronously waiting for
> + * port->lock because tty_insert_flip_string_and_push_buffer() might
> + * call kmalloc(GFP_ATOMIC | __GFP_NOWARN) while holding port->lock.
> + */
> + local_irq_save(flags);

The comment above printk_deferred_enter definition in
include/linux/printk.h says that interrupts need to be disabled.

But strictly speaking, it should be enough to disable preemption
these days. The reason is that it uses a per-CPU reference counter.

Note that it used to be really important to disable interrupts
in the past. The messages were temporary stored in a per-CPU buffer
and the lockless algorithm was not safe for reentrancy.

> + printk_deferred_enter();
> write_seqlock(&zonelist_update_seq);
>
> #ifdef CONFIG_NUMA
> @@ -6671,6 +6685,8 @@ static void __build_all_zonelists(void *data)
> }
>
> write_sequnlock(&zonelist_update_seq);
> + printk_deferred_exit();
> + local_irq_restore(flags);
> }
>
> static noinline void __init

Otherwise, it looks fine from the printk() POV.

Best Regards,
Petr

Michal Hocko

Apr 4, 2023, 3:54:20 AM
to Tetsuo Handa, Petr Mladek, Patrick Daly, Mel Gorman, David Hildenbrand, Andrew Morton, Sergey Senozhatsky, Steven Rostedt, John Ogness, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
On Tue 04-04-23 09:37:25, Tetsuo Handa wrote:
> syzbot is reporting circular locking dependency which involves
> zonelist_update_seq seqlock [1], for this lock is checked by memory
> allocation requests which do not need to be retried.
>
> We somehow need to prevent __alloc_pages_slowpath() from checking
> this lock. Since Petr Mladek thinks that __build_all_zonelists() can
> become a candidate for deferring printk() [2], let's make sure that
> current CPU/thread won't reach __alloc_pages_slowpath() while this lock
> is in use.

I agree with Petr that the lockdep splat and the lockup explanation
should be part of the changelog. Just reuse what you had in the previous
email.
This explanation of local_irq_save just doesn't make any sense. You do
not prevent any other cpu from entering the IRQ and doing the same
thing. If the sole purpose of local_irq_save() is to conform to
printk_deferred_enter() then state that instead. Although it seems that
Petr believes that preempt_disable should be sufficient and then it
would be preferred as well. This would require update to the comment for
printk_deferred_enter though.

Thanks!

> + */
> + local_irq_save(flags);
> + printk_deferred_enter();
> write_seqlock(&zonelist_update_seq);
>
> #ifdef CONFIG_NUMA
> @@ -6671,6 +6685,8 @@ static void __build_all_zonelists(void *data)
> }
>
> write_sequnlock(&zonelist_update_seq);
> + printk_deferred_exit();
> + local_irq_restore(flags);
> }
>
> static noinline void __init
> --
> 2.34.1

Tetsuo Handa

Apr 4, 2023, 4:20:16 AM
to Michal Hocko, Petr Mladek, Patrick Daly, Mel Gorman, David Hildenbrand, Andrew Morton, Sergey Senozhatsky, Steven Rostedt, John Ogness, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
On 2023/04/04 16:54, Michal Hocko wrote:
>> + /*
>> + * Since __alloc_pages_slowpath() spins if zonelist_update_seq.seqcount
>> + * is odd, any memory allocation while zonelist_update_seq.seqcount is
>> + * odd have to be avoided.
>> + *
>> + * Explicitly disable local irqs in order to avoid calling
>> + * kmalloc(GFP_ATOMIC) from e.g. timer interrupt handler.
>> + * Also, explicitly prevent printk() from synchronously waiting for
>> + * port->lock because tty_insert_flip_string_and_push_buffer() might
>> + * call kmalloc(GFP_ATOMIC | __GFP_NOWARN) while holding port->lock.
>
> This explanation of local_irq_save just doesn't make any sense. You do
> not prevent any other cpu from entering the IRQ and doing the same
> thing.

There is no need to prevent other CPUs from doing the same thing.
The intent of local_irq_save() here is to avoid below sequence.

CPU0
----
__build_all_zonelists() {
  write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
  // e.g. timer interrupt handler runs at this moment
  some_timer_func() {
    kmalloc(GFP_ATOMIC) {
      __alloc_pages_slowpath() {
        read_seqbegin(&zonelist_update_seq) {
          // forever spins because zonelist_update_seq.seqcount is odd
        }
      }
    }
  }
  // e.g. timer interrupt handler finishes
  write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
}

> If the sole purpose of local_irq_save is to conform
> printk_deferred_enter then state that instead.

As described above, this local_irq_save() is not intended to conform to
printk_deferred_enter(). This is the same as set_mems_allowed() calling
local_irq_save(flags) before write_seqcount_begin(&current->mems_allowed_seq).

> Although it seems that
> Petr believes that preempt_disable should be sufficient and then it
> would be preferred as well. This would require update to the comment for
> printk_deferred_enter though.

This patch is supposed to be backported to stable kernels where commit
3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists
and page allocation") was backported.

Michal Hocko

Apr 4, 2023, 7:05:38 AM
to Tetsuo Handa, Petr Mladek, Patrick Daly, Mel Gorman, David Hildenbrand, Andrew Morton, Sergey Senozhatsky, Steven Rostedt, John Ogness, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
OK, now we are on the same page. Your previous bc630622-8b24-e4ca...@I-love.SAKURA.ne.jp
explanation had an intermediary lock in the dependency (port->lock). I
hadn't realized that the actual scenario could be simpler than that.
But you are right that GFP_ATOMIC allocations from IRQ context are quite
common so this is a more general situation that doesn't really need to
involve TTY and some locking oddities there.

This all is quite subtle and easy to miss so please make sure to
describe both scenarios in the changelog. For the comment above I would
rephrase as follows:
/*
* Explicitly disable interrupts before taking seqlock to prevent
* any IRQ handler from calling into the page allocator
* (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and live
* lock.
*/

Thanks!

Tetsuo Handa

Apr 4, 2023, 7:20:00 AM
to Michal Hocko, Petr Mladek, Patrick Daly, Mel Gorman, David Hildenbrand, Andrew Morton, Sergey Senozhatsky, Steven Rostedt, John Ogness, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
On 2023/04/04 20:05, Michal Hocko wrote:
> But you are right that GFP_ATOMIC allocations from IRQ context are quite
> common so this is a more general situation that doesn't really need to
> involve TTY and some locking oddities there.

Yes, that's why "[PATCH] mm/page_alloc: don't check zonelist_update_seq from
atomic allocations" was proposed; I consider that zonelist_update_seq should
not be checked from atomic context.

From the lockdep perspective, keeping the locking dependency simpler is better,
even if we go with the printk_deferred_enter()/printk_deferred_exit() approach.

Tetsuo Handa

Apr 4, 2023, 10:32:13 AM
to Michal Hocko, Petr Mladek, Patrick Daly, Mel Gorman, David Hildenbrand, Andrew Morton, Sergey Senozhatsky, Steven Rostedt, John Ogness, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
syzbot is reporting circular locking dependency which involves
zonelist_update_seq seqlock [1], for this lock is checked by memory
allocation requests which do not need to be retried.

One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.

CPU0
----
__build_all_zonelists() {
  write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
  // e.g. timer interrupt handler runs at this moment
  some_timer_func() {
    kmalloc(GFP_ATOMIC) {
      __alloc_pages_slowpath() {
        read_seqbegin(&zonelist_update_seq) {
          // spins forever because zonelist_update_seq.seqcount is odd
        }
      }
    }
  }
  // e.g. timer interrupt handler finishes
  write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
}

This deadlock scenario can be easily eliminated by not calling
read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
requests. But Michal Hocko does not know whether we should go with this
approach.

Another deadlock scenario which syzbot is reporting is a race between
kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer()
with port->lock held and printk() from __build_all_zonelists() with
zonelist_update_seq held.

CPU0                                           CPU1
----                                           ----
pty_write() {
  tty_insert_flip_string_and_push_buffer() {
                                               __build_all_zonelists() {
                                                 write_seqlock(&zonelist_update_seq);
                                                 build_zonelists() {
                                                   printk() {
                                                     vprintk() {
                                                       vprintk_default() {
                                                         vprintk_emit() {
                                                           console_unlock() {
                                                             console_flush_all() {
                                                               console_emit_next_record() {
                                                                 con->write() = serial8250_console_write() {
    spin_lock_irqsave(&port->lock, flags);
    tty_insert_flip_string() {
      tty_insert_flip_string_fixed_flag() {
        __tty_buffer_request_room() {
          tty_buffer_alloc() {
            kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
              __alloc_pages_slowpath() {
                zonelist_iter_begin() {
                  read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
                                                                   spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
                }
              }
            }
          }
        }
      }
    }
    spin_unlock_irqrestore(&port->lock, flags);
                                                                   // message is printed to console
                                                                   spin_unlock_irqrestore(&port->lock, flags);
                                                                 }
                                                               }
                                                             }
                                                           }
                                                         }
                                                       }
                                                     }
                                                   }
                                                 }
                                                 write_sequnlock(&zonelist_update_seq);
                                               }
  }
}

This deadlock scenario can be eliminated by

preventing interrupt context from calling kmalloc(GFP_ATOMIC)

and

preventing printk() from calling console_flush_all()

while zonelist_update_seq.seqcount is odd.

Since Petr Mladek thinks that __build_all_zonelists() can become a
candidate for deferring printk() [2], let's address this problem by

disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)

and

disabling synchronous printk() in order to avoid console_flush_all().

As a side effect of minimizing duration of zonelist_update_seq.seqcount
being odd by disabling synchronous printk(), latency at
read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and
__GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from
lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e.
do not record unnecessary locking dependency) from interrupt context is
still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC) inside
write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
section...

Reported-by: syzbot <syzbot+223c74...@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mho...@suse.com>
Cc: Petr Mladek <pml...@suse.com>
---
Changes in v2:
Update patch description and comment.

mm/page_alloc.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7136c36c5d01..e8b4f294d763 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6632,7 +6632,21 @@ static void __build_all_zonelists(void *data)
 	int nid;
 	int __maybe_unused cpu;
 	pg_data_t *self = data;
+	unsigned long flags;
 
+	/*
+	 * Explicitly disable this CPU's interrupts before taking seqlock
+	 * to prevent any IRQ handler from calling into the page allocator
+	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+	 */
+	local_irq_save(flags);
+	/*
+	 * Explicitly disable this CPU's synchronous printk() before taking
+	 * seqlock to prevent any printk() from trying to hold port->lock, for
+	 * tty_insert_flip_string_and_push_buffer() on other CPU might be
+	 * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
+	 */

Michal Hocko

Apr 4, 2023, 11:20:37 AM
to Tetsuo Handa, Petr Mladek, Patrick Daly, Mel Gorman, David Hildenbrand, Andrew Morton, Sergey Senozhatsky, Steven Rostedt, John Ogness, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
On Tue 04-04-23 23:31:58, Tetsuo Handa wrote:
> syzbot is reporting circular locking dependency which involves
> zonelist_update_seq seqlock [1], for this lock is checked by memory
> allocation requests which do not need to be retried.
>
> One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.
>
> CPU0
> ----
> __build_all_zonelists() {
>   write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
>   // e.g. timer interrupt handler runs at this moment
>   some_timer_func() {
>     kmalloc(GFP_ATOMIC) {
>       __alloc_pages_slowpath() {
>         read_seqbegin(&zonelist_update_seq) {
>           // spins forever because zonelist_update_seq.seqcount is odd
>         }
>       }
>     }
>   }
>   // e.g. timer interrupt handler finishes
>   write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
> }
>
> This deadlock scenario can be easily eliminated by not calling
> read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
> requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
> requests. But Michal Hocko does not know whether we should go with this
> approach.

It would have been more useful to explain why that is not preferred or
desirable.
I have a really hard time wrapping my head around this changelog. I would
rephrase it as follows.

The syzbot has noticed the following deadlock scenario[1]
CPU0                                           CPU1
pty_write() {
  tty_insert_flip_string_and_push_buffer() {
                                               __build_all_zonelists() {
                                                 write_seqlock(&zonelist_update_seq); (A)
                                                 build_zonelists() {
                                                   printk() {
                                                     vprintk() {
                                                       vprintk_default() {
                                                         vprintk_emit() {
                                                           console_unlock() {
                                                             console_flush_all() {
                                                               console_emit_next_record() {
                                                                 con->write() = serial8250_console_write() {
    spin_lock_irqsave(&port->lock, flags); (B)
                                                                   spin_lock_irqsave(&port->lock, flags); <<< spinning on (B)
    tty_insert_flip_string() {
      tty_insert_flip_string_fixed_flag() {
        __tty_buffer_request_room() {
          tty_buffer_alloc() {
            kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
              __alloc_pages_slowpath() {
                zonelist_iter_begin() {
                  read_seqbegin(&zonelist_update_seq); <<< spinning on (A)

This can happen during memory hotplug operation. This means that
write_seqlock on zonelist_update_seq is not allowed to call into
synchronous printk code path. This can be avoided by using a deferred
printk context.

This is not the only problematic scenario though. Another one would be
__build_all_zonelists() {
  write_seqlock(&zonelist_update_seq); <<< (A)
  <IRQ>
    kmalloc(GFP_ATOMIC) {
      __alloc_pages_slowpath() {
        read_seqbegin(&zonelist_update_seq) <<< spinning on (A)

Allocations from (soft)IRQ contexts are quite common. This can be
avoided by disabling interrupts for this path so we won't self livelock.

> Reported-by: syzbot <syzbot+223c74...@syzkaller.appspotmail.com>
> Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
> Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
> Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
> Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
> Cc: Michal Hocko <mho...@suse.com>
> Cc: Petr Mladek <pml...@suse.com>

Anyway the patch is correct
Acked-by: Michal Hocko <mho...@suse.com>

Andrew Morton

Apr 4, 2023, 5:25:33 PM
to Tetsuo Handa, Michal Hocko, Petr Mladek, Patrick Daly, Mel Gorman, David Hildenbrand, Sergey Senozhatsky, Steven Rostedt, John Ogness, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
On Tue, 4 Apr 2023 23:31:58 +0900 Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp> wrote:

> syzbot is reporting circular locking dependency which involves
> zonelist_update_seq seqlock [1], for this lock is checked by memory
> allocation requests which do not need to be retried.

I queued this, along with a note that an updated changelog is likely.

Do we feel that a -stable backport is warranted? I think so, from your
earlier comments. Please add the cc:stable to the changelog in this
situation.

Michal Hocko

Apr 5, 2023, 4:28:11 AM
to Andrew Morton, Tetsuo Handa, Petr Mladek, Patrick Daly, Mel Gorman, David Hildenbrand, Sergey Senozhatsky, Steven Rostedt, John Ogness, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
Memory hotplug is pretty rare event so the deadlock is quite unlikely.
On the other hand the fix is pretty easy so it shouldn't hurt to have it
in stable kernels.

Petr Mladek

Apr 5, 2023, 4:53:16 AM
to Michal Hocko, Andrew Morton, Tetsuo Handa, Patrick Daly, Mel Gorman, David Hildenbrand, Sergey Senozhatsky, Steven Rostedt, John Ogness, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
Note that printk_deferred_enter()/exit() has been added in v5.15-rc1
by the commit 85e3e7fbbb720b9897 ("printk: remove NMI tracking").

The commit has non-trivial dependencies. Any backport for older stable
kernel would need a custom patch just adding these two wrappers.

Best Regards,
Petr

Mel Gorman

Apr 5, 2023, 5:10:39 AM
to Michal Hocko, Tetsuo Handa, Petr Mladek, Patrick Daly, David Hildenbrand, Andrew Morton, Sergey Senozhatsky, Steven Rostedt, John Ogness, syzkall...@googlegroups.com, Ilpo Järvinen, syzbot, linux-mm
It would have undesirable side-effects. !__GFP_DIRECT_RECLAIM does not mean
that the caller is happening from interrupt context or is a re-entrant
allocation request from IRQ context. Special casing __GFP_DIRECT_RECLAIM
could allow a speculative allocation to simply prematurely fail due to
memory hotplug.

This deadlock could be mitigated by not calling
read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM
allocation requests, but it has undesirable consequences.
!__GFP_DIRECT_RECLAIM applies to allocations other than
atomic allocations from interrupt context, such as GFP_NOWAIT
allocations. These types of allocations could prematurely fail due to
a memory hotplug race or cpuset update, and while this is probably
harmless and recoverable, it's undesirable and unnecessary to fix
the deadlock.

I imagine there could also be checks for callers from IRQ context but on
some older chips, checking if IRQs are disabled is expensive so also is
an undesirable fix. Maybe the same is true for newer CPUs, but I didn't
check. The printk_deferred option seems the most harmless option for now
and maybe printk_deferred will ultimately go away.

Other than potential changelog updates

Acked-by: Mel Gorman <mgo...@techsingularity.net>

--
Mel Gorman
SUSE Labs