[syzbot] [pci?] linux-next test error: general protection fault in msix_capability_init

syzbot

Mar 24, 2025, 7:58:33 PM
to bhel...@google.com, linux-...@vger.kernel.org, linux...@vger.kernel.org, linu...@vger.kernel.org, s...@canb.auug.org.au, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 882a18c2c14f Add linux-next specific files for 20250324
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=17d24804580000
kernel config: https://syzkaller.appspot.com/x/.config?x=30e7faf61be4d27e
dashboard link: https://syzkaller.appspot.com/bug?extid=d33642573545e529ab61
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/ea720fb0d677/disk-882a18c2.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/723a320ec217/vmlinux-882a18c2.xz
kernel image: https://storage.googleapis.com/syzbot-assets/4f23b2e1eb2c/bzImage-882a18c2.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d33642...@syzkaller.appspotmail.com

ntfs3: Enabled Linux POSIX ACLs support
ntfs3: Read-only LZX/Xpress compression included
efs: 1.0a - http://aeschi.ch.eu.org/efs/
jffs2: version 2.2. (NAND) (SUMMARY) © 2001-2006 Red Hat, Inc.
romfs: ROMFS MTD (C) 2007 Red Hat, Inc.
QNX4 filesystem 0.2.3 registered.
qnx6: QNX6 filesystem 1.0.0 registered.
fuse: init (API version 7.43)
orangefs_debugfs_init: called with debug mask: :none: :0:
orangefs_init: module version upstream loaded
JFS: nTxBlock = 8192, nTxLock = 65536
SGI XFS with ACLs, security attributes, realtime, quota, no debug enabled
9p: Installing v9fs 9p2000 file system support
NILFS version 2 loaded
befs: version: 0.9.3
ocfs2: Registered cluster interface o2cb
ocfs2: Registered cluster interface user
OCFS2 User DLM kernel interface loaded
gfs2: GFS2 installed
ceph: loaded (mds proto 32)
NET: Registered PF_ALG protocol family
xor: automatically using best checksumming function avx
async_tx: api initialized (async)
Key type asymmetric registered
Asymmetric key parser 'x509' registered
Asymmetric key parser 'pkcs8' registered
Key type pkcs7_test registered
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 238)
io scheduler mq-deadline registered
io scheduler kyber registered
io scheduler bfq registered
input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
ACPI: button: Power Button [PWRF]
input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1
ACPI: button: Sleep Button [SLPF]
ioatdma: Intel(R) QuickData Technology Driver 5.00
ACPI: \_SB_.LNKC: Enabled at IRQ 11
virtio-pci 0000:00:03.0: virtio_pci: leaving for legacy driver
ACPI: \_SB_.LNKD: Enabled at IRQ 10
virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver
ACPI: \_SB_.LNKB: Enabled at IRQ 10
virtio-pci 0000:00:06.0: virtio_pci: leaving for legacy driver
virtio-pci 0000:00:07.0: virtio_pci: leaving for legacy driver
N_HDLC line discipline registered with maxframe=4096
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
00:05: ttyS2 at I/O 0x3e8 (irq = 6, base_baud = 115200) is a 16550A
00:06: ttyS3 at I/O 0x2e8 (irq = 7, base_baud = 115200) is a 16550A
Non-volatile memory driver v1.3
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc7-next-20250324-syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
RIP: 0010:msix_prepare_msi_desc drivers/pci/msi/msi.c:616 [inline]
RIP: 0010:msix_setup_msi_descs drivers/pci/msi/msi.c:640 [inline]
RIP: 0010:msix_setup_interrupts drivers/pci/msi/msi.c:680 [inline]
RIP: 0010:msix_capability_init+0x7a9/0x1550 drivers/pci/msi/msi.c:743
Code: 10 00 74 0f e8 28 9f de fc 48 ba 00 00 00 00 00 fc ff df 48 89 9c 24 d0 00 00 00 48 89 9c 24 98 01 00 00 4c 89 f0 48 c1 e8 03 <0f> b6 04 10 84 c0 0f 85 86 02 00 00 41 8b 1e be 00 00 40 00 21 de
RSP: 0000:ffffc90000066ee0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffc9000009e008 RCX: ffff8881412b8000
RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffffc90000067078
RBP: ffffc90000067138 R08: ffffffff854ea585 R09: 0000000000000000
R10: ffffc90000067020 R11: fffff5200000ce10 R12: 0000000000000000
R13: 0000000000000101 R14: 0000000000000000 R15: 1ffff9200000ce0d
FS: 0000000000000000(0000) GS:ffff888124fc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88823ffff000 CR3: 000000000eb38000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__pci_enable_msix_range+0x5c7/0x710 drivers/pci/msi/msi.c:851
pci_alloc_irq_vectors_affinity+0x10e/0x2b0 drivers/pci/msi/api.c:270
vp_request_msix_vectors drivers/virtio/virtio_pci_common.c:160 [inline]
vp_find_vqs_msix+0x5da/0xeb0 drivers/virtio/virtio_pci_common.c:417
vp_find_vqs+0xa0/0x7e0 drivers/virtio/virtio_pci_common.c:525
virtio_find_vqs include/linux/virtio_config.h:226 [inline]
virtio_find_single_vq include/linux/virtio_config.h:237 [inline]
probe_common+0x37b/0x6b0 drivers/char/hw_random/virtio-rng.c:155
virtio_dev_probe+0x931/0xc80 drivers/virtio/virtio.c:341
really_probe+0x2b9/0xad0 drivers/base/dd.c:658
__driver_probe_device+0x1a2/0x390 drivers/base/dd.c:800
driver_probe_device+0x50/0x430 drivers/base/dd.c:830
__driver_attach+0x45f/0x710 drivers/base/dd.c:1216
bus_for_each_dev+0x23e/0x2b0 drivers/base/bus.c:370
bus_add_driver+0x346/0x670 drivers/base/bus.c:678
driver_register+0x23a/0x320 drivers/base/driver.c:249
do_one_initcall+0x24a/0x940 init/main.c:1257
do_initcall_level+0x157/0x210 init/main.c:1319
do_initcalls+0x71/0xd0 init/main.c:1335
kernel_init_freeable+0x432/0x5d0 init/main.c:1567
kernel_init+0x1d/0x2b0 init/main.c:1457
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:153
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:msix_prepare_msi_desc drivers/pci/msi/msi.c:616 [inline]
RIP: 0010:msix_setup_msi_descs drivers/pci/msi/msi.c:640 [inline]
RIP: 0010:msix_setup_interrupts drivers/pci/msi/msi.c:680 [inline]
RIP: 0010:msix_capability_init+0x7a9/0x1550 drivers/pci/msi/msi.c:743
Code: 10 00 74 0f e8 28 9f de fc 48 ba 00 00 00 00 00 fc ff df 48 89 9c 24 d0 00 00 00 48 89 9c 24 98 01 00 00 4c 89 f0 48 c1 e8 03 <0f> b6 04 10 84 c0 0f 85 86 02 00 00 41 8b 1e be 00 00 40 00 21 de
RSP: 0000:ffffc90000066ee0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffc9000009e008 RCX: ffff8881412b8000
RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffffc90000067078
RBP: ffffc90000067138 R08: ffffffff854ea585 R09: 0000000000000000
R10: ffffc90000067020 R11: fffff5200000ce10 R12: 0000000000000000
R13: 0000000000000101 R14: 0000000000000000 R15: 1ffff9200000ce0d
FS: 0000000000000000(0000) GS:ffff8881250c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000000eb38000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
0: 10 00 adc %al,(%rax)
2: 74 0f je 0x13
4: e8 28 9f de fc call 0xfcde9f31
9: 48 ba 00 00 00 00 00 movabs $0xdffffc0000000000,%rdx
10: fc ff df
13: 48 89 9c 24 d0 00 00 mov %rbx,0xd0(%rsp)
1a: 00
1b: 48 89 9c 24 98 01 00 mov %rbx,0x198(%rsp)
22: 00
23: 4c 89 f0 mov %r14,%rax
26: 48 c1 e8 03 shr $0x3,%rax
* 2a: 0f b6 04 10 movzbl (%rax,%rdx,1),%eax <-- trapping instruction
2e: 84 c0 test %al,%al
30: 0f 85 86 02 00 00 jne 0x2bc
36: 41 8b 1e mov (%r14),%ebx
39: be 00 00 40 00 mov $0x400000,%esi
3e: 21 de and %ebx,%esi

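The trapping sequence can be checked by hand. The instructions just before the fault implement the generic KASAN shadow lookup. The pointer in R14 is shifted right by 3, the constant 0xdffffc0000000000 held in RDX is added, and the shadow byte at the resulting address is loaded. The C sketch below models that arithmetic in user space. It assumes x86-64 with 4-level paging (48-bit virtual addresses) for the canonical-address check, and the helper names are illustrative rather than the kernel's own.

With R14 = 0, the shadow address is exactly 0xdffffc0000000000. That address is non-canonical, and on x86-64 a load from a non-canonical address raises a general-protection fault instead of a page fault. Mapping the shadow address back gives the 8-byte granule [0x0-0x7], which is the range KASAN reports as a null-ptr-deref.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constant taken from the report (RDX and the movabs immediate). */
#define SKETCH_SHADOW_OFFSET 0xdffffc0000000000ULL
/* Generic KASAN: one shadow byte describes an 8-byte granule. */
#define SKETCH_SHADOW_SCALE_SHIFT 3

/* Mirrors "mov %r14,%rax; shr $0x3,%rax; movzbl (%rax,%rdx,1),%eax". */
static uint64_t sketch_mem_to_shadow(uint64_t addr)
{
	return (addr >> SKETCH_SHADOW_SCALE_SHIFT) + SKETCH_SHADOW_OFFSET;
}

/*
 * Canonical check, assuming 4-level paging (48-bit virtual addresses):
 * bits 63..47 must be all zeros or all ones.
 */
static bool sketch_is_canonical48(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}

/* Inverse mapping: first and last byte of the granule a shadow byte covers. */
static void sketch_shadow_to_granule(uint64_t shadow, uint64_t *first,
				     uint64_t *last)
{
	/* Every address produced by sketch_mem_to_shadow() is >= the offset. */
	assert(shadow >= SKETCH_SHADOW_OFFSET);
	*first = (shadow - SKETCH_SHADOW_OFFSET) << SKETCH_SHADOW_SCALE_SHIFT;
	*last = *first + (1ULL << SKETCH_SHADOW_SCALE_SHIFT) - 1;
}
```

For a valid kernel pointer such as the stack address in RSP (0xffffc90000066ee0), the same canonical check passes. The fault therefore comes from the NULL value in R14, not from the shadow scheme itself.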

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite the report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Aithal, Srikanth

Mar 24, 2025, 11:57:26 PM
to syzbot, bhel...@google.com, linux-...@vger.kernel.org, linux...@vger.kernel.org, linu...@vger.kernel.org, s...@canb.auug.org.au, syzkall...@googlegroups.com
Hello,

I also hit a similar crash while booting a KVM guest with the next-20250325 kernel.

[ 1.472006] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 1.472243] #PF: supervisor read access in kernel mode
[ 1.472243] #PF: error_code(0x0000) - not-present page
[ 1.472243] PGD 0 P4D 0
[ 1.472243] Oops: Oops: 0000 [#1] SMP NOPTI
[ 1.472243] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc7-next-20250324 #10 PREEMPT(voluntary)
[ 1.472243] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
[ 1.472243] RIP: 0010:msix_prepare_msi_desc+0x2f/0x80
[ 1.472243] Code: 00 00 48 89 f0 48 8b 52 20 66 81 4e 4c 01 01 c7 46 04 01 00 00 00 8b 8f 94 03 00 00 89 4e 50 48 8b b7 a8 07 00 00 48 89 70 58 <8b> 0a 31 d2 81 e1 00 00 40 00 75 0c 0f b6 50 4d d0 ea 83 f2 01 83
[ 1.472243] RSP: 0018:ffffa1b940027988 EFLAGS: 00010202
[ 1.472243] RAX: ffffa1b9400279c8 RBX: 0000000000000000 RCX: 0000000000000017
[ 1.472243] RDX: 0000000000000000 RSI: ffffa1b940089000 RDI: ffff8b73c55ed000
[ 1.472243] RBP: ffffa1b9400279c8 R08: 0000000000000002 R09: ffffa1b94002795c
[ 1.472243] R10: 0000000000000000 R11: ffffffff98493250 R12: ffff8b73d8fb0000
[ 1.472243] R13: 0000000000000000 R14: ffff8b73c55ed0c0 R15: 0000000000000100
[ 1.472243] FS: 0000000000000000(0000) GS:ffff8b7494849000(0000) knlGS:0000000000000000
[ 1.472243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.472243] CR2: 0000000000000000 CR3: 0008000073e44000 CR4: 00000000003506f0
[ 1.472243] Call Trace:
[ 1.472243] <TASK>
[ 1.472243] ? __die+0x1a/0x60
[ 1.472243] ? page_fault_oops+0x153/0x500
[ 1.472243] ? srso_return_thunk+0x5/0x5f
[ 1.472243] ? do_user_addr_fault+0x5f/0x680
[ 1.472243] ? srso_return_thunk+0x5/0x5f
[ 1.472243] ? exc_page_fault+0x66/0x140
[ 1.472243] ? asm_exc_page_fault+0x22/0x30
[ 1.472243] ? __pfx_pci_conf1_read+0x10/0x10
[ 1.472243] ? msix_prepare_msi_desc+0x2f/0x80
[ 1.472243] ? srso_return_thunk+0x5/0x5f
[ 1.472243] __pci_enable_msix_range+0x37f/0x650
[ 1.472243] pci_alloc_irq_vectors_affinity+0xa2/0x100
[ 1.472243] vp_find_vqs_msix+0x188/0x470
[ 1.472243] vp_find_vqs+0x36/0x260
[ 1.472243] ? srso_return_thunk+0x5/0x5f
[ 1.472243] vp_modern_find_vqs+0xe/0x60
[ 1.472243] init_vq+0x348/0x3a0
[ 1.472243] ? __pfx_default_calc_sets+0x10/0x10
[ 1.472243] virtblk_probe+0x108/0xa20
[ 1.472243] ? srso_return_thunk+0x5/0x5f
[ 1.472243] ? ct_nmi_exit+0xbf/0x1d0
[ 1.472243] virtio_dev_probe+0x1aa/0x260
[ 1.472243] really_probe+0xbe/0x2c0
[ 1.472243] ? __pfx___driver_attach+0x10/0x10
[ 1.472243] __driver_probe_device+0x6e/0x110
[ 1.472243] driver_probe_device+0x1a/0xe0
[ 1.472243] __driver_attach+0x7f/0x180
[ 1.472243] bus_for_each_dev+0x6d/0xc0
[ 1.472243] bus_add_driver+0xdf/0x210
[ 1.472243] driver_register+0x50/0x100
[ 1.472243] virtio_blk_init+0x47/0x90
[ 1.472243] ? __pfx_virtio_blk_init+0x10/0x10
[ 1.472243] do_one_initcall+0x3e/0x200
[ 1.472243] ? __x86_indirect_jump_thunk_r14+0x20/0x20
[ 1.472243] kernel_init_freeable+0x199/0x2d0
[ 1.472243] ? __pfx_kernel_init+0x10/0x10
[ 1.472243] kernel_init+0x11/0x1b0
[ 1.472243] ret_from_fork+0x2b/0x40
[ 1.472243] ? __pfx_kernel_init+0x10/0x10
[ 1.472243] ret_from_fork_asm+0x1a/0x30
[ 1.472243] </TASK>
[ 1.472243] Modules linked in:
[ 1.472243] CR2: 0000000000000000
[ 1.472243] ---[ end trace 0000000000000000 ]---
[ 1.472243] RIP: 0010:msix_prepare_msi_desc+0x2f/0x80
[ 1.472243] Code: 00 00 48 89 f0 48 8b 52 20 66 81 4e 4c 01 01 c7 46 04 01 00 00 00 8b 8f 94 03 00 00 89 4e 50 48 8b b7 a8 07 00 00 48 89 70 58 <8b> 0a 31 d2 81 e1 00 00 40 00 75 0c 0f b6 50 4d d0 ea 83 f2 01 83
[ 1.472243] RSP: 0018:ffffa1b940027988 EFLAGS: 00010202
[ 1.472243] RAX: ffffa1b9400279c8 RBX: 0000000000000000 RCX: 0000000000000017
[ 1.472243] RDX: 0000000000000000 RSI: ffffa1b940089000 RDI: ffff8b73c55ed000
[ 1.472243] RBP: ffffa1b9400279c8 R08: 0000000000000002 R09: ffffa1b94002795c
[ 1.472243] R10: 0000000000000000 R11: ffffffff98493250 R12: ffff8b73d8fb0000
[ 1.472243] R13: 0000000000000000 R14: ffff8b73c55ed0c0 R15: 0000000000000100
[ 1.472243] FS: 0000000000000000(0000) GS:ffff8b7494849000(0000) knlGS:0000000000000000
[ 1.472243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.472243] CR2: 0000000000000000 CR3: 0008000073e44000 CR4: 00000000003506f0
[ 1.472243] note: swapper/0[1] exited with irqs disabled
[ 1.509205] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 1.510194] Kernel Offset: 0x16200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1.510194] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---


The guest kernel was built using the attached kernel config file.
guest_config

Aithal, Srikanth

Mar 25, 2025, 12:33:55 AM
to syzbot, bhel...@google.com, linux-...@vger.kernel.org, linux...@vger.kernel.org, linu...@vger.kernel.org, s...@canb.auug.org.au, syzkall...@googlegroups.com
Hello,


I also tried booting the host with my host config. There I see a different
issue, but the host crashes as well:



[ 246.606736] INFO: task swapper/0:1 blocked for more than 120 seconds.
[ 246.613936] Not tainted 6.14.0-rc7-next-20250324-882a18c2c1-base #1
[ 246.621509] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.630243] task:swapper/0 state:D stack:0 pid:1 tgid:1 ppid:0 task_flags:0x0140 flags:0x00004000
[ 246.642375] Call Trace:
[ 246.645099] <TASK>
[ 246.647435] __schedule+0x45e/0x14f0
[ 246.651424] ? mntput+0x28/0x40
[ 246.654924] ? path_put+0x22/0x30
[ 246.658617] schedule+0x2b/0xf0
[ 246.662116] async_synchronize_cookie_domain+0xd0/0x120
[ 246.667942] ? __pfx_autoremove_wake_function+0x10/0x10
[ 246.673769] ? __pfx_kernel_init+0x10/0x10
[ 246.678334] async_synchronize_full+0x1b/0x30
[ 246.683182] kernel_init+0x24/0x200
[ 246.687068] ret_from_fork+0x41/0x60
[ 246.691052] ? __pfx_kernel_init+0x10/0x10
[ 246.695617] ret_from_fork_asm+0x1a/0x30
[ 246.699989] RIP: 1f0f:0x0
[ 246.702907] RSP: 0000:0000000000000000 EFLAGS: 841f0f2e66 ORIG_RAX: 1f0f2e6600000000
[ 246.711546] RAX: 0000000000000000 RBX: 1f0f2e6600000000 RCX: 2e66000000000084
[ 246.719503] RDX: 0000000000841f0f RSI: 000000841f0f2e66 RDI: 00841f0f2e660000
[ 246.727460] RBP: 00841f0f2e660000 R08: 00841f0f2e660000 R09: 000000841f0f2e66
[ 246.735418] R10: 0000000000841f0f R11: 2e66000000000084 R12: 000000841f0f2e66
[ 246.743376] R13: 0000000000841f0f R14: 2e66000000000084 R15: 1f0f2e6600000000
[ 246.751335] </TASK>
[ 246.753796] INFO: task kworker/u1028:5:1665 blocked for more than 120 seconds.
[ 246.761851] Not tainted 6.14.0-rc7-next-20250324-882a18c2c1-base #1
[ 246.769413] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.778137] task:kworker/u1028:5 state:D stack:0 pid:1665 tgid:1665 ppid:2 task_flags:0x4208060 flags:0x00004000
[ 246.790558] Workqueue: async async_run_entry_fn
[ 246.795608] Call Trace:
[ 246.798330] <TASK>
[ 246.800656] __schedule+0x45e/0x14f0
[ 246.804640] ? sched_clock_noinstr+0xd/0x20
[ 246.809304] ? sched_clock_cpu+0x71/0x1d0
[ 246.813773] schedule+0x2b/0xf0
[ 246.817271] schedule_timeout+0xdf/0xf0
[ 246.821545] __wait_for_common+0x93/0x180
[ 246.826013] ? __pfx_schedule_timeout+0x10/0x10
[ 246.831063] wait_for_completion+0x28/0x30
[ 246.835628] __flush_work+0x2e2/0x400
[ 246.839711] ? __pfx_wq_barrier_func+0x10/0x10
[ 246.844665] work_on_cpu_key+0x71/0x90
[ 246.848841] ? __pfx_work_for_cpu_fn+0x10/0x10
[ 246.853793] ? __pfx_local_pci_probe+0x10/0x10
[ 246.858749] pci_device_probe+0x200/0x260
[ 246.863218] really_probe+0xf1/0x3b0
[ 246.867204] __driver_probe_device+0x8c/0x170
[ 246.872058] driver_probe_device+0x24/0xd0
[ 246.876623] __driver_attach_async_helper+0x72/0x100
[ 246.882157] async_run_entry_fn+0x37/0x120
[ 246.886721] process_one_work+0x191/0x3d0
[ 246.891180] worker_thread+0x2ce/0x400
[ 246.895348] ? __pfx_worker_thread+0x10/0x10
[ 246.900096] kthread+0x101/0x230
[ 246.903693] ? __pfx_kthread+0x10/0x10
[ 246.907870] ret_from_fork+0x41/0x60
[ 246.911853] ? __pfx_kthread+0x10/0x10
[ 246.916029] ret_from_fork_asm+0x1a/0x30
[ 246.920401] </TASK>
[ 246.922832] INFO: task kworker/u1028:6:1667 blocked for more than 121 seconds.
[ 246.930887] Not tainted 6.14.0-rc7-next-20250324-882a18c2c1-base #1
[ 246.938447] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.947179] task:kworker/u1028:6 state:D stack:0 pid:1667 tgid:1667 ppid:2 task_flags:0x4208060 flags:0x00004000
[ 246.959598] Workqueue: async async_run_entry_fn
[ 246.964649] Call Trace:
[ 246.967362] <TASK>
[ 246.969686] __schedule+0x45e/0x14f0
[ 246.973661] ? sched_clock_noinstr+0xd/0x20
[ 246.978322] ? sched_clock_cpu+0x71/0x1d0
[ 246.982791] schedule+0x2b/0xf0
[ 246.986289] schedule_timeout+0xdf/0xf0
[ 246.990564] __wait_for_common+0x93/0x180
[ 246.995031] ? __pfx_schedule_timeout+0x10/0x10
[ 247.000081] wait_for_completion+0x28/0x30
[ 247.004645] __flush_work+0x2e2/0x400
[ 247.008725] ? __pfx_wq_barrier_func+0x10/0x10
[ 247.013678] work_on_cpu_key+0x71/0x90
[ 247.017854] ? __pfx_work_for_cpu_fn+0x10/0x10
[ 247.022798] ? __pfx_local_pci_probe+0x10/0x10
[ 247.027751] pci_device_probe+0x200/0x260
[ 247.032219] really_probe+0xf1/0x3b0
[ 247.036192] __driver_probe_device+0x8c/0x170
[ 247.041039] driver_probe_device+0x24/0xd0
[ 247.045604] __driver_attach_async_helper+0x72/0x100
[ 247.051139] async_run_entry_fn+0x37/0x120
[ 247.055703] process_one_work+0x191/0x3d0
[ 247.060170] worker_thread+0x2ce/0x400
[ 247.064338] ? __pfx_worker_thread+0x10/0x10
[ 247.069097] kthread+0x101/0x230
[ 247.072692] ? __pfx_kthread+0x10/0x10
[ 247.076859] ret_from_fork+0x41/0x60
[ 247.080834] ? __pfx_kthread+0x10/0x10
[ 247.085010] ret_from_fork_asm+0x1a/0x30
[ 247.089381] </TASK>
[ 247.091811] INFO: task kworker/u1028:7:1669 blocked for more than 121 seconds.
[ 247.099865] Not tainted 6.14.0-rc7-next-20250324-882a18c2c1-base #1
[ 247.107435] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 247.116167] task:kworker/u1028:7 state:D stack:0 pid:1669 tgid:1669 ppid:2 task_flags:0x4208060 flags:0x00004000
[ 247.128578] Workqueue: async async_run_entry_fn
[ 247.133626] Call Trace:
[ 247.136348] <TASK>
[ 247.138682] __schedule+0x45e/0x14f0
[ 247.142656] ? sched_clock_noinstr+0xd/0x20
[ 247.147318] ? sched_clock_cpu+0x71/0x1d0
[ 247.151777] schedule+0x2b/0xf0
[ 247.155275] schedule_timeout+0xdf/0xf0
[ 247.159549] __wait_for_common+0x93/0x180
[ 247.164008] ? __pfx_schedule_timeout+0x10/0x10
[ 247.169057] wait_for_completion+0x28/0x30
[ 247.173612] __flush_work+0x2e2/0x400
[ 247.177692] ? __pfx_wq_barrier_func+0x10/0x10
[ 247.182644] work_on_cpu_key+0x71/0x90
[ 247.186822] ? __pfx_work_for_cpu_fn+0x10/0x10
[ 247.191774] ? __pfx_local_pci_probe+0x10/0x10
[ 247.196726] pci_device_probe+0x200/0x260
[ 247.201195] really_probe+0xf1/0x3b0
[ 247.205176] __driver_probe_device+0x8c/0x170
[ 247.210030] driver_probe_device+0x24/0xd0
[ 247.214595] __driver_attach_async_helper+0x72/0x100
[ 247.220128] async_run_entry_fn+0x37/0x120
[ 247.224693] process_one_work+0x191/0x3d0
[ 247.229161] worker_thread+0x2ce/0x400
[ 247.233338] ? __pfx_worker_thread+0x10/0x10
[ 247.238097] kthread+0x101/0x230
[ 247.241693] ? __pfx_kthread+0x10/0x10
[ 247.245871] ret_from_fork+0x41/0x60
[ 247.249853] ? __pfx_kthread+0x10/0x10
[ 247.254021] ret_from_fork_asm+0x1a/0x30
[ 247.258391] </TASK>
[ 247.260813] INFO: task kworker/u1028:8:1670 blocked for more than 121 seconds.
[ 247.268868] Not tainted 6.14.0-rc7-next-20250324-882a18c2c1-base #1
[ 247.276436] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 247.285168] task:kworker/u1028:8 state:D stack:0 pid:1670 tgid:1670 ppid:2 task_flags:0x4208060 flags:0x00004000
[ 247.297586] Workqueue: async async_run_entry_fn
[ 247.302626] Call Trace:
[ 247.305349] <TASK>
[ 247.307682] __schedule+0x45e/0x14f0
[ 247.311667] ? sched_clock_noinstr+0xd/0x20
[ 247.316329] ? sched_clock_cpu+0x71/0x1d0
[ 247.320797] schedule+0x2b/0xf0
[ 247.324294] schedule_timeout+0xdf/0xf0
[ 247.328558] __wait_for_common+0x93/0x180
[ 247.333027] ? __pfx_schedule_timeout+0x10/0x10
[ 247.338075] wait_for_completion+0x28/0x30
[ 247.342640] __flush_work+0x2e2/0x400
[ 247.346710] ? __pfx_wq_barrier_func+0x10/0x10
[ 247.351663] work_on_cpu_key+0x71/0x90
[ 247.355832] ? __pfx_work_for_cpu_fn+0x10/0x10
[ 247.360785] ? __pfx_local_pci_probe+0x10/0x10
[ 247.365737] pci_device_probe+0x200/0x260
[ 247.370205] really_probe+0xf1/0x3b0
[ 247.374186] __driver_probe_device+0x8c/0x170
[ 247.379041] driver_probe_device+0x24/0xd0
[ 247.383595] __driver_attach_async_helper+0x72/0x100
[ 247.389131] async_run_entry_fn+0x37/0x120
[ 247.393694] process_one_work+0x191/0x3d0
[ 247.398161] worker_thread+0x2ce/0x400
[ 247.402336] ? __pfx_worker_thread+0x10/0x10
[ 247.407093] kthread+0x101/0x230
[ 247.410688] ? __pfx_kthread+0x10/0x10
[ 247.414865] ret_from_fork+0x41/0x60
[ 247.418848] ? __pfx_kthread+0x10/0x10
[ 247.423015] ret_from_fork_asm+0x1a/0x30
[ 247.427387] </TASK>
[ 247.429833] INFO: task modprobe:1973 blocked for more than 121 seconds.
[ 247.437208] Not tainted 6.14.0-rc7-next-20250324-882a18c2c1-base #1
[ 247.444778] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 247.453510] task:modprobe state:D stack:0 pid:1973 tgid:1973 ppid:1572 task_flags:0x400100 flags:0x00004002
[ 247.465831] Call Trace:
[ 247.468552] <TASK>
[ 247.470887] __schedule+0x45e/0x14f0
[ 247.474871] ? add_uevent_var+0x99/0x190
[ 247.479243] ? kobject_get_path+0x72/0x120
[ 247.483809] schedule+0x2b/0xf0
[ 247.487306] async_synchronize_cookie_domain+0xd0/0x120
[ 247.493121] ? __pfx_autoremove_wake_function+0x10/0x10
[ 247.498947] async_synchronize_full+0x1b/0x30
[ 247.503803] do_init_module+0x1f3/0x270
[ 247.508078] load_module+0x2bb4/0x2d10
[ 247.512255] ? kernel_read_file+0x2a4/0x320
[ 247.516917] init_module_from_file+0x96/0xd0
[ 247.521667] ? init_module_from_file+0x96/0xd0
[ 247.526620] idempotent_init_module+0xfc/0x2e0
[ 247.531572] __x64_sys_finit_module+0x77/0xf0
[ 247.536419] x64_sys_call+0x1f9e/0x20c0
[ 247.540694] do_syscall_64+0x6d/0x110
[ 247.544775] ? __pfx_page_put_link+0x10/0x10
[ 247.549534] ? strncpy_from_user+0x2b/0x100
[ 247.554196] ? putname+0x63/0x80
[ 247.557790] ? do_sys_openat2+0x8b/0xd0
[ 247.562067] ? __x64_sys_openat+0x58/0xa0
[ 247.566534] ? syscall_exit_to_user_mode+0x57/0x1b0
[ 247.571973] ? do_syscall_64+0x79/0x110
[ 247.576246] ? irqentry_exit+0x3f/0x50
[ 247.580423] ? exc_page_fault+0x94/0x1b0
[ 247.584793] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 247.590425] RIP: 0033:0x7f3595b5425d
[ 247.594407] RSP: 002b:00007fff77994d68 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 247.602841] RAX: ffffffffffffffda RBX: 000055edc4460b60 RCX: 00007f3595b5425d
[ 247.610790] RDX: 0000000000000000 RSI: 000055eda89bfe52 RDI: 0000000000000000
[ 247.618748] RBP: 00007fff77994e20 R08: 0000000000000040 R09: 00007fff77994db0
[ 247.626707] R10: 00007f3595c30b20 R11: 0000000000000246 R12: 000055eda89bfe52
[ 247.634665] R13: 0000000000040000 R14: 000055edc4460f60 R15: 0000000000000000
[ 247.642622] </TASK>
[ 256.846894] nvme nvme5: I/O tag 15 (100f) QID 0 timeout, completion polled
[ 256.854920] nvme nvme6: I/O tag 3 (1003) QID 0 timeout, completion polled
[ 256.862591] nvme nvme7: I/O tag 11 (100b) QID 0 timeout, completion polled
[ 256.870268] nvme nvme8: I/O tag 3 (1003) QID 0 timeout, completion polled
[ 318.286852] nvme nvme5: I/O tag 12 (200c) QID 0 timeout, completion polled
[ 318.294546] nvme nvme6: I/O tag 0 (2000) QID 0 timeout, completion polled
[ 318.302129] nvme nvme7: I/O tag 8 (2008) QID 0 timeout, completion polled
[ 318.309771] nvme nvme8: I/O tag 0 (2000) QID 0 timeout, completion polled
[ 369.486768] INFO: task swapper/0:1 blocked for more than 243 seconds.
[ 369.493957] Not tainted 6.14.0-rc7-next-20250324-882a18c2c1-base #1
[ 369.501529] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.510262] task:swapper/0 state:D stack:0 pid:1 tgid:1 ppid:0 task_flags:0x0140 flags:0x00004000
[ 369.522389] Call Trace:
[ 369.525110] <TASK>
[ 369.527445] __schedule+0x45e/0x14f0
[ 369.531430] ? mntput+0x28/0x40
[ 369.534929] ? path_put+0x22/0x30
[ 369.538620] schedule+0x2b/0xf0
[ 369.542118] async_synchronize_cookie_domain+0xd0/0x120
[ 369.547944] ? __pfx_autoremove_wake_function+0x10/0x10
[ 369.553769] ? __pfx_kernel_init+0x10/0x10
[ 369.558326] async_synchronize_full+0x1b/0x30
[ 369.563180] kernel_init+0x24/0x200
[ 369.567059] ret_from_fork+0x41/0x60
[ 369.571042] ? __pfx_kernel_init+0x10/0x10
[ 369.575598] ret_from_fork_asm+0x1a/0x30
[ 369.579960] RIP: 1f0f:0x0
[ 369.582876] RSP: 0000:0000000000000000 EFLAGS: 841f0f2e66 ORIG_RAX: 1f0f2e6600000000
[ 369.591512] RAX: 0000000000000000 RBX: 1f0f2e6600000000 RCX: 2e66000000000084
[ 369.599469] RDX: 0000000000841f0f RSI: 000000841f0f2e66 RDI: 00841f0f2e660000
[ 369.607427] RBP: 00841f0f2e660000 R08: 00841f0f2e660000 R09: 000000841f0f2e66
[ 369.615384] R10: 0000000000841f0f R11: 2e66000000000084 R12: 000000841f0f2e66
[ 369.623340] R13: 0000000000841f0f R14: 2e66000000000084 R15: 1f0f2e6600000000
[ 369.631299] </TASK>
[ 369.633778] INFO: task kworker/u1028:5:1665 blocked for more than 243 seconds.
[ 369.641833] Not tainted 6.14.0-rc7-next-20250324-882a18c2c1-base #1
[ 369.649394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.658128] task:kworker/u1028:5 state:D stack:0 pid:1665 tgid:1665 ppid:2 task_flags:0x4208060 flags:0x00004000
[ 369.670547] Workqueue: async async_run_entry_fn
[ 369.675596] Call Trace:
[ 369.678317] <TASK>
[ 369.680652] __schedule+0x45e/0x14f0
[ 369.684636] ? sched_clock_noinstr+0xd/0x20
[ 369.689299] ? sched_clock_cpu+0x71/0x1d0
[ 369.693766] schedule+0x2b/0xf0
[ 369.697254] schedule_timeout+0xdf/0xf0
[ 369.701519] __wait_for_common+0x93/0x180
[ 369.705979] ? __pfx_schedule_timeout+0x10/0x10
[ 369.711028] wait_for_completion+0x28/0x30
[ 369.715592] __flush_work+0x2e2/0x400
[ 369.719672] ? __pfx_wq_barrier_func+0x10/0x10
[ 369.724625] work_on_cpu_key+0x71/0x90
[ 369.728801] ? __pfx_work_for_cpu_fn+0x10/0x10
[ 369.733754] ? __pfx_local_pci_probe+0x10/0x10
[ 369.738707] pci_device_probe+0x200/0x260
[ 369.743175] really_probe+0xf1/0x3b0
[ 369.747148] __driver_probe_device+0x8c/0x170
[ 369.751996] driver_probe_device+0x24/0xd0
[ 369.756560] __driver_attach_async_helper+0x72/0x100
[ 369.762094] async_run_entry_fn+0x37/0x120
[ 369.766649] process_one_work+0x191/0x3d0
[ 369.771116] worker_thread+0x2ce/0x400
[ 369.775292] ? __pfx_worker_thread+0x10/0x10
[ 369.780042] kthread+0x101/0x230
[ 369.783638] ? __pfx_kthread+0x10/0x10
[ 369.787814] ret_from_fork+0x41/0x60
[ 369.791788] ? __pfx_kthread+0x10/0x10
[ 369.795965] ret_from_fork_asm+0x1a/0x30
[ 369.800336] </TASK>
[ 369.802759] INFO: task kworker/u1028:6:1667 blocked for more than 244 seconds.
[ 369.810813] Not tainted 6.14.0-rc7-next-20250324-882a18c2c1-base #1
[ 369.818384] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.827116] task:kworker/u1028:6 state:D stack:0 pid:1667 tgid:1667 ppid:2 task_flags:0x4208060 flags:0x00004000
[ 369.839526] Workqueue: async async_run_entry_fn
[ 369.844574] Call Trace:
[ 369.847294] <TASK>
[ 369.849628] __schedule+0x45e/0x14f0
[ 369.853602] ? sched_clock_noinstr+0xd/0x20
[ 369.858264] ? sched_clock_cpu+0x71/0x1d0
[ 369.862722] schedule+0x2b/0xf0
[ 369.866211] schedule_timeout+0xdf/0xf0
[ 369.870476] __wait_for_common+0x93/0x180
[ 369.874934] ? __pfx_schedule_timeout+0x10/0x10
[ 369.879985] wait_for_completion+0x28/0x30
[ 369.884549] __flush_work+0x2e2/0x400
[ 369.888628] ? __pfx_wq_barrier_func+0x10/0x10
[ 369.893573] work_on_cpu_key+0x71/0x90
[ 369.897750] ? __pfx_work_for_cpu_fn+0x10/0x10
[ 369.902703] ? __pfx_local_pci_probe+0x10/0x10
[ 369.907656] pci_device_probe+0x200/0x260
[ 369.912125] really_probe+0xf1/0x3b0
[ 369.916107] __driver_probe_device+0x8c/0x170
[ 369.920955] driver_probe_device+0x24/0xd0
[ 369.925519] __driver_attach_async_helper+0x72/0x100
[ 369.931053] async_run_entry_fn+0x37/0x120
[ 369.935607] process_one_work+0x191/0x3d0
[ 369.940075] worker_thread+0x2ce/0x400
[ 369.944244] ? __pfx_worker_thread+0x10/0x10
[ 369.949002] kthread+0x101/0x230
[ 369.952598] ? __pfx_kthread+0x10/0x10
[ 369.956774] ret_from_fork+0x41/0x60
[ 369.960747] ? __pfx_kthread+0x10/0x10
[ 369.964923] ret_from_fork_asm+0x1a/0x30
[ 369.969294] </TASK>
[ 369.971715] INFO: task kworker/u1028:7:1669 blocked for more than 244 seconds.
[ 369.979771] Not tainted 6.14.0-rc7-next-20250324-882a18c2c1-base #1
[ 369.987340] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.996073] task:kworker/u1028:7 state:D stack:0 pid:1669 tgid:1669 ppid:2 task_flags:0x4208060 flags:0x00004000
[ 370.008492] Workqueue: async async_run_entry_fn
[ 370.013541] Call Trace:
[ 370.016261] <TASK>
[ 370.018595] __schedule+0x45e/0x14f0
[ 370.022579] ? sched_clock_noinstr+0xd/0x20
[ 370.027241] ? sched_clock_cpu+0x71/0x1d0
[ 370.031708] schedule+0x2b/0xf0
[ 370.035198] schedule_timeout+0xdf/0xf0
[ 370.039471] __wait_for_common+0x93/0x180
[ 370.043938] ? __pfx_schedule_timeout+0x10/0x10
[ 370.048987] wait_for_completion+0x28/0x30
[ 370.053551] __flush_work+0x2e2/0x400
[ 370.057621] ? __pfx_wq_barrier_func+0x10/0x10
[ 370.062565] work_on_cpu_key+0x71/0x90
[ 370.066741] ? __pfx_work_for_cpu_fn+0x10/0x10
[ 370.071684] ? __pfx_local_pci_probe+0x10/0x10
[ 370.076637] pci_device_probe+0x200/0x260
[ 370.081105] really_probe+0xf1/0x3b0
[ 370.085087] __driver_probe_device+0x8c/0x170
[ 370.089942] driver_probe_device+0x24/0xd0
[ 370.094498] __driver_attach_async_helper+0x72/0x100
[ 370.100022] async_run_entry_fn+0x37/0x120
[ 370.104586] process_one_work+0x191/0x3d0
[ 370.109053] worker_thread+0x2ce/0x400
[ 370.113230] ? __pfx_worker_thread+0x10/0x10
[ 370.117989] kthread+0x101/0x230
[ 370.121584] ? __pfx_kthread+0x10/0x10
[ 370.125753] ret_from_fork+0x41/0x60
[ 370.129737] ? __pfx_kthread+0x10/0x10
[ 370.133914] ret_from_fork_asm+0x1a/0x30
[ 370.138285] </TASK>
[ 370.140707] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
ubuntu_config

Takamitsu Iwai

Mar 25, 2025, 2:50:39 PM
to srai...@amd.com, bhel...@google.com, linux-...@vger.kernel.org, linux...@vger.kernel.org, linu...@vger.kernel.org, s...@canb.auug.org.au, syzbot+d33642...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, taka...@amazon.com
Subject: Re: [syzbot] [pci?] linux-next test error: general protection fault in msix_capability_init


> I tried booting host with my host config, there I am seeing different
> issue but host crashes

This issue was also reported in [0], where it was traced to commit
d9f2164238d8 ("PCI/MSI: Convert pci_msi_ignore_mask to per MSI domain
flag").

A patch fixing it has been posted at [1].

[0] https://lore.kernel.org/all/qn7fzggcj6qe6r6gdbwcz23pzdz2jx64aldccmsuheabhmjgrt@tawf5nfwuvw7/
[1] https://lore.kernel.org/all/87v7rxzct0.ffs@tglx/

Michael Roth

Mar 25, 2025, 7:04:50 PM
to Aithal, Srikanth, linu...@vger.kernel.org, linux-...@vger.kernel.org, bhel...@google.com, s...@canb.auug.org.au, syzkall...@googlegroups.com, linux...@vger.kernel.org, Roger Pau Monne, Juergen Gross
Also able to reproduce this trace on every boot with a basic KVM guest on an
EPYC Milan system using next-20250325 for both host/guest.

A bisect of commits to drivers/pci/msi seems to indicate the following commit
is the source of the regression:

commit d9f2164238d814d119e8c979a3579d1199e271bb
Author: Roger Pau Monne <roge...@citrix.com>
Date: Wed Feb 19 10:20:57 2025 +0100

PCI/MSI: Convert pci_msi_ignore_mask to per MSI domain flag

Setting pci_msi_ignore_mask inhibits the toggling of the mask bit for both
MSI and MSI-X entries globally, regardless of the IRQ chip they are using.
Only Xen sets the pci_msi_ignore_mask when routing physical interrupts over
event channels, to prevent PCI code from attempting to toggle the maskbit,
as it's Xen that controls the bit.

However, the pci_msi_ignore_mask being global will affect devices that use
MSI interrupts but are not routing those interrupts over event channels
(not using the Xen pIRQ chip). One example is devices behind a VMD PCI
bridge. In that scenario the VMD bridge configures MSI(-X) using the
normal IRQ chip (the pIRQ one in the Xen case), and devices behind the
bridge configure the MSI entries using indexes into the VMD bridge MSI
table. The VMD bridge then demultiplexes such interrupts and delivers to
the destination device(s). Having pci_msi_ignore_mask set in that scenario
prevents (un)masking of MSI entries for devices behind the VMD bridge.

Move the signaling of no entry masking into the MSI domain flags, as that
allows setting it on a per-domain basis. Set it for the Xen MSI domain
that uses the pIRQ chip, while leaving it unset for the rest of the
cases.

Remove pci_msi_ignore_mask at once, since it was only used by Xen code, and
with Xen dropping usage the variable is unneeded.

This fixes using devices behind a VMD bridge on Xen PV hardware domains.

Albeit Devices behind a VMD bridge are not known to Xen, that doesn't mean
Linux cannot use them. By inhibiting the usage of
VMD_FEAT_CAN_BYPASS_MSI_REMAP and the removal of the pci_msi_ignore_mask
bodge devices behind a VMD bridge do work fine when use from a Linux Xen
hardware domain. That's the whole point of the series.

Signed-off-by: Roger Pau Monné <roge...@citrix.com>
Reviewed-by: Thomas Gleixner <tg...@linutronix.de>
Acked-by: Juergen Gross <jgr...@suse.com>
Acked-by: Bjorn Helgaas <bhel...@google.com>
Message-ID: <20250219092059.9...@citrix.com>
Signed-off-by: Juergen Gross <jgr...@suse.com>
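
Both traces fail at the same point. A 32-bit word is loaded through a NULL pointer and then tested against 0x400000, which is bit 22. In the syzbot run the pointer is R14 = 0. In the QEMU run it is RDX = 0, and that pointer was itself just loaded from offset 0x20 of another structure ("48 8b 52 20"). This pattern is consistent with a per-domain flag check of the kind this commit introduces being reached when no domain info is attached.

The C sketch below is a hypothetical user-space model of a per-domain "no mask" flag with a NULL-safe lookup. It is not the kernel's code. The struct names, the field layout, and SKETCH_FLAG_NO_MASK are assumptions made for illustration, and the actual fix from Thomas linked in this thread may handle the missing-domain case differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value: bit 22, the 0x400000 mask both traces test. */
#define SKETCH_FLAG_NO_MASK (1u << 22)
static_assert(SKETCH_FLAG_NO_MASK == 0x400000u, "bit 22 is 0x400000");

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct sketch_domain_info {
	unsigned int flags; /* per-domain behaviour flags */
};

struct sketch_irq_domain {
	const struct sketch_domain_info *info; /* may be NULL */
};

/*
 * The faulting pattern in the traces appears to amount to
 *     return d->info->flags & SKETCH_FLAG_NO_MASK;
 * with d->info == NULL. This guarded version treats a missing domain or
 * missing info as "flag not set", so masking stays enabled.
 */
static bool sketch_domain_no_mask(const struct sketch_irq_domain *d)
{
	if (d == NULL || d->info == NULL)
		return false;
	return (d->info->flags & SKETCH_FLAG_NO_MASK) != 0;
}

/* Only a domain that sets the flag (the Xen pIRQ case) disables masking. */
static bool sketch_entry_can_mask(const struct sketch_irq_domain *d)
{
	return !sketch_domain_no_mask(d);
}
```

With this shape, the Xen pIRQ domain keeps its no-mask behaviour and ordinary domains keep masking. A device whose domain carries no info falls back to masking instead of dereferencing NULL.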

Thanks,

Mike

Roger Pau Monné

Mar 26, 2025, 5:25:41 AM
to 423b87d8-2ae3-48af...@amd.com, Aithal, Srikanth, linu...@vger.kernel.org, linux-...@vger.kernel.org, bhel...@google.com, s...@canb.auug.org.au, syzkall...@googlegroups.com, linux...@vger.kernel.org, Juergen Gross
On Tue, Mar 25, 2025 at 05:37:52PM -0500, Michael Roth wrote:
> Also able to reproduce this trace on every boot with a basic KVM guest on an
> EPYC Milan system using next-20250325 for both host/guest.

Sorry for the breakage; there's a fix from Thomas at:

https://lore.kernel.org/xen-devel/87v7rxzct0.ffs@tglx/

Regards, Roger.

syzbot

Mar 31, 2025, 11:51:03 PM
to kent.ov...@linux.dev, kent.ov...@linux.dev, linux-...@vger.kernel.org, syzkall...@googlegroups.com
> #syz close

unknown command "close"

syzbot

May 26, 2025, 2:34:27 PM
to syzkall...@googlegroups.com
Auto-closing this bug as obsolete.
Crashes did not happen for a while, no reproducer and no activity.