[syzbot] [ide?] UBSAN: shift-out-of-bounds in ata_qc_issue

syzbot

Feb 17, 2026, 3:55:37 PM
to cas...@kernel.org, dle...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: ca4ee40bf13d Partly revert "drm/hyperv: Remove reference t..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=13c6c722580000
kernel config: https://syzkaller.appspot.com/x/.config?x=a771bfd268751cd6
dashboard link: https://syzkaller.appspot.com/bug?extid=1f77b8ca15336fff21ff
compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-ca4ee40b.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/c714adf37ddd/vmlinux-ca4ee40b.xz
kernel image: https://storage.googleapis.com/syzbot-assets/4d56cd9f6175/bzImage-ca4ee40b.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+1f77b8...@syzkaller.appspotmail.com

------------[ cut here ]------------
UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24
shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int'
CPU: 2 UID: 0 PID: 1282 Comm: kworker/2:1H Tainted: G L syzkaller #0 PREEMPT(full)
Tainted: [L]=SOFTLOCKUP
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Workqueue: events_highpri ata_scsi_deferred_qc_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
ubsan_epilogue+0xa/0x30 lib/ubsan.c:233
__ubsan_handle_shift_out_of_bounds+0x279/0x2a0 lib/ubsan.c:494
ata_qc_issue.cold+0x38/0x9f drivers/ata/libata-core.c:5166
ata_scsi_deferred_qc_work+0x154/0x1f0 drivers/ata/libata-scsi.c:1679
process_one_work+0x9d7/0x1920 kernel/workqueue.c:3275
process_scheduled_works kernel/workqueue.c:3358 [inline]
worker_thread+0x5da/0xe40 kernel/workqueue.c:3439
kthread+0x370/0x450 kernel/kthread.c:467
ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
---[ end trace ]---
Kernel panic - not syncing: UBSAN: panic_on_warn set ...
CPU: 2 UID: 0 PID: 1282 Comm: kworker/2:1H Tainted: G L syzkaller #0 PREEMPT(full)
Tainted: [L]=SOFTLOCKUP
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Workqueue: events_highpri ata_scsi_deferred_qc_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
vpanic+0x552/0x970 kernel/panic.c:650
panic+0xd1/0xe0 kernel/panic.c:787
check_panic_on_warn kernel/panic.c:524 [inline]
check_panic_on_warn.cold+0x19/0x34 kernel/panic.c:519
__ubsan_handle_shift_out_of_bounds+0x279/0x2a0 lib/ubsan.c:494
ata_qc_issue.cold+0x38/0x9f drivers/ata/libata-core.c:5166
ata_scsi_deferred_qc_work+0x154/0x1f0 drivers/ata/libata-scsi.c:1679
process_one_work+0x9d7/0x1920 kernel/workqueue.c:3275
process_scheduled_works kernel/workqueue.c:3358 [inline]
worker_thread+0x5da/0xe40 kernel/workqueue.c:3439
kthread+0x370/0x450 kernel/kthread.c:467
ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Niklas Cassel

Feb 18, 2026, 4:45:49 AM
to syzbot, dle...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Tue, Feb 17, 2026 at 12:55:35PM -0800, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: ca4ee40bf13d Partly revert "drm/hyperv: Remove reference t..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=13c6c722580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=a771bfd268751cd6
> dashboard link: https://syzkaller.appspot.com/bug?extid=1f77b8ca15336fff21ff
> compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-ca4ee40b.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/c714adf37ddd/vmlinux-ca4ee40b.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/4d56cd9f6175/bzImage-ca4ee40b.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+1f77b8...@syzkaller.appspotmail.com
>
> ------------[ cut here ]------------
> UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24
> shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int'

4210818301 is 0xfafbfcfd

0xfafbfcfd is ATA_TAG_POISON.

ATA_TAG_POISON is set by ata_qc_free(), so it appears that
ata_scsi_deferred_qc_work() is trying to issue a QC that has
already been freed.
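
The arithmetic above can be checked with a small user-space sketch. It
assumes the flagged expression is a "1ULL << tag" style shift, which is
consistent with the 64-bit 'long long unsigned int' type in the UBSAN
message. TAG_POISON, tag_shift_is_defined() and tag_to_bit() are
hypothetical names used for illustration only, not libata code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: same numeric value as the ATA_TAG_POISON quoted in this thread. */
#define TAG_POISON 0xfafbfcfdU

/* Hypothetical helper: shifting a 64-bit value is only defined for 0..63. */
static bool tag_shift_is_defined(unsigned int tag)
{
	return tag < 64;
}

/* Hypothetical helper: build the per-tag bit, refusing undefined shifts. */
static uint64_t tag_to_bit(unsigned int tag)
{
	assert(tag_shift_is_defined(tag));
	return UINT64_C(1) << tag;
}
```

With these definitions, TAG_POISON equals 4210818301 and
tag_shift_is_defined(TAG_POISON) is false, which matches the "shift
exponent ... is too large" report.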


Kind regards,
Niklas

Damien Le Moal

Feb 19, 2026, 5:28:39 AM
to Niklas Cassel, syzbot, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
I checked the code but I fail to see any path that can lead to this happening.
I did more tests using qemu q35 machine as used by syzbot, and everything looks
fine. So not sure what is happening here. I will dig further.

--
Damien Le Moal
Western Digital Research

Niklas Cassel

Feb 19, 2026, 5:44:42 PM
to syzbot, syzk...@googlegroups.com, dle...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello syzkaller folks,

syzkaller seems to have found a bug that it can reproduce very easily.

Looking at the dashboard for this bug:
https://syzkaller.appspot.com/bug?extid=1f77b8ca15336fff21ff

It has so far been reproduced 4 times in 3 days.

However, there is no reproducer yet.

Any advice on how we can try to trigger this without an exact reproducer
available yet?


Kind regards,
Niklas



Niklas Cassel

Feb 19, 2026, 7:55:46 PM
to Damien Le Moal, syzbot, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Thu, Feb 19, 2026 at 10:33:22AM +0900, Damien Le Moal wrote:
> >> UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24
> >> shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int'
> >
> > 4210818301 is 0xfafbfcfd
> >
> > 0xfafbfcfd is ATA_TAG_POISON.
> >
> > ATA_TAG_POISON is set by ata_qc_free(), so it appears that
> > ata_scsi_deferred_qc_work() is trying to issue a QC that has
> > already been freed.
>
> I checked the code but I fail to see any path that can lead to this happening.
> I did more tests using qemu q35 machine as used by syzbot, and everything looks
> fine. So not sure what is happening here. I will dig further.

Hello Damien,


My best guess:
since qc->tag is ATA_TAG_POISON, ata_qc_free() must have been called
on ap->deferred_qc.

If it was an NCQ abort, ata_eh_set_pending() would have been called to
clear ap->deferred_qc. Since ap->deferred_qc is apparently set, it
appears that we did not get an error IRQ.

To me, that leaves a timeout as the most likely scenario.

I.e. SCSI EH is called without ata_eh_set_pending() having been called.
(Currently ata_eh_set_pending() is the function that clears
ap->deferred_qc)



If I look at ata_scsi_cmd_error_handler() it will only break if:

if (qc->flags & ATA_QCFLAG_ACTIVE && qc->scsicmd == scmd)

If the deferred QC times out, flag ATA_QCFLAG_ACTIVE will not be set
(because ATA_QCFLAG_ACTIVE is only set by ata_qc_issue()).

Since ATA_QCFLAG_ACTIVE is not set, the loop ends with i == ATA_MAX_QUEUE,
so we will enter the else clause, which calls:
scsi_eh_finish_cmd(scmd, &ap->eh_done_q);


That might potentially free the tag to the block layer to reuse,
while ap->deferred_qc is still set (with the same tag).

Possibly, the next time ata_scsi_qc_issue() is called, ap->deferred_qc is still
set (since it was never cleared), so it calls ata_qc_free(qc) on the new QC,
which might have the same tag as the stale deferred QC, because the block layer
has now reused the tag (since SCSI completed the command).

I would possibly have expected some kind of print from SCSI in this case.
(But since the else clause finishes the command normally, perhaps not?)

But perhaps it is wise to add some code to ata_scsi_cmd_error_handler()
which clears ap->deferred_qc.
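
The scenario above can be illustrated with a small user-space model. This
is only a sketch under assumptions: the model_* types, names and constants
are hypothetical stand-ins, not the real libata structures, and
model_clear_deferred() is just one possible shape for the extra step
suggested here, not a tested fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_QUEUE     32          /* stand-in for ATA_MAX_QUEUE */
#define MODEL_QCFLAG_ACTIVE (1u << 0)   /* stand-in for ATA_QCFLAG_ACTIVE */

struct model_qc {
	unsigned int flags;
	const void *scsicmd;            /* owning SCSI command, if any */
};

struct model_port {
	struct model_qc qcmd[MODEL_MAX_QUEUE];
	struct model_qc *deferred_qc;   /* queued but not yet issued */
};

/*
 * Mimics the lookup described above: only an ACTIVE qc owned by scmd
 * matches. Returns MODEL_MAX_QUEUE when nothing matches.
 */
static int model_find_active_qc(const struct model_port *ap, const void *scmd)
{
	int i;

	assert(ap != NULL);
	for (i = 0; i < MODEL_MAX_QUEUE; i++) {
		const struct model_qc *qc = &ap->qcmd[i];

		if ((qc->flags & MODEL_QCFLAG_ACTIVE) && qc->scsicmd == scmd)
			break;
	}
	return i;
}

/* Hypothetical extra step: forget a deferred qc whose command timed out. */
static bool model_clear_deferred(struct model_port *ap, const void *scmd)
{
	if (ap->deferred_qc && ap->deferred_qc->scsicmd == scmd) {
		ap->deferred_qc = NULL;
		return true;
	}
	return false;
}
```

In this model, a deferred QC that was never issued is invisible to
model_find_active_qc(), so the timed-out command would be finished while
the port still points at it unless something like model_clear_deferred()
runs first.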



Another possibility... again, timed out commands will not have called
ata_eh_set_pending(). scsi_timeout() will call scsi_abort_command()
which will queue delayed work, and the worker function scmd_eh_abort_handler()
will call scsi_eh_scmd_add(), which calls
scsi_host_set_state(shost, SHOST_RECOVERY).

We did add a guard in libata in commit e20e81a24a4d ("ata: libata-core: do not
issue non-internal commands once EH is pending"), so that we will defer commands
even when EH is pending. But in the case of timeout, there will be no error IRQ,
so we will not do an early return in __ata_scsi_queuecmd(), so we could set
ap->deferred_qc up until the worker function scmd_eh_abort_handler() has called
scsi_host_set_state(shost, SHOST_RECOVERY).

Again, adding some code to ata_scsi_cmd_error_handler() to clear ap->deferred_qc
should handle this case.


I would probably hack QEMU to not send a reply, so that we will get block
layer timeouts, because right now, ata_scsi_cmd_error_handler() seems like the
most likely problematic code to me.


Kind regards,
Niklas

Damien Le Moal

Feb 19, 2026, 8:06:17 PM
to Niklas Cassel, syzbot, linu...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 2/20/26 09:55, Niklas Cassel wrote:
> On Thu, Feb 19, 2026 at 10:33:22AM +0900, Damien Le Moal wrote:
>>>> UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24
>>>> shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int'
>>>
>>> 4210818301 is 0xfafbfcfd
>>>
>>> 0xfafbfcfd is ATA_TAG_POISON.
>>>
>>> ATA_TAG_POISON is set by ata_qc_free(), so it appears that
>>> ata_scsi_deferred_qc_work() is trying to issue a QC that has
>>> already been freed.
>>
>> I checked the code but I fail to see any path that can lead to this happening.
>> I did more tests using qemu q35 machine as used by syzbot, and everything looks
>> fine. So not sure what is happening here. I will dig further.
>
> Hello Damien,
>
>
> My best guess:
> since qc->tag is ATA_TAG_POISON, ata_qc_free() must have been called
> on ap->deferred_qc.
>
> If it was an NCQ abort, ata_eh_set_pending() would have been called to
> clear ap->deferred_qc. Since ap->deferred_qc is apparently set, it
> appears that we did not get an error IRQ.
>
> To me, that leaves a timeout as the most likely scenario.

Good point. I think the timeout case was completely overlooked...
That should be fairly easy to debug: I just need to have the deferred work
do nothing to see the deferred qc time out.

Let me hack something and come up with a fix.