On Thu, Feb 19, 2026 at 10:33:22AM +0900, Damien Le Moal wrote:
> >> UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24
> >> shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int'
> >
> > 4210818301 is 0xfafbfcfd
> >
> > 0xfafbfcfd is ATA_TAG_POISON.
> >
> > ATA_TAG_POISON is set by ata_qc_free(), so it appears that
> > ata_scsi_deferred_qc_work() is trying to issue a QC that has
> > already been freed.
>
> I checked the code but I fail to see any path that can lead to this happening.
> I did more tests using qemu q35 machine as used by syzbot, and everything looks
> fine. So not sure what is happening here. I will dig further.

Hello Damien,

My best guess:
since qc->tag is ATA_TAG_POISON, ata_qc_free() must have been called
on ap->deferred_qc.
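
For reference, the arithmetic behind the UBSAN splat can be checked with a
small standalone snippet. This is only a sketch: TOY_TAG_POISON copies the
value quoted above, and toy_tag_shift_is_valid() / toy_tag_to_bit() are
hypothetical helpers for illustration, not libata functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value quoted in this thread for ATA_TAG_POISON (copied, not included). */
#define TOY_TAG_POISON 0xfafbfcfdU

/* Width of the 'long long unsigned int' type named in the UBSAN report. */
#define TOY_MASK_BITS 64U

/* A shift by 'tag' is only well defined if tag is below the type width. */
static bool toy_tag_shift_is_valid(uint32_t tag)
{
	return tag < TOY_MASK_BITS;
}

/* Hypothetical helper: build the bit for a tag, refusing invalid tags. */
static uint64_t toy_tag_to_bit(uint32_t tag)
{
	assert(toy_tag_shift_is_valid(tag));
	return 1ULL << tag;
}
```

With these definitions, TOY_TAG_POISON equals 4210818301, and
toy_tag_shift_is_valid(TOY_TAG_POISON) is false, which matches the
"shift exponent ... is too large" message for a poisoned tag.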

If it was an NCQ abort, ata_eh_set_pending() would have been called to
clear ap->deferred_qc. Since ap->deferred_qc is apparently set, it
appears that we did not get an error IRQ.

To me, that leaves a timeout as the most likely scenario.
I.e. SCSI EH is called without ata_eh_set_pending() having been called.
(Currently, ata_eh_set_pending() is the function that clears
ap->deferred_qc.)

If I look at ata_scsi_cmd_error_handler(), the loop that looks for the QC
belonging to the scmd will only break if:

  if (qc->flags & ATA_QCFLAG_ACTIVE && qc->scsicmd == scmd)

If the deferred QC times out, ATA_QCFLAG_ACTIVE will not be set
(because ATA_QCFLAG_ACTIVE is only set by qc_issue()).

Since ATA_QCFLAG_ACTIVE is not set, the loop never breaks and we end up with
i == ATA_MAX_QUEUE, so we will enter the else clause, which calls:

  scsi_eh_finish_cmd(scmd, &ap->eh_done_q);

That could release the tag back to the block layer for reuse,
while ap->deferred_qc is still set (with the same tag).
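
To make the lookup behaviour concrete, here is a toy model of it in plain
userspace C. It is a sketch based only on the description above, not the
kernel code: struct model_qc, MODEL_MAX_QUEUE, MODEL_QCFLAG_ACTIVE and both
functions are made-up stand-ins, and the queue depth is just an illustrative
value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_QUEUE 32             /* stand-in for ATA_MAX_QUEUE */
#define MODEL_QCFLAG_ACTIVE (1u << 0)  /* stand-in for ATA_QCFLAG_ACTIVE */

struct model_qc {
	unsigned int flags;
	const void *scsicmd;           /* SCSI command that owns this QC */
};

/*
 * Return the index of the first QC that is both active and owned by scmd,
 * or MODEL_MAX_QUEUE if the search falls off the end without breaking.
 */
static size_t model_find_active_qc(const struct model_qc *qcs,
				   const void *scmd)
{
	size_t i;

	assert(qcs != NULL);
	for (i = 0; i < MODEL_MAX_QUEUE; i++) {
		if ((qcs[i].flags & MODEL_QCFLAG_ACTIVE) &&
		    qcs[i].scsicmd == scmd)
			break;
	}
	return i;
}

/* True if the scmd would be handled by the "else" (finish) branch. */
static bool model_takes_finish_branch(const struct model_qc *qcs,
				      const void *scmd)
{
	return model_find_active_qc(qcs, scmd) == MODEL_MAX_QUEUE;
}
```

In this model, a QC that owns the scmd but was never marked active (as a
deferred QC would be) is not found, so the scmd goes down the finish branch
even though a QC still refers to it.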

Possibly, the next time ata_scsi_qc_issue() is called, ap->deferred_qc is
still set, so it calls ata_qc_free(qc). Since ap->deferred_qc was never
cleared, that qc might have the same tag as the deferred QC, because the
block layer has now reused the tag (since SCSI completed the command).

I might have expected some kind of print from SCSI in this case.
(But since the else clause finishes the command normally, perhaps not?)

But perhaps it is wise to add some code to ata_scsi_cmd_error_handler()
that clears ap->deferred_qc.
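
As a rough illustration of the idea, again as a userspace toy and not a
proposed patch: struct sketch_qc, struct sketch_port and
sketch_clear_deferred_qc() are hypothetical names, and the sketch ignores
locking and whether the QC itself would also need to be released.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_qc {
	const void *scsicmd;            /* SCSI command that owns this QC */
};

struct sketch_port {
	struct sketch_qc *deferred_qc;  /* stand-in for ap->deferred_qc */
};

/*
 * Hypothetical helper: called for a scmd that the error handler is about
 * to finish. If the port's deferred QC belongs to that scmd, drop the
 * reference so that nothing can use it after the tag has been reused.
 * Returns 1 if the reference was cleared, 0 otherwise.
 */
static int sketch_clear_deferred_qc(struct sketch_port *ap, const void *scmd)
{
	assert(ap != NULL);

	if (ap->deferred_qc && ap->deferred_qc->scsicmd == scmd) {
		ap->deferred_qc = NULL;
		return 1;
	}
	return 0;
}
```

The check against scmd is a design choice in this sketch, so that finishing
an unrelated command leaves the deferred QC alone; whether the real code
should be that selective is an open question.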

Another possibility: again, timed out commands will not have called
ata_eh_set_pending(). scsi_timeout() will call scsi_abort_command(),
which will queue delayed work, and the worker function
scmd_eh_abort_handler() will call scsi_eh_scmd_add(), which calls
scsi_host_set_state(shost, SHOST_RECOVERY).

We did add a guard in libata in commit e20e81a24a4d ("ata: libata-core: do not
issue non-internal commands once EH is pending"), so that we will defer commands
even when EH is pending. But in the case of a timeout, there will be no error
IRQ, so we will not do an early return in __ata_scsi_queuecmd(), so we could
still set ap->deferred_qc up until the worker function scmd_eh_abort_handler()
has called scsi_host_set_state(shost, SHOST_RECOVERY).

Again, adding some code to ata_scsi_cmd_error_handler() to clear ap->deferred_qc
should handle this case.

I would probably hack QEMU to not send a reply, so that we get block
layer timeouts, because right now ata_scsi_cmd_error_handler() seems like the
most likely problematic code to me.

Kind regards,
Niklas