terminated ioc 804b scsi 0 state c xfer 0

Dan Langille

unread,

Apr 24, 2016, 9:36:00 AM4/24/16

to

More of the pasted output is also at https://gist.github.com/dlangille/1fa3135334089c6603e2ec5da946d9ae <https://gist.github.com/dlangille/1fa3135334089c6603e2ec5da946d9ae> and added smartctl output.

I have a FreeBSD 10.2-RELEASE-p14 box in which there is an LSI SAS2008 card. It's running a zfs root system.

This morning the system was unresponsive via ssh. Attempts to log in at the console did not yield a password prompt.

A power cycle brought the system online. Inspecting /var/log/messages, I found about 63,000 entries similar to those which appear below.

zpool status of all are OK. A scrub is in progress for one pool (since before this issue arose). da7 is in that pool.

Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8d 90 c6 18 00 00 10 00 length 8192 SMID 774 terminated ioc 804b scsi 0 state c xfer 0
Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 70 00 00 20 00 length 16384 SMID 614 terminated ioc 804b scsi 0 state c xfer 0
Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 50 00 00 20 00 length 16384 SMID 792 terminated ioc 804b scsi 0 state c xfer 0
Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 08 00 00 20 00 length 16384 SMID 974 terminated ioc 804b scsi 0 state c xfer 0
Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b 6f ef 50 00 00 08 00 length 4096 SMID 674 terminated ioc 804b scsi 0 state c xfer 0
Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 8b 0f a2 48 00 00 18 00 length 12288 SMID 177 terminated ioc 804b scsi 0 state c xfer 12288
Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 ab 8f a1 38 00 00 08 00 length 4096 SMID 908 terminated ioc 804b scsi 0 state c xfer 0
Apr 24 11:25:56 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 70 00 00 20 00 length 16384 SMID 376 terminated ioc 804b scsi 0 state c xfer 0
Apr 24 11:25:56 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 50 00 00 20 00 length 16384 SMID 172 terminated ioc 804b scsi 0 state c xfer 0

Is this a cabling issue? The drive is a SATA device (smartctl output in the URL above). Anyone familiar with these errors?

--
Dan Langille - BSDCan / PGCon
d...@langille.org

_______________________________________________
freebs...@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
To unsubscribe, send any mail to "freebsd-scsi...@freebsd.org"

Alan Somers

unread,

Apr 24, 2016, 10:16:58 AM4/24/16

to

"terminated ioc" means that the HBA decided to terminate the command.
"804b" is an LSI internal code. Steve (CC'd) might be able to make sense
of it. I doubt there's anything wrong with your cabling, but if a power
cycle fixed the problem, you might've had a firmware crash in the HDD, HBA,
or expander.

-Alan

Dan Langille

unread,

Apr 25, 2016, 8:17:55 AM4/25/16

to

>
> On Apr 24, 2016, at 9:35 AM, Dan Langille <d...@langille.org> wrote:
>
> More of the pasted output is also at https://gist.github.com/dlangille/1fa3135334089c6603e2ec5da946d9ae <https://gist.github.com/dlangille/1fa3135334089c6603e2ec5da946d9ae> and added smartctl output.
>
> I have a FreeBSD 10.2-RELEASE-p14 box in which there is an LSI SAS2008 card. It's running a zfs root system.
>
> This morning the system was unresponsive via ssh. Attempts to log in at the console did not yield a password prompt.
>
> A power cycle brought the system online. Inspecting /var/log/messages, I found about 63,000 entries similar to those which appear below.
>
> zpool status of all are OK. A scrub is in progress for one pool (since before this issue arose). da7 is in that pool.
>
>
> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8d 90 c6 18 00 00 10 00 length 8192 SMID 774 terminated ioc 804b scsi 0 state c xfer 0
> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 70 00 00 20 00 length 16384 SMID 614 terminated ioc 804b scsi 0 state c xfer 0
> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 50 00 00 20 00 length 16384 SMID 792 terminated ioc 804b scsi 0 state c xfer 0
> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 08 00 00 20 00 length 16384 SMID 974 terminated ioc 804b scsi 0 state c xfer 0
> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b 6f ef 50 00 00 08 00 length 4096 SMID 674 terminated ioc 804b scsi 0 state c xfer 0
> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 8b 0f a2 48 00 00 18 00 length 12288 SMID 177 terminated ioc 804b scsi 0 state c xfer 12288
> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 ab 8f a1 38 00 00 08 00 length 4096 SMID 908 terminated ioc 804b scsi 0 state c xfer 0
> Apr 24 11:25:56 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 70 00 00 20 00 length 16384 SMID 376 terminated ioc 804b scsi 0 state c xfer 0
> Apr 24 11:25:56 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 50 00 00 20 00 length 16384 SMID 172 terminated ioc 804b scsi 0 state c xfer 0
>
> Is this a cabling issue? The drive is a SATA device (smartctl output in the URL above). Anyone familiar with these errors?

This morning:

13410079654596185797 REMOVED 0 0 0 was /dev/da7p3

At least I know i'm looking for Serial Number: 13Q8PNBYS

From the logs:

Apr 25 05:34:50 knew kernel: da7 at mps1 bus 0 scbus1 target 17 lun 0
Apr 25 05:34:50 knew kernel: da7: <ATA TOSHIBA DT01ACA3 ABB0> s/n 13Q8PNBYS detached
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 d8 33 53 e0 00 00 08 00 length 4096 SMID 88 terminated ioc 804b scsi 0 state c xfer 0
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 d8 33 26 f8 00 00 20 00 length 16384 SMID 204 terminated ioc 804b scsi 0 state c xfer(da7:mps1:0:17:0): READ(10). CDB: 28 00 d8 33 53 e0 00 00 08 00
Apr 25 05:34:51 knew kernel: 0
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: Unconditionally Re-queue Request
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 d8 33 26 d8 00 00 20 00 length 16384 SMID 260 terminated ioc 804b scsi 0 state c xfer(da7: 0
Apr 25 05:34:51 knew kernel: mps1:0: (da7:mps1:0:17:0): READ(10). CDB: 28 00 e6 6c 42 40 00 00 10 00 length 8192 SMID 484 terminated ioc 804b scsi 0 state c xfer 17:0
Apr 25 05:34:51 knew kernel: 0): (da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 e4 d8 2a 90 00 00 90 00 length 73728 SMID 548 terminated ioc 804b scsi 0 state c xfeError 5, Periph was invalidated
Apr 25 05:34:51 knew kernel: r 0
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 d8 33 26 f8 00 00 20 00
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 4d ac ed b8 00 00 08 00 length 4096 SMID 435 terminated ioc 804b scsi 0 state c xfer (da7:mps1:0:17:0): CAM status: Unconditionally Re-queue Request
Apr 25 05:34:51 knew kernel: 0
Apr 25 05:34:51 knew kernel: (da7:mps1: mps1:0:IOCStatus = 0x4b while resetting device 0xa
Apr 25 05:34:51 knew kernel: 17:mps1: 0): Unfreezing devq for target ID 17
Apr 25 05:34:51 knew kernel: Error 5, Periph was invalidated
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 d8 33 26 d8 00 00 20 00
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: Unconditionally Re-queue Request
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): Error 5, Periph was invalidated
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 e6 6c 42 40 00 00 10 00
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: Unconditionally Re-queue Request
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): Error 5, Periph was invalidated
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 e4 d8 2a 90 00 00 90 00
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: Unconditionally Re-queue Request
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): Error 5, Periph was invalidated
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 4d ac ed b8 00 00 08 00
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: Unconditionally Re-queue Request
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): Error 5, Periph was invalidated
Apr 25 05:34:51 knew kernel: GEOM_MIRROR: Device swap: provider da7p2 disconnected.
Apr 25 05:34:51 knew devd: Executing 'logger -p kern.notice -t ZFS 'vdev is removed, pool_guid=15378250086669402288 vdev_guid=13410079654596185797''
Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): Periph destroyed
Apr 25 05:34:51 knew ZFS: vdev is removed, pool_guid=15378250086669402288 vdev_guid=13410079654596185797

--
Dan Langille - BSDCan / PGCon
d...@langille.org

Scott Long via freebsd-scsi

unread,

Apr 25, 2016, 11:35:51 AM4/25/16

to

Hi Dan,

Can you share the entire console log for the uptime? What you’ve pasted is missing the initial messages of the problem. the “terminated ioc” messages are likely because the driver has decided to reset the drive and terminate all outstanding I/O to it. In other words, they’re red herrings. The reason for the driver deciding to do the reset is likely earlier in the log.

Thanks,
Scott

> On Apr 24, 2016, at 7:35 AM, Dan Langille <d...@langille.org> wrote:
>
> More of the pasted output is also at https://gist.github.com/dlangille/1fa3135334089c6603e2ec5da946d9ae <https://gist.github.com/dlangille/1fa3135334089c6603e2ec5da946d9ae> and added smartctl output.
>
> I have a FreeBSD 10.2-RELEASE-p14 box in which there is an LSI SAS2008 card. It's running a zfs root system.
>
> This morning the system was unresponsive via ssh. Attempts to log in at the console did not yield a password prompt.
>
> A power cycle brought the system online. Inspecting /var/log/messages, I found about 63,000 entries similar to those which appear below.
>
> zpool status of all are OK. A scrub is in progress for one pool (since before this issue arose). da7 is in that pool.
>
>
> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8d 90 c6 18 00 00 10 00 length 8192 SMID 774 terminated ioc 804b scsi 0 state c xfer 0
> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 70 00 00 20 00 length 16384 SMID 614 terminated ioc 804b scsi 0 state c xfer 0
> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 50 00 00 20 00 length 16384 SMID 792 terminated ioc 804b scsi 0 state c xfer 0
> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 08 00 00 20 00 length 16384 SMID 974 terminated ioc 804b scsi 0 state c xfer 0
> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b 6f ef 50 00 00 08 00 length 4096 SMID 674 terminated ioc 804b scsi 0 state c xfer 0
> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 8b 0f a2 48 00 00 18 00 length 12288 SMID 177 terminated ioc 804b scsi 0 state c xfer 12288
> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 ab 8f a1 38 00 00 08 00 length 4096 SMID 908 terminated ioc 804b scsi 0 state c xfer 0
> Apr 24 11:25:56 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 70 00 00 20 00 length 16384 SMID 376 terminated ioc 804b scsi 0 state c xfer 0
> Apr 24 11:25:56 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 50 00 00 20 00 length 16384 SMID 172 terminated ioc 804b scsi 0 state c xfer 0
>
> Is this a cabling issue? The drive is a SATA device (smartctl output in the URL above). Anyone familiar with these errors?
>

Dan Langille

unread,

Apr 25, 2016, 11:36:24 AM4/25/16

to

> On Apr 25, 2016, at 11:32 AM, Scott Long <scott...@yahoo.com> wrote:
>
> Hi Dan,
>
> Can you share the entire console log for the uptime? What you’ve pasted is missing the initial messages of the problem. the “terminated ioc” messages are likely because the driver has decided to reset the drive and terminate all outstanding I/O to it. In other words, they’re red herrings. The reason for the driver deciding to do the reset is likely earlier in the log.

Yes, does this help anything?

The 'core dumped' messages relate to Bacula regression testing. I don't think there is anything helpful here for you:

Apr 13 07:59:52 knew kernel: (sa0:sym0:0:1:0): 64512-byte tape record bigger than supplied buffer
Apr 13 12:06:14 knew kernel: pid 57706 (bacula-sd), uid 1001: exited on signal 11 (core dumped)
Apr 13 15:17:42 knew sshd[31059]: fatal: Read from socket failed: Connection reset by peer [preauth]
Apr 14 07:23:05 knew kernel: sonewconn: pcb 0xfffff8035dd21dc8: Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences)
Apr 16 12:54:07 knew kernel: (sa0:sym0:0:1:0): 64512-byte tape record bigger than supplied buffer
Apr 17 03:19:05 knew kernel: pid 38425 (bacula-sd), uid 1001: exited on signal 11 (core dumped)
Apr 17 06:43:26 knew kernel: (sa0:sym0:0:1:0): 64512-byte tape record bigger than supplied buffer
Apr 17 06:55:53 knew kernel: (sa0:sym0:0:1:0): 64512-byte tape record bigger than supplied buffer
Apr 17 09:21:16 knew kernel: (sa0:sym0:0:1:0): 64512-byte tape record bigger than supplied buffer
Apr 19 18:12:19 knew kernel: (sa1:mps0:0:0:0): 64512-byte tape record bigger than supplied buffer
Apr 20 14:03:05 knew su: BAD SU dan to root on /dev/pts/2
Apr 20 14:03:11 knew last message repeated 2 times
Apr 20 14:03:15 knew su: dan to root on /dev/pts/2
Apr 20 18:52:14 knew kernel: (sa1:mps0:0:0:0): 64512-byte tape record bigger than supplied buffer
Apr 21 08:10:52 knew kernel: (sa0:sym0:0:1:0): 64512-byte tape record bigger than supplied buffer
Apr 23 03:56:28 knew kernel: pid 80961 (bacula-fd), uid 1002: exited on signal 11 (core dumped)
Apr 23 09:41:33 knew kernel: pid 51735 (bacula-sd), uid 1002: exited on signal 11 (core dumped)
Apr 24 05:14:46 knew kernel: pid 4529 (bacula-dir), uid 1002: exited on signal 11 (core dumped)
Apr 24 07:22:09 knew kernel: (sa0:sym0:0:1:0): 64512-byte tape record bigger than supplied buffer

Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8d 90 c6 18 00 00 10 00 length 8192 SMID 774 terminated ioc 804b scsi 0 state c xfer 0
Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 70 00 00 20 00 length 16384 SMID 614 terminated ioc 804b scsi 0 state c xfer 0
Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 50 00 00 20 00 length 16384 SMID 792 terminated ioc 804b scsi 0 state c xfer 0

It includes three lines which appear below

Dan Langille

unread,

Apr 25, 2016, 11:40:07 AM4/25/16

to

> On Apr 25, 2016, at 8:17 AM, Dan Langille <d...@langille.org> wrote:

>
>>
>> On Apr 24, 2016, at 9:35 AM, Dan Langille <d...@langille.org> wrote:
>>
>> More of the pasted output is also at https://gist.github.com/dlangille/1fa3135334089c6603e2ec5da946d9ae <https://gist.github.com/dlangille/1fa3135334089c6603e2ec5da946d9ae> and added smartctl output.
>>
>> I have a FreeBSD 10.2-RELEASE-p14 box in which there is an LSI SAS2008 card. It's running a zfs root system.
>>
>> This morning the system was unresponsive via ssh. Attempts to log in at the console did not yield a password prompt.
>>
>> A power cycle brought the system online. Inspecting /var/log/messages, I found about 63,000 entries similar to those which appear below.
>>
>> zpool status of all are OK. A scrub is in progress for one pool (since before this issue arose). da7 is in that pool.
>>
>>
>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8d 90 c6 18 00 00 10 00 length 8192 SMID 774 terminated ioc 804b scsi 0 state c xfer 0
>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 70 00 00 20 00 length 16384 SMID 614 terminated ioc 804b scsi 0 state c xfer 0
>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 50 00 00 20 00 length 16384 SMID 792 terminated ioc 804b scsi 0 state c xfer 0
>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 08 00 00 20 00 length 16384 SMID 974 terminated ioc 804b scsi 0 state c xfer 0
>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b 6f ef 50 00 00 08 00 length 4096 SMID 674 terminated ioc 804b scsi 0 state c xfer 0
>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 8b 0f a2 48 00 00 18 00 length 12288 SMID 177 terminated ioc 804b scsi 0 state c xfer 12288
>> Apr 24 11:25:55 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 ab 8f a1 38 00 00 08 00 length 4096 SMID 908 terminated ioc 804b scsi 0 state c xfer 0
>> Apr 24 11:25:56 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 70 00 00 20 00 length 16384 SMID 376 terminated ioc 804b scsi 0 state c xfer 0
>> Apr 24 11:25:56 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8b d9 97 50 00 00 20 00 length 16384 SMID 172 terminated ioc 804b scsi 0 state c xfer 0
>>
>> Is this a cabling issue? The drive is a SATA device (smartctl output in the URL above). Anyone familiar with these errors?
>

> This morning:
>
> 13410079654596185797 REMOVED 0 0 0 was /dev/da7p3
>
> At least I know i'm looking for Serial Number: 13Q8PNBYS
>
> From the logs:
>
> Apr 25 05:34:50 knew kernel: da7 at mps1 bus 0 scbus1 target 17 lun 0
> Apr 25 05:34:50 knew kernel: da7: <ATA TOSHIBA DT01ACA3 ABB0> s/n 13Q8PNBYS detached
> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 d8 33 53 e0 00 00 08 00 length 4096 SMID 88 terminated ioc 804b scsi 0 state c xfer 0
> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 d8 33 26 f8 00 00 20 00 length 16384 SMID 204 terminated ioc 804b scsi 0 state c xfer(da7:mps1:0:17:0): READ(10). CDB: 28 00 d8 33 53 e0 00 00 08 00
> Apr 25 05:34:51 knew kernel: 0
> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: Unconditionally Re-queue Request
> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 d8 33 26 d8 00 00 20 00 length 16384 SMID 260 terminated ioc 804b scsi 0 state c xfer(da7: 0
> Apr 25 05:34:51 knew kernel: mps1:0: (da7:mps1:0:17:0): READ(10). CDB: 28 00 e6 6c 42 40 00 00 10 00 length 8192 SMID 484 terminated ioc 804b scsi 0 state c xfer 17:0
> Apr 25 05:34:51 knew kernel: 0): (da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 e4 d8 2a 90 00 00 90 00 length 73728 SMID 548 terminated ioc 804b scsi 0 state c xfeError 5, Periph was invalidated
> Apr 25 05:34:51 knew kernel: r 0
> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 d8 33 26 f8 00 00 20 00
> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 4d ac ed b8 00 00 08 00 length 4096 SMID 435 terminated ioc 804b scsi 0 state c xfer (da7:mps1:0:17:0): CAM status: Unconditionally Re-queue Request
> Apr 25 05:34:51 knew kernel: 0
> Apr 25 05:34:51 knew kernel: (da7:mps1: mps1:0:IOCStatus = 0x4b while resetting device 0xa
> Apr 25 05:34:51 knew kernel: 17:mps1: 0): Unfreezing devq for target ID 17
> Apr 25 05:34:51 knew kernel: Error 5, Periph was invalidated
> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 d8 33 26 d8 00 00 20 00
> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: Unconditionally Re-queue Request

> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): Error 5, Periph was invalidated
> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 e6 6c 42 40 00 00 10 00

> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: Unconditionally Re-queue Request

> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): Error 5, Periph was invalidated
> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 e4 d8 2a 90 00 00 90 00

> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: Unconditionally Re-queue Request

> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): Error 5, Periph was invalidated
> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 4d ac ed b8 00 00 08 00

> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): CAM status: Unconditionally Re-queue Request

> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): Error 5, Periph was invalidated
> Apr 25 05:34:51 knew kernel: GEOM_MIRROR: Device swap: provider da7p2 disconnected.
> Apr 25 05:34:51 knew devd: Executing 'logger -p kern.notice -t ZFS 'vdev is removed, pool_guid=15378250086669402288 vdev_guid=13410079654596185797''
> Apr 25 05:34:51 knew kernel: (da7:mps1:0:17:0): Periph destroyed
> Apr 25 05:34:51 knew ZFS: vdev is removed, pool_guid=15378250086669402288 vdev_guid=13410079654596185797

Current status: after powering off the box, reseating the cables for that drive, I powered up the system and a resilver commenced which completed in 30 minutes.

Seems OK now. I am not sure if the two events are related.

Stephen McConnell via freebsd-scsi

unread,

Apr 25, 2016, 12:39:05 PM4/25/16

to

Recently, a bug was uncovered where a device is gets 'lost'.

Here's what happens:
The first message in your "failure on Monday" log is for
'mpssas_prepare_remove'. This message is likely logged because the FW sends
an event to the driver that the device is no longer responsive (pulled,
cable issue, or something else). When the driver gets this event, it sends
a reset to the device to clear out any pending I/O. This is where the
'terminated ioc' messages come from. When the reset completes, the driver
is supposed to send a SAS_IO_UNIT message to FW so that the DevHandle for
that disk is removed from FW's list. Then, when the device comes back
on-line, everything is fine. But, with this bug, before that SAS_IO_UNIT
message is sent to FW, the driver exits the function where that happens
(mpssas_remove_device). This happens where you see the log message,
"IOCStatus - 0x4b while resetting device 0x0a". The driver logs that
message and then exits. What the driver should do is log that message and
continue on to send the SAS_IO_UNIT message to FW. The fix is to remove the
two lines in the driver shown here with '>>':

if (le16toh(reply->IOCStatus) != MPI2_IOCSTATUS_SUCCESS) {
mps_printf(sc, "IOCStatus = 0x%x while resetting device
0x%x\n",
le16toh(reply->IOCStatus), handle);
>> mpssas_free_tm(sc, tm);
>> return;
}

A reboot will solve the problem, as you saw, but the real fix is to remove
the DevHandle as described above. This fix will go into the driver before
the next scheduled release and then MFC'd to 10.x.

Steve

Scott Long via freebsd-scsi

unread,

Apr 25, 2016, 1:33:08 PM4/25/16

to

Thanks for the diagnosis, Steve. I forgot about that case. We should also make
this chain of events more evident in the syslog, it’s very confusing when it happens.
I’m not exactly sure yet what it should look like.

Scott

Dan Langille

unread,

May 30, 2016, 2:27:55 PM5/30/16

to

Just for the record, this happened again this morning. Fixed by power cycle.

May 30 03:22:08 knew kernel: mps1: mpssas_prepare_remove: Sending reset for target ID 17
May 30 03:22:10 knew kernel: da7 at mps1 bus 0 scbus1 target 17 lun 0
May 30 03:22:10 knew kernel: da7: <ATA TOSHIBA DT01ACA3 ABB0> s/n 13Q8PNBYS detached
May 30 03:22:10 knew kernel: (da7:mps1:0:17:0): READ(10). CDB: 28 00 8c 5c 91 c0 00 00 08 00 length 4096 SMID 179 terminated ioc 804b scsi 0 state c xfer 0
May 30 03:22:10 knew kernel: (da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 6b bf db a0 00 00 f0 00 length 122880 SMID 938 terminated ioc 804b scsi 0 state c xf(da7:mps1:0:17:0): READ(10). CDB: 28 00 8c 5c 91 c0 00 00 08 00
May 30 03:22:10 knew kernel: er 122880

> Steve
>
>>
>> --
>> Dan Langille - BSDCan / PGCon
>> d...@langille.org
>>
>>
>>
>>

Stephen McConnell via freebsd-scsi

unread,

May 31, 2016, 12:18:06 PM5/31/16

to

I just realized that you're using mps, not mpr. The fix went into the mpr
driver, but not mps yet. It'll have to be ported over to mps.

Steve

Dan Langille

unread,

May 31, 2016, 12:20:29 PM5/31/16

to

This hit me again last night. Same drive again. Power cycle cleared it.

Now I'm wondering if it's heat or dud drive related.