[Bug 212914] CAM: SATA drives are getting deleted and then re-added after controller rescan


bugzilla...@freebsd.org

Sep 22, 2016, 11:23:00 PM
to freebs...@freebsd.org
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=212914

Bug ID: 212914
Summary: CAM: SATA drives are getting deleted and then re-added after controller rescan
Product: Base System
Version: 11.0-RC1
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: freebs...@FreeBSD.org
Reporter: kashya...@broadcom.com

This issue also occurs with LSI/Broadcom IT HBAs (mpr driver), so it appears to
be a common issue.

If you have a SATA drive and simply run "camcontrol rescan all", the SATA disk
is removed and then added back by the CAM layer.


On FreeBSD 11.0-RC1, we are facing an issue where SATA drives connected behind
LSI's MegaRAID controller are deleted and added back after a controller
reset.
I am using Broadcom/Avago/LSI's MegaRAID Invader controller (device ID
0x005d). Note that this behavior is not observed with SAS drives on
FreeBSD 11.0-RC1.
On FreeBSD 10.3, this behavior is not observed with SATA drives either.
We are debugging the issue, but any quick input or pointers would be very
helpful.

Detailed information follows:

OS: FreeBSD 11.0 RC1
Controller: LSI's MegaRAID invader controller

Connected devices list:

root@freeBSD11:~ # camcontrol devlist
<ST500NM0011 PA09> at scbus5 target 0 lun 0 (pass0,ada0)
<AHCI SGPIO Enclosure 1.00 0001> at scbus6 target 0 lun 0 (ses0,pass1)
<ATA ST9250610NS SN01> at scbus8 target 51 lun 0 (da9,pass11)
    ^-- this is the SATA drive that is deleted and re-added after controller reset
<SEAGATE ST9300605SS 0004> at scbus8 target 163 lun 0 (da8,pass10)
<LSI Default 5.00> at scbus9 target 0 lun 0 (da6,pass8)
<LSI Default 5.00> at scbus9 target 1 lun 0 (da2,pass4)
<LSI Default 5.00> at scbus9 target 2 lun 0 (da0,pass2)
<LSI Default 5.00> at scbus9 target 3 lun 0 (da7,pass9)
<LSI Default 5.00> at scbus9 target 4 lun 0 (da3,pass5)
<LSI Default 5.00> at scbus9 target 5 lun 0 (da1,pass3)
<SEAGATE ST600MP0005 VS09> at scbus10 target 48 lun 0 (da4,pass6)
<SEAGATE ST600MP0005 VS09> at scbus10 target 54 lun 0 (da5,pass7)


Relevant dmesg log snippet (da9 is the SATA drive that is deleted and added
back):

================================
mrsas0: Initiaiting OCR because of FW fault!
mrsas0: Waiting for FW to come to ready state
mrsas0: Jbod map is supported
mrsas0: Reset successful
da9 at mrsas0 bus 1 scbus8 target 51 lun 0
da9: <ATA ST9250610NS SN01> s/n 9XE02AR2 detached
(da9:mrsas0:1:51:0): Periph destroyed
(da9:mrsas0:1:51:0): UNMAPPED
(da9:mrsas0:1:51:0): fatal error, could not acquire reference count
g_access(918): provider da9 has error
g_access(918): provider da9 has error
g_access(918): provider da9 has error
(da9:mrsas0:1:51:0): UNMAPPED
da9 at mrsas0 bus 1 scbus8 target 51 lun 0
da9: <ATA ST9250610NS SN01> Fixed Direct Access SPC-4 SCSI device
da9: Serial Number 9XE02AR2
da9: 150.000MB/s transfers
da9: 238475MB (488397168 512 byte sectors)
================================
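
For anyone trying to spot this pattern in a longer log, a minimal sketch of a filter is below. It assumes the message shapes seen in the snippet above ("<dev>: <model> s/n <serial> detached" followed later by "<dev> at <sim> bus ... lun ..."); the function name and regular expressions are illustrative and not part of any FreeBSD tool.

```python
import re

# Patterns modelled on the dmesg lines quoted in this report (assumed shapes).
DETACHED = re.compile(
    r"^(?P<dev>[a-z]+\d+): <(?P<model>[^>]+)> s/n (?P<serial>\S+) detached$"
)
ATTACHED = re.compile(
    r"^(?P<dev>[a-z]+\d+) at (?P<sim>\S+) bus \d+ scbus\d+ target \d+ lun \d+$"
)


def find_reattached(lines):
    """Return (device, serial) pairs that detached and later attached again."""
    detached = {}    # device name -> serial number seen in the detach message
    reattached = []
    for raw in lines:
        line = raw.strip()
        m = DETACHED.match(line)
        if m:
            detached[m["dev"]] = m["serial"]
            continue
        m = ATTACHED.match(line)
        # Only count an "at" line that follows a detach of the same device.
        if m and m["dev"] in detached:
            reattached.append((m["dev"], detached.pop(m["dev"])))
    return reattached
```

Feeding it the lines of the snippet above (for example via `find_reattached(text.splitlines())`) should flag da9 with its serial number.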

--
You are receiving this mail because:
You are the assignee for the bug.
_______________________________________________
freebs...@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs...@freebsd.org"

bugzilla...@freebsd.org

Oct 12, 2016, 10:04:11 AM
to freebs...@freebsd.org
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=212914

--- Comment #1 from Kashyap <kashya...@broadcom.com> ---

Any update or pointers on this? The issue happens only with SATA drives
attached via the CAM layer.

Do we need to address this in the driver, or will there be a fix in the CAM
layer?

bugzilla...@freebsd.org

Oct 12, 2016, 2:10:37 PM
to freebs...@freebsd.org
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=212914

--- Comment #2 from Kenneth D. Merry <k...@FreeBSD.org> ---
The device may be going away for several possible reasons:

1. A CCB is returned with the status CAM_DEV_NOT_THERE or CAM_SEL_TIMEOUT.
2. A CCB is returned with the SCSI ASC/ASCQ 0x25,0x00, Logical Unit Not
Supported.
3. Someone is doing an xpt_async(AC_LOST_DEVICE, ...)
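
As a rough illustration only, the three conditions above can be written as a small decision helper. This is a sketch in Python, not CAM code: the status and event names are passed as plain strings, and the function name `removal_reason` is hypothetical.

```python
# CCB statuses that, per the list above, make the device go away.
REMOVAL_STATUSES = {"CAM_DEV_NOT_THERE", "CAM_SEL_TIMEOUT"}

# Sense code named in the list above: Logical Unit Not Supported.
LUN_NOT_SUPPORTED = (0x25, 0x00)


def removal_reason(cam_status, asc=None, ascq=None, async_event=None):
    """Return a short explanation if the inputs match one of the three
    removal conditions, or None if none of them applies."""
    if cam_status in REMOVAL_STATUSES:
        return f"CCB status {cam_status}"
    if (asc, ascq) == LUN_NOT_SUPPORTED:
        return "sense ASC/ASCQ 0x25,0x00 (Logical Unit Not Supported)"
    if async_event == "AC_LOST_DEVICE":
        return "xpt_async(AC_LOST_DEVICE) was issued"
    return None
```

For example, `removal_reason("CAM_SEL_TIMEOUT")` reports the CCB status, while a completed request with no matching sense data and no async event returns None.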


A device may go away and come back as a result of a rescan if any of the
following changes:

SCSI Standard Inquiry Data (Full inquiry data, including Vendor, Product,
Revision)
SCSI page 0x80 serial number

The first thing I would look at here is what status is getting returned from
the drive in question after a reset. If that all looks good, look at whether
someone is issuing a rescan, and whether the device is returning inconsistent
results. Those inconsistent results could be buried in the part of the Inquiry
data that isn't displayed. Standard inquiry data is checksummed along with the
serial number and any change in the checksum will make the device go away and
come back.
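
To make the checksum point concrete, here is a minimal sketch of why a change in an undisplayed part of the inquiry data would look like a different device. Assumptions: MD5 is used purely as a stand-in (this comment does not say which checksum CAM uses), the inquiry buffer is a simplified 36-byte standard INQUIRY layout plus optional trailing bytes, and the helper names are hypothetical.

```python
import hashlib


def build_inquiry(vendor, product, revision, extra=b""):
    """Simplified standard INQUIRY buffer: 8 header bytes (zeroed here),
    vendor (8 bytes), product (16 bytes), revision (4 bytes), then any
    trailing bytes that a device listing would not normally display."""
    return (
        bytes(8)
        + vendor.encode("ascii").ljust(8)[:8]
        + product.encode("ascii").ljust(16)[:16]
        + revision.encode("ascii").ljust(4)[:4]
        + extra
    )


def identity_digest(inquiry, serial):
    """Checksum over the full inquiry data followed by the serial number.
    MD5 is an assumption made for illustration."""
    h = hashlib.md5()
    h.update(inquiry)
    h.update(serial)
    return h.digest()


def device_changed(before, after):
    """A rescan would treat the device as new if the digests differ."""
    return before != after
```

With the strings from this report, two buffers built from "ATA", "ST9250610NS", "SN01" and serial 9XE02AR2 compare equal, but flipping a single trailing byte in `extra` makes `device_changed` return True even though vendor, product, and revision are identical.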

Although this is a SATA drive, obviously the only thing that matters is the
SCSI response, because CAM is communicating with it as if it is a SCSI protocol
device.