
Problems with scsi_scan.c


Poul Petersen

Jun 18, 2001, 9:07:47 PM
I have a Dell 2300 running Red Hat 7.1 with the 2.4.3 kernel. We are
using a QLogic QLA2200 FC card to connect to a Zzyzx RocketStor RAID
array. The Zzyzx array has two FC ports, which appear as two targets on
the SAN, and it can assign raid sets to arbitrary LUN numbers on either
port. The problem we observed is that the Linux host could not see any
of the raid sets on a given port unless the raid sets were assigned to
sequential LUNs starting at 0. Furthermore, if there was a gap in the
LUNs, the Linux host could not see any of the raid sets past the gap.
For example, if we deleted the raid set assigned to LUN 3, all of the
raid sets with a LUN higher than 3 would "disappear" from the Linux
host. I modified the blacklist in scsi_scan.c and added the following
entry:

{"Zzyzx", "RocketStor 500S", "*", BLIST_SPARSELUN}

After this change, the host could see non-sequential LUNs, provided
that a raid set was also mapped to LUN 0. A bit of investigation
revealed that, when scanning LUN 0, scan_scsis_single was returning at
an error test (near line 532) before it reached the call to
get_device_flags. Consequently, if no device is mapped to LUN 0, the
BLIST_SPARSELUN flag never takes effect and the remaining LUNs do not
get scanned. At a loss for a better solution, I added a new flag,
BLIST_FORCESCAN, to ignore the error tests, and I moved the call to
get_device_flags before the error tests. This works great, but is
perhaps inelegant. Other ideas? Attached is a diff:

Thanks,

-poul

Poul E.J. Petersen
Rogue Wave Software

--- /archive/Sys/Kernels/linux-2.4.3/drivers/scsi/scsi_scan.c Sun Feb 4 10:05:30 2001
+++ ./scsi_scan.c Mon Jun 18 18:03:16 2001
@@ -38,6 +38,7 @@
#define BLIST_MAX5LUN 0x080
#define BLIST_ISDISK 0x100
#define BLIST_ISROM 0x200
+#define BLIST_FORCESCAN 0x400 // Forces a scan even if there are errors

static void print_inquiry(unsigned char *data);
static int scan_scsis_single(int channel, int dev, int lun, int *max_scsi_dev,
@@ -146,6 +147,7 @@
{"SONY", "TSL", "*", BLIST_FORCELUN}, // DDS3 & DDS4 autoloaders
{"DELL", "PERCRAID", "*", BLIST_FORCELUN},
{"HP", "NetRAID-4M", "*", BLIST_FORCELUN},
+ {"Zzyzx", "RocketStor 500S", "*", BLIST_FORCESCAN | BLIST_SPARSELUN},

/*
* Must be at end of list...
@@ -523,7 +525,13 @@
SCSI_LOG_SCAN_BUS(3, printk("scsi: INQUIRY %s with code 0x%x\n",
SRpnt->sr_result ? "failed" : "successful",
SRpnt->sr_result));

- if (SRpnt->sr_result) {
+ /*
+ * Get any flags for this device.
+ */
+
+ bflags = get_device_flags (scsi_result);
+
+ if (SRpnt->sr_result && ! (bflags & BLIST_FORCESCAN)) {
scsi_release_request(SRpnt);
return 0; /* assume no peripheral if any sort of error */
}
@@ -532,16 +540,10 @@
* Check the peripheral qualifier field - this tells us whether LUNS
* are supported here or not.
*/
- if ((scsi_result[0] >> 5) == 3) {
+ if ((scsi_result[0] >> 5) == 3 && ! (bflags & BLIST_FORCESCAN)) {
scsi_release_request(SRpnt);
return 0; /* assume no peripheral if any sort of error */
}
-
- /*
- * Get any flags for this device.
- */
- bflags = get_device_flags (scsi_result);
-

/* The Toshiba ROM was "gender-changed" here as an inline hack.
This is now much more generic.

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majo...@vger.kernel.org

Matt_...@dell.com

Jun 19, 2001, 9:08:45 AM
> I have a Dell 2300 running RedHat 7.1 with the 2.4.3 kernel. We are
> using a qlogic QLA2200 FC card to connect to a Zzyzx RocketStor Raid
> array. The Zzyzx array has two FC ports which appear as two targets on
> the SAN and can assign raid sets to arbitrary Lun numbers on either
> port. The problem we observed is that the Linux host could not see any
> of the raid sets on a given port unless the raid sets were assigned to
> sequential Luns starting at 0. Furthermore, if there was a gap in the
> Luns, the Linux host could not see any of the raid sets past the gap.
> For example, if we deleted the raid set assigned to Lun 3, all of the
> raid sets with a Lun higher than 3 would "disappear" from the Linux
> host. I modified the blacklist in scsi_scan.c and added the following
> entry:
>
> {"Zzyzx", "RocketStor 500S", "*", BLIST_SPARSELUN}

/*
 * Check the peripheral qualifier field - this tells us whether LUNS
 * are supported here or not.
 */
if ((scsi_result[0] >> 5) == 3) {
    scsi_release_request(SRpnt);
    return 0; /* assume no peripheral if any sort of error */
}

Even if LUN 0 is masked off from you, the target must still present a
device at LUN 0 (per the SCSI spec). On the Dell PowerVault storage
arrays I've used, the peripheral qualifier tested here is either 001b
("The target is capable of supporting the specified peripheral device
type on this logical unit, however the physical device is not currently
connected to this logical unit", the case where you can't access LUN 0
except for INQUIRY) or 000b (you can access this LUN). If your device is
returning 011b, the spec says "the target is not capable of supporting a
physical device on this logical unit", and this test properly forces a
return.
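The peripheral qualifier is the top three bits of byte 0 of the standard INQUIRY data, and the low five bits are the peripheral device type; that is what `(scsi_result[0] >> 5) == 3` tests. A small sketch of the decoding follows; the helper names are hypothetical and the strings only paraphrase SPC-2.

```c
#include <assert.h>

/* Byte 0 of standard INQUIRY data: bits 7-5 peripheral qualifier (PQ),
 * bits 4-0 peripheral device type. */
static unsigned inquiry_pq(unsigned char byte0)    { return (byte0 >> 5) & 0x7; }
static unsigned inquiry_ptype(unsigned char byte0) { return byte0 & 0x1f; }

/* Rough meaning of each PQ value, paraphrasing SPC-2. */
static const char *pq_meaning(unsigned pq)
{
    assert(pq <= 7);
    switch (pq) {
    case 0:  return "000b: device connected to this LUN";
    case 1:  return "001b: device type supported, but not currently connected";
    case 2:  return "010b: reserved";
    case 3:  return "011b: no device supported on this LUN (type must be 1Fh)";
    default: return "1xxb: vendor specific";
    }
}

/* Mirrors the 2.4 test that makes scan_scsis_single() give up on a LUN. */
static int pq_means_no_lun(unsigned char byte0)
{
    return inquiry_pq(byte0) == 3;
}
```

A reply of PQ 3 with "unknown type 31", as the RocketStor's LUN 0 reports later in this thread, corresponds to byte 0 = 0x7F, so `pq_means_no_lun(0x7f)` is true. A PowerVault LUN reporting 001b with the direct-access type gives byte 0 = 0x20 and passes the test.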


I've found that you must add two lines to the blacklist, one for the
case where you can see a disk at LUN 0 and one for the case where you
can't. For example:

{"DGC", "RAID", "*", BLIST_SPARSELUN}, // Dell PV 650F (tgt @ LUN 0)
{"DGC", "DISK", "*", BLIST_SPARSELUN}, // Dell PV 650F (no tgt @ LUN 0)


This way, the device at LUN 0 is found to start the scan, and then later
we simply mark the LUN offline if we can't access it for real:

/* Use the peripheral qualifier field to determine online/offline */
if (((scsi_result[0] >> 5) & 7) == 1) SDpnt->online = FALSE;

This has worked on all Dell PowerVault SANs. If the RocketStor 500S is
returning 011b rather than 001b there, then I believe that's a bug in their
SCSI implementation. Can you add a check to see what's actually in the
peripheral qualifier field when there's no LUN available?
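The blacklist flags come from matching the vendor and product strings of the INQUIRY reply against the table, which, judging from the comments on the two DGC entries above, is why the PV 650F needs one entry for each product string it may report at LUN 0. The following is a simplified sketch of that lookup, not the kernel's get_device_flags. It assumes the standard INQUIRY layout (vendor in bytes 8-15, product in bytes 16-31, space padded), compares entries as prefixes, ignores the revision field (every entry here uses "*"), and takes BLIST_SPARSELUN as 0x040, its value in the 2.4 source. `lookup_flags` and `fill_inquiry` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLIST_SPARSELUN 0x040  /* value in 2.4 scsi_scan.c (assumed) */

struct blist_entry {
    const char *vendor;    /* prefix of INQUIRY vendor, bytes 8-15 */
    const char *model;     /* prefix of INQUIRY product, bytes 16-31 */
    const char *revision;  /* "*" (any) in every entry used here */
    unsigned flags;
};

static const struct blist_entry blist[] = {
    {"DGC", "RAID", "*", BLIST_SPARSELUN}, /* Dell PV 650F (tgt @ LUN 0) */
    {"DGC", "DISK", "*", BLIST_SPARSELUN}, /* Dell PV 650F (no tgt @ LUN 0) */
    {NULL, NULL, NULL, 0}                  /* end of list */
};

/* Build a minimal 36-byte INQUIRY reply with space-padded strings. */
static void fill_inquiry(unsigned char inq[36], const char *vendor,
                         const char *model)
{
    size_t vlen = strlen(vendor), mlen = strlen(model);

    memset(inq, 0, 36);
    memset(inq + 8, ' ', 28);  /* vendor, product and revision fields */
    memcpy(inq + 8, vendor, vlen > 8 ? 8 : vlen);
    memcpy(inq + 16, model, mlen > 16 ? 16 : mlen);
}

/* Return the flags of the first entry whose vendor and model match. */
static unsigned lookup_flags(const unsigned char inq[36])
{
    for (const struct blist_entry *e = blist; e->vendor != NULL; e++) {
        assert(strlen(e->vendor) <= 8 && strlen(e->model) <= 16);
        if (memcmp(inq + 8, e->vendor, strlen(e->vendor)) == 0 &&
            memcmp(inq + 16, e->model, strlen(e->model)) == 0)
            return e->flags;
    }
    return 0;
}
```

A LUN 0 reply with product "DISK" and one with product "RAID" both pick up BLIST_SPARSELUN here; dropping either entry would leave that case with flags 0 and the scan would stop at the first empty LUN.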

Thanks,
Matt

--
Matt Domsch
Sr. Software Engineer
Dell Linux Solutions
www.dell.com/linux

Poul Petersen

Jun 19, 2001, 1:17:33 PM
> This has worked on all Dell PowerVault SANs. If the RocketStor 500S is
> returning 011b rather than 001b there, then I believe that's a bug in
> their SCSI implementation. Can you add a check to see what's actually
> in the peripheral qualifier field when there's no LUN available?
>
> Thanks,
> Matt
>
> --
> Matt Domsch
> Sr. Software Engineer
> Dell Linux Solutions
> www.dell.com/linux

Indeed, that appears to be the problem. This is what I get when I map a
raid set only to LUN 2 (this is my slightly hacked version of
scsi_scan, so it ignores the exit on the peripheral qualifier check):

Scanning dev=2 lun=0 (sparse=0)
bflags=1088
peripheral qualifier=3 *DOH*
scsi: unknown type 31
Vendor: Zzyzx Model: RocketStor 500S Rev: 3100
Type: Unknown ANSI SCSI revision: 04
Scanning dev=2 lun=1 (sparse=1)
bflags=1088
peripheral qualifier=3
scsi: unknown type 31
Vendor: Zzyzx Model: RocketStor 500S Rev: 3100
Type: Unknown ANSI SCSI revision: 04
Scanning dev=2 lun=2 (sparse=1)
bflags=1088
peripheral qualifier=0
Vendor: Zzyzx Model: RocketStor 500S Rev: 3100
Type: Direct-Access ANSI SCSI revision: 04
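The `bflags=1088` lines can be decoded by hand: 1088 is 0x440, i.e. BLIST_FORCESCAN (0x400, from the patch) plus 0x040, which is BLIST_SPARSELUN's value in the 2.4 scsi_scan.c. A small sketch that names the set bits; `decode_bflags` is a hypothetical helper, and only the flags from 0x040 upward are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BLIST_SPARSELUN 0x040  /* 2.4 value (assumed) */
#define BLIST_MAX5LUN   0x080
#define BLIST_ISDISK    0x100
#define BLIST_ISROM     0x200
#define BLIST_FORCESCAN 0x400  /* added by the patch in this thread */

static const struct { unsigned bit; const char *name; } bflag_names[] = {
    {BLIST_SPARSELUN, "BLIST_SPARSELUN"},
    {BLIST_MAX5LUN,   "BLIST_MAX5LUN"},
    {BLIST_ISDISK,    "BLIST_ISDISK"},
    {BLIST_ISROM,     "BLIST_ISROM"},
    {BLIST_FORCESCAN, "BLIST_FORCESCAN"},
};

/* Write the names of the set bits into buf, separated by '|'.
 * Bits not in the table are ignored. Returns the length written. */
static size_t decode_bflags(unsigned bflags, char *buf, size_t len)
{
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof bflag_names / sizeof bflag_names[0]; i++) {
        if (!(bflags & bflag_names[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", bflag_names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;  /* buf is too small; stop with what fits */
        used += (size_t)n;
    }
    return used;
}
```

For 1088 this yields "BLIST_SPARSELUN|BLIST_FORCESCAN", confirming that both flags from the patched blacklist entry were picked up even while LUN 0 reported PQ 3.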

Thanks for clearing that up - I knew it was exiting at the peripheral
qualifier check, but I did not know enough about the SCSI protocol to
know why, or whether this was a bug in the RocketStor, etc.

I suppose I'll take this up with Zzyzx now :)

-poul

Poul E.J. Petersen
Rogue Wave Software

Poul Petersen

Jun 19, 2001, 2:12:00 PM
> This way, the device at LUN 0 is found to start the scan, and then later

On a more general question: to work around the Zzyzx problem, I have
moved the call to get_device_flags ahead of the error tests, as in my
first patch, and added the following (I also removed all traces of
BLIST_FORCESCAN):

bflags = get_device_flags (scsi_result);

if (bflags & BLIST_SPARSELUN) {*sparse_lun = 1;}

if (SRpnt->sr_result) {
    scsi_release_request(SRpnt);
    return 0; /* assume no peripheral if any sort of error */
}

...

This way, even though the peripheral qualifier check exits, the
sparse_lun parameter gets set first so scanning can continue on the
Zzyzx device.

BTW, what peripheral qualifier should be returned when scanning LUN 0
of a multi-LUN-capable device that has no devices, for example if the
RocketStor had no LUN mappings? If it should return 011b in this case,
then the above modification would scan all the LUNs, while the original
code would not. If it still returns 001b, then there won't be any
difference...


Thanks again, your help is very much appreciated.

Patrick Mansfield

Jun 20, 2001, 2:35:34 PM
On Tue, Jun 19, 2001 at 11:08:40AM -0700, Poul Petersen wrote:
> > This way, the device at LUN 0 is found to start the scan, and
> > then later
>
> On a more general question - in order to work around the
> Zzyzx problem, I have moved the call to get_device_flags
> as before and added (I also removed all traces of BLIST_FORCESCAN)
>
> bflags = get_device_flags (scsi_result);
>
> if (bflags & BLIST_SPARSELUN) {*sparse_lun = 1;}
>
> if (SRpnt->sr_result) {
> scsi_release_request(SRpnt);
> return 0; /* assume no peripheral if any sort of error */
> }
>
> ...

The above is a good idea, but the get/check of the flags should
happen after the sr_result check.

>
> This way, even though the peripheral qualifier check exits, the
> sparse_lun parameter gets set first so scanning can continue on the
> Zzyzx device.
>
> BTW, what should the peripheral qualifier be returned as
> when scanning LUN 0 of a multiple LUN capable device which has no devices;
> for example if the RocketStor had no LUN mappings? If it should return
> 011b in this case, then the above modification would scan all the LUNs,
> while the original code would not. If it still returns 001b, then there
> won't be any difference...

A peripheral qualifier (PQ) of 011b should not halt the scan for a sparse
LUN device. Without your change above, the current behaviour is to halt
the whole scan if LUN 0 returns 011b, whereas for sparse LUN devices,
once LUN 0 is "found", other LUNs returning 011b do not halt the scan.

I think the SCSI spec is a little vague on what a PQ of 011b means. For
PQ values of 001b and 011b it says (SPC-2 R19,
ftp://ftp.t10.org/t10/drafts/spc2/spc2r19.pdf):

001b

The device server is capable of supporting the specified peripheral device
type on this logical unit. However, the physical device is not currently
connected to this logical unit.

011b

The device server is not capable of supporting a physical device on this
logical unit. For this peripheral qualifier the peripheral device type
shall be set to 1Fh to provide compatibility with previous versions of
SCSI. All other peripheral device type values are reserved for this
peripheral qualifier.

"not capable" seems somewhat ambiguous for cases where the device does not
currently have a LUN mapped - the device is currently "not capable" of
handling the LUN; 001b seems more appropriate, even though the device
will not ever be "connected".

For Linux, returning 001b can be bad: it will allocate a Scsi_Device
and use up an sd entry, even though you can never access the device.
Doug Ledford posted a patch to skip these (apparently it is included in
the Red Hat 7.1 2.4.2-2 kernel).

You might want to check the specs, and see if you can modify the PQ value
returned - some array devices can be configured to return different PQ
values, since different operating systems behave differently based on
the PQ.

-- Patrick Mansfield

Poul Petersen

Jun 21, 2001, 1:53:16 PM
> The above is a good idea, but the get/check of the flags should
> happen after the sr_result.

Good point.

> A peripheral qualifier (PQ) of 011b should not halt the scan for a
> sparse LUN device - current behaviour without your change above is to
> halt all scans if LUN 0 has 011b, but for sparse LUN devices, if LUN 0
> is "found", other LUNs with 011b would not halt the scan.

...


> For linux, returning a 001b can be bad, as it will allocate a
> Scsi_Device, and use up an sd entry (and you can never access the
> device). Doug Ledford posted a patch to skip these (apparently it is
> included in the redhat 7.1 2.4.2-2 kernel).

Indeed. How do we sort this out, then? As it stands, scsi_scan
halts even on a known sparse_lun device if LUN 0 returns 011b. This is
undesirable because a sparse LUN device need not have a device at LUN 0.
If instead the device returns 001b on LUN 0 even when there is no device
present at LUN 0, then the scan will work properly for the remaining LUNs
but we may get an invalid SCSI device at LUN 0.

It would seem that the call to get_device_flags and the setting of
*sparse_lun should occur before the peripheral qualifier test. Then it
would be irrelevant whether the sparse LUN device returned 001b or 011b
on LUN 0. In fact, returning 011b in this circumstance works perfectly and
returning 001b works just as before.

BTW, I've heard that IRIX has the same problem with Zzyzx: it
recognizes the sparse LUNs, but only if a device is present at LUN 0.
Certainly we don't want to be like SGI? :)

Thanks for your thoughts,

-poul
