
deterministic scsi order with async scan


da...@lang.hm

Jul 15, 2009, 9:09:31 PM
to linux-kernel, linux...@vger.kernel.org
is there any way to get deterministic device ordering with scsi async
scanning?

currently (2.6.30) it seems that the various scsi busses are loaded in the
order that they are detected, which can vary from boot to boot depending
on how long it takes for the card to initialize.

would it be possible to detect the cards/drives, but not register them
until all the detection is complete so that they can be registered in a
deterministic order?

having two drives on two different controllers swap positions from boot to
boot makes it very painful. yes I can make an initrd that fixes this up in
user space by examining each drive and creating links to re-order them,
but this is a lot of work to fix randomization that can be prevented in
the first place.

David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Boaz Harrosh

Jul 16, 2009, 7:33:46 AM
to da...@lang.hm, linux-kernel, linux...@vger.kernel.org
On 07/16/2009 04:09 AM, da...@lang.hm wrote:
> is there any way to get deterministic device ordering with scsi async
> scanning?
>
> currently (2.6.30) it seems that the various scsi busses are loaded in the
> order that they are detected, which can vary from boot to boot depending
> on how long it takes for the card to initialize.
>
> would it be possible to detect the cards/drives, but not register them
> until all the detection is complete so that they can be registered in a
> deterministic order?
>
> having two drives on two different controllers swap positions from boot to
> boot makes it very painful. yes I can make an initrd that fixes this up in
> user space by examining each drive and creating links to re-order them,
> but this is a lot of work to fix randomization that can be prevented in
> the first place.
>
> David Lang

It is highly discouraged to set up any kind of system that depends
on device names for block devices. Mounts have mount by-label
or mount by-uuid. Any other subsystem should go by the /dev/disk/by-id/*
symlinks to find a persistent raw block device. The id is generated
from characteristics inside the disk itself, so it will be the same
no matter what host connection or bus it is connected to (almost).

This is because even if the boot order is consistent, the device name
is so volatile over the life-span of a system. Did I boot with a removable
USB device inserted? Was that camera or printer on or off? Was a disk connected
to the other port? Any such change will break things and give you a very
poor user experience.

I would say that "scsi async" is a great blessing.

Boaz
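
For readers who have not looked at the persistent links Boaz describes, here is a minimal Python sketch that lists them and shows which kernel name each one currently points at. The function name `resolve_persistent_links` is made up for this illustration, and the sketch assumes a udev-managed /dev/disk/by-id directory exists on the machine; it is not part of udev itself.

```python
import os


def resolve_persistent_links(by_id_dir="/dev/disk/by-id"):
    """Map each persistent symlink name to the device node it points at.

    Returns an empty dict if the directory does not exist, for example
    on a system that does not run udev.
    """
    mapping = {}
    if not os.path.isdir(by_id_dir):
        return mapping
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            # realpath follows the relative link (e.g. ../../sdb) to the node
            mapping[name] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    for name, node in resolve_persistent_links().items():
        print(f"{name} -> {node}")
```

The name on the left stays the same across boots, while the /dev/sdX node on the right may differ, which is why configuration should refer to the link rather than to the node.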

Matthew Wilcox

Jul 16, 2009, 7:57:18 AM
to da...@lang.hm, linux-kernel, linux...@vger.kernel.org
On Wed, Jul 15, 2009 at 06:09:22PM -0700, da...@lang.hm wrote:
> is there any way to get deterministic device ordering with scsi async
> scanning?
>
> currently (2.6.30) it seems that the various scsi busses are loaded in
> the order that they are detected, which can vary from boot to boot
> depending on how long it takes for the card to initialize.

I think you're confused. The async scsi scanning was designed to _not_
move devices around randomly. There are other asynchronous schemes in
the kernel, some of which were not designed with the same care.

The SCSI async scan can't do much about it if the busses are detected
in a different order.

> would it be possible to detect the cards/drives, but not register them
> until all the detection is complete so that they can be registered in a
> deterministic order?

That's exactly how the scsi async scanning works.

> having two drives on two different controllers swap positions from boot
> to boot makes it very painful. yes I can make an initrd that fixes this
> up in user space by examining each drive and creating links to re-order
> them, but this is a lot of work to fix randomization that can be
> prevented in the first place.
>
> David Lang

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
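
The scheme discussed above (scan in parallel, but hold back registration until every scan has finished, then register in a fixed order) can be illustrated with a toy Python model. This is only a sketch of the idea and not kernel code: `probe`, `scan_then_register`, the random delays, the two-disks-per-controller layout and the PCI-style addresses are all invented for the example, and the simple letter naming only covers 26 disks.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def probe(controller):
    """Pretend to initialise one controller; the delay differs on every run."""
    time.sleep(random.uniform(0.0, 0.05))
    return controller, [f"{controller}-disk{i}" for i in range(2)]


def scan_then_register(controllers):
    """Probe all controllers concurrently, then assign names in a fixed order."""
    with ThreadPoolExecutor(max_workers=len(controllers)) as pool:
        # All probes run in parallel; nothing is named until all have finished.
        found = dict(pool.map(probe, controllers))
    names = {}
    index = 0
    for controller in sorted(found):  # fixed order: sort by bus address
        for disk in found[controller]:
            names[f"sd{chr(ord('a') + index)}"] = disk
            index += 1
    return names


if __name__ == "__main__":
    print(scan_then_register(["0000:05:00.0", "0000:03:00.0"]))
```

However the delays fall, the controller with the lower address gets the first names. As the thread goes on to point out, this only helps if the controllers themselves are found in the same order each time.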

da...@lang.hm

Jul 16, 2009, 1:22:50 PM
to Boaz Harrosh, linux-kernel, linux...@vger.kernel.org

for a laptop you are probably correct, but for a server or embedded system
that doesn't have its hardware changing all the time you are not correct.

especially on a system with lots of drives, why should I have to create an
initrd that goes and searches dozens or hundreds of drives to find out
which one to boot from?

I am building a system that will have two drives in a hardware mirror on
one SCSI card, and 160 drives on a FC (SCSI) card. why should my boot have
to go and examine all 162 drives to look for an ID on every partition just
to find the boot drive?

> I would say that "scsi async" is a great blessing

it's great for startup time, but doing the async detection doesn't
_require_ doing async registration.

David Lang

da...@lang.hm

Jul 16, 2009, 1:23:54 PM
to Matthew Wilcox, linux-kernel, linux...@vger.kernel.org
On Thu, 16 Jul 2009, Matthew Wilcox wrote:

> On Wed, Jul 15, 2009 at 06:09:22PM -0700, da...@lang.hm wrote:
>> is there any way to get deterministic device ordering with scsi async
>> scanning?
>>
>> currently (2.6.30) it seems that the various scsi busses are loaded in
>> the order that they are detected, which can vary from boot to boot
>> depending on how long it takes for the card to initialize.
>
> I think you're confused. The async scsi scanning was designed to _not_
> move devices around randomly. There are other asynchronous schemes in
> the kernel, some of which were not designed with the same care.
>
> The SCSI async scan can't do much about it if the busses are detected
> in a different order.
>
>> would it be possible to detect the cards/drives, but not register them
>> until all the detection is complete so that they can be registered in a
>> deterministic order?
>
> That's exactly how the scsi async scanning works.

hmm, in that case how can I troubleshoot why this system is detecting the
two different PCI-E cards in different orders on different boots?

David Lang

>> having two drives on two different controllers swap positions from boot
>> to boot makes it very painful. yes I can make an initrd that fixes this
>> up in user space by examining each drive and creating links to re-order
>> them, but this is a lot of work to fix randomization that can be
>> prevented in the first place.
>>
>> David Lang

Matthew Wilcox

Jul 16, 2009, 2:15:34 PM
to da...@lang.hm, linux-kernel, linux...@vger.kernel.org
On Thu, Jul 16, 2009 at 10:23:45AM -0700, da...@lang.hm wrote:
> hmm, in that case how can I troubleshoot why this system is detecting the
> two different PCI-E cards in different orders on different boots.

I don't know. Are the cards actually being detected in a different
order, or are the Linux drivers being bound to them in a different order?
Are you using modules or are these drivers built-in?

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

da...@lang.hm

Jul 16, 2009, 2:31:36 PM
to Matthew Wilcox, linux-kernel, linux...@vger.kernel.org
On Thu, 16 Jul 2009, Matthew Wilcox wrote:

> On Thu, Jul 16, 2009 at 10:23:45AM -0700, da...@lang.hm wrote:
>> hmm, in that case how can I troubleshoot why this system is detecting the
>> two different PCI-E cards in different orders on different boots.
>
> I don't know. Are the cards actually being detected in a different
> order, or are the Linux drivers being bound to them in a different order?

I don't know, the end result is that which device is scsi1 and which is
scsi2 is sometimes different when the machine boots. unfortunately when
they show up in the wrong order the system can't find its boot drive.

I guess that the fact that the bios is finding lilo and lilo is finding
the kernel to try to boot is probably indicating that the hardware is
being detected in the same order.

> Are you using modules or are these drivers built-in?

built-in
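
One way to gather evidence for the question above is to record, on each boot, which PCI device every SCSI host number ended up on. The following Python sketch does that by resolving the entries under /sys/class/scsi_host. It assumes a sysfs layout in which those entries are symlinks into /sys/devices and the resolved path contains the PCI address of the card; the helper names `pci_address_of` and `scsi_hosts` are invented for this example.

```python
import os
import re

# PCI addresses look like 0000:03:00.0 (domain:bus:device.function)
PCI_ADDRESS = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def pci_address_of(sysfs_path):
    """Return the last PCI address found in a resolved sysfs path, or None.

    The last match is the card itself; earlier matches are bridges above it.
    """
    matches = PCI_ADDRESS.findall(sysfs_path)
    return matches[-1] if matches else None


def scsi_hosts(class_dir="/sys/class/scsi_host"):
    """Map each hostN entry to the PCI address of the card it belongs to."""
    hosts = {}
    if not os.path.isdir(class_dir):
        return hosts
    for name in sorted(os.listdir(class_dir)):
        resolved = os.path.realpath(os.path.join(class_dir, name))
        hosts[name] = pci_address_of(resolved)
    return hosts


if __name__ == "__main__":
    for host, address in scsi_hosts().items():
        print(host, address)
```

Comparing the output from a good boot and a bad boot shows whether the host numbers moved between the two cards, or whether the hosts stayed put and only the disk names changed.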

James Smart

Jul 16, 2009, 2:32:53 PM
to da...@lang.hm, Boaz Harrosh, linux-kernel, linux...@vger.kernel.org

da...@lang.hm wrote:
> On Thu, 16 Jul 2009, Boaz Harrosh wrote:
>
>
>> It is highly discouraged to set up any kind of system that depends
>> on device names for block devices. Mounts have mount by-label
>> or mount by-uuid. Any other subsystem should go by the /dev/disk/by-id/*
>> symlinks to find a persistent raw block device. The id is generated
>> from characteristics inside the disk itself, so it will be the same
>> no matter what host connection or bus it is connected to (almost).
>>
>> This is because even if the boot order is consistent, the device name
>> is so volatile over the life-span of a system. Did I boot with a removable
>> USB device inserted? Was that camera or printer on or off? Was a disk connected
>> to the other port? Any such change will break things and give you a very
>> poor user experience.
>>
>
> for a laptop you are probably correct, but for a server or embedded system
> that doesn't have its hardware changing all the time you are not correct.
>
> especially on a system with lots of drives, why should I have to create an
> initrd that goes and searches dozens or hundreds of drives to find out
> which one to boot from?
>

Boaz is correct. Many enterprise SCSI subsystems (FC, SAS) do not have
hard transport addresses for each device like Parallel SCSI used to.
Thus, any difference in order of appearance of the devices (power-up
ordering, FC ALPA assignment based on who's loop master, order that
switch reports them, is an array in a failover mode with 1 controller
non-existent), or if LUN configuration on an array changes, or as a
drive may fail (especially with hundreds), there's no guarantee you will
see the same thing in the same order w/o name binding. Same thing is
true if one of those adapters fails or is swapped out.

> I am building a system that will have two drives in a hardware mirror on
> one SCSI card, and 160 drives on a FC (SCSI) card. why should my boot have
> to go and examine all 162 drives to look for an ID on every partition just
>

Because it's the only safe way to ensure proper device identification.

-- james s

da...@lang.hm

Jul 16, 2009, 2:44:18 PM
to James Smart, Boaz Harrosh, linux-kernel, linux...@vger.kernel.org

yes, but does your system change the order of your internal direct
attached drives with your FC/SAN drives?

David Lang

James Bottomley

Jul 16, 2009, 3:39:33 PM
to da...@lang.hm, James Smart, Boaz Harrosh, linux-kernel, linux...@vger.kernel.org

Certainly, it can. The way BIOS booting gets around this is either to
use some type of physical indicator (like phy number for SAS) to find C:
or to use a persistent ID mapping scheme (which is pretty much
equivalent to our /dev/disk/by-id/ udev one).

James

da...@lang.hm

Jul 16, 2009, 3:48:22 PM
to James Bottomley, James Smart, Boaz Harrosh, linux-kernel, linux...@vger.kernel.org

so if I don't use udev but do want the async detection my only option to
have it boot from card 1 instead of card 2 is to just keep rebooting the
machine until it guesses right?

David Lang

Boaz Harrosh

Jul 16, 2009, 3:53:57 PM
to da...@lang.hm, linux-kernel, linux...@vger.kernel.org

There is *no* searching with /dev/disk/by-id/ or /dev/disk/by-uuid/.
Udev comes along, reads a small piece of information and puts up a link.

now: not(initrd+Udev) == Kernel_with_no_legs

> I am building a system that will have two drives in a hardware mirror on
> one SCSI card, and 160 drives on a FC (SCSI) card. why should my boot have
> to go and examine all 162 drives to look for an ID on every partition just
> to find the boot drive?
>

Again, no searching is done here, just a read of one sector for the uuid
and a query command for by-id.

>> I would say that "scsi async" is a great blessing
>
> it's great for startup time, but doing the async detection doesn't
> _require_ doing async registration.
>
> David Lang

Boaz
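
To give a feel for how small the "read of one sector" is, here is a Python sketch that pulls the filesystem UUID out of an ext2/ext3/ext4 superblock. It assumes an ext-family filesystem (superblock at byte 1024, magic 0xEF53 at offset 56 inside it, 16-byte UUID at offset 104) and is not udev's own code; real probing tools recognise many more formats. The function name `ext_uuid` is invented for the example.

```python
import uuid

SUPERBLOCK_OFFSET = 1024   # the ext superblock starts 1 KiB into the device
MAGIC_OFFSET = 56          # s_magic, relative to the superblock start
UUID_OFFSET = 104          # s_uuid, relative to the superblock start
EXT_MAGIC = b"\x53\xef"    # 0xEF53 stored little-endian


def ext_uuid(path):
    """Return the ext2/3/4 filesystem UUID of a device or image, or None."""
    with open(path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        superblock = dev.read(UUID_OFFSET + 16)
    if len(superblock) < UUID_OFFSET + 16:
        return None  # too short to hold a superblock
    if superblock[MAGIC_OFFSET:MAGIC_OFFSET + 2] != EXT_MAGIC:
        return None  # not an ext-family filesystem
    return str(uuid.UUID(bytes=superblock[UUID_OFFSET:UUID_OFFSET + 16]))
```

On a real system this needs read permission on the partition, for example `ext_uuid("/dev/sda1")`. The cost is one short read per partition, which is what keeps the per-device work small even with many drives.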

James Bottomley

Jul 16, 2009, 3:56:23 PM
to da...@lang.hm, James Smart, Boaz Harrosh, linux-kernel, linux...@vger.kernel.org

Well, for multiple cards that's effectively true with or without async
scanning ... the kernel doesn't know how you've enabled the bios scans
on the cards, so it takes first bus discovery order, so your boot drive
can always end up as /dev/sdb etc.

In theory, async probing shouldn't be racy, but we've likely got a
problem between async SCSI scanning and async sd driver attachment, so
when those are sorted out it should be no worse with than without.

James

Boaz Harrosh

Jul 16, 2009, 3:58:51 PM
to da...@lang.hm, James Bottomley, James Smart, linux-kernel, linux...@vger.kernel.org
On 07/16/2009 10:48 PM, da...@lang.hm wrote:
>
> so if I don't use udev but do want the async detection my only option to
> have it boot from card 1 instead of card 2 is to just keep rebooting the
> machine until it guesses right?
>

I guess sync detection would be faster than the reboots, on average ;-)

Matthew Wilcox

Jul 16, 2009, 4:06:02 PM
to da...@lang.hm, James Bottomley, James Smart, Boaz Harrosh, linux-kernel, linux...@vger.kernel.org
On Thu, Jul 16, 2009 at 12:48:06PM -0700, da...@lang.hm wrote:
> so if I don't use udev but do want the async detection my only option to
> have it boot from card 1 instead of card 2 is to just keep rebooting the
> machine until it guesses right?

you could make the driver for card2 a module ...

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

da...@lang.hm

Jul 16, 2009, 4:59:34 PM
to James Bottomley, James Smart, Boaz Harrosh, linux-kernel, linux...@vger.kernel.org

that's what I am attempting to do, but it's not stable.

I fully agree that if you move cards or change the bios scan order things
will change. I'm not talking about a case like that. I'm talking about a
case where the hardware and BIOS do not change.

> In theory, async probing shouldn't be racy, but we've likely got a
> problem between async SCSI scanning and async sd driver attachment, so
> when those are sorted out it should be no worse with than without.

so is there something that I can do to debug this case where it is racy?
I have a repeatable test case right now. if there is something I can do to
test this to help track down the race I will do so, otherwise I will need
to disable the async scanning as being unreliable.

David Lang
