
LSI MegaRAID SAS 9240-4i hangs system at boot


Ramon Hofer

May 18, 2012, 2:30:02 PM
Hi all

I finally got my LSI 9240-4i and the Intel SAS expander.

Unfortunately it prevents the system from booting. I only get these
messages on the screen:

megasas: INIT adapter done
hub 4-1:1.0 over-current condition on port 7
hub 4-1:1.0 over-current condition on port 8

I also got the over-current messages when the LSI card is removed. Here's
the output of lsusb:

Bus 004 Device 003: ID 046d:c517 Logitech, Inc. LX710 Cordless Desktop
Laser
Bus 004 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Nevertheless I think the module for the card gets loaded, but then it
somehow hangs.

And after a while there are more messages which I don't understand. I
have taken a picture:
http://666kb.com/i/c3wf606sc1qkcvgoc.jpg

Then there are lots of messages like this:

INFO: task modprobe:123 blocked for more than 120 seconds.
"echo 0..." disables this message

Besides modprobe:123, the same message also lists modprobe:124, 125, 126,
127, 135, 137 and kworker/u:1:164, 165.
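For reference, the truncated hint refers to the kernel's hung-task watchdog.
If a shell were reachable, something like this would show or silence those
warnings (just a sketch, I haven't been able to run it here):

  # current hung-task timeout (120 seconds by default)
  cat /proc/sys/kernel/hung_task_timeout_secs
  # silence the warnings (this does not fix the underlying hang)
  echo 0 > /proc/sys/kernel/hung_task_timeout_secs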

I can enter the BIOS of the card just fine. It detects the disks and by
default sets the JBOD option for them. This is fine because I want to use
Linux RAID.

Might this be the same problem:
http://www.spinics.net/lists/raid/msg30359.html
Should I try a firmware upgrade?

This card was recommended to me by the list:
http://lists.debian.org/debian-user/2012/05/msg00104.html

I hope I can get some hints here :-)


Best regards
Ramon




Shane Johnson

May 18, 2012, 2:50:03 PM
Over-current problems, from what I have seen, are hardware problems. I would make sure the Intel expander doesn't need an external power source and, if it does, that it is functioning properly. After that I would look to see whether something isn't shorting out a USB port.

Shane
--
Shane D. Johnson
IT Administrator
Rasmussen Equipment


Ramon Hofer

May 18, 2012, 5:20:03 PM
> Over current problems from what I have seen are hardware problems - I
> would make sure the intel expander doesn't need a external power source
> and if it does that it is functioning properly. After that I would look
> to see if something isn't shorting out a USB port.
>

Thanks for your answer. But the over-current message is present as well
without any cards. It's also there if I only have the bare mainboard and
a disk in use.

However this doesn't bother me much for now because it doesn't seem to be
the source of my problem.

But I'd like to know if someone has experience with the LSI card and
whether a firmware upgrade would be a good idea.
I don't want to break anything.


Best regards
Ramon



Camaleón

May 18, 2012, 8:20:01 PM
On Fri, 18 May 2012 14:23:51 +0000, Ramon Hofer wrote:

> I finally got my LSI 9240-4i and the Intel SAS expander.
>
> Unfortunately it prevents the system from booting. I only got this
> message on the screen:
>
> megasas: INIT adapter done
> hub 4-1:1.0 over-current condition on port 7
> hub 4-1:1.0 over-current condition on port 8

That's bad, but don't panic, these things happen ;-(

Are you running Squeeze?

> I also got the over-current messages when the LSI card is removed.

And you installed the system with no glitches and then it hangs?

> Here's the output of lsusb:

(...)

What's the point of listing the USB devices? :-?

> Nevertheless I think the module for the card should be loaded but then
> it somehow hangs.
>
> And after a while there are more messages which I don't understand. I
> have taken a picture:
> http://666kb.com/i/c3wf606sc1qkcvgoc.jpg

Something is going wrong with udevd when it handles a USB(?) device or hub.

> Then there are lots of messages like this:
>
> INFO: task modprobe:123 blocked for more than 120 seconds. "echo 0..."
> disables this message
>
> Instead of modprobe:123 also modprobe:124, 125, 126, 127, 135, 137 and
> kworker/u:1:164, 165 are listed.

Those messages are coming from the kernel side but I can't guess the
source that triggers them.

> I can enter the BIOS of the card just fine. It detect the disks and by
> defaults sets jbod option for them. This is fine because I want to use
> linux RAID.

Mmm... the strange thing here is that there is no clear indication of the
nature of the problem, that is, what's preventing your system from
booting. Can you at least get into single-user mode?

> May this problem be the same:
> http://www.spinics.net/lists/raid/msg30359.html
> Should I try a firmware upgrade?

(...)

Wait, wait, wait... that looks like a completely different scenario (different
driver -mpt2sas-, different RAID card, encryption in place, different
error...). And while updating the firmware is usually good, you'd better
first establish what it is you want to correct (we still don't know) and
which firmware version solves the problem.

Greetings,

--
Camaleón



Ramon Hofer

May 18, 2012, 10:00:02 PM
On Fri, 18 May 2012 20:18:46 +0000, Camaleón wrote:

> On Fri, 18 May 2012 14:23:51 +0000, Ramon Hofer wrote:
>
>> I finally got my LSI 9240-4i and the Intel SAS expander.
>>
>> Unfortunately it prevents the system from booting. I only got this
>> message on the screen:
>>
>> megasas: INIT adapter done
>> hub 4-1:1.0 over-current condition on port 7 hub 4-1:1.0 over-current
>> condition on port 8
>
> How bad, but don't panic, these things happen ;-(
>
> Are you running Squeeze?

Yes, sorry, I forgot to mention.

I installed Squeeze amd64 yesterday on a RAID1 (just to try). Today, when
the card arrived, I put it in and couldn't boot. Then I installed Squeeze
with the card present without problems, but booting afterwards didn't work
again.
Without the card I then installed the bpo amd64 kernel, but it couldn't
boot again either.



>> I also got the over-current messages when the LSI card is removed.
>
> And you installed the system with no glitches and then it hangs?

Without the LSI card there are no problems (except the over-current
message, which is also present with only the mb and a disk).
Installation works ok with and without the card.


>> Here's the output of lsusb:
>
> (...)
>
> What's the point for listing the USB devices? :-?

Because I thought I should mention the over-current message, and it's
related to USB.
But I think it's a completely different thing. And I don't even know
where port 7 is, but port 8 is definitely empty :-?


>> Nevertheless I think the module for the card should be loaded but then
>> it somehow hangs.
>>
>> And after a while there are more messages which I don't understand. I
>> have taken a picture:
>> http://666kb.com/i/c3wf606sc1qkcvgoc.jpg
>
> Something wrong with udevd when listing an usb?? device or hub.

Ok, unfortunately I have no clue what this means. But this message isn't
there without the card - and the card is PCIe, not USB?


>> Then there are lots of messages like this:
>>
>> INFO: task modprobe:123 blocked for more than 120 seconds. "echo 0..."
>> disables this message
>>
>> Instead of modprobe:123 also modprobe:124, 125, 126, 127, 135, 137 and
>> kworker/u:1:164, 165 are listed.
>
> Those messages are coming from the kernel side but I can't guess the
> source that trigger them.

How can I find out what they mean? It seems as if many different problems
can lead to such messages?

>> I can enter the BIOS of the card just fine. It detect the disks and by
>> defaults sets jbod option for them. This is fine because I want to use
>> linux RAID.
>
> Mmm... the strange here is that there is no clear indication about the
> nature of the problem, that is, what's preventing your system from
> booting. Can you at least get into the single-user mode?

I can't get to any login. Or is there a way to get into single-user mode?
If you mean recovery mode: no luck either :-(


>> May this problem be the same:
>> http://www.spinics.net/lists/raid/msg30359.html Should I try a firmware
>> upgrade?
>
> (...)
>
> Wait, wait, wait... that looks a completely different scenario
> (different driver -mt2sas-, different raid card, encryption in place,
> different error...). And while updating the firmware is usually good,
> you better first ensure what's what you want to correct (we still don't
> know) and what firmware version solves the problem.

Ok. But I have no clue how to find this out either.
Maybe you could point me in the right direction :-)


Best regards
Ramon



Stan Hoeppner

May 18, 2012, 10:50:01 PM
On 5/18/2012 9:23 AM, Ramon Hofer wrote:
> Hi all
>
> I finally got my LSI 9240-4i and the Intel SAS expander.
>
> Unfortunately it prevents the system from booting. I only got this
> message on the screen:
>
> megasas: INIT adapter done
> hub 4-1:1.0 over-current condition on port 7
> hub 4-1:1.0 over-current condition on port 8

These over-current errors are reported by USB, not megasas. Unplug all
of your USB devices until you get everything else running.

> I also got the over-current messages when the LSI card is removed. Here's
> the output of lsusb:
>
> Bus 004 Device 003: ID 046d:c517 Logitech, Inc. LX710 Cordless Desktop
> Laser
> Bus 004 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
> Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
> Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
> Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
> Bus 003 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
> Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Again, this is because the over-current issue has nothing to do with the
HBA, but with the USB subsystem.

> Nevertheless I think the module for the card should be loaded but then it
> somehow hangs.

You're assuming it's the HBA/module hanging the system. I see no
evidence of that so far.

> And after a while there are more messages which I don't understand. I
> have taken a picture:
> http://666kb.com/i/c3wf606sc1qkcvgoc.jpg

It shows that udev is having serious trouble handling one of the USB
devices.

> Then there are lots of messages like this:
>
> INFO: task modprobe:123 blocked for more than 120 seconds.
> "echo 0..." disables this message
>
> Instead of modprobe:123 also modprobe:124, 125, 126, 127, 135, 137 and
> kworker/u:1:164, 165 are listed.

Posting log snippets like this is totally useless. Please post your
entire dmesg output to pastebin and provide the link.
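Something along these lines would do (a rough sketch; the pastebinit package
is optional and may need installing first):

  dmesg > /tmp/dmesg.txt
  # then paste /tmp/dmesg.txt through the pastebin web form, or:
  apt-get install pastebinit
  pastebinit /tmp/dmesg.txt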

> I can enter the BIOS of the card just fine. It detect the disks and by
> defaults sets jbod option for them. This is fine because I want to use
> linux RAID.

Sure, because the card and expander are working properly.

> May this problem be the same:
> http://www.spinics.net/lists/raid/msg30359.html
> Should I try a firmware upgrade?

Your hang problem seems unrelated to the HBA. Exhaust all other
possibilities before attempting a firmware upgrade. If there is some
other system level problem, it could botch the FW upgrade and brick the
card, leaving you in a far worse situation than you are now.

Post your FW version here. It's likely pretty recent already.

> This card was recommended to me by the list:
> http://lists.debian.org/debian-user/2012/05/msg00104.html

Yes, I recommended it. It's the best card available in its class.

> I hope I can get some hints here :-)

When troubleshooting potential hardware issues, always disconnect
everything you can to isolate the component you believe may have an
issue. If that device still has a problem, work until you resolve that
problem. Then add your other hardware back into the system one device
at a time until you run into the next problem. Rinse, repeat, until all
problems are resolved. Isolating components during testing is the key.
This is called "process of elimination" testing--eliminate everything
but the one device you're currently testing.

--
Stan



Stan Hoeppner

May 18, 2012, 11:00:02 PM
On 5/18/2012 9:39 AM, Shane Johnson wrote:
> Over current problems from what I have seen are hardware problems - I would
> make sure the intel expander doesn't need a external power source and if it
> does that it is functioning properly.

Ramon is well aware of the power configuration for the expander. And it
has nothing to do with the over-current error, which is USB related.
The two lines are adjacent in dmesg but have nothing to do with one
another. Anyone who's ever looked at dmesg output should know this.

> After that I would look to see if
> something isn't shorting out a USB port.

Yes, USB is the cause of the over-current errors, which is plainly
evident in his screen shot. But we don't yet know if this USB problem
is what's hanging the system. Further troubleshooting is required.

--
Stan



Stan Hoeppner

May 18, 2012, 11:30:03 PM
On 5/18/2012 4:55 PM, Ramon Hofer wrote:

> I installed squeeze amd64 yesterday on a raid1 (just to try).

You need to explain this in detail: "installed on raid1"

Installed onto what raid1? Does this mean you created an mdadm raid1
pair during the Squeeze installation process, and installed to that? To
what SAS/SATA controller are these two disks attached? Please provide
as much detail as possible about this controller chip and if it is on
the motherboard. If so, please provide the motherboard brand/model.

> Today when
> the card was here I put it in and couldn't boot.

Please be technical in your descriptions and provide as much detail as
possible. The above statement sounds like something from a person who
has never touched a PC before. Providing detail is what solves
problems. Lack of detail is what causes problems to linger on until
people take hammers to hardware. I assume you prefer the former. :)

> Then I installed squeeze
> with the card present without problems but booting afterwards didn't work
> again.

Detail, detail, detail! To what did you install Squeeze? Which disks,
attached to which controller? We *NEED* these details to assist you.

> Without the card installed bpo amd64 kernel but couldn't boot again.

If you installed to disks attached to the expander/9240 and then yanked
the card, of course it wouldn't boot. Again, this is why we need
*details*. ALWAYS supply the details!

> Without the LSI card there are no problems (except the over-current
> message which is also present with only the mb and a disk).
> Installation works ok with and without card.

Ok, so the USB over-current error has nothing to do with the hang during
boot.

>>> Nevertheless I think the module for the card should be loaded but then
>>> it somehow hangs.

Only full dmesg output will tell us this.

> Ok. But I have no clue either how to find this out.
> Maybe you could point into the right direction :-)

Again, do not flash the HBA firmware at this point. Provide the details
I requested and we'll move forward from there. It may very well be that
the RAID firmware is causing the boot problem and you need the straight
JBOD firmware, but let's get all the other details first so we can
determine that instead of making wild guesses.

BTW, did you disable all "boot" related options in the 9240 BIOS and
force it to JBOD mode? Did you read the instructions in their entirety
before mounting the HBA into the machine? This isn't a $20 SATA card
you simply slap in and go. It's an SAS RAID controller. More
care/learning is required.

--
Stan



Ramon Hofer

May 19, 2012, 8:00:01 AM
On Fri, 18 May 2012 17:57:56 -0500, Stan Hoeppner wrote:

> On 5/18/2012 9:39 AM, Shane Johnson wrote:
>
>> After that I would look to see if
>> something isn't shorting out a USB port.
>
> Yes, USB is the cause of the over-current errors, which is plainly
> evident in his screen shot. But we don't yet know if this USB problem
> is what's hanging the system. Further troubleshooting is required.

The strange thing, as I mentioned in another post, is that on the mb USB
port 8 there's nothing attached, and I haven't found where port 7 is :-?



Stan Hoeppner

May 19, 2012, 9:20:02 AM
On 5/19/2012 2:52 AM, Ramon Hofer wrote:
> On Fri, 18 May 2012 17:57:56 -0500, Stan Hoeppner wrote:
>
>> On 5/18/2012 9:39 AM, Shane Johnson wrote:
>>
>>> After that I would look to see if
>>> something isn't shorting out a USB port.
>>
>> Yes, USB is the cause of the over-current errors, which is plainly
>> evident in his screen shot. But we don't yet know if this USB problem
>> is what's hanging the system. Further troubleshooting is required.
>
> The strange thing is as I mentioned in another post is that on the mb usb
> port 8 there's nothing attached and I haven't found where port 7 is :-?

I wouldn't worry about the USB errors at this point. Unless there is
some larger issue with insufficient power on the motherboard causing the
USB current error, it's likely unrelated to the storage hardware issue.
Fix it first, then worry about the USB errors. Given you have no
device plugged into those ports, it could be a phantom error.

--
Stan




Camaleón

May 19, 2012, 9:20:03 AM
On Fri, 18 May 2012 21:55:36 +0000, Ramon Hofer wrote:

> On Fri, 18 May 2012 20:18:46 +0000, Camaleón wrote:

>> Are you running Squeeze?
>
> Yes, sorry forgot to mention.
>
> I installed squeeze amd64 yesterday on a raid1 (just to try). Today when
> the card was here I put it in and couldn't boot. Then I installed
> squeeze with the card present without problems but booting afterwards
> didn't work again.
> Without the card installed bpo amd64 kernel but couldn't boot again.

You have to be extremely precise when describing the situation, because
there are missing pieces in the above stanza and in the whole sequence of
steps you followed :-)

Okay, let's start over.

You installed the LSI card in one of the motherboard slots, configured
the BIOS to use a JBOD disk layout and then booted the installation CD for
Squeeze, right?

The installation process went smoothly (you selected an mdadm
configuration for the disks and then formatted them with no problems);
when the installer finished and the system first rebooted, you selected
the newly installed system from GRUB2's menu and then the boot process
halted, displaying the mentioned messages on the screen, right?

>> And you installed the system with no glitches and then it hangs?
>
> Without the LSI card there are no problems (except the over-current
> message which is also present with only the mb and a disk). Installation
> works ok with and without card.

So you think the system stalls because of the RAID card, despite getting
the same output messages at boot and there being no additional evidence of
a problem related to the hard disks or the controller.

Mmm... weird it is, my young padawan :-) that's for sure, but it can be
something between your Supermicro motherboard's BIOS and the RAID
controller. Check if there's a BIOS update for your motherboard (but just
check, don't install!) and if so, ask Supermicro technical support about
the exact problems it corrects and tell them you are using an LSI RAID
card and you're having problems booting your system with it installed.

>> What's the point for listing the USB devices? :-?
>
> Because I thought I should mention the over-current message and it's
> related to usb.
> But I think it's a completely different thing. And I don't even know
> where port 7 is but port 8 is definitely empty :-?

Yes, I agree. It seems an unrelated problem that you can try to solve
once you correct the booting issue if the error still persists.

>> Something wrong with udevd when listing an usb?? device or hub.
>
> Ok, unfortunately I have no clue what this means. But this message isn't
> there without card but it's pci-e?

Ah, that's a very interesting discovery, man. To me it can mean the
motherboard is not correctly detecting the card, hence a BIOS issue.

>> Those messages are coming from the kernel side but I can't guess the
>> source that trigger them.
>
> How can I find out what they mean? It seems as if many different
> problems lead to such messages?

I would concentrate first on solving the core of the problem.

>> Mmm... the strange here is that there is no clear indication about the
>> nature of the problem, that is, what's preventing your system from
>> booting. Can you at least get into the single-user mode?
>
> I can't get to any login. Or is there a way to get into single-user
> mode? If you mean recovery mode: no luck either :-(

Are you reaching the GRUB2 menu? If yes, you can select "recovery mode/
single-user mode".

Greetings,

--
Camaleón



Ramon Hofer

May 19, 2012, 9:40:01 AM
On Fri, 18 May 2012 18:28:05 -0500, Stan Hoeppner wrote:

> On 5/18/2012 4:55 PM, Ramon Hofer wrote:
>
>> I installed squeeze amd64 yesterday on a raid1 (just to try).
>
> You need to explain this in detail: "installed on raid1"
>
> Installed onto what raid1? Does this mean you created an mdadm raid1
> pair during the Squeeze installation process, and installed to that? To
> what SAS/SATA controller are these two disks attached? Please provide
> as much detail as possible about this controller chip and if it is on
> the motherboard. If so, please provide the motherboard brand/model.

Sorry, I'll try to give you some more details. But to be honest I'm just an
interested consumer ;-)
What I want to say is that I probably just don't know how to get the
information. For example, I can't get to the syslog when the system
doesn't boot. But I hope that with your help I can learn ways to get at
the information :-)

I installed Squeeze amd64 netinstall to a RAID1 with the disks directly
attached to the mainboard. During installation I partitioned the disks,
set the partition type to RAID, created md RAIDs, and then chose the md
RAIDs to be mounted as /boot, swap, /, /var, /usr, /tmp and /home.
This was just done out of curiosity.
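In case it helps, this is how I check the arrays when the system does boot;
nothing special, just the standard tools (assuming the first array is
/dev/md0):

  cat /proc/mdstat
  mdadm --detail /dev/md0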

Now the same system partitions are directly on one of the disks. It is
still attached directly to the mainboard.

The mainboard is a Supermicro C7P67 with a Marvell 88SE91xx adapter
onboard.


>> Then I installed squeeze
>> with the card present without problems but booting afterwards didn't
>> work again.
>
> Detail, detail detail! To what did you install Squeeze? Which disks,
> attached to which controller? We *NEED* these details to assist you.

The system was installed to a disk directly attached to the mainboard. I
thought it might be a good idea anyway to use the SATA ports on the
mainboard for the os disk.


>> Without the card installed bpo amd64 kernel but couldn't boot again.
>
> If you installed to disks attached to the expander/9240 and then yanked
> the card, of course it wouldn't boot. Again, this is why we need
> *details*. ALWAYS supply the details!

No, sorry for all the misunderstanding.

Even if I only have the OS disks (attached to the mainboard), the LSI
card and the expander (both mounted in PCIe x16 slots on the mainboard),
the system hangs after the first three messages (megasas: INIT adapter
done and the two over-current messages).

When I remove only the LSI card, I see the over-current messages and
the system boots just fine.

Likewise, when I also remove the expander, I see the over-current
messages and the system boots fine.


>> Without the LSI card there are no problems (except the over-current
>> message which is also present with only the mb and a disk).
>> Installation works ok with and without card.
>
> Ok, so the USB over-current error has nothing to do with the hang during
> boot.

Yes, this is what I think as well but didn't want to keep quiet about
that.

>>>> Nevertheless I think the module for the card should be loaded but
>>>> then it somehow hangs.
>
> Only full dmesg output will tell us this.

Yes. Unfortunately I don't know how to get the output when I can't log in.

Oh ok, now I have removed the card again and found some interesting logs.

/var/log/syslog:
http://pastebin.com/raw.php?i=00rN1X8s

/var/log/installer/syslog:
http://pastebin.com/raw.php?i=sDmjbeey

/var/log/installer/hardware-summary:
http://pastebin.com/raw.php?i=V8fX4F0W
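For next time, I guess I could also read the logs without pulling the card by
booting the installer in rescue mode and mounting the root partition; roughly
like this (a sketch, assuming the root filesystem were /dev/sda1, which may
differ here):

  mount /dev/sda1 /mnt
  less /mnt/var/log/syslog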


>> Ok. But I have no clue either how to find this out. Maybe you could
>> point into the right direction :-)
>
> Again, do not flash the HBA firmware at this point. Provide the details
> I requested and we'll move forward from there. It may very well be that
> the RAID firmware is causing the boot problem and you need the straight
> JBOD firmware, but lets get all the other details first so we can
> determine that instead of making wild guesses.
>
> BTW, did you disable all "boot" related options in the 9240 BIOS and
> force it to JBOD mode? Did you read the instructions in their entirety
> before mounting the HBA into the machine? This isn't a $20 SATA card
> you simply slap in and go. It's an SAS RAID controller. More
> care/learning is required.

To be honest I have never worked with anything other than the usual
consumer products, so I don't understand most of the terms. But I will
work harder and read up on how to disable these options.
What I saw is that it sets the disks connected to the expander to JBOD
mode.
And I disabled the card's BIOS completely, but with no luck.


I hope this helps a bit but please be gentle with a hobbyist :-)


Best regards
Ramon



Ramon Hofer

May 19, 2012, 10:40:01 AM
On Fri, 18 May 2012 17:47:54 -0500, Stan Hoeppner wrote:

> On 5/18/2012 9:23 AM, Ramon Hofer wrote:
>> Hi all
>>
>> I finally got my LSI 9240-4i and the Intel SAS expander.
>>
>> Unfortunately it prevents the system from booting. I only got this
>> message on the screen:
>>
>> megasas: INIT adapter done
>> hub 4-1:1.0 over-current condition on port 7 hub 4-1:1.0 over-current
>> condition on port 8
>
> These over-current errors are reported by USB, not megasas. Unplug all
> of your USB devices until you get everything else running.

Even when I unplug the chassis USB connector and only have the onboard
USB headers on the mainboard with nothing connected to them, the message
remains.

This is device 1 on bus 4, right? So it should be the ID 1d6b:0002 Linux
Foundation 2.0 root hub?

>> I also got the over-current messages when the LSI card is removed.
>> Here's the output of lsusb:
>>
>> Bus 004 Device 003: ID 046d:c517 Logitech, Inc. LX710 Cordless Desktop
>> Laser
>> Bus 004 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching
>> Hub Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus
>> 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
>> Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 003
>> Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 002 Device
>> 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
>
> Again, this is because the over-current issue has nothing to with the
> HBA, but the USB subsystem.

Yes, this might have nothing to do with the problem. But I still wanted to
mention it because I didn't know whether it's related or not, or whether I
should worry about it.

Mainboards sometimes say strange things :-)
On my HTPC I always get a "CPU fan error" message, probably because I
have a big passive cooler and use the chassis fans to cool it.
And that has been no problem so far either.


>> Nevertheless I think the module for the card should be loaded but then
>> it somehow hangs.
>
> You're assuming it's the HBA/module hanging the system. I see no
> evidence of that so far.

I came to that conclusion because when the card is mounted the system
stops during booting.
When the card is removed, the system boots.
There's this over-current problem that could cause something.
And maybe the PCIe slots have something to do with it. But I have
plugged the LSI card into both PCIe x16 slots on the mainboard, and both
times the system didn't boot.
And the expander only uses the slot to draw its power.

I also tried switching the LSI BIOS off.

These are the things I tried in order to isolate the problem, but
unfortunately I don't have any other ideas.
I will now thoroughly study the LSI documentation...


>> And after a while there are more messages which I don't understand. I
>> have taken a picture:
>> http://666kb.com/i/c3wf606sc1qkcvgoc.jpg
>
> It shows that udev is having serious trouble handling one of the USB
> devices.

Yes, but only when the LSI card is attached. When it's removed the
messages don't appear. And I don't even have anything connected to the
USB ports. Really confusing...
I thought I had the same messages with the Supermicro AOC-SASLP-MV8
cards :-?
But when I switched to the bpo amd64 kernel it _seemed_ ok.

This is why I hoped it would be the same with the megaraid module.

Btw, just left of the Ext. LED connector the CR1 LED blinks constantly
(from the moment the system is powered) with a 1 sec on / 1 sec off
period. I couldn't find the meaning of this LED in the LSI documents.
But to be honest I didn't read through the 500-page manual. Which I will
do now :-)


>> Then there are lots of messages like this:
>>
>> INFO: task modprobe:123 blocked for more than 120 seconds. "echo 0..."
>> disables this message
>>
>> Instead of modprobe:123 also modprobe:124, 125, 126, 127, 135, 137 and
>> kworker/u:1:164, 165 are listed.
>
> Posting log snippets like this is totally useless. Please post your
> entire dmesg output to pastebin and provide the link.

It didn't occur to me yesterday that I could use the files under
/var/log. I was only missing the possibility of typing dmesg in a terminal
when the error occurs.

But I have posted some logs in my previous post. I hope these help more.

>> I can enter the BIOS of the card just fine. It detect the disks and by
>> defaults sets jbod option for them. This is fine because I want to use
>> linux RAID.
>
> Sure, because the card and expander are working properly.

Yes, now I only have to convince the OS to accept this :-)


>> May this problem be the same:
>> http://www.spinics.net/lists/raid/msg30359.html Should I try a firmware
>> upgrade?
>
> Your hang problem seems unrelated to the HBA. Exhaust all other
> possibilities before attempting a firmware upgrade. If there is some
> other system level problem, it could botch the FW upgrade and brick the
> card, leaving you in a far worse situation than you are now.
>
> Post your FW version here. It's likely pretty recent already.

The FW version is 2.70.04-0862.

I'm a little confused by LSI's versioning. On their homepage [1] they list
the firmware name 4.6 - 10M09 P24 as the newest. The filename of this file
is 20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482.zip. The leading
number, 20.10.1-0077, is the newest version according to the readme. The
filename ends with 2.120.244-1482, which looks closer to the format of the
version listed in my card's BIOS.

[1] http://www.lsi.com/downloads/Public/MegaRAID%20Common%
20Files/20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482.zip
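If LSI's MegaCLI tool were installed, the firmware package could presumably
also be read from the OS; just a sketch I haven't tried, since the system
doesn't boot with the card in (the path is the usual one for LSI's packages
and may differ):

  /opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aALL | grep -i 'FW Package'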


>> This card was recommended to me by the list:
>> http://lists.debian.org/debian-user/2012/05/msg00104.html
>
> Yes, I recommended it. It's the best card available in its class.

Yes, I'm really thankful for the recommendation.
And somehow I hoped you could jump in and help me :-)
But I didn't know if it's ok to ask you by name.

So thanks already for that too :-)


>> I hope I can get some hints here :-)
>
> When troubleshooting potential hardware issues, always disconnect
> everything you can to isolate the component you believe may have an
> issue. If that device still has a problem, work until you resolve that
> problem. Then add your other hardware back into the system one device
> at a time until you run into the next problem. Rinse, repeat, until all
> problems are resolved. Isolating components during testing is the key.
> This is called "process of elimination" testing--eliminate everything
> but the one device you're currently testing.

Thanks for the advice!
This is what I tried to do. I was at the point where I couldn't
disconnect anything anymore. Maybe there are ways to further isolate the
problem which I couldn't figure out myself.


Best regards
Ramon



Ramon Hofer

May 19, 2012, 10:50:01 AM
On Sat, 19 May 2012 04:19:33 -0500, Stan Hoeppner wrote:

> On 5/19/2012 2:52 AM, Ramon Hofer wrote:
>> On Fri, 18 May 2012 17:57:56 -0500, Stan Hoeppner wrote:
>>
>>> On 5/18/2012 9:39 AM, Shane Johnson wrote:
>>>
>>>> After that I would look to see if
>>>> something isn't shorting out a USB port.
>>>
>>> Yes, USB is the cause of the over-current errors, which is plainly
>>> evident in his screen shot. But we don't yet know if this USB problem
>>> is what's hanging the system. Further troubleshooting is required.
>>
>> The strange thing is as I mentioned in another post is that on the mb
>> usb port 8 there's nothing attached and I haven't found where port 7 is
>> :-?
>
> I wouldn't worry about the USB errors at this point. Unless there is
> some larger issue with insufficient power on the motherboard causing the
> USB current error, it's likely unrelated to the storage hardware issue.
> Fix it first, then worry about the USB errors. Given you have no
> device plugged into those ports, it could be a phantom error.

Yes, I hope you're right about the phantom error :-)
Especially because I can't find port 7. There's no label on the mb PCB nor
in its documentation.


Best regards
Ramon



Ramon Hofer

May 19, 2012, 11:30:01 AM
On Sat, 19 May 2012 09:09:52 +0000, Camaleón wrote:

> On Fri, 18 May 2012 21:55:36 +0000, Ramon Hofer wrote:
>
>> On Fri, 18 May 2012 20:18:46 +0000, Camaleón wrote:
>
>>> Are you running Squeeze?
>>
>> Yes, sorry forgot to mention.
>>
>> I installed squeeze amd64 yesterday on a raid1 (just to try). Today
>> when the card was here I put it in and couldn't boot. Then I installed
>> squeeze with the card present without problems but booting afterwards
>> didn't work again.
>> Without the card installed bpo amd64 kernel but couldn't boot again.
>
> You have to be extremely precise while describing the situation because
> there are missing pieces in the above stanza and the whole steps you
> followed :-)

Ok, sorry for that! I'll try to improve :-)


> Okay, let's start over.
>
> You installed the lsi card in one of the motherboard slots, configured
> the BIOS to use a JBOD disk layout and then boot the installation CD for
> Squeeze, right?

Yes, but I didn't set the LSI BIOS to use the disks as JBOD; it did it
automatically.

In the card's BIOS I saw that virtual drives can be set up. But since I
want to use the disks as JBOD I don't think I have to set up virtual drives.
The Controller Properties pages are very hard to understand.
So I tried with the factory defaults.


> The installation proccess was smoothly (you selected a mdadm
> configuration for the disks and then formatted them with no problems),
> when the installer finished and the system first rebooted, you selected
> the new installed system from GRUB2's menu and then, the booting
> proccess halted displaying the mentioned messages in the screen, right?

Exactly.
I only saw the three messages (megasas: INIT adapter done and the over-
currents) for some time. Then the screen was filled with the timeout and
udev messages.


>>> And you installed the system with no glitches and then it hangs?
>>
>> Without the LSI card there are no problems (except the over-current
>> message which is also present with only the mb and a disk).
>> Installation works ok with and without card.
>
> So you think the system stalls because of the raid card despite you get
> the same output messages at boot and there's no additional evidence of a
> problem related to the hard disks or the controller.

I always get the two over-current lines.
The timeout and udev errors don't appear when the card is removed.


> Mmm... weird it is, my young padawan :-) that's for sure but it can be
> something coming from your Supermicro motherboard's BIOS and the raid
> controller. Check if there's a BIOS update for your motherboard (but
> just check, don't install!) and if so, ask Supermicro technical support
> about the exact problems it corrects and tell them you are using a LSI
> raid card and you're having problems to boot your system from it.

Thanks Master Camaleón :-D
The mb BIOS version is 2.10.1206. But I couldn't find the current
version. They only write R 2.0.
And the readmes in the firmware zip don't tell me more.

I will email Supermicro to ask them.


>>> What's the point for listing the USB devices? :-?
>>
>> Because I thought I should mention the over-current message and it's
>> related to usb.
>> But I think it's a completely different thing. And I don't even know
>> where port 7 is but port 8 is definitely empty :-?
>
> Yes, I agree. It seems an unrelated problem that you can try to solve
> once you correct the booting issue if the error still persists.

Will do that :-)


>>> Something wrong with udevd when listing an usb?? device or hub.
>>
>> Ok, unfortunately I have no clue what this means. But this message
>> isn't there without card but it's pci-e?
>
> Ah, that's a very interesting discovery, man. To me it can mean the
> motherboard is not correctly detecting the card, hence a BIOS issue.

Ah yes, maybe it thinks it's a USB device?
I checked whether the mb BIOS can tell me anything about the connected
hardware, but I didn't find anything in the PCI settings.


(...)

>>> Mmm... the strange here is that there is no clear indication about the
>>> nature of the problem, that is, what's preventing your system from
>>> booting. Can you at least get into the single-user mode?
>>
>> I can't get to any login. Or is there a way to get into single-user
>> mode? If you mean recovery mode: no luck either :-(
>
> Are you reaching the GRUB2 menu? If yes, you can select "recovery mode/
> single-user mode".

Ah ok. Yes, I have tried that with both kernels in recovery mode, but
without luck.
There are a lot more messages, with the last two of them being the
over-current messages :-o


Best regards
Ramon



Ramon Hofer

May 19, 2012, 11:40:02 AM
On Sat, 19 May 2012 10:33:06 +0000, Ramon Hofer wrote:

> On Fri, 18 May 2012 17:47:54 -0500, Stan Hoeppner wrote:
>
>> Post your FW version here. It's likely pretty recent already.
>
> The FW version is 2.70.04-0862.
>
> I have a little confusion with the versioning from LSI. On their
> homepage [1] they list the firmware name 4.6 - 10M09 P24 as the newest.
> The filename of this file is
> 20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482.zip. The starting
> number 20.10.1-0777 is the newest version according to the readme. The
> filename ends with 2.120.244-1482 which seems more in the format the
> version listed in my cards BIOS.
>
> [1] http://www.lsi.com/downloads/Public/MegaRAID%20Common%
> 20Files/20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482.zip

When I start the system the card shows it's version before it start
detecting the disks.

It's 4.14.00 and the date it shows is 29.1.2010.

So it seems a bit old?


Best regards
Ramon



Camaleón

May 19, 2012, 1:50:02 PM
On Sat, 19 May 2012 11:26:09 +0000, Ramon Hofer wrote:

> On Sat, 19 May 2012 09:09:52 +0000, Camaleón wrote:

>> You installed the lsi card in one of the motherboard slots, configured
>> the BIOS to use a JBOD disk layout and then boot the installation CD
>> for Squeeze, right?
>
> Yes, but I didn't set the LSI BIOS to use the cards as jbod it did it
> automatically.

I guess that's the default.

> In the cards BIOS I saw that virtual drives can be setup. But since I
> want to use them as jbod I don't think I have to set virtual drives. The
> Controller Property pages are very hard to understand. So I tried with
> the factory default.

It's important that you first read and get a global understanding of the
capabilities (and possibilities) of your card. A RAID card is almost
a small computer by itself (it has the logic and the wires to act as
such); they're a very complex piece of hardware.

Moreover, the RAID card has to establish a perfect dialog with your
motherboard and the rest of the system components (OS, hard disks...),
and every single item in this chain (a BIOS problem, a firmware glitch)
can fail or make the computer behave weirdly.

>> So you think the system stalls because of the raid card despite you get
>> the same output messages at boot and there's no additional evidence of
>> a problem related to the hard disks or the controller.
>
> I only get the two over current lines always. The timeout and udev
> errors don't appear when the card is removed.

Okay.

>> Mmm... weird it is, my young padawan :-) that's for sure but it can be
>> something coming from your Supermicro motherboard's BIOS and the raid
>> controller. Check if there's a BIOS update for your motherboard (but
>> just check, don't install!) and if so, ask Supermicro technical support
>> about the exact problems it corrects and tell them you are using a LSI
>> raid card and you're having problems to boot your system from it.
>
> Thanks Master Camaleón :-D
> The mb BIOS version is 2.10.1206. But I couldn't find the current
> version. They only write R 2.0.
> And the readmes in the firmware zip don't tell me more.
>
> I will email Supermicro to ask them.

Yes, do it ASAP. Look, Supermicro is somewhat "special" in this regard.
They make top-quality motherboards which allow special configurations and
setups, and thus they work very closely with the rest of the hardware
manufacturers (memory modules, HBA providers...). Should there be any
specific problem with your RAID card and any of their boards, they'll tell
you the steps to follow.

>>>> Something wrong with udevd when listing an usb?? device or hub.
>>>
>>> Ok, unfortunately I have no clue what this means. But this message
>>> isn't there without card but it's pci-e?
>>
>> Ah, that's a very interesting discovery, man. To me it can mean the
>> motherboard is not correctly detecting the card, hence a BIOS issue.
>
> Ah yes, maybe it thinks it's a usb device?

Yes, sort of. It could be that the motherboard is having problems
addressing some resources provided by your RAID card.

> I have tried to check if I can see something in the mb BIOS to see if
> it can tell me anything about the connected hardware. But I didn't find
> anything in the PCI settings.

Mmm, have you tried setting a RAID level instead of using JBOD? It's just
for testing... although this can only be done at a very early stage, when
the disks are completely empty with no data on them, because I'm afraid
changing this will destroy whatever they contain.

>> Are you reaching the GRUB2 menu? If yes, you can select "recovery mode/
>> single-user mode".
>
> Ah ok. Yes I have tried that with both kernels in recovery mode but
> without luck.
> There are alot more messages with the last two of them the over-current
> messages :-o

If you could upload an image of the screen you get, it would be
great :-)

Greetings,

--
Camaleón



Henrique de Moraes Holschuh

May 19, 2012, 4:10:01 PM
On Sat, 19 May 2012, Ramon Hofer wrote:
> >> And after a while there are more messages which I don't understand. I
> >> have taken a picture:
> >> http://666kb.com/i/c3wf606sc1qkcvgoc.jpg
> >
> > It shows that udev is having serious trouble handling one of the USB
> > devices.
>
> Yes but only when the lsi card is attached. When it's removed the

Get a better PSU, and if that doesn't work, either junk the motherboard, or
give up on adding any cards that require a bit more power.

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh



Henrique de Moraes Holschuh

May 19, 2012, 4:10:02 PM
On Sat, 19 May 2012, Ramon Hofer wrote:
> On Sat, 19 May 2012 04:19:33 -0500, Stan Hoeppner wrote:
> > On 5/19/2012 2:52 AM, Ramon Hofer wrote:
> >> On Fri, 18 May 2012 17:57:56 -0500, Stan Hoeppner wrote:
> >>
> >>> On 5/18/2012 9:39 AM, Shane Johnson wrote:
> >>>
> >>>> After that I would look to see if
> >>>> something isn't shorting out a USB port.
> >>>
> >>> Yes, USB is the cause of the over-current errors, which is plainly
> >>> evident in his screen shot. But we don't yet know if this USB problem
> >>> is what's hanging the system. Further troubleshooting is required.
> >>
> >> The strange thing is as I mentioned in another post is that on the mb
> >> usb port 8 there's nothing attached and I haven't found where port 7 is
> >> :-?
> >
> > I wouldn't worry about the USB errors at this point. Unless there is
> > some larger issue with insufficient power on the motherboard causing the
> > USB current error, it's likely unrelated to the storage hardware issue.
> > Fix it first, then worry about the USB errors. Given you have no
> > device plugged into those ports, it could be a phantom error.
>
> Yes I hope you're right with the phantom error :-)
> Especially because I can't find port 7. No label on the mb pcb nor in
> it's documentation.

It might well mean one of the power planes is oversubscribed, and THAT
can cause anything up to and including damage to hard disks, data
corruption, and crashes.

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh



Stan Hoeppner

May 19, 2012, 4:40:03 PM
On 5/19/2012 5:33 AM, Ramon Hofer wrote:

> Yes, I'm really thankful for the recommendation.
> And somehow I hoped you could jump in and help me :-)

I'm actively working on it, have been for a couple of hours on and off.
I'm reading your responses as I go before responding so I hopefully
don't recommend something you've already tried. I'm still researching.
In the mean time, if you can, go ahead and flash the 9240 with the
latest firmware, precisely following the instructions.

Also try the following:

1. Power the Intel expander with a PSU 4 pin Molex connector instead of
using a PCIe slot. Molex are the large standard plugs, usually white,
used to connect hard drives for the past 25 years--two black wires, one
red, one yellow. With the chassis laying on your desk and the side/top
cover panel removed, lay the anti-static bag the expander shipped in on
top of the drive cage frame or PSU, then lay the expander card on its
back on top of the bag--heat sink facing the ceiling. Make sure it
doesn't fall off and ground out to the metal chassis or mobo, etc. This
will eliminate a possible PCIe power bug in the mobo.

2. With the expander powered directly from the PSU, try the 9240 in
each x16 slot until one works (I'm assuming you know that you must power
down the system before inserting/removing cards or you'll very likely
permanently damage the cards and/or mobo). If no success here...

3. Go into the mobo BIOS and set and test these options:

Quiet Boot: DISABLED
Interrupt 19 Capture: DISABLED
--save/reboot/test--
PCI Express Port: ENABLED
PEG Force Gen1: ENABLED
Detect Non-Compliance Device: ENABLED
--save/reboot/test--
XHCI Hand-off: ENABLED
Active State Power Management: ENABLED
PCIe (PCI Express) Max Read Request Size: 4096
--save/reboot/test--

If none of this works, disable both on board SATA controllers:

Serial-ATA Controller 0: DISABLED
Serial-ATA Controller 1: DISABLED

and connect all drives to the 9240, and re-enable
Interrupt 19 Capture: ENABLED

This will allow booting from the 9240. In the 9240 webBIOS, create a
RAID1 array device of two disks, make it bootable, save and initialize
the array. Reboot into the Squeeze install disk and install onto the
RAID1 device. The initialization should continue transparently in the
background while you're installing Debian. When finished reboot to see
if the boot hang persists.

Hopefully you won't need to do all of these things as it will be very
time consuming. I'm attempting to provide you a thorough
troubleshooting guide that covers most/all the possible/likely causes of
the hang.

> But I didn't know if it's ok to ask you by name.

I've been doing a "reply-to-all" with each reply, hoping you'd follow
suit. This list is very busy thus a reply-all ensures I won't miss your
posts.

Please feel free to address me by name and/or contact me directly off
list. I recommended this storage controller/expander solution to you
and it's not working yet. I'm not going to leave you twisting in the
wind. That's not how I roll. ;) Besides, look at my RHS domain. I
have a reputation to uphold. :)

--
Stan




Stan Hoeppner

May 20, 2012, 7:50:01 AM
On 5/19/2012 11:05 AM, Henrique de Moraes Holschuh wrote:
> On Sat, 19 May 2012, Ramon Hofer wrote:
>>>> And after a while there are more messages which I don't understand. I
>>>> have taken a picture:
>>>> http://666kb.com/i/c3wf606sc1qkcvgoc.jpg
>>>
>>> It shows that udev is having serious trouble handling one of the USB
>>> devices.
>>
>> Yes but only when the lsi card is attached. When it's removed the
>
> Get a better PSU, and if that doesn't work, either junk the motherboard, or
> give up on adding any cards that require a bit more power.

This is absolutely horrible advice. Any moderate horsepower PCIe x16
GPU card from nVidia or AMD is going to draw 4-10 times the current of
these SAS boards. Too much PCIe power draw isn't the issue here, unless
the mobo is possibly defective. I doubt this is the case. It's most
likely a firmware bug in the HBA or the system BIOS, or a driver bug in
2.6.32, or a combination of these. We should know after Ramon runs
through the task list I provided earlier.
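For the record, Ramon can confirm the kernel and megaraid_sas driver versions
in play from the working boot (card removed) with something like:

  uname -r
  modinfo -F version megaraid_sas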

--
Stan



Henrique de Moraes Holschuh

May 20, 2012, 12:00:02 PM
On Sun, 20 May 2012, Stan Hoeppner wrote:
> On 5/19/2012 11:05 AM, Henrique de Moraes Holschuh wrote:
> > On Sat, 19 May 2012, Ramon Hofer wrote:
> >>>> And after a while there are more messages which I don't understand. I
> >>>> have taken a picture:
> >>>> http://666kb.com/i/c3wf606sc1qkcvgoc.jpg
> >>>
> >>> It shows that udev is having serious trouble handling one of the USB
> >>> devices.
> >>
> >> Yes but only when the lsi card is attached. When it's removed the
> >
> > Get a better PSU, and if that doesn't work, either junk the motherboard, or
> > give up on adding any cards that require a bit more power.
>
> This is absolutely horrible advice. Any moderate horsepower PCIe x16

Well, yes. But mostly because I didn't add the proper "but first check if
you cannot supply extra power using MOLEX connectors". I apologise for that
one.

I *have* been through oversubscribed power rails due to el-cheap-o PSUs and
onboard (motherboard) voltage regulators before, as well as due to
undersized PSUs (in servers), and I've also been through overload scenarios
caused by bad memory modules, and a bad keyboard (which had developed low
resistance paths akin to very small short-circuits). The system goes
slightly insane, all sorts of weird defects show up, INCLUDING tripping the
overcurrent detector on the root USB hub due to +5V floating too much, etc.

> these SAS boards. Too much PCIe power draw isn't the issue here, unless
> the mobo is possibly defective. I doubt this is the case. It's most

Or the PSU can't supply enough power to whichever rail the onboard VRs are
using to supply the PCIe slots and the chipset (might not be the 3.3/5V
ones, some boards prefer to do it using the 12V rail and a DC-DC VR).

> likely a firmware bug in the HBA or the system BIOS, or a driver bug in
> 2.6.32, or a combination of these. We should know after Ramon runs
> through the task list I provided earlier.

AFAIK, the only kernel bug that could cause overcurrent misdetects is a
problem on interrupt sharing, which should not be possible in a modern board
where everything PCIe uses MSI/MSI-X (the Linux USB core is still incapable
of using MSI/MSI-X, at least up to kernel 3.2)... or memory corruption,
which is less deterministic.

Firmware bugs in SMM code can cause just about anything, but it seems
unlikely they'd mess with the overcurrent alarm report bits in the USB
chipset because of a disk controller.

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh



Ramon Hofer

May 20, 2012, 12:20:02 PM
On Sat, 19 May 2012 13:41:33 +0000, Camaleón wrote:

(...)

>> I have tried to check if I can see something in the mb BIOS to see if
>> it can tell me anything about the connected hardware. But I didn't find
>> anything in the PCI settings.
>
> Mmm, have you tried to set a RAID level instead using JBOD? It's just
> for testing... although this can only be done in a very early stage when
> the disks are completely empty with no data on them because I'm afraid
> changing this will destroy whatever contains.

I will play around this evening a bit.
Luckily my last attempt with the Supermicro HBAs wiped the disks already
so I have some disks to play with ;-)


>>> Are you reaching the GRUB2 menu? If yes, you can select "recovery
>>> mode/ single-user mode".
>>
>> Ah ok. Yes I have tried that with both kernels in recovery mode but
>> without luck.
There are a lot more messages, the last two of them being the over-current
messages :-o
>
> If you could upload an image with the screen you get, it would be great
> :-)

Here you go:
http://666kb.com/i/c3yd21ff71u88d4x8.jpg

But I could only get the last part when it stopped adding new lines. It
just was too fast to get anything before :-(


Best regards
Ramon


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/jpan47$87h$1...@dough.gmane.org

Ramon Hofer

unread,
May 20, 2012, 12:30:02 PM5/20/12
to
Thanks for the suggestion, Henrique!
The PSU is a 750 W so I think it should be enough for now.


Best regards
Ramon


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/jpan8g$87h$2...@dough.gmane.org

Camaleón

unread,
May 20, 2012, 1:30:04 PM5/20/12
to
On Sun, 20 May 2012 12:12:55 +0000, Ramon Hofer wrote:

> On Sat, 19 May 2012 13:41:33 +0000, Camaleón wrote:
>
>> Mmm, have you tried to set a RAID level instead using JBOD? It's just
>> for testing... although this can only be done in a very early stage
>> when the disks are completely empty with no data on them because I'm
>> afraid changing this will destroy whatever contains.
>
> I will play around this evening a bit. Luckily my last attempt with the
> Supermicro HBAs wipped the disks already so I have some disks to play
> with ;-)

Good :-)

Also, consider installing into the motherboard only the strictly required
devices to work (i.e., processor+heatsink, memory and a couple of hard
disks to test mdraid).

>> If you could upload an image with the screen you get, it would be great
>> :-)
>
> Here you go:
> http://666kb.com/i/c3yd21ff71u88d4x8.jpg

Thanks!

> But I could only get the last part when it stopped adding new lines. It
> just was too fast to get anything before :-(

Okay... I can't recall if you have already considered/tried disabling the
USB host controller in your BIOS.

Anyway, from the above messages it seems there are two USB hosts detected
("2-1" with 6 ports and "4-1" with 8 ports) and the latter is the one
reporting the over-current condition. Which OTOH is also weird because,
according to the motherboard specifications¹, there should be 2x USB 3.0
ports and 12x USB 2.0 ports. Something doesn't match here.

¹http://www.supermicro.com/products/motherboard/Core/P67/C7P67.cfm

Greetings,

--
Camaleón


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/jpar6c$u2v$1...@dough.gmane.org

Ramon Hofer

unread,
May 20, 2012, 2:30:02 PM5/20/12
to
On Sat, 19 May 2012 11:31:51 -0500, Stan Hoeppner wrote:

> On 5/19/2012 5:33 AM, Ramon Hofer wrote:
>
>> Yes, I'm really thankful for the recommendation. And somehow I hoped
>> you could jump in and help me :-)
>
> I'm actively working on it, have been for a couple of hours on and off.
> I'm reading your responses as I go before responding so I hopefully
> don't recommend something you've already tried. I'm still researching.
> In the mean time, if you can, go ahead and flash the 9240 with the
> latest firmware, precisely following the instructions.

Should I first flash the new firmware and then test what you describe
below?

I'm not quite sure I'm doing the flashing right. Here's what I would do:

1. Read the firmware readme file [1]

> Installation:
> =============
> Use MegaCLI to flash the SAS controllers. MegaCLI can be downloaded
> from the support and download section of www.lsi.com.
>
> Command syntax: MegaCli -adpfwflash -f imr_fw.rom -a0

So I download the MegaCLI from [2] and read the MegaCLI readme [3]:

> Installation Commands:
> ===================
> 1. Copy MegaCli.exe to a folder.
> 2. Run MegaCli from the Command Prompt. Use -h option to see help menu.

I create a FreeDOS USB stick with unetbootin, copy MegaCli.exe and the
imr_fw.rom [4] into a folder on the USB stick, boot it and run the above
command to flash the card?
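
So at the FreeDOS prompt it would be roughly this, assuming the stick
comes up as C: and I put the files into a folder called LSI (the folder
name is just my example):

  REM folder name below is only an example
  cd LSI
  MegaCli -adpfwflash -f imr_fw.rom -a0
  REM afterwards, check the reported firmware version
  MegaCli -adpallinfo -a0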


[1] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482.txt

[2] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/8.00.40_Dos_Megacli.zip

[3] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/README_FOR_8.00.40_Dos_Megacli.zip.txt

[4] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482.zip


(...)

>> But I didn't know if it's ok to ask you by name.
>
> I've been doing a "reply-to-all" with each reply, hoping you'd follow
> suit. This list is very busy thus a reply-all ensures I won't miss your
> posts.

I'm using pan to read the newsgroup, where there's no reply-to-all button.
But there's a mail-to field which I'm now testing :-)


> Please feel free to address me by name and/or contact me directly off
> list. I recommended this storage controller/expander solution to you
> and it's not working yet. I'm not going to leave you twisting in the
> wind. That's not how I roll. ;) Besides, look at my RHS domain. I
> have a reputation to uphold. :)

That's very kind!
I know that everyone here does what they do for free, so I don't want to
ask someone to spend his/her time on me. But of course I'm really
thankful for every single minute you spend helping me :-)


Best regards
Ramon


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/jpaurh$vas$1...@dough.gmane.org

Ramon Hofer

unread,
May 20, 2012, 6:20:01 PM5/20/12
to
On Sat, 19 May 2012 11:31:51 -0500, Stan Hoeppner wrote:

> On 5/19/2012 5:33 AM, Ramon Hofer wrote:
>
>> Yes, I'm really thankful for the recommendation. And somehow I hoped
>> you could jump in and help me :-)
>
> I'm actively working on it, have been for a couple of hours on and off.
> I'm reading your responses as I go before responding so I hopefully
> don't recommend something you've already tried. I'm still researching.
> In the mean time, if you can, go ahead and flash the 9240 with the
> latest firmware, precisely following the instructions.

There were no problems upgrading the fw :-)

Unfortunately it didn't solve the problem.

> Also try the following:
>
> 1. Power the Intel expander with a PSU 4 pin Molex connector instead of
> using a PCIe slot. Molex are the large standard plugs, usually white,
> used to connect hard drives for the past 25 years--two black wires, one
> red, one yellow. With the chassis laying on your desk and the side/top
> cover panel removed, lay the anti-static bag the expander shipped in on
> top of the drive cage frame or PSU, then lay the expander card on its
> back on top of the bag--heat sink facing the ceiling. Make sure it
> doesn't fall off and ground out to the metal chassis or mobo, etc. This
> will eliminate a possible PCIe power bug in the mobo.

Did that but again no improvement.
Over-current messages still present and boot process still not finished
properly.

But the over-current message is always present even with only mb, ram, cpu
and graphics card.

Btw this is a PCIe x1 ATI FireMV 2260 card. With it I have both PCIe x16
for the LSI and Intel cards available.


> 2. With the expander powered directly from the PSU, try the 9240 in
> each x16 slot until one works (I'm assuming you know that you must power
> down the system before inserting/removing cards or you'll very likely
> permanently damage the cards and/or mobo). If no success here...

No success with the hba in either of the two slots. I have also tried to
plug the graphics card to another slot.
And the expander was completely removed for these tests with no SAS cable
connected to the lsi card.


> 3. Go into the mobo BIOS and set and test these options:
>
> Quiet Boot: DISABLED
> Interrupt 19 Capture: DISABLED
> --save/reboot/test--
> PCI Express Port: ENABLED
> PEG Force Gen1: ENABLED
> Detect Non-Compliance Device: ENABLED
> --save/reboot/test--
> XHCI Hand-off: ENABLED
> Active State Power Management: ENABLED
> PCIe (PCI Express) Max Read Request Size: 4096
> --save/reboot/test--

None of this worked.


> If none of this works, disable both on board SATA controllers:
>
> Serial-ATA Controller 0: DISABLED
> Serial-ATA Controller 1: DISABLED
>
> and connect all drives to the 9240, and re-enable Interrupt 19 Capture: ENABLED
>
> This will allow booting from the 9240. In the 9240 webBIOS, create a
> RAID1 array device of two disks, make it bootable, save and initialize
> the array. Reboot into the Squeeze install disk and install onto the
> RAID1 device. The initialization should continue transparently in the
> background while you're installing Debian. When finished reboot to see
> if the boot hang persists.

I was able to set up a RAID1 in the WebBIOS and set the bootable option. But
I'm not sure if the setting was accepted. Then again, when I set the
bootable option once more the WebBIOS tells me the option is already set - so
it should be ok?

Unfortunately the Debian installer doesn't list the RAID1 storage
device :-?


> Hopefully you won't need to do all of these things as it will be very
> time consuming. I'm attempting to provide you a thorough
> troubleshooting guide that covers most/all the possible/likely causes of
> the hang.

Thank you very much for your help so far :-)


Best regards
Ramon


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/jpbc81$moh$1...@dough.gmane.org

Stan Hoeppner

unread,
May 21, 2012, 2:30:02 AM5/21/12
to
On 5/20/2012 9:24 AM, Ramon Hofer wrote:
> On Sat, 19 May 2012 11:31:51 -0500, Stan Hoeppner wrote:
>
>> On 5/19/2012 5:33 AM, Ramon Hofer wrote:
>>
>>> Yes, I'm really thankful for the recommendation. And somehow I hoped
>>> you could jump in and help me :-)
>>
>> I'm actively working on it, have been for a couple of hours on and off.
>> I'm reading your responses as I go before responding so I hopefully
>> don't recommend something you've already tried. I'm still researching.
>> In the mean time, if you can, go ahead and flash the 9240 with the
>> latest firmware, precisely following the instructions.
>
> Should I first flash the new firmware and then test what you describe
> below?

Flash the firmware, then try to boot the system from the drives
attached to the mobo SATA port, as you have been. If the system locks
as it did before, this will tell us the firmware update didn't solve the
problem. Given that the shipped FW was from 2010, I have high hopes the
new FW will fix this problem. I'm surprised your card shipped with a FW
that old. From what company did you purchase the 9240-4i? I'm
wondering if it may have been sitting on a shelf for a while.

> I am not very sure if I do the flashing right. Here's what I do:
>
> 1. Read the firmware readme file [1]
>
>> Installation:
>> =============
>> Use MegaCLI to flash the SAS controllers. MegaCLI can be downloaded
>> from the support and download section of www.lsi.com.
>>
>> Command syntax: MegaCli -adpfwflash -f imr_fw.rom -a0
>
> So I download the MegaCLI from [2] and read the MegaCLI readme [3]:
>
>> Installation Commands:
>> ===================
>> 1. Copy MegaCli.exe to a folder.
>> 2. Run MegaCli from the Command Prompt. Use -h option to see help menu.
>
> I create a FreeDOS USB stick with unetbooting. Copy MegaCli.exe and the
> imr_fw.rom [4] into a folder on the USB stick, boot it and run the above
> command to flash the card?

Yep.

>
> [1] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482.txt
>
> [2] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/8.00.40_Dos_Megacli.zip
>
> [3] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/README_FOR_8.00.40_Dos_Megacli.zip.txt
>
> [4] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482.zip
>
>
> (...)
>
>>> But I didn't know if it's ok to ask you by name.
>>
>> I've been doing a "reply-to-all" with each reply, hoping you'd follow
>> suit. This list is very busy thus a reply-all ensures I won't miss your
>> posts.
>
> I'm using pan to read the newsgroup where's no reply to all button. But
> there's a mail to field which I'm now testing :-)

Ahh, ok. I didn't realize some people read mailing lists via news groups.

When I reply-to-all, where does the copy end up that is sent to
ramon...@bluewin.ch? Surely you read your email in an MUA such as
ThunderBird or similar. You can reply-to-all from there.

>> Please feel free to address me by name and/or contact me directly off
>> list. I recommended this storage controller/expander solution to you
>> and it's not working yet. I'm not going to leave you twisting in the
>> wind. That's not how I roll. ;) Besides, look at my RHS domain. I
>> have a reputation to uphold. :)
>
> That's very kind!
> I know that everyone here does for free what they do. So I don't want to
> ask for someone to spend his/her time for me. But of course I'm really
> thankful for every single minute you spend to help me :-)

I'll be with you until it's fixed, working, or until we identify the
root cause. It's always possible that the HBA is defective in some way.
If neither the firmware update nor other measures fix the problem, you
may need to send the card back for replacement.

BTW, after you flash the FW, power off the machine and remove the Intel
Expander from its PCIe slot. Disconnect the 8087 cable from the 9240.
Then power up and see if the system boots from the mobo connected
drives. This will isolate the 9240 from the downstream SAS expander and
drives.

--
Stan


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/4FB9A809...@hardwarefreak.com

Henrique de Moraes Holschuh

unread,
May 21, 2012, 2:40:02 AM5/21/12
to
Yes, it is probably enough. You have to do a lot to overpower a *good* 750W
PSU (a crappy one, OTOH...).

You should still do all testing with the minimal hardware setup. From
experience, you also need to be able to test using no keyboard or a
different keyboard (and mouse)... USB is supposed to be safe from this crap
as it can detect overcurrent, but since it IS detecting overcurrent in your
case (be it a faulty alarm or not)...

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/2012052102...@khazad-dum.debian.net

Stan Hoeppner

unread,
May 21, 2012, 2:40:02 AM5/21/12
to
On 5/20/2012 1:13 PM, Ramon Hofer wrote:

> There were no problems upgrading the fw :-)
>
> Unfortunately it didn't solve the problem.

Grrr.

>> 3. Go into the mobo BIOS and set and test these options:
>>
>> Quiet Boot: DISABLED
>> Interrupt 19 Capture: DISABLED
>> --save/reboot/test--
>> PCI Express Port: ENABLED
>> PEG Force Gen1: ENABLED
>> Detect Non-Compliance Device: ENABLED
>> --save/reboot/test--
>> XHCI Hand-off: ENABLED
>> Active State Power Management: ENABLED
>> PCIe (PCI Express) Max Read Request Size: 4096
>> --save/reboot/test--
>
> None of this worked.

Grrr.

>
>> If none of this works, disable both on board SATA controllers:
>>
>> Serial-ATA Controller 0: DISABLED
>> Serial-ATA Controller 1: DISABLED
>>
>> and connect all drives to the 9240, and re-enable Interrupt 19 Capture: ENABLED
>>
>> This will allow booting from the 9240. In the 9240 webBIOS, create a
>> RAID1 array device of two disks, make it bootable, save and initialize
>> the array. Reboot into the Squeeze install disk and install onto the
>> RAID1 device. The initialization should continue transparently in the
>> background while you're installing Debian. When finished reboot to see
>> if the boot hang persists.
>
> I was able to set a RAID1 in the WebBIOS and set the bootable option. But
> I'm not sure if the setting was accepted. Even though when I set the
> bootable option again the WebBIOS tells me the option is already set - so
> it should be ok?
>
> Unfortunately the Debian installer doesn't list the RAID1 storage
> device :-?

Grrrr.

Does the mobo BIOS show the disk device? If not, does the 9240 BIOS
show the disk device, RAID level, and its size?

What we need to figure out is whether this is a BIOS problem at this
point or a Debian installer kernel driver problem.

>> Hopefully you won't need to do all of these things as it will be very
>> time consuming. I'm attempting to provide you a thorough
>> troubleshooting guide that covers most/all the possible/likely causes of
>> the hang.
>
> Thank you very much for your help so far :-)

Sorry it hasn't helped you make forward progress.

Did you already flash the C7P67 BIOS to the latest version? I can't recall.

--
Stan


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/4FB9AA5F...@hardwarefreak.com

Stan Hoeppner

unread,
May 21, 2012, 3:10:02 AM5/21/12
to
On 5/20/2012 1:13 PM, Ramon Hofer wrote:

> I was able to set a RAID1 in the WebBIOS and set the bootable option. But
> I'm not sure if the setting was accepted. Even though when I set the
> bootable option again the WebBIOS tells me the option is already set - so
> it should be ok?
>
> Unfortunately the Debian installer doesn't list the RAID1 storage
> device :-?

Are you using the very latest Squeeze installer ISO?

It's possible the driver in 2.6.32-5 used in the original Squeeze
installer doesn't work with the 9240. Support for the 9240 was added in
2.6.32-29:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=604083

Something else to try:

If the disks that were attached to the mobo SATA ports are still intact
with Squeeze installed, boot the system with those attached to the mobo
SATA but with the 9240 and expander removed from the system.

Once booted, upgrade the kernel:

$ aptitude -t squeeze-backports install linux-image-3.2.0-0.bpo.2-amd64
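
If squeeze-backports isn't in your sources yet, enable it first; something
along these lines should do it (the mirror path is from memory, adjust it
to your local mirror if needed):

  # add the squeeze-backports repository, then refresh the package lists
  echo "deb http://backports.debian.org/debian-backports squeeze-backports main" >> /etc/apt/sources.list
  aptitude update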

Shutdown, install the 9240 only, power up and see if it boots without
hanging. If it does, power down, plug in the expander, cables, drives,
etc, power up and see if Debian sees the RAID1 virtual disk, and the
JBOD drives, if any are present.

--
Stan


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/4FB9AFA9...@hardwarefreak.com

Ramon Hofer

unread,
May 21, 2012, 6:40:01 AM5/21/12
to
The PSU is a Thermaltake. I have two PSUs with less power. Maybe I should
try it with one of them?

I will try this evening with an old PS/2 keyboard. But it would surprise me
if this is the source of the problem because the USB transmitter for the
keyboard / mouse is used in another computer without problems and the
over-current messages are always related to ports 7 and 8. Using a
different USB port makes no difference...


Best regards


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/jpcnn1$6a2$1...@dough.gmane.org

Ramon Hofer

unread,
May 21, 2012, 6:50:02 AM5/21/12
to
On Sun, 20 May 2012 21:59:53 -0500, Stan Hoeppner wrote:

> On 5/20/2012 1:13 PM, Ramon Hofer wrote:
>
>> I was able to set a RAID1 in the WebBIOS and set the bootable option.
>> But I'm not sure if the setting was accepted. Even though when I set
>> the bootable option again the WebBIOS tells me the option is already
>> set - so it should be ok?
>>
>> Unfortunately the Debian installer doesn't list the RAID1 storage
>> device :-?
>
> Are you using the very latest Squeeze installer ISO?

I'm using the Netinst from Unetbootin. I can try another one this evening.


> It's possible the driver in 2.6.32-5 used in the original Squeeze
> installer doesn't work with the 9240. Support for the 9240 was added in
> 2.6.32-29:
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=604083
>
> Something else to try:
>
> If the disks that were attached to the mobo SATA ports are still intact
> with Sqeueeze installed, boot the system with those attached to the mobo
> SATA but with the 9240 and expander removed from the system.
>
> Once booted, upgrade the kernel:
>
> $ aptitude -t squeeze-backports install linux-image-3.2.0-0.bpo.2-amd64
>
> Shutdown, install the 9240 only, power up and see if it boots without
> hanging. If it does, power down, plug in the expander, cables, drives,
> etc, power up and see if Debian sees the RAID1 virtual disk, and the
> JBOD drives, if any are present.

I have done this already. I installed Squeeze with the Netinst ISO with
the LSI and expander attached.
Then, after the install, when I couldn't boot, I removed the LSI card (with
the expander still in the PCIe port but not connected to the LSI card),
installed the bpo kernel, installed the LSI card again, and it still hangs
at boot.


Best regards


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/jpcnv6$6a2$2...@dough.gmane.org

Ramon Hofer

unread,
May 21, 2012, 6:50:02 AM5/21/12
to
The LSI BIOS shows the RAID1 array with the correct size. But I couldn't
see the disks in the mb BIOS. Then again, I haven't really looked for
them, so I will check again this evening...


> What we need to figure out is whether this is a BIOS problem at this
> point or a Debian installer kernel driver problem.

This sounds like a plan :-)


>>> Hopefully you won't need to do all of these things as it will be very
>>> time consuming. I'm attempting to provide you a thorough
>>> troubleshooting guide that covers most/all the possible/likely causes
>>> of the hang.
>>
>> Thank you very much for your help so far :-)
>
> Sorry it hasn't helped you make forward progress.

Still, you help me by having good ideas. I would already have run out of
ideas...


> Did you already flash the C7P67 BIOS to the latest version? I can't
> recall.

No, I didn't touch the mb firmware.
I can do that this evening as well.


Best regards


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/jpcoav$6a2$3...@dough.gmane.org

Ramon Hofer

unread,
May 21, 2012, 12:50:02 PM5/21/12
to
On Sun, 20 May 2012 21:27:21 -0500
Stan Hoeppner <st...@hardwarefreak.com> wrote:

> On 5/20/2012 9:24 AM, Ramon Hofer wrote:
> > On Sat, 19 May 2012 11:31:51 -0500, Stan Hoeppner wrote:
> >
> >> On 5/19/2012 5:33 AM, Ramon Hofer wrote:
> >>
> >>> Yes, I'm really thankful for the recommendation. And somehow I
> >>> hoped you could jump in and help me :-)
> >>
> >> I'm actively working on it, have been for a couple of hours on and
> >> off. I'm reading your responses as I go before responding so I
> >> hopefully don't recommend something you've already tried. I'm
> >> still researching. In the mean time, if you can, go ahead and
> >> flash the 9240 with the latest firmware, precisely following the
> >> instructions.
> >
> > Should I first flash the new firmware and then test what you
> > describe below?
>
> Flash the firmware, then try to boot the system from the drives
> attached to the mobo SATA port, as you have been. If the system locks
> as it did before, this will tell us the firmware update didn't solve
> the problem. Given that the shipped FW was from 2010, I have high
> hopes the new FW will fix this problem. I'm surprised your card
> shipped with a FW that old. From what company did you purchase the
> 9240-4i? I'm wondering if it may have been sitting on a shelf for a
> while.

I purchased the card from http://www.techmania.ch/.
When I placed the order I asked them if I could come and pick it up
directly, and they told me that they don't have their own warehouse but
order the card directly from LSI.
This message was sent by Claws. Hope it works now :-)


> >> Please feel free to address me by name and/or contact me directly
> >> off list. I recommended this storage controller/expander solution
> >> to you and it's not working yet. I'm not going to leave you
> >> twisting in the wind. That's not how I roll. ;) Besides, look at
> >> my RHS domain. I have a reputation to uphold. :)
> >
> > That's very kind!
> > I know that everyone here does for free what they do. So I don't
> > want to ask for someone to spend his/her time for me. But of course
> > I'm really thankful for every single minute you spend to help me :-)
>
> I'll be with you until it's fixed, working, or until we identify the
> root cause. It's always possible that the HBA is defective in some
> way. If neither the firmware update nor other measures fix the
> problem, you may need to send the card back for replacement.

I really appreciate that. Thanks alot!


> BTW, after you flash the FW, power off the machine and remove the
> Intel Expander from its PCIe slot. Disconnect the 8087 cable from
> the 9240. Then power up and see if the system boots from the mobo
> connected drives. This will isolate the 9240 from the downstream SAS
> expander and drives.

This was my idea before so I did most of the tests without the expander
card...


Best regards
Ramon


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/20120521143835.125d9c15@nb-10114

Henrique de Moraes Holschuh

unread,
May 21, 2012, 3:30:02 PM5/21/12
to
On Mon, 21 May 2012, Ramon Hofer wrote:
> On Sun, 20 May 2012 23:35:58 -0300, Henrique de Moraes Holschuh wrote:
> >> Thanks for the suggestion, Henrique!
> >> The PSU is a 750 W so I think it should be enough for now.
> >
> > Yes, it is probably enough. You have to do a lot to overpower a *good*
> > 750W PSU (a crappy one, OTOH...).
> >
> > You should still do all testing with the minimal hardware setup. From
> > experience, you also need to be able to test using no keyboard or a
> > different keyboard (and mouse)... USB is supposed to be safe from this
> > crap as it can detect overcurrent, but since it IS detecting overcurrent
> > in your case (be it a faulty alarm or not)...
>
> The PSU is a Thermaltake. I have two PSUs with less power. Maybe I should
> try it with one of them?

Well, it is worth a try, Thermaltake are usually good PSUs, but still...

> I will try this evening with an old PS/2 keyboard. But it would surprise me

Please make sure not to hotplug a PS/2 device (mice/keyboards); they're
cold-plug only. Some motherboards and devices tolerate hotplugging, but it
is not safe to do so unless you're explicitly told in the documentation of
both devices that hotplugging is supported.

> if this is the source of the problem because the usb transmitter for the
> keyboard / mouse is used in another computer without problems and the
> over-current messages are always related to port 7 and 8. Using a
> different usb port makes no difference...

Yes, it's unlikely. But you have already exhausted all likely reasons,
anyway...

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/20120521152...@khazad-dum.debian.net

Stan Hoeppner

unread,
May 21, 2012, 10:40:03 PM5/21/12
to
On 5/21/2012 2:00 PM, Ramon Hofer wrote:

> From my /var/log/installer/syslog I think it uses 2.6.32-5-amd64.
> I have attached the three log files again (maybe you haven't seen them
> when I posted them to the list).
> There are two syslogs (the one from /var/log as well). And
> the /var/log/installer/hardware-summary.
>
> I have added the command I'd use to print the file in the first
> line.

This isn't going to make any difference as it locks up with the 3.2
backport kernel, which has a much newer LSI driver. So don't waste any
more time on the Debian installer.

> Maybe you can see something in there...

Nothing but the megasas errors and the 120 second timeouts. There's no
smoking gun present in the logs.

>>>> It's possible the driver in 2.6.32-5 used in the original Squeeze
>>>> installer doesn't work with the 9240. Support for the 9240 was
>>>> added in 2.6.32-29:
>>>>
>>>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=604083
>>>>
>>>> Something else to try:
>>>>
>>>> If the disks that were attached to the mobo SATA ports are still
>>>> intact with Squeeze installed, boot the system with those attached
>>>> to the mobo SATA but with the 9240 and expander removed from the
>>>> system.
>>>>
>>>> Once booted, upgrade the kernel:
>>>>
>>>> $ aptitude -t squeeze-backports install
>>>> linux-image-3.2.0-0.bpo.2-amd64
>>>>
>>>> Shutdown, install the 9240 only, power up and see if it boots
>>>> without hanging. If it does, power down, plug in the expander,
>>>> cables, drives, etc, power up and see if Debian sees the RAID1
>>>> virtual disk, and the JBOD drives, if any are present.
>>>
>>> I have done this already. I have installed Squeeze with the Netinst
>>> iso and the lsi and expander attached.
>>> Then after the install when I couldn't boot removed the lsi card
>>> (with the expander still in the pcie port but not connected to the
>>> lsi card). Installed bpo kernel installed the lsi card again and
>>> still it hangs at boot.
>>
>> Hmmmm.... this is very strange. Never seen anywhere near this much
>> trouble before installing an LSI HBA with Linux.
>
> I have not much luck with linux and hardware ;-)

That's odd given you have a top-shelf mobo and SAS HBA, SuperMicro and
LSI. The problem seems to be in the kernel at this point, though I'm
unable to find anything thus far via Google that provides a fix...

BTW, I noticed you stopped copying the Debian list. All of this
exchange needs to be archived for others, so I'm adding the list back to
the CC.

--
Stan


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/4FBAC22E...@hardwarefreak.com

Ramon Hofer

unread,
May 29, 2012, 12:10:02 PM5/29/12
to
On Sun, 20 May 2012 21:37:19 -0500
Stan Hoeppner <st...@hardwarefreak.com> wrote:

(...)

> Does the mobo BIOS show the disk device? If not, does the 9240 BIOS
> show the disk device, RAID level, and its size?
>
> What we need to figure out is whether this is a BIOS problem at this
> point or a Debian installer kernel driver problem.

I have finally found some time to work on the problem:

I set up a raid1 in the hba bios. I couldn't install onto it with the
supermicro mb.

Then I mounted the lsi hba into my old server with an Asus mb (can't
remember which one it is, I'll have to check it at home...). It (almost)
works like a charm.
The only issue is that I can't enter the hba BIOS when it's mounted in
the Asus mb. But when I put it back into the Supermicro mb I can access
it again. Very strange!
But apart from that I could install Debian onto the raid1. Then I set
the bios to use the disks as jbods and installed Debian again to a drive
directly attached to the mb sata controller.
With the original squeeze kernel the disks attached to the hba weren't
visible. But after updating to the bpo kernel I can fdisk them
separately and put them into a raid5 (in the end I want to apply the 500G
partition method Camaleón suggested).
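
For the record, this is roughly how I checked that the disks behind the
hba show up under the bpo kernel (just the standard tools):

  cat /proc/partitions              # every block device the kernel sees
  fdisk -l | grep '^Disk /dev/sd'   # one line per disk, with its size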


> Did you already flash the C7P67 BIOS to the latest version? I can't
> recall.

I have tried to do that but it was quite strange.
I created a FreeDOS USB stick with unetbootin and copied the files for
the update from Supermicro onto the stick. I did exactly what the
readmes told me. But when I did it the first time there was no output
from the flash process and the directory where the Supermicro files were
located on the stick was empty.
When I tried to do the procedure again it complained that I have to
first install version 1.

I will now bring it to my dealer who can do the BIOS update for me.

And I will write to Supermicro if they are aware of the issue.


Best regards
Ramon


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/20120529140927.10dde651@nb-10114

Stan Hoeppner

unread,
May 30, 2012, 1:50:01 AM5/30/12
to
On 5/29/2012 7:09 AM, Ramon Hofer wrote:
> On Sun, 20 May 2012 21:37:19 -0500
> Stan Hoeppner <st...@hardwarefreak.com> wrote:
>
> (...)
>
>> Does the mobo BIOS show the disk device? If not, does the 9240 BIOS
>> show the disk device, RAID level, and its size?
>>
>> What we need to figure out is whether this is a BIOS problem at this
>> point or a Debian installer kernel driver problem.
>
> I have finally found some time to work on the problem:
>
> I set up a raid1 in the hba bios. I couldn't install onto it with the
> supermicro mb.
>
> Then I mounted the lsi hba into my old server with an Asus mb (can't
> remember which one it is, must have to check it at home...). It (almost)
> works like a charm.
> The only issue is that I can't enter the hba BIOS when it's mounted in
> the Asus mb. But when I put it back into the Supermicro mb I can access
> it again. Very strange!

This behavior isn't strange. Just about every mobo BIOS has an option
to ignore or load option ROMs. On your SuperMicro board this is
controlled by the setting "AddOn ROM Display Mode" under the "Boot
Feature" menu. Your ASUS board likely has a similar feature that is
currently disabled, preventing the LSI option ROM from being loaded.

> But apart from that I could install Debian onto the raid1. Then I set

This was on the ASUS board correct? Were you able to boot the RAID1
device after install? If so this indeed would be strange as you should
not be able to boot from the HBA if its ROM isn't loaded.

> the bios to use the disks as jbods and installed Debian again to a drive
> directly attached to the mb sata controller.
> With the original squeeze kernel the disks attached to the hba weren't
> visible. But after updating to the bpo kernel I can fdisk them
> separately and put it into a raid5 (in the end I want to apply the 500G
> partition method Cameleon suggested).

This experience with the ASUS board leads me to wonder if disabling the
option ROM and INT19 on the SM board would allow everything to function
properly. Try that before you take the board to the dealer for
flashing. Assuming you've deleted any BIOS configured RAID devices in
the HBA BIOS already and all drives are configured for JBOD mode, drop
the HBA back into the SM board, go into the SM BIOS, set "PCI Slot X
Option ROM" to "DISABLED" where X is the number of the PCIe slot in
which the LSI HBA is inserted. Set "Interrupt 19 Capture" to
"DISABLED". Save settings and reboot.

You should now see the same behavior as on the ASUS, including the HBA
BIOS not showing up during the boot process. Which I'm thinking is the
key to it working on the ASUS as the ROM code is never resident. Thus
it is not causing problems with the kernel driver, which is apparently
assuming the 9240 series ROM will not be resident.

This loading of the option ROM code is what some would consider the
difference between "HBA RAID mode" and "HBA JBOD mode".

>> Did you already flash the C7P67 BIOS to the latest version? I can't
>> recall.
>
> I have tried to do that but it was quite strange.
> I created a freedos usb stick with unetbootin and copied the files for
> the update from supermicro into the stick. I did exactly what the
> readmes told me. But when I did it the first time there was no output
> of the flash process and the directory where the supermicro files were
> located on the stick was empty.
> When I tried to do the procedure again it complains that I have to
> first install version 1.

Unfortunately, flashing a mobo BIOS is still not always an uneventful or
routine process, even in 2012.

> I will now bring it to my dealer who can do the BIOS update for me.
>
> And I will write to Supermicro if they are aware of the issue.

Try what I mention above before doing either of these things.

Good luck.

--
Stan


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/4FC57CAC...@hardwarefreak.com

Camaleón

unread,
May 30, 2012, 3:00:03 PM5/30/12
to
On Tue, 29 May 2012 14:09:27 +0200, Ramon Hofer wrote:

>> Did you already flash the C7P67 BIOS to the latest version? I can't
>> recall.
>
> I have tried to do that but it was quite strange. I created a freedos
> usb stick with unetbootin and copied the files for the update from
> supermicro into the stick. I did exactly what the readmes told me. But
> when I did it the first time there was no output of the flash process
> and the directory where the supermicro files were located on the stick
> was empty. When I tried to do the procedure again it complains that I
> have to first install version 1.

BIOS revisions are not cumulative (or so it was when computers were
computers -in the true sense of the word- back in the good old days...),
that is, every version patches some aspects of the BIOS firmware code and
you need to apply whatever revision corrects your specific problem. It can
be normal for a BIOS revision to require a previous version to be
installed first, but you'd better ask Supermicro tech support to confirm
this point because every manufacturer has developed its own "tricks".

> I will now bring it to my dealer who can do the BIOS update for me.
>
> And I will write to Supermicro if they are aware of the issue.

Yes, contact them and they'll tell you how to proceed.

Greetings,

--
Camaleón


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/jq5ckb$l92$9...@dough.gmane.org

Ramon Hofer

unread,
May 30, 2012, 10:00:02 PM5/30/12
to
On Tue, 29 May 2012 20:49:32 -0500
Stan Hoeppner <st...@hardwarefreak.com> wrote:

> On 5/29/2012 7:09 AM, Ramon Hofer wrote:
> > On Sun, 20 May 2012 21:37:19 -0500
> > Stan Hoeppner <st...@hardwarefreak.com> wrote:
> >
> > (...)
> >
> >> Does the mobo BIOS show the disk device? If not, does the 9240
> >> BIOS show the disk device, RAID level, and its size?
> >>
> >> What we need to figure out is whether this is a BIOS problem at
> >> this point or a Debian installer kernel driver problem.
> >
> > I have finally found some time to work on the problem:
> >
> > I set up a raid1 in the hba bios. I couldn't install onto it with
> > the supermicro mb.
> >
> > Then I mounted the lsi hba into my old server with an Asus mb (can't
> > remember which one it is, must have to check it at home...). It
> > (almost) works like a charm.
> > The only issue is that I can't enter the hba BIOS when it's mounted
> > in the Asus mb. But when I put it back into the Supermicro mb I can
> > access it again. Very strange!
>
> This behavior isn't strange. Just about every mobo BIOS has an option
> to ignore or load option ROMs. On your SuperMicro board this is
> controlled by the setting "AddOn ROM Display Mode" under the "Boot
> Feature" menu. Your ASUS board likely has a similar feature that is
> currently disabled, preventing the LSI option ROM from being loaded.

Very interesting! I didn't know that.
The values I can choose for the "AddOn ROM Display Mode" are
"Keep current" and "Force Bios". I have chosen the Force Bios option.
And I have disabled the two options you describe below.
In the Supermicro the hba's init screen isn't displayed at all now.
On the other hand, in the Asus I saw the init screen where the attached
discs are listed; I just can't enter the configuration program with
ctrl+h although the message to press these keys is shown.

I'm now able to boot into the 2.6.32-5 kernel.
It takes quite a while until the megasas module is loaded, I suppose:
the over-current messages are shown for a while (~2 mins) and then it
boots normally to the login prompt.
When I leave it alone I get this message:

INFO: task scsi_scan_0:341 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.

After booting the first time this evening I installed the bpo 3.2
kernel.
When I try to reboot from the stable kernel the system hangs after the
message "Will now restart."

After a while the above message about the blocked task appears again.

The bpo 3.2 kernel seems to fail. The two over-current messages are
shown and then this message:
http://pastebin.com/raw.php?i=XqVunR9e


When I load the stable kernel it stops for a while again after the
over-current messages, then finally gets to the login prompt. After a
while I got this message:
http://pastebin.com/raw.php?i=w409KaFN


> > But apart from that I could install Debian onto the raid1. Then I
> > set
>
> This was on the ASUS board correct? Were you able to boot the RAID1
> device after install? If so this indeed would be strange as you
> should not be able to boot from the HBA if its ROM isn't loaded.

No, I wasn't able to boot the kernel installed on the RAID1. Grub was
loaded, but only because I had installed it to the disk directly attached
to the MB's SATA controller.
But when choosing the RAID1 kernel it stopped (can't remember the
message anymore). I thought I hadn't set the boot option for the raid1
in the hba bios properly.


> > the bios to use the disks as jbods and installed Debian gain to a
> > drive directly attached to the mb sata controller.
> > With the original squeeze kernel the disks attached to the hba
> > weren't visible. But after updating to the bpo kernel I can fdisk
> > them separately and put it into a raid5 (in the end I want to apply
> > the 500G partition method Cameleon suggested).
>
> This experience with the ASUS board leads me to wonder if disabling
> the option ROM and INT19 on the SM board would allow everything to
> function properly. Try that before you take the board to the dealer
> for flashing. Assuming you've deleted any BIOS configured RAID
> devices in the HBA BIOS already and all drives are configured for
> JBOD mode, drop the HBA back into the SM board, go into the SM BIOS,
> set "PCI Slot X Option ROM" to "DISABLED" where X is the number of
> the PCIe slot in which the LSI HBA is inserted. Set "Interrupt 19
> Capture" to "DISABLED". Save settings and reboot.
>
> You should now see the same behavior as on the ASUS, including the HBA
> BIOS not showing up during the boot process. Which I'm thinking is
> the key to it working on the ASUS as the ROM code is never resident.
> Thus it is not causing problems with kernel driver, which is
> apparently assuming the 9240 series ROM will not be resident.

Maybe I wasn't clear about that. The hba BIOS seems to be loaded in the
Asus as well, I just can't enter its settings with ctrl+h.

Does all of this tell us anything :-?


> This loading of the option ROM code is what some would consider the
> difference between "HBA RAID mode" and "HBA JBOD mode".

Well, then it seems that if I want to use Linux software RAID I'd
better keep the setting that disables loading of the option ROM :-/


> >> Did you already flash the C7P67 BIOS to the latest version? I
> >> can't recall.
> >
> > I have tried to do that but it was quite strange.
> > I created a freedos usb stick with unetbootin and copied the files
> > for the update from supermicro into the stick. I did exactly what
> > the readmes told me. But when I did it the first time there was no
> > output of the flash process and the directory where the supermicro
> > files were located on the stick was empty.
> > When I tried to do the procedure again it complains that I have to
> > first install version 1.
>
> Unfortunately flashing mobo BIOS is still not always an uneventful nor
> routine process, even in 2012.

Yes, I've had issues both times I tried to do that (now and about
a year ago with an Intel mainboard) :-(
Maybe this should tell me something ;-)


> > I will now bring it to my dealer who can do the BIOS update for me.
> >
> > And I will write to Supermicro if they are aware of the issue.
>
> Try what I mention above before doing either of these things.

I've already mailed both of them on Monday.

The dealer tells me to do everything on my own.

But Supermicro is very helpful. They described how to flash the bios
before they knew about the problem I have with the v1.10 that the BIOS
updater wants me to install first.
They even attached the zip. Unfortunately it wasn't complete (the
installer complained about a missing file).

They're also helping me to install v1.10 but again I can't find a .ROM
file which I should rename according to their instructions in the mail.
So I asked again this evening...

Hopefully I can flash v1.10 to the Supermicro tomorrow and then update
to the newest version.
Maybe then I'll already be able to boot :-)
Or I'll try the steps you described about a week ago again and keep the
option ROM loading off.
If this doesn't help either, I will try the newest firmware from lsi,
which was just released on May 21, 2012.

Is this a good idea or do you have better advice?


Best regards
Ramon


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/20120530235...@hoferr-x61s.hofer.rummelring

Stan Hoeppner

unread,
May 30, 2012, 10:40:01 PM5/30/12
to
I'd get the mobo and HBA BIOS to the latest revs. Then if it still
doesn't work, as I recommended earlier, you need to try another
non-Debian based distro to eliminate the possibility that Debian is
doing something goofy in their kernels. If neither the latest versions
of SuSE nor Fedora work, then it's clear you have an upstream kernel
issue, or a hardware issue. Either way, that gives you good information
to present to LSI Support when you contact them.

Ultimately, if anyone is to have the answer to this mystery, it will be
LSI, or upstream kernel devs, as you've performed pretty much all
possible troubleshooting steps of an end user. You may want to post a
brief description of the problem to the linux-scsi list. The guys who
wrote and maintain the upstream LSI Linux drivers are on that mailing list.
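
When you post there, include the basics up front so they can place the
hardware and kernel right away, e.g. the output of:

  uname -a                    # kernel version
  lspci -nn | grep -i lsi     # the controller's PCI IDs and revision
  dmesg | grep -i megasas     # the driver's init messages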

FWIW, LSI certifies the 9240-4i (all their boards actually) as
compatible with all point releases of Debian 5.x. They don't have a
compat doc later than Dec 2010 for this board series, so I'm not sure
what their support policy is for Debian 6.

--
Stan


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/4FC6A149...@hardwarefreak.com

Ramon Hofer

unread,
Jun 6, 2012, 2:40:03 PM6/6/12
to
It's me again.

After several unsuccessful tries to update the BIOS I brought it back to
my dealer to let him do it.
He now says that the mainboard is broken and I get my money back.

Now my question is should I go for the same mainboard again or what do
you recommend?
I suppose the LSI problem was due to the broken mainboard, but the
dealer also said that LSI doesn't list the C7P67 as a compatible board.

What I want to connect to the mainboard is:

2x PCIe x8 for the LSI and the expander
1x PCIe x1 for the graphics card
1-2x PCIe x1 for TeVii sat card(s)
1-2x PCI for PVR-500 analogue TV card(s)

It would be nice if it had a connector for the lan chassis LEDs :-)


Best regards
Ramon


On Wed, 30 May 2012 17:38:01 -0500
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/20120606163...@hoferr-x61s.hofer.rummelring

Stan Hoeppner

unread,
Jun 7, 2012, 7:40:01 AM6/7/12
to
On 6/6/2012 9:36 AM, Ramon Hofer wrote:
> It's me again.
>
> After several unsuccessful tries to update the BIOS I brought it back to
> my dealer to let him do it.
> He now says that the mainboard is broken and I get my money back.

Interesting...

> Now my question is should I go for the same mainboard again or what do
> you recommend?

What did I previously recommend in this regard? I don't recall
recommending you replace the mainboard/CPU.

> I suppose the LSI problem was due to the broken mainboard but the

Maybe, maybe not. Too early to tell.

> dealer also said that the LSI has the C7P67 not listed as a compatible
> board.

He's simply giving himself an excuse to not help you with your problem.
Ask him how many mobo+GPU combos he's sold that are "listed as
compatible". The answer will be "none, because nobody does that kind of
testing". Then ask how many don't work together. His answer will
likely be "none". This quashes his compatibility list excuse instantly.

You never did call LSI. When you finally do, as I suggested long ago,
they'll also tell you it's not listed. But they will then tell you it
doesn't matter, and that the two boards should work fine together.

There are over 10,000 different motherboards on the market. Nobody
tests against them all, not even close to 1/4th. And in the case of
LSI, they don't test against any board that is not marketed as "server"
or "workstation", as that is their target market. Your board is
neither. The fact that the C7P67 isn't listed has nothing to do with
whether it's compatible with the 9240. The fact it isn't listed is
simply that they chose not to test it because it's a desktop board.

> What I want to connect to the mainboard is:
>
> 2x PCIe x8 for the LSI and the expander

Again, you don't need a PCIe slot for the expander. If you're not
mechanically gifted and are unable to drill holes and screw it to your
chassis using standoffs, which is the preferred mounting type, simply
wrap it up in a non conductive material, such as bubble wrap, tape it
closed with a few wraps of electrical tape, and lay it where there's a
relatively empty space on the floor of the chassis, such as behind the
drive cage. This is ugly but will work, and free up a PCIe x4/x8 slot.
I wish you lived near me, as I'd come over and install the expander
correctly, on the chassis floor, wall, top panel, PSU housing, or drive
cage, in less than 30 minutes. And it would look like it was installed
at the case factory.

> 1x PCIe x1 for the graphics card
> 1-2x PCIe x1 for TeVii sat card(s)
> 1-2x PCI for PVR-500 analogue TV card(s)

I'd get another C7P67. There's no reason the LSI shouldn't work with
it. If it doesn't work off the bat with the replacement C7P67 then it's
certain we have a problem with the Debian kernel driver, or that the
wrong one is being loaded. You've not tried mpt2sas yet, only
megaraid_sas, which Debian loads automatically.
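
To see which driver is actually claiming the card, something like this
will show it (standard commands; the exact output will differ on your
system):

  lspci -nnk | grep -A 3 -i lsi            # look for the "Kernel driver in use:" line
  lsmod | grep -E 'megaraid_sas|mpt2sas'   # which of the two modules is loaded, if any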

> It would be nice if it had a connector for the lan chassis LEDs :-)

LAN chassis? I thought you had a 24 bay rack chassis? The drive cage
LEDs should be powered directly from the SAS/SATA pins on the back of
the drive, through the backplane. If not, then you've got a really
cheap 24 bay case. :(

--
Stan


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/4FD05941...@hardwarefreak.com

Ramon Hofer

unread,
Jun 10, 2012, 2:10:01 PM6/10/12
to
A situation update: I mounted the mobo with the CPU and RAM, attached the
PSU, the OS SATA disk, the LSI and expander, as well as the graphics
card. There are no disks attached to the expander because I put them
back into the old NAS and am backing up the data from the 1.5 TB disks
to it.

Then I installed Debian Squeeze AMD64 without problems. I don't have
the over-current error messages anymore :-)
But it still hangs at the same time as before.

I removed the LSI and installed the bpo kernel. Mounted the LSI again
and it stops again at the same place.

I tried the BIOS settings you described earlier. It didn't help too.

So I wanted to update the BIOS. I created a FreeDOS USB stick and
put the BIOS update files onto it. I got to the DOS prompt and ran the
command to install the BIOS (ami.bat ROM.FILE). The prompt was blocked
for some time (about 5-10 mins or even more). Then a message was shown
that the file couldn't be found.
The whole directory where I had put the BIOS update files was empty or
even deleted completely (I can't remember anymore).

I'll try it again later; maybe the Supermicro doesn't like my
FreeDOS USB stick. So I'll try it with the Windows program Supermicro
proposed [1] to create the USB stick.

If this doesn't help I'll contact LSI and if they want me to update the
BIOS I will ask my dealer again to do it. Probably they will have the
same problems and will have to send the mobo to Supermicro which will
take a month until I have it back :-/


[1]
http://www.softpedia.com/get/System/Boot-Manager-Disk/BootFlashDOS.shtml


On Fri, 08 Jun 2012 18:38:24 -0500
Stan Hoeppner <st...@hardwarefreak.com> wrote:

(...)

> I always do this when I build the system so I don't have to mess with
> it when I need to install more HBAs/cards later. It's a 10 minute
> operation for me so it's better done up front. In your case I
> understand the desire to wait until necessary. However, the better
> airflow alone makes it worth doing. Especially given that the
> heatsink on the 9240 needs good airflow. If it runs hot it might act
> goofy, such as slow data transfer speeds, lockups, etc.

Thanks again very much.
The air flow / cooling argument is very convincing. I hadn't thought
about that.

To mount the expander I'll probably have a month available until
the mobo is back ;-)


> > Yes and the fact that I didn't have any problems with the Asus
> > board. I could use LSI RAID1 to install Debian (couldn't boot
> > probably because the option RAM option of the Asus board was
> > disabled). I could also use the JBOD drives to set up a linux RAID.
> > But I didn't mention it before the throughput was very low (100
> > GB/s at the beginning and after some secs/min it went down to ~5
> > GB/s) when I copied recordings from a directly attached WD green 2
> > TB SATA disk to the linux RAID5 containing 4 JBOD drives attached
> > to the expander and the LSI.
> >
> > I hope this was a problem I caused and not the hardware :-/
>
> Too early to tell. You were probably copying through Gnome/KDE
> desktop. Could have been other stuff slow it down, or it could have
> been something to do with the Green drive. They are not known for
> high performance, and people have had lots of problems with them.

Probably the green drives.
I don't have a desktop environment installed on the server. It was done
using `rsync -Pha`.
But it could also be because I've split the RAM from the running server
to have some for the new server. That's why the old Asus server now has
only 2 GB RAM, and I mounted the other 2 GB stick in the Supermicro (but
when the disks are set up I'd like to put some more in).


(...)

> > Exactly and the Asus doesn't. So if you'd have told me to get another
> > mobo this would be an option I'd have liked to have :-)
> >
> > An other option I was thinking of was using the Asus board for the
> > new server and use the Supermicro for my new desktop. And not the
> > other way around as I had planned to do.
>
> That's entirely up to you. I have no advice here.
>
> Which Asus board is it again?

It was the P7P55D premium.

The only two problems I have with this board are that I'd have to find
the right BIOS settings to enable the LSI setup program (or whatever it
is called exactly) where one can set up the disks as JBOD / HW
RAID.

And that it doesn't have any chassis LAN LED connectors :-o
But this is absolutely not important...


(...)

> > Btw. I saw that the JBOD devices which are seen by Debian from the
> > LSI are e.g. /dev/sda1, /dev/sdb1. When I partition them I get
> > something like /dev/sda1.1, /dev/sda1.2, /dev/sdb1.1, /dev/sdb1.2
> > (I don't remember exactly if it's only a number behind the point
> > because I think it had a prefix containing one or two characters
> > before the number after the point).
>
> I'd have to see more of your system setup. This may be normal
> depending on how/when your mobo sata controller devices are
> enumerated.

Probably yes. I was just confused because it was not consistent with
how Debian names the "normal" drives and partitions.


> BTW, don't put partitions on your mdraid devices before creating the
> array.

Sorry I don't understand what you mean by "don't put partitions on your
mdraid devices before creating the array".
Is it wrong to partition the disks and then do "mdadm --create
--verbose /dev/md0 --auto=yes --level=6
--raid-devices=4 /dev/sda1.1 /dev/sdb1.1 /dev/sdc1.1 /dev/sdd1.1"?

Should I first create an empty array with "mdadm --create
--verbose /dev/md0 --auto=yes --level=6 --raid-devices=0"

And then add the partitions?


> You may be tempted if you have dissimilar size drives and want
> to use all capacity of each drive. I still say don't do it. You
> should always use identical drives for your arrays, whether md based
> or hardware based. Does md require this? No. But there are many
> many reasons to do so. But I'm not going to get into all of them
> here, now. Take it on faith for now. ;)

Hmm, that's a very hard decision.
You probably understand that I don't want to buy 20 3 TB drives now. And
still I want to be able to add some 3 TB drives in the future. But at
the moment I have four Samsung HD154UI (1.5 TB) and four WD20EARS (2
TB).
Actually I've just seen that the Samsungs are green drives as well.

The reason why I bought green drives is that the server provides
mythbackend, NAS, Logitech Media Server, etc.
So it doesn't have much to do, but it still should be ready all the
time (if I want to listen to music I don't want to have to power on the
Squeezebox radio, wait for it to trigger the server to start up, and
only be able to listen once the server has booted, which would probably
take >1 min).
So I thought the drives should manage themselves to save some power.

I understand that there may be timing problems. But do they make it
impossible?

What would you do if you were in my place?

Let's say I'd "throw away" these disks and go for 3 TB drives. At the
moment four in a RAID 6 array would be enough. So I'd have 6 TB
available.
Then I'd run out of space and want to upgrade with another disk.
That model will probably still be available then, but will it also be
when I have 19 disks and want to add the last one?
Just as an example to explain my worries ;-)


Cheers
Ramon


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/20120610160...@hoferr-x61s.hofer.rummelring

Ramon Hofer

unread,
Jun 12, 2012, 1:50:02 PM6/12/12
to
On Sun, 10 Jun 2012 17:30:08 -0500
Stan Hoeppner <st...@hardwarefreak.com> wrote:

> On 6/10/2012 9:00 AM, Ramon Hofer wrote:
> > A situation update: Mounted the mobo with the CPU and RAM, attached
> > the PSU, the OS SATA disk, the LSI and expander as well as the
> > graphics card. There are no disks attached to the expander because
> > I put them again into the old NAS and backing up the data from the
> > 1.5 TB disks to it.
> >
> > Then I installed Debian Squeeze AMD64 without problems. I don't have
> > the over-current error messages anymore :-)
> > But it still hangs at the same time as before.
>
> Try the Wheezy installer. Try OpenSuSE. Try Fedora. If any of these
> work without lockup we know the problem is Debian 6. However...

I didn't do this because the LSI worked with the Asus mobo and
Debian Squeeze, and because I couldn't install OpenSuSE or Fedora.
But I will give it another try...


> Please call LSI support before you attempt any additional
> BIOS/firmware updates.

I mailed them and got this answer:

"Unfortunately, the system board has not been qualified on the hardware
compatibility list for the LSI MegaRAID 9240 series controllers. There
could be any number of reasons for this, either it has not yet been
tested or did not pass testing, but the issue is likely an
incompatibility.

It sounds like the issue is related to the bootstrap, so to resolve
the issue you will either have to free up the option ROM space or
limit the number of devices during POST."

This is what you've already told me.
If I understand it right you already told me to try both: free up the
option ROM and limit the number of devices, right?


(...)

> > Thanks again very much.
> > The air flow / cooling argument is very convincing. I haven't
> > thought about that.
>
> Airflow is 80% of the reason the SAS and SATA specifications were
> created.

You've convinced me: I will mount the expander properly to the case :-)


> > It was the P7P55D premium.
> >
> > The only two problems I have with this board is that I'd have to
> > find the right BIOS settings to enable the LSI online setting
> > program (or how is it called exactly?) where one can set up the
> > disks as JBOD / HW RAID.
>
> I already told you how to do this with the C7P67. Read the P7P55D
> manual, BIOS section. There will be a similar parameter to load the
> BIOS ROMs of add in cards.

Ok, thanks!


> > Sorry I don't understand what you mean by "don't put partitions on
> > your mdraid devices before creating the array".
> > Is it wrong to partition the disks and then do "mdadm --create
> > --verbose /dev/md0 --auto=yes --level=6
> > --raid-devices=4 /dev/sda1.1 /dev/sdb1.1 /dev/sdc1.1 /dev/sdd1.1"?
> >
> > Should I first create an empty array with "mdadm --create
> > --verbose /dev/md0 --auto=yes --level=6 --raid-devices=0"
> >
> > And then add the partitions?
>
> Don't partition the drives before creating your md array. Don't
> create partitions on it afterward. Do not use any partitions at
> all. They are not needed. Create the array from the bare drive
> device names. After the array is created format it with your
> preferred filesystem, such as:
>
> ~$ mkfs.xfs /dev/md0

Ok understood. RAID arrays containing partitions are bad.


> > Hmm, that's a very hard decision.
> > You probably understand that I don't want to buy 20 3 TB drives
> > now. And still I want to be able to add some 3 TB drives in the
> > future. But at
>
> Most novices make the mistake of assuming they can only have one md
> RAID device on the system, and if they add disks in the future they
> need to stick them into that same md device. This is absolutely not
> true, and it's not a smart thing to do, especially if it's a parity
> array that requires a reshape, which takes dozens of hours.
> Instead...

Nono, I was aware that I can have several RAID arrays.
My initial plan was to use four disks with the same size and have
several RAID5 devices. But Cameleon from the debian list told me to not
use such big disks (>500 GB) because reshaping takes too long and
another failure during reshaping will kill the data. So she proposed to
use 500 GB partitions and RAID6 with them.

Is there some documentation why partitions aren't good to use?
I'd like to learn more :-)


> > the moment I have four Samsung HD154UI (1.5 TB) and four WD20EARS (2
> > TB).
>
> You create two 4 drive md RAID5 arrays, one composed of the four
> identical 1.5TB drives and the other composed of the four identical
> 2TB drives. Then concatenate the two arrays together into an md
> --linear array, similar to this:
>
> ~$ mdadm -C /dev/md1 -c 128 -n4 -l5 /dev/sd[abcd] <-- 2.0TB drives

May I ask what the -c 128 option means? The mdadm man page says that -c
is to specify the config file?


> ~$ mdadm -C /dev/md2 -c 128 -n4 -l5 /dev/sd[efgh] <-- 1.5TB drives
> ~$ mdadm -C /dev/md0 -n2 -l linear /dev/md[12]

This is very interesting. I didn't know that this is possible :-o
Does it work as well with hw RAID devices from the LSI card?
Since you tell me that RAIDs with partitions aren't wise I'm thinking
about creating hw RAID5 devices with four equally sized disks.

The -C option means that mdadm creates a new array with the
name /dev/md1.
Is it wise to use other names, e.g. /dev/md_2T, /dev/md_1T5
and /dev/md_main?

And is a linear raid array the same as RAID0?


> Then make a write aligned XFS filesystem on this linear device:
>
> ~$ mkfs.xfs -d agcount=11 su=131072,sw=3 /dev/md2

Are there similar options for jfs?
I decided to use jfs when I set up the old server because it's easier
to grow the filesystem.
But when I see the xfs_growfs below I'm not sure if xfs wouldn't be the
better choice. Especially because I read on Wikipedia that xfs is
integrated in the kernel and to use jfs one has to install additional
packages.

Btw it seems very complicated with all the allocation groups, stripe
units and stripe width.
How do you calculate these numbers?
And why do both arrays have a stripe width of 384 KB?


> The end result is a 10.5TB XFS filesystem that is correctly write
> stripe aligned to the 384KB stripe width of both arrays. This
> alignment prevents extra costly unaligned RMW operations (which
> happen every time you modify an existing file). XFS uses allocation
> groups for storing files and metadata and it writes to all AGs in
> parallel during concurrent access. Thus, even though the your
> spindles are separated into two different stripes instead of one
> large stripe, you still get the performance of 6 spindles. Two RAID
> 5 arrays actually give better performance, as you will have two md
> threads instead of one, allowing two CPU cores to do md work instead
> of only one with md RAID6.

Is it also true that I will get better performance with two hw RAID5
arrays?


> So now you've run out of space or nearly so, and need to add more.
> Simple. Using four new drives (so our array geometry remains the
> same), say 3TB models, you'd create another RAID5 array:
>
> ~$ mdadm -C /dev/md3 -c 128 -n4 -l5 /dev/sd[ijkl]
>
> Now we grow the linear array:
>
> ~$ mdadm --grow /dev/md0 --add /dev/md3
>
> And now we grow the XFS filesystem:
>
> ~$ xfs_growfs /your/current/xfs/mount_point
>
> Now your 10.5TB XFS filesystem is 19.5TB and has 9TB additional free
> space, with additional AGs, still aligned to the RAID stripe size of
> the md RAID arrays, which are all identical at 384KB. And unlike an
> md reshape of an 8 drive RAID6 array which can take over 36 hours,
> the XFS grow operation takes a few seconds. Creating the new 4 drive
> array will take much longer, but not nearly as long as a reshape of
> an 8 drive array involving 4 new drives.
>
> > Actually I've just seen that the Samsungs are green drives as well.
>
> I fear you may suffer more problems down the road using WDEARS drives
> in md RAID, or any green drives.

What if I lose a complete raid5 array which was part of the linear
raid array? Will I lose the whole content from the linear array as I
would with lvm?


> > The reason why I bought green drives is that the server provides
> > mythbackend, nas, logitech media server, etc.
> > So it doesn't have much to do but it still should be ready all the
> > time (if I want to listen to music I don't want to power the
> > squeezebox radio which triggers the server to start up and only
> > when it started I can listen to music which would probably take >1
> > min. So I thought the drives should manage themselves to save some
> > power.
>
> > I understand that there may be timing problems. But do they make it
> > impossible?
>
> Just make sure you don't have any daemons accessing directories on the
> md array(s) and you should be fine. IIRC the WD Green drives go to
> sleep automatically after 30 seconds of inactivity, and park the heads
> after something like <10 seconds. I'll personally never own WD Green
> drives due to their reputation for failure and anemic performance.

Thanks for the advice!

About the failures: that is exactly why I use raid5. And as I don't
need very high performance, the anemic speed doesn't matter for me.
But I understand that these drives are risky.
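
If I keep the green drives in the array I'll probably first try to tame
the head parking. Something along these lines might help, though whether
it actually does depends on the drive model (sdX is just a placeholder):

~$ hdparm -B /dev/sdX        # show the current APM level
~$ hdparm -B 255 /dev/sdX    # disable APM; on some drives this stops the aggressive head parking

For the WD Greens the vendor idle3/wdidle3 tool is apparently the more
reliable way to change the park timer, but I haven't tried it yet.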

I'm still aware that 3 TB raid5 rebuilds take long. Nevertheless I think
I will risk using normal (non-green) disks for the next expansion.

If I'm informed correctly there are not only green drives and normal
desktop drives but also server disks with a higher quality than
desktop disks.

But still I don't want to "waste" energy. Would the Seagate Barracuda
3TB disks be a better choice?


> > What would you do if you were in my place?
> >
> > Let's say I'd "throw away" these disks and go for 3 TB drives. At
> > the
>
> I wouldn't. 3TB drives take far too long to rebuild. It takes about
> 8 hours to rebuild one in a mirror pair, something like 30+ hours to
> rebuild a drive in a 6 drive RAID6. If a 3TB drive fails due to
> age/wear, and your drives are identical, the odds of having two more
> drive failures before the rebuild completes are relatively high. If
> this happens, you better have a big full backup handy. Due to this
> and other reasons, I prefer using drives of 1TB or less. My needs are
> different than yours, however--production servers vs home use.

My needs are probably *much* less demanding than yours.
Usually the array only has to serve read access to the files, plus the
occasional copy of bluray rips onto it. Most of the time the raid sits
around doing nothing. MythTV records almost all of the time, but to a
non-RAID disk.
So I hope with non-green 3 TB disks I can get some security from the
redundancy and still get a lot of disk space.


> > moment four in a RAID 6 array would be enough. So I'd have 6 TB
> > available.
>
> Never build a RAID6 array with less than 6 drives, as RAID10 on 4
> drives gives vastly superior performance vs a 5/6 drive RAID6, and
> rebuild times are drastically lower for mirrors. Rebuilding a RAID1
> 3TB drive takes about 8 hours, and you're looking at something North
> of 30 hours for a 6 drive RAID6 rebuild.

Because you told me that it's not good to use partitions I won't set up
raid6.
Instead I'll go for raid5 with 4 disks.


> > Then I'd run out of space and want to upgrade with another disk.
> > Probably it'll still be available but will it also be when I'll
> > have 19 disks and want to add the last one?
> > Just as an example to explain my worries ;-)
>
> Start with the 4 drive RAID5 arrays I mentioned above. You have a 20
> cage case, 4 rows of 5 cages each. Put each array in its own 4 cage
> row to keep things organized. You have two rows empty as you have 8
> drives currently. When you expand in the future, only expand 4
> drives at a time, using the instructions I provided above. You can
> expand two times. Using 2TB drives you'll get +6TB twice for a total
> of 22.5TB.

This was exactly what I had in mind in the first place. But the
suggestion from Cameleon was so tempting :-)


Btw I have another question:
Is it possible to attach the single (non raid) disk I now have in my old
server for the mythtv recordings to the LSI controller and still have
access to the content when it's configured as jbod?
Since these are only recordings it wouldn't be very bad if I lose
them, so I'd like to avoid backing this up.


Cheers
Ramon


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/20120612154...@hoferr-x61s.hofer.rummelring

Jeremy T. Bouse

unread,
Jun 12, 2012, 3:40:03 PM6/12/12
to
I don't know if the problem I experienced with an LSI MegaRAID SAS
9265-8i is the same as what you're experiencing, but I found that
the kernel Debian provides does not have a megaraid_sas.ko driver new
enough to support the card. I had to go to LSI's site and get the
updated driver from them. Unfortunately they only had a driver compiled
for the 5.0.x installer kernel version and the source didn't compile
cleanly against the 6.0.x kernel header packages. I ended up having to
patch the code to clean up the compile issues and was then able to
compile a megaraid_sas driver that I could use on a 6.0 install.


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/4FD7603E...@undergrid.net

Ramon Hofer

unread,
Jun 12, 2012, 5:20:02 PM6/12/12
to
On Sun, 10 Jun 2012 17:30:08 -0500
Stan Hoeppner <st...@hardwarefreak.com> wrote:

(...)

> You create two 4 drive md RAID5 arrays, one composed of the four
> identical 1.5TB drives and the other composed of the four identical
> 2TB drives. Then concatenate the two arrays together into an md
> --linear array, similar to this:
>
> ~$ mdadm -C /dev/md1 -c 128 -n4 -l5 /dev/sd[abcd] <-- 2.0TB drives
> ~$ mdadm -C /dev/md2 -c 128 -n4 -l5 /dev/sd[efgh] <-- 1.5TB drives
> ~$ mdadm -C /dev/md0 -n2 -l linear /dev/md[12]

Sorry I have another question to this procedure:

Can I move the raid5 from the old server, where it was attached over
SATA, to the LSI, and will mdadm still recognize the disks? Will the
disks' UUIDs be the same?

And when I have added the old raid5 which contains the data can I add
this to the linear array and still have the data or will it be lost?

Hmm, probably I have to create a raid5 with the four empty 2 TB disks
attached to the LSI. Then:

~$ mdadm -C /dev/md0 -n1 -l linear /dev/md1

Now I copy the content from the old raid5 with the four 1.5 TB disks to
the new linear md0.


Cheers
Ramon


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/20120612190...@hoferr-x61s.hofer.rummelring

Stan Hoeppner

unread,
Jun 12, 2012, 10:40:01 PM6/12/12
to
On 6/12/2012 8:40 AM, Ramon Hofer wrote:
> On Sun, 10 Jun 2012 17:30:08 -0500
> Stan Hoeppner <st...@hardwarefreak.com> wrote:

>> Try the Wheezy installer. Try OpenSuSE. Try Fedora. If any of these
>> work without lockup we know the problem is Debian 6. However...
>
> I didn't do this because the LSI worked with the Asus mobo and
> Debian Squeeze, and because I couldn't install OpenSuSE or Fedora.
> But I will give it another try...

Your problem may involve more than just the two variables. The problem
may be mobo+LSI+distro_kernel, not just mobo+LSI. This is why I
suggested trying to install other distros.

>> Please call LSI support before you attempt any additional
>> BIOS/firmware updates.

Note I stated "call". You're likely to get more/better
information/assistance speaking to a live person.

> It sounds like the issue is related to the bootstrap, so to resolve
> the issue you will either have to free up the option ROM space or
> limit the number of devices during POST."

This is incorrect advice, as it occurs with the LSI BIOS both enabled
and disabled. Apparently you didn't convey this in your email.

> This is what you've already told me.
> If I understand it right you already told me to try both: free up the
> option ROM and limit the number of devices, right?

No, this person is not talented. You only have one HBA with BIOS to
load. There should be plenty of free memory in the ROM pool area. This
is the case with any mobo. The LSI ROM is big, but not _that_ big as to
eat up all available space. Please don't ask me to explain how option
(i.e. add in card) ROMs are mapped into system memory. That information
is easily found on Wikipedia and in other places. My point here is that
the problem isn't related to insufficient space for mapping ROMs.

> You've convinced me: I will mount the expander properly to the case :-)

There are many SAS expanders that can only be mounted to the chassis,
such as this one:

http://www.hellotrade.com/astek-corporation/serial-attached-scsi-expanders-sas-expander-add-in-card.html

> Ok understood. RAID arrays containing partitions are bad.

Not necessarily. It depends on the system. In your system they'd serve
no purpose, and simply complicate your storage stack.

> Nono, I was aware that I can have several RAID arrays.
> My initial plan was to use four disks with the same size and have
> several RAID5 devices.

This is what you should do. I usually recommend RAID10 for many
reasons, but I'm guessing you need more than half of your raw storage
space. RAID10 eats 1/2 of your disks for redundancy. It also has the
best performance by far, and the lowest rebuild times by far. RAID5
eats 1 disk for redundancy, RAID6 eats 2. Both are very slow compared
to RAID10, and both have long rebuild times which increase severely as
the number of drives in the array increases. The drive rebuild time for
RAID10 is the same whether your array has 4 disks or 40 disks.

> But Cameleon from the debian list told me to not
> use such big disks (>500 GB) because reshaping takes too long and
> another failure during reshaping will kill the data. So she proposed to
> use 500 GB partitions and RAID6 with them.

I didn't read the post you refer to, but I'm guessing you misunderstood
what Camaleón stated, as such a thing is simply silly. Running multiple
md arrays on the same set of disks is also silly, and can be detrimental
to performance. For a deeper explanation of this see my recent posts to
the Linux-RAID list.

If you're more concerned with double drive failure during rebuild (not
RESHAPE as you stated) than usable space, make 4 drive RAID10 arrays or
4 drive RAID6s, again, without partitions, using the command examples I
provided as a guide.

> Is there some documentation why partitions aren't good to use?
> I'd like to learn more :-)

Building md arrays from partitions on disks is a means to an end. Do
you have an end that requires these means? If not, don't use
partitions. The biggest reason to NOT use partitions is misalignment on
advanced format drives. The partitioning utilities shipped with
Squeeze, AFAIK, don't do automatic alignment on AF drives.

If you misalign the partitions, RAID5/6 performance will drop by a
factor of 4, or more, during RMW operations, i.e. modifying a file or
directory metadata. The latter case is where you really take the
performance hit as metadata is modified so frequently. Creating md
arrays from bare AF disks avoids partition misalignment.

There have been dozens, maybe hundreds, of articles and blog posts
covering this issue, so I won't elaborate further.
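
If you ever do need partitions on an AF drive, something along these
lines should at least let you check and create them aligned (sdX and
the partition number are placeholders):

~$ parted /dev/sdX align-check optimal 1
~$ parted -a optimal /dev/sdX mkpart primary 1MiB 100%

But for your use case, just hand the bare drives to mdadm and the whole
issue goes away.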

>>> the moment I have four Samsung HD154UI (1.5 TB) and four WD20EARS (2
>>> TB).
>>
>> You create two 4 drive md RAID5 arrays, one composed of the four
>> identical 1.5TB drives and the other composed of the four identical
>> 2TB drives. Then concatenate the two arrays together into an md
>> --linear array, similar to this:
>>
>> ~$ mdadm -C /dev/md1 -c 128 -n4 -l5 /dev/sd[abcd] <-- 2.0TB drives
>
> May I ask what the -c 128 option means? The mdadm man page says that -c
> is to specify the config file?

Read further down in the "create, build, or grow" section. Here '-c' is
abbreviated '--chunk'.
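
In other words, the earlier command could just as well be written with
the long option, e.g.:

~$ mdadm -C /dev/md1 --chunk=128 -n4 -l5 /dev/sd[abcd]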

>
>> ~$ mdadm -C /dev/md2 -c 128 -n4 -l5 /dev/sd[efgh] <-- 1.5TB drives
>> ~$ mdadm -C /dev/md0 -n2 -l linear /dev/md[12]
>
> This is very interesting. I didn't know that this is possible :-o

It's called 'nested RAID' and it's quite common on large scale storage
systems (dozens to hundreds of drives) where any single array type isn't
suitable for such disk counts.

> Does it work as well with hw RAID devices from the LSI card?

Your LSI card is an HBA with full RAID functions. It is not however a
full blown RAID card--its ASIC is much lower performance and it has no
cache memory. For RAID1/10 it's probably a toss up at low disk counts
(4-8). At higher disk counts, or with parity RAID, md will be faster.
But given your target workloads you'll likely not notice a difference.

> Since you tell me that RAIDs with partitions aren't wise I'm thinking
> about creating hw RAID5 devices with four equally sized disks.

If your drives were enterprise units with ERC/TLER I'd say go for it.
However, you have 8 drives of the "green" persuasion. Hardware RAID
controllers love to kick drives and mark them as "bad" due to timeouts.
The WD Green drives in particular park heads at something like 6
seconds, and spin down the motors automatically at something like 30
seconds. When accessed, they'll exceed the HBA timeout period before
spinning up and responding, and get kicked from the array.

I recommended this card in response to your inquiry about a good HBA for
md RAID. My recommendation was that you use it in HBA mode, not RAID
mode. It's not going to work well, if at all, with these drives in RAID
mode. I thought we already discussed this. Maybe not.

> The -C option means that mdadm creates a new array with the
> name /dev/md1.

It creates it with the <raiddevice> name you specify. See above.

> Is it wise to use other names, e.g. /dev/md_2T, /dev/md_1T5
> and /dev/md_main?

The md device file names are mostly irrelevant. But I believe the names
are limited to 'md' and a minor number of 0-127. And in fact, I believe
the example i gave above may not work, as I said to create md1 and md2
before md0. mdadm may require the first array you create to be called md0.

Regardless, you're telling us you want to know which array has which
disks by its name. If you forget what md0/1/2/etc is made of, simply
run 'mdadm -D /dev/mdX'.
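
For a quick overview of all arrays and their members you can also just
look at /proc/mdstat, e.g.:

~$ cat /proc/mdstat
~$ mdadm -D /dev/md0     # full detail for one array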

> And is a linear raid array the same as RAID0?

No. Please see the Wikipedia mdadm page.
http://en.wikipedia.org/wiki/Mdadm

>> Then make a write aligned XFS filesystem on this linear device:
>>
>> ~$ mkfs.xfs -d agcount=11 su=131072,sw=3 /dev/md2
>
> Are there similar options for jfs?

Dunno. Never used it, as XFS is superior in every way. JFS hasn't seen a
feature release since 2004. It's been in bug fix only mode for 8 years
now. XFS has a development team of about 30 people working at all the
major Linux distros, SGI, and IBM, yes, IBM. It has seen constant
development since its initial release on IRIX in 1994 and port to Linux
in the early 2000s.

> I decided to use jfs when I set up the old server because it's easier
> to grow the filesystem.

Easier than what? EXT?

> But when I see the xfs_grow below I'm not sure if xfs wouldn't be the
> better choice.

It is, but for dozens of more reasons.

> Especially because I read in wikipedia that xfs is
> integrated in the kernel and to use jfs one has to install additional
> packages.

You must have misread something. The JFS driver was still in mainline
as of 3.2.6, and I'm sure it's still in 3.4 though I've not confirmed
it. So you can build JFS right into your kernel, or as a module. I'd
never use it, nor recommend it, I'm just squaring the record.

> Btw it seems very complicated with all the allocation groups, stripe
> units and stripe width.

Powerful flexibility is often accompanied by a steep learning curve.

> How do you calculate these number?

Beginning users don't. You use the defaults. You are confused right
now because I lifted the lid and you got a peek inside more advanced
configurations. Reading the '-d' section of 'man mkfs.xfs' tells you how
to calculate sunit/swidth, su/sw for different array types and chunk sizes.

Please read the following very carefully. IF you did not want a single
filesystem space across both 4 disk arrays, and the future 12 disks you
may install in that chassis, you CAN format each md array with its own
XFS filesystem using the defaults. In this case, mkfs.xfs will read the
md geometry and create the filesystem with all the correct
parameters--automatically. So there's nothing to calculate, no confusion.
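
That simple case is literally just, e.g.:

~$ mkfs.xfs /dev/md1     # su/sw are picked up from the md geometry automatically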

However, you don't want 2 or 6 separate filesystems mounted as something
like:

/data1
...
/data6

in your root directory. You want one big filesystem mounted in your
root as something like '/data' to create subdirs and put files in,
without worrying about how much space you have left in each of 6
filesystems/arrays. Correct?

The advanced configuration I previously gave you allows for one large
XFS across all your arrays. mkfs.xfs is not able to map out the complex
storage geometry of nested arrays automatically, which is why I lifted
the lid and showed you the advanced configuration.

With it you'll get a minimum filesystem bandwidth of ~300MB/s per single
file IO and a maximum of ~600MB/s with 2 or more parallel file IOs, with
two 4-drive arrays. Each additional 4 drive RAID5 array grown into the
md linear array and then into XFS will add ~300MB/s of parallel file
bandwidth, up to a maximum of ~1.5GB/s. This should far exceed your needs.

> And why do both arrays have a stripe width of 384 KB?

You already know the answer. You should anyway:

chunk size = 128KB
RAID level = 5
No. of disks = 4
(4-1) = 3 stripe spindles; 3 * 128KB = 384KB
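
Which is also where the su/sw values in the mkfs.xfs line come from.
For one of these 4 drive RAID5 arrays on its own it would be something
like:

~$ mkfs.xfs -d su=128k,sw=3 /dev/md1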

> Is it also true that I will get better performance with two hw RAID5
> arrays?

Assuming for a moment your drives will work in RAID mode with the 9240,
which they won't, the answer is no. Why? Your CPU cores are far faster
than the ASIC on the 9240, and the board has no battery backed cache RAM
to offload write barriers.

If you step up to one of the higher end full up RAID boards with BBWC,
and the required enterprise drives, then the answer would be yes up to
the 20 drives your chassis can hold. As you increase the drive count,
at some point md RAID will overtake any hardware RAID card, as the
533-800MHz single/dual core RAID ASIC just can't keep up with the cores
in the host CPU.

> What if I lose a complete raid5 array which was part of the linear
> raid array? Will I lose the whole content from the linear array as I
> would with lvm?

Answer1: Are you planning on losing an entire RAID5 array? Planning,
proper design, and proper sparing prevents this. If you lose a drive,
replace it and rebuild IMMEDIATELY. Keep a spare drive on hand, or
better yet in standby. Want to eliminate this scenario? Use RAID10 or
RAID6, and live with the lost drive space. And still replace/rebuild a
dead drive immediately.

Answer2: It depends. If this were to happen, XFS will automatically
unmount the filesystem. At that point you run xfs_repair. If the array
that died contained the superblock and AG0 you've probably lost
everything. If it did not, the repair may simply shrink the filesystem
and repair any damaged inodes, leaving you with whatever was stored on
the healthy RAID5 array.
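
If you ever end up there, run xfs_repair in no-modify mode first, with
the filesystem unmounted, to see what it would touch, e.g.:

~$ xfs_repair -n /dev/md0    # check only, change nothing
~$ xfs_repair /dev/md0       # actual repair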

> I'm still aware that 3 TB raid5 rebuilds take long.

3TB drive rebuilds take forever, period. As I mentioned, it takes ~8
hours to rebuild a mirror.

> Nevertheless I think
> I will risk using normal (non-green) disks for the next expansion.

What risk? Using 'normal' drives will tend to reduce RAID related green
drive problems.

> If I'm informed correctly there are not only green drives and normal
> desktop drives but also server disks with a higher quality than
> desktop disks.

Yes, and higher performance. They're called "enterprise" drives. There
are many enterprise models: 7.2K SATA/SAS, 10K SATA/SAS, 15K SAS, 2.5"
and 3.5"

> But still I don't want to "waste" energy.

Manufacturing a single drive consumes as much energy as 4 drives running
for 3 years. Green type drives tend to last half as long due to all the
stop/start cycles wearing out the spindle bearings. Do the math. The
net energy consumption of 'green' drives is therefore equal to or higher
than 'normal' drives. The only difference is that a greater amount of
power is consumed by the drive before you even buy it. The same
analysis is true of CFL bulbs. They consume more total energy through
their life cycle than incandescents.

> Would the Seagate Barracuda
> 3TB disks be a better choice?

Is your 10.5TB full already? You don't even have the system running yet...

> My needs are probably *much* less demanding than yours.
> Usually it only has to serve read access to the files, plus copying
> bluray rips to it. But most of the time it sits around doing nothing
> (the raid). MythTV records almost all of the time but to a non-RAID
> disk.
> So I hope with non-green 3 TB disks I can get some security from the
> redundancy and still get a lot of disk space.

If you have a good working UPS, good airflow (that case does), and
decent quality drives, you shouldn't have to worry much. I'm unsure of
the quality of the 3TB Barracuda, haven't read enough about it.

Are you planning on replacing all your current drives with 4x 3TB
drives? Or going with the linear over RAID5 architecture I recommended,
and adding 4x 3TB drives into the mix?

> This was exactly what I had in mind at the first place. But the
> suggestion from Cameleon was so tempting :-)

Cameleon helps many people with many Debian/Linux issues and is very
knowledgeable in many areas. But I don't recall anyone accusing her of
being a storage architect. ;)

> Btw I have another question:
> Is it possible to attach the single (non raid) disk I now have in my old
> server for the mythtv recordings to the LSI controller and still have
> access to the content when it's configured as jbod?
> Since these are only recordings it wouldn't be very bad if I lose
> them, so I'd like to avoid backing this up.

Drop it in a drive sled, plug it into the backplane, and find out. If
you configure it for JBOD the LSI shouldn't attempt writing any metadata
to it.

--
Stan



--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/4FD7C313...@hardwarefreak.com

Stan Hoeppner

unread,
Jun 12, 2012, 11:40:01 PM6/12/12
to
On 6/12/2012 12:05 PM, Ramon Hofer wrote:
> On Sun, 10 Jun 2012 17:30:08 -0500
> Stan Hoeppner <st...@hardwarefreak.com> wrote:
>
> (...)
>
>> You create two 4 drive md RAID5 arrays, one composed of the four
>> identical 1.5TB drives and the other composed of the four identical
>> 2TB drives. Then concatenate the two arrays together into an md
>> --linear array, similar to this:
>>
>> ~$ mdadm -C /dev/md1 -c 128 -n4 -l5 /dev/sd[abcd] <-- 2.0TB drives
>> ~$ mdadm -C /dev/md2 -c 128 -n4 -l5 /dev/sd[efgh] <-- 1.5TB drives
>> ~$ mdadm -C /dev/md0 -n2 -l linear /dev/md[12]
>
> Sorry I have another question to this procedure:
>
> Can I put the raid5 from the old server which was attached over sata
> to the LSI and mdadm will still recognize the disks? Will the disks
> uuids be the same?

Assuming you create it again with the same device order and parameters,
yes, it should work. You _need_ to ask for assistance with this on the
linux-raid list. I've never done it. People there have, and can
explain it much better. It is not a simple process for anyone who has
not done it, and is fraught with pitfalls.

> And when I have added the old raid5 which contains the data can I add
> this to the linear array and still have the data or will it be lost?

No, you cannot add it to the linear array without wiping it first.
Beyond that, if the drive count is not 4, or not RAID level 5, or the
other parameters are different, then you can't use it in the linear
array. As I already mentioned, all the RAID parameters but for disk
size must be the same.

> Hmm, probably I have to create a raid5 with the four empty 2 TB disks
> attached to the LSI. Then:
>
> ~$ mdadm -C /dev/md0 -n1 -l linear /dev/md1

WTF?

> Now I copy the content from the old raid5 with the four 1.5 TB disks to
> the new linear md0.

Shuffleboard... You didn't previously make clear that not all 8 disks
were freely available to build your stack from the ground up. The
instructions I gave you assumed that all 8 drives were clean. Now
you're attempting to modify the precise instructions I gave you and play
shuffleboard with your data and disks, attempting to migrate on the fly.

This may not have a good outcome. I guess you feel that you understand
this stuff and are confident in your ability at this point to effect the
outcome you desire. If things break badly, I'll try to assist, but I
make no promises WRT outcomes nor guarantee my availability.

--
Stan


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/4FD7D25...@hardwarefreak.com

Ramon Hofer

unread,
Jun 13, 2012, 6:10:02 PM6/13/12
to
On Tue, 12 Jun 2012 18:35:57 -0500
Stan Hoeppner <st...@hardwarefreak.com> wrote:

> > Hmm, probably I have to create a raid5 with the four empty 2 TB
> > disks attached to the LSI. Then:
> >
> > ~$ mdadm -C /dev/md0 -n1 -l linear /dev/md1
>
> WTF?

I also had to add --force to create the array with one raid5.
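
So the command ended up looking roughly like this:

~$ mdadm -C /dev/md0 --force -n1 -l linear /dev/md1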


> > Now I copy the content from the old raid5 with the four 1.5 TB
> > disks to the new linear md0.
>
> Shuffleboard... You didn't previously make clear that not all 8 disks
> were freely available to build your stack from the ground up. The
> instructions I gave you assumed that all 8 drives were clean. Now
> you're attempting to modify the precise instructions I gave you and
> play shuffleboard with your data and disks, attempting to migrate on
> the fly.

Sorry, I wrote this too in another thread :-(
I like taking some risk :-)

But since the old raid is only read, not written to, I wasn't afraid
of losing the data.


> This may not have a good outcome. I guess you feel that you
> understand this stuff and are confident in your ability at this point
> to effect the outcome you desire. If things break badly, I'll try to
> assist, but I make no promises WRT outcomes nor guarantee my
> availability.

I wanted to finally do something ;-)


Cheers
Ramon


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/20120613200...@hoferr-x61s.hofer.rummelring

Ramon Hofer

unread,
Jun 13, 2012, 7:30:02 PM6/13/12
to
On Tue, 12 Jun 2012 17:30:43 -0500
Stan Hoeppner <st...@hardwarefreak.com> wrote:

> On 6/12/2012 8:40 AM, Ramon Hofer wrote:
> > On Sun, 10 Jun 2012 17:30:08 -0500
> > Stan Hoeppner <st...@hardwarefreak.com> wrote:
>
> >> Try the Wheezy installer. Try OpenSuSE. Try Fedora. If any of
> >> these work without lockup we know the problem is Debian 6.
> >> However...
> >
> > I didn't do this because the LSI worked with the Asus mobo and
> > Debian Squeeze, and because I couldn't install OpenSuSE or Fedora.
> > But I will give it another try...
>
> Your problem may involve more than just the two variables. The
> problem may be mobo+LSI+distro_kernel, not just mobo+LSI. This is
> why I suggested trying to install other distros.

Aha, this is true - didn't think about this...


> >> Please call LSI support before you attempt any additional
> >> BIOS/firmware updates.
>
> Note I stated "call". You're likely to get more/better
> information/assistance speaking to a live person.

I didn't have enough confidence in my oral English :-(


> > It sounds like the issue is related to the bootstrap, so either to
> > resolve the issue you will have to free up the option ROM space or
> > limit the number of devices during POST."
>
> This is incorrect advice, as it occurs with the LSI BIOS both enabled
> and disabled. Apparently you didn't convey this in your email.

I will write it to them again.
But to be honest I think I'll leave the Supermicro and use it for my
Desktop.


(...)

> > Nono, I was aware that I can have several RAID arrays.
> > My initial plan was to use four disks with the same size and have
> > several RAID5 devices.
>
> This is what you should do. I usually recommend RAID10 for many
> reasons, but I'm guessing you need more than half of your raw storage
> space. RAID10 eats 1/2 of your disks for redundancy. It also has the
> best performance by far, and the lowest rebuild times by far. RAID5
> eats 1 disk for redundancy, RAID6 eats 2. Both are very slow compared
> to RAID10, and both have long rebuild times which increase severely as
> the number of drives in the array increases. The drive rebuild time
> for RAID10 is the same whether your array has 4 disks or 40 disks.

Yes, I think for me raid5 is sufficient. I don't need extreme
performance nor extreme security. I just hope that the raid5 setup will
be safe enough :-)


> If you're more concerned with double drive failure during rebuild (not
> RESHAPE as you stated) than usable space, make 4 drive RAID10 arrays
> or 4 drive RAID6s, again, without partitions, using the command
> examples I provided as a guide.

Well this is just multimedia data stored on this server. So if I lose
it, it won't kill me :-)


> > Is there some documentation why partitions aren't good to use?
> > I'd like to learn more :-)
>
> Building md arrays from partitions on disks is a means to an end. Do
> you have an end that requires these means? If not, don't use
> partitions. The biggest reason to NOT use partitions is misalignment
> on advanced format drives. The partitioning utilities shipped with
> Squeeze, AFAIK, don't do automatic alignment on AF drives.

Ok, I was just confused because most of the tutorials (or at least most of
the ones I found) use partitions over the whole disk...


> If you misalign the partitions, RAID5/6 performance will drop by a
> factor of 4, or more, during RMW operations, i.e. modifying a file or
> directory metadata. The latter case is where you really take the
> performance hit as metadata is modified so frequently. Creating md
> arrays from bare AF disks avoids partition misalignment.

So if I can make things simpler I'm happy :-)


> > Does it work as well with hw RAID devices from the LSI card?
>
> Your LSI card is an HBA with full RAID functions. It is not however a
> full blown RAID card--its ASIC is much lower performance and it has no
> cache memory. For RAID1/10 it's probably a toss up at low disk counts
> (4-8). At higher disk counts, or with parity RAID, md will be faster.
> But given your target workloads you'll likely not notice a difference.

You're right.
I just had the impression that you'd suggested that I use the hw raid
capability of the LSI at the beginning of this conversation.


> >> Then make a write aligned XFS filesystem on this linear device:
> >>
> >> ~$ mkfs.xfs -d agcount=11 su=131072,sw=3 /dev/md2
> >
> > Are there similar options for jfs?
>
> Dunno. Never used as XFS is superior in every way. JFS hasn't seen a
> feature release since 2004. It's been in bug fix only mode for 8
> years now. XFS has a development team of about 30 people working at
> all the major Linux distros, SGI, and IBM, yes, IBM. It has seen
> constant development since it's initial release on IRIX in 1994 and
> port to Linux in the early 2000s.

I must have read outdated wikis (mostly from the mythtv project).


> > Especially because I read in wikipedia that xfs is
> > integrated in the kernel and to use jfs one has to install
> > additional packages.
>
> You must have misread something. The JFS driver was still in mainline
> as of 3.2.6, and I'm sure it's still in 3.4 though I've not confirmed
> it. So you can build JFS right into your kernel, or as a module. I'd
> never use it, nor recommend it, I'm just squaring the record.

I found this information in the german wikipedia
(http://de.wikipedia.org/wiki/XFS_%28Dateisystem%29):

"... Seit Kernel-Version 2.6 ist es offizieller Bestandteil des
Kernels. ..."

Translated: Since kernel version 2.6 it's an official part of the
kernel.

Maybe I misunderstood what the writer meant, or maybe what they wrote
is simply wrong in the first place :-?


> > Btw it seems very complicated with all the allocation groups, stripe
> > units and stripe width.
>
> Powerful flexibility is often accompanied by a steep learning curve.

True :-)


> > How do you calculate these number?
>
> Beginning users don't. You use the defaults. You are confused right
> now because I lifted the lid and you got a peek inside more advanced
> configuations. Reading the '-d' section of 'man mkfs.xfs' tells you
> how to calculate sunit/swidth, su/sw for different array types and
> chunk sizes.

Ok, if I read it right it divides the array into 11 allocation groups,
with a stripe unit of 131072 bytes and a stripe width of 3 stripe units.
But how do you know what numbers to use?
Maybe I didn't read the man page carefully enough, in which case I'd
like to apologize :-)


> Please read the following very carefully. IF you did not want a
> single filesystem space across both 4 disk arrays, and the future 12
> disks you may install in that chassis, you CAN format each md array
> with its own XFS filesystem using the defaults. In this case,
> mkfs.xfs will read the md geometry and create the array with all the
> correct parameters--automatically. So there's nothing to calculate,
> no confusion.
>
> However, you don't want 2 or 6 separate filesystems mounted as
> something like:
>
> /data1
> ...
> /data6
>
> in your root directory. You want one big filesystem mounted in your
> root as something like '/data' to create subdirs and put files in,
> without worrying about how much space you have left in each of 6
> filesystems/arrays. Correct?

Yes, this is very handy :-)


> The advanced configuration I previously gave you allows for one large
> XFS across all your arrays. mkfs.xfs is not able to map out the
> complex storage geometry of nested arrays automatically, which is why
> I lifted the lid and showed you the advanced configuration.

Ok, this is very nice!
But will it also work for any disk size (1.5, 2 and 3 TB drives)?


> With it you'll get a minimum filesystem bandwidth of ~300MB/s per
> single file IO and a maximum of ~600MB/s with 2 or more parallel file
> IOs, with two 4-drive arrays. Each additional 4 drive RAID5 array
> grown into the md linear array and then into XFS will add ~300MB/s of
> parallel file bandwidth, up to a maximum of ~1.5GB/s. This should
> far exceed your needs.

This really is enough for my needs :-)

> > And why do both arrays have a stripe width of 384 KB?
>
> You already know the answer. You should anyway:
>
> chunk size = 128KB

This is what I don't know.
Is this a characteristic of the disk?


> RAID level = 5
> No. of disks = 4
> ((4-1)=3)) * 128KB = 384KB

This I can follow.


> > Is it also true that I will get better performance with two hw RAID5
> > arrays?
>
> Assuming for a moment your drives will work in RAID mode with the
> 9240, which they won't, the answer is no. Why? Your CPU cores are
> far faster than the ASIC on the 9240, and the board has no battery
> backed cache RAM to offload write barriers.
>
> If you step up to one of the higher end full up RAID boards with BBWC,
> and the required enterprise drives, then the answer would be yes up to
> the 20 drives your chassis can hold. As you increase the drive
> count, at some point md RAID will overtake any hardware RAID card, as
> the 533-800MHz single/dual core RAID ASIC just can't keep up with the
> cores in the host CPU.

Very interesting!


> > What if I lose a complete raid5 array which was part of the linear
> > raid array? Will I lose the whole content from the linear array as
> > I would with lvm?
>
> Answer1: Are you planning on losing an entire RAID5 array? Planning,
> proper design, and proper sparing prevents this. If you lose a drive,
> replace it and rebuild IMMEDIATELY. Keep a spare drive on hand, or
> better yet in standby. Want to eliminate this scenario? Use RAID10
> or RAID6, and live with the lost drive space. And still
> replace/rebuild a dead drive immediately.
>
> Answer2: It depends. If this were to happen, XFS will automatically
> unmount the filesystem. At that point you run xfs_repair. If the
> array that died contained the superblock and AG0 you've probably lost
> everything. If it did not, the repair may simply shrink the
> filesystem and repair any damaged inodes, leaving you with whatever
> was stored on the healthy RAID5 array.

This sounds suitable for my needs.
Just another question: will the linear raid distribute the data across
the underlying raid5 arrays?
Or will it fill up the first one and continue with the second, and so on?


> > I'm still aware that 3 TB raid5 rebuilds take long.
>
> 3TB drive rebuilds take forever, period. As I mentioned, it takes ~8
> hours to rebuild a mirror.
>
> > Nevertheless I think
> > I will risk using normal (non-green) disks for the next expansion.
>
> What risk? Using 'normal' drives will tend to reduce RAID related
> green drive problems.

Ok, I will use normal drives in the future and hope that the green
drives won't give up at the same time :-/


> > If I'm informed correctly there are not only green drives and normal
> > desktop drives but also server disks with a higher quality than
> > desktop disks.
>
> Yes, and higher performance. They're called "enterprise" drives.
> There are many enterprise models: 7.2K SATA/SAS, 10K SATA/SAS, 15K
> SAS, 2.5" and 3.5"
>
> > But still I don't want to "waste" energy.
>
> Manufacturing a single drive consumes as much energy as 4 drives
> running for 3 years. Green type drives tend to last half as long due
> to all the stop/start cycles wearing out the spindle bearings. Do
> the math. The net energy consumption of 'green' drives is therefore
> equal to or higher than 'normal' drives. The only difference is that
> a greater amount of power is consumed by the drive before you even
> buy it. The same analysis is true of CFL bulbs. They consume more
> total energy through their life cycle than incandescents.

Hmm, I knew that for hybrid cars but never thought about this for
hdds.


> > Would the Seagate Barracuda
> > 3TB disks be a better choice?
>
> Is your 10.5TB full already? You don't even have the system running
> yet...

No, but I like living in the future ;-)


> > My needs are probably *much* less demanding than yours.
> > Usually it only has to serve read access to the files, plus copying
> > bluray rips to it. But most of the time it sits around doing
> > nothing (the raid). MythTV records almost all of the time but to a
> > non-RAID disk.
> > So I hope with non-green 3 TB disks I can get some security from the
> > redundancy and still get a lot of disk space.
>
> If you have a good working UPS, good airflow (that case does), and
> decent quality drives, you shouldn't have to worry much. I'm unsure
> of the quality of the 3TB Barracuda, haven't read enough about it.
>
> Are you planning on replacing all your current drives with 4x 3TB
> drives? Or going with the linear over RAID5 architecture I
> recommended, and adding 4x 3TB drives into the mix?

I'm planning to keep the drives I have now and add 4x 3TB into the mix.


> > This was exactly what I had in mind at the first place. But the
> > suggestion from Cameleon was so tempting :-)
>
> Cameleon helps many people with many Debian/Linux issues and is very
> knowledgeable in many areas. But I don't recall anyone accusing her
> of being a storage architect. ;)

Her suggestion seemed very tempting because it would give me a raid6
without having to lose too much storage space.
She really knows a lot, so I was just happy with her suggesting this
setup to me.


> > Btw I have another question:
> > Is it possible to attach the single (non raid) disk I now have in
> > my old server for the mythtv recordings to the LSI controller and
> > still have access to the content when it's configured as jbod?
> > Since these are only recordings it wouldn't be very bad if I lose
> > them, so I'd like to avoid backing this up.
>
> Drop it in a drive sled, plug it into the backplane, and find out. If
> you configure it for JBOD the LSI shouldn't attempt writing any
> metadata to it.

Ok, thanks I will do that :-)


Again thanks a lot for all your help and your patience with me.
Certainly not always easy ;-)


Cheers
Ramon


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/20120613212...@hoferr-x61s.hofer.rummelring

Stan Hoeppner

unread,
Jun 14, 2012, 8:40:02 AM6/14/12
to
On 6/13/2012 2:22 PM, Ramon Hofer wrote:
> On Tue, 12 Jun 2012 17:30:43 -0500
> Stan Hoeppner <st...@hardwarefreak.com> wrote:

This chain is so long I'm going to liberally snip lots of stuff already
covered. Hope that's ok.

>> Note I stated "call". You're likely to get more/better
>> information/assistance speaking to a live person.
>
> I didn't have enough confidence in my oral english :-(

Understood. Didn't realize that could be an issue. Apologies for my
'cultural insensitivity'. ;)

>> This is incorrect advice, as it occurs with the LSI BIOS both enabled
>> and disabled. Apparently you didn't convey this in your email.
>
> I will write it to them again.
> But to be honest I think I'll leave the Supermicro and use it for my
> Desktop.

If you're happy with an Asus+LSI server and SuperMicro PC, and it all
works the way you want, I'd not bother with further troubleshooting either.

>> Building md arrays from partitions on disks is a means to an end. Do
>> you have an end that requires these means? If not, don't use
>> partitions. The biggest reason to NOT use partitions is misalignment
>> on advanced format drives. The partitioning utilities shipped with
>> Squeeze, AFAIK, don't do automatic alignment on AF drives.
>
> Ok, I was just confused because most the tutorials (or at least most of
> the ones I found) use partitions over the whole disk...

Most of the md tutorials were written long before AF drives became
widespread, which has been a relatively recent phenomenon, the last 2
years or so.

It seems md atop partitions is recommended by two classes of users:

1. Ultra cheap bastards who buy "drive of the week".
2. Those who want to boot from disks in an md array

I'd rather not fully explain this due to space. If you reread your
tutorials and other ones, you'll start to understand.

>> If you misalign the partitions, RAID5/6 performance will drop by a
>> factor of 4, or more, during RMW operations, i.e. modifying a file or
>> directory metadata. The latter case is where you really take the
>> performance hit as metadata is modified so frequently. Creating md
>> arrays from bare AF disks avoids partition misalignment.
>
> So if I can make things simpler I'm happy :-)

Simpler is not always better, but it is most of the time.

The only caveat to using md on bare drives is that all members should
ideally be of identical size. If they're not, md takes the sector count
of the smallest drive and uses that number of sectors on all the others.
If you try to add a drive later whose sector count is less, it won't
work. "Drive of the week" buyer applies here. ;)

More savvy users don't add drives to and reshape their arrays. They add
an entire new array, add it to an existing umbrella linear array, then
grow their XFS filesystem over it. There is zero downtime or degraded
access to current data with this method. Reshaping runs for a day or
more and data access, especially writes, is horribly slow during the
process.
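
In other words, the same three step recipe from earlier in the thread:
create the new array, grow the linear device, then grow the filesystem:

~$ mdadm -C /dev/md3 -c 128 -n4 -l5 /dev/sd[ijkl]
~$ mdadm --grow /dev/md0 --add /dev/md3
~$ xfs_growfs /your/current/xfs/mount_point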

Misguided souls who measure their array performance exclusively with
single stream 'dd' reads instead of real workload will balk at this
approach. They're also the crowd that promotes using md over
partitions. ;)

> You're right.
> I just had the impression that you'd suggested that I'd use the hw raid
> capability of the lsi at the beginning of this conversation.

I did. And if you could, you should. And you did HW RAID with the SM
board, but the Debian kernel locks up. With the Asus board you can't
seem get into the HBA BIOS to configure HW RAID. So it's really not an
option now. The main reason for it is automatic rebuild on failure.
But since you don't have dedicated spare drives that advantage goes out
the window. So md RAID is fine.

> I must have read outdated wikis (mostly from the mythtv project).

Trust NASA more than MythTV users? From:
http://www.nas.nasa.gov/hecc/resources/columbia.html

Storage
Online: DataDirect Networks® and LSI® RAID, 800 TB (raw)
...
Local SGI XFS

That 800TB is carved up into a handful of multi-hundred TB XFS
filesystems. It's mostly used for scratch space during sim runs. They
have a multi-petabyte CXFS filesystem for site wide archival storage.
NASA is but one of many sites with multi-hundred TB XFS filesystems
spanning hundreds of disk drives.

IBM unofficially abandoned JFS on Linux, which is why it hasn't seen a
feature release since 2004. Enhanced JFS, called JFS2, is proprietary,
and is only available on IBM pSeries servers.

MythTV users running JFS are simply unaware of these facts, and use JFS
because it still works for them, and that's great. Choice and freedom
are good things. But if they're stating it's better than XFS they're
hitting the crack pipe too often. ;)

> Translated: Since kernel version 2.6 it's an official part of the
> kernel.
>
> Maybe I misunderstood this sentence in what the writer meant or maybe
> it's even wrong what they wrote in the first place :-?

What they wrote is correct. JFS has been in Linux mainline since the
release of Linux 2.6, which was ... December 2003, 8.5 years ago. Then
IBM abandoned Linux JFS not long after.

> Ok if I read it right it divides the array into 11 allocation groups,
> with 131072 byte blocks and 3 stripe units as stripe width.
> But where do you know what numbers to use?
> Maybe I didn't read the man carefully enough then I'd like to
> appologize :-)

'man mkfs.xfs' won't tell you how to calculate how many AGs you need.
mkfs.xfs creates agcount and agsize automatically using an internal
formula unless you manually specify valid values. Though I can tell you
how it works. Note: the current max agsize=1TB

1. Defaults to 4 AGs if the device is < 4TB and not a single level
md striped array. This is done with single disks, linear arrays,
hardware RAIDs, SANs. Linux/XFS have no standard interface to
query hardware RAID device parms. There's been talk of an
industry standard interface but no publication/implementation.
So for hardware RAID you may need to set some parms manually for best
performance. You can always use mkfs.xfs defaults and it will work.
You simply don't get all the performance of the hardware.

2. If device is a single level md striped array, AGs=16, unless the
device size is > 16TB. In that case AGs=device_size/1TB.

3. What 'man mkfs.xfs' does tell you is how to manually configure the
stripe parms. It's easy. You match the underlying RAID parms.
E.g. 16 drive RAID 10 with 64KB chunk. RAID 10 has n/2 stripe
spindles. 16/2 = 8

~$ mkfs.xfs -d su=64k,sw=8 /dev/sda

E.g. 8 drive RAID6 with 128KB chunk. RAID6 has n-2 stripe
spindles. 8-2 = 6

~$ mkfs.xfs -d su=128k,sw=6 /dev/sda

E.g. 3 drive RAID5 with 256KB chunk. RAID5 has n-1 stripe
spindles. 3-1 = 2

~$ mkfs.xfs -d su=256k,sw=2 /dev/sda

The above are basic examples and we're letting mkfs.xfs choose the
number of AGs based on total capacity. You typically only specify
agcount or agsize manually in advanced configurations when you're tuning
XFS to a storage architecture for a very specific application workload,
such as a high IOPS maildir server. I've posted examples of these
advanced storage architectures and mkfs.xfs previously on the dovecot
and XFS lists if you care to search for them. In them I show how to
calculate a custom agcount to precisely match the workload IO pattern to
each disk spindle, using strictly allocation group layout to achieve
full workload concurrency without any disk striping, only mirroring.
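
Whichever way you go, mkfs.xfs prints the agcount/agsize it settled on
when it runs, and you can re-check them later on the mounted filesystem
(illustrative; the mount point is the one used further down in this
thread):

~$ xfs_info /mnt/media-raid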

>> The advanced configuration I previously gave you allows for one large
>> XFS across all your arrays. mkfs.xfs is not able to map out the
>> complex storage geometry of nested arrays automatically, which is why
>> I lifted the lid and showed you the advanced configuration.
>
> Ok, this is very nice!
> But will it also work for any disk size (1.5, 2 and 3 TB drives)?

All of the disks in each md array should be the same size, preferably
identical disks from the same vendor, for the best outcome. But each
array can use different size disks, such as what you have now. One
array of 4x1.5TB, another array of 4x2TB. Your next array could be
4x1TB or 4x3TB. You could go with more or fewer drives per array, but
if you do it will badly hose your xfs stripe alignment, and performance
to the new array will be so horrible that you will notice it, big time,
even though you need no performance. Stick to adding sets of 4 drives
with the same md RAID5 parms and you'll be happy. Deviate from that,
and you'll be very sad, ask me for help, and then I'll be angry, as it's
impossible to undo this design without starting over. This isn't unique to XFS.

>> chunk size = 128KB
>
> This is what I don't know.
> Is this a characteristic of the disk?

No. I chose this based on your workload description. The mdadm default
is 64KB. Different workloads work better with different chunk sizes.
There is no book or table with headings "workload" and "chunk size" to
look at. People who set a manual chunk/strip size either have a lot of
storage education, either self or formal, or they make an educated
guess--or both. Multi-streaming video capture to high capacity drives
typically works best with an intermediate strip/chunk size with few
stripe members in the array. If you had 8 drives per array I'd have
left it at 64KB, the default. I'm sure you can find many
recommendations on strip/stripe size in the MythTV forums. They may
vary widely, but if you read enough posts you'll find a rough consensus.
And it may even contradict what I've recommended. I've never used
MythTV. My recommendation is based on general low level IO for
streaming video.
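
For reference, the chunk size is set at array creation time with -c.
The creation command that shows up later in this thread looks like this
(add --force to skip the degraded-with-spare initial build, as discussed
further down):

~$ mdadm -C /dev/md1 -c 128 -n4 -l5 --force /dev/sd[abcd]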

> Just another question: The linear raid will distribute the data to the
> containing raid5 arrays?

Unfortunately you jumped the gun and created your XFS atop a single
array, but with the agcount I gave you for the two arrays combined. As
I mentioned in a previous reply (which was off list I think), you now
have too many AGs. To answer your question, the first dir you make is
created in AG0, the second dir in AG1, and so on, until you hit AG11.
The next dir you make will be in AG0 and cycle begins anew.

Since you're copying massive dir counts and files to the XFS, your files
aren't being spread across all 6 drives of two RAID5s. Once you've
copied all the data over, wiped those 1.5s, created an md RAID5 from
them, grown it into the linear array, and grown XFS, only new dirs and
files created AFTER the grow operation will be able to hit the new set
of 3 disks. On
top of that, because your agcount is way too high, XFS will continue
creating new dirs and files in the original RAID5 array until it fills
up. At that point it will write all new stuff to the second RAID5.

This may not be a problem as you said your performance needs are very
low. But that's not the way I designed it for you. I was working under
the assumption you would have both RAID5s available from the beginning.
If that had been so, your dirs/files would have been spread fairly
evenly over all 6 disks of the two RAID5 arrays, and only the 3rd future
array would get an unbalanced share.

> Or will it fill up the first one and continue with the second and so on?

I already mostly answered this above. Due to what has transpired it
will behave more in this fashion than by the design parameters which
would have given fairly even spread across all disks.

>> Manufacturing a single drive consumes as much energy as 4 drives
>> running for 3 years. Green type drives tend to last half as long due
>> to all the stop/start cycles wearing out the spindle bearings. Do
>> the math. The net energy consumption of 'green' drives is therefore
>> equal to or higher than 'normal' drives. The only difference is that
>> a greater amount of power is consumed by the drive before you even
>> buy it. The same analysis is true of CFL bulbs. They consume more
>> total energy through their life cycle than incandescents.
>
> Hmm, I knew that for hybrid cars but never thought about this for
> hdds.

Take a tour with me...

Drive chassis are made from cast aluminum ingots with a CNC machine.
Melting point of Al is 660 °C.

Drive platters are made of glass and aluminum, and coated with a
specially formulated magnetic film.
Melting point of Si is 1400 °C.

It takes a tremendous amount of natural gas or electricity--depending on
the smelting furnace type--to generate the 660 °C and 1400 °C temps
needed to melt these materials. Then you burn the fuel to ship the
ingots and platters from the foundries to the drive factories, possibly
an overseas trip. Then you have all the electricity consumed by the
milling machines, stamping/pressing machines, screw guns, etc. Then we
have factory lighting, air conditioning, particle filtration systems,
etc. Then we have fuel consumed by the cars and buses that transport
the workforce to and from the factory, the trucks that drive the pallets
of finished HDDs to the port, the cranes that load them on the ships,
and the fuel the ships burn bringing the drives from Thailand,
Singapore, and China to Northern Europe and the US.

As with any manufacturing, there is much energy consumption involved.

>>> Would the Seagate Barracuda
>>> 3TB disks be a better choice?
>>
>> Is your 10.5TB full already? You don't even have the system running
>> yet...
>
> No, but I like living in the future ;-)

It may be 2-3 years before you need new drives. All the current models
and their reputations will have changed by then. Ask a few months
before your next drive purchase. Now is too early.

> I'm planning to keep the drives I have now and add 4x 3TB into the mix.

The more flashing LEDs the better. :) Fill-r-up.

> Her suggestion seemed very tempting because it would give me a raid6
> without having to lose too much storage space.
> She really knows a lot so I was just happy with her suggesting this
> setup.

That's the "ghetto" way of getting what you wanted. And there are many
downsides to it. Which is why I suggested a much better, more sane, way.

> Again thanks a lot for all your help and your patience with me.
> Certainly not always easy ;-)

You're very welcome Ramon.

Nah, if I've seemed short or frustrated at times, that's my fault, not
yours. I'm glad to help. And besides, I was obliged to because the
unique hardware combo I personally, specifically, recommended wasn't
working for you, in a mobo it *should* work in. And I recommended the
md linear over stripe w/XFS, and nobody here would have been able to
help you with that either. But in the end you're going to have
something unique that fits your needs, performs well enough, and is
built from best of breed hardware and software. So you can be proud of
this solution, show it off if you have like-minded friends. And you can tell
them you're running what NASA supercomputers run. ;)

Sorry I didn't meet my goal of making this shorter than previous replies. ;)

--
Stan



Stan Hoeppner

Jun 14, 2012, 8:40:02 AM
to
On 6/13/2012 1:04 PM, Ramon Hofer wrote:
> On Tue, 12 Jun 2012 18:35:57 -0500
> Stan Hoeppner <st...@hardwarefreak.com> wrote:
>
>>> Hmm, probably I have to create a raid5 with the four empty 2 TB
>>> disks attached to the LSI. Then:
>>>
>>> ~$ mdadm -C /dev/md0 -n1 -l linear /dev/md1
>>
>> WTF?
>
> I also had to add --force to create the array with one raid5.
>
>
>>> Now I copy the content from the old raid5 with the four 1.5 TB
>>> disks to the new linear md0.
>>
>> Shuffleboard... You didn't previously make clear that not all 8 disks
>> were freely available to build your stack from the ground up. The
>> instructions I gave you assumed that all 8 drives were clean. Now
>> you're attempting to modify the precise instructions I gave you and
>> play shuffleboard with your data and disks, attempting to migrate on
>> the fly.
>
> Sorry, I wrote this too in another thread :-(
> I like taking some risk :-)
>
> But since the old raid is only being read, not written, I didn't fear
> losing the data.

No, no, no risk of losing the data. The problem is with the resulting
XFS AG layout you got from this procedure, as I mentioned in the other
reply. Everything will still work, assuming mdadm will allow you to add
another array after having to use --force to create the linear. I've
never tried creating a linear array with one device before. If that
works, and the xfs grow works, you should be ok, just with less performance.

>
>> This may not have a good outcome. I guess you feel that you
>> understand this stuff and are confident in your ability at this point
>> to effect the outcome you desire. If things break badly, I'll try to
>> assist, but I make no promises WRT outcomes nor guarantee my
>> availability.
>
> I wanted to finally do something ;-)

I was being a bit dramatic there, frustration showing I guess. ;) Like
I said, should be ok, if mdadm doesn't puke adding the other array in.

--
Stan



Ramon Hofer

Jun 14, 2012, 1:10:02 PM
to
On Thu, 14 Jun 2012 03:29:25 -0500
Stan Hoeppner <st...@hardwarefreak.com> wrote:

> On 6/13/2012 2:22 PM, Ramon Hofer wrote:
> > On Tue, 12 Jun 2012 17:30:43 -0500
> > Stan Hoeppner <st...@hardwarefreak.com> wrote:
>
> This chain is so long I'm going to liberally snip lots of stuff
> already covered. Hope that's ok.

Sure. Your mail still blew my mind :-)


> >> This is incorrect advice, as it occurs with the LSI BIOS both
> >> enabled and disabled. Apparently you didn't convey this in your
> >> email.
> >
> > I will write it to them again.
> > But to be honest I think I'll leave the Supermicro and use it for my
> > Desktop.
>
> If you're happy with an Asus+LSI server and SuperMicro PC, and it all
> works the way you want, I'd not bother with further troubleshooting
> either.

Well the only differences are:

1. I can't enter the LSI BIOS to set up hw raid, which I don't need to.
So no problem.

2. I can't see the network activity LEDs in the front of the case,
which is a gadget I don't really need. If there are problems I can
check the mobo LEDs for LAN activity. So no problem either.


> >> Building md arrays from partitions on disks is a means to an end.
> >> Do you have an end that requires these means? If not, don't use
> >> partitions. The biggest reason to NOT use partitions is
> >> misalignment on advanced format drives. The partitioning
> >> utilities shipped with Squeeze, AFAIK, don't do automatic
> >> alignment on AF drives.
> >
> > Ok, I was just confused because most the tutorials (or at least
> > most of the ones I found) use partitions over the whole disk...
>
> Most of the md tutorials were written long before AF drives became
> widespread, which has been a relatively recent phenomenon, the last 2
> years or so.

AF drives are Advanced Format drives with more than 512 bytes per
sector right?


> > I must have read outdated wikis (mostly from the mythtv project).
>
> Trust NASA more than MythTV users? From:
> http://www.nas.nasa.gov/hecc/resources/columbia.html

I don't trust anybody ;-)


> Storage
> Online: DataDirect Networks® and LSI® RAID, 800 TB (raw)
> ...
> Local SGI XFS
>
> That 800TB is carved up into a handful of multi-hundred TB XFS
> filesystems. It's mostly used for scratch space during sim runs.
> They have a multi-petabyte CXFS filesystem for site wide archival
> storage. NASA is but one of many sites with multi-hundred TB XFS
> filesystems spanning hundreds of disk drives.
>
> IBM unofficially abandoned JFS on Linux, which is why it hasn't seen a
> feature release since 2004. Enhanced JFS, called JFS2, is
> proprietary, and is only available on IBM pSeries servers.
>
> MythTV users running JFS are simply unaware of these facts, and use
> JFS because it still works for them, and that's great. Choice and
> freedom are good things. But if they're stating it's better than XFS
> they're hitting the crack pipe too often. ;)

Here's what I was referring to:
http://www.mythtv.org/docs/mythtv-HOWTO-3.html

"Filesystems

MythTV creates large files, many in excess of 4GB. You must use a 64 or
128 bit filesystem. These will allow you to create large files.
Filesystems known to have problems with large files are FAT (all
versions), and ReiserFS (versions 3 and 4).

Because MythTV creates very large files, a filesystem that does well at
deleting them is important. Numerous benchmarks show that XFS and JFS
do very well at this task. You are strongly encouraged to consider one
of these for your MythTV filesystem. JFS is the absolute best at
deletion, so you may want to try it if XFS gives you problems. MythTV
incorporates a "slow delete" feature, which progressively shrinks the
file rather than attempting to delete it all at once, so if you're more
comfortable with a filesystem such as ext3 (whose delete performance
for large files isn't that good) you may use it rather than one of the
known-good high-performance file systems. There are other ramifications
to using XFS and JFS - neither offer the opportunity to shrink a
filesystem; they may only be expanded.

NOTE: You must not use ReiserFS v3 for your recordings. You will get
corrupted recordings if you do.

Because of the size of the MythTV files, it may be useful to plan for
future expansion right from the beginning. If your case and power
supply have the capacity for additional hard drives, read through the
Advanced Partition Formatting sections for some pointers."


So they say it's about the same. But this page must be at least a few
years old without any changes, at least in this paragraph.

I additionally found a forum post from four years ago where someone
states that xfs has problems with interrupted power supply:
http://www.linuxquestions.org/questions/linux-general-1/xfs-or-jfs-685745/#post3352854

"I only advise XFS if you have any means to guarantee uninterrupted
power supply. It's not the most resistant fs when it comes to power
outages."

I usually don't have blackouts. At least none long enough that the PC
turns off. But I don't have a UPS.


> > Ok if I read it right it divides the array into 11 allocation
> > groups, with 131072 byte blocks and 3 stripe units as stripe width.
> > But where do you know what numbers to use?
> > Maybe I didn't read the man carefully enough then I'd like to
> > appologize :-)
>
> 'man mkfs.xfs' won't tell you how to calculate how many AGs you need.
> mkfs.xfs creates agcount and agsize automatically using an internal
> formula unless you manually specify valid values. Though I can tell
> you how it works. Note: the current max agsize=1TB

This is very interesting. I hope I get everything right :-)


> 1. Defaults to 4 AGs if the device is < 4TB and not a single level
> md striped array. This is done with singe disks, linear arrays,
> hardware RAIDs, SANs. Linux/XFS have no standard interface to
> query hardware RAID device parms. There's been talk of an
> industry standard interface but no publication/implementation.
> So for hardware RAID may need to set some parms manually for best
> performance. You can always use mkfs.xfs defaults and it will
> work. You simply don't get all the performance of the hardware.

I will get better performance if I have the correct parameters.


> 2. If device is a single level md striped array, AGs=16, unless the
> device size is > 16TB. In that case AGs=device_size/1TB.

A single level md striped array is any linux raid containing disks.
Like my raid5.
In contrast would be my linear raid containing one or more raids?


> 3. What 'man mkfs.xfs' does tell you is how to manually configure the
> stripe parms. It's easy. You match the underlying RAID parms.
> E.g. 16 drive RAID 10 with 64KB chunk. RAID 10 has n/2 stripe
> spindles. 16/2 = 8
>
> ~$ mkfs.xfs -d su=64k,sw=8 /dev/sda
>
> E.g. 8 drive RAID6 with 128KB chunk. RAID6 has n-2 stripe
> spindles. 8-2 = 6
>
> ~$ mkfs.xfs -d su=128k,sw=6 /dev/sda
>
> E.g. 3 drive RAID5 with 256KB chunk. RAID5 has n-1 stripe
> spindles. 3-1 = 2
>
> ~$ mkfs.xfs -d su=256k,sw=2 /dev/sda
>
> The above are basic examples and we're letting mkfs.xfs choose the
> number of AGs based on total capacity. You typically only specify
> agcount or agsize manually in advanced configurations when you're
> tuning XFS to a storage architecture for a very specific application
> workload, such as a high IOPS maildir server. I've posted examples
> of this advanced storage architectures and mkfs.xfs previously on the
> dovecot and XFS lists if you care to search for them. In them I show
> how to calculate a custom agcount to precisely match the workload IO
> pattern to each disk spindle, using strictly allocation group layout
> to achieve full workload concurrency without any disk striping, only
> mirroring.

Ok, the chunk (=stripe) size is already set to 128 kB when creating the
raid5 with the command you provided earlier:

~$ mdadm -C /dev/md1 -c 128 -n4 -l5 /dev/sd[abcd]

Then the mkfs.xfs parameters are adapted to this.


> >> The advanced configuration I previously gave you allows for one
> >> large XFS across all your arrays. mkfs.xfs is not able to map out
> >> the complex storage geometry of nested arrays automatically, which
> >> is why I lifted the lid and showed you the advanced configuration.
> >
> > Ok, this is very nice!
> > But will it also work for any disk size (1.5, 2 and 3 TB drives)?
>
> All of the disks in each md array should to be the same size,
> preferably identical disks from the same vendor, for the best
> outcome. But each array can use different size disks, such as what
> you have now. One array of 4x1.5TB, another array of 4x2TB. Your
> next array could be 4x1TB or 4x3TB. You could go with more or fewer
> drives per array, but if you do it will badly hose your xfs stripe
> alignment, and performance to the new array will be so horrible that
> you will notice it, big time, even though you need no performance.
> Stick to adding sets of 4 drives with the same md RAID5 parms and
> you'll be happy. Deviate from that, and you'll be very sad, ask me
> for help, and then I'll be angry, as it's impossible to undo this
> design and start over. This isn't unique to XFS.

I'll try not to make you angry :-)


> >> chunk size = 128KB
> >
> > This is what I don't know.
> > Is this a characteristic of the disk?
>
> No. I chose this based on your workload description. The mdadm
> default is 64KB. Different workloads work better with different
> chunk sizes. There is no book or table with headings "workload" and
> "chunk size" to look at. People who set a manual chunk/strip size
> either have a lot of storage education, either self or formal, or
> they make an educated guess--or both. Multi-streaming video capture
> to high capacity drives typically works best with an intermediate
> strip/chunk size with few stripe members in the array. If you had 8
> drives per array I'd have left it at 64KB, the default. I'm sure you
> can find many recommendations on strip/stripe size in the MythTV
> forums. They may vary widely, but if you read enough posts you'll
> find a rough consensus. And it may even contradict what I've
> recommended. I've never used MythTV. My recommendation is based on
> general low level IO for streaming video.

Ok, cool!
Probably some day I will understand how to choose chunk sizes. In the
meantime I will just be happy with the number you provided :-)

Btw: I wasn't clear about mythtv. For the recordings I don't use the
raid. I have another disk just for it.
Everyone recommends not using raids for the recordings. But to be
honest I don't remember the reason anymore :-(

The raid is used for my music and video collection. Of course
everything is owned by me and backed up to disk.

And I also use the raid for backups of the mythtv database and many
other backups.

But it's mostly used to stream multimedia content. So the backups
can be neglected.


> > Just another question: The linear raid will distribute the data to
> > the containing raid5 arrays?
>
> Unfortunately you jumped the gun and created your XFS atop a single
> array, but with the agcount I gave you for the two arrays combined.
> As I mentioned in a previous reply (which was off list I think), you
> now have too many AGs. To answer your question, the first dir you
> make is created in AG0, the second dir in AG1, and so on, until you
> hit AG11. The next dir you make will be in AG0 and cycle begins anew.
>
> Since you're copying massive dir counts and files to the XFS, your
> files aren't being spread across all 6 drives of two RAID5s. Once
> you've copied all the data over, wipe those 1.5s, create an md RAID5,
> grow them into the linear array, and grow XFS, only new dirs and
> files you create AFTER the grow operation will be able to hit the new
> set of 3 disks. On top of that, because your agcount is way too
> high, XFS will continue creating new dirs and files in the original
> RAID5 array until it fills up. At that point it will write all new
> stuff to the second RAID5.
>
> This may not be a problem as you said your performance needs are very
> low. But that's not the way I designed it for you. I was working
> under the assumption you would have both RAID5s available from the
> beginning. If that had been so, your dirs/files would have been
> spread fairly evenly over all 6 disks of the two RAID5 arrays, and
> only the 3rd future array would get an unbalanced share.

This may really be no problem. But when I have an expert at hand and
starting the storage from scratch I want to do it right :-)

I stopped the copy process and will create the xfs again with the
correct number of ags. Would 6 be a good number for the linear array
containing the one raid5 with 4x 2TB disks?

The xfs seems really intelligent. So it spreads the load if it can, but
it won't copy everything around when a new disk, or in my case a raid5,
is added?
This is very convincing.

But I thought that a green drive lives at least as long as a normal
drive, or even longer, because it *should* wear less since it's more
often asleep. If that assumption were correct, the same amount of
energy would be used to produce the disks, but less energy would be
used during operation, and the disks would need to be replaced less
often, so the production energy would be spent less frequently.
If all of this were true, I would be willing to pay the price of less
performance and a higher raid problem rate.

But I believe you that the disks don't live as long as normal drives.
So everything is different and I won't buy green drives again :-)


> >>> Would the Seagate Barracuda
> >>> 3TB disks be a better choice?
> >>
> >> Is your 10.5TB full already? You don't even have the system
> >> running yet...
> >
> > No, but I like living in the future ;-)
>
> It may be 2-3 years before you need new drives. All the current
> models and their reputations will have changed by then. Ask a few
> months before your next drive purchase. Now is too early.

True.
I will gladly do :-)


> > I'm planning to keep the drives I have now and add 4x 3TB into the
> > mix.
>
> The more flashing LEDs the better. :) Fill-r-up.

Maybe I will solder a flash light for the LAN LEDs in the front of the
case too :-D


> > Again thanks alot for all your help and your patience with me.
> > Certainly not always easy ;-)
>
> You're very welcome Ramon.
>
> Nah, if I've seemed short of frustrated at times, that's my fault, not
> yours. I'm glad to help. And besides, I was obliged to because the
> unique hardware combo I personally, specifically, recommended wasn't
> working for you, in a mobo it *should* work in. And I recommended the
> md linear over stripe w/XFS, and nobody here would have been able to
> help you with that either. But in the end you're going to have
> something unique, fits your needs, performs well enough, and is built
> from best of breed hardware and software. So you can be proud of this
> solution, show it off it you have like minded friends. And you can
> tell them you're running what NASA supercomputers run. ;)
>
> Sorry I didn't meet my goal of making this shorter than previous
> replies. ;)

I never was frustrated because of your help. If I was unhappy it was
only because of my lack of knowledge and luck.

If you weren't here to suggest things and help me I would, in the worst
case, have ended up with a case that I couldn't use. Or one that eats my
data (because of the Supermicro AOC-SASLP-MV8 controllers I initially
had).

In the end I'm very happy and proud of my system. Of course I show it
to my friends and they are jealous for sure :-)


So thanks very much again and please let me know how I can buy you a
beer or two!


Cheers
Ramon



Ramon Hofer

Jun 14, 2012, 2:50:03 PM
to
On Thu, 14 Jun 2012 08:38:27 -0500
Stan Hoeppner <st...@hardwarefreak.com> wrote:

> Couldn't hurt. And while you're at it, mount with "inode64" in your
> fstab immediately after you create the XFS. You were running with
> inode32, which sticks all the inodes at the front of AG0 causing lots
> of seeks. Inode64 puts file/dir inodes in the AG where the file gets
> written. In short, inode64 is more efficient for most workloads. And
> this is also why getting the agcount correct is so critical with
> tiered linear/striped parity setups such as this.
>
> When you recreate the XFS use 'agcount=6'. That's the smallest you
> can go with 2TB disks. A force will be required since you already
> have an XFS on the device.

Sorry I haven't much time now. I'm invited to a BBQ and already
hungry :-)

I just wanted to create the filesystem and start to copy the files.

So I tried and got this warning:

~$ sudo mkfs.xfs -f -d agcount=6,su=131072,sw=3 /dev/md0
Warning: AG size is a multiple of stripe width. This can cause
performance problems by aligning all AGs on the same disk. To avoid
this, run mkfs with an AG size that is one stripe unit smaller, for
example 244189120.

Should I take this seriously?


Btw: Should I mount every xfs filesystem (also the one for the mythtv
recordings) with inode64.
This is not true for the smaller ext4 filesystems I use for the os and
the home dir I suppose?


Cheers
Ramon



Stan Hoeppner

Jun 15, 2012, 8:50:02 AM
to
On 6/14/2012 9:45 AM, Ramon Hofer wrote:
> On Thu, 14 Jun 2012 08:38:27 -0500
> Stan Hoeppner <st...@hardwarefreak.com> wrote:
>
>> Couldn't hurt. And while you're at it, mount with "inode64" in your
>> fstab immediately after you create the XFS. You were running with
>> inode32, which sticks all the inodes at the front of AG0 causing lots
>> of seeks. Inode64 puts file/dir inodes in the AG where the file gets
>> written. In short, inode64 is more efficient for most workloads. And
>> this is also why getting the agcount correct is so critical with
>> tiered linear/striped parity setups such as this.
>>
>> When you recreate the XFS use 'agcount=6'. That's the smallest you
>> can go with 2TB disks. A force will be required since you already
>> have an XFS on the device.
>
> Sorry I haven't much time now. I'm invoted to a BBQ and already
> hungry :-)
>
> I just wanted to create the filesystem and start to copy the files.
>
> So I tried and got this warning:
>
> ~$ sudo mkfs.xfs -f -d agcount=6,su=131072,sw=3 /dev/md0
> Warning: AG size is a multiple of stripe width. This can cause
> performance problems by aligning all AGs on the same disk. To avoid
> this, run mkfs with an AG size that is one stripe unit smaller, for
> example 244189120.

Grr. This is another reason it is preferable to create the XFS atop the
linear array with both RAIDs already present, from the beginning, which
would allow the proper 11 AGs, and proper placement of them.

> Should I take this seriously?

This is a valid warning and relates to metadata performance, which is
important for everyday use. So yeah, you should take it seriously. So
what you should do now is, instead of making another attempt and
manually setting 7 AGs, just leave out that parm and let mkfs pick the
agcount/agsize on its own. It will likely choose 7, but it may choose
more. The fewer the better with 3 slow disks in this RAID5. mkfs.xfs
doesn't take spindle speed into account, which is why I usually set
parms manually, to best fit the storage hardware.

> Btw: Should I mount every xfs filesystem (also the one for the mythtv
> recordings) with inode64.

Yes. Especially with XFS atop a linear array. The inode64 allocator
spreads directory and file metadata, and files relatively evenly across
all AGs, providing better locality between files and their metadata.
This improves performance for most workloads.

Inode32, the default allocator, puts all directory and file metadata in
AG0, so you end up with a hotspot, causing excessive disk seeking on the
first RAID5 (which is where AG0 is) in the linear array.

Inode64 will be the XFS default in the not too distant future. It would
have been so already, but there are still some key applications in
production, namely some enterprise backup applications, that don't
understand 64bit inode numbers. This is the only reason inode32 is
still the default. Note than with 32bit Linux kernels you are limited
to inode32. So make sure you're running an x64 kernel, which IIRC, you are.

Note that for any XFS filesystem greater than 16TB, you must use the
inode64 allocator as inode32 is limited to 16TB (and again you need an
x64 kernel). In your case you will be continuously expanding your XFS as
you add more 4 drive arrays in the future. Once you add your 4x1.5TB
drives you'll be at 10.4TB. When you add 3x4TB drives your XFS will hit
19.5TB. It's best to already be using inode64 when you go over the 16TB
limit to avoid problems.
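
A sketch of what the fstab entry could look like, using the device and
mount point from elsewhere in this thread (adjust to your layout):

/dev/md0  /mnt/media-raid  xfs  defaults,inode64  0  0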

> This is not true for the smaller ext4 filesystems I use for the os and
> the home dir I suppose?

No, the inode64 mount option is unique to XFS. It simply tells the XFS
kernel driver to use the inode64 code path instead of the inode32 path
for a given XFS filesystem, in essence passing a 0 or 1 to an XFS
variable. You can mount multiple XFS filesystems on one machine, some
with inode32 and others with inode64. See 'man mount'; XFS is way
down at the bottom. Note that it's possible, but not advisable, to
change the inodeXX mount option after the filesystem has some "age" on
it. Pick the right one from the start and stick with it. This is
usually inode64. There are some unique workload cases where a highly
tweaked inode32 filesystem <16TB has a performance advantage, but your
workloads aren't such cases.

And make sure you're using linux-image-3.2.0-0.bpo.2-amd64 so you have
all the latest XFS features and fixes, mainly the delayed logging code
turned on by default.
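
A quick way to confirm which kernel is actually running (with Debian's
usual naming, the backport image above should show up as a
3.2.0-0.bpo.2-amd64 version string):

~$ uname -r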

--
Stan



Stan Hoeppner

Jun 15, 2012, 5:50:01 PM
to
On 6/14/2012 8:02 AM, Ramon Hofer wrote:

> AF drives are Advanced Format drives with more than 512 bytes per
> sector right?

Correct. Advanced Format is the industry wide name chosen for drives
that have 4096B physical sectors, but present 512B sectors at the
interface level, doing translation internally, "transparently".
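
If you want to check a particular drive, the kernel exposes both sector
sizes. An AF drive reports a 4096 byte physical and a 512 byte logical
sector (sdb here is just an example):

~$ cat /sys/block/sdb/queue/physical_block_size
~$ cat /sys/block/sdb/queue/logical_block_size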

> I don't trust anybody ;-)

Good for you! :)

> Here's what I was referring to:
> http://www.mythtv.org/docs/mythtv-HOWTO-3.html

> JFS is the absolute best at
> deletion, so you may want to try it if XFS gives you problems.

Interesting. Lets see:

~$ time dd if=/dev/zero of=myth-test bs=8192 count=512000
512000+0 records in
512000+0 records out
4194304000 bytes (4.2 GB) copied, 50.1455 s, 83.6 MB/s

real 0m50.167s
user 0m1.560s
sys 0m43.915s

-rw-r--r-- 1 root root 4.0G Jun 15 04:52 myth-test

~$ echo 3 > /proc/sys/vm/drop_caches
~$ time rm myth-test; sync

real 0m0.027s
user 0m0.000s
sys 0m0.004s

XFS and the kernel block layer required 4ms to perform the 4GB file
delete. The disk access required 23ms. What does this say about the
JFS claim? I simply don't get the "if XFS gives you problems" bit. The
author was obviously nothing close to a filesystem expert.


>
> I additionally found a foum post from four years ago were someone
> states that xfs has problems with interrupted power supply:
> http://www.linuxquestions.org/questions/linux-general-1/xfs-or-jfs-685745/#post3352854

"I found a forum post from 4 years ago"

Myths, lies, and fairy tales. There was an XFS bug related to power
fail that was fixed over a year before this forum post was made. Note
that nobody in that thread posts anything from the authoritative source,
as I do here?

http://www.xfs.org/index.php/XFS_FAQ#Q:_Why_do_I_see_binary_NULLS_in_some_files_after_recovery_when_I_unplugged_the_power.3F

> "I only advise XFS if you have any means to guarantee uninterrupted
> power supply. It's not the most resistant fs when it comes to power
> outages."

I advise using a computer only if you have a UPS, no matter what
filesystem you use. It's incredible that this guy would make such a
statement, instead of promoting the use of UPS devices. Abrupt power
loss, or worse, voltage "bumping" which often accompanies brownout
conditions, is not good for any computer equipment, especially PSUs and
mechanical hard drives, regardless of what filesystem one uses.

The only data lost due to power failure is inflight write data. The
vast majority of that is going to be due to Linux buffer cache. No
matter what FS you use, if you're writing, especially a large file, when
power dies the write has failed and you've lost that file. EXT3 was a
bit more "resilient" to power loss because of a bug, not a design goal.
The same bug caused horrible performance with some workloads because of
the excessive hard coded syncs.

> I usually don't have blackouts. At least as long that the PC turn off.
> But I don't have a UPS.

Get one. Best investment you'll ever make computer-wise. For your
Norco, we'll assume all 20 bays are filled for sizing purposes. One of
these should be large enough to run your server and your desktop:

http://www.apc.com/products/resource/include/techspec_index.cfm?base_sku=BR900G-GR&total_watts=200
http://www.apc.com/products/resource/include/techspec_index.cfm?base_sku=BR900GI&total_watts=200

(Sorry if I misguessed your native language as German instead of French
or Italian.) I listed both units as I don't know which power plug
configuration you need.

If these UPS seem expensive, consider the fact that they may continue
working for 20+ years. I bought my home office APC SU1400RMNET used in
2003 for US $250 ($1000+ new) after it had been in corporate service for
3 years on lease. It's at least 12 years old and I've been running it
for 9 years continuously. I've replaced the batteries ($80) twice,
about every 4 years. Buying this unit used, at a steal of a price, is
one of the best investments I ever made. I expect it to last at least
another 8 years, if not more.


> I will get better performance if I have the correct parameters.

Yes.

>
>> 2. If device is a single level md striped array, AGs=16, unless the
>> device size is > 16TB. In that case AGs=device_size/1TB.
>
> A single level md striped array is any linux raid containing disks.
> Like my raid5.

I use "single level" simply to differentiate from a nested array, which
is multi-level.

> In contrast would be my linear raid containing one or more raids?

This is called a "nested" array. The term comes from "nested loop" in
programming.

> Ok, the chunk (=stripe)

chunk = "strip", not "stripe"

"Chunk" and "strip" are two words for the same thing. Linux md uses the
term "chunk". LSI and other hardware vendors use the term "strip".
They describe the amount of data written to an individual array disk
during a striped write operation. Stripe is equal to all of the
chunks/strips added together.

E.g. A 16 disk RAID10 has 8 stripe spindles (8 are mirrors). Each
spindle has a chunk/strip size of 64KB. 8*64KB = 512KB. So the
"stripe" size is 512KB.

> size is already set 128 kB when creating the
> raid5 with the command you provided earlier:
>
> ~$ mdadm -C /dev/md1 -c 128 -n4 -l5 /dev/sd[abcd]
>
> Then the mkfs.xfs parameters are adapted to this.

Correct. If you were just doing a single level RAID5 array, and not
nesting it into a linear array, mkfs.xfs would read the md RAID5 parms
and do all of this stuff automatically. It doesn't if you nest a linear
array on top, as we have.

> I'll try not to make you angry :-)

I'm not Bruce Banner, so don't worry. ;)

> Ok, cool!
> Probably some time I will understand how to choose chunck sizes. In the
> meantime I will just be happy with the number you provided :-)

For your target workloads, finding the "perfect" chunk size isn't
critical. What is critical is aligning XFS to the array geometry, and
the array to the AF disk geometry, which is, again, why I recommended
using bare disks, no partitions.

> Btw: I wasn't clear about mythtv. For the recordings I don't use the
> raid. I have another disk just for it.
> Everyone recommends to not use raids for the recordings. But to be
> honest I don't remember the reaosn anymore :-(

I've never used MythTV, but it probably has to do with the fact that
most MythTV users have 3-4 slow green SATA drives on mobo SATA ports
using md RAID5 with the default CFQ elevator. Not a great combo for
doing multiple concurrent read/write A/V streams.

Using a $300-400 USD 4-8 port RAID controller with 512MB write cache,
4-8 enterprise 7.2k SATA drives in RAID5, and the noop or deadline
elevator allows one to do multiple easily. So does using twice as many
7.2k drives in software RAID10 with deadline. Both are far more
expensive than simply adding one standalone drive for recording.
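
For completeness, the elevator can be checked and switched per device
at runtime (sda is illustrative; writing to sysfs needs root, and the
change does not survive a reboot):

~$ cat /sys/block/sda/queue/scheduler
~$ echo deadline > /sys/block/sda/queue/scheduler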

>> On top of that, because your agcount is way too
>> high, XFS will continue creating new dirs and files in the original
>> RAID5 array until it fills up. At that point it will write all new
>> stuff to the second RAID5.

I should have been more clear above. Directories and files would be
written to AGs on *both* RAID5s until the first one filled up, then
everything would go to AGs on the 2nd RAID5. Above it sounds like the
2nd RAID5 wouldn't be used until the first one filled up, and that's not
the case.

> The xfs seems really intelligent. So it spreads the load if it can but
> it won't copy everything around when a new disk or in my case raid5 is
> added?

Correct. But it's not "spreading the load". It's simply distributing
new directory creation across all available AGs in a round robin
fashion. When you grow the XFS, it creates new AGs on the new disk
device. After that it simply does what it always does, distributing new
directory creation across all AGs until some AGs fill up. This behavior
is more static than adaptive, so it's not really all that intelligent.
The design is definitely intelligent, and it's one of the primary
reasons XFS has such great parallel performance.


> But I thought that a green drive lives at least as long as a normal

With the first series of WD Green drives this wasn't the case. They had
a much higher failure rate. Newer generations are probably much better.
And, all of the manufacturers are adding smart power management
features to most of their consumer drive lines.

> drive or even longer because it *should* wear less because it's more
> often asleep.

The problem is what is called "thermal cycling". When the spindle motor
is spinning the platters at 5-7K RPMS and then shuts down for 30 seconds
or more, and then spins up again, the bearings expand and shrink, expand
and shrink, very slightly, fractions of a millimeter. But this is
enough to cause premature bearing wobble, which affects head flying
height, and thus problems with reads/writes, yielding sector errors (bad
blocks). This excess bearing wear over time can cause the drive to fail
prematurely if the heads begin impacting the platter surface, which is
common when bearings develop sufficient wobble.

Long before many people on this list were born, systems managers
discovered that drives lasted much longer if left running 24x7x365,
which eliminated thermal cycling. It's better for drives to run "hot"
all the time than to power them down over night and up the next day. 15
years ago, constant running would extend drive life by up to 5 years.
With the tighter tolerances of today's drives you may not gain that
much. I leave all of my drives running and disable all power savings
features on all my systems. I had a pair of 9GB Seagate Barracuda SCSI
drives that were still running strong after 14 years of continuous 7.2k
RPM service when I decommissioned the machine. They probably won't spin
up now that they've been in storage for many years.

> If all of this would have been true than I would be willing to pay the
> price of less performance and higher raid problem rate.

Throttling an idle CPU down to half its normal frequency saves more
electricity than spinning down your hard drives, until you have 10 or
more, and that depends on which CPU you have. If it's a 130w Intel
burner, it'll be more like 15 drives.

> But I believe you that the disks don't live as long as normal drives.
> So everything is different and I won't buy green drives again :-)

I'd "play it by ear". These problems may have been worked out on the
newer "green" drives. Bearings can be built to survive this more rapid
thermal cycling; those on vehicle wheels do it daily. Once they get the
bearings right, these drives should last just as long.

> Maybe I will solder a flash light for the LAN LEDs in the front of the
> case too :-D

Just look at the LEDs on the switch it's plugged into. If the switch is
on the other side of the room, buy a mini switch and set it on top. $10
for a 10/100 and $20 for a GbE switch. USD.

> I never was frustrated because of your help. If I was a unhappy it was
> only because of my missing knowledge and luck.

Well your luck should have changed for the better. You've got all good
quality gear now and it should continue to work well together, barring
future bugs introduced in the kernel.

> If you weren't here to suggest things and help me I would have ended up
> with a case that I couldn't use in the worst case. Or one that eats my
> data (because of the Supermicro AOC-SASLP-MV8 controllers I initially
> had).

That controller has caused so many problems for Linux users I cannot
believe SM hasn't put a big warning on their site, or simply stopped
selling it, replacing it with something that works. Almost all of their
gear is simply awesome. This one board gives SM a black eye.

> In the end I'm very happy and proud of my system. Of course I show it
> to my friends and they are jealous for sure :-)

That's great. :)

> So thanks very much again and please let me know how I can buy you a
> beer or two!

As always, you're welcome. And sure, feel free to donate to my beer
fund. ;)

--
Stan



Ramon Hofer

Jun 15, 2012, 7:40:03 PM
to
On Thu, 14 Jun 2012 08:38:27 -0500
Stan Hoeppner <st...@hardwarefreak.com> wrote:

> On 6/14/2012 4:51 AM, Ramon Hofer wrote:
>
> >> These commands don't match the pastebin. The pastebin shows you
> >> creating a 4 disk RAID5 as /dev/md0.
> >
> > Really :-?
>
> That kind of (wrong) analysis is one of the many outcomes of severe
> lack of sleep, too much to do, and not enough time. ;) Having 3
> response/reply chains going for the same project doesn't help either.
> We share fault on that one: you sent 3 emails before I replied to the
> first. I replied to all 3 in succession instead of consolidating all
> 3 into one response. Normally I'd do that. Here I simply didn't
> have the time. So in the future, with me or anyone else, please keep
> it to one response/reply. :) Cuts down on the confusion and overlap
> of thoughts.

Ok I will.
I'm still learning the code of conduct for mailing lists ;-)

I'll start right now, then.


First of all I tried to set up the raid5 with the WD 20EARS and didn't
have much luck. They led to fail events when mdadm built the array.
They "worked" in my Netgear NV+ with very low r/w rates <5 MB/s (which
I now assume was because of the disks).

That's why I'm already thinking of buying new disks.

I have found these drives at my local dealer (the prices are in Swiss
Francs).

2 TB:
- Seagate Barracuda 2TB, 7200rpm, 64MB, 2TB, SATA-3 (129.-)
- Seagate ST2000DL004/HD204UI, 5400rpm, 32MB, 2TB, SATA-II (129.-)

3 TB:
- Seagate Barracuda 3TB, 7200rpm, 64MB, 3TB, SATA-3 (179.-)

I think the Seagate Barracuda 3TB are the best value for money and I
didn't find any problems that could prevent me from using them as raid
drives.

Btw. When I tried to set up the WD20EARS mdstat told me that the
syncing would take about 6 hours. Hopefully the Barracudas have at
least the same rate. Then the process would be finished in maybe less
than 9 hours. This seems to be acceptable for my case.
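
For reference, I was watching the resync progress and the estimated
finish time with the usual mdstat view:

~$ cat /proc/mdstat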


> Also, please note that with 2TB drives, the throughput will decrease
> dramatically as you fill the disks. If you're copying over 3-4TB of
> files, a write rate of 20-30MB/s at the end of the copy process should
> be expected, as you're now writing to the far inner tracks, which have
> 1/8th or so the diameter of the outer tracks. Aerial density * track
> (cylinder) length * spindle RPM = data rate. The aerial density and
> RPM are constants.

So if I see low rates in the future I can add a new raid5 and get
higher throughput again because the linear raid would write first to
the new array?


> > Now I only have to setup the details correctly.
> > Like the agcount...
>
> Like I said, it may not make a huge difference, at least when the XFS
> is new, fresh. But as it ages (write/delete/write) over time, the
> wonky agcount could hurt performance badly. You balked at that
> 20MB/s rate which is actually normal. With XFS parms incorrect, a
> year from now you could be seeing max 50MB/s and min 5MB/s. Yeah,
> ouch.

Another reason to set it up properly now :-)


> > You really were an incredible help!
>
> When I'm not such a zombie that I misread stuff, yeah, maybe a little
> help. ;)

No really. The adventure of enlarging my media server would have ended
in total frustration!




Ramon Hofer

Jun 16, 2012, 2:20:01 AM
to
On Fri, 15 Jun 2012 16:40:56 -0500
Stan Hoeppner <st...@hardwarefreak.com> wrote:

> On 6/15/2012 8:36 AM, Ramon Hofer wrote:
>
> > First of all I tried to set the raid5 with the WD 20EARS and didn't
> > have much luck. They led to fail events when mdadm builds the array.
> > They "worked" in my Netgear NV+ with very low r/w rates <5 MB/s
> > (which I now assume is because of the disks.
>
> Ok, I'm confused. You had stated you currently have 4x2TB disks and
> 4x1.5TB disks. WD20EARS are 2TB disks. You said you'd already
> created a RAID5 and added to a linear array, then copied a bunch of
> files from the 1.5TB array, these 1.5TB disks presumably in the
> Netgear. Is this correct? Is the md RAID5 inside the linear array
> still working? Which disks is it made of?

Ok, sorry for the confusion.
The four 2 TB WD green were in the Netgear NAS and the four 1.5 TB
Samsung are in the old raid5 (md9).
I removed the 2 TB disks from the NAS and mounted them in the Norco and
connected them to the server via the LSI and expander. On these WD
drives I created the raid5 (md1) and on top of that the linear array
(md0). Upon creation of md1 the fourth disk (sdd) was added as a spare,
which I had to activate manually by running

mdadm --readwrite /dev/md1

While it was syncing the disks I copied the files from md9 to md0.
During this process sdb was set as faulty.


> > That's why I'm already thinking of buying new disks.
>
> Well lets look at this more closely. The disks may not be bad. How
> old are they? Send me your dmesg output:
>
> ~$ cp /var/log/dmesg /tmp/dmesg.txt
>
> then email dmesg.txt to me.

I've uploaded dmesg to pastebin hope this is ok.

http://pastebin.com/raw.php?i=dek1wca4


> The WD Black 2TB 7.2k is tested for desktop RAID use (Linux md) and
> has a 5 year warranty, costs $210. The Seagate Barracuda TX 2TB 7.2k
> is also tested for desktop RAID use, has a 3 year warranty, and costs
> $210.
>
> My advice: spend more per drive for less capacity and get a 3/5 times
> longer warranty, and a little piece of mind that the drives are
> designed/tested for RAID use and will last at least 5 years, or be
> replaced at no cost for up to 5 years.

Great advice!
I'll go for the WD Black 2TB. I found them for CHF 199.-


> > So if I see low rates in the future I can add a new raid5 and get
> > higher throughbput again because the linear raid would write first
> > to the new array?
>
> I'm not sure what you're asking here. Adding a new 4 disk RAID5 to
> the linear array doesn't make anything inherently faster. It simply
> adds capacity. Your read/write speed on a per file basis will be
> about the same in the "new space" as in the old. I explained all of
> this before you made the decision to go with the linear route instead
> of using md reshaping to expand. You said you understood and that it
> was fine as your performance requirements are low.

Yes sorry, it's absolutely fine. I was just curious because you wrote
"when the array fills up it gets slower". So I thought when I add four
new disks I'll get free space added, the linear array won't be as full
as before, and so it could regain its previous speed again.

But really not important for my case!
Just curiosity ;-)



>
> > No really. The adventure of enlarging my media server would have
> > ended in total frustration!
>
> There's still time for frustration--you're not done quite yet. lol

Yes but now I'm in semi known territory ;-)


Cheers
Ramon



Ramon Hofer

Jun 16, 2012, 11:10:03 AM
to
On Fri, 15 Jun 2012 16:40:56 -0500
Stan Hoeppner <st...@hardwarefreak.com> wrote:

> Well lets look at this more closely. The disks may not be bad. How
> old are they? Send me your dmesg output:

Sorry, I forgot to write this last time: The WD20EARS I bought
between 14. Dec 2010 and 01. Oct 2011.

Maybe it was also caused by inadequate cooling.
I'm copying the things from the old raid md9 to the new linear array
while the old disks are still in the old case and directly attached to
the mobo.
The new raid disks are already in the Norco case. They're attached over
the expander to the LSI which is in the Asus mobo mounted in the old
case.
The expander and all of the disks are powered from the same PSU which
powers the mobo etc.

I had to do this because the sata cables are too short to mount the
mobo in the Norco. Unfortunately I can't connect the fans from the
Norco because those wires are too short as well. But I thought that
just for copying things over, with only these four disks in the Norco,
it would be ok :-?

Do you think this could cause the problem?


Cheers
Ramon



Ramon Hofer

Jun 17, 2012, 10:50:02 PM
to
I'm again having problems with the disks getting kicked out of the
array :-o

First of all, the old WD green 2TB disk which was marked as failed also
causes problems in the Netgear ReadyNAS. I will see if it's still under
warranty and try to get a replacement.

But the other issue scares me a bit ;-)

Here's what I've done so far:

Yesterday I set up md1 with the four new WD black 2TB disks
~$ mdadm -C /dev/md1 -c 128 -n4 -l5 /dev/sd[abcd]
~$ mdadm --readwrite /dev/md1

I created md0 with md1 as a linear array
~$ mdadm -C /dev/md0 --force -n1 -l linear /dev/md1

On md0 I created the xfs filesystem
~$ mkfs.xfs -d agcount=7,su=131072,sw=3 /dev/md0

Then I copied everything from the old md9 raid5 with the Samsung 1.5TB
to md0.

Today I shut the server down and mounted the mobo, the os hdd, the
Samsung 1.5 TB drives from the old md9, and the mythtv recording hdd in
the Norco.
Everything went well. I mounted the expander to the case wall and fixed
the cables so they stay in place.

Then I booted up again and created md2 with the four Samsung 1.5TB disks
~$ mdadm -C /dev/md2 -c 128 -n4 -l5 /dev/sd[efgh]
~$ mdadm --readwrite /dev/md2

After this I expanded the linear array
~$ mdadm --grow /dev/md0 --add /dev/md2

and the filesystem
~$ xfs_growfs /mnt/media-raid

All this went well too.
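
(To double-check the grow I guess I could have looked at
~$ mdadm --detail /dev/md0
~$ df -h /mnt/media-raid
which should show md2 as a second member of the linear array and the
enlarged filesystem.)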

But this evening I got 10 emails from mdadm. I've again "pastebinned"
them because I didn't want to add them to this text:
http://pastebin.com/raw.php?i=ftpmfSpv


I wanted to reassemble the array:
~$ sudo mdadm -A /dev/md1 /dev/sd[abcd]
mdadm: cannot open device /dev/sda: Device or resource busy
mdadm: /dev/sda has no superblock - assembly aborted
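
(I suspect "Device or resource busy" just means the disks are still
claimed by the existing arrays; if so, I'd probably have to stop them
first with something like
~$ sudo mdadm --stop /dev/md0
~$ sudo mdadm --stop /dev/md1
before trying to assemble again.)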

Here's the output of blkid:
http://pastebin.com/raw.php?i=5AK0Eia1


> I forgot /var/log/dmesg only contains boot info. Entries since boot
> are only available via the dmesg command.
>
> ~$ dmesg|sendmail st...@hardwarefreak.com
>
> should email your current dmesg output directly to me with no
> copy/paste required, assuming exim or postfix is installed. If not
> you can use paste bin again. I prefer it in email so I can quote
> interesting parts directly, properly.

I'm not sure whether my dmesg output helps with solving this problem
too. Unfortunately I couldn't email it, so I created a pastebin:
http://pastebin.com/raw.php?i=2pNf9wGe
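
(I could possibly also try something like
~$ dmesg | mail -s "dmesg output" st...@hardwarefreak.com
assuming a mail command from bsd-mailx or similar is installed, but the
pastebin will have to do for now.)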


> > I removed the 2 TB disks from the NAS and mounted them in the Norco
> > and connected to the server vio lsi and expander. On these WD
> > drives I created the raid5 (md1) and on top of that the linear
> > array (md0). Upon creation of md1 the fourth disk (sdd) was added
> > as a spare which I had to add manually by setting
> >
> > mdadm --readwrite /dev/md1
>
> That's my fault. Sorry. I forgot to have you use "--force" when
> creating the RAID5s. I overlooked this because I NEVER use md parity
> arrays, nor any parity arrays. Reason for the spare:
>
> "When creating a RAID5 array, mdadm will automatically create a
> degraded array with an extra spare drive. This is because building
> the spare into a degraded array is in general faster than resyncing
> the parity on a non-degraded, but not clean, array. This feature can
> be overridden with the --force option."

Thanks for the explanation and the hint. I will use --force from now
on :-)
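
(So for the next array that would be something like
~$ mdadm -C /dev/md1 --force -c 128 -n4 -l5 /dev/sd[abcd]
i.e. the same create command as before, just with --force so all four
disks go straight into the array instead of one being added as a spare,
if I understood the man page quote correctly.)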


> > While it was syncing the disks I copied the files from md9 to md0.
> > During this proces sdb was set as faulty.
>
> Probably too much IO load with the array sync + file copy. Regardless
> of what anyone says, wait for md arrays to finish building/syncing
> before trying to put anything on top, whether another md layer,
> filesystem, or files.

I didn't read this before doing all the stuff above. Maybe it would
have saved me from some headaches...
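
(Next time I'll probably wait until the resync is done before creating
the filesystem and copying data, e.g. by blocking on
~$ mdadm --wait /dev/md1
which, if I read the man page correctly, only returns once the rebuild
has finished.)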


> >>> That's why I'm already thinking of buying new disks.
> >>
> >> Well lets look at this more closely. The disks may not be bad.
> >> How old are they?
>
> You didn't answer. How old are the 2TB and 1.5TB drives? What does
> SMART say about /dev/sdb?

Here are the dates I bought the disks:

04.10.2009: 1x Samsung HD154UI
17.02.2010: 3x Samsung HD154UI

12.12.2010: 1x Western Digital Caviar Green 2TB
17.03.2011: 1x Western Digital Caviar Green 2TB
11.08.2011: 2x Western Digital Caviar Green 2TB
01.10.2011: 2x Western Digital Caviar Green 2TB

To be honest I can't remember why I bought 6 of the WDs. But I have sold
at least one of them. The fifth must have disappeared somehow ;-)

I have now stopped md0 and md2 and removed the Samsung and the WD green
drives again. If you want me to post their details too, I will add them
again. But for now, here is the output of hdparm for the four drives:
http://pastebin.com/raw.php?i=xcD3mLUA
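
(If the SMART data is more useful I could also post something like
~$ smartctl -H -A /dev/sdb
for each drive, assuming smartmontools is installed.)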


Maybe the problem is now related to the case, since it's sdb again?
Or maybe the disk is already broken because I didn't cool the drives
while copying the files and rebuilding the spare.


> > Yes sorry it's absolutely fine. I was just curious because you wrote
> > "when the array fills up it gets slower". So I thought when I add
> > four new disks I'll get free space added and the linear array won't
> > be filled anymore as much as before and so it could regain it's
> > previous speed again.
>
> This is generally true and there are multiple reasons for it. To
> explain them fully would occupy many chapters in a book, and I'm sure
> someone has already written on this subject.
>
> In your case, using XFS atop a linear array, each time you add a new
> striped array underneath and grow XFS, access to space in the new
> striped array will generally be faster than into the sections of the
> filesystem that reside on the previous striped array(s) which are
> full, or near full.
>
> One of the reasons is metadata lookup--where is the file I need to
> get? If a phone book has 10 entries it's very quick to look up any one
> entry. What if it has 10 million entries? Takes a bit longer. I
> need to write a new 100GB file, where can I write it? Oh, there's
> not a 100GB chunk of free space to hold the file. Show me the table
> of empty spaces and their sizes. Calculate the best combination of
> those spaces to split the file across. The spaces are far apart on
> the device (array). We go to each one and write a small piece of the
> file.
>
> An hour later we want to read that file. Where is the file? Oh, it's
> here, and here, and here, and here and... So we go here, read a
> chunk, go there read a chunk...
>
> Those a just a couple of the reasons you slow down as your filesystem
> ages. This is true of both arrays and single disks. SSDs have no
> such limitations as the time to go from here to there retrieving file
> fragments is zero as there are no moving parts.
>
> > But really not important for my case!
> > Just curiosity ;-)
>
> I hope that was enough to satisfy your curiosity. :) Plenty of people
> have written about it if you care to Google.

Thank you for the explanation. It's especially hard to get into a new
topic when one doesn't know what to ask Google :-)
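
(Out of curiosity I might check how fragmented the filesystem already
is with something like
~$ xfs_db -r -c frag /dev/md0
which, as far as I understand, reports a file fragmentation factor and,
thanks to -r, opens the device read-only.)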


> >>> No really. The adventure of enlarging my media server would have
> >>> ended in total frustration!
> >>
> >> There's still time for frustration--you're not done quite yet. lol
> >
> > Yes but now I'm in semi known territory ;-)
>
> Heheh. Yeah, at least you're starting to get a little solid footing
> under you. I first started working with hardware RAID about 15 years
> ago when single drive throughput peaked at 15MB/s and you were lucky
> to get 115MB/s out of a 20 drive array due the controllers being
> slow, and due to the PCI bus peaking at 115MB/s after protocol
> overhead when you used 2 or 4 controllers. Now single drives do that
> rate routinely.

I fear the solid footing is already becoming loose :-o


Cheers
Ramon


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/20120618004...@hoferr-x61s.hofer.rummelring

Ramon Hofer

unread,
Jun 18, 2012, 10:30:02 AM6/18/12
to
On Mon, 18 Jun 2012 00:46:55 +0200
Ramon Hofer <ramon...@bluewin.ch> wrote:

> I'm again having problems with the disks getting kicked out of the
> array :-o

I've already asked about this on the debian list before and got an
answer. But I'm not sure whether I should do the same here.

Here's a link to my old problem:
http://lists.debian.org/debian-user/2012/04/msg01290.html

The answer from Daniel Koch (thx again) was:

> - Zero all the superblocks on all the disks.
> ~$ mdadm --zero-superblock /dev/sd{b..d}
>
> - Recreate the array with the "--assume-clean" option.
> ~$ mdadm --create --verbose /dev/md0 --auto=yes --assume-clean
> --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
>
> - Mark it possibly dirty with:
> ~$ mdadm --assemble /dev/md0 --update=resync"
>
> - Let it resync
>
> - Mount it and see if it is restored

I'm not sure whether this is the correct approach here too, because I
have a nested RAID.

If yes then this should work for me now:

~$ mdadm --zero-superblock /dev/sd[abcd]
~$ mdadm --zero-superblock /dev/sd[efgh]

~$ mdadm --create --verbose /dev/md1 --auto=yes --assume-clean
--level=5 --raid-devices=4 /dev/sd[abcd]
~$ mdadm --create --verbose /dev/md2 --auto=yes --assume-clean
--level=5 --raid-devices=4 /dev/sd[efgh]

~$ mdadm --assemble /dev/md1 --update=resync
~$ mdadm --assemble /dev/md2 --update=resync

Now md0 should have its members back and I can start it again
~$ mdadm -A /dev/md0 /dev/md[12]

And if I'm very lucky this time I still have my data on the array :-)
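
(Before zeroing anything I'd probably first look at what the existing
superblocks say, e.g.
~$ mdadm --examine /dev/sd[abcd]
~$ mdadm --examine /dev/sd[efgh]
so I at least know the original array UUIDs, chunk size and device
order in case the re-create needs to match them.)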


I wanted to ask you whether this could help before I try it.
Maybe I should ask on the linux-raid mailing list too?


Cheers
Ramon


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
Archive: http://lists.debian.org/20120618122...@hoferr-x61s.hofer.rummelring