
6.0 on Dell 1850 with PERC4e/DC RAID?


Scott Mitchell

Jan 5, 2006, 5:41:50 PM
to freebsd...@freebsd.org
Hi all,

I may be getting a new Dell PE1850 soon, to replace our ancient CVS server
(still running 4-STABLE). The new machine will ideally run 6.0 and have a
PERC4e/DC RAID card - the one with battery-backed cache. This is listed as
supported by amr(4), but I'm wondering how well it actually works in the
case of a disk failure. Will the driver tell me that disk has failed (a
syslog message would be enough) or will I have to make a daily trip into
the server room to check the front panel lights? Presumably it handles
hot-swapping a replacement drive OK?

I found some posts mentioning some management/monitoring tools for these
controllers that were allegedly available from the www.lsilogic.com
website, but I can't find anything on there for FreeBSD. Do the Linux
tools work?

Cheers,

Scott

--
===========================================================================
Scott Mitchell | PGP Key ID | "Eagles may soar, but weasels
Cambridge, England | 0x54B171B9 | don't get sucked into jet engines"
scott at fishballoon.org | 0xAA775B8B | -- Anon

Doug White

Jan 5, 2006, 6:34:24 PM
to Scott Mitchell, freebsd...@freebsd.org
On Thu, 5 Jan 2006, Scott Mitchell wrote:

> Hi all,
>
> I may be getting a new Dell PE1850 soon, to replace our ancient CVS server
> (still running 4-STABLE). The new machine will ideally run 6.0 and have a
> PERC4e/DC RAID card - the one with battery-backed cache. This is listed as
> supported by amr(4), but I'm wondering how well it actually works in the
> case of a disk failure. Will the driver tell me that disk has failed (a
> syslog message would be enough) or will I have to make a daily trip into
> the server room to check the front panel lights? Presumably it handles
> hot-swapping a replacement drive OK?

From what I remember, you will receive status-change kernel messages when
disks disappear, rebuilds start, and so forth. So for most day-to-day
manipulation you should be fine.

You may want to make sure the auto rebuild option is enabled in the
controller's BIOS since no working control programs from userland are
generally available at this time. That also means you can't create new
volumes at runtime, but that's not so horrible...

--
Doug White | FreeBSD: The Power to Serve
dwh...@gumbysoft.com | www.FreeBSD.org

Scott Mitchell

Jan 5, 2006, 7:01:16 PM
to Doug White, freebsd...@freebsd.org
On Thu, Jan 05, 2006 at 03:34:24PM -0800, Doug White wrote:
> On Thu, 5 Jan 2006, Scott Mitchell wrote:
>
> > Hi all,
> >
> > I may be getting a new Dell PE1850 soon, to replace our ancient CVS server
> > (still running 4-STABLE). The new machine will ideally run 6.0 and have a
> > PERC4e/DC RAID card - the one with battery-backed cache. This is listed as
> > supported by amr(4), but I'm wondering how well it actually works in the
> > case of a disk failure. Will the driver tell me that disk has failed (a
> > syslog message would be enough) or will I have to make a daily trip into
> > the server room to check the front panel lights? Presumably it handles
> > hot-swapping a replacement drive OK?
>
> From what I remember, you will receive status-change kernel messages when
> disks disappear, rebuilds start, and so forth. So for most day-to-day
> manipulation you should be fine.

That would be fine - as long as there's some notification of important
events.

> You may want to make sure the auto rebuild option is enabled in the
> controller's BIOS since no working control programs from userland are
> generally available at this time. That also means you can't create new
> volumes at runtime, but thats not so horrible...

I expect there will only ever be one volume, so that's unlikely to be a
problem :)

Many thanks,

David Sze

Jan 5, 2006, 7:09:09 PM
to Doug White, Scott Mitchell, freebsd...@freebsd.org
On Thu, Jan 05, 2006 at 03:34:24PM -0800, Doug White wrote:

The sysutils/megarc port appears to work for both status change polling
and runtime configuration (at least on a PE800 and a PE2850 that I tested
on).


Scott Mitchell

Jan 5, 2006, 7:24:19 PM
to David Sze, freebsd...@freebsd.org

Cool, I'll check that out when the hardware arrives.

Many thanks,

Michael Vince

Jan 5, 2006, 9:21:56 PM
to Scott Mitchell, freebsd...@freebsd.org
Scott Mitchell wrote:

>Hi all,
>
>I may be getting a new Dell PE1850 soon, to replace our ancient CVS server
>(still running 4-STABLE). The new machine will ideally run 6.0 and have a
>PERC4e/DC RAID card - the one with battery-backed cache. This is listed as
>supported by amr(4), but I'm wondering how well it actually works in the
>case of a disk failure. Will the driver tell me that disk has failed (a
>syslog message would be enough) or will I have to make a daily trip into
>the server room to check the front panel lights? Presumably it handles
>hot-swapping a replacement drive OK?
>
>I found some posts mentioning some management/monitoring tools for these
>controllers that were allegedly available from the www.lsilogic.com
>website, but I can't find anything on there for FreeBSD. Do the Linux
>tools work?
>
>

FYI, there has also been a big update to the amr driver which claims to
dramatically increase performance, among other things. Interestingly
enough, it was augmented by Yahoo, so I can only assume they are moving to
Dell; yahoo for me (and now you :).
The updates are still in -current but will be MFC'ed into stable
sooner or later.

http://lists.freebsd.org/pipermail/cvs-src/2005-December/056814.html

Log:
Mega update to the LSI MegaRAID driver:

1. Implement a large set of ioctl shims so that the Linux management apps
from LSI will work. This includes infrastructure to support adding, deleting
and rescanning arrays at runtime. This is based on work from Doug Ambrosko,
heavily augmented by LSI and Yahoo.

2. Implement full 64-bit DMA support. Systems with more than 4GB of RAM
can now operate without the cost of bounce buffers. Cards that cannot do
64-bit DMA will automatically revert to using bounce buffers. This option
can be forced off by setting the 'hw.amr.force_sg32' tunable in the loader.
It should only be turned off for debugging purposes. This work was sponsored
by Yahoo.

3. Streamline the command delivery and interrupt handler paths after
much discussion with Dell and LSI. The logic now closely matches the
intended design, making it both more robust and much faster. Certain
i/o failures under heavy load should be fixed with this.

4. Optimize the locking. In the interrupt handler, the card can be checked
for completed commands without any locks held, due to the handler being
implicitly serialized and there being no need to look at any shared data.
Only grab the lock to return the command structure to the free pool. A
small optimization can still be made to collect all of the completions
together and then free them together under a single lock.

Items 3 and 4 significantly increase the performance of the driver. On an
LSI 320-2X card, transactions per second went from 13,000 to 31,000 in my
testing with these changes. However, these changes are still fairly
experimental and shouldn't be merged to 6.x until there is more testing.

Thanks to Doug Ambrosko, LSI, Dell, and Yahoo for contributing towards
this.
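
(Side note on the 'hw.amr.force_sg32' tunable mentioned in item 2 above: as a
loader tunable it would go in /boot/loader.conf. A minimal sketch, untested,
with the exact value semantics being my assumption and, per the commit log,
only useful for debugging:)

    # /boot/loader.conf
    # Force 32-bit scatter/gather (i.e. disable 64-bit DMA) in amr(4).
    # Debugging only, according to the commit log above.
    hw.amr.force_sg32="1"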


Scott Mitchell

Jan 6, 2006, 5:55:02 AM
to Michael Vince, freebsd...@freebsd.org
On Fri, Jan 06, 2006 at 01:21:56PM +1100, Michael Vince wrote:
> FYI there also has been a big update to the amr driver which claims to
> dramatically increase performance among other things, interestingly
> enought it was augmented by Yahoo, I can only assume they are moving to
> Dell, yahoo for me (and now you :).
> The updates are still in -current but it will be MFC'ed into stable
> sooner or later.
>
> http://lists.freebsd.org/pipermail/cvs-src/2005-December/056814.html

Yeah, I saw that, and it sounds most excellent. Good to see some real
support from the likes of Dell and LSI, too.

I might be able to get away with running -stable on this machine, but
-current will be right out. Hopefully these changes can be MFCed in time
for 6.1.

Scott


Vivek Khera

Jan 6, 2006, 10:35:46 AM
to freebsd-stable

On Jan 5, 2006, at 5:41 PM, Scott Mitchell wrote:

> I may be getting a new Dell PE1850 soon, to replace our ancient CVS
> server
> (still running 4-STABLE). The new machine will ideally run 6.0 and
> have a
> PERC4e/DC RAID card - the one with battery-backed cache. This is
> listed as

I have an 1850 with the built-in PERC 4e/Si since all I needed was the
RAID1 mirror of the internal drives. It works extremely well, and
the speed is quite good.

As for notices of when the drives go bad, under 4.x I've had disk
failures with the amr driver (different PERC cards) and not gotten
any such notices in the syslog that I recall. I did find a program
posted to one of the freebsd lists called 'amrstat' that I run
nightly. It produces this kind of output:

Drive 0: 68.24 GB, RAID1 <writeback,no-read-ahead,no-adaptative-io> optimal

If it says "degraded" it is time to fix a drive. You just fire up
the lsi megaraid tools and find out which drive it is.
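
A minimal sketch of that kind of nightly check (untested; it assumes amrstat
is installed as /usr/local/sbin/amrstat and prints one "Drive N: ..." line
per logical drive ending in the array state, as in the sample above) might
look like this:

    #!/bin/sh
    # Nightly amr(4) RAID status check: mail root if any logical drive
    # is not reported as "optimal" by amrstat.
    AMRSTAT=/usr/local/sbin/amrstat        # assumed install path
    status=`$AMRSTAT 2>&1`
    if echo "$status" | grep '^Drive' | grep -qv 'optimal$'; then
        echo "$status" | mail -s "RAID status warning on `hostname`" root
    fi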

If you go to the LSI download area, they have one file for FreeBSD,
which is labeled the driver. In that zip file is also the management
software for freebsd. You'll want that. Personally, I like the
"MEGAMGR" software which was released for freebsd 4.x and mimics the
BIOS' interface in a terminal window.

The rebuild on LSI controllers is set to automatic on the dells as
default. It just works as expected.

Overall, I'm a big fan of the LSI cards and the amr driver...

Unfortunately for me, the latest equipment I just got only takes
low-profile cards, and LSI doesn't offer a dual channel RAID card in a
low-profile configuration... so I need to look at Adaptec.

Vivek Khera

Jan 6, 2006, 1:53:30 PM
to freebsd-stable

On Jan 5, 2006, at 9:21 PM, Michael Vince wrote:

> Items 3 and 4 significantly increase the performance of the
> driver. On an
> LSI 320-2X card, transactions per second went from 13,000 to
> 31,000 in my
> testing with these changes. However, these changes are still fairly
> experimental and shouldn't be merged to 6.x until there is more
> testing.
> Thanks to Doug Ambrosko, LSI, Dell, and Yahoo for contributing
> towards
> this.

Damn that's awesome! Thanks to all who helped with this... This
will be great for some of my servers.

Now, does anyone have any numbers to compare this with other RAID
cards? Particularly the 2230SLP? :-)

/me wishes LSI made low-profile dual-channel cards...

Scott Mitchell

Jan 6, 2006, 6:38:57 PM
to Vivek Khera, freebsd-stable
On Fri, Jan 06, 2006 at 10:35:46AM -0500, Vivek Khera wrote:
>
> On Jan 5, 2006, at 5:41 PM, Scott Mitchell wrote:
>
> >I may be getting a new Dell PE1850 soon, to replace our ancient CVS
> >server
> >(still running 4-STABLE). The new machine will ideally run 6.0 and
> >have a
> >PERC4e/DC RAID card - the one with battery-backed cache. This is
> >listed as
>
> I have an 1850 with the buil-in PERC 4e/Si since all I needed was the
> RAID1 mirror of the internal drives. It works extremely well, and
> the speed is quite good.

We'll only be mirroring the internal drives too for now - the 4e/DC seems
to be the only RAID option on the 1850 with battery-backed cache, and
doesn't cost much more for the extra peace-of-mind.

> As for notices of when the drives go bad, under 4.x I've had disk
> failures with the amr driver (different PERC cards) and not gotten
> any such notices in the syslog that I recall.

That's a pity. Maybe Doug was thinking of one of the aac(4) based PERC
cards? Still, something I can run out of cron to check the array status
should be fine.
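
(Probably nothing fancier than a nightly crontab entry pointing at whatever
checker script we end up with; the script name below is just a placeholder:)

    # /etc/crontab -- nightly RAID status check at 03:00
    0   3   *   *   *   root   /usr/local/sbin/check-raid-status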

> I did find a program
> posted to one of the freebsd lists called 'amrstat' that I run
> nightly. It produces this kind of output:
>
> Drive 0: 68.24 GB, RAID1 <writeback,no-read-ahead,no-adaptative-
> io> optimal
>
> If it says "degraded" it is time to fix a drive. You just fire up
> the lsi megaraid tools and find out which drive it is.
>
> If you go to the LSI download area, they have one file for FreeBSD,
> which is labeled the driver. In that zip file is also the management
> software for freebsd. You'll want that. Personally, I like the
> "MEGAMGR" software which was released for freebsd 4.x and mimics the
> BIOS' interface in a terminal window.

There's a port of the management software now: sysutils/megarc

> The rebuild on LSI controllers is set to automatic on the dells as
> default. It just works as expected.

Cool.

> Overall, I'm a big fan of the LSI cards and the amr driver...
>
> Unfortunately for me, the latest equipment I just got only takes low-
> profile cards, and LSI doesn't offer a dual channel RAID card in low-
> profile configuration... so I need to look at adaptec.

This is on your x4100? Nice machine. We have a v20z with dual Opteron
270s that I totally love. Looking at getting an x4100 too... sadly these
are product development machines so they'll be running RedHat and Solaris.
Doesn't the x4100 have h/w RAID built in? Or does that not work with
FreeBSD?

Vivek Khera

Jan 6, 2006, 9:43:17 PM
to freebsd-stable

On Jan 6, 2006, at 6:38 PM, Scott Mitchell wrote:

> We'll only be mirroring the internal drives too for now - the 4e/DC
> seems
> to be the only RAID option on the 1850 with battery-backed cache, and
> doesn't cost much more for the extra peace-of-mind.

Then you'll be pleasantly surprised to know that the 4e/Si has a
battery too. I certainly was... and it even has 256MB of cache RAM.
Quite the bargain! I'll send you screen shots of the config menus in
private email.

>> Unfortunately for me, the latest equipment I just got only takes low-
>> profile cards, and LSI doesn't offer a dual channel RAID card in low-
>> profile configuration... so I need to look at adaptec.
>
> This is on your x4100? Nice machine. We have a v20z with dual
> Opteron
> 270s that I totally love. Looking at getting an x4100 too... sadly
> these
> are product development machines so they'll be running RedHat and
> Solaris.
> Doesn't the x4100 have h/w RAID built in? Or does that not work with
> FreeBSD?

Yes, this is the X4100. It only has room for two low-profile PCI-X
cards, which the 320-2X certainly is not. Curiously, LSI has on
their web site some big announcements about some deals with Sun to
use their products, so one would hope they would have a low-profile
high-end card. Currently they only have a low-end card that is low
profile.

I'm biting the bullet and getting an Adaptec 2230 low profile card.
I hope it is fast. If not, then back to the drawing board... sigh.

Doug Ambrisko

Jan 12, 2006, 7:41:17 PM
to Scott Mitchell, Vivek Khera, freebsd-stable
Scott Mitchell writes:
| On Fri, Jan 06, 2006 at 10:35:46AM -0500, Vivek Khera wrote:
| >
| > On Jan 5, 2006, at 5:41 PM, Scott Mitchell wrote:
| >
| > >I may be getting a new Dell PE1850 soon, to replace our ancient CVS
| > >server
| > >(still running 4-STABLE). The new machine will ideally run 6.0 and
| > >have a
| > >PERC4e/DC RAID card - the one with battery-backed cache. This is
| > >listed as
| >
| > I have an 1850 with the buil-in PERC 4e/Si since all I needed was the
| > RAID1 mirror of the internal drives. It works extremely well, and
| > the speed is quite good.
|
| We'll only be mirroring the internal drives too for now - the 4e/DC seems
| to be the only RAID option on the 1850 with battery-backed cache, and
| doesn't cost much more for the extra peace-of-mind.
|
| > As for notices of when the drives go bad, under 4.x I've had disk
| > failures with the amr driver (different PERC cards) and not gotten
| > any such notices in the syslog that I recall.
|
| That's a pity. Maybe Doug was thinking of one of the aac(4) based PERC
| cards? Still, something I can run out of cron to check the array status
| should be fine.

Are you referring to this, Doug? The Linux ioctl shim requires one file
that hasn't been committed yet. Scott L. & ps have it. I may commit
it now that I'm back. This lets all of the Dell/LSI Linux tools
run on FreeBSD, including the firmware update tool. The caveat is
that with the driver re-do, it seems certain things in the ioctl
path cause the firmware to lock up. I haven't been around enough
to help with that problem. I have a binary that locks it up pretty
quickly.

Most of the existing monitoring tools have bugs. The Linux tools
tend to be better but the last copy of MegaMon leaked shared memory
then quit. We have a tool at work but it is encumbered so we can't
give it out.



| > I did find a program
| > posted to one of the freebsd lists called 'amrstat' that I run
| > nightly. It produces this kind of output:
| >
| > Drive 0: 68.24 GB, RAID1 <writeback,no-read-ahead,no-adaptative-
| > io> optimal
| >
| > If it says "degraded" it is time to fix a drive. You just fire up
| > the lsi megaraid tools and find out which drive it is.

This is probably a fairly good scheme. The caveat is that you can have
an "optimal" RAID that is broken :-(

On another note, IPMI is pretty good for remotely monitoring these boxes,
and you can run the Dell SOL proxy tool for Linux on FreeBSD, then set up
the BIOS on the serial port and connect the serial port to the BMC/LAN.

FWIW, I've been working on an openipmi compatible driver. It basically
works for a bunch of programs that I've tested with as long as they
are compiled with a correct ioctl file.

Doug A.

Jung-uk Kim

Jan 12, 2006, 8:20:55 PM
to freebsd...@freebsd.org, Scott Mitchell, Vivek Khera
On Thursday 12 January 2006 07:41 pm, Doug Ambrisko wrote:

> Scott Mitchell writes:
> | > I did find a program
> | > posted to one of the freebsd lists called 'amrstat' that I run
> | > nightly. It produces this kind of output:
> | >
> | > Drive 0: 68.24 GB, RAID1
> | > <writeback,no-read-ahead,no-adaptative- io> optimal
> | >
> | > If it says "degraded" it is time to fix a drive. You just
> | > fire up the lsi megaraid tools and find out which drive it is.
>
> This is probably a faily good scheme. Caveat is that you can have
> a "optimal" RAID that is broken :-(

That's lame. Under what condition does it happen, do you know?

Thanks,

Jung-uk Kim

Jung-uk Kim

Jan 13, 2006, 11:58:27 AM
to freebsd...@freebsd.org, Scott Mitchell, Vivek Khera
On Friday 13 January 2006 11:49 am, Doug Ambrisko wrote:
> Jung-uk Kim writes:
> [ Charset euc-kr unsupported, skipping... ]

If your mail client cannot handle the charset, read:

http://docs.freebsd.org/cgi/mid.cgi?200601122020.59843.jkim

Jung-uk Kim

Doug Ambrisko

Jan 13, 2006, 11:49:43 AM
to Jung-uk Kim, Scott Mitchell, freebsd...@freebsd.org, Vivek Khera

Doug Ambrisko

Jan 13, 2006, 11:59:48 AM
to Jung-uk Kim, Scott Mitchell, freebsd...@freebsd.org, Vivek Khera

Running RAID 10, a drive was swapped and the rebuild started on the
replacement drive. The rebuild complained about the source drive
for the mirror rebuild having read errors that couldn't be recovered.
It continued on and finished re-creating the mirror. Then the RAID
proceeded to a background init, as it normally does, and started
failing that and re-starting the background init over and over again.
The box changed the RAID from degraded to optimal when the rebuild
completed (with errors). Doing a dd of the entire RAID logical device
returned an error at the bad sector, since it couldn't recover that.
The RAID controller reported an I/O error and still left the RAID as
optimal.

We reported this and were told that's the way it is designed :-(
Probably the spec. is defined by whatever the RAID controller happens
to do versus what makes sense :-(

So far this has only happened once. Changing firmware did not help.
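
(The dd in question was nothing fancy, just a sequential read of the whole
logical volume, roughly as below; amrd0 is the usual device name for the
first amr(4) logical drive, so adjust to suit:)

    # Read the entire first amr logical drive, discarding the data.
    # Any unreadable sector shows up as an I/O error from dd.
    dd if=/dev/amrd0 of=/dev/null bs=1m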

Doug A.

PS. sorry for the null email before this. Hit the wrong key.

Jung-uk Kim

Jan 13, 2006, 12:00:22 PM
to freebsd...@freebsd.org, Scott Mitchell, Vivek Khera

Sorry, my mail client did it again. :-(

Jung-uk Kim

Jung-uk Kim

Jan 13, 2006, 12:12:15 PM
to Doug Ambrisko, Scott Mitchell, Vivek Khera, freebsd...@freebsd.org

A similar thing happened to me once or twice (with RAID5) and I thought
it was just a broken controller. If the culprit is the design, it IS
really lame. :-(

> Doug A.
>
> PS. sorry for the null email before this. Hit the wrong key.

No need to be sorry. I made the same mistake again. ;-)

Thanks for the info,

Jung-uk Kim

Mike Tancsa

Jan 13, 2006, 12:59:37 PM
to Doug Ambrisko, freebsd...@freebsd.org
At 11:59 AM 13/01/2006, Doug Ambrisko wrote:
>|
>| That's lame. Under what condition does it happen, do you know?
>
>Running RAID 10, a drive was swapped and the rebuild started on the
>replacement drive. The rebuild complained about the source drive
>for the mirror rebuild having read errors that couldn't be recovered.
>It continued on and finished re-creating the mirror. Then the RAID
>proceeeded onto a background init which they normal did and started
>failing that and re-starting the background init over and over again.
>The box changed the RAID from degraded to optimal when the rebuild
>completed (with errors). Do a dd of the entire RAID logical device
>returned an error at the bad sector since it couldn't recover that.
>The RAID controller reported an I/O error and still left the RAID as
>optimal.
>
>We reported this and where told that's the way it is designed :-(


Interesting timing, as I ran into this sort of situation over the
weekend on a 3ware card running RAID1. The card had complained for a week
about read errors on drive 1. We thought we would wait until the
weekend maintenance window to swap it out. Sadly, before that
window, drive zero totally died a horrible death. We popped in a new
drive on port zero, started the rebuild, and it crapped out saying
there was a read error on drive 1. However, there is a check box
that says continue the build, even with errors on the source drive.

This setup seems to give you the best of both worlds. We did a quick
check of the resultant files compared to backups and only a couple
were toasted. (The box is going to be retired in a month, so if there
is other hidden fs corruption, as long as it holds out for another 3
weeks we don't care too much.) The correct approach would of course be
to do a total restore, but this was good enough for us in this
situation. I guess the question is: is this RAID1 really a proper
mirror, given that there are hard errors on the drive on port 1?

---Mike

Doug Ambrisko

Jan 13, 2006, 1:26:23 PM
to Mike Tancsa, freebsd...@freebsd.org
Mike Tancsa writes:
| At 11:59 AM 13/01/2006, Doug Ambrisko wrote:
| >|
| >| That's lame. Under what condition does it happen, do you know?
| >
| >Running RAID 10, a drive was swapped and the rebuild started on the
| >replacement drive. The rebuild complained about the source drive
| >for the mirror rebuild having read errors that couldn't be recovered.
| >It continued on and finished re-creating the mirror. Then the RAID
| >proceeeded onto a background init which they normal did and started
| >failing that and re-starting the background init over and over again.
| >The box changed the RAID from degraded to optimal when the rebuild
| >completed (with errors). Do a dd of the entire RAID logical device
| >returned an error at the bad sector since it couldn't recover that.
| >The RAID controller reported an I/O error and still left the RAID as
| >optimal.
| >
| >We reported this and where told that's the way it is designed :-(
|
| Interesting timing as I ran into this sort of situation on the
| weekend on a 3ware drive in RAID1. The card had complained for a week
| about read errors on drive 1. We thought we would wait until the
| weekend maintenance window to swap it out. Sadly, before that
| window, drive zero totally died a horrible death. We popped in a new
| drive on port zero, started the rebuild, and it crapped out saying
| there was a read error on drive 1. However, there is a check box
| that says continue the build, even with errors on the source drive.

With Adaptec we used to do a verify of each disk before a swap
to increase our chances of a successful disk swap. Adaptec was
a little heavy-handed in that if you are running on the last disk of the
mirror and it has a read error, it will fail the drive. If you have
a RAID 10 then you lose half the file system :-( I'd rather just
get the read error back to the OS than lose the entire drive.



| This setup seems to give you the best of both worlds. We did a quick
| check of the resultant files compared to backups and only a couple
| were toasted. (The box is going to be retired in a month, so if there
| is other hidden fs corruption if it holds out for another 3 weeks we
| dont care too much). The correct approach would be to do a total
| restore of course, but this was good enough for us in this
| situation. I guess the question is, is this RAID1 in a proper mirror
| given that there are hard errors on the drive on port 1 ?

That sounds like a good controller assuming it says the RAID is still
degraded and it's not optimal. I assume "optimal" means everything
is fine and safe to read the entire volume.

Doug A.

Doug Ambrisko

Jan 13, 2006, 1:28:24 PM
to Jung-uk Kim, Scott Mitchell, freebsd...@freebsd.org, Vivek Khera

I'd suggest whining to them. To me "optimal" means "as far as I know
there are no problems with the RAID". If enough customers whine they
might change their view!

Doug A.

David Kirchner

Jan 13, 2006, 1:56:01 PM
to Doug Ambrisko, Scott Mitchell, Vivek Khera, freebsd...@freebsd.org, Jung-uk Kim
On 1/13/06, Doug Ambrisko <ambr...@ambrisko.com> wrote:
> I'd suggest whining to them. To me "optimal" means "as far as I know
> there are no problems with the RAID". If enough customers whine they
> might change their view!

Heh. When we've told Dell that some of our 1750s and 1850s were
locking up randomly, with various errors (most commonly, either the mpt0
driver complains and never recovers, or the server locks up entirely
with no error), we were told a) that FreeBSD isn't supported, and b) to
run the diagnostics disk (which never finds anything except that the
CD-ROM drive is empty), which basically leads to the implied c) to piss
up a rope.

Dell cares not about us FreeBSD users.

Better to just go with known-working hardware like 3ware cards. I do
wish they had a SCSI RAID controller. Seems like all major SCSI RAID
cards have various problems: the Adaptec 2100S (and the rest of the line)
would not rebuild transparently -- the OS would get various timeout
errors while rebuilds or verifies were ongoing; Mylex's FreeBSD driver
has a 2.5-year-old bug, i386/55603; and then these Dell cards (LSI?) have
obvious problems. I'm sure there are others I've missed.

3ware cards in Supermicro servers (not sure which exact models,
Silicon Mechanics sells them as their R200, R204, Q500, and others)
are rock solid for us even during rebuilds and with degraded arrays.
And the best part is they're not Dells.</educated_bias>

Scott Mitchell

Jan 14, 2006, 6:36:52 AM
to Doug Ambrisko, Vivek Khera, freebsd-stable
On Thu, Jan 12, 2006 at 04:41:17PM -0800, Doug Ambrisko wrote:
> Scott Mitchell writes:
> |
> | That's a pity. Maybe Doug was thinking of one of the aac(4) based PERC
> | cards? Still, something I can run out of cron to check the array status
> | should be fine.
>
> Are you refering to this Doug. The Linux ioctl shim requires one file
> that hasn't been committed yet. Scott L. & ps have it. I may commit
> it now that I'm back. This lets all of the Dell/LSI Linux tools
> run on FreeBSD including the firmware update tool. The caveat is
> that with the driver re-do it seems the certain things in the ioctl
> path causes the firmware to lock-up. I haven't been around enough
> to help with that problem. I have a binary that locks it up pretty
> quick.

Hi Doug,

I was actually referring to Doug White, who said:

> From what I remember, you will receive status-change kernel messages when
> disks disappear, rebuilds start, and so forth. So for most day-to-day
> manipulation you should be fine.

It wasn't clear if this applied to the amr(4)-based PERC cards or just the
aac(4) ones.

Sounds like the re-worked amr driver will be very much better, at least
once a few more bugs have been ironed out of it.

> Most of the existing monitoring tools have bugs. The Linux tools
> tend to be better but the last copy of MegaMon leaked shared memory
> then quit. We have a tool at work but it is encumbered so we can't
> give it out.
>
> | > I did find a program
> | > posted to one of the freebsd lists called 'amrstat' that I run
> | > nightly. It produces this kind of output:
> | >
> | > Drive 0: 68.24 GB, RAID1 <writeback,no-read-ahead,no-adaptative-
> | > io> optimal
> | >
> | > If it says "degraded" it is time to fix a drive. You just fire up
> | > the lsi megaraid tools and find out which drive it is.
>
> This is probably a faily good scheme. Caveat is that you can have
> a "optimal" RAID that is broken :-(

That's pretty sucky, but presumably not a FreeBSD-specific problem?
Despite that, I'm reasonably hopeful that a scheme like this along with
good backups (which we have) will be enough to avoid any major disasters.

Is Dell's support any better if you tell them you're running RedHat?

Regards,

Doug Ambrisko

Jan 14, 2006, 2:10:51 PM
to Scott Mitchell, Vivek Khera, freebsd-stable
Scott Mitchell writes:
| On Thu, Jan 12, 2006 at 04:41:17PM -0800, Doug Ambrisko wrote:
| > Scott Mitchell writes:
| > |
| > | That's a pity. Maybe Doug was thinking of one of the aac(4) based PERC
| > | cards? Still, something I can run out of cron to check the array status
| > | should be fine.
| >
| > Are you refering to this Doug. The Linux ioctl shim requires one file
| > that hasn't been committed yet. Scott L. & ps have it. I may commit
| > it now that I'm back. This lets all of the Dell/LSI Linux tools
| > run on FreeBSD including the firmware update tool. The caveat is
| > that with the driver re-do it seems the certain things in the ioctl
| > path causes the firmware to lock-up. I haven't been around enough
| > to help with that problem. I have a binary that locks it up pretty
| > quick.
|
| Hi Doug,
|
| I was actually referring to Doug White, who said:
|
| >From what I remember, you will receive status-change kernel messages when
| >disks disappear, rebuilds start, and so forth. So for most day-to-day
| >manipulation you should be fine.
|
| It wasn't clear if this applied to the amr(4)-based PERC cards or just the
| aac(4) ones.

Yes, that only applies to the aac-based machines and not the amr-based ones
(i.e. Adaptec versus LSI). With LSI you have to poll the controller
for RAID events, and that is not public.



| Sounds like the re-worked amr driver will be very much better, at least
| once a few more bugs have been ironed out of it.

Yes.



| > Most of the existing monitoring tools have bugs. The Linux tools
| > tend to be better but the last copy of MegaMon leaked shared memory
| > then quit. We have a tool at work but it is encumbered so we can't
| > give it out.
| >
| > | > I did find a program
| > | > posted to one of the freebsd lists called 'amrstat' that I run
| > | > nightly. It produces this kind of output:
| > | >
| > | > Drive 0: 68.24 GB, RAID1 <writeback,no-read-ahead,no-adaptative-
| > | > io> optimal
| > | >
| > | > If it says "degraded" it is time to fix a drive. You just fire up
| > | > the lsi megaraid tools and find out which drive it is.
| >
| > This is probably a faily good scheme. Caveat is that you can have
| > a "optimal" RAID that is broken :-(
|
| That's pretty sucky, but presumably not a FreeBSD-specific problem?
| Despite that, I'm reasonably hopeful that a scheme like this along with
| good backups (which we have) will be enough to avoid any major disasters.

It's not a FreeBSD specific problem.



| Is Dell's support any better if you tell them you're running RedHat?

We can sort-of run RedHat. That is, we ran the Linux RAID binaries
from LSI & Dell with the Linux ioctl emulation layer I did on FreeBSD.
I netboot Linux sometimes to verify some things.

Doug A.

Vivek Khera

Jan 16, 2006, 12:42:59 PM
to freebsd-stable

On Jan 13, 2006, at 1:56 PM, David Kirchner wrote:

> has 2.5 year old bug i386/55603; and then these Dell cards (LSI?) have
> obvious problems. I'm sure there are others I've missed.

I've never had a rebuild error on a Dell LSI card. I've never had a
failure on a box with an Adaptec-based card, so I can't say about that.

Vivek Khera

Jan 16, 2006, 2:05:20 PM
to freebsd-stable

On Jan 14, 2006, at 6:36 AM, Scott Mitchell wrote:

> I was actually referring to Doug White, who said:
>
>> From what I remember, you will receive status-change kernel
>> messages when
>> disks disappear, rebuilds start, and so forth. So for most day-to-day
>> manipulation you should be fine.
>
> It wasn't clear if this applied to the amr(4)-based PERC cards or
> just the
> aac(4) ones.
>
> Sounds like the re-worked amr driver will be very much better, at
> least
> once a few more bugs have been ironed out of it.

From my experience, the amr driver does not issue warnings of any
sort that show up on the console or in log files. The aac driver is
more chatty -- I see log file lines about the battery being
recharged, etc.

I've never had a drive failure on any box in which I have an aac-driven
card, so I can't speak to that, but I'd bet $1 that it would log it.
The amr driver doesn't log drive failures -- one must run some
utility to probe it.

Steven Hartland

Jan 16, 2006, 2:56:15 PM
to Vivek Khera, freebsd-stable
I can confirm that even with a down mirror there is nothing in the log files from amr.

Steve


----- Original Message -----
From: "Vivek Khera" <vi...@khera.org>

> From my experience, the amr driver does not issue warnings of any
> sort that show up on the console or in log files. The aac driver is
> more chatty -- I see log file lines about the battery being
> recharged, etc.


Scott Mitchell

Mar 23, 2006, 12:26:11 PM
to freebsd...@freebsd.org
On Thu, Jan 05, 2006 at 10:41:50PM +0000, Scott Mitchell wrote:
> Hi all,
>
> I may be getting a new Dell PE1850 soon, to replace our ancient CVS server
> (still running 4-STABLE). The new machine will ideally run 6.0 and have a
> PERC4e/DC RAID card - the one with battery-backed cache. This is listed as
> supported by amr(4), but I'm wondering how well it actually works in the
> case of a disk failure. Will the driver tell me that disk has failed (a
> syslog message would be enough) or will I have to make a daily trip into
> the server room to check the front panel lights? Presumably it handles
> hot-swapping a replacement drive OK?
>
> I found some posts mentioning some management/monitoring tools for these
> controllers that were allegedly available from the www.lsilogic.com
> website, but I can't find anything on there for FreeBSD. Do the Linux
> tools work?

Following up to myself for the benefit of the archives - I can confirm that
the PERC4e in the PE1850 works perfectly with amr(4) under 6.0. I've been
using the sysutils/megarc port for managing the adapter from FreeBSD. It
has a truly awful user interface but allows you to do everything that the
BIOS setup program does, so far as I can tell.

For monitoring we're relying on the email alerts from the DRAC/4 management
card also in the machine, which turn out to work very well. We actually
had a disk failure on the machine already (one of the drives had apparently
worked itself a bit loose in transit and decided to power itself off a few
days after I put the machine in the rack). The DRAC sent out an email when
the drive "died", the array auto-rebuilt when the drive was shoved back into
the slot properly, and the DRAC sent another email when the rebuild was
complete.

I'm looking forward to the amr(4) performance improvements in 6.1 and being
able to run the Linux megamgr tool (I think this is the one with the same
user interface as the BIOS setup program).

Cheers,
