raid and dump/restore after the disaster

Roberto Nunnari

Apr 29, 2008, 2:57:22 AM

to freebsd-...@freebsd.org

Hi all!

I'm playing with new HW and FreeBSD 6.3 and 7.0.

I set up RAID 1 on two SATA disks (fakeraid on ICH9R)
and as far as I can see, it seems to work very well.

Now I'm trying to simulate a single disk failure (I just take
out a disk and boot again). No matter which of the two disks
I take out, the BIOS correctly shows the RAID as degraded and
bootable and loads the FreeBSD loader, which loads the kernel
and starts the boot.
But when the kernel gets to the drives (or the swap?)
it panics with fatal trap 12. The trap description says that
the current process is 0 (swapper).

Seeing that, I commented out the swap partition in fstab,
but that didn't help.

How can I get the system to finish the boot?

Thank you and best regards.

--
Robi

Roberto Nunnari

Apr 29, 2008, 3:35:15 PM

to freebsd-...@freebsd.org

Nobody on this, please? :)

Roberto Nunnari

Apr 30, 2008, 3:09:18 AM

to freebsd-...@freebsd.org

Hi!

Anybody on this, pleeeease? :)

Am I missing something basic, or is it a FreeBSD bug?
Incomplete support for the ICH9R?

I cannot attach the boot log, because the boot process
panics just before mounting the disks and nothing is
logged in /var/log/.

Anyway, booting in verbose mode shows that the last
activity before the panic involves the disks and the fakeraid:
it finds one of the disks, and then the last output before
the panic is about the Intel MatrixRAID.

Any thoughts on this, please?

Best regards
Robi

Derek Ragona

Apr 30, 2008, 11:39:50 AM

to Roberto Nunnari, freebsd-...@freebsd.org

I believe this is software RAID and the support for failover is in
the system BIOS. If a drive fails, you need to replace the failed drive
and rebuild the array. If you want hot-swappable drives in an array, you
will need to use a different RAID card that supports that feature.

-Derek

Ted Mittelstaedt

May 3, 2008, 12:11:05 PM

to Roberto Nunnari, freebsd-...@freebsd.org

Roberto,

You can't simulate a disk drive failure that way. If
you really want to know what's going on, the issue is that
you're pointing the swap at ar0. If you must get this
booted again, you can try booting into single-user mode,
editing /etc/fstab to point the partitions at /dev/ad0
instead of /dev/ar0, and rebooting. But this is
an emergency action and is not recommended.
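
As a rough sketch of that fstab change: the partition names below (ar0s1a, ar0s1b) are assumed for illustration, since the thread never shows the actual fstab. The substitution is tried on a sample copy and written to a separate file, so nothing real is touched.

```sh
#!/bin/sh
# Sketch only: the slice/partition names are hypothetical, not from the thread.
# Build a sample fstab in a scratch directory.
work=$(mktemp -d /tmp/fstab-demo.XXXXXX)
cat > "$work/fstab" <<'EOF'
# Device        Mountpoint  FStype  Options  Dump  Pass#
/dev/ar0s1b     none        swap    sw       0     0
/dev/ar0s1a     /           ufs     rw       1     1
EOF

# Point every /dev/ar0 entry at the plain disk /dev/ad0 instead,
# writing the result to a new file so it can be reviewed first.
sed 's|/dev/ar0|/dev/ad0|' "$work/fstab" > "$work/fstab.emergency"
cat "$work/fstab.emergency"
```

On the real system the root filesystem is normally mounted read-only in single-user mode, so it would have to be remounted read-write (mount -u /) before /etc/fstab can be edited.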

If you want to simulate a drive failure, WHILE THE
SYSTEM IS RUNNING pull the SATA connector on one drive.

The system should NOT trap; it should simply print an
error to the console and show it's gone into degraded mode.

If you then reboot, the system may or may not come back up.

You have to understand the approach of RAID mirroring. Basically
this is poor man's data protection. The idea is that a disk
usually fails in the middle of the day, at the worst possible
time. When it does, you do NOT want the server to stop or
crash. You want it to keep running until the evening, when you
can spend a couple of hours getting the disk replaced (or until
the next day, when you can buy a replacement drive).

When you have the replacement disk ready to plug into the
system, you are supposed to run a full backup of your data
on the degraded array just in case the reinsertion goes badly.

I have found the safest approach is to leave the server alone
and get the replacement disk ready. Wiping it in another system
with dd if=/dev/zero of=/dev/ad1 bs=50k is the best policy
before reinsertion.
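
A guarded version of that wipe might look like the sketch below. The CONFIRM variable and the wipe function are made up for this example, and /dev/ad1 is only right if that is where the replacement disk shows up in the other system. By default the script only prints the command.

```sh
#!/bin/sh
# Sketch: zero a replacement disk, but only when explicitly confirmed.
# TARGET is an assumption; check which device node the new disk really gets.
TARGET=${TARGET:-/dev/ad1}

wipe() {
    if [ "${CONFIRM:-no}" = "yes" ]; then
        # Destroys everything on the target device.
        dd if=/dev/zero of="$1" bs=50k
    else
        # Default: show what would be run, touch nothing.
        echo "would run: dd if=/dev/zero of=$1 bs=50k"
    fi
}

wipe "$TARGET"
```

When run for real, dd keeps writing until the device is full and then stops with an end-of-device error, which is expected here. Zeroing the disk presumably matters because it clears any stale RAID metadata left on it.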

Follow the steps in the man page for reinsertion. Keep in
mind that they don't always work. If they don't, you will
have to wipe both disks, regenerate the array, and reinstall
the OS. That is why you make a backup first, when the system is
off-duty.
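
The man page is presumably atacontrol(8), since ar0 is an ataraid(4) device. A dry-run sketch of the usual RAID1 rebuild sequence follows. The array, channel, and disk names are assumptions (read the real ones from "atacontrol list"), and the run helper is made up for this example so that the commands are only printed unless DRY_RUN=0 is set.

```sh
#!/bin/sh
# Sketch of a RAID1 rebuild with atacontrol(8) on FreeBSD 6.x/7.x.
# ARRAY, CHANNEL and DISK are hypothetical; take the real names
# from the output of "atacontrol list".
ARRAY=${ARRAY:-ar0}
CHANNEL=${CHANNEL:-ata1}
DISK=${DISK:-ad1}

# Print each command; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

run atacontrol status "$ARRAY"            # array should report as degraded
run atacontrol detach "$CHANNEL"          # detach the channel of the failed disk
# ... physically swap the disk here ...
run atacontrol attach "$CHANNEL"          # attach the channel, probing the new disk
run atacontrol addspare "$ARRAY" "$DISK"  # add the new disk to the array as a spare
run atacontrol rebuild "$ARRAY"           # start rebuilding the mirror
run atacontrol status "$ARRAY"            # check rebuild progress
```

Printing first makes it easy to check the device names against the real hardware before anything is changed.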

Ted

Roberto Nunnari

May 3, 2008, 3:54:21 PM

to Ted Mittelstaedt, freebsd-...@freebsd.org

Hi Ted.

Thank you for replying to my post.

See my comments below.

Ted Mittelstaedt wrote:
> Roberto,
>
> You can't simulate a disk drive failure that way. If
> you really want to know what's going on, the issue is that
> you're pointing the swap at ar0. If you must get this
> booted again, you can try booting into single-user mode,
> editing /etc/fstab to point the partitions at /dev/ad0
> instead of /dev/ar0, and rebooting. But this is
> an emergency action and is not recommended.

It doesn't even get to that point: it panics before giving
me the single-user shell. I even ran a few tests with the
swap commented out in fstab, but that didn't help.

But it doesn't matter. I solved the problem by removing the
disk from the RAID in the BIOS and then re-adding it.
Once the system was up again I had to rebuild the RAID from
the OS, and that was it.

>
> If you want to simulate a drive failure, WHILE THE
> SYSTEM IS RUNNING pull the SATA connector on one drive.

That certainly makes for a real test, but are you sure
it won't fry the mainboard or the drive?

>
> The system should NOT trap; it should simply print an
> error to the console and show it's gone into degraded mode.
>
> If you then reboot, the system may or may not come back up.
>
> You have to understand the approach of RAID mirroring. Basically
> this is poor man's data protection. The idea is that a disk
> usually fails in the middle of the day, at the worst possible
> time. When it does, you do NOT want the server to stop or
> crash. You want it to keep running until the evening, when you
> can spend a couple of hours getting the disk replaced (or until
> the next day, when you can buy a replacement drive).
>
> When you have the replacement disk ready to plug into the
> system, you are supposed to run a full backup of your data
> on the degraded array just in case the reinsertion goes badly.
>
> I have found the safest approach is to leave the server alone
> and get the replacement disk ready. Wiping it in another system
> with dd if=/dev/zero of=/dev/ad1 bs=50k is the best policy
> before reinsertion.

Thank you very much for these instructions. Luckily I'm not familiar
with failing drives! :)

>
> Follow the steps in the man page for reinsertion. Keep in

What man page?

Again, thank you very much.

Best regards.
Robi