md0 inactive - what do I do?

Kevin Goffe

Feb 6, 2012, 2:34:25 PM2/6/12
to Alt-F
Had my box freeze earlier so I did a reboot and now md0 is listed as
'inactive'. Looked at 'RAID maintenance' but that will only allow me
to select 'Destroy'.

System log -

http://pastebin.com/MzMv1sKQ

Thanks.

Kevin Goffe

Feb 6, 2012, 2:42:19 PM2/6/12
to Alt-F
Tried stopping and starting but that gave me this error -

http://yfrog.com/0zclipboard01rhhj

Joao Cardoso

Feb 6, 2012, 8:26:47 PM2/6/12
to Alt-F
You should start by describing your setup, so we can verify it against the
logs.

From the logs, I understand that you have two internal disks in RAID1
plus one USB disk (set up as a RAID spare? Your second message seems to
indicate that, but the logs do not).

But the logs show nothing wrong:

md/raid1:md1: active with 1 out of 2 mirrors
md: recovery of RAID array md1
md: md1: recovery done
hot: Start fscking md1
hot: Finish fscking md1: fsck 1.41.14 (22-Dec-2010) /dev/md1: clean
kernel: EXT4-fs (md1): mounted filesystem

Also, you have ffp installed in the RAID: "hot: ffp directory found in
md1".

The only odd thing I noticed is that the log refers to "md1" and not
to "md0", while in your messages you refer to "md0". Which was your
device name, md0 or md1? Do md0 and/or md1 appear in the Status or
RAID pages? Which one?

Please post the output of the following commands, typed at the command
line:

cat /proc/mdstat
mdadm --detail /dev/md0 # or /dev/md1, depending on the real device
mdadm --examine /dev/sda3 /dev/sdb3
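A sketch of collecting all three outputs into a single file that can then be pasted in one go (the file name /tmp/raid-report.txt is an arbitrary choice; the commands are echoed here so the sequence can be reviewed first, drop the leading 'echo' to run them for real on the box as root):

```shell
# Gather the requested diagnostics into one file for posting.
# 'echo' makes this a dry run; remove it to execute the real commands.
{
  echo cat /proc/mdstat
  echo mdadm --detail /dev/md0
  echo mdadm --examine /dev/sda3 /dev/sdb3
} > /tmp/raid-report.txt 2>&1

# Review what will be posted.
cat /tmp/raid-report.txt
```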

Your second message (which refers to md0) also shows something odd for
a RAID1: it looks more like a RAID5 diagnostic message, and it
certainly does not make sense for RAID1 (RAID1 can start in degraded
mode from one drive, and RAID5 can start degraded from two drives).
Can you give some more details about this?


Kevin Goffe

Feb 7, 2012, 4:26:39 AM2/7/12
to Alt-F
Sorry I should have checked the log in more detail.

I have two internal drives and one external drive that make up md0
(sda2, sdb2, sdc2), and they're set up as RAID5.

Here's the output you requested -

http://pastebin.com/XTPnbKKH

Thanks.

Joao Cardoso

Feb 7, 2012, 11:34:00 AM2/7/12
to Alt-F


On Feb 7, 9:26 am, Kevin Goffe <hitche...@gmail.com> wrote:
> Sorry I should have checked the log in more detail.
>
> I have two internal and one external drives that make up md0 (sda2,
> sdb2, sdc2) and they're setup as a RAID5.

How I would report this:

-I have RC1 flashed on a rev-B1 board, with two internal and one USB
disk.
-I have RAID1 on the internal drives and RAID5 on the internal and USB
drives.
-The RAID arrays were created using the Disk Wizard, and I haven't
changed anything afterwards.
-Everything has been working fine for xx weeks/month, surviving
several reboots and power-downs.

-Today I noticed that, without changing anything in the box setup,
neither hardware nor software, the RAID5 array was marked as inactive
in the status page.
-The RAID1 array is working fine.
-I tried to start the RAID5 array in the RAID web page, receiving the
error message:
"Starting the md0 array failed: <paste the diagnostic message>"

I attach the output of the commands ...

Please consider the above a didactic example of a proper bug report;
it would save time for both of us (mine, mainly) and would increase
your chances of getting useful responses.


Now, for your problem: the puzzle is the failed and spare devices in
the "mdadm --examine /dev/sdc2" output:

Active Devices : 2
Working Devices : 3
Failed Devices : 1
Spare Devices : 1
...

Number Major Minor RaidDevice State
this 3 8 66 3 spare

0 0 8 2 0 active sync
1 1 8 18 1 active sync
2 2 0 0 2 faulty removed
3 3 8 66 3 spare

The Major/Minor columns refer to the disk partitions: 8/2 is sda2,
8/18 is sdb2, and both are fine; 0/0 is a device that was considered
faulty and removed (did you do this?); and 8/66 is sdc2, which is
considered a spare drive for the RAID.

The puzzle is how a device was removed and turned into a spare. Given
that sda2/sdb2 are considered to be OK, you should be able to start
the array in the degraded state and use the spare to rebuild the
array.
This should happen automatically, but as we don't know what happened
in the first place (the "faulty removed"), you can only try two or
three things.

1-The first is to ask for specialized help on the mdadm mailing list
(you should state that you are using mdadm v3.1.5 and Linux kernel
2.6.35.15, and post the same command outputs).

2-From the mdadm manual page:

--run
       Attempt to start the array even if fewer drives were given than
       were present last time the array was active. Normally if not
       all the expected drives are found and --scan is not used, then
       the array will be assembled but not started. With --run an
       attempt will be made to start it anyway.


so you can try the command, after backing up all important data:

mdadm --assemble --run /dev/md0 /dev/sda2 /dev/sdb2 /dev/sdc2


and if no error is displayed, you can use

cat /proc/mdstat

to watch what is happening (or just use the Status page).
If this works, you have to make sure that sdc2 is not a spare anymore
(marked as green in the RAID page) and that the array is not in the
degraded state.
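If you want to keep an eye on the rebuild from a script rather than re-reading the whole file, the progress figure can be pulled out of /proc/mdstat with sed. A minimal sketch; the sample line below is illustrative (field spacing varies between kernel versions), and on the box you would pipe in "cat /proc/mdstat" instead of the echo:

```shell
# Extract the recovery percentage from /proc/mdstat output.
mdstat_progress() {
  sed -n 's/.*recovery = *\([0-9.]*%\).*/\1/p'
}

# Illustrative sample of a recovery line (values are made up):
echo '      [>....................]  recovery = 12.6% (123456/9876543) finish=43.9min speed=1000K/sec' \
  | mdstat_progress
# prints: 12.6%
```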

3-From the mdadm manual page:

--force
       Assemble the array even if the metadata on some devices appears
       to be out-of-date. If mdadm cannot find enough working devices
       to start the array, but can find some devices that are recorded
       as having failed, then it will mark those devices as working so
       that the array can be started. An array which requires --force
       to be started may contain data corruption. Use it carefully.

so you can try the command, after backing up all important data:

mdadm --assemble --run --force /dev/md0 /dev/sda2 /dev/sdb2 /dev/sdc2

and continue as in option 2 above.

Did it work?
If yes, wait until the rebuild finishes (it can take tens of hours --
even days; it does not depend on the amount of data you have stored
on the array, but on the array size), then reboot the box and see if
the solution is permanent -- remember that the power-on sequence is
important when using external USB disks.

Please say, for future reference, what you have done and post the
output of the same mdadm --examine/--detail commands.

PS-An alternative to all of the above just occurred to me: power down
the box, unplug the USB disk and power the box on; the RAID5 should
start as degraded, and you could then power on the USB disk, plug it
into the box and add the partition to the array (or it could be added
and the rebuild started automatically). Read the online help in the
RAID page.
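Re-adding the partition by hand would be a single mdadm command. A sketch, shown as a dry run so it can be reviewed first (drop the 'echo' to execute; /dev/sdc2 is the name from this thread, confirm it against /proc/partitions before running, since USB device names can shift between boots):

```shell
# Re-add the USB partition to the degraded RAID5 so the rebuild starts.
CMD="mdadm --manage /dev/md0 --add /dev/sdc2"
echo "$CMD"            # dry run; remove the echo to run it for real
echo cat /proc/mdstat  # then watch the rebuild progress here
```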

Kevin Goffe

Feb 7, 2012, 12:56:28 PM2/7/12
to Alt-F
Tried the power down/unplugging option first but they were marked
green.

Figured out how to use mdadm and had to use the --force option which
then marked them as clean.

Now in the status page I get -

Dev. Capacity Level State Status Action Done ETA
md0 2793.6 GB raid5 clean degraded recover 0% 2639.9min

Fingers crossed that sorts it out.

Once again many thanks.