Debian machine not booting

James Allsopp

Jun 18, 2013, 8:30:03 AM

Hi,
I have a Debian machine which was on for a long time (~months). I just moved house and rebooted, and now it doesn't boot.

My 4 hard drives are organised in pairs of RAID 1 (mirrored) with LVM spanning them. Originally there was just one pair, but then I got two new hard drives and added them. I then extended VolGroup-LogVol03 to cover these new drives and increase the space of Home (/ was on one of the other logical volumes). This all worked fine for ages.

When I boot, all four drives are detected in the BIOS and I've checked all the connections.

It gets to "3 logical volumes in volume group "VolGroup" now active" which sounds good.
Then "Activating lvm and md swap.. done"
"Checking file systems...fsck from util-linux-ng 2.17.2"
Says
/dev/sde1: clean
/dev/sda1:clean
/dev/mapper/VolGroup-LogVol01: clean
/dev/mapper/VolGroup-LogVol02: clean

Then here's the error:
"fsck.ext4: No such file or directory while trying to open /dev/mapper/VolGroup-LogVol03
/dev/mapper/VolGroup-LogVol03:
The superblock could not be read or does not describe a correct ext2 ........."

NB. All partitions are Ext4, from memory.

It then drops to a maintenance shell and says to check a log (/var/log/fsck/checkfs), but I don't even have a log directory at this point in the boot process.

I'm wondering if some of the drive IDs have been switched.

Apologies for quoting, I'm not using the computer in question.

Any help would be really appreciated. I'm worried I've lost all my data on home.
Thanks,
James

Bob Proulx

Jun 18, 2013, 3:50:01 PM

James Allsopp wrote:
> I have a debian machine which was on for a long time (~months). Just moved
> house and rebooted and now it doesn't boot.

Bummer.

> My 4 harddrives are organised in pairs of RAID 1 (Mirrored) with LVM
> spanning them. Originally there was just one pair, but then I got two new
> hard drives and added them. I then increased the space of VolGroup-LogVol03
> to cover these new drives and increase the space of Home (/ was on one of
> the other logical volume groups). This all worked fine for ages.

Sounds fine. Assuming that it booted after those changes.

> When I boot all four drives are detected in BIOS and I've checked all the
> connections.

Good.

> It gets to "3 logical volumes in volume group "VolGroup" now active" which
> sounds good.

That does sound good.

> Then here's the error:
> "fsck.ext4: No such file or directory while trying to open
> /dev/mapper/VolGroup-LogVol03
> /dev/mapper/VolGroup-LogVol03:
> The superblock could not be read or does not describe a correct ext2
> ........."

Hmm... I am not familiar with that error. But searching the web
found several stories about it. Most concerned recent changes to the
system that prevented it from booting.

> I have a debian machine which was on for a long time (~months). Just
> moved house and rebooted and now it doesn't boot.
>
> My 4 harddrives are organised in pairs of RAID 1 (Mirrored) with LVM
> spanning them. Originally there was just one pair, but then I got two
> new hard drives and added them. I then increased the space of
> VolGroup-LogVol03 to cover these new drives and increase the space of
> Home (/ was on one of the other logical volume groups). This all
> worked fine for ages.

And you rebooted in that time period? Otherwise these changes, if not
done completely correctly, seem prime to have triggered your current
problem independent of any other action. You say it was on for a long
time. If you had not rebooted in that long time then this may have
been a hang-fire problem for all of that time.

> I'm wondering if some of the drive id's have been switched.

If you mean the drive UUIDs then no, those would not have changed.

> Any help would be really appreciated. I'm worried I've lost all my data on
> home

First, do not despair. You should be able to get your system working
again. You are probably simply missing the extra raid pair
configuration.

I strongly recommend using the debian-installer rescue mode to gain
control of your system again. It works well and is readily
available. Use a standard Debian installation disk. Usually we
recommend the netinst disk because it is the smallest image. But any
of the netinst or CD#1 or DVD#1 images will work fine for rescue mode
since it is not actually installing but booting your system at that
point so the difference between them does not matter. You have a
disk? Go fish it out and boot it.

Here is the official documentation for it:

http://www.debian.org/releases/stable/i386/ch08s07.html.en

But that is fairly terse. Let me say that the rescue mode looks just
like the install mode initially. It will ask your keyboard and locale
questions and you might wonder if you are rescuing or installing! But
it will have "Rescue" in the upper left corner so that you can tell
that you are not in install mode and be assured. Get the tool set up
with keyboard, locale, timezone, and similar and eventually it will
give you a menu with a list of actions. Here is a quick run-through.

Advanced options...
Rescue mode
keyboard dialog
...starts networking...
hostname dialog
domainname dialog
...apt update release files...
...loading additional components, Retrieving udebs...
...detecting disks...

Then eventually it will get to a menu "Enter rescue mode" that will
ask what device to use as a root file system. It will list the
partitions that it has automatically detected. If you have used a
RAID then one of the menu entry items near the bottom will be
"Assemble RAID array" and you should assemble the raid at that point.
That will bring up the next dialog menu asking for partitions to
assemble. Select the appropriate ones for your system. Then continue.
Since you have two RAID configurations I think you will need to do
this twice, once for each. I believe that you won't be able to use
the option to select partitions automatically, but I am not sure. In any
case get both raid arrays up and online at this step before proceeding.

At that point it presents a menu "Execute a shell in /dev/...". That
should get you a shell on your system with the root partition
mounted. It is a /bin/sh shell. I usually at that point start bash
so as to have bash command line recall and editing. Then mount all of
the additional disks.

# /bin/bash
root@hostname:~# mount -a

At that point you have a root superuser shell on the system and can
make system changes. After doing what needs doing you can reboot to
the system. Remove the Debian install media and boot to the normal
system and see if the changes were able to fix the problem.

Now what is your original problem? I think (not sure) you have added
a second raid pair but have not propagated the changes completely
through the boot system.

Basically make sure that mdadm.conf is updated correctly and rebuild
the initramfs to make sure that it includes it.

/etc/mdadm/mdadm.conf

dpkg-reconfigure linux-image-$(uname -r)
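
A minimal sketch of the mdadm.conf update, assuming a POSIX shell in the rescue environment. The helper name merge_arrays is hypothetical (it is not part of mdadm): it reads "mdadm --detail --scan" output on stdin and appends any ARRAY line whose UUID is not already present in the given config file. Treat it as a starting point and review the resulting file by hand.

```shell
#!/bin/sh
# merge_arrays CONF: hypothetical helper, not shipped with mdadm.
# Reads "mdadm --detail --scan" style lines on stdin and appends each
# ARRAY line to CONF unless its UUID is already listed there.
merge_arrays() {
    conf=$1
    while read -r line; do
        # Only ARRAY definitions are of interest.
        case $line in
            ARRAY*) ;;
            *) continue ;;
        esac
        # Pull out the UUID=... value from the line.
        uuid=$(printf '%s\n' "$line" |
            sed -n 's/.*UUID=\([0-9a-fA-F:]*\).*/\1/p')
        [ -n "$uuid" ] || continue
        # Append only if this UUID is not in the file yet.
        grep -q "UUID=$uuid" "$conf" || printf '%s\n' "$line" >> "$conf"
    done
}
```

Once both arrays are assembled, it could be used like this, followed by the initramfs rebuild:

mdadm --detail --scan | merge_arrays /etc/mdadm/mdadm.conf
update-initramfs -u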

Here are some previous messages on this topic.

https://lists.debian.org/debian-user/2013/01/msg00195.html

https://lists.debian.org/debian-user/2013/01/msg00392.html

Start by getting your system booted using rescue mode and then work
through the problems of the raid arrays not being assembled at boot
time. Come back here and report your progress.

Bob

James Allsopp

Jul 1, 2013, 6:40:02 PM

Thanks Bob for your e-mail, it was really helpful. I think you've identified the nub of the problem: not updating mdadm.conf and the initramfs. However, things are a bit unusual on the other side. I'm not sure if the rescue disk or I have screwed something up, but the second RAID, which has home extended onto it, has divided into two RAID arrays. Here's a summary:

cat /proc/mdstat:

Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
md126 : active raid1 sdb3[0] sdc3[1]
      972550912 blocks [2/2] [UU]
     
md127 : active raid1 sdd1[0]
      1953510841 blocks super 1.2 [2/1] [U_]
     
md1 : active raid1 sde1[1]
      1953510841 blocks super 1.2 [2/1] [_U]
     
unused devices: <none>

cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
ARRAY /dev/md0 UUID=a529cd1b:c055887e:bfe78010:bc810f04

# This file was auto-generated on Mon, 11 Jan 2010 22:18:22 +0000
# by mkconf 3.0.3-2

mdadm --detail --scan:

ARRAY /dev/md/0_0 metadata=0.90 UUID=a529cd1b:c055887e:bfe78010:bc810f04

ls -l /dev/disk/by-uuid
total 0
lrwxrwxrwx 1 root root 10 Jun 30 23:25 5e39b4bc-3b24-4df3-978d-1b3d3dca97da -> ../../sdb1
lrwxrwxrwx 1 root root 10 Jun 30 23:25 93a8d1f1-96f2-4169-852a-b37100b3e497 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jun 30 23:25 a5c8d2c0-e454-4288-9987-ea7712242858 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Jun 30 23:25 ba9f44ad-d43e-4863-801d-2de96d80ca08 -> ../../sdc2
lrwxrwxrwx 1 root root 10 Jun 30 23:25 ea2afa32-26b3-42af-83a3-57efc3ae3dce -> ../../sdb2

fdisk -l

Disk /dev/sdb: 1000.2 GB, 1000203804160 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xf229fe3e

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          37      297171   83  Linux
/dev/sdb2              38         524     3911827+  82  Linux swap / Solaris
/dev/sdb3             525      121601   972551002+  fd  Linux raid autodetect

Disk /dev/sda: 120.0 GB, 120033041920 bytes
255 heads, 63 sectors/track, 14593 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0002ae52

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       14593   117218241   83  Linux

Disk /dev/sdc: 1000.2 GB, 1000203804160 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00049c5c

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1          37      297171   83  Linux
/dev/sdc2              38         524     3911827+  82  Linux swap / Solaris
/dev/sdc3             525      121601   972551002+  fd  Linux raid autodetect

Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0xe044b9be

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1      243201  1953512001   fd  Linux raid autodetect
Partition 1 does not start on physical sector boundary.

Disk /dev/sde: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0xcfa9d090

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1      243201  1953512001   fd  Linux raid autodetect
Partition 1 does not start on physical sector boundary.

Disk /dev/md1: 2000.4 GB, 2000395101184 bytes
2 heads, 4 sectors/track, 488377710 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Alignment offset: 512 bytes
Disk identifier: 0x00000000


Disk /dev/md127: 2000.4 GB, 2000395101184 bytes
2 heads, 4 sectors/track, 488377710 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Alignment offset: 512 bytes
Disk identifier: 0x00000000


Disk /dev/md126: 995.9 GB, 995892133888 bytes
2 heads, 4 sectors/track, 243137728 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000


Disk /dev/dm-0: 10.5 GB, 10485760000 bytes
255 heads, 63 sectors/track, 1274 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000


Disk /dev/dm-1: 36.7 GB, 36700160000 bytes
255 heads, 63 sectors/track, 4461 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000


Disk /dev/dm-2: 1375.7 GB, 1375731712000 bytes
255 heads, 63 sectors/track, 167256 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Alignment offset: 512 bytes
Disk identifier: 0x00000000


Disk /dev/dm-3: 10.5 GB, 10485760000 bytes
255 heads, 63 sectors/track, 1274 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

------------------------------------------------------

I don't know if this helps or where to go from here, but I think I need to get the mdadm up and running properly before I do anything.
I get some errors running those commands but they didn't get written to the file.

E.g. mdadm --detail --scan
mdadm: cannot open /dev/md/Hawaiian:1: No such file or directory
mdadm: cannot open /dev/md/1: No such file or directory
ARRAY /dev/md/0_0 metadata=0.90 UUID=a529cd1b:c055887e:bfe78010:bc810f04

If there's any commands you need me to run, please ask,

Thanks,
James

Bob Proulx

Jul 1, 2013, 7:50:01 PM

James Allsopp wrote:
> Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
> md126 : active raid1 sdb3[0] sdc3[1]
> 972550912 blocks [2/2] [UU]

So sdb3 and sdc3 are assembled into /dev/md126. That seems good. One
full array is assembled.

Is /dev/md126 your preferred name for that array? I would guess not.
Usually it is /dev/md0 or some such. But when that name is not
available because it is already in use then mdadm will rotate up to a
later name like /dev/md126.

You can fix this by using mdadm with --update=super-minor to force it
back to the desired name. Something like this using your devices:

mdadm --assemble /dev/md0 --update=super-minor /dev/sdb3 /dev/sdc3

But that can only be done at assembly time. If it is already
assembled then you would need to stop the array first and then
assemble it again.

> md127 : active raid1 sdd1[0]
> 1953510841 blocks super 1.2 [2/1] [U_]
>
> md1 : active raid1 sde1[1]
> 1953510841 blocks super 1.2 [2/1] [_U]

I think this array now has a split-brain problem. At this point
the original single mirrored array has had both halves of the mirror
assembled and both are running. So now you have two clones of each
other and both are active, meaning that each thinks it is newer
than the other. Is that right? In which case you will eventually
need to pick one and call it the master. I think sde1 is the
natural master since it is assembled on /dev/md1.

> cat /etc/mdadm/mdadm.conf
> ...
> # definitions of existing MD arrays
> ARRAY /dev/md0 UUID=a529cd1b:c055887e:bfe78010:bc810f04

Only one array specified. That is definitely part of your problem.
You should have at least two arrays specified there.

> mdadm --detail --scan:
>
> ARRAY /dev/md/0_0 metadata=0.90 UUID=a529cd1b:c055887e:bfe78010:bc810f04

That mdadm --scan only found one array is odd.

> fdisk -l
>
> Disk /dev/sda: 120.0 GB, 120033041920 bytes
> 255 heads, 63 sectors/track, 14593 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x0002ae52
>
> Device Boot Start End Blocks Id System
> /dev/sda1 1 14593 117218241 83 Linux

I take it that this is your boot disk? Your boot disk is not RAID?

I don't like that the first used sector is 1. That would have been 63
using the previous debian-installer to leave space for the MBR and
other things. But that is a different issue.

> Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
> 255 heads, 63 sectors/track, 243201 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
^^^^ ^^^^

That is an Advanced Format 4k sector drive. Meaning that the
partitions should start on a 4k sector alignment. The
debian-installer would do this automatically.

> Disk identifier: 0xe044b9be
>
> Device Boot Start End Blocks Id System
> /dev/sdd1 1 243201 1953512001 fd Linux raid autodetect
^^^^^
> /dev/sde1 1 243201 1953512001 fd Linux raid autodetect
^^^^^
> Partition 1 does not start on physical sector boundary.


I don't recall if the first sector is 0 or 1 but I think the first
sector is 0 for the partition table. Meaning that sector 1 is not
going to be 4k aligned. (Can someone double check me on this?)
Meaning that this will require a lot of read-modify-write causing
performance problems for those drives.

The new standard for sector alignment would start at 2048 to leave
space for the partition table and other things and still be aligned
properly.
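
On the double check: logical sectors are numbered from 0, and sector 0 holds the partition table. Note also that the fdisk listing above is in cylinder units ("Units = cylinders"), so the Start column is a cylinder, not a sector; with 63 sectors per track the partition probably begins at sector 63, and listing in sector units would confirm that. The arithmetic can be sketched as below, assuming 512-byte logical and 4096-byte physical sectors as fdisk reports for these drives. The function name is_4k_aligned is made up for illustration.

```shell
#!/bin/sh
# is_4k_aligned START_SECTOR: succeeds if a partition starting at that
# 512-byte logical sector begins on a 4096-byte physical boundary.
is_4k_aligned() {
    [ $(( $1 * 512 % 4096 )) -eq 0 ]
}

# Try the start sectors mentioned in this thread.
for s in 1 63 2048; do
    if is_4k_aligned "$s"; then
        echo "sector $s: aligned"
    else
        echo "sector $s: NOT aligned"
    fi
done
```

Sectors 1 and 63 fail the check and 2048 passes, which matches the fdisk warning "Partition 1 does not start on physical sector boundary".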

> I don't know if this helps or where to go from here, but I think I need to
> get the mdadm up and running properly before I do anything.

Probably a good idea.

> If there's any commands you need me to run, please ask,

How are you booted now? Are you root on the system through something
like the debian-installer rescue boot? Or did you use a live cd or
something?

Please run:

# mdadm --detail /dev/sdd1
# mdadm --detail /dev/sde1

Those are what look to be the split brain of the second array. They
will list something at the bottom that will look like:

      Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1

Except in your case each will list one drive and will probably have
the other drive listed as removed. But importantly it will list the
UUID of the array in the listing.

Magic : a914bfec
Version : 0.90.00
UUID : b8eb34b1:bcd37664:2d9e4c59:117ab348
Creation Time : Fri Apr 30 17:21:12 2010
Raid Level : raid1
Used Dev Size : 497856 (486.27 MiB 509.80 MB)
Array Size : 497856 (486.27 MiB 509.80 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0

Check each physical volume and verify that the UUID and other stats
verify that the same array has been forked and is running on both.
The data in that header should be the same for both halves of the
cloned and split mirror.
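
That UUID comparison could be scripted roughly as follows. This is only a sketch: array_uuid and same_array are hypothetical helper names, and the parsing assumes the header layout mdadm prints for component devices, where the field is labelled "UUID" for 0.90 metadata and "Array UUID" for 1.x metadata.

```shell
#!/bin/sh
# array_uuid: read the header text mdadm prints for one component device
# on stdin and print the array UUID. Handles both the "UUID :" label and
# the "Array UUID :" label; "Device UUID" lines are ignored.
array_uuid() {
    sed -n -e 's/^[[:space:]]*Array UUID[[:space:]]*:[[:space:]]*//p' \
           -e 's/^[[:space:]]*UUID[[:space:]]*:[[:space:]]*//p' |
        awk '{ print $1; exit }'
}

# same_array UUID_A UUID_B: succeeds only if both are non-empty and equal.
same_array() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}
```

With the header of each half saved to a file (the file names here are placeholders), the check would be: same_array "$(array_uuid < sdd1.txt)" "$(array_uuid < sde1.txt)" && echo "same array".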

Corrective Action:

I _think_ you should stop the array on /dev/md127. Then add that disk
to the array running on /dev/md1. Don't do this until you have
confirmed that the two drives are clones of each other. If they are
split then you need to join them. I think something like this:

mdadm --stop /dev/md127
mdadm --manage /dev/md1 --add /dev/sdd1

Be sure to double check all of my device nodes and agree with those
before you do these commands. But I think those are what you want to
do. That will basically destroy anything that is currently on sdd1 and
sync sde1 onto sdd1.

At that point you should have both arrays running. You could stop
there and live with /dev/md126 but I think you want to fix the device
minor numbering on /dev/md126 by stopping the array and assembling it
again with the correct name.

mdadm --stop /dev/md126
mdadm --assemble /dev/md0 --update=super-minor /dev/sdb3 /dev/sdc3

At that point you should have two arrays up and running on /dev/md0
and /dev/md1 and both should have the low level lvm physical volumes
needed to assemble the lvm volume groups. Run the --scan again.

mdadm --detail --scan

Any errors at this time? Hopefully it will list two arrays. If not
then something is still wrong. Here are some additional commands to
get the same information anyway.

mdadm --detail /dev/md0
mdadm --detail /dev/md1

mdadm --examine /dev/sdb3
mdadm --examine /dev/sdc3

mdadm --examine /dev/sdd1
mdadm --examine /dev/sde1

If that turns out favorable then edit the /etc/mdadm/mdadm.conf file
and update the list of ARRAY lines there. I don't have the UUID
numbers from your system so can't suggest anything. But the above
will list out the UUID numbers for the arrays. Use them to update the
mdadm.conf file.

Then after updating that file update the initramfs. I usually
recommend using dpkg-reconfigure of the current kernel package. But
using 'update-initramfs -u' if you want is okay too. The important
concept is that the initrd needs to be rebuilt including the new
arrays as listed in mdadm.conf so that the arrays are assembled at
initramfs time.

dpkg-reconfigure linux-image-$(uname -r)

At this point if everything worked then you should be good to go. I
would cross your fingers and reboot. If all is good then it should
reboot okay.

Just as additional debugging, after having both arrays up and online
you can activate the lvm manually. I would probably try letting the
system reboot first. These are just low-level commands for further
debugging, as hints of where to look next in case they are needed.

modprobe dm-mod
vgscan
vgchange -aly

That should activate the LVM. You should have devices in
/dev/mapper/* corresponding to them. You should be able to see a
listing of the logical volumes on the system.

lvs

Good luck!
Bob

James Allsopp

Jul 2, 2013, 5:00:02 AM

Thanks Bob, I'll get back to you after I've followed your instructions. I think I'm going to have to learn to type with crossed fingers!

I think I initially sorted out all my partitions manually, rather than using the installer to do it automatically.
Really appreciated,
James

James Allsopp

Jul 2, 2013, 9:00:02 AM

One other point: sda isn't the boot hard drive. The boot partitions are sdb1 and sdc1, but these should be the same (I thought I'd mirrored them, to be honest).

I tried mdadm --detail /dev/sdd1 but it didn't work. I have these results if they help?
/dev/md1:
        Version : 1.2
  Creation Time : Thu Jan 31 22:43:49 2013
     Raid Level : raid1
     Array Size : 1953510841 (1863.01 GiB 2000.40 GB)
  Used Dev Size : 1953510841 (1863.01 GiB 2000.40 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Tue Jul  2 13:49:55 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : Hawaiian:1  (local to host Hawaiian)
           UUID : a544829f:33778728:79870439:241c5c51
         Events : 112


    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       65        1      active sync   /dev/sde1
/dev/md127:
        Version : 1.2
  Creation Time : Thu Jan 31 22:43:49 2013
     Raid Level : raid1
     Array Size : 1953510841 (1863.01 GiB 2000.40 GB)
  Used Dev Size : 1953510841 (1863.01 GiB 2000.40 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Tue Jul  2 13:49:29 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : Hawaiian:1  (local to host Hawaiian)
           UUID : a544829f:33778728:79870439:241c5c51
         Events : 106


    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       0        0        1      removed

How should I proceed from here?
James

James Allsopp

Jul 2, 2013, 9:10:02 AM

For further information:
/dev/sdb3:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : a529cd1b:c055887e:bfe78010:bc810f04
  Creation Time : Fri Nov 20 09:37:34 2009
     Raid Level : raid1
  Used Dev Size : 972550912 (927.50 GiB 995.89 GB)
     Array Size : 972550912 (927.50 GiB 995.89 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 126

    Update Time : Tue Jul  2 13:49:18 2013
          State : clean
 Active Devices : 2
Working Devices : 2

 Failed Devices : 0
  Spare Devices : 0
       Checksum : 6203fa40 - correct
         Events : 1036616



      Number   Major   Minor   RaidDevice State
this     0       8       19        0      active sync   /dev/sdb3

   0     0       8       19        0      active sync   /dev/sdb3
   1     1       8       35        1      active sync   /dev/sdc3
/dev/sdc3:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : a529cd1b:c055887e:bfe78010:bc810f04
  Creation Time : Fri Nov 20 09:37:34 2009
     Raid Level : raid1
  Used Dev Size : 972550912 (927.50 GiB 995.89 GB)
     Array Size : 972550912 (927.50 GiB 995.89 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 126

    Update Time : Tue Jul  2 13:49:18 2013
          State : clean
 Active Devices : 2
Working Devices : 2

 Failed Devices : 0
  Spare Devices : 0
       Checksum : 6203fa52 - correct
         Events : 1036616



      Number   Major   Minor   RaidDevice State
this     1       8       35        1      active sync   /dev/sdc3

   0     0       8       19        0      active sync   /dev/sdb3
   1     1       8       35        1      active sync   /dev/sdc3
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : a544829f:33778728:79870439:241c5c51

           Name : Hawaiian:1  (local to host Hawaiian)
  Creation Time : Thu Jan 31 22:43:49 2013
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 3907021954 (1863.01 GiB 2000.40 GB)
     Array Size : 3907021682 (1863.01 GiB 2000.40 GB)
  Used Dev Size : 3907021682 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 1e0de6be:bbcc874e:e00e2caa:593de9b1

    Update Time : Tue Jul  2 13:51:19 2013
       Checksum : a8cf720f - correct
         Events : 108


   Device Role : Active device 0
   Array State : A. ('A' == active, '.' == missing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : a544829f:33778728:79870439:241c5c51

           Name : Hawaiian:1  (local to host Hawaiian)
  Creation Time : Thu Jan 31 22:43:49 2013
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 3907021954 (1863.01 GiB 2000.40 GB)
     Array Size : 3907021682 (1863.01 GiB 2000.40 GB)
  Used Dev Size : 3907021682 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 926788c3:9dfbf62b:26934208:5a72d05d

    Update Time : Tue Jul  2 13:51:05 2013
       Checksum : 94e2b4a1 - correct
         Events : 114


   Device Role : Active device 1
   Array State : .A ('A' == active, '.' == missing)


Thanks
James

Bob Proulx

Jul 2, 2013, 5:50:03 PM

James Allsopp wrote:
> One other point sda isn't the boot hard drive, that's the partitions /sdb1
> and sdc1, but these should be the same (I thought I'd mirrored them to be
> honest).

I don't see sda anywhere. It might be a dual booting Windows disk?
Or other. But the BIOS will boot the first disk from the BIOS boot
order. BIOS boot order may be different from OS disk order. It can
be confusing. I might assume that BIOS sata0 is the same as the OS
disk sda but actually it often is different. Let's ignore this for
now.

You have sdd1 and sde1 mirrored into md1. I can see that because the
UUID is identical.

> /dev/md1:
> Version : 1.2
> Creation Time : Thu Jan 31 22:43:49 2013
> Raid Level : raid1
> Array Size : 1953510841 (1863.01 GiB 2000.40 GB)
> Used Dev Size : 1953510841 (1863.01 GiB 2000.40 GB)

> Raid Devices : 2
> Total Devices : 1
> Persistence : Superblock is persistent
>
> Update Time : Tue Jul 2 13:49:55 2013
> State : clean, degraded
> Active Devices : 1
> Working Devices : 1
> Failed Devices : 0
> Spare Devices : 0
>
> Name : Hawaiian:1 (local to host Hawaiian)
> UUID : a544829f:33778728:79870439:241c5c51
> Events : 112
>
>
> Number Major Minor RaidDevice State
> 0 0 0 0 removed
> 1 8 65 1 active sync /dev/sde1

That info is the same as:

> /dev/md127:
> Version : 1.2
> Creation Time : Thu Jan 31 22:43:49 2013
> Raid Level : raid1
> Array Size : 1953510841 (1863.01 GiB 2000.40 GB)
> Used Dev Size : 1953510841 (1863.01 GiB 2000.40 GB)
> Raid Devices : 2
> Total Devices : 1
> Persistence : Superblock is persistent
>
> Update Time : Tue Jul 2 13:49:29 2013
> State : clean, degraded
> Active Devices : 1
> Working Devices : 1
> Failed Devices : 0
> Spare Devices : 0
>
> Name : Hawaiian:1 (local to host Hawaiian)
> UUID : a544829f:33778728:79870439:241c5c51
> Events : 106
>
>
> Number Major Minor RaidDevice State
> 0 8 49 0 active sync /dev/sdd1
> 1 0 0 1 removed

The UUIDs are identical. Therefore those two disks are mirrors of
each other. And note:

> /dev/md1: (/dev/sde1)
> Update Time : Tue Jul 2 13:49:55 2013
> /dev/md127: (/dev/sdd1)
> Update Time : Tue Jul 2 13:49:29 2013

sde1 is newer than sdd1. This seems consistent with it being the best
copy to keep. If it were the other way around I would think about
using the other one. But selecting the right master is important
since it is a component of the lvm.
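
Besides the update time, the Events counter in the "mdadm --examine" output posted earlier (108 on sdd1, 114 on sde1) points the same way, on the assumption that the member with the higher count has seen the more recent writes. A small sketch of that comparison follows; events_of and newer_member are hypothetical helper names, and on a tie newer_member simply prints the first device.

```shell
#!/bin/sh
# events_of: read "mdadm --examine DEVICE" output on stdin and print
# the value of its Events field.
events_of() {
    sed -n 's/^[[:space:]]*Events[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# newer_member NAME_A EVENTS_A NAME_B EVENTS_B: print the name of the
# member with the higher event count (NAME_A on a tie).
newer_member() {
    if [ "$2" -ge "$4" ]; then
        echo "$1"
    else
        echo "$3"
    fi
}

# Example use on the live system (needs root):
#   a=$(mdadm --examine /dev/sdd1 | events_of)
#   b=$(mdadm --examine /dev/sde1 | events_of)
#   newer_member /dev/sdd1 "$a" /dev/sde1 "$b"
```

This only suggests which half to keep. It does not replace looking at the data before discarding the other half.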

> How should I proceed from here?

I would proceed as previously suggested. I would do this:

mdadm --stop /dev/md127
mdadm --manage /dev/md1 --add /dev/sdd1
watch cat /proc/mdstat

That will discard the older stale copy of the mirror on sdd1. It will
use sdd1 as a mirror of sde1. After doing the add the mirror will
sync and you can watch the progress using 'watch cat /proc/mdstat'.
Use control-c to interrupt it when you want to stop it.
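
If you would rather have a yes/no check than watch the screen, something like the following sketch could work. The function name degraded_arrays is made up, and it assumes the /proc/mdstat layout shown earlier in this thread, where a "_" inside the [UU] status brackets marks a missing member.

```shell
#!/bin/sh
# degraded_arrays: read /proc/mdstat style text on stdin and print the
# name of every md array whose status brackets contain "_".
degraded_arrays() {
    awk '
        # Remember the array name from lines like "md1 : active raid1 ...".
        /^md[0-9]+ :/ { name = $1 }
        # A status such as [U_] or [_U] means a member is missing.
        /\[[U_]*_[U_]*\]/ { if (name != "") print name }
    '
}

# Example use on the live system:
#   degraded_arrays < /proc/mdstat
```

While the resync is still running the array keeps showing [U_] or [_U], so it is listed until the sync finishes. Empty output means no array is degraded.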

> For ruther information:
> /dev/sdb3:
> Preferred Minor : 126
> ...
> /dev/sdc3:
> Preferred Minor : 126
> ...

That further information looked _okay_ to me. But I would still
change the md126 back to md0.

mdadm --stop /dev/md126
mdadm --assemble /dev/md0 --update=super-minor /dev/sdb3 /dev/sdc3
cat /proc/mdstat

Since it is clean now it will be stopped cleanly and reassembled
cleanly and no sync will be needed. The --update=super-minor will
reset the superblock with the updated md0 minor device number.

Then update /etc/mdadm/mdadm.conf and rebuild the initrd.

Bob

James Allsopp

Jul 3, 2013, 9:00:01 AM

Thanks Bob, really can't thank you enough. Just to be clear about this, I'd do these commands from the rescue disk after I have assembled the arrays and gone to the bash shell?

Much appreciated,
James

Bob Proulx

Jul 3, 2013, 7:50:01 PM

James Allsopp wrote:
> Thanks Bob, really can't thank you enough. Just to be clear about this, I'd
> do these commands from the rescue disk after I have assembled the arrays
> and gone to the bash shell?

Short answer: Yes. Go for it!

Longer answer: There are all kinds of things I want to say here. And
I already said a lot! There are some optimizations that could be
made. But if you do what is outlined it should work. But I don't
want to make things more confusing by talking about minor things.

I have my fingers crossed for you! :-)

Bob

James Allsopp

Jul 4, 2013, 2:10:03 PM
Thanks Bob, like I say, very much appreciated and I'll let you know how it goes!
I'd like to hear about the optimisations, but I think I'll wait till I get the system rebuilt!
James

Bob Proulx

Jul 4, 2013, 8:20:03 PM
James Allsopp wrote:
> I'd like to hear about the optimisations, but I think I'll wait till I get
> the system rebuilt!

Basically I had expected you to use either the rescue mode of the d-i
or a livecd or similar to assemble the arrays. You did. But neither
array came up completely correct. One came up degraded with only one
disk, and its split brain clone came up separately as md127 instead
of joining md1. The other one came up as md126 instead of md0. So you
should fix those using the instructions already discussed. I was
thinking you would do that from the same system boot that you had
posted that information from.

But your recent mail implies that you shut the system down and went
away for a while. So now it appears you need to "rescue" the system
again by the same method you used to get that information you posted.

All of that is fine. Except now we already know the information you
posted. And so now we know how those arrays are supposed to go
together. But that is okay. You can go through rescue mode and
assemble the arrays exactly as you did before. And then --stop the
arrays and assemble them correctly.

But since we know how they are supposed to be assembled now you could
skip the assembly of them in rescue mode or livecd mode or whatever
you used and simply assemble the arrays correctly the first time.
Basically I think you are going to do:

* rescue
* assemble arrays
* stop arrays
* assemble arrays correctly

Which is perfectly acceptable. The result will be fine. But now that
we know what we need to do you could simply do this:

* rescue
* assemble arrays correctly

But I don't want to distract you with complications like this! :-)

And then after you get everything working you should revisit the
partitioning on that second array. Your partitions start at 1, which
is not aligned to the 4k physical sectors, as the fdisk warning below
says. Misaligned writes turn into read-modify-write cycles inside the
drive and performance will suffer.

>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdd1               1      243201  1953512001   fd  Linux raid autodetect
> Partition 1 does not start on physical sector boundary.
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sde1               1      243201  1953512001   fd  Linux raid autodetect
> Partition 1 does not start on physical sector boundary.

Instead of using the entire disk starting at 1 it would be much better
if you started at sector 2048 as is the new standard for Advanced
Format 4k sector drives. I would expect that to be a large
performance lever on your system. But fix that after you have your
data up and available.
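
As a rough way to check a start sector, here is a small sketch. It
assumes the start is given in 512-byte logical sectors and that the
drive has 4096-byte physical sectors. The function name is_aligned is
only for illustration:

```shell
# Hypothetical helper: report whether a partition start sector falls on
# a 4096-byte physical sector boundary, assuming 512-byte logical sectors.
is_aligned() {
    start_sector=$1
    offset=$(( start_sector * 512 % 4096 ))
    if [ "$offset" -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned by $offset bytes"
    fi
}

is_aligned 1      # prints "misaligned by 512 bytes"
is_aligned 2048   # prints "aligned" (2048 * 512 bytes = 1 MiB)
```

On a running system the start sector of a partition can be read from
sysfs, for example 'cat /sys/block/sdd/sdd1/start', which reports it
in 512-byte units.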

Bob

James Allsopp

Jul 7, 2013, 5:00:02 PM
Hello,
I've been too nervous to reboot, so I've left it in rescue mode at the point where I assembled the RAID arrays and went into a shell on the / partition.
I tried to run:
  mdadm --stop /dev/md127
but got:
  mdadm: failed to stop array /dev/md127: Device or resource busy
  Perhaps a running process, mounted filesystem or active volume group?

I tried unmounting /home which stretches onto this disk via LVM, but this made no difference. Any idea how I should proceed?
Thanks,
James 
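
The error message itself points at one likely cause: unmounting /home
does not deactivate the logical volume, so LVM may still be holding the
array open. A minimal sketch of the usual order of operations follows.
It assumes the volume group is the VolGroup named earlier in the thread
and that nothing in it is still mounted. The function name
stop_md_under_lvm is only for illustration:

```shell
# Hypothetical helper: deactivate the LVM volume group that sits on top
# of an md array, then stop the array. vgchange refuses to deactivate
# logical volumes that are still in use, in which case this stops early.
stop_md_under_lvm() {
    vg=$1
    md=$2
    vgchange -an "$vg" || return 1
    mdadm --stop "$md"
}

# Usage (as root): stop_md_under_lvm VolGroup /dev/md127
```

If the shell is running on a root filesystem that lives in the same
volume group, the group cannot be fully deactivated from there. That may
be why the array stayed busy in this rescue session, though the thread
does not confirm it.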

James Allsopp

Jul 10, 2013, 5:40:02 PM
Hi,
I got the problem solved. I couldn't do it from the rescue disk, as it wouldn't let me stop the RAID array.
What I did instead was drop into the maintenance shell and run:
mdadm --assemble /dev/md1 /dev/sdd1 /dev/sde1   # this should recreate the array
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
# edit the file to remove duplicates
dpkg-reconfigure linux-image-$(uname -r)

then reboot.
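
For the "remove duplicates" step, a small sketch that lists the ARRAY
lines whose UUID already appeared earlier in the file, which is what
appending the scan output tends to produce. The function name
list_duplicate_arrays is only for illustration, and it only reports
lines, it does not edit the file:

```shell
# Hypothetical helper: print "line-number: line" for every ARRAY line
# whose UUID= value was already seen on an earlier ARRAY line.
# Keywords are matched case-insensitively.
list_duplicate_arrays() {
    awk 'toupper($1) == "ARRAY" {
        for (i = 2; i <= NF; i++) {
            key = tolower($i)
            if (key ~ /^uuid=/ && seen[key]++)
                print FNR ": " $0
        }
    }' "$1"
}

# Usage: list_duplicate_arrays /etc/mdadm/mdadm.conf
```

Which of the two copies to delete is still a judgment call, so compare
the device names on the reported lines before removing anything.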

Huge amount of thanks go to Bob Proulx for all the help along the way,
James
