
LILO with root-raid nightmare - please help.


drbob

Jan 15, 2006, 1:10:22 PM
Hello,

I've replaced a failed disk in my raid 1 setup. I replaced the disk (hda,
on-board ide); however, the system then hung on reboot because hda, being
blank, had no MBR. The second disk (hde) is plugged into a promise ide
card. So I went into the bios and set it to boot off the promise card
first. The reboot went fine. I partitioned the new disk and added it to
the array.

Then I tried to run lilo. Both disks need a valid mbr if I'm to be able
to boot in the event of a failed disk (the raid-extra-boot line in
lilo.conf). However this is what lilo said:

LILO version 22.6.1 (test mode), Copyright (C) 1992-1998 Werner Almesberger
Development beyond version 21 Copyright (C) 1999-2004 John Coffman
Released 17-Nov-2004, and compiled at 12:32:32 on May 25 2005
Debian GNU/Linux

Warning: Kernel & BIOS return differing head/sector geometries for
device 0x80
Kernel: 13872 cylinders, 16 heads, 63 sectors
BIOS: 1023 cylinders, 255 heads, 63 sectors
Using BIOS device code 0x80 for RAID boot blocks
Reading boot sector from /dev/md1
Fatal: Filesystem would be destroyed by LILO boot sector: /dev/md1

I believe the "Warning" line may be my bios switching the drive
addresses in order to boot off the promise card (is this correct?).
However I can't switch things back until I have a valid mbr on hda. I've
no idea what the "Fatal:" line means. md1 is the boot partition made up
of hda1 and hde1. My lilo.conf is as follows:

lba32
boot=/dev/md1
raid-extra-boot=/dev/hda,/dev/hde
install=/boot/boot-menu.b
map=/boot/map
delay=120
vga=normal
default=RAID

image=/boot/vmlinuz-2.4.24.raidserver1
label=RAID
read-only
root=/dev/md3

Is it safe to just put the disk with a valid mbr (hde) onto the on-board
controller? This would appear to be a simple way around the issue but
I'm not sure if this would break the array in some other way.

Any help solving this would be much appreciated.

regards,

drbob


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org

Mike Bird

Jan 15, 2006, 1:30:17 PM
On Sun, 2006-01-15 at 09:59, drbob wrote:
> I've replaced a failed disk in my raid 1 setup. I replaced the disk (hda
> on-board ide), however the system then hung on reboot, this was because
> hda, being blank, had no MBR. The second disk (hde) is plugged into a
> promise ide card. So I went into the bios and set it to boot off the
> promise card first. Reboot when fine. I partitioned the new disk and
> added it to the array.

I would check that this really worked.

What do you have in "fdisk -l" and "cat /proc/mdstat"?

--Mike Bird

drbob

Jan 15, 2006, 1:50:37 PM
Mike Bird wrote:
> On Sun, 2006-01-15 at 09:59, drbob wrote:
>> I've replaced a failed disk in my raid 1 setup. I replaced the disk (hda
>> on-board ide), however the system then hung on reboot, this was because
>> hda, being blank, had no MBR. The second disk (hde) is plugged into a
>> promise ide card. So I went into the bios and set it to boot off the
>> promise card first. Reboot went fine. I partitioned the new disk and
>> added it to the array.
>
> I would check that this really worked.
>
> What do you have in "fdisk -l" and "cat /proc/mdstat".
>
> --Mike Bird

I'm pretty sure it did. Attached are the outputs from fdisk and mdstat.

regards,

drbob
fdisk -l.txt
mdstat.txt

Mike Bird

Jan 15, 2006, 5:10:04 PM
On Sun, 2006-01-15 at 10:40, drbob wrote:
> I'm pretty sure it did. Attached are the outputs from fdisk and
> mdstat.

They look good. I'm beginning to think that either your partition
table is corrupt or it's a non-DOS type that's too large to leave
room for LILO.

I can't find an fdisk option to ask for the partition table type.

You could try "parted /dev/hde print" (etc) to verify that both
drives have msdos partition tables. The partition table type
is listed as e.g. "Disk label type: msdos".

FWIW, I always use "raid-extra-boot=mbr-only" but what you have
should work AFAIK.

drbob

Jan 15, 2006, 5:50:21 PM
Mike Bird wrote:

>They look good. I'm beginning to think that either your partition
>table is corrupt or it's a non-DOS type that's too large to leave
>room for LILO.
>
>I can't find an fdisk option to ask for the partition table type.
>
>You could try "parted /dev/hde print" (etc) to verify that both
>drives have msdos partition tables. The partition table type
>is listed as e.g. "Disk label type: msdos".
>

Here is the output from parted:

:~# parted /dev/hde print
Disk geometry for /dev/hde: 0.000-39083.625 megabytes
Disk label type: msdos
Minor  Start      End        Type      Filesystem  Flags
1      0.031      15.750     primary   ext3        boot, raid
2      15.750     494.648    primary   linux-swap  raid
3      494.648    635.906    primary   ext3        raid
4      635.906    39083.625  extended
5      635.937    730.406    logical   ext3        raid
6      730.437    4636.898   logical   ext3        raid
7      4636.929   8543.390   logical   ext3        raid
8      8543.421   39083.625  logical   ext3        raid
Information: Don't forget to update /etc/fstab, if necessary.

:~# parted /dev/hda print
Disk geometry for /dev/hda: 0.000-39205.687 megabytes
Disk label type: msdos
Minor  Start      End        Type      Filesystem  Flags
1      0.031      23.532     primary   ext3        boot, raid
2      23.533     509.875    primary   linux-swap  raid
3      509.875    658.916    primary   ext3        raid
4      658.916    39205.502  extended
5      658.947    760.891    logical   ext3        raid
6      760.922    4675.166   logical   ext3        raid
7      4675.197   8589.440   logical   ext3        raid
8      8589.472   39205.502  logical   ext3        raid
Information: Don't forget to update /etc/fstab, if necessary.

I'm going to try taking hda out of the array again. I'll move hde to the
on-board ide so I don't have to tell the bios to boot off the promise
card. If this boots ok I'll try to add the new drive again. After a
little reading I believe software raid volumes are identified by their
superblock, right? So it shouldn't matter to the array if a disk that
was previously hde is booted as hda? I partitioned the drives using
cfdisk and they've only been used for linux (the 200GB hdg with a FAT32
partition isn't currently in use, though I did put it in at the same
time as I replaced the failed raid drive).

I haven't changed lilo.conf since I set the array up a year ago, so I
had been hoping things would "just work". Oh well :-)

I'll post here if the changes I suggest above have any effect.

regards,

drbob

Alvin Oga

Jan 16, 2006, 12:40:10 AM

hi ya drbob

On Sun, 15 Jan 2006, drbob wrote:

> I've replaced a failed disk in my raid 1 setup. I replaced the disk (hda
> on-board ide),

what exactly did you type BEFORE you removed the bad disk ??

raidhotadd, raidhotremove, etc, etc.. is required ( aka good idea )

> So I went into the bios and set it to boot off the
> promise card first.

some bios cannot boot off of additional ide cards ( pci cards )
and mixing with onboard ide might be confusing the system

> LILO version 22.6.1

too old ...

you should be using lilo-22.7.1 for raid stuff if you want a working
bootable raid system

> lba32
> boot=/dev/md1
> raid-extra-boot=/dev/hda,/dev/hde

why do you use "extra-boot" ??
- try commenting that line out

> image=/boot/vmlinuz-2.4.24.raidserver1
> label=RAID
> read-only
> root=/dev/md3

this implies /dev/md1 is your /boot ??
and that your rootfs ( /bin, /etc, /sbin .. ) is in /dev/md3 ??

for more debugging ...
what is the contents of /etc/fstab
what is the contents of /etc/mdadm.conf
what is the output of "cat /proc/mdstat"

> Is it safe to just put the disk with a valid mbr (hde) onto the on-board
> controller?

yes and no .. depends on the mb and bios and your lilo config options

c ya
alvin

drbob

Jan 16, 2006, 5:20:09 AM
Alvin Oga wrote:

>what exaactly did you type BEFORE you removed the bad disk ??
>
>raidhotadd, raidhotremove, etc, etc.. is required ( aka good idea )

I use mdadm to manage my array. The command was

mdadm --set-faulty /dev/md1 /dev/hda1 --remove /dev/md1 /dev/hda1

(repeated for the other partitions, substituting md2/hda2, md3/hda3,
etc. Only some of the partitions had been marked as failed on the old
drive; I tried re-adding them, but they failed again after a few days,
so I swapped the drive.)
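For anyone following along, the fail/remove/add cycle described above can be written out step by step. This is only a sketch of the sequence (device names are the ones from this thread; adapt them to your own array), shown as echoed commands so nothing is executed against a live array:

```shell
# Sketch of the disk-replacement cycle described above, using mdadm's
# --manage mode. Commands are echoed rather than run, since running them
# against the wrong array would be destructive.
run() { echo "would run: $*"; }

run mdadm --manage /dev/md1 --set-faulty /dev/hda1   # mark the member failed
run mdadm --manage /dev/md1 --remove /dev/hda1       # detach it from md1
# ...power down, swap the drive, partition it to match the survivor, then:
run mdadm --manage /dev/md1 --add /dev/hda1          # re-add; resync starts
```

The same three steps would then be repeated for md2/hda2, md3/hda3, and so on, as described in the post above.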

>some bios cannot boot off of additional ide cards ( pci cards )
>and mixing with onboard ide might be confusing the system

My system wouldn't boot at all with the new drive in place until I went
into the bios and explicitly told it to boot off the drive on the ide
card. The drives attached to the card were listed in the bios setup
screen as boot options, so I'm pretty sure my bios supports booting from
the card. Since Windows will only boot off the drive with bios device
code 0x80, I believe the bios may be switching the codes around, and
that this is what is confusing lilo. I don't know why this would be a
problem, though, since I thought that from version 22.5 onward lilo
didn't care about device codes.

>>LILO version 22.6.1
>
>too old ...
>
>you should be using lilo-22.7.1 for raid stuff if you want a working
>bootable raid system
>

It's the version that comes with sarge. I set up the array with the LILO
that came with woody; surely upgrading isn't necessary? I'll look for a
backport.

>>lba32
>>boot=/dev/md1
>>raid-extra-boot=/dev/hda,/dev/hde
>
>why do you use "extra-boot" ??
> - try commenting that line out

AFAIK that line is essential. It tells lilo to write the mbr to both
members of the raid array, so that if one drive dies the system remains
bootable. If I'm wrong about this please let me know. I set up the array
using the instructions at http://alioth.debian.org/projects/rootraiddoc/

>this implies /dev/md1 is your /boot ??
>and that your rootfs ( /bin, /etc, /sbin .. ) is in /dev/md3 ??

That is correct, however this was my first Linux system and I went a bit
overboard on the partitioning: tmp, usr, var, and home each have their
own partition.

>for more debugging ...
> what is the contents of /etc/fstab

# /etc/fstab: static file system information.
#
# <file system>  <mount point>  <type>    <options>          <dump>  <pass>
/dev/md3         /              ext3      errors=remount-ro  0       1
/dev/md2         none           swap      sw                 0       0
proc             /proc          proc      defaults           0       0
/dev/fd0         /floppy        auto      user,noauto        0       0
/dev/cdrom       /cdrom         iso9660   ro,user,noauto     0       0
/dev/md5         /tmp           ext3      defaults           0       2
/dev/md6         /usr           ext3      defaults           0       2
/dev/md7         /var           ext3      defaults           0       2
/dev/md8         /home          ext3      defaults           0       2
/dev/md1         /boot          ext3      defaults           0       2

> what is the contents of /etc/mdadm.conf

:~# cat /etc/mdadm/mdadm.conf
DEVICE /dev/hda* /dev/hde*
ARRAY /dev/md1 level=raid1 disks=2 UUID=f126c8af:1fd56337:a653c52b:3e50c388
   devices=/dev/hda1,/dev/hde1
ARRAY /dev/md2 level=raid1 disks=2 UUID=6a91045f:babc1717:3a48db49:e10249c1
   devices=/dev/hda2,/dev/hde2
ARRAY /dev/md3 level=raid1 disks=2 UUID=b32dcb66:a802ec78:795cdb4e:3ed450ae
   devices=/dev/hda3,/dev/hde3
ARRAY /dev/md5 level=raid1 disks=2 UUID=173c680c:200676a5:29113ce8:5e0c54e9
   devices=/dev/hda5,/dev/hde5
ARRAY /dev/md6 level=raid1 disks=2 UUID=73a930b9:674865c7:1c5030f1:55ae62a9
   devices=/dev/hda6,/dev/hde6
ARRAY /dev/md7 level=raid1 disks=2 UUID=641c416c:2ffd8890:3cce2a6f:811f694b
   devices=/dev/hda7,/dev/hde7
ARRAY /dev/md8 level=raid1 disks=2 UUID=850a410c:1fa77dd9:5234f468:43814e67
   devices=/dev/hda8,/dev/hde8

> what is the output of "cat /proc/mdstat"

:~# cat /proc/mdstat
Personalities : [raid1] [raid5]
read_ahead 1024 sectors
md1 : active raid1 hde1[1] hda1[0]
      16000 blocks [2/2] [UU]

md2 : active raid1 hde2[1] hda2[0]
      490304 blocks [2/2] [UU]

md3 : active raid1 hde3[1] hda3[0]
      144576 blocks [2/2] [UU]

md5 : active raid1 hde5[1] hda5[0]
      96640 blocks [2/2] [UU]

md6 : active raid1 hde6[1] hda6[0]
      4000128 blocks [2/2] [UU]

md7 : active raid1 hde7[1] hda7[0]
      4000128 blocks [2/2] [UU]

md8 : active raid1 hde8[1] hda8[0]
      31273088 blocks [2/2] [UU]
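As an aside, a healthy listing like the one above can be checked mechanically: every status bracket reads [UU], while a degraded array shows an underscore (e.g. [U_]). A small sketch, fed with a sample of the output above rather than the live /proc/mdstat so it can be tried anywhere:

```shell
# Count arrays whose status brackets contain "_" (a missing member).
# The sample is copied from the mdstat output above; on a real system
# you would read /proc/mdstat instead of this variable.
mdstat='md1 : active raid1 hde1[1] hda1[0]
      16000 blocks [2/2] [UU]
md8 : active raid1 hde8[1] hda8[0]
      31273088 blocks [2/2] [UU]'
degraded=$(printf '%s\n' "$mdstat" | grep -c '\[U*_U*\]' || true)
echo "degraded arrays: $degraded"
```

With all members up, as in this thread, the count is 0.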


Thanks for any help/advice you can give.

drbob

Jan 16, 2006, 1:00:40 PM
drbob wrote:

> I'll post here if the changes I suggest above have any effect.
>
> regards,
>
> drbob


Before I did this I tried a few different settings in lilo.conf and had
some success :-).

Setting raid-extra-boot=mbr-only avoided the error I was seeing previously:

:/etc# lilo -v
LILO version 22.6.1, Copyright (C) 1992-1998 Werner Almesberger
Development beyond version 21 Copyright (C) 1999-2004 John Coffman
Released 17-Nov-2004, and compiled at 12:32:32 on May 25 2005
Debian GNU/Linux

Warning: Kernel & BIOS return differing head/sector geometries for
device 0x80
Kernel: 13872 cylinders, 16 heads, 63 sectors
BIOS: 1023 cylinders, 255 heads, 63 sectors
Using BIOS device code 0x80 for RAID boot blocks
Reading boot sector from /dev/md1

Using MENU secondary loader
Calling map_insert_data

Boot image: /boot/vmlinuz-2.4.24.raidserver1
Added RAID *

Reading boot sector from /dev/hda
Warning: /dev/hda is not on the first disk
/boot/boot.0300 exists - no boot sector backup copy made.
The Master boot record of /dev/hda has been updated.
Reading boot sector from /dev/hde
/boot/boot.2100 exists - no boot sector backup copy made.
The Master boot record of /dev/hde has been updated.

I believe the problem may have stemmed from the fact that I originally
set up the array on woody with a version of lilo that needed to
overwrite the mbr of each of the physical disks; apparently this isn't
the default behavior for raid in newer versions of lilo. I can now boot
from the new disk. I'm still not sure why lilo thought installing an mbr
onto /dev/md1 would wipe the file system (the mbr-only option prevents
it from attempting this). Could it be because /dev/md1 is quite small,
at only 15MB?
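For reference, the relevant part of lilo.conf after this change would look something like the following (a sketch based on the conf posted earlier in the thread, with only the raid-extra-boot line changed):

```
lba32
boot=/dev/md1
# write boot records to the MBR of every disk in the raid set,
# instead of to the first sector of /dev/md1:
raid-extra-boot=mbr-only
install=/boot/boot-menu.b
map=/boot/map
delay=120
vga=normal
default=RAID

image=/boot/vmlinuz-2.4.24.raidserver1
        label=RAID
        read-only
        root=/dev/md3
```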

Hope this helps anyone who has a similar problem.

Alvin Oga

Jan 16, 2006, 1:50:05 PM

hi ya drbob


On Mon, 16 Jan 2006, drbob wrote:

> I use mdadm to manage my array. The command was
>
> mdadm --set-faulty /dev/md1 /dev/hda1 --remove /dev/md1 /dev/hda1

i'd think there might be a short-cut version, but i'm being
lazy to go look it up
- set it faulty first, if the system didn't already
flag it
- then remove it after the set-faulty
--- insert new disk
- then (hot-swap-ADD) the new disk


> My system wouldn't boot at all with the new drive in place until I went
> into the bios and explicitly told it to boot off the drive on the ide
> card.

lilo takes care of writing the mbr to both disks without the extra-boot

> The drives attached to the card were listed in the bios setup
> screen as boot options so I'm pretty sure my bios is compatible with
> booting off raid.

being visible to the bios boot sequence does not guarantee it is bootable

> Since Windows will only boot off the drive with bios
> device code 0x80 I believe that the bios may be switching the codes
> around and this is what is confusing lilo.

lilo has NOTHING to do with windoze and windows cannot change the bios

windoze can boot off of D: or E: or any other disk or partitions, but
its happiest as C:

> It's the version that comes with sarge. I set up the array with the LILO
> that came with woody, surely upgrading isn't necessary? I'll look for a
> backport.

for raid ... lilo-22.7.1 is best to fix boot problems

> >>raid-extra-boot=/dev/hda,/dev/hde


>
> AFAIK that line is essential. It tells lilo to write the mbr to both
> members of the raid array,

see above

> That is correct, however this was my first Linux system and I went a bit
> overboard on the partitioning: tmp, usr, var, and home each have their
> own partition.

nope ... personally, i require those partitions on any system that
i maintain or the customers are billed extra time and $$ for me to
sit and wait for it to fsck the whole 500GB instead of just 256MB
to get into single user mode to fix things

> /dev/md8 /home ext3 defaults 0 2

in the prev posts, one disk had /dev/hda8 listed while /dev/hde did NOT
have /dev/hde8
- i don't know if you inadvertently deleted partition-8 in
the prev examples

> :~# cat /etc/mdadm/mdadm.conf

looks good

> :~# cat /proc/mdstat

looks good ... all shows as active with "UU"

now to fix booting problems ...

- try lilo-22.7.1

- copy all the data on the disk off to another set of disks
and break the normal sarge setup and use a newer lilo
and i'd bet it will boot off either disk

- other test ...
- move the cd or ?? in /dev/hdc and /dev/hdd onto the
pci-based ide controller

- boot into single user with a cdrom and change mdadm.conf to show
/dev/hdc instead of /dev/hde and power off

- move the disk on /dev/hde into /dev/hdc

- reboot and keep your thumbs crossed

c ya
alvin

drbob

Jan 16, 2006, 6:40:10 PM
Thanks for the advice; I managed to get the system booting off either
disk without further swapping of the drives around, though. I set
"raid-extra-boot=mbr-only" in lilo.conf. This forces lilo to write the
mbr of each disk in the raid1 array rather than the first sector of the
raid1 partition. The default since version 22.0 has been to write a boot
record only to the first sector of the raid1 partition, which for some
reason didn't work for me. I can now boot from either drive and so am
happy. I still don't know why lilo wouldn't write a boot record to my
raid partition, but things seem to work ok like this. Thanks for all
your help.

From the relevant part of the lilo man page:

raid-extra-boot=<option>
    This option only has meaning for RAID1 installations. The <option>
    may be specified as none, auto, mbr, mbr-only, or a comma-separated
    list of devices; e.g., "/dev/hda,/dev/hdc6". Starting with LILO
    version 22.0, the boot record is normally written to the first
    sector of the RAID1 partition. On PARALLEL raid sets, no other boot
    records are needed. The default action is auto, meaning,
    automatically generate auxiliary boot records as needed on SKEWED
    raid sets. none means suppress generation of all auxiliary boot
    records. mbr-only suppresses generation of a boot record on the
    raid device, and forces compatibility with versions of LILO earlier
    than version 22.0 by writing boot records to all Master Boot
    Records (MBRs) of all disks which have partitions in the raid set.
    mbr is like mbr-only except the boot record on the RAID partition
    is not suppressed. Use of an explicit list of devices forces
    writing of auxiliary boot records only on those devices enumerated,
    in addition to the boot record on the RAID1 device. Since the
    version 22 RAID1 code will never automatically write a boot record
    on the MBR of device 0x80, if such a boot record is desired, this
    is one way to have it written. Use of mbr is the other way to force
    writing to the MBR of device 0x80.


regards,

drbob
