Problems with Mirrored Rootvg


Matthew Godman

Oct 23, 2001, 8:02:26 PM
I am getting a "555" error code on our B50 when I try to boot from the
second of two mirrored hard drives. These disks are hot-swappable.
All of the O/S LV's have been mirrored, a boot image was created on the
second disk via "bosboot -ad /dev/hdisk1". The quorum was turned off.
The volume group was sync'ed (syncvg -v rootvg). The dump device
was switched to a non-mirrored LV. The bootlist was set to include
hdisk1 (fd0, cd0, hdisk0, hdisk1). The system was rebooted once
cleanly.

To test the mirrored rootvg, I shut down the system, pulled out hdisk0 and
then powered on the system. The LED shows 555 and the screen displays:

Starting software... Please wait.

------------------------------------------------------------
Welcome to AIX
Boot image timestamp 21:46 10/23
etc.
-------------------------------------------------------------

and it stays there. The book says "An ODM error occurred when trying to
vary-on the rootvg...." What ODM error? If I pop in the first disk, it
boots just fine. The above info implies that it found the boot image on
the disk. Any ideas? I have to verify that mirrored rootvg drives in AIX
can be used to recover from disk failures. Thanks in advance.

Matthew Godman
Austin, TX

Randy Cooper

Oct 23, 2001, 7:33:22 PM
Matthew,

It may be possible that on boot the single disk left from rootvg is being
detected as hdisk0 rather than hdisk1. If this is so, then the PVID on the
disk would not match that in the ODM (remember the ODM is accessed during
boot).

Just my 2 cents worth,
Randy Cooper

--
Reply to: rwco...@mb.sympatico.ca (mail checked week nights and weekends)

Randy Cooper

Oct 24, 2001, 9:46:37 AM
Matthew,

I believe you need the Disk Replacement process for a mirrored rootvg
as described in the AIX V4 Advanced System Administration Student
Notebook. Pages 6-4 to 6-7 are what you need. You may have to modify it
slightly to reduce the number of mirrored copies of rootvg to 1. If you do
not have a copy of this document I can fax you the appropriate pages.

Randy Cooper.


Matthew Godman

Oct 24, 2001, 10:43:15 AM
I would think that the disks retain their ID's much as other devices do when
they are not detected by the system. They merely show up as "Defined." Also,
the second disk remains in the same slot. I don't know what the problem is,
but IBM's method of mirroring disks doesn't appear to work. Thanks.

Matthew.

Matthew Godman

Oct 24, 2001, 12:16:11 PM
I don't need to replace a disk. I have two disks in the system,
in rootvg. They are mirrored. They both have boot images
on them. They are both in the bootlist. When hdisk0 fails,
the system should continue to run from the mirrored disk. Or,
if the system crashes, I should be able to boot it from the
second drive. What is the benefit of mirroring a drive if I can't
use its copy?

Does anyone out there have experience using mirrored disks
during a disk failure? Did the mirror work as planned? If so,
what did you do differently? Thanks.

Matthew.

Hans-Joachim Ehlers

Oct 24, 2001, 12:25:17 PM
Hi Matthew,

Matthew Godman wrote:

> I am getting a "555" error code on our B50 when I try to boot from the
> second of two mirrored hard drives. These disks are hot-swappable.
> All of the O/S LV's have been mirrored, a boot image was created on the
> second disk via "bosboot -ad /dev/hdisk1". The quorum was turned off.


Why did you run bosboot -ad /dev/hdisk1? There should be no bosboot
necessary.

You must also mirror /dev/hd5!

Normally, mirroring is set up by:
* adding hdisk1 to rootvg
* making a mirror of every LV, like:
hd5      boot    1   2   2  closed/syncd  N/A
hd6      paging  32  64  2  open/syncd    N/A
hd8      jfslog  1   2   2  open/syncd    N/A
hd4      jfs     1   2   2  open/syncd    /
hd2      jfs     35  70  2  open/syncd    /usr
hd9var   jfs     1   2   2  open/syncd    /var
hd3      jfs     3   6   2  open/syncd    /tmp
hd1      jfs     1   2   2  open/syncd    /home
hd10opt  jfs     2   4   2  open/syncd    /opt

* running syncvg
* updating the bootlist with bootlist -m normal hdisk0 hdisk1; do not
forget to update the service bootlist as well.
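
[Editor's sketch] The listing above has the column layout of `lsvg -l rootvg` output (LV name, type, LPs, PPs, PVs, state, mount point). A rough way to check such a listing for logical volumes that are not fully mirrored is sketched below. The function name `check_mirror` and the three checks are illustrative assumptions, not an IBM-supplied tool; it only reads text on stdin and changes nothing.

```shell
# Report logical volumes that do not look fully mirrored and synced.
# Reads rows in the layout shown above on stdin:
#   LVNAME TYPE LPs PPs PVs STATE MOUNTPOINT
# check_mirror is a hypothetical helper name used only for this sketch.
check_mirror() {
    awk '
        # Skip banner and column-header rows (third field is not a number).
        NF < 7 || $3 !~ /^[0-9]+$/ { next }
        {
            problem = ""
            if ($4 < 2 * $3)    problem = problem " PPs<2*LPs"   # fewer than two copies
            if ($5 < 2)         problem = problem " single-PV"   # all copies on one disk
            if ($6 !~ /syncd$/) problem = problem " not-syncd"   # e.g. open/stale
            if (problem != "")  { print $1 ":" problem; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

On an AIX system this would presumably be used as `lsvg -l rootvg | check_mirror`; no output and a zero exit status would mean every listed LV has two copies on two disks and is in a syncd state.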

For your case:
* to be safe, run bosboot -ad /dev/hdisk0
* mirror hd5 and check all the others
* set the bootlist
* run syncvg. THIS CAN TAKE SOME TIME. DO NOT PULL ANY DISK WHILE IT RUNS.

Hajo

Matthew Godman

Oct 24, 2001, 12:52:50 PM
I did all of that. Thanks, anyway.

kal

Oct 24, 2001, 2:24:23 PM
Matthew Godman wrote:

If you call IBM, they are going to tell you that removing the disk drive is
not a true way to test your mirroring. I just had this happen to me
a couple of days back.
If you look at /sbin/rc.boot, LED 555 points to a failure in running
"fsck -fp /dev/hd4", and this was proved right when I booted the machine
in debug mode. I got the following errors:

Starting device is ipldevice
Starting device's PVID: 00020404be2158440000000000000000
Root VGID: 00020404cdeb21e2
Looking for value=000829068632b65a0000000000000000 AND attribute=pvid
hdisk0 is in boot disk's VGDA
Could not determine ROOTVG physical volumes from VGDA.
Checking all physical volumes to determine those in ROOTVG.
Found 12 AVAILABLE disks
VGID: 0000000000000000
PVID: 00020404be215844
VGID: 0000000000000000
PVID: 000204045f4d256d
VGID: 0000000000000000
PVID: 000204045f4d2f0e
VGID: 0000000000000000
PVID: 00020404d3e3c3a5
VGID: 0000000000000000
PVID: 00020404d3e3dde9
VGID: 0000000000000000
PVID: 00020404d3e3e49c
VGID: 0000000000000000
PVID: 00020404d3e3eb52
VGID: 0000000000000000
PVID: 00020404d3e5ef05
VGID: 0000000000000000
PVID: 00020404d3e608d1
VGID: 0000000000000000
PVID: 00020404d3e60f6f
VGID: 0000000000000000
PVID: 00020404d3e61600
VGID: 0000000000000000
PVID: 00020404a6a923e6
ERROR: unable to determine the physical volumes belonging to ROOTVG.
+ rc=0
+ ln /usr/sbin/mount /etc/umount
+ /usr/lib/methods/showled 0x517
showled + echo rc.boot: executing "fsck -fp /dev/hd4"
+ 1>> /tmp/boot_log
+ fsck -fp /dev/hd4
: /dev/hd4 is not a known file system
+ [ 8 -ne 0 ]
+ loopled 0x555
showled
------------------------------------

The ipl_varyon -v command looks for the disk that is linked to
/dev/ipldevice and then looks for the PVID from the ODM. In our case, since
the disk with that PVID does not exist on the machine (it had been removed),
it fails. Now if you boot into service mode and reduce the disk out of
rootvg, it will come back up in normal mode.
hope this helps
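
[Editor's sketch] The debug log above can be read mechanically: the "Looking for value=..." line names the PVID that was searched for, and the "PVID:" lines list the disks actually found. A minimal sketch of that comparison follows; the function name `pvid_wanted_vs_found` is made up for illustration, and it assumes the log format is exactly as pasted in this post (the 32-digit value being the 16-digit PVID padded with zeros, as the "Starting device's PVID" line suggests).

```shell
# Compare the PVID that the boot script searched for against the PVIDs it found.
# Reads a boot debug log on stdin, in the format pasted in the post above.
# pvid_wanted_vs_found is a hypothetical helper name used only for this sketch.
pvid_wanted_vs_found() {
    awk '
        # "Looking for value=<32 hex digits> AND attribute=pvid"
        /^Looking for value=/ {
            split($3, kv, "=")
            wanted = substr(kv[2], 1, 16)   # the disk scan lists 16-digit PVIDs
        }
        # "PVID: <16 hex digits>" lines from the scan of available disks
        /^PVID: / { found[$2] = 1; n++ }
        END {
            if (wanted == "") { print "no PVID lookup in log"; exit 2 }
            if (wanted in found) { print wanted " present"; exit 0 }
            print wanted " missing (" (n + 0) " disks scanned)"
            exit 1
        }
    '
}
```

Fed the log from this post, the searched-for PVID (000829068632b65a) does not appear among the twelve scanned disks, which matches the explanation that the removed disk is the one being looked up.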


Matthew Godman

Oct 24, 2001, 3:14:40 PM
Thank you. Apparently you have to bring the system up in single-user mode,
start a shell, remove hdisk0 from rootvg and then reboot? I've read a lot of
IBM documentation on this subject and none of them state anything like this.
To me, the whole point of mirroring the O/S is so that when hdisk0 fails, the
system will stay up or at least it will boot from the second drive after the
crash. The document below outlines "...the supported method for mirroring the
rootvg volume group to provide high availability access of the AIX operating
system."

http://service.software.ibm.com/aix.us/go?/pdocs/os/mrv41.html

It also says

WHY MIRROR THE ROOTVG VOLUME GROUP?
By maintaining an active mirrored copy of the rootvg volume on another
disk it ensures continuous operation of the AIX operating system in the
event that a disk that is part of the operating system experiences
failure. It provides the ability to boot more than one disk of the rootvg
in the event that another boot disk has failed. In some cases, the ability
to boot from alternate disks may require some user intervention

I wish someone at IBM in the LVM/Sys-Recovery group would address this issue
and set it straight. Anyway, thanks for your help.

Matthew.

kal

Oct 24, 2001, 3:16:58 PM
Matthew Godman wrote:

I am pretty sure it is supposed to work as they claim that it should. I (we) may
be overlooking something here. I think IBM's assumption is that
the machine stays up and running and someone can log in, remove the mirrors,
reduce the disk out of the VG, and rerun bosboot and bootlist.

kal

Vicki Walker

Oct 24, 2001, 3:42:03 PM
If "quorum is switched off, all VGDAs must be available and consistent
when the volume group is made active", for instance, at boot time.
Switching quorum off will only affect what happens to a volume group
should a physical volume become unavailable while the system is
running. (Redbook - Logical Volume Manager A-Z: Introduction and
Concepts, page 24.)
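
[Editor's sketch] Taking the quoted Redbook sentence at face value, the activation rule can be written as a small predicate. The quorum-off branch follows the quote; the quorum-on branch (a strict majority of VGDAs) is an assumption drawn from general LVM documentation rather than from this thread, and the function name `can_activate` is hypothetical. Other replies in this thread report different real-world behaviour for rootvg at boot, so this models the quoted statement only.

```shell
# Decide whether a volume group can be activated under the rule quoted above.
# $1: quorum setting ("on" or "off")
# $2: total number of VGDAs in the volume group
# $3: number of VGDAs currently readable
# can_activate is a hypothetical helper name used only for this sketch.
can_activate() {
    if [ "$1" = "on" ]; then
        # Assumption: with quorum on, a strict majority of VGDAs is enough.
        [ $(( $3 * 2 )) -gt "$2" ]
    else
        # Per the quote: with quorum off, every VGDA must be available.
        [ "$3" -eq "$2" ]
    fi
}
```

For example, `can_activate off 3 2` returns non-zero (one VGDA missing with quorum off), while `can_activate on 3 2` returns zero.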



Joachim Ring

Oct 24, 2001, 4:07:37 PM
> Does anyone out there have experience using mirrored disks
> during a disk failure? Did the mirror work as planned? If so,
> what did you do differently? Thanks.

well, when i was last playing around with mirroring on my 4.2.1 home
system, i could switch off one of the disks during runtime with no
hiccups from the system & a reboot was fine too with only one disk.

iirc i used smitty to mirror the rootvg, ran bosboot like you did and
changed the bootlist and didn't worry about unusable dumps with a
mirrored primary paging space and no extra dump dev as i don't have ibm
support to read the dump anyways.
if this (foolproof if it worked for me :-) path doesn't help you, i'd
have a look at the hdisk mixup randy described.
iirc the hdisk should still be defined when removed.

joachim


Hans-Joachim Ehlers

Oct 25, 2001, 3:58:05 AM
Hi Matthew,
would you mind posting the output of:
oslevel
lspv
lscfg
lsvg rootvg


You have no other bootable devices in your system except for your
mirrored ones?
You have disabled quorum?
If yes, you could try to unset the bootlist completely:
bootlist -m normal
or set it to
bootlist -m normal scdisk

Norman Levin

Nov 11, 2001, 2:04:54 PM
Hans-Joachim Ehlers wrote:
>
> Hi Matthew,
>
> Matthew Godman wrote:
>
> > I am getting a "555" error code on our B50 when I try to boot from the
> > second of two mirrored hard drives. These disks are hot-swappable.
> > All of the O/S LV's have been mirrored, a boot image was created on the
> > second disk via "bosboot -ad /dev/hdisk1". The quorum was turned off.
>
> Why did you make a bosboot -ad /dev/hdisk1 ? There should be no bosboot
> nessarey.
>
** Yes it is. Mirroring hd5 just means the data in hd5 and its mirror
is identical. However, you MUST get a boot/IPL record onto BOTH of
the real disks. "bosboot" uses a routine, "mkboot", which does this.
This routine is smart enough to check for a mirrored hd5 and writes
a boot record to the necessary real volumes. BTW - this code has been in
mkboot since 3.2.5. Makes you wonder why it took so long for IBM to
support mirrored rootvg.

Norm Levin

Kristian Strickland

Nov 15, 2001, 9:48:50 AM
I know I'm several weeks late, but someone might see this...

"Matthew Godman" <mgo...@fundsxpress.com> wrote in message

news:3BD7131F...@fundsxpress.com...


> Thank you. Apparently you have to bring the system up in single-user mode,
> start a shell, remove hdisk0 from rootvg and then reboot? I've read a lot of
> IBM documentation on this subject and none of them state anything like this.
> To me, the whole point of mirroring the O/S is so that when hdisk0 fails, the
> system will stay up or at least it will boot from the second drive after the
> crash.

I think you are still missing the point a little bit. As was already
stated, removing a drive without informing the OS is not a valid test of
your mirroring. There's a difference between a device which is present but
failed, and a device which is not present. If you remember the boot
process, the steps are: check and initialize hardware; load BLV; configure
base devices; process /etc/inittab. In your test case, because hdisk0 is
missing, you don't even get to step 2.
However, during "normal" operation, which is what you seem to be trying
to simulate, when hdisk0 fails, the OS will continue to use the LV copies on
hdisk1. If you reboot before replacing hdisk0, you'll get past step 1, and
step 2 will search (in your case) fd0, cd0, hdisk0, and hdisk1 for BLVs. If
fd0 and cd0 are empty, it won't find a BLV there. If hdisk0 is flaky and
the BLV that you know is there can't be accessed, the boot process will
continue on to find and use the BLV on hdisk1.
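
[Editor's sketch] The fallback through the boot list described above can be pictured as a simple loop: try each device in order and use the first one whose BLV can be read. The toy model below is only an illustration of that ordering, with made-up inputs (the second argument is a hand-written list of devices whose BLV is readable, not something AIX exposes in this form), and the name `first_bootable` is hypothetical.

```shell
# Walk a boot list in order and print the first device with a readable BLV.
# $1: space-separated boot list, e.g. "fd0 cd0 hdisk0 hdisk1"
# $2: space-separated devices whose BLV is readable (illustrative input)
# first_bootable is a hypothetical helper name used only for this sketch.
first_bootable() {
    for dev in $1; do
        case " $2 " in
            *" $dev "*) echo "$dev"; return 0 ;;
        esac
    done
    echo "no bootable device" >&2
    return 1
}
```

With the boot list from this thread and only hdisk1 readable, `first_bootable "fd0 cd0 hdisk0 hdisk1" "hdisk1"` prints hdisk1, mirroring the case where hdisk0 is present but its BLV cannot be accessed.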

Hope this sheds some light.
Best of luck,
--Kristian

Please R.U.N.S.A.F.E.
http://www.jmu.edu/computing/info-security/engineering/runsafe.shtml

