To test the mirrored rootvg, I shut down the system, pulled out hdisk0
and then powered on the system. The LED shows 555 and the screen displays:
Starting software... Please wait.
------------------------------------------------------------
Welcome to AIX
Boot image timestamp 21:46 10/23
etc.
-------------------------------------------------------------
and it stays there. The book says "An ODM error occurred when trying to
vary-on the rootvg...." What ODM error? If I pop in the first disk, it
boots just fine. The above info implies that it found the boot image on
the disk. Any ideas? I have to verify that mirrored rootvg drives in AIX
can be used to recover from disk failures. Thanks in advance.
Matthew Godman
Austin, TX
It is possible that on boot the single disk left from rootvg is being
detected as hdisk0 rather than hdisk1. If so, the PVID on that disk
would not match the one recorded in the ODM (remember that the ODM is
accessed during boot).
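A toy model of the mix-up Randy suspects, written only to illustrate his hypothesis: it assumes hdisk names are handed out in discovery order and that the ODM remembers which PVID belonged to each name. The helper names and PVID values are hypothetical, not AIX internals.

```python
# Hypothetical model of the hdisk renaming hypothesis; not real AIX code.

def assign_names(present_pvids):
    """Give hdisk0, hdisk1, ... to the disks in the order they are found."""
    return {f"hdisk{i}": pvid for i, pvid in enumerate(present_pvids)}

def mismatches(odm, discovered):
    """Names whose PVID in the ODM differs from the disk now using that name."""
    return sorted(name for name, pvid in discovered.items() if odm.get(name) != pvid)

odm = {"hdisk0": "PVID_A", "hdisk1": "PVID_B"}  # saved while both disks were present
after_pull = assign_names(["PVID_B"])           # the PVID_A disk was pulled
print(after_pull)                   # {'hdisk0': 'PVID_B'}
print(mismatches(odm, after_pull))  # ['hdisk0']
```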
Just my 2 cents worth,
Randy Cooper
Matthew Godman wrote:
--
Reply to: rwco...@mb.sympatico.ca (mail checked week nights and weekends)
I believe you need the Disk Replacement process for a mirrored rootvg
as described in the AIX V4 Advanced System Administration Student
Notebook, pages 6-4 to 6-7. You may have to modify it slightly to
reduce the number of mirrored copies of rootvg to 1. If you do not
have a copy of this document, I can fax you the appropriate pages.
Randy Cooper.
Randy Cooper <rwco...@mb.sympatico.ca> wrote in message news:<3BD5FE42...@mb.sympatico.ca>...
Matthew.
Does anyone out there have experience using mirrored disks
during a disk failure? Did the mirror work as planned? If so,
what did you do differently? Thanks.
Matthew.
Matthew Godman wrote:
> I am getting a "555" error code on our B50 when I try to boot from the
> second of two mirrored hard drives. These disks are hot-swappable.
> All of the O/S LV's have been mirrored, a boot image was created on the
> second disk via "bosboot -ad /dev/hdisk1". The quorum was turned off.
Why did you run bosboot -ad /dev/hdisk1? No bosboot should be
necessary.
You must mirror /dev/hd5 as well!
Normally, mirroring is set up by:
* adding hdisk1 to rootvg
* mirroring all LVs, so that lsvg -l rootvg shows something like:
    LV NAME   TYPE    LPs  PPs  PVs  LV STATE      MOUNT POINT
    hd5       boot    1    2    2    closed/syncd  N/A
    hd6       paging  32   64   2    open/syncd    N/A
    hd8       jfslog  1    2    2    open/syncd    N/A
    hd4       jfs     1    2    2    open/syncd    /
    hd2       jfs     35   70   2    open/syncd    /usr
    hd9var    jfs     1    2    2    open/syncd    /var
    hd3       jfs     3    6    2    open/syncd    /tmp
    hd1       jfs     1    2    2    open/syncd    /home
    hd10opt   jfs     2    4    2    open/syncd    /opt
* running syncvg
* updating the bootlist with bootlist -m normal hdisk0 hdisk1; do not
  forget to update the service bootlist as well.
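The table is what a healthy two-copy mirror looks like: every LV has PPs equal to twice its LPs, sits on 2 PVs, and is "syncd". A small checker for that rule, as a sketch: it assumes the lsvg -l column order shown above and treats any state containing "stale" as a problem. The function name is hypothetical.

```python
def unmirrored_lvs(lsvg_text, copies=2):
    """Return LV names that do not look fully mirrored in lsvg -l style text.

    Assumed columns: LV NAME, TYPE, LPs, PPs, PVs, LV STATE, MOUNT POINT.
    """
    problems = []
    for line in lsvg_text.strip().splitlines():
        fields = line.split()
        # Skip "rootvg:" and the "LV NAME TYPE ..." header (third field not a number).
        if len(fields) < 7 or not fields[2].isdigit():
            continue
        name, lps, pps, pvs, state = (
            fields[0], int(fields[2]), int(fields[3]), int(fields[4]), fields[5])
        if pps != copies * lps or pvs < copies or "stale" in state:
            problems.append(name)
    return problems

sample = """\
hd5     boot    1   2   2  closed/syncd  N/A
hd4     jfs     1   1   1  open/syncd    /
hd2     jfs     35  70  2  open/stale    /usr
"""
print(unmirrored_lvs(sample))  # ['hd4', 'hd2']
```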
For your case:
* to be safe, run bosboot -ad /dev/hdisk0
* mirror hd5 and check all the other LVs
* set the bootlist
* run syncvg. THIS CAN TAKE SOME TIME. DO NOT PULL ANY DISK WHILE IT RUNS.
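Hajo's steps can be written out as an ordered command list. This is only a sketch: the helper is hypothetical, and using mklvcopy to add the hd5 copy and syncvg -v rootvg for the resync is my assumption about how "mirror hd5" and "do a syncvg" would be carried out. Check each command against the documentation for your AIX level before running it.

```python
def mirror_fix_plan(primary="hdisk0", mirror="hdisk1", unmirrored=("hd5",), vg="rootvg"):
    """Return the commands for the steps above, in order (hypothetical helper)."""
    cmds = [f"bosboot -ad /dev/{primary}"]  # refresh the boot image on the primary disk
    # Add a second copy of each unmirrored LV on the mirror disk;
    # without -k the new copy stays stale until syncvg runs.
    cmds += [f"mklvcopy {lv} 2 {mirror}" for lv in unmirrored]
    cmds += [
        f"bootlist -m normal {primary} {mirror}",
        f"bootlist -m service {primary} {mirror}",
        f"syncvg -v {vg}",  # can take a long time; do not pull a disk meanwhile
    ]
    return cmds

for cmd in mirror_fix_plan():
    print(cmd)
```

Generating the list instead of running it leaves room to review each step by hand.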
Hajo
If you call IBM, they are going to tell you that removing the disk drive
is not a true way to test your mirroring; I just had this happen to me a
couple of days ago.
If you look at /sbin/rc.boot, LED 555 points to a failure in running
"fsck -fp /dev/hd4", and this was confirmed when I booted the machine in
debug mode. I got the following errors:
Starting device is ipldevice
Starting device's PVID: 00020404be2158440000000000000000
Root VGID: 00020404cdeb21e2
Looking for value=000829068632b65a0000000000000000 AND attribute=pvid
hdisk0 is in boot disk's VGDA
Could not determine ROOTVG physical volumes from VGDA.
Checking all physical volumes to determine those in ROOTVG.
Found 12 AVAILABLE disks
VGID: 0000000000000000
PVID: 00020404be215844
VGID: 0000000000000000
PVID: 000204045f4d256d
VGID: 0000000000000000
PVID: 000204045f4d2f0e
VGID: 0000000000000000
PVID: 00020404d3e3c3a5
VGID: 0000000000000000
PVID: 00020404d3e3dde9
VGID: 0000000000000000
PVID: 00020404d3e3e49c
VGID: 0000000000000000
PVID: 00020404d3e3eb52
VGID: 0000000000000000
PVID: 00020404d3e5ef05
VGID: 0000000000000000
PVID: 00020404d3e608d1
VGID: 0000000000000000
PVID: 00020404d3e60f6f
VGID: 0000000000000000
PVID: 00020404d3e61600
VGID: 0000000000000000
PVID: 00020404a6a923e6
ERROR: unable to determine the physical volumes belonging to ROOTVG.
+ rc=0
+ ln /usr/sbin/mount /etc/umount
+ /usr/lib/methods/showled 0x517
showled + echo rc.boot: executing "fsck -fp /dev/hd4"
+ 1>> /tmp/boot_log
+ fsck -fp /dev/hd4
: /dev/hd4 is not a known file system
+ [ 8 -ne 0 ]
+ loopled 0x555
showled
------------------------------------
The ipl_varyon -v command looks for the disk that is linked to
/dev/ipldevice and then looks up its PVID in the ODM. In our case, since
the disk with that PVID did not exist on the machine (it had been
removed), it fails.
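In the debug log every available disk reports VGID 0000000000000000, so none of them matches the Root VGID 00020404cdeb21e2 and no rootvg physical volume is found. A sketch that pulls those VGID/PVID pairs out of the debug text makes this explicit; the parsing rules are an assumption based only on the lines shown above, and the function name is hypothetical.

```python
import re

def rootvg_pvids(log):
    """Return (root VGID, PVIDs whose preceding VGID line matches it)."""
    m = re.search(r"Root VGID:\s*([0-9a-f]+)", log)
    if m is None:
        raise ValueError("no 'Root VGID:' line in log")
    root, found, vgid = m.group(1), [], None
    for line in log.splitlines():
        line = line.strip()
        if line.startswith("VGID:"):
            vgid = line.split(":", 1)[1].strip()
        elif line.startswith("PVID:") and vgid is not None:
            if vgid == root:
                found.append(line.split(":", 1)[1].strip())
            vgid = None  # each VGID line pairs with the PVID line after it
    return root, found

excerpt = """\
Root VGID: 00020404cdeb21e2
Found 12 AVAILABLE disks
VGID: 0000000000000000
PVID: 00020404be215844
VGID: 0000000000000000
PVID: 000204045f4d256d
"""
print(rootvg_pvids(excerpt))  # ('00020404cdeb21e2', [])
```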
Now if you boot into service mode and reduce the disk out of rootvg, it
will come back up in normal mode.
Hope this helps.
http://service.software.ibm.com/aix.us/go?/pdocs/os/mrv41.html
It also says
WHY MIRROR THE ROOTVG VOLUME GROUP?
By maintaining an active mirrored copy of the rootvg volume on another
disk, it ensures continuous operation of the AIX operating system in the
event that a disk that is part of the operating system experiences
failure. It provides the ability to boot more than one disk of the
rootvg in the event that another boot disk has failed. In some cases,
the ability to boot from alternate disks may require some user
intervention.
I wish someone at IBM in the LVM/Sys-Recovery group would address this
issue and set it straight. Anyway, thanks for your help.
Matthew.
I am pretty sure it is supposed to work the way they claim it should. I
(we) may be overlooking something here. I think IBM's assumption is that
the machine stays up and running and someone can log in, remove the
mirrors, reduce the disk out of the VG, and rerun bosboot and bootlist.
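The recovery kal describes (remove the mirrors, reduce the disk out of the VG, rerun bosboot and bootlist) can be sketched as a command list. Assumptions: the helper name is hypothetical, and using rmlvcopy with a count of 1 to drop the copy on the failed disk is my choice of command for "remove the mirrors", not something stated in the thread. Verify the syntax on your AIX level first.

```python
def drop_failed_disk_plan(failed, survivor, lvs, vg="rootvg"):
    """Return commands to retire a failed mirror disk (hypothetical helper)."""
    # Keep 1 copy of each LV, removing the copy that lives on the failed disk.
    cmds = [f"rmlvcopy {lv} 1 {failed}" for lv in lvs]
    cmds += [
        f"reducevg {vg} {failed}",       # take the disk out of the volume group
        f"bosboot -ad /dev/{survivor}",  # rebuild the boot image on the survivor
        f"bootlist -m normal {survivor}",
    ]
    return cmds

for cmd in drop_failed_disk_plan("hdisk0", "hdisk1", ["hd5", "hd4"]):
    print(cmd)
```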
kal
Matthew Godman <mgo...@fundsxpress.com> wrote in message news:<3BD6D383...@fundsxpress.com>...
Well, when I was last playing around with mirroring on my 4.2.1 home
system, I could switch off one of the disks during runtime with no
hiccups from the system, and a reboot was fine too with only one disk.
IIRC I used smitty to mirror the rootvg, ran bosboot like you did, and
changed the bootlist. I didn't worry about unusable dumps with a
mirrored primary paging space and no extra dump device, as I don't have
IBM support to read the dump anyway.
If this (foolproof if it worked for me :-) path doesn't help you, I'd
have a look at the hdisk mixup Randy described.
IIRC the hdisk should still be defined when removed.
joachim
Do you have no other bootable devices in your system except your
mirrored ones?
Have you disabled quorum?
If so, you could try to unset the bootlist completely:
bootlist -m normal
or set it to
bootlist -m normal scdisk
Norm Levin
"Matthew Godman" <mgo...@fundsxpress.com> wrote in message
news:3BD7131F...@fundsxpress.com...
> Thank you. Apparently you have to bring the system up in single-user mode,
> start a shell, remove hdisk0 from rootvg and then reboot? I've read a lot
> of IBM documentation on this subject and none of them state anything like
> this. To me, the whole point of mirroring the O/S is so that when hdisk0
> fails, the system will stay up or at least it will boot from the second
> drive after the crash.
I think you are still missing the point a little bit. As was already
stated, removing a drive without informing the OS is not a valid test of
your mirroring. There's a difference between a device which is present but
failed, and a device which is not present. If you remember the boot
process, the steps are: check and initialize hardware; load BLV; configure
base devices; process /etc/inittab. In your test case, because hdisk0 is
missing, you don't even get to step 2.
However, during "normal" operation, which is what you seem to be trying
to simulate, when hdisk0 fails, the OS will continue to use the LV copies on
hdisk1. If you reboot before replacing hdisk0, you'll get past step 1, and
step 2 will search (in your case) fd0, cd0, hdisk0, and hdisk1 for BLVs. If
fd0 and cd0 are empty, it won't find a BLV there. If hdisk0 is flaky and
the BLV that you know is there can't be accessed, the boot process will
continue on to find and use the BLV on hdisk1.
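The BLV search in step 2, as Kristian describes it, amounts to walking the bootlist in order and using the first device with a readable BLV. A minimal model of only that step, assuming a simple readable/unreadable flag per device (the function and the flags are hypothetical):

```python
def pick_boot_device(bootlist, readable_blv):
    """Return the first device in bootlist order with a readable BLV, or None."""
    for dev in bootlist:
        if readable_blv.get(dev, False):  # devices not listed count as having no BLV
            return dev
    return None

order = ["fd0", "cd0", "hdisk0", "hdisk1"]
# Empty floppy and CD, flaky hdisk0: the search falls through to hdisk1.
print(pick_boot_device(order, {"hdisk0": False, "hdisk1": True}))  # hdisk1
```

It says nothing about what happens after the BLV is loaded, which is where the 555 in this thread occurs.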
Hope this sheds some light.
Best of luck,
--Kristian
Please R.U.N.S.A.F.E.
http://www.jmu.edu/computing/info-security/engineering/runsafe.shtml