I've inherited an old AIX 4.3 system with 1GB drives in it. Yesterday
I noticed some stale LVs. I googled the issue and ran some commands I
found in the search results. Here is what the system looked like
before I started:
rootvg:
LV NAME   TYPE     LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5       boot       2     4     2  closed/syncd  N/A
hd6       paging   128   256     2  open/stale    N/A
hd8       jfslog     1     2     2  open/syncd    N/A
hd4       jfs        2     4     2  open/stale    /
hd2       jfs      146   292     2  open/stale    /usr
hd9var    jfs        1     2     2  open/stale    /var
hd3       jfs       15    30     3  open/stale    /tmp
hd1       jfs        1     2     2  open/syncd    /home
lv01      jfs        1     1     1  closed/syncd  N/A
lv00      jfs        5    10     1  open/syncd    /local
lv02      jfs      360   360     2  open/syncd    #
After running 'syncvg -v rootvg':
rootvg:
LV NAME   TYPE     LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5       boot       2     4     2  closed/syncd  N/A
hd6       paging   128   256     2  open/stale    N/A
hd8       jfslog     1     2     2  open/syncd    N/A
hd4       jfs        2     4     2  open/syncd    /
hd2       jfs      146   292     2  open/syncd    /usr
hd9var    jfs        1     2     2  open/syncd    /var
hd3       jfs       15    30     3  open/stale    /tmp
hd1       jfs        1     2     2  open/syncd    /home
lv01      jfs        1     1     1  closed/syncd  N/A
lv00      jfs        5    10     1  open/syncd    /local
lv02      jfs      360   360     2  open/syncd    #
Now only hd3 (/tmp) and the paging space (hd6) are stale, and syncing
hd6 on its own fails:
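A quick way to pick out the stale LVs without reading the whole table is
to filter the LV STATE column, which is the 6th whitespace-separated
field of `lsvg -l` output. This is a minimal sketch that assumes the
column layout shown above. `stale_lvs` is just an illustrative helper
name, not an AIX command.

```shell
#!/bin/sh
# stale_lvs (hypothetical helper): print the names of LVs whose
# LV STATE column (field 6 of `lsvg -l` output) ends in "stale".
# The "rootvg:" and "LV NAME ..." header lines never match, so they
# are skipped automatically.
stale_lvs() {
    awk '$6 ~ /stale$/ { print $1 }'
}

# Usage on the AIX box (not run here):
#   lsvg -l rootvg | stale_lvs
```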
# syncvg -l hd6
0516-068 lresynclv: Unable to completely resynchronize volume. Run diagnostics if neccessary.
0516-934 /usr/sbin/syncvg: Unable to synchronize logical volume hd6.
I see a missing device when I run this command:
# lsvg -p rootvg
rootvg:
PV_NAME           PV STATE  TOTAL PPs  FREE PPs  FREE DISTRIBUTION
hdisk7            active    248        92        06..48..00..00..38
0516-304 lsvg: Unable to find device id 000048054d75e6b0 in the Device Configuration Database.
000048054d75e6b0  missing   248        105       50..50..00..00..05
hdisk12           active    248        73        50..22..00..00..01
hdisk13           active    248        7         00..00..00..00..07
hdisk9            active    248        0         00..00..00..00..00
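Missing PVs can be picked out of `lsvg -p` the same way, by matching the
PV STATE column (field 2). This is a minimal sketch assuming the layout
shown above. The 0516-304 error line does not match because its second
field is "lsvg:". `missing_pvs` is a hypothetical helper name.

```shell
#!/bin/sh
# missing_pvs (hypothetical helper): print the PV name/ID of every PV
# that `lsvg -p <vg>` reports with PV STATE "missing".
missing_pvs() {
    awk '$2 == "missing" { print $1 }'
}

# Usage on the AIX box (not run here):
#   lsvg -p rootvg 2>/dev/null | missing_pvs
```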
But this command shows all disks OK:
# lspv
hdisk7 00004805507bbe5d rootvg
hdisk9 0000145383ffebcf rootvg
hdisk12 0000145384007ccf rootvg
hdisk13 000014538402b614 rootvg
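To check whether any configured disk carries a given PVID, one option
is to match it against the second column of `lspv`. This is a sketch
that assumes the three-column `lspv` layout shown above. `pvid_to_hdisk`
is a hypothetical helper name. On the output above, 0000145384007ccf
maps to hdisk12, while 000048054d75e6b0 maps to nothing.

```shell
#!/bin/sh
# pvid_to_hdisk (hypothetical helper): print the hdisk whose PVID
# (column 2 of `lspv` output) equals $1. Prints nothing if no
# configured disk carries that PVID.
pvid_to_hdisk() {
    awk -v id="$1" '$2 == id { print $1 }'
}

# Usage on the AIX box (not run here):
#   lspv | pvid_to_hdisk 000048054d75e6b0
```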
In more detail:
# lspv -l hdisk7
hdisk7:
LV NAME LPs PPs DISTRIBUTION MOUNT POINT
hd5 2 2 02..00..00..00..00 N/A
hd2 146 146 41..01..43..49..12 /usr
hd4 2 2 01..00..01..00..00 /
hd1 1 1 00..01..00..00..00 /home
hd8 1 1 00..00..01..00..00 N/A
hd9var 1 1 00..00..01..00..00 /var
hd3 3 3 00..00..03..00..00 /tmp
# lspv -l hdisk9
hdisk9:
LV NAME LPs PPs DISTRIBUTION MOUNT POINT
lv02 247 247 50..49..49..49..50 #
lv01 1 1 00..01..00..00..00 N/A
# lspv -l hdisk12
hdisk12:
LV NAME LPs PPs DISTRIBUTION MOUNT POINT
lv00 5 10 00..10..00..00..00 /local
hd2 146 146 00..01..47..49..49 /usr
hd5 2 2 00..02..00..00..00 N/A
hd8 1 1 00..01..00..00..00 N/A
hd1 1 1 00..01..00..00..00 /home
hd3 12 12 00..12..00..00..00 /tmp
hd9var 1 1 00..01..00..00..00 /var
hd4 2 2 00..00..02..00..00 /
# lspv -l hdisk13
hdisk13:
LV NAME LPs PPs DISTRIBUTION MOUNT POINT
lv02 113 113 50..05..00..15..43 #
hd6 128 128 00..45..49..34..00 N/A
So, two questions:
#1 How do I get the stale LVs to show open/syncd?
#2 How do I get the missing 000048054d75e6b0 out of the ODM?
Without too onerous a lecture, this is why documentation and regular
backups are so valuable.
It looks like you have had a disk failure.
The AIX error log will hopefully tell you what happened, and when. Try
the "errpt" command. Also look through the syslog (check
/etc/syslog.conf for the setup).
Run:
lsdev -Cc disk
lscfg -vl hdisk\*
lslv -l lvname (hd6 etc.; this will probably fail)
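If the `lsdev -Cc disk` list is long, disks that are not in the
Available state can be filtered out. This is a minimal sketch assuming
the usual `name state location description` layout of `lsdev -C` output.
`non_available_disks` is a hypothetical helper name.

```shell
#!/bin/sh
# non_available_disks (hypothetical helper): print name and state of
# every disk in `lsdev -Cc disk` output whose state (column 2) is
# anything other than "Available", e.g. "Defined".
non_available_disks() {
    awk 'NF >= 2 && $2 != "Available" { print $1, $2 }'
}

# Usage on the AIX box (not run here):
#   lsdev -Cc disk | non_available_disks
```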
Hope you have some good backups :)
When was the last mksysb or other backup taken?
Seriously, you will probably be able to remove the half of the mirrors
that is on the problem PV, replace the failing disk (if that's the
problem) and remirror.
I had problems with AIX 4.3 and failing disks years back.
I wouldn't suggest removing ODM entries by hand: it's very easy to
corrupt the VGDA on the PVs, as it synchronises itself with the ODM,
and then you'll really be in the doo-doo.
Thanks for the info. We do have current backups but no mksysb for
this system. errpt talks about hdisk12, but when I run diags on
hdisk12 (as well as the others) it comes back clean. Also, lspv shows
hdisk12 as available.
How do I determine which device is associated with 000048054d75e6b0
from the following message?
0516-304 lsvg: Unable to find device id 000048054d75e6b0 in the Device Configuration Database
Is that a disk? If so, which hdisk#? All I can tell is that 'it' is
missing, but I do not know what 'it' is.
Thanks again
odmget -q "attribute LIKE pvid AND value LIKE 000048054d75e6b0*" CuAt
If there is nothing in the ODM, perhaps this command can solve your
problem:
reducevg rootvg 000048054d75e6b0
Pierre-yves
I ran the odmget command and got nothing in return. So I ran the
reducevg command and got:
0516-016 ldeletepv: Cannot delete physical volume with allocated
        partitions. Use either migratepv to move the partitions or
        reducevg with the -d option to delete the partitions.
0516-884 reducevg: Unable to remove physical volume 000048054d75e6b0.
I'm not sure I want to use 'reducevg -d', since I still don't know
which hdisk 000048054d75e6b0 is.
Thanks for the commands, though.
# rmlvcopy hd3 1 000048054d75e6b0
# rmlvcopy hd6 1 000048054d75e6b0
You can also list any LVs that have partitions on that mystery drive
the same way:
# lspv -l 000048054d75e6b0
If you can list the LVs that it thinks are still on that PV, simply
rmlvcopy those and then reducevg when no partitions are left.
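The "rmlvcopy, then reducevg" sequence can be scripted as a dry run
that only prints the commands for review. This is a minimal sketch. It
assumes that `lspv -l` accepts the PVID, as described above, and that
every LV it lists currently has exactly two copies, so reducing to one
copy is valid. It also hard-codes rootvg. `plan_rmlvcopy` is a
hypothetical helper, and nothing here is executed.

```shell
#!/bin/sh
# plan_rmlvcopy (hypothetical helper): print, but do NOT run, one
# rmlvcopy command per LV listed in `lspv -l <PV>` output on stdin,
# dropping the copy that lives on PV $1. Then print the reducevg that
# removes the PV from rootvg (VG name hard-coded as an assumption).
# Only rows whose LPs and PPs columns are numeric are treated as LVs.
# Assumes each listed LV has 2 copies.
plan_rmlvcopy() {
    awk -v pv="$1" '
        $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { print "rmlvcopy " $1 " 1 " pv }
        END { print "reducevg rootvg " pv }
    '
}

# Usage on the AIX box (review the output before running anything):
#   lspv -l 000048054d75e6b0 | plan_rmlvcopy 000048054d75e6b0
```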
When I looked up my hdisk0 in the ODM, the LIKE option did not work
for me (maybe I did something wrong), but I was able to find my hard
disks by padding the value with trailing zeros. Try this:
# odmget -q "value = 000048054d75e6b00000000000000000" CuAt
My server returned two entries: attribute = pvid and attribute = pv.
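The padded value is just the 16-digit PVID followed by 16 zeros, which
is 32 characters in total. This is a small sketch that builds it, so
other PVIDs can be looked up the same way. `pad_pvid` is a hypothetical
helper name.

```shell
#!/bin/sh
# pad_pvid (hypothetical helper): right-pad a PVID with zeros to the
# 32-character form used in the odmget query above. printf pads with
# spaces, and tr turns those spaces into zeros.
pad_pvid() {
    printf '%-32s\n' "$1" | tr ' ' '0'
}

# Usage on the AIX box (not run here):
#   odmget -q "value = $(pad_pvid 000048054d75e6b0)" CuAt
```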
Check which disks each LV in rootvg is using:
$ lslv -l DEVX
Check whether the problem LVs are mirrored:
$ lsvg -l rootvg
$ lslv DEVX; lslv -l DEVX
If they are mirrored, remove the copy on the missing PV with rmlvcopy:
$ /usr/sbin/rmlvcopy MYLV 1 PVID
Example: /usr/sbin/rmlvcopy hd3 1 000048054d75e6b0
Example: /usr/sbin/rmlvcopy hd6 1 000048054d75e6b0
Remove the offending PVID:
$ reducevg rootvg PVID
If an LV has partitions allocated on the missing PVID and is NOT
mirrored, it must be recreated.
From your output I assume /tmp is not mirrored.
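Whether an LV is mirrored can be read off the `lsvg -l` table: the
number of copies is PPs divided by LPs. For example, hd3 in the first
table has 15 LPs and 30 PPs, which is two copies. This is a minimal
sketch assuming the `lsvg -l` column layout shown at the top.
`lv_copies` is a hypothetical helper name.

```shell
#!/bin/sh
# lv_copies (hypothetical helper): print each LV with its number of
# copies (PPs / LPs) from `lsvg -l` output on stdin. Header lines are
# skipped because their 3rd/4th fields are not integers.
lv_copies() {
    awk '$3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $3 > 0 { print $1, $4 / $3 }'
}

# Usage on the AIX box (not run here):
#   lsvg -l rootvg | lv_copies
```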
1) Log in via the console.
2) Stop all processes that are not needed.
3) Check which processes are using /tmp:
$ fuser -xuc /tmp
or
$ lsof /tmp
4) Kill all processes using /tmp.
5) Recreate /tmp. hd4 (/) should have at least 128MB free, since it
will be the temporary /tmp target.
hth
Hajo
References:
Technote: http://www-01.ibm.com/support/docview.wss?uid=isg3T1000426
AIX 4.3 documentation: http://publib16.boulder.ibm.com/pseries/en_US/infocenter/base/aix43.htm
AIX Version 4.3 Problem Solving Guide and Reference: http://publib16.boulder.ibm.com/pseries/en_US/infocenter/base/43_docs/aixprob/prbslvgd/toc.htm
hd3 (/tmp) and hd6 (paging): it's not a big deal to lose those copies.
One complete copy of /tmp (3 + 12 = 15 PPs, i.e. all 15 LPs) is on
hdisk7 and hdisk12:
hdisk7:
hd3 3 3 00..00..03..00..00 /tmp
hdisk12:
hd3 12 12 00..12..00..00..00 /tmp
which matches the lsvg -l output:
hd3 jfs 15 30 3 open/stale /tmp
You can confirm with:
lslv -l hd3
and one complete copy of hd6 (all 128 LPs) is on hdisk13:
hdisk13:
hd6 128 128 00..45..49..34..00 N/A
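The reasoning above can be checked mechanically. For each LV, subtract
the PPs found on the disks that are present (from `lspv -l`) from the
LV's total PPs (from `lsvg -l`). Whatever is left over presumably sits
on the missing PV. With the outputs from the original post, this gives
15 PPs for hd3 and 128 for hd6 and leaves every other LV fully
accounted for. 15 + 128 = 143, which matches the used PPs on the missing
PV (248 total minus 105 free). This is a minimal sketch that works on
saved command output. `pps_unaccounted` and the two file arguments are
hypothetical.

```shell
#!/bin/sh
# pps_unaccounted LSVG_FILE LSPV_FILE   (hypothetical helper)
#   LSVG_FILE: saved output of `lsvg -l rootvg`
#   LSPV_FILE: saved `lspv -l` output for every PV that IS present
# Prints each LV whose total PPs exceed the PPs found on the present
# disks, together with the difference.
pps_unaccounted() {
    awk '
        # First file: lsvg -l rows have numeric LPs/PPs in fields 3/4.
        FNR == NR {
            if ($3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/) total[$1] = $4 + 0
            next
        }
        # Second file: lspv -l rows have numeric LPs/PPs in fields 2/3.
        $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { seen[$1] += $3 }
        END {
            for (lv in total)
                if (total[lv] > seen[lv] + 0) print lv, total[lv] - seen[lv]
        }
    ' "$1" "$2"
}

# Usage on the AIX box (not run here):
#   lsvg -l rootvg > /tmp/lsvg.out
#   for d in hdisk7 hdisk9 hdisk12 hdisk13; do lspv -l $d; done > /tmp/lspv.out
#   pps_unaccounted /tmp/lsvg.out /tmp/lspv.out
```

The output order of the `for (lv in total)` loop is unspecified in awk,
so pipe through `sort` if a stable order matters.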
So I think you can safely break the mirrors with rmlvcopy and then
delete the entries for PVID 000048054d75e6b0 with reducevg.
But it's true that it's a risky operation, and it's never too late to
make a mksysb.
When you're done, don't forget to recreate the copies of /tmp and hd6
on disks with free PPs. ;)
Pierre-yves
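Before recreating the copies, it is worth checking where the free PPs
are. The lost copies need 15 PPs (hd3) and 128 PPs (hd6). This is a
minimal sketch that lists and totals the FREE PPs column of `lsvg -p`
for active PVs. `free_pps` is a hypothetical helper name. On the
`lsvg -p` output in the original post, it reports 92, 73, 7 and 0 free
PPs, 172 in total. Under AIX's default strict allocation policy, a new
copy cannot share a PV with an existing copy of the same logical
partition, so not all of that space is usable for every LV.

```shell
#!/bin/sh
# free_pps (hypothetical helper): print the FREE PPs (column 4) of each
# active PV in `lsvg -p <vg>` output, then the total. Missing PVs and
# header lines are skipped because their column 2 is not "active".
free_pps() {
    awk '$2 == "active" { print $1, $4; sum += $4 }
         END { print "total", sum + 0 }'
}

# Usage on the AIX box (not run here):
#   lsvg -p rootvg 2>/dev/null | free_pps
```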
To All,
Thanks for all the help and ideas so far. I agree that the missing
device 000048054d75e6b0 is a ghost of an older disk. I've tried all
the above ideas, and every time I get one error code or another with
the same description:
Unable to find device id 000048054d75e6b0 in the Device Configuration Database
I also agree that hd3 and hd6 do not NEED to be mirrored; I'm just
looking to clean things up a bit. This is an OLD server with limited
users on it.