
QUORUM LOST, VOLUME GROUP CLOSING


pric...@blm.gov

May 14, 2009, 6:31:23 PM
I'm running AIX 5.3 TL 9 on a P550 with a 2104-DU3 disk array that is
RAID 5 on the controller and appears to the OS as a single drive,
hdisk4.

A couple of weeks ago I lost a power supply in the 2104. After
replacing the failed power supply, and putting in a second one, I had
to "revive" the array, and everything seemed in order. There were no
messages in the error report.

This last Monday, 5/11/2009, I attempted to remove an unused LV from
the VG. The VG went offline and there were two messages in the errpt:

QUORUM LOST, VOLUME GROUP CLOSING
PHYSICAL VOLUME DECLARED MISSING

lspv showed that hdisk4 was still known to the OS, and
"smitty ibmscraid"
showed that the RAID array was still in optimal condition.

After some futzing I was able to get the vg back with the following 2
commands:

exportvg 1tb2
/usr/sbin/importvg -y'1tb2' -f hdisk4

I attempted to remove the LV again, but the same thing happened, and
forcing the importvg brought the VG back.

Next I tried to change the vg with "smitty chvg" and set the
"A QUORUM of disks required to keep the volume group on-line ?" to
Yes. Same results:


QUORUM LOST, VOLUME GROUP CLOSING
PHYSICAL VOLUME DECLARED MISSING

This VG has over 300 raw LVs for Informix, so I can't easily remove
it and rebuild it.


Clearly I'm lacking in understanding. What are my options at this
point?


Niel Lambrechts

May 15, 2009, 2:29:01 AM
On 05/15/2009 12:31 AM, pric...@blm.gov wrote:
> I'm running AIX 5.3 TL 9 on a P550 with a 2104-DU3 disk array that is
> raid 5 on the controller and appears as a single drive to the OS,
> hdisk4.

What does 'oslevel -s' show?

> A couple of weeks ago I lost the power supply to the 2104. After
> replacing the power supply, and putting in a second one, I had to
> "revive" the array and all seemed in order. No messages in the error
> report.
>
> This last Monday, 5/11/2009, I attempted to remove an unused lv from
> the vg. The vg went off line and there were 2 messages in the errpt:
>
> QUORUM LOST, VOLUME GROUP CLOSING
> PHYSICAL VOLUME DECLARED MISSING

Are you perhaps using CLVM, or is this a standalone server? Could you also post
the output of 'lsvg VG', along with any errpt entries that appear while this
happens?


> This vg has over 300 raw lv's for Informix so I can't handily remove
> it and rebuild it.
> Clearly I'm lacking in understanding. What are my options at this
> point?

You could always try to turn off quorum checking (chvg -Qn VG) and see if that
makes a difference.

Regards,
Niel


chemfiz

May 15, 2009, 6:29:41 AM
On May 15, 8:29 am, Niel Lambrechts <n...@devnull.org> wrote:

# chvg -Qn vgname
# varyoffvg vgname
# varyonvg vgname
# mount /fs
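
Adapted to the VG in this thread (the name 1tb2 comes from the first
post; the last step does not apply here, since the LVs are raw Informix
chunks with no filesystem to mount):

# chvg -Qn 1tb2
# varyoffvg 1tb2
# varyonvg 1tb2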

pric...@blm.gov

May 15, 2009, 10:16:02 AM
>
> What does 'oslevel -s' show?
>
5300-09-02-0849

>
> Are you perhaps using CLVM, or is this a standalone server? Can you also post
> the output of 'lsvg VG' as well as any errpt entries that may or may not occur
> during this?
>

This is a standalone server. There are no errors when I run lsvg on the
VG; here are the results:

lsvg 1tb2
VOLUME GROUP:   1tb2                     VG IDENTIFIER:  0002c6ff00004c00000000ff9d28c180
VG STATE:       active                   PP SIZE:        1024 megabyte(s)
VG PERMISSION:  read/write               TOTAL PPs:      820 (839680 megabytes)
MAX LVs:        512                      FREE PPs:       3 (3072 megabytes)
LVs:            19                       USED PPs:       817 (836608 megabytes)
OPEN LVs:       17                       QUORUM:         2 (Enabled)
TOTAL PVs:      1                        VG DESCRIPTORS: 2
STALE PVs:      0                        STALE PPs:      0
ACTIVE PVs:     1                        AUTO ON:        yes
MAX PPs per VG: 130048
MAX PPs per PV: 1016                     MAX PVs:        128
LTG size:       128 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:      no                       BB POLICY:      relocatable

>
> You could always try to turn off quorum checking (chvg -Qn VG) and see if that
> makes a difference.

I'll come in this weekend, shut down Informix, and see if I can do
this. When I tried it on Monday the VG went offline with the
aforementioned errors.

pric...@blm.gov

May 15, 2009, 10:16:55 AM

Thank you. I'll give it a try this weekend, after I've shut down
Informix.

pric...@blm.gov

May 17, 2009, 2:04:09 PM
On Sunday, May 17, 2009, at about 11:30 am I was able to shut down
Informix.

When I entered "chvg -Qn 1tb1" I got the following errors:

0516-011 lchangevg: The volume group has been forcefully varied off
due to a loss of quorum.
0516-732 chvg: Unable to change volume group 1tb1.

To get it back I entered the following 2 commands:

exportvg 1tb1
/usr/sbin/importvg -y'1tb1' -f hdisk4

That brings it back with no errors.


Hajo Ehlers

May 17, 2009, 6:16:10 PM

Verify that the VG does not think it has more than one PV:
$ lsvg -p 1tb1
Check also the allocation map
$ lsvg -M 1tb1 | grep -v hdisk4

Also, has this VG been changed in any way (standard VG / scalable VG /
big VG and back)?
Was an hdisk removed from this VG previously?

Since you get the problem each time the VGDA is read, I would check
what happens when you run the lqueryvg command:

$ lqueryvg -Atp hdisk4

You might also check the output of lqueryvg -Atp against another
hdisk and compare the two, as sketched below.
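
For example, dump both to files and diff them (use whichever other
hdisk has a VG on it; hdiskX and the file names are just examples):

$ lqueryvg -Atp hdisk4 > /tmp/lqueryvg.hdisk4
$ lqueryvg -Atp hdiskX > /tmp/lqueryvg.hdiskX
$ diff /tmp/lqueryvg.hdisk4 /tmp/lqueryvg.hdiskX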

hth
Hajo

Hajo Ehlers

May 17, 2009, 6:28:39 PM
Addendum:
Check as well with the readvgda command.

Try to read and verify the first and second VGDA and compare them as
well.

$ readvgda -p hdisk4
$ readvgda -s hdisk4
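
For example, capture both copies to files and diff them (the file
names are just examples):

$ readvgda -p hdisk4 > /tmp/vgda.primary
$ readvgda -s hdisk4 > /tmp/vgda.secondary
$ diff /tmp/vgda.primary /tmp/vgda.secondary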

cheers
Hajo

pric...@blm.gov

May 24, 2009, 7:01:31 PM
Hajo,

Thank you for the reply. You have been a tremendous help, as has
everyone else who has answered.

Here are the results so far:


>
> Verify that the VG does not think it has more then one PV
> $ lsvg -p  1tb1

1tb1:
PV_NAME     PV STATE    TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk4      active      820         8           00..00..00..00..08


> Check also the allocation map
> $ lsvg -M 1tb1 | grep -v hdisk4

1tb1 #this is the expected result.

>
> Also has this VG been changed in any way ( VG / Scalabel VG / big VG
> and back ? )

AFAIK the VG was not changed in any way. Perhaps I did something
wrong when I revived the array. But I thought that was transparent to
LVM.

> Was a hdisk removed from this VG previously ?

No, it has always been hdisk4.

>
> Since you get the problem each time you read the VGDA i would check
> what happens in case you run the
> lqueryvg command
>
> $  lqueryvg -Atp hdisk4

Max LVs: 512
PP Size: 30
Free PPs: 8
LV count: 352
PV count: 1
Total VGDAs: 2
Conc Allowed: 0


MAX PPs per PV 1016
MAX PVs: 128

Quorum (disk): 1
Quorum (dd): 1
Auto Varyon ?: 1
Conc Autovaryo 0
Varied on Conc 0
Logical: 0002c6ff00004c00000000ff9d287e12.1 raw48 1
0002c6ff00004c00000000ff9d287e12.2 raw49 1
.
.
.
0002c6ff00004c00000000ff9d287e12.351 raw415 1
0002c6ff00004c00000000ff9d287e12.352 raw416 1
0002c6ff00004c00000000ff9d287e12.353 raw417 1
Physical: 0002c6bf97d139ad 2 0
Total PPs: 820
LTG size: 128
HOT SPARE: 0
AUTO SYNC: 0
VG PERMISSION: 0
SNAPSHOT VG: 0
IS_PRIMARY VG: 0
PSNFSTPP: 16128
VARYON MODE: 0
VG Type: 1
Max PPs: 130048

>
> You might also check the output from a lqueryvg -Atp from another
> hdisk and compare the output


>


> try to read and verify the first and second vgda and compare them as
> well.
>
> $ readvgda -p hdisk4
> $ readvgda -s hdisk4
>

# diff 1tb1.readvgda.dash.p.hdisk4 1tb1.readvgda.dash.s.hdisk4
2,3c2,3
< ..... Sun May 24 16:48:11 MDT:2009
< ..... readvgda -p hdisk4
---
> ..... Sun May 24 16:48:46 MDT:2009
> ..... readvgda -s hdisk4
24c24
< ============= B: VGSA 0x80 (0x10000) =============
---
> ============= B: VGSA 0x23a8 (0x475000) =============
38c38
< ============= B: VG HEADER 0x180 (0x30000) =============
---
> ============= B: VG HEADER 0x24a8 (0x495000) =============
62c62
< ============= B: VG TRAILER 0x23a7 (0x474e00) =============
---
> ============= B: VG TRAILER 0x46cf (0x8d9e00) =============
67,68c67,68
< ============= B: LV ENTRIES 0x181 (0x30200) =============
< & NAMELIST 0x2367 (0x46ce00)
---
> ============= B: LV ENTRIES 0x24a9 (0x495200) =============
> & NAMELIST 0x468f (0x8d1e00)
11688c11688
< ============= B: PV HEADER 0x329 (0x65200) =============
---
> ============= B: PV HEADER 0x2651 (0x4ca200) =============

I will post this now, but I will continue analyzing the results and
post again if I find anything.

-Park

pric...@blm.gov

May 24, 2009, 7:20:05 PM
I can't find documentation on the lqueryvg command and its output. I
compared the output of "lqueryvg -Atp hdisk4" with the output of
"lqueryvg -Atp hdisk6"; hdisk6 is a 2TB array of a different model,
and it did not have a power supply failure. The difference in the
output is in the next-to-last line, "VG Type:". On the 1TB array we
have been discussing, the VG Type is 1, but on the 2TB array, hdisk6,
the VG Type is 2. That is the only difference I'm finding.

Hajo Ehlers

May 25, 2009, 6:31:47 AM


Just FYI:

VG type 1 should be "big", type 2 should be scalable, and 0 is a
standard VG. You can also check via "lsvg myvg":
MAX PVs = 32    (standard VG)
MAX PVs = 128   (big VG)
MAX PVs = 1024  (scalable VG)
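
For a quick check on your VG, something like:

$ lsvg 1tb1 | grep "MAX PVs"
$ lqueryvg -Atp hdisk4 | grep "VG Type"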

Further troubleshooting:
1) You used the "-f" option for your importvg, which means a forced
varyon. What happens if you do it without the force option?

2) Check the output of "truss -auf rmlv MyLV"; that might already give
a hint. If not:
2.1) rmlv is a script, so make a copy, add a "set -x" to each function
as well as to the main section, and rerun the rmlv (not started via
truss), roughly as sketched below.
Post the output showing where the script fails.
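
Something along these lines (the copy path, LV name, and trace file
are just examples):

$ cp /usr/sbin/rmlv /tmp/rmlv.debug
  ... edit /tmp/rmlv.debug and add "set -x" at the top of the main
      section and inside each function ...
$ ksh /tmp/rmlv.debug YourLV 2> /tmp/rmlv.trace
$ more /tmp/rmlv.trace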

3) There have been various problems with big volume groups in the past,
and some needed a rebuild of the VG. Maybe your VG was changed to big
in the past while the oslevel was not at the required level.
For example, see http://www-01.ibm.com/support/docview.wss?uid=isg1IY67833

hth
Hajo

BTW: Have you taken over the system? I ask because you have 352 LVs,
but it looks like only a few are active ("OPEN LVs: 17").
Note that in the past LVs could only be added but not removed :-(
See 3)
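
To see how many of those 352 LVs are really closed, something like:

$ lsvg -l 1tb1 | grep -c closed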

pric...@blm.gov

Jun 1, 2009, 9:19:49 PM
Hajo,

Thank you for your detailed attention to this problem. After reading
the APAR I determined that it will be most time-efficient for me to do
the following (sketched roughly below):

* Back up all data in the volume group
* Remove the existing volume group (varyoffvg, exportvg)
* Create a new volume group using the mkvg -S option
* Create the logical volumes again and restore the data
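
Roughly, the middle two steps would be (the VG name, hdisk, and
1024 MB PP size are taken from the earlier output, and every raw LV
still has to be recreated with its original name and size before
restoring):

varyoffvg 1tb1
exportvg 1tb1
mkvg -S -v 512 -s 1024 -y 1tb1 hdisk4   # -v raises the LV limit for the ~350 LVs
mklv -y raw48 -t raw 1tb1 <LPs>         # repeat for each raw LV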

As I recall there is a way to reduce the Informix data space of unused
raw logical volumes. I'll try to do that first to minimize my backup
and restore times.

-Park
