LABEL: LVM_IO_FAIL
Resource Name: LVDD
Description: I/O ERROR DETECTED BY LVM
PHYSICAL VOLUME DEVICE MAJOR/MINOR: 002D 2008
ERROR CODE AS DEFINED IN sys/errno: 5
BLOCK NUMBER: 72032256
LOGICAL VOLUME DEVICE MAJOR/MINOR: 002E 0006
SENSE DATA:
0004 4B20 0004 6909 281A 1A09 0000 0000 0000 0000 0004 6909 DE4E 9084
0000 0000 0000 0000
I am getting this error every few seconds. (approx 100 every 3 hours)
I am "guessing" that the LVM is finding bad blocks and is relocating
the data to another block. Is this correct?
I would also like to know how to translate major/minor numbers into
pdisks and logical volumes as applicable.
I would like to know point 2 as I am getting the errors on two drives
at present and I am putting six new drives into the array within the
next ten days. If I can determine which drives are failing, I
will order an extra two drives and replace the failing drives at the
same time. (I am doing a full rebuild from tape of this volume
group.)
The drives for this volume group are SSA in MOD600 towers.
I am running AIX 4.3.3.0.8
Any info would be appreciated.
George Bagley
Systems Engineer, DBA
> I am getting the following in errpt
> LABEL: LVM_IO_FAIL
> Resource Name: LVDD
> Description: I/O ERROR DETECTED BY LVM
> PHYSICAL VOLUME DEVICE MAJOR/MINOR: 002D 2008
> ERROR CODE AS DEFINED IN sys/errno: 5
> BLOCK NUMBER: 72032256
> LOGICAL VOLUME DEVICE MAJOR/MINOR: 002E 0006
> I am getting this error every few seconds. (approx 100 every 3 hours)
> I am "guessing" that the LVM is finding bad blocks and is relocating
> the data to another block. Is this correct?
Possibly. That's sufficiently frequent to indicate a failing
drive, though.
> I would also like to know how to translate major/minor numbers into
> pdisks and logical volumes as applicable.
# physical volume device major/minor
$ echo 'ibase=16; 2D; 2008' | bc
45
8200
# logical volume device major/minor
$ echo 'ibase=16; 2E; 06' | bc
46
6
I'm skeptical of the minor number for the physical device.
Nevertheless, look in /dev/ for a device with a major number
of 45 and a minor number of 8200 (?) for the physical volume;
the logical volume -- the numbers for which are much more
reasonable to me -- has a major number 46 and minor number 6.
Just so you know how to find these devices: the major number
is the first in the comma-separated pair which occurs where
you usually see the file size in the output of ls -l.
$ ls -l /dev/hdisk0
brw------- 1 root system 12, 1 Dec 19 2000 /dev/hdisk0
The major number of hdisk0 is 12; its minor number is 1.
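The two steps above (hex to decimal, then matching the pair against
ls -l output) can be combined in a small shell sketch. The helper
names hex2dec and find_dev are made up for illustration, and the awk
field positions assume the ls -l layout shown above (major number with
a trailing comma in field 5, minor number in field 6); adjust them if
your ls prints the columns differently.

```shell
# Convert a hexadecimal major/minor value from errpt to decimal.
# (Hypothetical helper; same result as the bc one-liner above.)
hex2dec() {
    printf '%d\n' "0x$1"
}

# Read `ls -l` output on stdin and print the name of every block or
# character device whose major/minor pair matches the two arguments.
# (Hypothetical helper; assumes major is field 5, minor is field 6.)
find_dev() {
    awk -v maj="$1" -v min="$2" '
        /^[bc]/ {
            m = $5
            sub(/,$/, "", m)          # strip the comma after the major number
            if (m == maj && $6 == min) print $NF
        }'
}

# Usage (physical volume from the error report):
#   ls -l /dev | find_dev "$(hex2dec 2D)" "$(hex2dec 2008)"
# Usage (logical volume from the error report):
#   ls -l /dev | find_dev "$(hex2dec 2E)" "$(hex2dec 06)"
```

If nothing is printed, no node in /dev carries that major/minor pair,
which would support the doubt about the physical device's minor number.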
Regards,
Nicholas Dronen
P.S. I'm glad to hear you're working; not so for your drives. :-)
I still have a problem, though.
The major and minor numbers of the physical device point to a RAID
array with 16-odd disks in it.
The logical volume indicated is spread over the disks in the array. I
have run diags on the disks and no errors/problems are being flagged.
Is there a way to interpret the sense data so that I can narrow it
down to a pdisk??
Thanks again
George
George,
Are these errors preceded in the error log by SSA-related messages?
If not, I don't think this is a hardware problem; the error is probably
being recognised by the LVM before anything is passed to the RAID manager.
Is it a raw logical volume that the errors are logged against? If so,
what is accessing it? Is 72032256 a valid block number?
Regards,