During a reorganization of my LVs, I did an lvremove that in itself
produced no errors. However, the next lvremove failed with messages that
devices (?) had been 'left open' and that the metadata for the VG
containing both LVs was inconsistent.
I looked for a solution, but was not able to find anything, so I decided
to reboot the system in the hope that that would at least resolve the
'left open' messages. In retrospect, probably not the best action.
The result was that, as my / is on the broken VG, the system failed to
boot because / could not be mounted.
On reboot, lvm2 seems to try to recover the inconsistency, but AFAICT
fails because of an error in the code.
IMO there are two problems:
1. the error in the lvremove operation that caused the inconsistency;
2. an error in 'lvm vgscan' that fails to correct the inconsistency.
I checked the upstream changelog for sid's 2.0.0.32-1 version, but did not
see anything that looked like it would fix the 1st problem.
Note: it looks like the 2nd problem _may_ have been fixed in upstream
2.01. From their changelog:
<snip>
Version 2.01.00 - 17th January 2005
===================================
Fix vgscan metadata auto-correction.
</snip>
Using the Debian Installer I've managed to revive the system insofar as I
have SW-RAID operational and can now use lvm commands to access the VGs
and LVs, but I have not been able to repair the problem.
I am hopeful that the system can be recovered, as the inconsistency seems
relatively minor and fixable, but I will need some help to do it.
The rest of this report gives more background and details of my
configuration, of what exactly happened and of my analysis.
Sorry if it's a bit long, but I tried to include all relevant info.
TIA for any help and suggestions on how to proceed.
BACKGROUND
==========
The system is a recently installed Sarge box used as a server in my home
network, running the current 2.4.27-2 kernel.
The system has an internal 160 GB IDE hard disk and an external 12 GB
MegaRAID SCSI storage unit.
When I installed the system, I decided to use reiserfs for some partitions
as I had read that it was more efficient for small files. A problem
during a reboot and some comments on #debian-boot (IRC) made me decide
that maybe ext3 would have been a better option, so I decided to
reorganize things.
The raid and VG setup may seem a bit strange, but it was partly shaped by
the hardware failure that prompted me to install Sarge in the first place.
CONFIGURATION
=============
Before I started the reorganization, my config was as follows.
Disk /dev/discs/disc1/disc: 163.9 GB, 163928604672 bytes
255 heads, 63 sectors/track, 19929 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device   Start    End      Blocks  Id  System                 Mountpoint
part1        1      6      48163+  83  Linux                  /boot
part2        7     26     160650   82  Linux swap             swap
part3       27  19929  159870847+   5  Extended
part5       27   4488   35840983+  fd  Linux raid autodetect
part6     4489   9588   40965718+  8e  Linux LVM
part7     9589  14688   40965718+  8e  Linux LVM
part8    14689  18316   29141878+  8e  Linux LVM
part9    18317  19929   12956391   fd  Linux raid autodetect
Disk /dev/discs/disc0/disc: 13.2 GB, 13268680704 bytes
255 heads, 63 sectors/track, 1613 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device   Start   End    Blocks  Id  System
part1        1  1613  12956391  fd  Linux raid autodetect
RAID configuration:
md0: part5 on IDE disk (degraded RAID1; waiting for replacement 2nd HD)
md1: part9 on IDE disk and part1 on SCSI (RAID1)
Both defined as PVs for LVM.
Volume groups:
- sys on PVs md0 + md1
- work on PVs part6 + part7
(part8 was spare PV)
Logical volumes:
LV                Mountpoint    Filesys   PV
- sys-root        /             ext3      md0
- sys-home        /home         reiserfs  md0
- sys-var         /var          reiserfs  md0
- sys-tmp         /tmp          reiserfs  md0
- sys-exports     /exports      reiserfs  created on md1, extended to md0
- work-debmirror  not relevant  reiserfs  part6+part7
- work-installer  not relevant  reiserfs  part7
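(For reference, a layout like this would have been created with commands
roughly like the ones below. This is a reconstruction, not the exact
history; the LV sizes and the /dev/hda* names are assumptions on my part.)
<snip>
# PVs: the two MD arrays plus the plain LVM partitions (part8 kept as spare)
pvcreate /dev/md0 /dev/md1 /dev/hda6 /dev/hda7 /dev/hda8
vgcreate sys /dev/md0 /dev/md1
vgcreate work /dev/hda6 /dev/hda7
# naming a PV after the LV pins its extents to that PV
lvcreate -L 5G  -n root    sys /dev/md0
lvcreate -L 5G  -n home    sys /dev/md0
lvcreate -L 2G  -n var     sys /dev/md0
lvcreate -L 1G  -n tmp     sys /dev/md0
lvcreate -L 10G -n exports sys /dev/md1   # later extended onto md0 with lvextend
lvcreate -L 40G -n debmirror work         # allocation spans part6+part7
lvcreate -L 10G -n installer work /dev/hda7
</snip>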
WHAT HAPPENED
=============
I decided to start with sys-exports and sys-home.
I created a new VG 'temp' on part8 and new LVs temp-exports and temp-home,
then copied and verified the data. I unmounted /exports and lvremoved
sys-exports.
The error must have occurred at that point.
After umount /home, the lvremove of sys-home failed with the 'left open'
message.
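In commands, the sequence was roughly this (the lvcreate sizes and mkfs
calls are illustrative; the rest is as described):
<snip>
vgcreate temp /dev/hda8            # part8 was already a PV
lvcreate -L 10G -n exports temp    # size illustrative
lvcreate -L 5G  -n home    temp    # size illustrative
mkfs.ext3 /dev/temp/exports
mkfs.ext3 /dev/temp/home
# ... mount the new LVs, copy the data over, verify ...
umount /exports
lvremove /dev/sys/exports    # reported no errors; this is where it must have gone wrong
umount /home
lvremove /dev/sys/home       # failed: 'left open', VG metadata inconsistent
</snip>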
The LVs in both VG work and VG temp are still accessible normally.
If I now (using the Debian Installer 'rescue' system) do a vgdisplay, I
get the output below. Note: the Debian Installer uses devfs.
<output of vgdisplay -- start>
# vgdisplay
Found duplicate PV QSpzOE3lqwPqyxHU4sV626bNnIxlbQrm:
using /dev/ide/host0/bus0/target0/lun0/part9
not /dev/scsi/host0/bus0/target0/lun0/part1
--- Volume group ---
VG Name temp
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 3
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 2
Max PV 0
Cur PV 1
Act PV 1
VG Size 27.79 GB
PE Size 4.00 MB
Total PE 7114
Alloc PE / Size 1500 / 5.86 GB
Free PE / Size 5614 / 21.93 GB
VG UUID baXpMk-IvVl-Dz1K-Qn5P-wXGF-cZoD-nfO4v9
--- Volume group ---
VG Name work
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 5
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 0
Max PV 0
Cur PV 2
Act PV 2
VG Size 78.13 GB
PE Size 4.00 MB
Total PE 20002
Alloc PE / Size 15500 / 60.55 GB
Free PE / Size 4502 / 17.59 GB
VG UUID r3BOv2-31pu-xehU-ufdi-Neae-852L-GUyLTM
WARNING: Volume group "sys" inconsistent
--- Volume group ---
VG Name sys
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 10
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 5
Open LV 0
Max PV 0
Cur PV 2
Act PV 2
VG Size 46.54 GB
PE Size 4.00 MB
Total PE 11913
Alloc PE / Size 6912 / 27.00 GB
Free PE / Size 5001 / 19.54 GB
VG UUID YOc5lQ-m6o8-wob9-ghrY-VAZe-9YX4-qwWbCp
<output of vgdisplay -- end>
vgscan gives:
<output of vgscan -- start>
# vgscan
Reading all physical volumes. This may take a while...
Found duplicate PV QSpzOE3lqwPqyxHU4sV626bNnIxlbQrm:
using /dev/ide/host0/bus0/target0/lun0/part9
not /dev/scsi/host0/bus0/target0/lun0/part1
Found volume group "temp" using metadata type lvm2
Found volume group "work" using metadata type lvm2
Inconsistent metadata copies found - updating to use version 10
Automatic metadata correction failed
Volume group "sys" not found
<output of vgscan -- end>
ANALYSIS
========
An indication of why correction of metadata fails is in vgscan -vvv:
<snip of output of vgscan -vvv -- start>
Finding volume group "sys"
Opened /dev/md/0
/dev/md/0: lvm2 label detected
Opened /dev/scsi/host0/bus0/target0/lun0/part1
/dev/scsi/host0/bus0/target0/lun0/part1: lvm2 label detected
Read sys metadata (10) from /dev/md/0 at 18432 size 1879
Opened /dev/ide/host0/bus0/target0/lun0/part9
/dev/md/0: lvm2 label detected
/dev/scsi/host0/bus0/target0/lun0/part1: lvm2 label detected
Read sys metadata (9) from /dev/ide/host0/bus0/target0/lun0/part9
at 7168 size 2233
Inconsistent metadata copies found - updating to use version 10
Writing sys metadata to /dev/md/0 at 20480 len 1909
Automatic metadata correction failed
Volume group "sys" not found
Unlocking /var/lock/lvm/V_sys
Closed /dev/md/0
Closed /dev/scsi/host0/bus0/target0/lun0/part1
Closed /dev/ide/host0/bus0/target0/lun0/part9
<snip of output of vgscan -vvv -- end>
It shows that md0 has metadata version (10) and part9 (md1) has version
(9). However, when updating, vgscan writes (10) back to md0 instead of to
md1, in effect changing absolutely nothing! It should write to md1.
Would it be possible to copy this metadata manually?
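At least for inspection it seems possible: the offsets and sizes in the
trace above point straight at the text-format metadata, which is plain
ASCII, so it can be dumped with dd. Reading is harmless; I would not dare
to write it back by hand:
<snip>
# newer copy, version (10), from md0 (offset/size from the -vvv trace)
dd if=/dev/md/0 bs=1 skip=18432 count=1879 2>/dev/null > /tmp/sys-meta-10.txt
# stale copy, version (9), from part9
dd if=/dev/ide/host0/bus0/target0/lun0/part9 bs=1 skip=7168 count=2233 \
   2>/dev/null > /tmp/sys-meta-9.txt
diff /tmp/sys-meta-9.txt /tmp/sys-meta-10.txt
</snip>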
I also don't understand why it uses
/dev/scsi/host0/bus0/target0/lun0/part1
instead of /dev/md/1 to access the 2nd PV in the VG, especially as it
_does_ use /dev/md/0.
During preparation for this reorganization I noticed another strange
thing. Several commands would say something like:
Found duplicate PV QSpzOE3lqwPqyxHU4sV626bNnIxlbQrm:
using /dev/hda9, not /dev/sda1
However, later in the output /dev/sda1 would always be printed.
Two things to try:
firstly, enable md_component_detection in /etc/lvm/lvm.conf - it looks
like the system has found BOTH the MD and its components; that explains
the duplicate PV entries. If this doesn't work then add filters to
lvm.conf to exclude the SCSI disks (or just include the MDs).
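Something like this in the devices section should do it (untested here;
adjust the filter pattern to your device names):
<snip>
devices {
    # skip partitions that are components of an MD array
    md_component_detection = 1
    # or, more bluntly: accept only the MD devices, reject everything else
    # filter = [ "a|^/dev/md|", "r|.*|" ]
}
</snip>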
Secondly, try using vgcfgrestore to restore the metadata onto the disks;
you only need to do this if 1) above fails.
--
patrick
First of all, thank you for your very quick reply.
Sorry to break the thread. I saw your reaction in the BTS about 3 hours
ago, but the mail has not reached me yet, so I can't reply :-(
> Two things to try:
>
> firstly, enable md_component_detection in /etc/lvm/lvm.conf - it
> looks like the system has found BOTH the MD and its components; that
> explains the duplicate PV entries. If this doesn't work then add filters
> to lvm.conf to exclude the SCSI disks (or just include the MDs).
That _does_ get rid of the listing of physical partitions.
vgscan -vvv now looks like:
<snip>
Finding volume group "sys"
Opened /dev/md/0
/dev/md/0: lvm2 label detected
Opened /dev/md/1
/dev/md/1: lvm2 label detected
Read sys metadata (10) from /dev/md/0 at 18432 size 1879
/dev/md/0: lvm2 label detected
/dev/md/1: lvm2 label detected
Read sys metadata (9) from /dev/md/1 at 7168 size 2233
Inconsistent metadata copies found - updating to use version 10
Writing sys metadata to /dev/md/0 at 20480 len 1879
Automatic metadata correction failed
Volume group "sys" not found
Unlocking /var/lock/lvm/V_sys
Closed /dev/md/0
Closed /dev/md/1
</snip>
Unfortunately it does not help to fix the inconsistency.
> Secondly try using vgcfgrestore to restore the metadata onto the disks,
> you only need to do this if 1) above fails
Hmmm. How would I do that?
As I understand it, the backups are saved under /etc/lvm/backup, and as I
cannot access my sys-root LV, I cannot get to the backups.
Or is there some kind of trick to that?
I've also attached the output for 'lvdisplay >lvdisplay.txt 2>&1'.
Note the 'Device ??? has been left open' lines in that output.
Cheers,
Frans Pop
On Monday 31 January 2005 16:40, you wrote:
> Secondly try using vgcfgrestore to restore the metadata onto the disks,
> you only need to do this if 1) above fails
I've got my system back! :-D
Your suggestions and the fact that lvdisplay gave proper output led me to
try vgcfgbackup -f. I reviewed the resulting file for VG sys, and it
looked good. So I did a vgcfgrestore from that file, and bingo, the VG
was OK again.
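For the record, the essence of it (the backup file name is illustrative,
not necessarily the one I used):
<snip>
vgcfgbackup -f /tmp/sys.vg sys     # dump LVM's current view of VG sys to a file
# reviewed /tmp/sys.vg by hand -- it looked sane
vgcfgrestore -f /tmp/sys.vg sys    # write that metadata back to the PVs
</snip>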
I've now finished the reorganization and the system is back up.
Thanks very much for your help.
(And I learned a lot about LVM in the process.)
I'll leave it to you what to do with this bug report.
IMO both errors I reported are still there, but as it is after all
relatively easy to recover from the resulting inconsistency (once you
know how), you may want to downgrade to important.
Thanks again,
Frans Pop
> Version 2.01.00 - 17th January 2005
> ===================================
> Fix vgscan metadata auto-correction.
This does indeed refer to a fix for the recovery problem you hit.
Alasdair
Good news! I'll close this bug when 2.01 is uploaded. Thanks for letting me
know.
--
patrick