Unable to mount ZFS root filesystem after hardware upgrades

Unable to mount ZFS root filesystem after hardware upgrades Daniel Davis 06/01/13 07:56
I was operating a ZFS root pool mirror successfully on SSD on Ubuntu
12.04, ZFS RC13 on an LSI 9207-8i HBA using IT firmware.  I was in the
process of adding a second duplicate HBA.  I went a little too fast and
both installed the second card and updated the firmware.  I immediately
had problems with the mirror.  Since I had a similar situation with an
install on another manufacturer's SSDs on the motherboard controller, I
unmirrored the root pool and removed the drive from the system.

It worked and I was able to boot the root pool twice while I was trying
to understand the problem.  Since there are so many variables I am
trying to methodically work through them.  But during this initial
period, while I was able to boot the root pool, merely putting the
unmirrored SSD back on the controller would cause a failure to boot.  
The same thing happened with the other manufacturer's SSDs on the
motherboard.  The host is an EFI/GPT machine, but BIOS compatibility
(CSM) is enabled, which seems necessary to support the boot process.
EFI and Grub are on their own partitions on a separate drive, so I have
full access to them.

I have been unable to downgrade the firmware, which I figured was the
first step.  I am still trying to find a procedure.  I have removed the
second HBA.  I am able to boot a ZFS-enabled Ubuntu and mount the rpool
partition, and it seems intact.  I don't believe that the firmware is
the immediate problem, since I was able to boot and my "safety" Ubuntu
ZFS can boot.  Rather, the problem seems to be some change in the
devices caused by the insertion of the second card.

I am also able to start the boot from the rpool partition on the SSD in
emergency mode.  It proceeds through to the BusyBox prompt.

The failure happens because it cannot mount the root pool.  The error
shown is:

Command: mount -t zfs -o zfsutil - /root
Message: filesystem '-' cannot be mounted, unable to open the dataset
mount: mounting '-' on /root failed: No such file or directory
Error: 1

Manually mount the root file system on /root and then exit
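
(For anyone else who lands at this prompt: a minimal sketch of the manual
mount the message asks for, assuming the root dataset really is
rpool/ROOT/precise-5 as in the grub line further down - adjust names as needed.)

$ zpool import -f -N rpool                            # import the root pool without mounting anything
$ mount -t zfs -o zfsutil rpool/ROOT/precise-5 /root  # mount the root dataset where the initramfs expects it
$ exit                                                # let the boot continue from /root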

So I have a lot of tools to look at and fix the installation but I would
like pointers to possible causes and good experiments to try.  The first
thing I tried was Fajar's alternate grub boot syntax using

"boot=zfs rpool=rpool bootfs=rpool/ROOT/precise-5"


without success.
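
(A sketch of one way to make that syntax persistent, assuming the stock
Ubuntu grub layout; the pool and dataset names are simply the ones from
the line above.)

# /etc/default/grub
GRUB_CMDLINE_LINUX="boot=zfs rpool=rpool bootfs=rpool/ROOT/precise-5"

$ update-grub    # regenerate grub.cfg with the new kernel command line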

Dan Davis



Re: Unable to mount ZFS root filesystem after hardware upgrades Daniel Davis 07/01/13 13:20
Well, I learned a lot but the strange problem persists.

First, I downgraded the firmware on one LSI 9207-8i and removed the
other.  I had to erase the existing firmware then install the older
version and its associated bios.  There are problems doing the downgrade
on my motherboard so I did it directly from the Tianocore EFI shell.

After a failed rpool boot I am left in BusyBox.  A zpool status -v there
showed the rpool SSD was "Faulted" and would not mount, with the error
pointing to the "label".  I re-connected its mirror SSD, which was not
faulted, and it had sufficient redundancy to boot.  No version of "clear"
would fix the faulted rpool slice or cause it to resilver, so I
un-mirrored the faulted slice (partition).  I zeroed the faulted
partition, removed it, remade it as Ext4, trimmed it, removed it again,
and recreated it unformatted.  Admittedly overkill, but I
did not want any data or metadata on the SSD partition.  During this
period, I had reason to perform several successful boots from the rpool
on one SSD.

Everything looked good.  I added the mirror and the rpool was happy
(good zpool status, update-grub ok).  I rebooted and the boot failed
dropping me into BusyBox.  A zpool status now reported that BOTH mirror
partitions were faulted.  And this is the original configuration in
which it was working.

Using my safety Ubuntu, I mounted the rpool to /mnt and zpool status
says it is in good shape.  Additional attempts at booting failed, though
sometimes I can recover an unmirrored rpool this way and get it to boot.

I restored a backup of the rpool to a new partition on the SSD and
deconfigured it to make it an ordinary ZFS-enabled Ubuntu 12.04.  I am
running on that deconfigured copy for now until I understand what is
faulting the rpool.  All the rpool SSD partitions are intact so I can
keep looking from a mount and I can boot the old rpool to BusyBox to
look around.

In the meantime I had to move on with other tasks, so I completed the
install of the second HBA.  I moved the SSDs to the second HBA as
planned and the non-rpool Ubuntu with ZFS is running fine with its
storage pool.  Clean mounts on boot, no discernible problems.

Same problems with two different brands of SSDs and two different
controllers.  This gave me a good tour of the rpool boot process and
zfs-enabled mountall (pure vanilla, I read the Wiki page) plus
zfs-grub, which is good.  But at my current level of understanding of
ZoL I don't know what could be throwing the rpool into a faulted state
(noting the problems with mpt2sas installations).

Interesting.  Lots of good web reading but no smoking gun yet.  If there
are some good checks I can do from BusyBox I would be glad to do them.  
ZFS seems OK.  I can do a zpool import and it shows my storage pool.  
Otherwise, I will keep reading and experimenting.

-- Dan Davis
Re: [zfs-discuss] Re: Unable to mount ZFS root filesystem after hardware upgrades Gregor Kopka 07/01/13 23:01
Could it be that you have a zpool.cache in the initrd?
If so, it may contain some metadata of the pool from the running system, but with device paths which are not (yet) available in the boot environment.

Clear a faulted pool by forcing ZFS to search for the disks instead of relying on stale data in zpool.cache:
$ zpool export rpool ; zpool import rpool
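
A follow-up sketch, assuming the Ubuntu initramfs carries a copy of /etc/zfs/zpool.cache: after the re-import, refresh the cache and rebuild the initrd so a stale copy does not come back on the next boot:
$ zpool set cachefile=/etc/zfs/zpool.cache rpool
$ update-initramfs -u -k all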

Devs, please consider dropping the zpool.cache file, since it is more trouble than it's worth, and replacing it with a simple text file consisting of pool names (or UIDs) to be imported.

Gregor
Re: [zfs-discuss] Re: Unable to mount ZFS root filesystem after hardware upgrades Cyril Plisko 07/01/13 23:40
>
> Devs, please consider dropping the zpool.cache file, since it is more
> trouble than it's worth, and replacing it with a simple text file
> consisting of pool names (or UIDs) to be imported.

That would have a devastating effect on those with _really_ large pools.
zpool import takes minutes without cache as opposed to seconds with cache.


--
Regards,
        Cyril
Re: [zfs-discuss] Re: Unable to mount ZFS root filesystem after hardware upgrades Fajar A. Nugraha 08/01/13 00:10
Yeah. But for people like me who only use several disks per pool, and
only one or two pools, zpool.cache became a hassle when moving disks
around (e.g. internal -> USB enclosure).

Personally I'd be happy if "zpool set cachefile=none" were persistent.
So far it seems to keep resetting to "-", which is the default.
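
(For reference, a quick sketch to check whether the setting survived a reboot - the pool name is just a placeholder:)
$ zpool set cachefile=none tank
$ zpool get cachefile tank    # after a reboot, see whether it still reports none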

--
Fajar
Re: [zfs-discuss] Re: Unable to mount ZFS root filesystem after hardware upgrades Daniel Davis 08/01/13 08:59
Thanks to everyone, your clues did the trick in getting the rpool booting.  There were other bumps on the way (gotta love system administration) from several sources.

In the process I messed up the old rpool so badly it was safer to start from backup.  My old rpool was called "prpool" and my new one is called "rpool" to help understand the issue.  At this point I am booting from a single SSD (unmirrored).

When I do a zpool status on the running "rpool" Ubuntu, the prior "prpool" is still seen.  I have destroyed it with "zpool destroy prpool" and "zpool destroy -f prpool" with reboots between each cycle.  If I look at it from my safety Ubuntu (zpool import -d /dev/disk/by-id -R /mnt rpool) I do not see "prpool".  Any ideas how to get rid of it in the running "rpool" Ubuntu?

Dan Davis
duraspace.org

root@kingfish:~# zpool status -v
  pool: atlantic
 state: ONLINE
 scan: scrub repaired 0 in 0h28m with 0 errors on Mon Nov 26 16:34:06 2012
config:

        NAME                                             STATE     READ WRITE CKSUM
        atlantic                                         ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            scsi-35000cca01a90f788                       ONLINE       0     0     0
            scsi-35000cca01a97d054                       ONLINE       0     0     0
          mirror-1                                       ONLINE       0     0     0
            scsi-35000cca01aa3e198                       ONLINE       0     0     0
            scsi-35000cca01aa3e564                       ONLINE       0     0     0
        cache
          scsi-SATA_SAMSUNG_SSD_830S0XZNEAC505934-part2  ONLINE       0     0     0
          scsi-SATA_SAMSUNG_SSD_830S0XZNEAC505020-part2  ONLINE       0     0     0

errors: No known data errors

  pool: prpool
 state: FAULTED
status: One or more devices could not be used because the label is missing
        or invalid.  There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from
        a backup source.
   see: http://zfsonlinux.org/msg/ZFS-8000-5E
 scan: none requested
config:

        NAME                                             STATE     READ WRITE CKSUM
        prpool                                           FAULTED      0     0     0  corrupted data
          mirror-0                                       DEGRADED     0     0     0
            scsi-SATA_SuperSSpeed_S30YTAF128G0924-part2  UNAVAIL      0     0     0  corrupted data
            scsi-SATA_SuperSSpeed_S30YTAF128G0918-part2  UNAVAIL      0     0     0

  pool: rpool
 state: ONLINE
 scan: none requested
config:

        NAME                                           STATE     READ WRITE CKSUM
        rpool                                          ONLINE       0     0     0
          scsi-SATA_SuperSSpeed_S30YTAF128G0918-part2  ONLINE       0     0     0

errors: No known data errors
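
(One possible cleanup, a sketch only, assuming the phantom prpool entry comes from a stale /etc/zfs/zpool.cache that is also baked into the initramfs, as suggested earlier in the thread:)
$ zpool destroy -f prpool                           # or zpool export -f prpool if destroy keeps failing
$ zpool set cachefile=/etc/zfs/zpool.cache rpool    # rewrite the cache with only the live pools
$ update-initramfs -u -k all                        # stop the initrd from re-importing the old entry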

Re: [zfs-discuss] Re: Unable to mount ZFS root filesystem after hardware upgrades LustreOne 09/01/13 16:03
Linux already keeps a cache of block devices independent of the ZFS code, so any difference between import with/without zpool.cache could probably be avoided.

Cheers, Andreas





Re: [zfs-discuss] Re: Unable to mount ZFS root filesystem after hardware upgrades Gregor Kopka 10/01/13 01:14
I don't get why the import from the raw disks should take _that_ much longer - unless ZFS skips checks on the pool members in case the .cache is used (thus possibly ignoring corruption in the pool, which would be a bad idea). Determining if a drive contains a pool member should be relatively cheap (read uberblock, check magic and uid); this should be done with a handful of block reads per drive.

In my tests I found, for a pool consisting of 3 vdevs of 1TB 2-way mirrors:
- zpool import (without a pool name and a cleared .cache) is faster than a real import with the .cache: so it can't be the enumeration of the physical drives.
- zpool import pool took roughly 20 minutes to complete (without .cache), but about 2 seconds after issuing the command (while it is still running) you could see the pool in zpool list as online, zfs mount -a the datasets, access them, create snapshots - haven't found anything not working except another zpool import, since the old one was still running.

So the problem with the slow import seems to be not that the pool isn't available quickly, but that an uncached zpool import idles on something else after onlining the pool, before it returns (left alone it would read a relatively constant <100KB/s from each member drive for the rest of the 20 minutes).
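
(A rough sketch of the comparison, with placeholder pool name and paths:)
$ cp /etc/zfs/zpool.cache /tmp/tank.cache    # save the cache first; export rewrites the live one
$ zpool export tank
$ time zpool import -d /dev/disk/by-id tank  # cold import: scan the disks
$ zpool export tank
$ time zpool import -c /tmp/tank.cache tank  # same pool through the saved cache file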

I posted this to the list some time ago but got no feedback.
I'll try to reproduce this tonight; maybe this thread will receive more attention.

Gregor

Re: [zfs-discuss] Re: Unable to mount ZFS root filesystem after hardware upgrades Cyril Plisko 10/01/13 02:18
On Thu, Jan 10, 2013 at 11:14 AM, Gregor Kopka <gre...@kopka.net> wrote:
>
> I don't get why the import from the raw disks should take _that_ much
> longer - unless ZFS skips checks on the pool members in case the .cache is
> used (thus possibly ignoring corruption in the pool, which would be a bad
> idea). Determining if a drive contains a pool member should be relatively
> cheap (read uberblock, check magic and uid); this should be done with a
> handful of block reads per drive.

I think without cache ZFS needs to query a significantly larger set of
block devices in order to create a coherent view. With cache this view
is given and the only thing left is to validate it. I confess this is
speculation based on observed behavior (i.e., a guess), rather than a
firm assertion.

> In my tests I found, for a pool consisting of 3 vdevs of 1TB 2-way mirrors:
> - zpool import (without a pool name and a cleared .cache) is faster than a
> real import with the .cache: so it can't be the enumeration of the physical
> drives.
> - zpool import pool took roughly 20 minutes to complete (without .cache),
> but about 2 seconds after issuing the command (while it is still running)
> you could see the pool in zpool list as online, zfs mount -a the datasets,
> access them, create snapshots - haven't found anything not working except
> another zpool import, since the old one was still running.

That's an interesting observation. I'll try to investigate such a
scenario on my systems.

--
Regards,
        Cyril
Re: [zfs-discuss] Re: Unable to mount ZFS root filesystem after hardware upgrades Brian Behlendorf 06/02/13 16:13
On 01/10/2013 02:18 AM, Cyril Plisko wrote:
> On Thu, Jan 10, 2013 at 11:14 AM, Gregor Kopka <gre...@kopka.net> wrote:
>>
>> I don't get why the import from the raw disks should take _that_ much
>> longer - unless ZFS skips checks on the pool members in case the .cache is
>> used (thus possibly ignoring corruption in the pool, which would be a bad
>> idea). Determining if a drive contains a pool member should be relatively
>> cheap (read uberblock, check magic and uid); this should be done with a
>> handful of block reads per drive.
>
> I think without cache ZFS needs to query a significantly larger set of
> block devices in order to create a coherent view. With cache this view
> is given and the only thing left is to validate it. I confess this is
> speculation based on observed behavior (i.e., a guess), rather than a
> firm assertion.

If we properly integrate with blkid, as Andreas suggested, then there's
really no compelling reason to keep the zpool.cache file.  As devices
are added to the system they are identified as ZFS pool members and
automatically imported when all the vdevs are available.

It's largely just a matter of making this work a priority and getting it
done.  This would also nicely resolve the issue of having a stale
zpool.cache file in your initramfs.
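
(A rough sketch of the kind of query blkid can already answer today, with no ZFS code involved:)
$ blkid -t TYPE=zfs_member -o device    # list block devices carrying a ZFS member label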

Thanks,
Brian
Re: [zfs-discuss] Re: Unable to mount ZFS root filesystem after hardware upgrades Cyril Plisko 06/02/13 22:18
>
> If we properly integrate with blkid, as Andreas suggested, then there's
> really no compelling reason to keep the zpool.cache file.  As devices are
> added to the system they are identified as ZFS pool members and
> automatically imported when all the vdevs are available.
>

I believe such behaviour may be wrong in some cases. Today zpool.cache
serves two different purposes (and that is probably the main reason
people find it suboptimal).
One - it effectively does for pools what /etc/fstab does for
filesystems, i.e. it tells the system which pools to import on boot
automatically.
Two - it significantly accelerates the import process of a specific pool.

It looks to me that if we leverage blkid to determine devices with a
ZFS pool on them, we may decide to import a pool by mistake.
In a simple case, where a single host sees only local drives, that
behaviour (import everything I can see) may be correct. However, when
the storage is shared and multiple servers see the devices
simultaneously, that may not be the desired behaviour. Indeed, my host
may be able to detect a device with a ZFS pool that should be imported
by some other host. We may avoid the import if the pool appears to be
imported by some other host, but what if the pool in question happens
to be exported at the moment? We would prevent the legitimate owner of
the pool from importing it later if we went ahead and imported it
blindly. We still need a way to tell ZFS which pools it should
consider its own and try to import automatically.

The other purpose - accelerating the import process - well, blkid may
very well be helpful here. It is not clear to me why that would be
superior to the current cache file, but it is completely possible I am
missing something obvious here. I would like to learn more about
proposed blkid integration.

Meanwhile my own take on this is that separating these two uses of
cache file is a good thing. What I am doing is using a dedicated cache
file for each pool. Some other logic elsewhere decides which pool to
import. When decided it uses the appropriate cache file to do the
import itself. It very well may not be the best solution ever, but it
is simple and works very reliably.
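
(A sketch of that per-pool cache file setup, with placeholder names:)
$ zpool set cachefile=/etc/zfs/tank.cache tank    # give the pool its own cache file
$ zpool import -c /etc/zfs/tank.cache tank        # later: import just this pool via its cache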


--
Regards,
        Cyril
Re: [zfs-discuss] Re: Unable to mount ZFS root filesystem after hardware upgrades Brian Behlendorf 11/02/13 16:31
> Meanwhile my own take on this is that separating these two uses of
> cache file is a good thing. What I am doing is using a dedicated cache
> file for each pool. Some other logic elsewhere decides which pool to
> import. When decided it uses the appropriate cache file to do the
> import itself. It very well may not be the best solution ever, but it
> is simple and works very reliably.

For those interested, I've posted in issue #1035 my thoughts on what
should be done about the /etc/zfs/zpool.cache file.  I'd love some
feedback so we don't miss any important use cases.

https://github.com/zfsonlinux/zfs/issues/1035

Thanks,
Brian
Re: [zfs-discuss] Re: Unable to mount ZFS root filesystem after hardware upgrades Petter B 12/02/13 23:23
I have run into the exact same problem and it's not clear to me what the fix / work-around is, if any.

Currently I cannot reboot cleanly. I have to boot into the rescue disk and import + export the root pool before I can boot properly.
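
(In case it helps others hitting this, a sketch of that rescue-disk dance, assuming the root pool is named rpool:)
$ zpool import -f -R /mnt rpool    # from the rescue environment, import under an alternate root
$ zpool export rpool               # export cleanly, then reboot into the installed system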