No login after reverting to btrfs snapshot


schul...@googlemail.com

May 31, 2017, 5:14:14 AM
to qubes-users
After experiencing the “severe lagging” issue described at https://groups.google.com/forum/#!searchin/qubes-users/severe$20lagging%7Csort:relevance/qubes-users/iidmHBxVJPA/tiQo-8ZCCAAJ, I decided to try the btrfs snapshot feature so that I could undo a dom0 update should something go wrong. Sadly, after reverting to a snapshot and rebooting, the machine hung, the last message being:

<last-message>
Welcome to emergency mode! After logging in, type “journalctl -xb” to view
system logs, “systemctl reboot” to reboot, “systemctl default” or ^D to
try again to boot into default mode.

Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.

Press Enter to continue
</last-message>

After pressing Enter, the same message reappears, preceded by the line:

“Error getting authority: Error initializing authority: Could not connect: No such file or directory (g-io-error-quark, 1)”

That’s the short story. The long story follows. Here’s the subvolume list before making the snapshot:

# In a dom0 terminal
$>sudo btrfs subvolume list /
ID 257 gen 173 top level 5 path root
ID 259 gen 43 top level 257 path var/lib/machines

The default subvolume is 257 at that point in time. Next I snapshot the root filesystem:

$>sudo btrfs subvolume snapshot / /before-update
$>sudo btrfs subvolume list /
ID 257 gen 173 top level 5 path root
ID 259 gen 43 top level 257 path var/lib/machines
ID 276 gen 174 top level 257 path before-update

After updating dom0 and rebooting, the “severe lagging” issue appeared, so I reverted to the snapshot:

$>sudo btrfs subvolume set-default 276 /

After rebooting, the machine hangs with the message described above. Any ideas on how to fix that?

Chris Laprise

May 31, 2017, 7:17:33 AM
to schul...@googlemail.com, qubes-users
I think that can happen when /etc/fstab points to a partition that no
longer exists. Perhaps that means fstab is referencing a particular
subvolume, the old one which may be deleted/renamed/moved.

IIRC, Fedora installer does this "subvol=" thing in fstab which means
setting the default doesn't have any effect.

You could change the fstab entry to not use "subvol=" so the default is
used, or you can point the parameter to the new subvolume.
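
A minimal sketch of the first option, assuming a standard six-field fstab: the helper below prints a copy of the file with any `subvol=` option removed from the entry mounted at `/`. The name `strip_root_subvol` is hypothetical, and the output should be reviewed before it replaces the real /etc/fstab.

```shell
#!/bin/sh
# Hypothetical helper: drop any subvol=... option from the entry mounted at /.
# Reads the fstab-style file given as $1 and prints the result to stdout;
# the input file itself is not modified.
strip_root_subvol() {
    awk '
        # Only touch non-comment lines whose mount point (field 2) is "/"
        $0 !~ /^[ \t]*#/ && $2 == "/" {
            n = split($4, opts, ",")
            out = ""
            for (i = 1; i <= n; i++) {
                if (opts[i] ~ /^subvol=/) continue
                out = (out == "" ? opts[i] : out "," opts[i])
            }
            # An empty option list is not valid in fstab
            if (out == "") out = "defaults"
            $4 = out
        }
        { print }
    ' "$1"
}

# Example: strip_root_subvol /etc/fstab > /tmp/fstab.new
```

Rewriting field 4 collapses the whitespace on that one line to single spaces, which fstab accepts.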

--

Chris Laprise, tas...@openmailbox.org
https://twitter.com/ttaskett
PGP: BEE2 20C5 356E 764A 73EB 4AB3 1DC4 D106 F07F 1886

Chris Laprise

May 31, 2017, 7:31:17 AM
to schul...@googlemail.com, qubes-users
On 05/31/2017 07:17 AM, Chris Laprise wrote:
> I think that can happen when /etc/fstab points to a partition that no
> longer exists. Perhaps that means fstab is referencing a particular
> subvolume, the old one which may be deleted/renamed/moved.
>
> IIRC, Fedora installer does this "subvol=" thing in fstab which means
> setting the default doesn't have any effect.
>
> You could change the fstab entry to not use "subvol=" so the default is
> used, or you can point the parameter to the new subvolume.
>

I should add that the contents of /boot figure in this, as /etc/fstab
settings are transferred to the /boot configuration. Unfortunately,
/boot is not subject to root subvolume manipulation. So what you ended
up with is the updated /boot trying to mount the default Fedora subvol
(which I think is "root").

The ideal way to do this would be to remove the "subvol=" parameter from
fstab, update your /boot with the dracut command, then revert to older
subvol by changing the default. And on top of all this, preferably you
would also have an older copy of /boot to match your older root volume,
and copy that to the /boot partition.

schul...@googlemail.com

Jun 1, 2017, 8:58:50 AM
to qubes-users, schul...@googlemail.com
Thanks for all the context, Chris. Not sure I'm getting all of this. After trying to wrap my head around dracut, I went with the following approach:

1. Removed the subvol option from fstab
2. Set the btrfs default subvolume
3. Ran sudo dracut --force

Doing #2 and #3 multiple times switching between subvolumes 257 and 276 and rebooting after each switch, everything went ok.

BUT, when doing a qubes-dom0-update after switching to the 'root' subvolume (ID 257), then I got said error again after switching back to ID 276 (subvolume before-update).

I'm completely stuck here, and a step-by-step howto would be greatly appreciated.

Thanks,
Bodo

Chris Laprise

Jun 1, 2017, 2:38:20 PM
to schul...@googlemail.com, qubes-users
It may be best to ask a popular Linux forum how to do this; my choices
would be Fedora forum or SuSE (they have a lot of experience using btrfs
for /). I've only switched root subvols once, a long time ago.

Part of your problem may be running dracut more than once. If you only
want to switch the subvol for /, then updating it once (with subvol=
deleted from fstab) should be sufficient. I noticed that Qubes
documentation includes -H option along with --force when running dracut;
don't know if that will make any difference.

I also feel like doing a chroot into the new subvol (again, making sure
fstab was fixed) before running dracut could help put things back the
way they were before the unwanted update. I can imagine it screwing
things up worse, too :) Back up your /boot vol if you haven't already.

Only other thing I can recommend re:btrfs is make sure your snapshot
isn't read-only. Also, you may want to create a regular non-snapshot
subvol and transfer filesystem over to that:
$ sudo cp -a --reflink=always /snapshot/* /newsubvol

A different kind of remedy you can try is to boot w/ the original (slow)
subvol and use dnf to undo the update:

https://docs.fedoraproject.org/en-US/Fedora/25/html/System_Administrators_Guide/sec-DNF-Transaction_History.html

schul...@googlemail.com

Jun 7, 2017, 1:14:20 PM
to qubes-users, schul...@googlemail.com, tas...@openmailbox.org
Thanks again for all your valuable hints, Chris. It took me a while to get this problem sorted out. The most helpful resources have been:

https://github.com/QubesOS/qubes-issues/issues/1871
https://btrfs.wiki.kernel.org/index.php/SysadminGuide#Snapshots

Here are my notes to solve the problem of making and rolling back a btrfs snapshot after updating dom0 (in Emacs org-mode format):

* Qubes
** Install on btrfs
*** Manual partitioning
In order to select btrfs as the root filesystem you have to
select "Manual Partitioning". However, selecting btrfs is not so
easy. You have to follow this roundabout route:
1. Select "Automatic partitioning" first and go thru the
procedure until you're back to the main installation screen.
2. Select "Manual partitioning". In the dropdown select "btrfs".
Then click on the link above to create the partitions.
*** Later during installation
During the installation we are dropped into a shell due to
missing rootflags in xen.cfg. To fix this temporarily:

# Make our btrfs system writable
$>mount -o remount,rw /sysroot

# List all subvolumes. Write down the ID for the 'root'
# subvolume. For me it was 257
$>btrfs subvolume list /sysroot

# Change default subvolume to 'root'. Change 257 to whatever you
# got from the previous command
$>btrfs subvolume set-default 257 /sysroot

# Make sure the change was applied
$>btrfs subvolume get-default /sysroot

# Unmount and reboot into a working system
$>umount /sysroot

# Reboot the system
*** After finishing the installation
**** Reset the btrfs default subvolume to the top-level subvolume (ID 5)
$>sudo btrfs subvolume set-default 5 /
**** Edit /boot/efi/EFI/qubes/xen.cfg
Add the following to the kernel line(s)
rootflags=subvol=root
Do NOT set the subvolid! It would make things harder down the
road.
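
A minimal sketch of this edit, assuming xen.cfg holds one `kernel=...` line per boot entry: the helper prints a copy of the file with `rootflags=subvol=root` appended to each kernel line that has no `rootflags=` option yet. The name `add_rootflags` is hypothetical; compare the output with the original before copying it into place.

```shell
#!/bin/sh
# Hypothetical helper: append rootflags=subvol=root to every kernel= line
# that does not already carry a rootflags= option.
# Reads the file given as $1 and prints the result to stdout.
add_rootflags() {
    sed '/^kernel=/{/rootflags=/!s/$/ rootflags=subvol=root/;}' "$1"
}

# Example: add_rootflags /boot/efi/EFI/qubes/xen.cfg > /tmp/xen.cfg.new
```

Because lines that already contain `rootflags=` are skipped, running the helper on its own output changes nothing.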
**** Edit /etc/fstab
Copy the line that mounts to / and contains the option
subvol=root. On the copied line change
/ to /top-level
and
subvol=root to subvol=/

This will allow us to access the top-level subvolume again.
We'll need that to move around snapshots.
**** Add the mount point for the top-level subvolume
$>sudo mkdir /top-level
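
A minimal sketch of the fstab edit described above, assuming the root entry carries the literal option `subvol=root`: the helper prints the file with a second entry added directly after the root entry, mounted at /top-level with `subvol=/`. The name `add_top_level_entry` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: duplicate the root entry of an fstab-style file,
# changing the mount point to /top-level and subvol=root to subvol=/.
# Reads the file given as $1 and prints the result to stdout.
add_top_level_entry() {
    awk '
        $0 !~ /^[ \t]*#/ && $2 == "/" {
            print                      # keep the original root entry as is
            n = split($4, opts, ",")
            out = ""
            found = 0
            for (i = 1; i <= n; i++) {
                if (opts[i] == "subvol=root") { opts[i] = "subvol=/"; found = 1 }
                out = (out == "" ? opts[i] : out "," opts[i])
            }
            # Only emit the copy when the root entry really used subvol=root
            if (found) { $2 = "/top-level"; $4 = out; print }
            next
        }
        { print }
    ' "$1"
}

# Example: add_top_level_entry /etc/fstab > /tmp/fstab.new
```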
*** Make a snapshot, e.g. before updating dom0
First, let's have a look at our subvolumes.

$>sudo btrfs subvolume list /

ID 257 top level 5 path root
ID 274 top level 257 path var/lib/machines

As "top level 257" in the second line indicates, the subvolume
var/lib/machines is nested in subvolume root. Nested subvolumes
are NOT included when snapshotting their parent. Hence we need
to snapshot not only subvolume root, but also subvolume
var/lib/machines to be able to roll back:

# Snapshot subvolume root
$>sudo btrfs subvolume snapshot /top-level/root /top-level/before-update

# Remove the empty directory from the snapshot
$>sudo rm -rf /top-level/before-update/var/lib/machines

# Snapshot the nested subvolume var/lib/machines
$>sudo btrfs subvolume snapshot /var/lib/machines \
/top-level/before-update/var/lib/machines

Don't forget to note the kernel you are currently booting.
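
The snapshot steps above can be wrapped in a small script; this is a sketch, not a tested tool. The variable names `TOP`, `NAME` and `DRY_RUN` are hypothetical, and the script uses `rmdir` instead of `rm -rf` on the assumption that the placeholder directory in the snapshot is empty. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: wraps the three snapshot commands and prints the running kernel.
# DRY_RUN defaults to 1 so nothing is changed unless you opt in.
DRY_RUN=${DRY_RUN:-1}
TOP=${TOP:-/top-level}          # mount point of the top-level subvolume
NAME=${NAME:-before-update}     # name of the snapshot to create

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

snapshot_root() {
    # Snapshot subvolume root
    run btrfs subvolume snapshot "$TOP/root" "$TOP/$NAME"
    # The nested subvolume appears as an empty directory in the snapshot
    run rmdir "$TOP/$NAME/var/lib/machines"
    # Snapshot the nested subvolume into its place
    run btrfs subvolume snapshot /var/lib/machines "$TOP/$NAME/var/lib/machines"
    # Note the running kernel for a later rollback
    echo "kernel in use: $(uname -r)"
}

snapshot_root
```

To perform the snapshot for real, the script would be run as root with `DRY_RUN=0` in its environment.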

*** Roll back a snapshot, e.g. after a dom0 update went wrong
Let's assume we have the above created snapshots. Do NOT set the
default subvolume to the snapshot. Instead use 'mv'.
**** Move snapshot into place
# Move subvolume var/lib/machines out of the way
$>sudo mv /top-level/root/var/lib/machines \
/top-level/root/var/lib/machines.old

# Move subvolume root out of the way
$>sudo mv /top-level/root /top-level/root.old

# Rollback the snapshot into root's place
$>sudo mv /top-level/before-update /top-level/root
**** NOTA BENE: MAKE SURE YOU'LL BOOT THE CORRECT KERNEL
In the context of rolling back a snapshot after updating dom0,
the dom0 update may have installed a newer kernel, thus
modifying xen.cfg. Since xen.cfg lives on the EFI partition,
rolling back the root subvolume does not restore it. After
rolling back the snapshot you must therefore make sure that
you'll boot into the pre-update kernel next time. Hopefully,
you noted the kernel used before the dom0 update!
So, edit /boot/efi/EFI/qubes/xen.cfg to point the default back
to the right kernel!
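
A minimal sketch for checking which kernel xen.cfg will boot. It assumes the file has a `[global]` section with a `default=<entry>` line and one `[<entry>]` section per kernel containing a `kernel=` line; that layout is an assumption about the file, and the name `default_kernel` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the default boot entry of a xen.cfg-style file
# and the kernel= line of that entry. Reads the file given as $1.
default_kernel() {
    awk -F= '
        # Track the current section: a line like [name] sets section to name
        /^\[/ { section = substr($0, 2, length($0) - 2) }
        section == "global" && $1 == "default" { def = $2 }
        # Keep everything after "kernel=" for each section
        $1 == "kernel" { k[section] = substr($0, 8) }
        END {
            print "default entry: " def
            print "kernel line: " k[def]
        }
    ' "$1"
}

# Example: default_kernel /boot/efi/EFI/qubes/xen.cfg
```

If the default entry shown is not the kernel noted before the update, the `default=` line is the one to change.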

Now you're ready to reboot.

Hope that helps.

Thanks,
Bodo

Chris Laprise

Jun 7, 2017, 3:50:53 PM
to schul...@googlemail.com, qubes-users
Thanks for the notes! Looks like UEFI (which I don't use) adds an extra
wrinkle to the issue.