Domain [*] has failed to start: Cannot execute qubesdb-daemon

Steve Coleman

May 31, 2019, 3:48:10 PM
to qubes...@googlegroups.com
After doing a reinstall and restore from backup of my Qubes 4.0 system,
I have a number of AppVM's that do not want to start for some unknown
reason. The frustrating thing is there are no AppVM or dom0 logs that
give me any clue as to where to start chasing down the problem. I have
grep'ed all logs for "qubesdb-daemon" in dom0, and all I can find is
qubes.log is saying it could not start the daemon. Not very helpful.

All log files in /var/log/qubes/qubesdb.VMNAME.log are zero bytes for
all AppVM's that do not run.

Putting the broken AppVM in debugging mode does not help, nor does using
the -p and -v flags on the qvm-[run,start] command lines. Why it won't
start is a total enigma since I can not see any error logs on either
side. Some AppVM's work just fine, while others simply do not. They all
use the exact same fedora-29 template and the same settings that they
had before the backup.

From dom0 qubesvm.py starting at line 1537 I see what appears to be the
relevant code giving my pop-up error message:

    try:
        yield from self.start_daemon(
            qubes.config.system_path['qubesdb_daemon_path'],
            str(self.xid),
            self.name)
    except subprocess.CalledProcessError:
        raise qubes.exc.QubesException(
            'Cannot execute qubesdb-daemon')


Is this code unable to create a system_path using the xid and appvm
name? Or just read one? A subprocess fork error, or an error deeper
inside some nameless forked process?
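
A minimal sketch of the failure mode in question, in plain Python rather than the actual Qubes code. In the standard library, subprocess.CalledProcessError means the child process ran and exited with a non-zero status; failing to launch the executable at all normally surfaces as an OSError such as FileNotFoundError. Assuming start_daemon follows that convention (not verified here), the pop-up would point at an error inside the daemon rather than at the path lookup. The names DaemonStartError, start_daemon, start_with_context and FAILING_CHILD are hypothetical stand-ins for illustration only.

```python
import subprocess
import sys


class DaemonStartError(Exception):
    """Hypothetical stand-in for qubes.exc.QubesException."""


def start_daemon(*command):
    """Run a command; check=True raises CalledProcessError on non-zero exit."""
    subprocess.run(command, check=True, capture_output=True, text=True)


def start_with_context(*command):
    """Like the excerpt above, but keep the exit status and stderr in the message."""
    try:
        start_daemon(*command)
    except subprocess.CalledProcessError as err:
        detail = (err.stderr or "").strip() or "no output on stderr"
        raise DaemonStartError(
            "Cannot execute {}: exit status {} ({})".format(
                command[0], err.returncode, detail)
        ) from err


# Assumption: a child that starts fine, then fails the way a daemon might
# when it cannot open one of its files.
FAILING_CHILD = (
    sys.executable, "-c",
    "import sys; sys.stderr.write('cannot open log\\n'); sys.exit(1)",
)
```

Calling start_with_context(*FAILING_CHILD) raises DaemonStartError with the exit status and the child's stderr text in the message, which is the kind of detail the bare "Cannot execute qubesdb-daemon" message drops.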

Most VM's start normally, so it should not be the "path" to the daemon
executable on either side, as that path and executable is in common with
other VM's that work just fine with the exact same template.

The only thing I see different between these is qvm-prefs is returning xid
== -1 for those that seem not to work, and positive integer values for
those that do work. If the above code is building a path using the -1
xid then this might be a problem?

Could this somehow also be evidence of some kind of lvm pool or qubes
database corruption from the restore process? I don't recall seeing any
error messages during the restore processing, but as it happens, the
ones that do not work were actually VM's restored later in that overall
process. Since they were deemed less important I happened to do them
last. Not sure if that could be relevant here or not, as I have the
exact same system disk space as before. It just sounds suspicious that
the later ones to be restored are the ones that seem to be not running.

Anyone have a clue where else to look for real error messages? Perhaps
some esoteric dmesg or xl command looking for a xen system error by
another name? Or do I just need to --set a unique xid to each AppVM to
make them start properly?

thanks,

Steve

awokd

May 31, 2019, 5:34:10 PM
to qubes...@googlegroups.com
Steve Coleman:
> After doing a reinstall and restore from backup of my Qubes 4.0 system,
> I have a number of AppVM's that do not want to start for some unknown
> reason.

Try changing their template to a different one? That doesn't have much
finesse and doesn't answer your question directly, but as a workaround you might
be able to build a replacement qube, set the private size to match the
source, then cp /dev/mapper/source-private
/dev/mapper/replacement-private (with the appropriate full names). That
could surface LVM corruption in those volumes too.

Steve Coleman

Jun 3, 2019, 1:11:58 PM
to qubes...@googlegroups.com
Ok, I did some more testing on this, and it makes no sense...

If I clone the broken AppVM, then the clone made from it will run just
fine. But wait... there is more.

If I then delete the original AppVM and rename the new clone back to the
original name, that working AppVM fails with the same "can't start
qubesdb-daemon" error.

If I clone the working clone, and give that new clone the original name,
again it fails to run.

Directly renaming a broken AppVM to any other VM name appears to fix the
problem, but copying it back to the original name breaks it again.

If I delete the broken AppVM and create a brand new AppVM with the same
broken VM name then that brand new VM will not run, giving the same error.

A clone of a clone of a clone works just fine, as long as you don't give
it the original VM name.

So, it appears that just having that specific VM name will somehow
prevent it from ever running. When the AppVM has any other name then it
seems to work just fine. I just can't name any VM with the same original
VM name.



Steve


awokd

Jun 3, 2019, 2:18:23 PM
to qubes...@googlegroups.com
Steve Coleman wrote on 6/3/19 5:11 PM:

> So, it appears that just having that specific VM name will somehow
> prevent it from ever running. When the AppVM has any other name then it
> seems to work just fine. I just can't name any VM with the same original
> VM name.

Rename the problem VMs to something else, then check your
/var/lib/qubes/qubes.xml. Confirm the new names are in there, then
delete any bad entries. Reboot. Cross fingers.

Steve Coleman

Jun 3, 2019, 3:36:53 PM
to qubes...@googlegroups.com
After renaming/deleting the broken VM's there are no references left in
the qubes.xml file.

However, after renaming the VM's to another name and deleting the log
files in:

/var/log/qubes/*.VMNAME*.log
/var/log/xen/console/*VMNAME*.log
/var/log/libvirt/libxl/VMNAME.log

then moving the VM name back, everything ran correctly. My guess is that
there was a file permission or ownership that got messed up and therefore
qubesdb-daemon was unable to open that resource so it bailed out with no
error message.

The ownership of qrexec.VMNAME.log and qubesdb.VMNAME.log is now
username.qubes, where it had previously been root.qubes. Apparently
the logfiles persist when you delete or rename a VM, so the old
permissions would also prevent any new VM with that name from writing
to the same file.

In further testing I found that simply removing the file
/var/log/qubes/qubesdb.VMNAME.log would allow that VM to start correctly
even without renaming anything. It's clearly a file
ownership/permissions problem.
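
For anyone wanting to check for the same condition, here is a rough sketch that reports the ownership and writability of a VM's logs. It assumes the logs follow the <daemon>.VMNAME.log naming seen above under /var/log/qubes, and that it is run as the same user that starts the VM; describe_log and unwritable_logs are hypothetical helper names, not part of any Qubes tool.

```python
import glob
import os
import stat


def describe_log(path):
    """Return owner, group, mode and writability of one file, or None if absent."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return None
    return {
        "path": path,
        "uid": info.st_uid,
        "gid": info.st_gid,
        "mode": stat.filemode(info.st_mode),
        # os.access checks against the real uid/gid of the calling process.
        "writable": os.access(path, os.W_OK),
    }


def unwritable_logs(vm_name, log_dir="/var/log/qubes"):
    """List logs matching *.VMNAME.log that the current user cannot write to."""
    # Assumption: log names look like qubesdb.VMNAME.log or qrexec.VMNAME.log.
    pattern = os.path.join(log_dir, "*.{}.log".format(glob.escape(vm_name)))
    reports = (describe_log(p) for p in sorted(glob.glob(pattern)))
    return [r for r in reports if r is not None and not r["writable"]]
```

Something like unwritable_logs("VMNAME") returning a non-empty list would match the stale root-owned logfile described here; an empty list would suggest looking elsewhere.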

The one mystery left is how did the logfile permissions get messed up
during the restore process in the first place. Perhaps from running the
qvm-backup-restore as root? I don't remember doing that but it's possible.

Steve
