Qubes locks up half the time on startup with filenotfound error trying to run qubes manager


Guy Frank

Sep 11, 2018, 2:44:13 PM
to qubes-users
Was a bit premature thinking that my qubes installation was stable. About half the time I start the system, it locks up and I am only able to access Dom0 (qubes manager will not open, nor will any qubes, even from command line). The system gives a serious 'filenotfound' error msg. I've looked at previous posts on problems like this, but my problem doesn't seem to fit what others reported--qubes.xml is not empty and disk utilization is minimal (or near 50% in one case). The error message is:

#
Whoops. A critical error has occurred. This is most likely a bug in Qubes Manager
FileNotFoundError: [Errno 2]
No such file or directory
at line 9
of file /usr/bin/qubes-qube-manager
#

Line 9 reads: load_entry_point('qubesmanager==4.0.16', 'console_scripts', 'qubes-qube-manager')()

Ok, so the weird thing is that this works fine half the time. On half of my boot ups, I don't encounter this problem. So if there is no such file or directory, it's not there half the time. qubes.xml looks good (to my untrained eyes), and df -h shows nothing at more than 1% utilization except for /dev/nvme0n1p1 mounted on /boot/efi which is 56% of 200MB. nvme0n1p1 is, I believe, the GPT table?

I'm worried about coming to rely on this installation if at some point the error doesn't go away every other reboot and becomes permanent. Am trying updates now--maybe that will help.

Guy

Guy Frank

Sep 11, 2018, 3:10:15 PM
to qubes-users

Updating the software in dom0 doesn't make the problem disappear, though now the main error message is:

QubesDaemonCommunicationError: Failed to connect to qubesd service: [Errno 2] No such file or directory
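
For what it's worth, `[Errno 2]` in this message usually means the client could not find qubesd's Unix socket, i.e. the daemon never came up. A minimal sketch for checking that from a dom0 terminal; the socket path `/var/run/qubesd.sock` is an assumption based on Qubes 4.0 defaults, and `qubesd_socket_state` is a made-up helper name:

```shell
# Report whether a Unix socket exists at the given path.
# The default path is an assumption (Qubes 4.0); pass another path if yours differs.
qubesd_socket_state() {
    sock="${1:-/var/run/qubesd.sock}"
    if [ -S "$sock" ]; then
        echo "present: $sock"
    else
        echo "missing: $sock"
    fi
}

# Usage in dom0:
#   qubesd_socket_state
#   systemctl status qubesd
```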

awokd

Sep 11, 2018, 5:29:02 PM
to Guy Frank, qubes-users
Nothing related earlier in the "sudo journalctl -e" log? Try "sudo
systemctl restart qubesd"?


Guy Frank

Sep 11, 2018, 6:12:00 PM
to qubes-users

Thanks awokd! I'll give these a try next time I run into the problem

Guy Frank

Sep 13, 2018, 1:43:57 PM
to qubes-users

Ok, so on my next reboot, it ran into this problem again. I made a copy of the journalctl log and tried to restart qubesd, to no effect.

The attached file, jnlctlErr.txt, if you scroll down to 09:24:43, I think you can see where the Qubes OS daemon fails. It is immediately preceded by the 1d.2 pci device worker failing, suggesting that something about this failure is keeping the daemon from starting (which occurs below the blank line I added to the log). 1d.2 is a PCI Bridge, Intel Corp Device a332. No idea what exactly this is or how to find out (not a hardware person).

One thing I thought of is the fact that there's a PS/2 card in the machine to which a PS/2 keyboard & mouse are attached. Neither has ever worked in Qubes (though they worked in Windows), so maybe that's what's triggering the problem? Will do some testing.

When I attempt to start qubes daemon w/ sudo systemctl restart qubesd, journalctl log shows other errors. The qubes daemon doesn't get started and I can't use the system.

What I can do is reboot. And about every other time, Qubes comes up and is fine. My concern is that at some point it'll stop doing this, so I'd really like to figure out how to solve this problem.

Guy

AttemptToStartQubesDaemon.txt
jnlctlErr.txt

awokd

Sep 14, 2018, 2:49:00 AM
to Guy Frank, qubes-users
On Thu, September 13, 2018 5:43 pm, Guy Frank wrote:

> Ok, so on my next reboot, it ran into this problem again. I made a copy
> of the journalctl log and tried to restart qubesd, to no effect.
>
> The attached file, jnlctlErr.txt, if you scroll down to 09:24:43, I think
> you can see where the Qubes OS daemon fails. It is immediately preceded
> by the 1d.2 pci device worker failing, suggesting that something about
> this failure is keeping the daemon from starting (which occurs below the
> blank line I added to the log). 1d.2 is a PCI Bridge, Intel Corp Device
> a332. No idea what exactly this is or how to find out (not a hardware
> person).

What is 0000:06:00.0 (and 0000:05:00.0 for that matter), one of your USB
controllers? Check with lspci. Try unplugging all USB devices except
keyboard and mouse, and seeing if that error still shows. If so, try
moving your keyboard and mouse to a different controller, and disable
0000:06:00.0.
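
The addresses in question come straight out of the device path in the udev message. A rough sketch for splitting such a path into its PCI addresses so each one can be passed to `lspci -s`; `pci_chain` is a hypothetical helper name, not an existing tool:

```shell
# Print every PCI address (domain:bus:slot.function) found in a sysfs device path,
# one per line, in the order they appear (bridge first, end device last).
pci_chain() {
    printf '%s\n' "$1" | grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]'
}

# Usage in dom0 (lspci -s selects a device by address, -nn adds vendor/device IDs):
#   for dev in $(pci_chain '/devices/pci0000:00/0000:00:1d.2/0000:05:00.0/0000:06:00.0/usb4'); do
#       lspci -nn -s "$dev"
#   done
```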

> One thing I thought of is the fact that there's a PS/2 card in the
> machine to which a PS/2 keyboard & mouse are attached. Neither has ever
> worked in Qubes (though they worked in Windows), so maybe that's what's
> triggering the problem? Will do some testing.
>
> When I attempt to start qubes daemon w/ sudo systemctl restart qubesd,
> journalctl log shows other errors. The qubes daemon doesn't get started
> and I can't use the system.

Only qubesd errors I see in that log are:
Sep 13 09:52:19 dom0 systemd[1]: qubesd.service: Start operation timed out.
Terminating.

I don't know how to get more detailed logs to see why it's doing that;
maybe it has a systemd dependency on udev?
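
One way to test the dependency guess is to ask systemd which units qubesd directly references. A small sketch, assuming the output of `systemctl show` is piped in; `mentions_udev_settle` is a made-up name, and whether qubesd really waits on `systemd-udev-settle.service` is exactly the open question here:

```shell
# Read unit properties (e.g. from `systemctl show`) on stdin and report whether
# systemd-udev-settle.service appears among them. Only direct references are
# visible this way; an indirect ordering through another unit would not show up.
mentions_udev_settle() {
    if grep -q 'systemd-udev-settle\.service'; then
        echo "udev-settle is referenced"
    else
        echo "udev-settle is not referenced"
    fi
}

# Usage in dom0:
#   systemctl show -p After -p Requires -p Wants qubesd.service | mentions_udev_settle
#   systemctl list-dependencies qubesd.service
#   sudo journalctl -b -u qubesd    # log lines for this unit only, current boot
```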

Marcus Linsner

Sep 14, 2018, 5:48:17 AM
to qubes-users

Looking at the relevant errors, in context (and the time between them):

...
Sep 13 09:20:23 localhost kernel: usb 1-10.1: New USB device found, idVendor=413c, idProduct=2002
Sep 13 09:20:23 localhost kernel: usb 1-10.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Sep 13 09:20:23 localhost kernel: usb 1-10.1: Product: Dell USB Keyboard Hub
Sep 13 09:20:23 localhost kernel: usb 1-10.1: Manufacturer: Dell
Sep 13 09:20:23 localhost kernel: input: Dell Dell USB Keyboard Hub as /devices/pci0000:00/0000:00:14.0/usb1/1-10/1-10.1/1-10.1:1.0/0003:413C:2002.0001/input/input3
Sep 13 09:20:23 localhost kernel: hid-generic 0003:413C:2002.0001: input,hidraw0: USB HID v1.10 Keyboard [Dell Dell USB Keyboard Hub] on usb-0000:00:14.0-10.1/input0
Sep 13 09:20:23 localhost kernel: input: Dell Dell USB Keyboard Hub as /devices/pci0000:00/0000:00:14.0/usb1/1-10/1-10.1/1-10.1:1.1/0003:413C:2002.0002/input/input4
Sep 13 09:20:23 localhost kernel: usb 4-3: new low-speed USB device number 2 using ohci-pci
...
Sep 13 09:20:23 localhost kernel: hid-generic 0003:413C:2002.0002: input,hidraw1: USB HID v1.10 Device [Dell Dell USB Keyboard Hub] on usb-0000:00:14.0-10.1/input1
...
Sep 13 09:21:43 dom0 kernel: dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.2)
...
Sep 13 09:21:44 dom0 kernel: acpi PNP0C14:03: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:00)
Sep 13 09:21:44 dom0 kernel: wmi_bus wmi_bus-PNP0C14:04: WQBC data block query control method not found
Sep 13 09:21:44 dom0 kernel: acpi PNP0C14:04: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:00)
Sep 13 09:21:44 dom0 kernel: input: PC Speaker as /devices/platform/pcspkr/input/input12
Sep 13 09:21:44 dom0 kernel: input: Dell AIO WMI hotkeys as /devices/virtual/input/input13
...
Sep 13 09:21:44 dom0 kernel: dell-wmi 9DBB5994-A997-11DA-B012-B622A1EF5492: Dell descriptor buffer has invalid buffer length (32768)
Sep 13 09:21:44 dom0 kernel: dell-wmi 9DBB5994-A997-11DA-B012-B622A1EF5492: Detected Dell WMI interface version 1
Sep 13 09:21:44 dom0 kernel: input: Dell WMI hotkeys as /devices/platform/PNP0C14:04/wmi_bus/wmi_bus-PNP0C14:04/9DBB5994-A997-11DA-B012-B622A1EF5492/input/input14
Sep 13 09:21:44 dom0 systemd[1]: Found device /dev/disk/by-uuid/A482-5EDF.
Sep 13 09:21:44 dom0 systemd-udevd[1677]: Error calling EVIOCSKEYCODE on device node '/dev/input/event14' (scan code 0x150, key code 190): Invalid argument
...
Sep 13 09:21:49 dom0 kernel: snd_hda_codec_realtek hdaudioC0D0: Failed to find dell wmi symbol dell_micmute_led_set
...
Sep 13 09:22:00 dom0 kernel: input: HDA Intel PCH HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input21
Sep 13 09:22:43 dom0 systemd-udevd[1652]: seq 2961 '/devices/pci0000:00/0000:00:1d.2/0000:05:00.0/0000:06:00.0/usb4' is taking a long time
Sep 13 09:23:44 dom0 systemd[1]: systemd-udev-settle.service: Main process exited, code=exited, status=1/FAILURE
Sep 13 09:23:44 dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-udev-settle comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Sep 13 09:23:44 dom0 systemd[1]: Failed to start udev Wait for Complete Device Initialization.
Sep 13 09:23:44 dom0 kernel: audit: type=1130 audit(1536848624.020:69): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-udev-settle comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Sep 13 09:23:44 dom0 systemd[1]: systemd-udev-settle.service: Unit entered failed state.
Sep 13 09:23:44 dom0 systemd[1]: systemd-udev-settle.service: Failed with result 'exit-code'.
...
Sep 13 09:23:44 dom0 systemd-logind[2748]: Watching system buttons on /dev/input/event14 (Dell WMI hotkeys)
Sep 13 09:23:44 dom0 systemd-logind[2748]: Watching system buttons on /dev/input/event13 (Dell AIO WMI hotkeys)
...
Sep 13 09:23:45 dom0 qmemman.systemstate[2758]: stat: xenfree=29212956823 memset_reqs=[]
Sep 13 09:24:43 dom0 systemd-udevd[1652]: seq 2961 '/devices/pci0000:00/0000:00:1d.2/0000:05:00.0/0000:06:00.0/usb4' killed
Sep 13 09:24:43 dom0 systemd-udevd[1652]: worker [1674] terminated by signal 9 (Killed)
Sep 13 09:24:43 dom0 systemd-udevd[1652]: worker [1674] failed while handling '/devices/pci0000:00/0000:00:1d.2/0000:05:00.0/0000:06:00.0/usb4'
#pci device 1d.2 is: 00:1d.2 PCI bridge: Intel Corporation Device a332 (rev f0), which I imagine might be related to: 00:17.0 SATA controller: Intel Corporation Device a352 (rev 10)

Sep 13 09:25:14 dom0 systemd[1]: qubesd.service: Start operation timed out. Terminating.
Sep 13 09:25:14 dom0 systemd[1]: Failed to start Qubes OS daemon.
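
To pull the same lines out of a saved log without reading the whole thing, something like the following may help. It is only a sketch: the patterns are taken from the messages quoted above, `boot_failures` is a hypothetical helper, and the file name is the attachment from earlier in the thread.

```shell
# Print udev worker stalls/kills and failed unit starts from a saved journal dump.
boot_failures() {
    grep -E 'systemd-udevd\[[0-9]+\]: .*(is taking a long time|killed|failed while handling)|Start operation timed out|Failed to start ' "$1"
}

# Usage:
#   sudo journalctl -b > jnlctlErr.txt    # dump the current boot's log
#   boot_failures jnlctlErr.txt
```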


If this guy is correct https://bugs.freedesktop.org/show_bug.cgi?id=75875#c1 about `you can safely ignore these errors` then the following workaround attempts are probably not going to make any difference (but hey, I tried):


What's the output of `lsmod|grep -i wmi`? I'm guessing there will be something like `dell_wmi`, in which case it could be blacklisted (even if only temporarily) by adding `modprobe.blacklist=dell_wmi` as a kernel parameter (or should it be dell-wmi, with a dash instead of an underscore?). You can check the current kernel parameters with `cat /proc/cmdline`. There's a list of ways to blacklist a module here: https://askubuntu.com/questions/110341/how-to-blacklist-kernel-modules
If you're using UEFI, though, that means appending the parameter to the `kernel=` line (of the `default=` kernel) in /boot/efi/EFI/qubes/xen.cfg and then rebooting. I do recommend keeping a working copy as a backup first, in case your xen.cfg modifications render the system unbootable. For inspiration on how to do that, maybe see: https://groups.google.com/d/msg/qubes-users/CZ5vMNL_c7k/btiRvk9eBAAJ
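
As a sketch of the xen.cfg edit: the function below prints a copy of the config with one extra parameter appended to the `kernel=` line of the section named by `default=`, so the result can be reviewed before anything on the EFI partition is touched. `add_kernel_param` is a hypothetical helper, the /tmp path is arbitrary, and `dell_wmi` is still only a guess at the module name.

```shell
# Print the given xen.cfg with PARAM appended to the kernel= line of the default section.
# Nothing is modified in place; redirect the output to a new file and inspect it first.
add_kernel_param() {
    awk -v param="$2" '
        /^default=/ { def = substr($0, 9) }                       # name of the section to boot
        /^\[/       { section = substr($0, 2, length($0) - 2) }   # strip the surrounding brackets
        /^kernel=/ && section == def { $0 = $0 " " param }
        { print }
    ' "$1"
}

# Usage in dom0 (reading files under /boot/efi may require root):
#   lsmod | grep -i wmi                                   # confirm the module name first
#   sudo cp /boot/efi/EFI/qubes/xen.cfg /boot/efi/EFI/qubes/xen.cfg.bak
#   add_kernel_param /boot/efi/EFI/qubes/xen.cfg modprobe.blacklist=dell_wmi > /tmp/xen.cfg.new
#   diff /boot/efi/EFI/qubes/xen.cfg /tmp/xen.cfg.new
```

Only the default section is changed, so any other boot entries in the file keep their original kernel line as a fallback.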

If you don't want to blacklist that `dell_wmi` (guessing) module, or can't, then maybe consider temporarily commenting out this whole block of lines:
# Dell Latitude microphone mute
evdev:name:Dell WMI hotkeys:dmi:bvn*:bvr*:bd*:svnDell*:pnLatitude*
# Dell Precision microphone mute
evdev:name:Dell WMI hotkeys:dmi:bvn*:bvr*:bd*:svnDell*:pnPrecision*
KEYBOARD_KEY_150=f20 # Mic mute toggle, should be micmute

in dom0 file: /usr/lib/udev/hwdb.d/60-keyboard.hwdb
which would mean that some multimedia keys won't work but it should also not stall your boot process. Inspiration for this solution is from: https://ubuntuforums.org/showthread.php?t=2250210&p=13153308#post13153308
(or maybe more/different lines need to be blacklisted? unsure)
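
If you go the hwdb route, a sketch like this comments out the block in a copy rather than editing the packaged file by hand. Assumptions: the block starts at the `# Dell Latitude microphone mute` comment and ends at the `KEYBOARD_KEY_150` line as quoted above, and `comment_out_micmute` is a made-up helper name.

```shell
# Print the hwdb file with the Dell mic-mute block commented out (input file is untouched).
comment_out_micmute() {
    sed '/^# Dell Latitude microphone mute/,/KEYBOARD_KEY_150=/ s/^/#/' "$1"
}

# Usage in dom0 (keep a backup; the hwdb must be rebuilt for the change to take effect):
#   f=/usr/lib/udev/hwdb.d/60-keyboard.hwdb
#   sudo cp "$f" /root/60-keyboard.hwdb.bak
#   comment_out_micmute "$f" > /tmp/60-keyboard.hwdb
#   sudo cp /tmp/60-keyboard.hwdb "$f"
#   sudo systemd-hwdb update
```

A package update may overwrite the file again, so treat this as a temporary test rather than a permanent fix.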

Another solution would be to use a newer kernel (like 4.18.7 from the unstable repo; see the [qubes-dom0-unstable] section of /etc/yum.repos.d/qubes-dom0.repo if you want to set `enabled = 1` there). Before you do, I still recommend having another UEFI entry with the normal kernel(s), just in case the new one doesn't boot at all (though that's unlikely), so you can use your BIOS boot menu to select between the two. Be aware that the entry you are currently running is the one that will be updated (in xen.cfg) when you `sudo qubes-dom0-update` to a newer kernel. I also note that your UEFI partition is only 200MiB, so you might need to copy only the running kernel (the one referenced by `default=`) instead of everything, or you may not have enough space on it. This only makes sense in this context: https://groups.google.com/d/msg/qubes-users/CZ5vMNL_c7k/btiRvk9eBAAJ


Guy Frank

Sep 17, 2018, 7:10:15 PM
to qubes-users

Thanks awokd and Marcus! The points you made got me to unplug the non-working PS/2 keyboard and mouse on my computer (which plug into a PCI card). I've rebooted 5 times since and have not run into an error. So it looks like something about the PS/2 peripherals was causing the problem. Which of course leads to the next question of why this causes problems and why the keyboard & mouse don't work. Well, questions for another thread.
