sys-firewall freezing on resume from suspend


qtpie

May 27, 2022, 7:20:30 PM
to qubes...@googlegroups.com
Hi,

I have a really annoying issue with resume from suspend. On resume,
sys-firewall is crashed, frozen, or otherwise unresponsive. So on every
resume from suspend, I need to kill and restart this VM if I want to use
networking. Other qubes are fine, except that sys-whonix also freezes,
but this is because it can't get a network connection to sys-firewall.

The VM is based on the default Debian 11 template without any special
modifications. It has worked fine this way for years. Qubes is the
latest version. The kernel in use is 4.10.112.

Symptoms:
- High reported RAM/CPU use, with CPU hovering around 10-20%
- VM terminal: shows a blank window, no input/output shown
- Xen console in dom0: no output
- does not pass network traffic from connected VMs
- stopped connected VMs can't start because creation of the vif (network
connection) fails.
- sometimes, after a shorter suspend, the VM still works, or it does
pass network traffic while the VM still can't open a terminal window.

I've tried:
- checking the VM console and syslog, dom0 journal, dmesg, and Xen logs,
both before and after suspend. They don't show any relevant error as far
as I can tell.
- creating a fresh sys-firewall VM. No change.
- switching the VM to a Fedora 35 template, fully upgraded. No change.
- checking possibly related issues on the Qubes GitHub. But those are all
either fixed with updates, or about VMs with PCI devices connected,
which this VM doesn't have.
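
When comparing logs from before and after a suspend, it can help to cut
each log down to a short window around the moment of resume. Below is a
minimal sketch in Python, assuming the log lines start with a timestamp
in the form printed by `journalctl -o short-iso` (for example
2022-05-27T19:20:30+0200). The function names and the default window
sizes are made up for illustration and are not part of any Qubes tool.

```python
from datetime import datetime, timedelta

# Timestamp layout assumed to match `journalctl -o short-iso` output.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def parse_line(line):
    """Return (timestamp, rest_of_line), or None if the line has no
    leading timestamp in the assumed format."""
    stamp, _, rest = line.partition(" ")
    try:
        return datetime.strptime(stamp, TS_FORMAT), rest
    except ValueError:
        return None


def lines_near(lines, moment, before=60, after=120):
    """Keep only lines stamped within `before` seconds ahead of and
    `after` seconds following `moment` (a timezone-aware datetime)."""
    start = moment - timedelta(seconds=before)
    end = moment + timedelta(seconds=after)
    picked = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and start <= parsed[0] <= end:
            picked.append(line)
    return picked
```

One way to use it would be to save the journal to a text file, read it
with `open(...).read().splitlines()`, and pass the lines together with
the approximate resume time. Lines without a parsable timestamp are
skipped, so continuation lines of multi-line messages are lost.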

What is this problem? Why does it only occur with the sys-firewall VM?
Which logs should I double-check? Any suggestions welcome.

qtpie

Jun 3, 2022, 10:06:11 AM
to qubes...@googlegroups.com
So, apparently, this is not a sys-firewall issue but a clocksync issue.
To rule out other causes, I moved the clocksync service to a separate,
brand new qube (named sys-clock). And voilà: sys-firewall no longer
'crashes' on resume from suspend; now it's sys-clock that does.

The cause is probably somewhere in some logfile, but with the many
moving parts, Qubes really needs a better bugfixing howto. With
relatively many minor bugs like this, bugfixing takes too much time. I
don't mind spending some time fixing bugs, but lately it is really
becoming too much, to the extent that I am considering switching back to
an easier regular Linux distro. I have been a paid Linux sysadmin, not a
total expert, but that is also not a requirement to use Qubes. I should
be able to diagnose bugs on my own laptop (and contribute to the project
by properly reporting them).

Mike Keehan

Jun 3, 2022, 12:28:01 PM
to qubes...@googlegroups.com
My clocksync service runs in sys-net, not in sys-firewall. This is
Qubes 4.1.

Mike.

tetra...@danwin1210.de

Jun 4, 2022, 5:56:47 AM
to qtpie, qubes...@googlegroups.com
On Fri, Jun 03, 2022 at 04:00:20PM +0200, 'qtpie' via qubes-users wrote:
>So, apparently, this is not a sys-firewall, but a clocksync issue. To
>root out any causes, I moved the clocksync service to a separate,
>brand new qube (named sys-clock). And voila: sys-firewall no longer
>'crashes' on resume from suspend, now it's sys-clock.

This should probably be filed as an issue:
github.com/QubesOS/qubes-issues

qtpie

Jun 6, 2022, 3:15:15 PM
to tetra...@danwin1210.de, qubes-users
Someone else filed an issue where this was solved for me:
https://github.com/QubesOS/qubes-issues/issues/7510. Briefly put:

Manually applying the patch from
https://github.com/QubesOS/qubes-core-admin/pull/473 to
dom0:/usr/lib/python3.8/site-packages/qubes/vm/qubesvm.py and then
restarting seems to have solved the issue. Clock syncing trouble after
suspend also seems to have improved. So this was a suspend issue and not
a clock or firewall issue.

This should come soon to dom0 as an update, I guess.
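
Because a manual edit like this changes a packaged file in dom0, it may
be worth keeping a copy of the original before touching it. Below is a
minimal sketch in Python, assuming it runs with enough privileges to
write next to the target file. The helper name `backup_file` and the
`.bak` naming scheme are hypothetical, and applying the patch itself is
not shown.

```python
import shutil
import time
from pathlib import Path


def backup_file(path, stamp=None):
    """Copy `path` to '<name>.<stamp>.bak' in the same directory and
    return the new path. Refuses to overwrite an existing backup."""
    src = Path(path)
    if stamp is None:
        stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}.{stamp}.bak")
    if dest.exists():
        raise FileExistsError(f"backup already exists: {dest}")
    # copy2 also preserves timestamps and permission bits.
    shutil.copy2(src, dest)
    return dest
```

For the file named above, the call would be
`backup_file("/usr/lib/python3.8/site-packages/qubes/vm/qubesvm.py")`.
A later dom0 update that ships a new version of the package would
presumably replace the hand-patched file anyway.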

Demi Marie Obenour

Jun 10, 2022, 5:45:39 PM
to qtpie, qubes...@googlegroups.com

On Fri, Jun 03, 2022 at 04:00:20PM +0200, Qubes OS Users Mailing List wrote:
> So, apparently, this is not a sys-firewall, but a clocksync issue. To root
> out any causes, I moved the clocksync service to a separate, brand new qube
> (named sys-clock). And voila: sys-firewall no longer 'crashes' on resume
> from suspend, now it's sys-clock.
>
> The cause is probably somewhere in some logfile, but with the many moving
> parts, Qubes really needs a better bugfixing howto. With relatively many
> minor bugs like this, bugfixing takes too much time. I don't mind spending
> some time fixing bugs, but lately it is really becoming too much, to the
> extend that I am considering switching back to an easier regular Linux
> distro. I have been a paid Linux sysadmin, no total expert, but that is also
> not a requirement to use Qubes. I should be able to diagnose bugs on my own
> laptop (and contribute to the project by properly reporting them).

Indeed, you should be able to. The fact that you cannot is itself a
bug. Please report it.

--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

Demi Marie Obenour

Jun 10, 2022, 5:49:08 PM
to qtpie, qubes...@googlegroups.com

On Fri, Jun 03, 2022 at 04:00:20PM +0200, Qubes OS Users Mailing List wrote:
> So, apparently, this is not a sys-firewall, but a clocksync issue. To root
> out any causes, I moved the clocksync service to a separate, brand new qube
> (named sys-clock). And voila: sys-firewall no longer 'crashes' on resume
> from suspend, now it's sys-clock.

https://github.com/QubesOS/qubes-core-admin/pull/473 will (hopefully)
fix this.

--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

qtpie

Jun 12, 2022, 5:38:54 PM
to qubes...@googlegroups.com


On 6/10/22 23:45, Demi Marie Obenour wrote:
> On Fri, Jun 03, 2022 at 04:00:20PM +0200, Qubes OS Users Mailing List wrote:
>> So, apparently, this is not a sys-firewall, but a clocksync issue. To root
>> out any causes, I moved the clocksync service to a separate, brand new qube
>> (named sys-clock). And voila: sys-firewall no longer 'crashes' on resume
>> from suspend, now it's sys-clock.
>
>> The cause is probably somewhere in some logfile, but with the many moving
>> parts, Qubes really needs a better bugfixing howto. With relatively many
>> minor bugs like this, bugfixing takes too much time. I don't mind spending
>> some time fixing bugs, but lately it is really becoming too much, to the
>> extend that I am considering switching back to an easier regular Linux
>> distro. I have been a paid Linux sysadmin, no total expert, but that is also
>> not a requirement to use Qubes. I should be able to diagnose bugs on my own
>> laptop (and contribute to the project by properly reporting them).
>
> Indeed, you should be able to. The fact that you cannot is itself a
> bug. Please report it.
>

To avoid cluttering the issues list, and to make this a little more
actionable, let's first discuss it here.

What I need is a little more help with fixing or adequately diagnosing
bugs, as a sysadmin-level person who is not a programmer or a Xen or
Qubes expert. As said, the goal is to be able to fix, or report and
diagnose, bugs and other issues better. For instance, a list of the
logfiles that Qubes/Xen adds to standard Fedora would be helpful. So
just a list, no further explanation of how to use logfiles. I don't have
more ideas currently, but there probably are more.
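
As a starting point for such a list, here is a minimal sketch in Python
that prints a few per-qube log locations in dom0 and whether each one
exists. The paths are assumptions based on where Qubes 4.x appears to
keep Xen console, GUI daemon, qrexec, qubesdb and libxl logs. They are
not an official or complete list and should be checked on the actual
system. The names `LOG_TEMPLATES`, `candidate_logs` and `report` are
made up for this example.

```python
import os

# Assumed dom0 locations; verify on your own system before relying on them.
LOG_TEMPLATES = [
    "/var/log/xen/console/guest-{vm}.log",  # Xen console of the qube
    "/var/log/qubes/guid.{vm}.log",         # GUI daemon
    "/var/log/qubes/qrexec.{vm}.log",       # qrexec daemon
    "/var/log/qubes/qubesdb.{vm}.log",      # qubesdb daemon
    "/var/log/libvirt/libxl/{vm}.log",      # libvirt/libxl domain log
]


def candidate_logs(vm):
    """Fill the qube name into each assumed log path template."""
    return [template.format(vm=vm) for template in LOG_TEMPLATES]


def report(vm):
    """Return one 'present'/'missing' line per candidate log path."""
    lines = []
    for path in candidate_logs(vm):
        state = "present" if os.path.exists(path) else "missing"
        lines.append(f"{state:8} {path}")
    return lines
```

Running `print("\n".join(report("sys-firewall")))` in dom0 would then
show which of these files are there for that qube. Reading some of them
may require root.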

What worries me a little bit is that documentation like this might
encourage less skilled people to start doing things above their level of
ability (although this is also a good start to become more skilled).
For example, in the case of logfiles, cluttering communication channels
with non-relevant information. So it should come with a clear warning.

Suggestions (or critique) welcome.