2/3 of VMs randomly lose network access; sys-net, sys-firewall, and others normal

Andrew David Wong

Nov 26, 2016, 12:42:43 PM
to qubes...@googlegroups.com, Marek Marczykowski-Górecki
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

A strange networking problem just started in the past day or so:

Every few hours, around 2/3 of my VMs will suddenly lose network
access. I can still ping websites from sys-net and sys-firewall,
and some VMs still have normal network access, even though all of
them are using the same sys-firewall. (Other devices on my LAN are
also fine.)

The weird part is, if I create a new, additional "sys-firewall1"
ProxyVM and switch over one of the non-working VMs to it
*without restarting* the non-working VM, network access gets
successfully restored. So, the problem must be in sys-firewall
or the AppVMs, I think.

I've tried basing sys-firewall on fedora-24 and fedora-24-minimal
with the same results. Also double-checked NetVM assignments
and firewall rules, of course.

Any ideas for logs or tools I should check to find out what's
failing, or where it's failing?

- -----------------------------------------------------------------

I can't imagine what caused this problem to suddenly start,
except maybe a dom0 or template update, so here are the packages
I've updated in dom0 recently as part of normal qubes-dom0-update:

libsndfile
sudo
bind99-libs
bind99-license
ghostscript-core
hswdata
perf
ntfs-3g
ntfsprogs
perl
perl-libs
perl-macros

And here are the packages I've updated in my fedora-24 template
(again, as normal updates):

libicu
libidn2
gnome-abrt
gnome-software
libdmapsharing
libmetalink
lz4
lz4-r131
rpm
rpm-build-libs
rpm-libs
rpm-plugin-selinux
rpm-plugin-systemd-inhibit
rpm-python
rpm-python3

Any ideas?

- --
Andrew David Wong (Axon)
Community Manager, Qubes OS
https://www.qubes-os.org
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJYOcmCAAoJENtN07w5UDAwiqUP/3CylKymAzATJE5e7wyG98GZ
1pByD7hTfgs5X4X56emHgO5enCZbhugQ/JJrQZj4q8vPdur2WVG99cWfi9cVnPNJ
wBWo4r2O1sS0HS85o8ya6Jv93XJ+rsmScBwBobq9P/D3x5PL8petLVbpGgd02Kaw
C76vPmy00ZBKaTVpGtV90bcasF6vMVLT3osymRkwOPqxbimVMUqz0tfzD3s1PI5O
PpYK8Im18xjCxNhrdjY/K+jhG7mOkVssK0qc31LCK0HZ/jnaDM7gyFAb2NPKOG+w
EmpuvPU6TnzUEoLhPgj9k9RlNojwqy2OuClnefN/iqvp582oIZtN4OHSaXqAsU3U
Eo/MIFZqDOn9SZkyKF2lRb7Ro3DvfEXQHOHNVDbtlH/Jk1GgZ07UhaYLkRPK/m/L
N2qpV9zwzeRDlBVtP0BtbdiQzQdLmUVXvcz4FxONXfARDhLMUALakXpbV8UDRqfG
2r1wNa4DrTXtL7wf0wgy9mCxYzm2IXfIISQ9t3pfXeLemu3cY5Khwz/9kB/9iKRC
86xH0j75S5YJw+caOyO4q/3AVoGbsMCseRQyKDvdeiau7jEv2Jvaf60li8nwjAgv
pF1Ygq590P+WcDPGFnAqwjYc/0CyKtasWuFoAOlCXxMbZ3CLaBJdjh6XDre7wXBp
Tg3rZomPyMw/9crPQVwf
=gGCn
-----END PGP SIGNATURE-----

Andrew David Wong

Nov 26, 2016, 12:47:55 PM
to qubes...@googlegroups.com, Marek Marczykowski-Górecki
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 2016-11-26 09:42, Andrew David Wong wrote:
> A strange networking problem just started in the past day or so:
>
> Every few hours, around 2/3 of my VMs will suddenly lose network
> access. I can still ping websites from sys-net and sys-firewall,
> and some VMs still have normal network access, even though all of
> them are using the same sys-firewall. (Other devices on my LAN are
> also fine.)
>
> [...]

Apparently, if I just wait 5-15 minutes, network access gets
restored to the affected VMs. (Note: This is not a solution for me.
I'm just noting it here in case it's a relevant clue to figuring
out the root cause.)

- --
Andrew David Wong (Axon)
Community Manager, Qubes OS
https://www.qubes-os.org
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJYOcq/AAoJENtN07w5UDAwFQMQALKWIzDdBkaC+nKtOR5oysaj
f89zy8TbfFj7BYOzfzmtqTZdHhpCFFtuNy/BV/Im8FcAkASFpxmyvUMpb4LLzAJi
mrpYCm4cgiekDGvZ7/Z7+HP7layGnD1afRmGk2yuExeZZdLjfA2ukxYbGjptnVSL
1AA/HL2LFgXhbEwauuzlJOvn/1EivXb81b5LqgIHqAwcPeEtrtJR5AJJslQTyMcJ
AvzHEUjHKvMAyvR9YYU4pdP+4uOmD7j0n430iBW2K1XiYqy3E3/XwTYb4461l99D
AwUk9vPbmmi8hX6+pJc7dGMuSxNUawkRbVTMZNYazQ8cc9+BqvP5ZGLY+PeT4NuO
3WspLob19QD9ELGDjC2z40V9+sD2ufbcbFEsHKIVRiUXWWCWCcUMG9BfOKNnI59L
AQFag5MN+GSlFZzVuLXW9TriTZhYG81FlBYyvVFXsqatsC2GBexJ1825JZiV2m7M
MPrNMhsAFoVhd9LHeGLIWzX60LtDI9/6voSEYurMqVIjduYKk/uqjuulQBgEF6d3
Xvo74fm7sScBCN18hvqHP8r6z3TQcc16RbGXBquW6zjBao1K3yrDEJVQHgBDb5Mm
qHN0Q41gr0YYV4qkwhLT6PMCAq9qs5Xz3xDstnGpZXMezzP74vCQzY9n7Kgz9UIS
S4ZZZljEEiKXeVa39ia+
=oDOf
-----END PGP SIGNATURE-----

Jean-Philippe Ouellet

Nov 26, 2016, 2:26:26 PM
to Andrew David Wong, qubes...@googlegroups.com, Marek Marczykowski-Górecki
On Sat, Nov 26, 2016 at 12:42 PM, Andrew David Wong <a...@qubes-os.org> wrote:
> Any ideas for logs or tools I should check to find out what's
> failing, or where it's failing?

I'd start with: dmesg, ifconfig -a -v, tcpdump, iptables-save.

Jean-Philippe Ouellet

Nov 26, 2016, 2:27:14 PM
to Andrew David Wong, qubes...@googlegroups.com, Marek Marczykowski-Górecki
Particularly tcpdump on both sides to see where the packets are being dropped.
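
A rough way to gather the non-interactive parts of that list in one go is sketched below. The function name `collect_diag`, the output file layout, and the `SUDO` variable are made up for illustration; only the commands themselves come from the suggestion above. tcpdump is left as a comment because it runs until interrupted and has to be started by hand on each side, and the interface names shown there are assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not part of Qubes): save the output of the suggested
# diagnostic commands into one directory so that runs from the AppVM and from
# sys-firewall can be compared side by side.
# SUDO defaults to "sudo"; set SUDO="" if you are already root.

collect_diag() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    # name:command pairs; the commands are the ones suggested in the thread
    for entry in "dmesg:dmesg" "ifconfig:ifconfig -a -v" "iptables-save:iptables-save"; do
        name=${entry%%:*}
        cmd=${entry#*:}
        tool=${cmd%% *}
        if command -v "$tool" >/dev/null 2>&1; then
            # Word splitting of $cmd is intentional here
            ${SUDO-sudo} $cmd >"$outdir/$name.txt" 2>&1 \
                || echo "(exit status $?)" >>"$outdir/$name.txt"
        else
            echo "$tool: not installed in this VM" >"$outdir/$name.txt"
        fi
    done
}

# tcpdump has to run on both sides at the same time, so start it by hand.
# Interface names are assumptions (check yours with `ifconfig -a`):
#   in the affected AppVM:   sudo tcpdump -n -i eth0 icmp
#   in sys-firewall:         sudo tcpdump -n -i vifN.0 icmp   (the vif for that VM)
```

Running `collect_diag ~/diag-$(hostname)` in both an affected VM and sys-firewall while the outage is happening gives two sets of files to diff against a run taken while things work.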

Marek Marczykowski-Górecki

Nov 26, 2016, 6:04:50 PM
to Andrew David Wong, qubes...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sat, Nov 26, 2016 at 09:47:46AM -0800, Andrew David Wong wrote:
> On 2016-11-26 09:42, Andrew David Wong wrote:
> > A strange networking problem just started in the past day or so:
> >
> > Every few hours, around 2/3 of my VMs will suddenly lose network
> > access. I can still ping websites from sys-net and sys-firewall,
> > and some VMs still have normal network access, even though all of
> > them are using the same sys-firewall. (Other devices on my LAN are
> > also fine.)
> >
> > [...]
>
> Apparently, if I just wait 5-15 minutes, network access gets
> restored to the affected VMs. (Note: This is not a solution for me.
> I'm just noting it here in case it's a relevant clue to figuring
> out the root cause.)

Do you see any correlation with:
- starting/stopping another VM?
- whether the affected VMs have firewall rules?

Also, check whether restarting the qubes-firewall service in sys-firewall
helps (and check its status first).

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJYOhUMAAoJENuP0xzK19cs6fYH/1kn6ZYkJI4aXhBj3qN+pTKT
yKT9LLSu1Cc5SP/fx4Yi5RinJ2W5++lzhqImsWgeDekN4VdFJuAoaGPSuumyUgzn
2vnttfm8QaBZhftqeU/Sp524Yoodo0GNzLY/uUDwahLvrjiGo/h8SquwI2hQbX61
oPxN0S6Rd6rv2CA4PUVhQeoj5ksSXDrAcP6MndxAZr2O8cYsYN5wndDPy1kF7pIm
Bb0DUFE0+Ntd53EKFd5FyiGkJai8GxSoCmAEluDPjJn2AuXgeqPQGBsrBLoga34h
lc9/eNhLmUte91BQHOQra5mBajcat2u7eVw7+AOCMVJuDm9Ki/QrVuTJaPtrk4U=
=1JzG
-----END PGP SIGNATURE-----

Chris Laprise

Nov 26, 2016, 6:29:12 PM
to Andrew David Wong, qubes...@googlegroups.com, Marek Marczykowski-Górecki
Check out this thread:
https://groups.google.com/d/msgid/qubes-users/3aa66b77-9a06-83d8-d965-6583ef10d2a9%40gmail.com

The author claims it's dependent on running Qubes in a VM, but the symptoms
are about the same and the trigger is a switch to Fedora 24.

My own problem with Fedora 24 is that the minimal template seems
incapable of acting as a simple Qubes firewall. No time to troubleshoot it.

You may want to switch to Debian for your service VMs... Versions 8 and
9 are working well for me.

Chris

Andrew David Wong

Nov 26, 2016, 8:41:57 PM
to qubes...@googlegroups.com, Marek Marczykowski-Górecki, Jean-Philippe Ouellet, Chris Laprise
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 2016-11-26 09:42, Andrew David Wong wrote:
> A strange networking problem just started in the past day or so:
> [...]

Thanks for the tips, Jean-Philippe, Marek, and Chris!


On 2016-11-26 11:26, Jean-Philippe Ouellet wrote:
>> I'd start with: dmesg, ifconfig -a -v, tcpdump, iptables-save.
>
> Particularly tcpdump on both sides to see where the packets are being dropped.
>

Ok, thanks. Will do.


On 2016-11-26 15:04, Marek Marczykowski-Górecki wrote:
> Do you see any correlation with:
> - starting/stopping another VM?
> - whether the affected VMs have firewall rules?
>
> Also, check whether restarting the qubes-firewall service in sys-firewall
> helps (and check its status first).

I didn't notice any, but I'll check again if/when it recurs.


On 2016-11-26 15:28, Chris Laprise wrote:
> Check out this thread: https://groups.google.com/d/msgid/qubes-users/3aa66b77-9a06-83d8-d965-6583ef10d2a9%40gmail.com
>
> The author claims it's dependent on running Qubes in a VM, but the symptoms are about the same and the trigger is a switch to Fedora 24.
>
> My own problem with Fedora 24 is that the minimal template seems incapable of acting as a simple Qubes firewall. No time to troubleshoot it.
>
> You may want to switch to Debian for your service VMs... Versions 8 and 9 are working well for me.
>
> Chris
>

I did notice that other thread, but at a glance I thought it was about
a different issue. I'll give it a second look. The funny thing is that
fedora-24-minimal had been working fine as a firewall (at least as far
as I could tell) until just very recently, and fedora-24 (full) also
exhibited the same problem. If I can't get it resolved quickly on
Fedora, I'll certainly give Debian a try! :)

- --
Andrew David Wong (Axon)
Community Manager, Qubes OS
https://www.qubes-os.org
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJYOjnKAAoJENtN07w5UDAwziEP/A699pbXl884HraKrFCnP2oH
dYwL81u5zj0y+wuPmB4HjQojVGTPrG4WCGunN6gPtwfbPP3+MpigXNj97HfPG/iK
8n79KfROJI2EooEMxMG8pGq3+8egSZj6ZZrlAricyt82HcO2WLeN/TGMSVArrhR/
kw31OmWZN1r1si2tn+XsM9kzvxkI+WnYZts+MtNi+iPiN9qGXi8VBhDSe/5ZETm8
VzE50avSFeCoyDVtmYJVIO1DzI5JyQHZ2G0pHPCp0CcEgjdL22FuWKUoXotEbYvO
iavRN2W8SxG2K37TdKmTjJf72ZoHVKKdTlzsQHSVNcMfeTRRvv4D3O5pFoTCMIFz
MCA0/EsZIAZ7XEVHgxIOjBL/xUoq9ubmbfr2JVLTbr/ZcGS86fw/4nLGNTP2ASDo
Kpa83lhkMGzWBfDTZF65SucBYUUId6nqNDXedcRj9ejsAaNCQEIVH0Djt7Wo6RpF
2gAp6WOjsNZpS1chM9L4Dl/BdkSTFO45XhVTcu/3wLOt2Mn92N6mhrrex3o2CrSu
26k1D8iiwu8L71ovhr8DqQF/jhREjcewW81PNSKTqvP524vnogHpeHKAICo7VUT/
5h+rTexkpZ/ejqs59PS9z4GNVvLtkmP1jhs7iaVCy1IS+gGlPBBJYlQuecJq0ugP
NXKsfLF+UYtj67UlkLrj
=PEC4
-----END PGP SIGNATURE-----

Me

Nov 27, 2016, 3:56:12 AM
to qubes...@googlegroups.com, Marek Marczykowski-Górecki
Andrew David Wong:
> A strange networking problem just started in the past day or so:
>
> Every few hours, around 2/3 of my VMs will suddenly lose network
> access. I can still ping websites from sys-net and sys-firewall,
> and some VMs still have normal network access, even though all of
> them are using the same sys-firewall. (Other devices on my LAN are
> also fine.)
>
> The weird part is, if I create a new, additional "sys-firewall1"
> ProxyVM and switch over one of the non-working VMs to it
> *without restarting* the non-working VM, network access gets
> successfully restored. So, the problem must be in sys-firewall
> or the AppVMs, I think.
>
> I've tried basing sys-firewall on fedora-24 and fedora-24-minimal
> with the same results. Also double-checked NetVM assignments
> and firewall rules, of course.
>
> Any ideas for logs or tools I should check to find out what's
> failing, or where it's failing?
>
I had networking issues after downloading Fedora 24. I've ditched it
and gone back to Fedora 23; all is well again.


Eva Star

Nov 30, 2016, 1:19:17 PM
to qubes...@googlegroups.com
On 11/27/2016 02:04 AM, Marek Marczykowski-Górecki wrote:

> Do you see any correlation with:
> - starting/stopping another VM?
> - whether the affected VMs have firewall rules?
>
> Also, check whether restarting the qubes-firewall service in sys-firewall
> helps (and check its status first).
>

It seems I have the same issue! I think (maybe) it correlates with
update checks on dom0 or the templates. When Qubes runs that check, the
other VMs become unresponsive. I wrote about this in the Xen 4.6.3 thread.


--
Regards
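
One way to test a suspected correlation like this is to log exactly when an affected VM loses and regains connectivity, then compare those times with when update checks or VM starts/stops happened. The sketch below is only an illustration: `probe`, `watch_net`, the 10-second default interval, and the example target address are all assumptions, not anything Qubes provides.

```shell
#!/bin/sh
# Hypothetical connectivity logger (not a Qubes tool). Run it in an affected VM
# and compare its timestamps with when update checks or VM starts/stops happen.

# probe TARGET: succeed if one ICMP echo gets a reply within 2 seconds.
probe() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# watch_net TARGET [INTERVAL_SECONDS] [COUNT]
# Prints a timestamped line only when the state flips between up and down.
# COUNT=0 (the default) means loop until interrupted.
watch_net() {
    target=$1
    interval=${2:-10}
    count=${3:-0}
    last=
    n=0
    while [ "$count" -eq 0 ] || [ "$n" -lt "$count" ]; do
        if probe "$target"; then now=up; else now=down; fi
        if [ "$now" != "$last" ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') network $now"
            last=$now
        fi
        n=$((n + 1))
        sleep "$interval"
    done
}

# Example (the target address is an assumption; use any host you know answers ping):
#   watch_net 8.8.8.8 10 | tee -a ~/net-outages.log
```

Because a line is printed only on a state change, the log stays short even over days, and each "down"/"up" pair gives the start and length of one outage.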

Marek Marczykowski-Górecki

Dec 4, 2016, 9:19:33 AM
to Eva Star, qubes...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

More troubleshooting steps:

1. When the problem appears, run this in sys-firewall:

qubesdb-read /qubes-iptables-error

This should print the last error from a firewall reload. I guess it may be
a DNS resolution failure (if any rule uses a DNS name instead of an IP).
That shouldn't affect all the VMs, only the one for which name
resolution failed, but maybe something is wrong here.

2. Check the status and logs of the qubes-firewall service:

sudo systemctl status qubes-firewall

It should show "active (running)" and a series of "qubes-firewall[xxx]:
/qubes-iptables" messages. If you see anything else, let me know.

3. Restart the qubes-firewall service and see whether it helps:

sudo systemctl restart qubes-firewall

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJYRCXvAAoJENuP0xzK19cshjgH/iMyhpSkETZP6FyNKTuhlQaZ
T04mPxIpGtkVYnDythtCkhqmJXg+mrtJPOF8SUuVTh4cMC6SyFfkTdkjCjj+yBPo
elMvBV6OX5H261Jds2tmmBJ3OGQDWrLexp/q5aP/y6qehG/9Bc7rAo2Aj8YjeV0T
i5EwAG6b+YHfKfWDPR4HWSpfaBIHHfrvPABgVmWAcg8Onj5sKFGYEe5pYFMMe8k0
zjzEuZiUNZrQwygnaZHFI26vGpc8WMnavXDXXeMzo1yndhZKs9fcyptYvvu2dNSQ
kvm6akyArHpmv3bYetfx/cTKNLGClSG/XW5cQXPheHCZH0RHWtEUm0lQxeP8uhc=
=PQ/6
-----END PGP SIGNATURE-----
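
For anyone repeating the three steps above several times, they could be wrapped roughly as follows. This is a sketch under assumptions: `firewall_report` is a made-up name, `systemctl is-active` is used in place of reading the full `status` output, and an empty result from `qubesdb-read` is taken to mean that no error was recorded.

```shell
#!/bin/sh
# Hypothetical wrapper for steps 1 and 2, to be run inside sys-firewall.
# Only qubesdb-read and systemctl come from the steps above; the function
# name and the output wording are invented for this sketch.

firewall_report() {
    # Step 1: last firewall reload error (assumed: empty output = none recorded)
    err=$(qubesdb-read /qubes-iptables-error 2>/dev/null) || true
    if [ -n "$err" ]; then
        echo "last reload error: $err"
    else
        echo "no reload error recorded"
    fi

    # Step 2: short service state; "active" is the expected value
    state=$(systemctl is-active qubes-firewall 2>/dev/null) || true
    echo "qubes-firewall state: ${state:-unknown}"
}

# Step 3, with a report before and after so the two can be compared:
#   firewall_report
#   sudo systemctl restart qubes-firewall
#   firewall_report
```

The restart is deliberately left outside the function, so the report can also be run on its own while the problem is live without changing anything.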

Eva Star

Dec 4, 2016, 2:46:27 PM
to Marek Marczykowski-Górecki, qubes...@googlegroups.com
On 12/04/2016 05:19 PM, Marek Marczykowski-Górecki wrote:

> More troubleshooting steps:
>
> 1. When the problem appears, run this in sys-firewall:
>
> qubesdb-read /qubes-iptables-error
>
> This should print the last error from a firewall reload. I guess it may be
> a DNS resolution failure (if any rule uses a DNS name instead of an IP).
> That shouldn't affect all the VMs, only the one for which name
> resolution failed, but maybe something is wrong here.
>
> 2. Check the status and logs of the qubes-firewall service:
>
> sudo systemctl status qubes-firewall
>
> It should show "active (running)" and a series of "qubes-firewall[xxx]:
> /qubes-iptables" messages. If you see anything else, let me know.
>
> 3. Restart the qubes-firewall service and see whether it helps:
>
> sudo systemctl restart qubes-firewall
>

Okay, I made offline notes of this.



Eva Star

Dec 6, 2016, 12:01:26 PM
to qubes...@googlegroups.com
On 12/04/2016 05:19 PM, Marek Marczykowski-Górecki wrote:

One of my VMs lost network access again.

> qubesdb-read /qubes-iptables-error

The output was blank.

> sudo systemctl status qubes-firewall

The most recent call gave me this: https://i.imgur.com/KUkHODf.png
All the earlier calls showed the same thing:
https://i.imgur.com/UwfdUSI.png (5 minutes later and 9 minutes later;
the only difference is the time).

Then (sorry) I forgot about step 3 of the instructions and restarted my VM
instead. That helped. After that, the firewall status showed 15 lines of
output (versus 14 lines before):
https://i.imgur.com/JoIaZxN.png
See the last line! It appeared after I shut down the problem VM.

As I already wrote, I still think the problem is related to background
updates... Maybe a race condition or something similar freezes the
update task and all network access?



--
Regards

Andrew David Wong

Dec 6, 2016, 11:49:04 PM
to Eva Star, qubes...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
FWIW, `systemctl restart qubes-firewall` fixed it for me last time.

- --
Andrew David Wong (Axon)
Community Manager, Qubes OS
https://www.qubes-os.org
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJYR5S1AAoJENtN07w5UDAwHLEQAIa4xd2D1aRU262ZQ7wHrQVX
eyE6BdipuYKeJ8CmfEP7eo5pACFgyECjGhEZk3Zm/nsWnuIJvFYXhiej/iFWdtOh
CUvIpZ/lYzPWY+RgnJmxf2JwJ8mwwdT5WewTPqEJKmMGHeOyfV5l5NsB1s4ChBUX
dlR1Q0e4r2XWEnDeoeIRMdvVQJezWQ3GiMKvDTHBVwxt8jkbQxVwP2f1RuhvMk39
e7+luxLdSj3BAAt5jQUCIVTivVuUFBwQNrHU7ZOq+m5TrxsYN4srJOTVvMXFAXDb
aF21vAIvm+H3dxnQUEGm7UvbjVoM/KfABWHe9ROGm7mk6F/imF3E9AHbJ39yWWmY
vsWRE558N8qmNqMtaqLfW8aqtmnJfhq9bC8pAVjrtClKFnRzhPho6LG6QqmWbbn0
gPp6T3y5Qp0tom54MTte14jRVCF2HkLbOSUQ9g8M1RIc+A0eLP3eeIP0HMsXxAr5
s4++USG05dftv0mQ44zf2zmXR1NGSHSLi1TNHi4DZaSWAQQiGqSE7enYTHlj7Iu/
k89naxECNYbqnCKk/Nxh54jBv1gI1AxqeZ7oWGv0y084Zdx8H2sbglDerW6D3xrn
DLaSGOuQqRY5eVTN0PyJbpFhzBijLE1b4jf3SuB+XlNhvZ8A5vDQUUJDo6K8T9OL
BF5OM2TkSj6+sHhuW81H
=YV5S
-----END PGP SIGNATURE-----

Eva Star

Dec 9, 2016, 5:56:48 PM
to qubes...@googlegroups.com
On 12/07/2016 07:48 AM, Andrew David Wong wrote:

> FWIW, `systemctl restart qubes-firewall` fixed it for me last time.

Today one of my VMs lost network access twice. Each time I tried `systemctl
restart qubes-firewall`, and it did not fix the issue. It looks like the
issue is in the VM. Maybe fedora-25 will fix it...

--
Regards

Chris Laprise

Dec 9, 2016, 11:53:16 PM
to Eva Star, qubes...@googlegroups.com
Debian 8 & 9 have been working fine, BTW, and Debian has a more secure
update mechanism than Fedora.

Chris