Kernel regression affecting VM network interface attach/detach operations

M. Vefa Bicakci

Feb 11, 2018, 5:07:15 PM
to qubes...@googlegroups.com
Hello all,

I noticed that the following commit causes a regression in QubesOS when
a VM's network interface is detached and re-attached a few times in a row:

5b5971df3bc2775107ddad164018a8a8db633b81
("xen-netfront: remove warning when unloading module")

The regression exhibits itself via VM unresponsiveness to qrexec commands
and heavier than usual CPU utilization involving the xenbus and xenwatch
kernel threads in the VM whose network interface had been detached/re-attached.
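One rough way to put numbers on the xenbus/xenwatch CPU usage from inside the affected VM is to sample each thread's accumulated CPU time from /proc/<pid>/stat twice and compare. This is a minimal sketch, assuming a Linux guest with bash and that the kernel threads show up under the exact names xenbus and xenwatch. The helper names pids_by_name, cpu_ticks and thread_cpu_percent are hypothetical, and the interval must be a whole number of seconds because bash arithmetic is integer-only.

```bash
#!/bin/bash
# Sketch only: estimate how much CPU the xenbus/xenwatch kernel threads
# use over a short interval. All function names here are hypothetical.

# Print the PIDs whose /proc/<pid>/comm exactly equals $1.
pids_by_name() {
    local f comm
    for f in /proc/[0-9]*/comm; do
        # The process may exit between the glob and the read.
        { read -r comm < "$f"; } 2>/dev/null || continue
        if [[ $comm == "$1" ]]; then
            f=${f#/proc/}
            echo "${f%/comm}"
        fi
    done
}

# Print utime+stime (in clock ticks) for PID $1, read from /proc/<pid>/stat.
cpu_ticks() {
    local stat rest
    stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
    # Field 2 (comm) is parenthesised and may contain spaces, so drop
    # everything through the last ") "; the remainder starts at field 3.
    rest=${stat##*") "}
    # utime and stime are overall fields 14 and 15, i.e. 12 and 13 here.
    set -- $rest
    echo $(( ${12} + ${13} ))
}

# Usage: thread_cpu_percent INTERVAL_SECONDS NAME...
# Prints "NAME PID PERCENT%" for every thread whose name matches.
thread_cpu_percent() {
    local interval=$1; shift
    local hz name pid before after
    hz=$(getconf CLK_TCK 2>/dev/null || echo 100)
    for name in "$@"; do
        for pid in $(pids_by_name "$name"); do
            before=$(cpu_ticks "$pid") || continue
            sleep "$interval"
            after=$(cpu_ticks "$pid") || continue
            # Percentage of one CPU used during the sampling window.
            echo "$name $pid $(( (after - before) * 100 / (hz * interval) ))%"
        done
    done
}
```

For example, `thread_cpu_percent 5 xenbus xenwatch` inside the VM prints one line per matching thread; comparing its output before and after the detach/re-attach loop gives a rough measure of the extra load. Each PID is sampled in turn, so the call takes about INTERVAL seconds per matching thread.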

This commit has been back-ported to the kernel trees for versions 4.9.80 and
4.14.17, which is why QubesOS's testing kernel (based on 4.14.18) is also
affected.
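A stable backport gets a different commit ID from the upstream commit, so running `git tag --contains` on 5b5971df3bc2775107ddad164018a8a8db633b81 will not find it in a stable tree. Stable backports typically carry a "commit <upstream-sha> upstream." line in their message, so one way to check which releases picked up the change is to search commit messages for the upstream ID. This is a sketch, assuming a local clone of the linux-stable repository with its tags fetched; the function name find_backports is hypothetical.

```bash
#!/bin/bash
# Sketch only: list commits whose message mentions an upstream commit ID,
# together with the earliest tag containing each one.
# Assumes stable backports reference the full upstream SHA in their message.

# Usage: find_backports REPO UPSTREAM_SHA
find_backports() {
    local repo=$1 upstream=$2 sha first_tag
    git -C "$repo" log --all --format=%H -F --grep="$upstream" |
    while read -r sha; do
        # Version-sort the tags so that, e.g., v4.14.17 sorts before v4.14.18.
        first_tag=$(git -C "$repo" tag --contains "$sha" | sort -V | head -n 1)
        echo "$sha ${first_tag:-<untagged>}"
    done
}
```

For instance, `find_backports ~/linux-stable 5b5971df3bc2775107ddad164018a8a8db633b81` (the path is a placeholder) would, if the report above is accurate, list backports first tagged in the 4.9.y and 4.14.y series. Searching `--all` branches of a full stable clone can take a while.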

I am currently using QubesOS R3.2, but I believe this issue affects R4.0 as well.

Reverting this commit appears to resolve the regression for the time being. I
haven't had the time to formulate a proper correction.
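To carry the revert described above in a local kernel tree, the commit to revert depends on the tree: in a mainline-based tree it is the upstream commit itself, while in a 4.9.y or 4.14.y stable tree it is the backport with its own ID. The sketch below tries the first case and falls back to searching commit messages for the upstream ID. It assumes the backport message references the upstream SHA, and the function name revert_upstream_commit is hypothetical.

```bash
#!/bin/bash
# Sketch only: revert an upstream commit, or its stable backport, on the
# currently checked-out branch of REPO. The function name is hypothetical.

# Usage: revert_upstream_commit REPO UPSTREAM_SHA
revert_upstream_commit() {
    local repo=$1 upstream=$2 target
    if git -C "$repo" merge-base --is-ancestor "$upstream" HEAD 2>/dev/null; then
        # Mainline-based tree: the upstream commit itself is on this branch.
        target=$upstream
    else
        # Stable tree: the backport has a different ID, but its message is
        # assumed to mention the upstream SHA.
        target=$(git -C "$repo" log -n 1 --format=%H -F --grep="$upstream" HEAD)
    fi
    if [[ -z $target ]]; then
        echo "no commit matching $upstream found on HEAD" >&2
        return 1
    fi
    git -C "$repo" revert --no-edit "$target"
}
```

For example, `revert_upstream_commit ~/linux 5b5971df3bc2775107ddad164018a8a8db633b81` (placeholder path) creates a "Revert ..." commit on the current branch, which can then be built like any other local patch.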

Vefa

Note: I can reproduce the issue with the following shell script in dom0.

Eventually, one of the 'qvm-run ... true' commands in the while loop
fails, and the VM in question stops responding to any further qrexec
commands (for example, to start new applications).

=== 8< ===
#!/bin/bash

netvm="sys-firewall"
vm="untrusted"

# Start from a freshly booted VM: kill it if it is already running.
if qvm-ls "${vm}" --raw-data state | grep -iqw running; then
    qvm-kill "${vm}"
fi

# Inside the VM (started on demand by -a), enable dynamic debug messages
# for xenbus and xen-netfront, then follow the kernel log in an xterm.
qvm-run -a -p -u root "${vm}" "echo 'file drivers/xen/xenbus/xenbus_probe*.c +pflmt' > /sys/kernel/debug/dynamic_debug/control"
qvm-run -a -p -u root "${vm}" "echo 'file drivers/net/xen-netfront.c +pflmt' > /sys/kernel/debug/dynamic_debug/control"
qvm-run -a -q -u root "${vm}" 'xterm -geometry 150x70 -e dmesg -w'

sleep 5

# Detach and re-attach the network interface until a qrexec call fails.
while true; do
    echo "Set netvm to none"
    qvm-prefs "${vm}" -s netvm none
    echo "Done"

    sleep 10

    # This command is expected to eventually fail
    # due to the regression.
    qvm-run -p "${vm}" true || break

    echo "Set netvm to ${netvm}"
    qvm-prefs "${vm}" -s netvm "${netvm}"
    echo "Done"

    sleep 10

    # This command is expected to eventually fail
    # due to the regression.
    qvm-run -p "${vm}" true || break
done
=== 8< ===

brenda...@gmail.com

Mar 2, 2018, 10:53:36 AM
to qubes-devel
On Sunday, February 11, 2018 at 5:07:15 PM UTC-5, m.v.b wrote:
> Hello all,
>
> I noticed that the following commit causes a regression in QubesOS when
> a VM's network interface is detached and re-attached a few times in a row:
>
> 5b5971df3bc2775107ddad164018a8a8db633b81
> ("xen-netfront: remove warning when unloading module")

Hi m.v.b:

Have you submitted this to the qubes-issues git project?
https://github.com/QubesOS/qubes-issues

I quickly searched for xenbus (from your POC) and network, but I didn't find anything in the right timeframe. Probably worth submitting.

Brendan

Simon Gaiser

Mar 2, 2018, 1:51:28 PM
to M. Vefa Bicakci, qubes...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

M. Vefa Bicakci:
> Hello all,
>
> I noticed that the following commit causes a regression in QubesOS when
> a VM's network interface is detached and re-attached a few times in a row:
>
> 5b5971df3bc2775107ddad164018a8a8db633b81
> ("xen-netfront: remove warning when unloading module")
>
> The regression exhibits itself via VM unresponsiveness to qrexec commands
> and heavier than usual CPU utilization involving the xenbus and xenwatch
> kernel threads in the VM whose network interface had been detached/re-attached.
>
> This commit has been back-ported to the kernel trees for versions 4.9.80 and
> 4.14.17, which is why QubesOS's testing kernel (based on 4.14.18) is also
> affected.

Thanks for tracking this down! And sorry that I missed your report
until now (otherwise I would already have included it in the latest
kernel PR).

> I am currently using QubesOS R3.2, but I believe this issue affects R4.0 as well.

I'm observing this behavior on R4.0.

I will check whether reverting fixes the problem for me.

> Reverting this commit appears to resolve the regression for the time being. I
> haven't had the time to formulate a proper correction.

Did you find out yet why this change is broken?
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEE3E8ezGzG3N1CTQ//kO9xfO/xly8FAlqZnTsACgkQkO9xfO/x
ly+XJBAAq3D4TT3+k6v7U5zWe6VqyxuSZOYHMrdUmNUTRp9Q+6+48jK7/gn1QkLX
VY8i/eIqgqn+OAIWi2oFmT9gtWtP+++cml90OY+BLeQg6C0ZnszHgHzISDFYWeAQ
7aHDBBmP+b3B4Jf2aWFHLeZjbyURJIF9z00xGfb7NAcHjFHlSJXAt19GMevjcy0H
G/rxpsW2ebgrnN3YVIJhqgCJrnpr3uoSWL3PozumnnFaxUDuxv/3MaVy5bJJTdz8
3ndcAN7rn35j8hB+Ionc47IBbzfRsfLSBYPUA01iaQIEnkK+55tEQlLDW8LCyrnc
dlMliCoWAqjkr1gruNI+oMnDq/Zm1d7qBzI7+ZvDPui3GFV4C+/A3Bca4bak5tVb
u/P6pfCMpE/StFN9/CttImfjqEG72rIOq4lHKYOZ7pdC4N3nUYVlLOYDvmVOoEwy
SuLhVoMAn+QWk48FSSioA72N7hh/Y70GjDb+I1JhfHnkas9AJ62b0dw46TNgo2jR
FOyCokgRTaxidGuIoKMTo6v0MljVnlC907H2Dn/et0lpuiUch71jv/S7SjCnzEGr
tExeMOjFGo5vgE6nyaoQQuT/PZXJMUHHOLtYwljM/MslqaW1R04WbDxKX389x8uF
5QAm4t334lk6JgIQTEz+Zk92Wvchd78WqWGwUE6RcWIOxb+rtVo=
=az44
-----END PGP SIGNATURE-----

M. Vefa Bicakci

Mar 4, 2018, 6:06:02 PM
to Simon Gaiser, qubes...@googlegroups.com
On 03/02/2018 01:51 PM, Simon Gaiser wrote:
>> Reverting this commit appears to resolve the regression for the time being. I
>> haven't had the time to formulate a proper correction.
>
> Did you find out yet why this change is broken?

Unfortunately, no, I haven't had the time to investigate further.

Vefa

Zrubi

Mar 5, 2018, 3:53:09 AM
to qubes...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
My experience shows that there are at least two different Xen-related
problems:

- 4.14.18
Has the high CPU usage effect that Vefa reported before.

- 4.14.13 and above
Has the network attach/detach problem I described in a separate thread:
https://groups.google.com/d/msgid/qubes-devel/05031ade-b019-986e-e378-32cc8fff916e

Qubes R3.2 is affected for sure.

--
Zrubi
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEw39Thm3rBIO+xeXXGjNaC1SPN2QFAlqdBWIACgkQGjNaC1SP
N2Tc0A//dfrBVj+aet87jf2C4uca2O1eclpGb8QUKTAk0TzKc09Rja0H0+pNtMh6
4J8kI9OBg4Qr1UXIK7GqW57mjrQ+RSl1yg8Pvc+kmDzBzPWX2JI4KfDqWXhSuK+j
jRzqwHEJojkmQfcdiBDzkpBXWNFoum4UY34iovCQcqBXlfthQ5sNHTL5tgYSVvbl
TIMih1gM+Bwee9vRwV8wWW6OxiIxaWrsZOB54m6zfJAMOgRh9WjhQe3Agfjm7eWv
wveRMmk1FkANOMiE+uo9me60BgL6rbqQKtX8gx6umY7+PMwPMtfnwu4AghyHVXzN
eoCshVbPWBIaEiI4TyqO/pJV1wPa1joUJZnDGe9myHHcjbqoFGE0v5xpyBpIt00P
nudJddRoyD8wPEZcTEZ9/KLfzq3FTi3y9qTjIeuhsIhNSWrOtIVahRS4TlsWePoU
Ru9SrqFjXABAwjjjAzVcuLjD8nr6s/57GU4WfCmE2m5C70yQk8HhBQuQTAPSmSiF
pxJJvsIZpF+sOba/8U0XMFtXPtNezsYH4UVMYL4fjk3QaLXFc6L4XwoDWTopfHWj
XuJPmoDZdQ1jYXIutZt9+eqLZD0+Nk577kDEqJln3bCg9AYCqRz65k31UvtIAQ6y
ZdtTkAiq8CMLlegEDnTa59wXSxcnBPOx51bBqDyYFXlDRYD0I6k=
=62AC
-----END PGP SIGNATURE-----