High dom0 CPU usage by qubesd


Vít Šesták

Jan 4, 2021, 5:51:23 PM
to qubes-users
Hello,
I have a dual-core i7-7500U with hyperthreading disabled. In dom0, total CPU usage is often in the tens of percent (frequently about 50 %, i.e., roughly one fully utilized core). htop in dom0 shows that qubesd is responsible: it uses the vast majority of CPU during these peaks. The peaks look rather random; I see no relation to any activity. But they are quite frequent.

In the process tree, qubesd has many child processes, probably one for each domU qube, but they use nearly zero CPU.

The TIME+ column confirms my CPU% observation over the long term.

I am not sure where to find any relevant log. Maybe journalctl, but I have seen nothing suspicious there.

Do you have any idea about the cause, solution or even a suggestion for debugging?

Regards,
Vít Šesták 'v6ak'

Vít Šesták

Jan 6, 2021, 5:03:34 AM
to qubes-users
Hi,
I have some further info. I partially know the cause and have a workaround.

Here is my investigation. Some minor inaccuracies might be caused by writing it up retrospectively:

1. I have tried to debug using strace. (Prerequisite: sudo qubes-dom0-update strace) After finding the PID of qubesd, I ran:
sudo strace -s 256 -p PID_OF_QUBESD -o /tmp/qubesd.log

It looks like a few seconds are enough to get a reasonable sample, see below.

2. I ran sort /tmp/qubesd.log | uniq -c | sort -n (one can also add “ -r | head -n 50”).

I have noticed an interesting line that repeats frequently:
sendto(270, "QubesNoSuchPropertyError\0", 25, 0, NULL, 0) = 25
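
The count in step 2 treats lines that differ only in a file descriptor or a length as distinct. A minimal sketch of a variant that masks digit runs before counting follows; the function name top_repeated_lines and the masking step are assumptions added for illustration, not part of the original investigation.

```shell
# Hypothetical helper (not from the thread): list the most frequent lines
# of an strace log, after masking digit runs so that calls differing only
# in a file descriptor or a length are counted together.
# $1: path to the log, $2: how many lines to show (default 50)
top_repeated_lines() {
    sed -E 's/[0-9]+/N/g' "$1" | sort | uniq -c | sort -rn | head -n "${2:-50}"
}
```

Usage, assuming the log from step 1: top_repeated_lines /tmp/qubesd.log 20. Masking also hides numbers inside strings, so the unmasked pipeline from step 2 remains useful for reading exact values.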

3. Look closer:
$ grep --before=5 --after=5 QubesNoSuchPropertyError /tmp/qubesd.log
The output contains many repeated occurrences of this, just with a different VM name. It seems to iterate over all the VMs (even those that are not running):
--
epoll_wait(3, [], <some_number>, 0)                = 0
getpid()                                = <some_number>
epoll_wait(3, [], <some_number>, 0)                = 0
epoll_wait(3, [], <some_number>, 0)                = 0
sendto(<some_number>, "2\0", 2, 0, NULL, 0)       = 2
sendto(<some_number>, "QubesNoSuchPropertyError\0", 25, 0, NULL, 0) = 25
sendto(<some_number>, "\0", 1, 0, NULL, 0)        = 1
sendto(<some_number>, "Invalid property 'internal' of <some-vm-name>\0", 38, 0, NULL, 0) = 38
shutdown(<some_number>, SHUT_WR)                  = 0
epoll_wait(3, [{EPOLLIN, {u32=<some_number>, u64=<some_number>}}], 18, 0) = 1
close(<some_number>)                              = 0
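
To see which properties trigger these error replies, and how often, the log can be summarized per property name. This is only a sketch: the function name invalid_property_counts is hypothetical, and it assumes the message format shown in the excerpt above ("Invalid property '<name>' of <vm>").

```shell
# Hypothetical helper (not from the thread): count "Invalid property"
# replies per property name in an strace log. The log must have been
# captured with a large enough -s value for the message not to be cut off.
# $1: path to the log
invalid_property_counts() {
    sed -n "s/.*Invalid property '\([^']*\)'.*/\1/p" "$1" \
        | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

Usage, assuming the log from step 1: invalid_property_counts /tmp/qubesd.log. Each output line is a count followed by a property name.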

4. WTF, what would iterate over all the VMs? Maybe some script repeatedly runs qvm-ls? Let's ps aux | grep qvm-ls that! During increased CPU workload, I have identified:

qvm-ls --no-spinner --raw-data --fields NAME,FLAGS

5. During the current random CPU workload, I cannot reliably verify if it is the cause of the increased CPU usage, but at least I can verify if it is the cause of the error messages. So, I have tried the command while running this:

(sudo strace -s 256 -p PID_OF_QUBESD 2>&1) | grep 'Invalid property'

And yes, it seems to be the cause of the error messages and maybe also the source of increased CPU load.

6. Let's identify the script that runs the command: I ran htop, switched to tree mode (key t), waited for the qvm-ls (using watch + ps aux + grep) and typed “/qvm-ls”.

And the script to blame is – qubes-i3status

7. And yes, killing qubes-i3status has helped to decrease the CPU load. After doing that, I was able to confirm that qvm-ls --no-spinner --raw-data --fields NAME,FLAGS also causes the CPU load.


So, there are multiple causes combined:

* I have many VMs in my computer.
* I use i3 with qubes-i3status
* The script qubes-i3status calls command qvm-ls --no-spinner --raw-data --fields NAME,FLAGS quite frequently.
* The command qvm-ls --no-spinner --raw-data --fields NAME,FLAGS seems to cause high CPU load. Unfortunately, the process that shows the high CPU usage is qubesd, not qvm-ls.

What can be improved:

a. Don't use qubes-i3status. Problem solved.
b. Optimize qvm-ls. Not sure how hard it is.
c. Optimize qubes-i3status. I am not sure about the ideal way of doing that, but clearly running qvm-ls --no-spinner --raw-data --fields NAME,FLAGS just to compute the number of running qubes is far from optimal. One could add --running. And maybe it could have been written without flags. The script just ignores VMs with the first flag being “0” (maybe in order to ignore dom0) and the second flag being “r” (probably not needed with --running).
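
The filtering described in (c) could look roughly like the sketch below. It is not the code from qubes-i3status. It assumes that the raw output has one NAME|FLAGS pair per line separated by "|", that the first flag is "0" only for dom0, and that the second flag is "r" for a running qube. The function name count_running_qubes is hypothetical.

```shell
# Hypothetical helper (not the code from qubes-i3status): count running
# qubes from "NAME|FLAGS" lines on stdin.
# Assumptions: fields are separated by "|", the first flag is "0" only for
# dom0, and the second flag is "r" for a running qube.
count_running_qubes() {
    awk -F'|' 'substr($2, 1, 1) != "0" && substr($2, 2, 1) == "r" { n++ }
               END { print n + 0 }'
}
```

Usage, if the installed qvm-ls accepts --running as suggested above: qvm-ls --no-spinner --raw-data --running --fields NAME,FLAGS | count_running_qubes. Whether this actually reduces the load on qubesd would need to be measured.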

Regards,
Vít Šesták 'v6ak'

Jarrah

Jan 6, 2021, 5:34:42 AM
to qubes...@googlegroups.com
This is some really nice tracing work. I'm sure it would be appreciated
as an issue in the qubes-issues repository so it can be tracked properly.

While I haven't gone to the same depth, I can confirm that `qubesd`
jumps to ~25% CPU regularly on my (albeit much beefier) system with i3.
This does correlate with qubes-i3status running on my system as well.


As a temporary workaround, you could modify the script
(/usr/bin/qubes-i3status:123) to run every minute or longer. This would
have the downside of the clock updating slower, but otherwise should not
be a problem.
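
A variation on this idea is to slow down only the expensive part, so the clock keeps updating at its usual rate. The sketch below is not the contents of /usr/bin/qubes-i3status; CACHE_TTL, expensive_count and refresh_count are hypothetical names, and the 60-second interval is an assumption.

```shell
# Hypothetical sketch (not the actual qubes-i3status code): refresh an
# expensive value at most once per CACHE_TTL seconds.
CACHE_TTL=60      # assumed refresh interval in seconds
cached_count=""   # last value returned by expensive_count
cached_at=0       # epoch seconds of the last refresh

# Placeholder for the expensive call; replace the body with the real
# pipeline (for example one built around qvm-ls).
expensive_count() {
    echo 0
}

# Updates cached_count in the current shell. Call it directly, not inside
# $(...), because a subshell would discard the updated variables.
refresh_count() {
    now=$(date +%s)
    if [ -z "$cached_count" ] || [ $((now - cached_at)) -ge "$CACHE_TTL" ]; then
        cached_count=$(expensive_count)
        cached_at=$now
    fi
}
```

In a status loop, one would call refresh_count on every iteration and print "$cached_count" next to the clock; qubesd is then queried only when the cached value is older than CACHE_TTL.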

Alternatively, if the number of running VMs doesn't interest you, you
could comment out line 113 and modify 122 to suit this.

Vít Šesták

Jan 14, 2021, 1:49:02 PM
to qubes-users
OK, reported, some optimization attempts included: https://groups.google.com/g/qubes-users/c/uTi3QHuhdy8

Also, I am tempted to reimplement qubes-i3status as a Python wrapper around the original i3status. That would probably also resolve some other problems. For example, I had to fix reading of the battery status.

BTW, if you have ~25% CPU load, I guess you just have a quad-core CPU (or maybe a dual-core one with hyperthreading).

Regards,
Vít Šesták 'v6ak'

David Hobach

Jan 15, 2021, 11:40:38 AM
to Vít Šesták, qubes-users
Hi Vit,

> * I have many VMs in my computer.
> * I use i3 with qubes-i3status
> * The script qubes-i3status calls command qvm-ls --no-spinner --raw-data
> --fields NAME,FLAGS quite frequently.
> * The command qvm-ls --no-spinner --raw-data --fields NAME,FLAGS seems to
> cause high CPU load. Unfortunately, the process that shows the high CPU
> usage is qubesd, not qvm-ls.
>
> What can be improved:
>
> a. Don't use qubes-i3status. Problem solved.
> b. Optimize qvm-ls. Not sure how hard it is.

This issue is really old (it dates back to at least 3.2) and is caused by each qvm-ls output line corresponding to one request to qubesd. It was actually even worse in 3.2.

It should improve with 4.1 though, see [1].

[1] https://github.com/QubesOS/qubes-issues/issues/3293

> c. Optimize qubes-i3status. I am not sure about the ideal way of doing
> that, but clearly running qvm-ls --no-spinner --raw-data --fields
> NAME,FLAGS just to compute the number of running qubes is far from optimal.
> One could add --running. And maybe it could have been written without
> flags. The script just ignores VMs with the first flag being “0” (maybe in
> order to ignore dom0) and the second flag being “r” (probably not needed
> with --running).

Filtering might work in the meantime, yes.

BR
David

Vít Šesták

Jan 18, 2021, 8:39:09 AM
to qubes-users
BTW, I've started the reimplementation of qubes-i3status as a Python wrapper around i3status. I am trying to be quite conservative: with the default settings, there should be no visible difference apart from lower CPU load, no periodic freezes, and bug fixes (battery status).

* Some indicators (battery, load and time) are already present, they just need some adjustments of the format in order to be a drop-in replacement.
* Disk status was easy to implement. I just need to verify that it can properly handle the change of default pool.
* Running qubes: I need to study the events deeper…
* NetVM status – currently, it is disabled and discouraged. I might decide to reimplement this, but I am not 100% sure right now.
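
The general wrapper idea can be illustrated in shell, although the implementation described here is in Python and may work quite differently. The sketch assumes that i3status is configured for plain-text output (one status line per update) and that a separate command produces the Qubes-specific block; wrap_status and qubes_block are hypothetical names.

```shell
# Hypothetical sketch (not the author's Python implementation): read status
# lines on stdin and prepend the output of a command that produces an
# extra block.
# $1: name of a command or shell function printing the extra block
wrap_status() {
    prefix_cmd="$1"
    while IFS= read -r line; do
        printf '%s | %s\n' "$("$prefix_cmd")" "$line" || return 1
    done
}
```

Usage, assuming a function qubes_block that prints the number of running qubes: i3status | wrap_status qubes_block. The indicators that i3status already provides (battery, load, time) pass through unchanged, and only the extra block needs custom code.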

Regards,
Vít Šesták 'v6ak'

Vít Šesták

Jan 19, 2021, 12:48:09 PM
to qubes-users
Although my implementation is not fully complete, I have decided to share my progress: https://github.com/v6ak/qubes-i3status-dir

It is available under a WTFPL-like license.

Regards,
Vít Šesták 'v6ak'