Systemd terminating qubesd during backup?

Steve Coleman

Oct 11, 2021, 11:53:11 AM
to qubes-users
I seem to have an intermittent problem when my backup scripts run late at
night.

My qubesd is apparently being shut down (sent a SIGTERM) by systemd during
my long-running backup sessions. That causes an EOF/broken-pipe exception;
qvm-backup then receives a socket exception and, while still handling it,
immediately receives a second exception, so the qvm-backup process is
forcibly terminated mid-stream. Just prior to the qubesd shutdown I can
clearly see that systemd had also stopped and restarted the Qubes memory
manager (qubes-qmemman).

Q: What kind of background maintenance processing would actually require
qubesd or the memory manager to be restarted?

Q: Could this processing be put on hold during backups?

Q: Or, how could I at least know when this maintenance is scheduled to
happen so I could defer my own processing?

My scripts can certainly trap this error, erase the incomplete backup
file, then loop and check for qubesd to complete its restart, and then
finally restart my own backup processing, but why should this even be
necessary?

When this happens, it's almost always during the backup of my largest VM,
which can take well over 90 minutes to complete. If I could somehow block
or defer this kind of system maintenance until after my backups are
complete, that would be better than having to deal with trapping random
restarts.

thanks,

Steve
Attachments: backup_error.txt, journalctl.txt

Steve Coleman

Oct 11, 2021, 12:22:26 PM
to qubes-users
Sorry for the amount of text below, but Google Groups would not let me
send these as *.txt attachments:

qvm-backup error:

Making a backup... 56.28%
Connection to qubesd terminated, reconnecting in 1.0 seconds

Backup error: Got empty response from qubesd. See journalctl in dom0 for
details.

Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/qubesadmin/app.py", line 580,
in qubesd_call
    client_socket.connect(qubesadmin.config.QUBESD_SOCKET)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/qvm-run", line 5, in <module>
    sys.exit(main())
  File "/usr/lib/python3.5/site-packages/qubesadmin/tools/qvm_run.py",
line 199, in main
    args = parser.parse_args(args, app=app)
  File "/usr/lib/python3.5/site-packages/qubesadmin/tools/__init__.py",
line 384, in parse_args
    action.parse_qubes_app(self, namespace)
  File "/usr/lib/python3.5/site-packages/qubesadmin/tools/__init__.py",
line 170, in parse_qubes_app
    namespace.domains += [app.domains[vm_name]]
  File "/usr/lib/python3.5/site-packages/qubesadmin/app.py", line 91,
in __getitem__
    if not self.app.blind_mode and item not in self:
  File "/usr/lib/python3.5/site-packages/qubesadmin/app.py", line 114,
in __contains__
    self.refresh_cache()
  File "/usr/lib/python3.5/site-packages/qubesadmin/app.py", line 63,
in refresh_cache
    'admin.vm.List'
  File "/usr/lib/python3.5/site-packages/qubesadmin/app.py", line 583,
in qubesd_call
    'Failed to connect to qubesd service: %s', str(e))
qubesadmin.exc.QubesDaemonCommunicationError: Failed to connect to
qubesd service: [Errno 2] No such file or directory


journalctl output:
Oct 11 03:15:02 dom0 systemd[1]: Stopping Qubes memory management daemon...
Oct 11 03:15:02 dom0 systemd[1]: Stopped Qubes memory management daemon.
Oct 11 03:15:02 dom0 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295
ses=4294967295 msg='unit=qubes-qmemman comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 11 03:15:02 dom0 kernel: audit: type=1131 audit(1633936502.556:250):
pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-qmemman
comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=?
terminal=? res=success'
Oct 11 03:15:02 dom0 systemd[1]: Starting Qubes memory management daemon...
Oct 11 03:15:07 dom0 qmemman.daemon[18836]: MIN_PREFMEM=199229440
DOM0_MEM_BOOST=349175808 CACHE_FACTOR=1.3
Oct 11 03:15:07 dom0 systemd[1]: Started Qubes memory management daemon.
Oct 11 03:15:07 dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295
ses=4294967295 msg='unit=qubes-qmemman comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 11 03:15:07 dom0 kernel: audit: type=1130 audit(1633936507.784:251):
pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-qmemman
comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=?
terminal=? res=success'
Oct 11 03:15:07 dom0 qmemman.daemon.algo[18836]:
balance_when_enough_memory(xen_free_memory=53353416961,
total_mem_pref=0, total_available_memory=53353416961)
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat:
xenfree=53405845761 memset_reqs=[]
Oct 11 03:15:07 dom0 qmemman.daemon.algo[18836]:
balance_when_enough_memory(xen_free_memory=53353416961,
total_mem_pref=0, total_available_memory=53353416961)
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat:
xenfree=53405845761 memset_reqs=[]
Oct 11 03:15:07 dom0 qmemman.daemon.algo[18836]:
balance_when_enough_memory(xen_free_memory=53353416961,
total_mem_pref=2416060825.6000004, total_available_memory=55215546215.4)
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat: dom '0'
act=4278190080 pref=2416060825.6000004 last_target=4278190080
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat:
xenfree=53405845761 memset_reqs=[('0', 4294967296)]
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: mem-set domain 0 to
4294967296
Oct 11 03:15:07 dom0 qmemman.daemon.algo[18836]:
balance_when_enough_memory(xen_free_memory=53336639745,
total_mem_pref=2854994739.2000003, total_available_memory=58954139085.8)
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat: dom '0'
act=4294967296 pref=2416060825.6000004 last_target=4294967296
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat: dom '3'
act=4177526784 pref=438933913.6 last_target=4177526784
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat:
xenfree=53389068545 memset_reqs=[('3', 4194304000), ('0', 4294967296)]
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: mem-set domain 3 to
4194304000
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: mem-set domain 0 to
4294967296
Oct 11 03:15:07 dom0 systemd[1]: Stopping Qubes OS daemon...
Oct 11 03:15:07 dom0 qmemman.daemon.algo[18836]:
balance_when_enough_memory(xen_free_memory=53319862529,
total_mem_pref=2769393254.4, total_available_memory=59039740570.6)
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat: dom '0'
act=4294967296 pref=2330459340.8 last_target=4294967296
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat: dom '3'
act=4194304000 pref=438933913.6 last_target=4194304000
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat:
xenfree=53372291329 memset_reqs=[('3', 4194304000), ('0', 4294967296)]
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: mem-set domain 3 to
4194304000
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: mem-set domain 0 to
4294967296
Oct 11 03:15:07 dom0 qubesd[3180]: caught SIGTERM, exiting
Oct 11 03:15:08 dom0 widget-wrapper[7630]: Connection to qubesd
terminated, reconnecting in 1.0 seconds
Oct 11 03:15:09 dom0 kernel: audit: type=1131 audit(1633936508.097:252):
pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubesd
comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=?
terminal=? res=success'
Oct 11 03:15:08 dom0 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295
ses=4294967295 msg='unit=qubesd comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 11 03:15:09 dom0 widget-wrapper[7625]: Connection to qubesd
terminated, reconnecting in 1.0 seconds
Oct 11 03:15:09 dom0 widget-wrapper[7625]: Connection to qubesd
terminated, reconnecting in 1.0 seconds
Oct 11 03:15:08 dom0 qmemman.daemon.algo[18836]:
balance_when_enough_memory(xen_free_memory=53319862529,
total_mem_pref=2620735488.0, total_available_memory=59188398337.0)
Oct 11 03:15:09 dom0 qubesd[3180]: Traceback (most recent call last):
Oct 11 03:15:09 dom0 qubesd[3180]:   File
"/usr/lib64/python3.5/runpy.py", line 193, in _run_module_as_main
Oct 11 03:15:09 dom0 qubesd[3180]:     "__main__", mod_spec)
Oct 11 03:15:09 dom0 qubesd[3180]:   File
"/usr/lib64/python3.5/runpy.py", line 85, in _run_code
Oct 11 03:15:09 dom0 qubesd[3180]:     exec(code, run_globals)
Oct 11 03:15:09 dom0 qubesd[3180]:   File
"/usr/lib/python3.5/site-packages/qubes/tarwriter.py", line 179, in <module>
Oct 11 03:15:09 dom0 qubesd[3180]:     main()
Oct 11 03:15:09 dom0 qubesd[3180]:   File
"/usr/lib/python3.5/site-packages/qubes/tarwriter.py", line 169, in main
Oct 11 03:15:09 dom0 qubesd[3180]: copy_sparse_data(input_file, output,
sparse_map)
Oct 11 03:15:09 dom0 qubesd[3180]:   File
"/usr/lib/python3.5/site-packages/qubes/tarwriter.py", line 126, in
copy_sparse_data
Oct 11 03:15:09 dom0 qubesd[3180]: output_stream.write(buf[:read])
Oct 11 03:15:09 dom0 qubesd[3180]: BrokenPipeError: [Errno 32] Broken pipe
Oct 11 03:15:09 dom0 widget-wrapper[7637]: Connection to qubesd
terminated, reconnecting in 1.0 seconds
Oct 11 03:15:08 dom0 systemd[1]: Stopped Qubes OS daemon.
Oct 11 03:15:09 dom0 libvirtd[3204]: 2021-10-11 07:15:08.211+0000: 3204:
error : virNetSocketReadWire:1808 : End of file while reading data:
Input/output error
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat: dom '0'
act=4294967296 pref=2181801574.4 last_target=4294967296
Oct 11 03:15:08 dom0 systemd[1]: Starting Qubes OS daemon...
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat: dom '3'
act=4194304000 pref=438933913.6 last_target=4194304000
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat:
xenfree=53372291329 memset_reqs=[('3', 4194304000), ('0', 4294967296)]
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: mem-set domain 3 to
4194304000
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: mem-set domain 0 to
4294967296
Oct 11 03:15:08 dom0 qmemman.daemon.algo[18836]:
balance_when_enough_memory(xen_free_memory=53319862529,
total_mem_pref=2570187161.6, total_available_memory=59238946663.4)
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat: dom '0'
act=4294967296 pref=2131253248.0 last_target=4294967296
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat: dom '3'
act=4194304000 pref=438933913.6 last_target=4194304000
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat:
xenfree=53372291329 memset_reqs=[('3', 4194304000), ('0', 4294967296)]
Oct 11 03:15:09 dom0 widget-wrapper[7630]: Connection to qubesd
terminated, reconnecting in 1.0 seconds
Oct 11 03:15:09 dom0 widget-wrapper[7637]: Connection to qubesd
terminated, reconnecting in 1.0 seconds
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: mem-set domain 3 to
4194304000
Oct 11 03:15:09 dom0 widget-wrapper[7625]: Connection to qubesd
terminated, reconnecting in 1.0 seconds
Oct 11 03:15:09 dom0 widget-wrapper[7625]: Connection to qubesd
terminated, reconnecting in 1.0 seconds
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: mem-set domain 0 to
4294967296
Oct 11 03:15:08 dom0 qmemman.daemon.algo[18836]:
balance_when_enough_memory(xen_free_memory=53319862529,
total_mem_pref=2436657152.0, total_available_memory=59372476673.0)
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat: dom '0'
act=4294967296 pref=1997723238.4 last_target=4294967296
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat: dom '3'
act=4194304000 pref=438933913.6 last_target=4194304000
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat:
xenfree=53372291329 memset_reqs=[('3', 4194304000), ('0', 4294967296)]
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: mem-set domain 3 to
4194304000
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: mem-set domain 0 to
4294967296
Oct 11 03:15:10 dom0 widget-wrapper[7630]: Connection to qubesd
terminated, reconnecting in 1.0 seconds
Oct 11 03:15:10 dom0 widget-wrapper[7637]: Connection to qubesd
terminated, reconnecting in 1.0 seconds
Oct 11 03:15:10 dom0 widget-wrapper[7625]: Connection to qubesd
terminated, reconnecting in 1.0 seconds
Oct 11 03:15:10 dom0 widget-wrapper[7625]: Connection to qubesd
terminated, reconnecting in 1.0 seconds
Oct 11 03:15:11 dom0 widget-wrapper[7630]: Connection to qubesd
terminated, reconnecting in 1.0 seconds
Oct 11 03:15:11 dom0 widget-wrapper[7637]: Connection to qubesd
terminated, reconnecting in 1.0 seconds
Oct 11 03:15:11 dom0 widget-wrapper[7625]: Connection to qubesd
terminated, reconnecting in 1.0 seconds
Oct 11 03:15:11 dom0 widget-wrapper[7625]: Connection to qubesd
terminated, reconnecting in 1.0 seconds
Oct 11 03:15:11 dom0 systemd[1]: Started Qubes OS daemon.
Oct 11 03:15:11 dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295
ses=4294967295 msg='unit=qubesd comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 11 03:15:11 dom0 kernel: audit: type=1130 audit(1633936511.288:253):
pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubesd
comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=?
terminal=? res=success'

Rusty Bird

Oct 11, 2021, 2:43:42 PM
to Steve Coleman, qubes-users
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Steve Coleman:
> I seem to have an intermittent problem when my backup scripts are running
> late at night.
>
> My qubesd is apparently being shutdown (sent a sigterm signal) by systemd
> during my long running backup sessions which then causes an eof pipe close
> exception and qvm-backup then receives a socket exception and immediately
> receives a second exception while still handling the first exception, thus
> the qvm-backup process gets forcibly terminated mid stream. Just prior to
> the qubesd shutdown I can clearly see that systemd had also
> shutdown/restarted the qubes memory manager (qubes-qmemman) too.
>
> Q: What kind of background maintenance processing would actually require
> qubesd or the memory manager to be restarted?

It's crond running logrotate. Fixed in R4.1 but not yet in R4.0:
https://github.com/QubesOS/qubes-issues/issues/5004

> Q: Could this processing be put on hold during backups?

I always sandwich backup runs between 'systemctl stop crond' and
'systemctl start crond'.
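
Roughly, as a sketch (the qvm-backup arguments are placeholders for
whatever your script already passes):

    #!/bin/bash
    # Stop crond so its logrotate run can't restart qubesd mid-backup,
    # and make sure crond comes back even if the backup fails.
    set -e
    sudo systemctl stop crond
    trap 'sudo systemctl start crond' EXIT
    qvm-backup --yes /path/to/backups    # placeholder destination/options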

Rusty
-----BEGIN PGP SIGNATURE-----

iQKTBAEBCgB9FiEEhLWbz8YrEp/hsG0ERp149HqvKt8FAmFkfVZfFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldDg0
QjU5QkNGQzYyQjEyOUZFMUIwNkQwNDQ2OUQ3OEY0N0FBRjJBREYACgkQRp149Hqv
Kt/dHQ/+LXkv+7GT64IgMXpwQjKMuDMswRPsEvXpGihJYohSioeGm6HvUNQ1uh99
zwih5PqT1OnJrIfgzRvqLPfTbTpO3YrkQZi32TY848M14rg1A5NtTHIzOthJP2+x
puZy7hMcFvZR4awv3g/4WVLRsMrUVuQ5x3n/xzId5MAycWrsQtt8bRH+kc3VHgMn
Jul0TPzD1jnmN36Ngpgr5UGe4LrNfR4IcRaG5mzoI/4tIA4mtkcmm1AwqBnR3KrQ
0Gclbk/F5ud/Nj4EwIQDFWnrBa70Hru7HZ+wj7V7shJD2fMuQVsBcys8EVJjpGQp
qFzy6AfvdIHJ1BmtoJM+v8cqo2AU+kVdRx0n2cTXCkHM5eN1nCpc6EO81HoB6ypE
8773aw8l/Y5fCowT0p/y4c8jTUd3/YiNFpJSPiRfLJDAk4PunnynM9IPVY+GnJUX
6HB8tXDn6vx5Gq90f0UjNkcWonJ/bt5GMYGpWBtHGQyQ0dh+mxhmwnT5hYnfNMW+
+pAcBB4fRclebEroW1RcLqe7FVZ1dldNqZQIL6WlQgU8/I06/JNNZkhjmlpdBDX5
FwNGFw7b2ss+4A+LdC6Q4IJ1S2LyOeIQtCCX3UIffhcvOEMmJSxkHBb65e1kGAxe
5Lsfcjmq6rK1jCYMC5or177Mo9u+PPxWxvr5p4zocBu7VyUUCdM=
=zaUv
-----END PGP SIGNATURE-----

Steve Coleman

Oct 11, 2021, 4:24:48 PM
to qubes-users
On 10/11/21 14:07, Rusty Bird wrote:
> Steve Coleman:
> > I seem to have an intermittent problem when my backup scripts are running
> > late at night.
> >
> > My qubesd is apparently being shutdown (sent a sigterm signal) by systemd
> > during my long running backup sessions which then causes an eof pipe close
> > exception and qvm-backup then receives a socket exception and immediately
> > receives a second exception while still handling the first exception, thus
> > the qvm-backup process gets forcibly terminated mid stream. Just prior to
> > the qubesd shutdown I can clearly see that systemd had also
> > shutdown/restarted the qubes memory manager (qubes-qmemman) too.
> >
> > Q: What kind of background maintenance processing would actually require
> > qubesd or the memory manager to be restarted?
>
> It's crond running logrotate. Fixed in R4.1 but not yet in R4.0:
> https://github.com/QubesOS/qubes-issues/issues/5004
>
> > Q: Could this processing be put on hold during backups?
>
> I always sandwich backup runs between 'systemctl stop crond' and
> 'systemctl start crond'.
>
Great!

I was hoping for an answer like that. In my case the system will be
shutting down after the unattended backup completes, so this is actually
twice as easy!

thanks!

> Rusty
>


Marek Marczykowski-Górecki

Oct 12, 2021, 3:31:27 PM
to Steve Coleman, qubes-users
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Mon, Oct 11, 2021 at 11:52:59AM -0400, Steve Coleman wrote:
> I seem to have an intermittent problem when my backup scripts are running
> late at night.
>
> My qubesd is apparently being shutdown (sent a sigterm signal) by systemd
> during my long running backup sessions which then causes an eof pipe close
> exception and qvm-backup then receives a socket exception and immediately
> receives a second exception while still handling the first exception, thus
> the qvm-backup process gets forcibly terminated mid stream. Just prior to
> the qubesd shutdown I can clearly see that systemd had also
> shutdown/restarted the qubes memory manager (qubes-qmemman) too.
>
> Q: What kind of background maintenance processing would actually require
> qubesd or the memory manager to be restarted?

I guess that's logrotate (but it isn't clear to me why qubesd too, not
just the qubes-qmemman service...).

> Q: Could this processing be put on hold during backups?
>
> Q: Or, how could I at least know when this maintenance is scheduled to
> happen so I could defer my own processing?

If that's indeed logrotate, see `systemctl status logrotate.timer`

> My scripts can certainly trap this error, erase the incomplete backup file,
> then loop and check for qubesd to complete its restart, and then finally
> restart my own backup processing, but why should this even be necessary?
>
> When this happens its almost always during the backup of my largest VM which
> can take well over 90 minutes to complete.  If I can somehow block/defer
> this kind of system maintenance until after my backups are complete that
> would be better than having to deal with trapping random restarts.

Again, if that's logrotate, you can stop the timer before, and restart it
afterwards.
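
For example (assuming logrotate is indeed triggered by logrotate.timer on
your release; on R4.0 it may instead be run by crond, as noted elsewhere
in this thread):

    # see when the next run is scheduled (dom0)
    systemctl status logrotate.timer
    systemctl list-timers logrotate.timer

    # hold it for the duration of the backup, then re-enable it
    sudo systemctl stop logrotate.timer
    qvm-backup --yes /path/to/backups    # placeholder
    sudo systemctl start logrotate.timer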

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEhrpukzGPukRmQqkK24/THMrX1ywFAmFl4ocACgkQ24/THMrX
1ywC1gf8ChUCgVxe/SIHnVF8+hoA8/JoKlQTyddVGfjf5qp3a49vMtUbvMJ0hqWH
C5Ejz5dF+UxkqEkcXS1/oKlgnBIwal+GYEe1u2VwZyLmB9WV725ukzi8MYCOVKvM
amJuLGcD7vT2T6akTxnOLta+ofByJhSWgup7eEnXYfuOQ3hp8fKVWJ451QWbOafV
oY1TPJ/ptfl2ynUePyX8R990RnHS6V7LoBShbI5CB4e4g+6ogm9ibVC94nk9d+pD
PcIF9c+iC8jhVXMzkoP73KR7WEJ74WmrcefigoffwDLIjXwNPiQqwC4kGmq32rfG
7TVjl9Rv8DpUNfVTtDq+8xjOx3Ck3Q==
=6Alm
-----END PGP SIGNATURE-----

Andrew David Wong

Oct 12, 2021, 4:00:51 PM
to Marek Marczykowski-Górecki, Steve Coleman, qubes-users
This sounds like:

https://github.com/QubesOS/qubes-issues/issues/5004

I agree that it's a serious bug. It makes no sense for logrotate to
interrupt backups. Backups completing successfully and reliably is
infinitely more important than rotating log files.

It looks like the issue has been fixed in 4.1, but I'm still experiencing
it on 4.0 as well. I've just gotten into the habit of trying not to let my
backups run between ~1 and 6 am. :\

--
Andrew David Wong
Community Manager
The Qubes OS Project
https://www.qubes-os.org
