Qubes R4.0 broken by "TypeError: not enough arguments..." for most qvm-* commands

Pablo Di Noto

Apr 10, 2018, 4:29:48 PM
to qubes-users
Hello,

I am running a Qubes R4.0 install on a Thinkpad X250, fully `*-testing` updated. After restoring most of my R3.2 qubes and getting almost ready to switch fully to the new version, something happened with `qubesd` or `qubes-db-dom0`.

Most `qvm-*` commands fail with:

```
Traceback (most recent call last):
  File "/usr/bin/qvm-start", line 5, in <module>
    sys.exit(main())
  File "/usr/lib/python3.5/site-packages/qubesadmin/tools/qvm_start.py", line 179, in main
    domain.start()
  File "/usr/lib/python3.5/site-packages/qubesadmin/vm/__init__.py", line 100, in start
    self.qubesd_call(self._method_dest, 'admin.vm.Start')
  File "/usr/lib/python3.5/site-packages/qubesadmin/base.py", line 68, in qubesd_call
    payload_stream)
  File "/usr/lib/python3.5/site-packages/qubesadmin/app.py", line 483, in qubesd_call
    return self._parse_qubesd_response(return_data)
  File "/usr/lib/python3.5/site-packages/qubesadmin/base.py", line 102, in _parse_qubesd_response
    raise exc_class(format_string, *args)
  File "/usr/lib/python3.5/site-packages/qubesadmin/exc.py", line 29, in __init__
    message_format % tuple(int(d) if d.isdigit() else d for d in args),
TypeError: not enough arguments for format string
```

The first occurrence was when attempting to restore a backup:
```
qubesadmin.backup: Error restoring VM p-android, skipping: not enough arguments for format string
```

When the issue started, all qubes that were already running kept operating, and `qvm-run <qube> <command>` worked fine on them. Other commands like `qvm-ls` also worked.

After a reboot, none of the ServiceVMs start, so the whole system is broken.

I did some analysis of the traceback, and the error seems to be simply that an exception is raised without the arguments its format string expects. But I do not know how to enable more debugging or provide more specific data.
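For reference, the same error is trivially reproducible in plain Python when a %-format string that expects arguments is applied to an empty tuple, which matches the last frame of the traceback above (the string below is just an illustration, not the actual one qubesd sent):

```
# Illustration of the failure mode in the last traceback frame: a %-format
# string that expects an argument, formatted with an empty args tuple.
args = ()  # the qubesd reply apparently carried no arguments
message_format = "Error doing something with VM %s"  # hypothetical string
message_format % tuple(int(d) if d.isdigit() else d for d in args)
# -> TypeError: not enough arguments for format string
```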

I tried restarting the `dom0` daemons and looking into /var/log/qubes/* and /var/log/libvirt/*, without finding any clues about what is failing.

The only change from the stock install is that I had to `lvextend` the default `pool00` volume to get enough disk space; the extension segment lives on a second, permanently attached SSD.
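For completeness, that extension was along these general lines (device name and size below are placeholders, not my exact commands; qubes_dom0 is the default R4.0 volume group):

```
# Extend the default thin pool onto a second disk:
sudo pvcreate /dev/sdb1                    # placeholder device
sudo vgextend qubes_dom0 /dev/sdb1
sudo lvextend -L +100G qubes_dom0/pool00   # placeholder size
```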

Should I raise an issue?
Any debugging tips?

Cheers,
///Pablo

Pablo Di Noto

Apr 10, 2018, 8:44:35 PM
to qubes-users
It seems that there are other users with similar issues: https://github.com/QubesOS/qubes-issues/issues/3810
(not 100% sure it is the same issue, but I have seen that message from the UI tools while having this problem)

Pablo Di Noto

Apr 10, 2018, 9:16:09 PM
to qubes-users
There is also a possible reason for this to happen, as described [here](https://github.com/QubesOS/qubes-issues/issues/3809).

techg...@gmail.com

Apr 11, 2018, 11:20:36 PM
to qubes-users
On Wednesday, 11 April 2018 08:29:48 UTC+12, Pablo Di Noto wrote:
> Hello,
>
> Any debugging tips?

You have almost exactly described my scenario.

You can work around the TypeError exception by editing /usr/lib/python3.5/site-packages/qubesadmin/exc.py, as I did. Please note, however, that the lines _must_ be indented with spaces (not tabs), as Python 3 is very particular about indentation.
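One possible form of such a workaround is sketched below; the exact surrounding code in exc.py may differ between versions, so treat this as an illustration rather than the precise edit I made:

```
# Sketch of a defensive __init__ for the exception class in
# /usr/lib/python3.5/site-packages/qubesadmin/exc.py: fall back to the raw
# format string when the arguments do not match, instead of raising the
# TypeError that breaks every qvm-* tool. (Illustrative; the real class
# contains more than this.)
class QubesException(Exception):
    def __init__(self, message_format, *args):
        try:
            message = message_format % tuple(
                int(d) if d.isdigit() else d for d in args)
        except TypeError:
            message = message_format  # missing/mismatched arguments
        super().__init__(message)
        self.format_string = message_format
```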

If, as I suspect, the root cause of your problem is a lack of metadata space on pool00, you can confirm this by typing `sudo lvs` into a console. You will then need to figure out a way to enlarge that metadata volume.
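Something along these lines should show it (the volume group name depends on the install; qubes_dom0 is the R4.0 default):

```
# Check the thin pool's metadata usage; look at the Meta% / metadata_percent
# column for pool00:
sudo lvs -a -o lv_name,lv_size,data_percent,metadata_percent qubes_dom0
```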

awokd

Apr 12, 2018, 5:21:12 AM
to techg...@gmail.com, qubes-users
That's certainly a non-intuitive failure mode. How did you find that? I'm not experiencing it myself, but what would one look for in `sudo lvs`? Meta% at 100% on one of the pools?


Pablo Di Noto

Apr 12, 2018, 11:19:10 AM
to qubes-users
> techg...@gmail.com

> If, as I suspect, the root cause of your problem is a lack of metadata space on pool00, you can confirm this by typing `sudo lvs` into a console. You will then need to figure out a way to enlarge that metadata volume.

Yes, you are right, the `pool00` volume metadata was at >96% when this happened. The thing is that the metadata volume was set to quite a small size at install time (96MB on a 46GB pool) and sat at ~20% usage right after install. I started to use the system, testing stuff with DispVMs, restoring my Debian templates and some work VMs. After a couple of days of usage the metadata had climbed very little, to 27-28%.

I tried to have a second pool to hold my machines, precisely to avoid issues with thin provisioning on the pool holding `root`, `swap` and the service VMs. But the lack of support for cloning/moving between pools made that effort moot.

So I `lvextend`ed `pool00` and forgot to properly enlarge its `pool00_tmeta` counterpart.

While doing some more customization, including restoring more (and larger) qubes and cloning/renaming qubes, it seems the metadata usage climbed really fast and hit this bug.

Unfortunately, I could not recover from that.

It looks like the Qubes LVM actions performed while the metadata was full may have corrupted the metadata somehow: I could enlarge and repair the thin metadata from a live CD, but many of the volumes that were in use were never available again. The -private and -snap volumes for the qubes that were running (not sure how to discard them), and also all the volumes of the qubes being restored and of the service VMs, are lost ("NOT available" as LVM status).
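For reference, the repair from the live CD was roughly of this shape (generic thin-pool repair commands, with the default qubes_dom0 VG name assumed; not an exact transcript of what I ran):

```
# Generic shape of repairing thin pool metadata from a rescue environment:
sudo vgchange -an qubes_dom0               # the pool must be inactive for repair
sudo lvconvert --repair qubes_dom0/pool00  # rebuild the metadata from a scan
sudo vgchange -ay qubes_dom0               # reactivate, then check with 'lvs -a'
```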

I remember there was some Saltstack magic to recreate the service VMs, but I could not find anything for R4.0... So I had to revert to R3.2 for the time being.

I will keep the failing install for debugging, or I may be able to recover it if someone can provide tips on:

- How to recreate sys-net, sys-firewall and sys-usb on an R4.0 system
- How to recover a qube whose -snap volumes are no longer available (I have no problem losing that short-term data)

Thanks for pointing me in the right direction!

Marek Marczykowski-Górecki

Apr 12, 2018, 2:44:40 PM
to Pablo Di Noto, qubes-users

On Thu, Apr 12, 2018 at 08:19:10AM -0700, Pablo Di Noto wrote:
> > techg...@gmail.com
> > If, as I suspect, the root cause of your problem is a lack of metadata space on pool00, you can confirm this by typing `sudo lvs` into a console. You will then need to figure out a way to enlarge that metadata volume.
>
> Yes, you are right, the `pool00` volume metadata was at >96% when this happened. The thing is that the metadata volume was set to quite a small size at install time (96MB on a 46GB pool) and sat at ~20% usage right after install. I started to use the system, testing stuff with DispVMs, restoring my Debian templates and some work VMs. After a couple of days of usage the metadata had climbed very little, to 27-28%.
>
> I tried to have a second pool to hold my machines, precisely to avoid issues with thin provisioning on the pool holding `root`, `swap` and the service VMs. But the lack of support for cloning/moving between pools made that effort moot.
>
> So I `lvextend`ed `pool00` and forgot to properly enlarge its `pool00_tmeta` counterpart.

What sizes do you have there? For me tmeta is 118MB for a ~450GB pool00, and after a few months of usage it's still at 33%...

> While doing some more customization, including restoring more (and larger) qubes and cloning/renaming qubes, it seems the metadata usage climbed really fast and hit this bug.
>
> Unfortunately, I could not recover from that.
>
> It looks like the Qubes LVM actions performed while the metadata was full may have corrupted the metadata somehow: I could enlarge and repair the thin metadata from a live CD, but many of the volumes that were in use were never available again. The -private and -snap volumes for the qubes that were running (not sure how to discard them), and also all the volumes of the qubes being restored and of the service VMs, are lost ("NOT available" as LVM status).

You could also try to revert to an earlier revision, using "qvm-volume revert sys-net:private" for example.
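For example, something like (the revision listing output may vary by version and pool driver):

```
# Inspect the available revisions of a volume, then revert to the latest one:
qvm-volume info sys-net:private
qvm-volume revert sys-net:private
```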

> I remember there was some Saltstack magic to recreate the service VMs, but I could not find anything for R4.0... So I had to revert to R3.2 for the time being.

https://www.qubes-os.org/doc/salt/
Especially links at the bottom:
https://github.com/QubesOS/qubes-mgmt-salt-dom0-virtual-machines/blob/master/README.rst

> I will keep the failing install for debugging, or I may be able to recover it if someone can provide tips on:
>
> - How to recreate sys-net, sys-firewall and sys-usb on an R4.0 system
> - How to recover a qube whose -snap volumes are no longer available (I have no problem losing that short-term data)
>
> Thanks for pointing me in the right direction!
>


--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

Pablo Di Noto

Apr 12, 2018, 4:20:36 PM
to qubes-users
> > > If, as I suspect, the root cause of your problem is a lack of metadata space on pool00, you can confirm this by typing `sudo lvs` into a console. You will then need to figure out a way to enlarge that metadata volume.
> >
> > Yes, you are right, the `pool00` volume metadata was at >96% when this happened. The thing is that the metadata volume was set to quite a small size at install time (96MB on a 46GB pool) and sat at ~20% usage right after install. I started to use the system, testing stuff with DispVMs, restoring my Debian templates and some work VMs. After a couple of days of usage the metadata had climbed very little, to 27-28%.
> >
> > I tried to have a second pool to hold my machines, precisely to avoid issues with thin provisioning on the pool holding `root`, `swap` and the service VMs. But the lack of support for cloning/moving between pools made that effort moot.
> >
> > So I `lvextend`ed `pool00` and forgot to properly enlarge its `pool00_tmeta` counterpart.
>
> What sizes do you have there? For me tmeta is 118MB for a ~450GB pool00, and after a few months of usage it's still at 33%...

An install on a 60G disk partition had a pool00 of 43G, created with a pool00_tmeta of 44M (11 extents). Later, pool00 was extended to 147.7G and pool00_tmeta was left as-is by mistake.

The metadata became full, and after that the pool00_tmeta was extended to 300M by adding 256M.
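That extension amounts to roughly the following (assuming the default qubes_dom0 volume group; not an exact transcript):

```
# Grow the thin pool's metadata by 256M:
sudo lvextend --poolmetadatasize +256M qubes_dom0/pool00
```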

> > While doing some more customization, including restoring more (and larger) qubes and cloning/renaming qubes, it seems the metadata usage climbed really fast and hit this bug.
> >
> > Unfortunately, I could not recover from that.
> >
> > It looks like the Qubes LVM actions performed while the metadata was full may have corrupted the metadata somehow: I could enlarge and repair the thin metadata from a live CD, but many of the volumes that were in use were never available again. The -private and -snap volumes for the qubes that were running (not sure how to discard them), and also all the volumes of the qubes being restored and of the service VMs, are lost ("NOT available" as LVM status).
>
> You could also try to revert to an earlier revision, using "qvm-volume revert sys-net:private" for example.

Will try that tonight.

> > I remember there was some Saltstack magic to recreate the service VMs, but I could not find anything for R4.0... So I had to revert to R3.2 for the time being.
>
> https://www.qubes-os.org/doc/salt/
> Especially links at the bottom:
> https://github.com/QubesOS/qubes-mgmt-salt-dom0-virtual-machines/blob/master/README.rst

Thanks. Will also try that tonight.

Brendon Green

Apr 12, 2018, 5:20:24 PM
to aw...@danwin1210.me, qubes-users
Check for Meta% above 80%

Pablo Di Noto

Apr 12, 2018, 8:33:34 PM
to qubes-users

> > You could also try to revert to an earlier revision, using "qvm-volume revert sys-net:private" for example.
>
> Will try that tonight.
>

Unfortunately, that was not possible for the service VMs.
For the only important AppVM that had volumes missing, the command `qvm-block revert p-2018d:private` was enough.

> > > I remember there was some Saltstack magic to recreate the service VMs, but I could not find anything for R4.0... So I had to revert to R3.2 for the time being.
> >
> > https://www.qubes-os.org/doc/salt/
> > Especially links at the bottom:
> > https://github.com/QubesOS/qubes-mgmt-salt-dom0-virtual-machines/blob/master/README.rst
>
> Thanks. Will also try that tonight.

The way I recreated all the sys-* service VMs was:

```
# You need to unset sys-firewall as the netvm for your AppVMs,
# and unset sys-net as the netvm for sys-firewall:
dom0$ qvm-prefs --set <qubes*> netvm ''
dom0$ qvm-prefs --set sys-firewall netvm ''

# Then delete all the service VM qubes:
dom0$ qvm-remove sys-usb
dom0$ qvm-remove sys-firewall
dom0$ qvm-remove sys-net

# Make sure the right salt top files are enabled, then apply them:
dom0$ sudo qubesctl top.enable qvm.sys-net
dom0$ sudo qubesctl top.enable qvm.sys-firewall
dom0$ sudo qubesctl top.enable qvm.sys-usb
dom0$ sudo qubesctl state.highstate
```

and that should be all. You then have to restore 'sys-firewall' as the netvm for every qube you want networked, and things should be pretty much as they were after install. Note that the new service VMs will have different IP addresses than their deceased counterparts, but the firewall should be able to cope with that.
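For instance (qube names below are just examples, not a fixed list):

```
# Re-attach networking afterwards:
dom0$ qvm-prefs --set sys-firewall netvm sys-net   # if the salt state did not already set it
dom0$ qvm-prefs --set work netvm sys-firewall
dom0$ qvm-prefs --set personal netvm sys-firewall
```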

Ah! The joy! Now back to R4.0 without a massive reinstall.

Thanks Marek, awokd and techg...!

awokd

Apr 12, 2018, 9:34:14 PM
to Pablo Di Noto, qubes-users
On Fri, April 13, 2018 12:33 am, Pablo Di Noto wrote:

> Ah! The joy! Now back to R4.0 without a massive reinstall.
>
>
> Thanks Marek, awokd and techg...!

I think you taught me more than I helped on this one! Thanks for sharing
your resolution.



Brendon Green

Apr 17, 2018, 1:48:16 AM
to aw...@danwin1210.me, Pablo Di Noto, qubes-users
According to the LVM documentation, when the metadata volume becomes exhausted the thin pool becomes read-only for any operation that requires a metadata change. It seems I was lucky enough to hit the system failure before any data loss occurred (although that would only have cost another reinstall, as it was my second day using the new OS).
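As a possible way to avoid hitting that wall again (my own suggestion, not something from the Qubes documentation), LVM can be told to auto-extend thin pools as they approach capacity via /etc/lvm/lvm.conf:

```
# In the "activation" section of /etc/lvm/lvm.conf (values are examples);
# dmeventd must be running for auto-extension to trigger:
thin_pool_autoextend_threshold = 80   # start extending once 80% full
thin_pool_autoextend_percent = 20     # grow by 20% of current size each time
```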
