Bug#998090: libvirt-daemon-system: Please defer starting libvirtd.socket until libvirtd.service dependencies can be met


Ron
Oct 30, 2021
Package: libvirt-daemon-system
Version: 7.0.0-3
Severity: important

Hi,

Systemd has a class of boot-time races which can result in deadlock,
which I learned more than I ever wanted to know about when Buster to
Bullseye upgrades started leaving me with machines that were off the
network when they were rebooted ... The reason for that is a bit of a
tangle of otherwise unrelated packages, and there are many ways this
*could* happen, but the root of it in my particular case was the libvirt
package switching to use socket activation instead of letting the daemon
create its own socket when it is ready to respond to requests on it.

The race occurs because the .socket unit creates the libvirt control
socket very early in the boot, before even the network-pre target is
reached, and so long before the libvirtd.service dependencies are
satisfied and the daemon itself can be started to handle requests.

The deadlock in my case occurs when a udev rule for a device already
attached at boot tries to assign that device to a VM.

Prior to Bullseye, what would occur is:

The udev rule calls a small script on device hot/cold plug which
checks a config file, and if the device is allocated to a VM, then
calls virsh to attach it to that VM.

This 'immediately' either succeeds, fails because the desired VM
is not actually running (yet), or fails because libvirtd is not
running and virsh did not find its socket present.

If either of the failure cases occur, the calling script fails
gracefully, and a QEMU hook will later handle attaching the device
if/when libvirtd and the desired VM is actually started.
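
(For concreteness, a minimal sketch of the kind of helper involved is
below -- the file names, config format and XML path are hypothetical
placeholders, not the actual script, but the shape and the failure
modes are the same:)

    #!/bin/sh
    # Hypothetical udev RUN helper.  Invoked by a rule roughly like:
    #   ACTION=="add", SUBSYSTEM=="usb", RUN+="/usr/local/sbin/vm-attach %k"
    # Attach the hot/cold plugged device to the VM it is allocated to,
    # or fail gracefully if that isn't possible (yet).

    DEVICE="$1"
    CONFIG=/etc/vm-attach.conf        # hypothetical "device domain" map

    # Look up which domain (if any) this device is allocated to.
    DOMAIN=$(awk -v d="$DEVICE" '$1 == d { print $2 }' "$CONFIG" 2>/dev/null)
    [ -n "$DOMAIN" ] || exit 0

    # Prior to socket activation this either succeeded or failed
    # immediately (no socket present, or domain not running); the QEMU
    # hook attaches the device later in those cases.
    exec virsh attach-device "$DOMAIN" "/etc/vm-attach/$DEVICE.xml" --live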


But in Bullseye there's a three-way race, and if the zombie socket is
created before the udev rule runs, then virsh connects to it, but hangs
indefinitely waiting for libvirtd.service to be able to start and
respond to the request.

The deadlock in this specific case then happens when ifupdown-pre (but
it could be any of many other things) calls udevadm settle to give the
initial network devices a chance to be fully set up and available before
the networking.service brings them up.

Which in turn then hangs waiting for the (otherwise unrelated) udev rule
above to complete, which won't happen until libvirtd is started, which
won't happen until the udev rule returns (or udevadm settle times out)
and network.target (among others) can be reached.

Everything stops for two minutes until the systemd "bug solver" of
arbitrary timeouts starts killing things, and the machine finishes
booting without any network devices.
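
(If anyone wants to see that state for themselves while it is wedged,
the standard diagnostics from another console are enough -- nothing
here is specific to my setup:)

    # Jobs that systemd has queued but which are blocked on each other.
    systemctl list-jobs

    # The ordering constraints libvirtd.service is still waiting on.
    systemctl show -p After,Wants,Requires libvirtd.service

    # Whether udev still has unprocessed events (the stuck RUN helper);
    # a non-zero exit status here means settle gave up waiting.
    udevadm settle --timeout=5; echo "settle exit: $?"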


The latter can be avoided (in most cases at least) with a tweak to
the networking.service dependencies (the bug I've reported here
https://bugs.debian.org/998088 has more of the gory details of this
problem from the perspective of ifupdown's entanglement in it).

But we can avoid this specific incarnation of it completely if the
libvirtd.socket unit declares the same ordering dependencies as
libvirtd.service does, so that anything calling virsh, at any time,
can reasonably expect an answer in finite time instead of blocking
indefinitely to wait for a service (that systemd already knows
does not even have the basic preconditions to make it eligible to
start yet but ignores that to create the socket anyway).
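
(Concretely, what I'm asking for is something along the lines of the
drop-in sketched below.  The entries shown are representative only --
the real list should simply mirror whatever the shipped
libvirtd.service actually declares:)

    # /etc/systemd/system/libvirtd.socket.d/ordering.conf  (sketch)
    # Give the socket unit the same ordering constraints as the service,
    # so the listening socket does not appear before libvirtd itself is
    # even eligible to be started.
    [Unit]
    After=network.target
    After=dbus.service
    After=apparmor.service

(With something like that in place, plus a daemon-reload, virsh callers
again see "no socket yet" instead of a socket that nothing can answer.)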

Unless systemd gets smarter about this, there may always be a race
with the possibility of circular deadlocks if creation of the
socket and responding to requests for it are not atomic with the
creation of the service using it - so it may actually be better to
just go back to letting the daemon create and manage the socket
itself (as its "activation" signal to users of that socket) - but
we can at least narrow the window for losing it significantly if
we defer creation of the socket until at least the point where
systemd thinks it can attempt to start the daemon (though still with
no guarantee of success at that point ...)


I hope I haven't missed anything that makes this make sense in the
context of libvirt ... trying to look at and describe this from four
entirely independent points of view, each that doesn't directly care
about any of the others, is a bit of a hall of mirrors with small
parts of the problem stuck to each of them!

Cheers,
Ron


-- System Information:
Debian Release: 11.1
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-9-amd64 (SMP w/12 CPU threads)
Locale: LANG=en_AU.utf8, LC_CTYPE=en_AU.utf8 (charmap=UTF-8), LANGUAGE=en_AU:en
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages libvirt-daemon-system depends on:
ii adduser 3.118
ii debconf [debconf-2.0] 1.5.77
ii gettext-base 0.21-4
ii iptables 1.8.7-1
ii libvirt-clients 7.0.0-3
ii libvirt-daemon 7.0.0-3
ii libvirt-daemon-config-network 7.0.0-3
ii libvirt-daemon-config-nwfilter 7.0.0-3
ii libvirt-daemon-system-systemd 7.0.0-3
ii logrotate 3.18.0-2
ii policykit-1 0.105-31

Michael Biebl
Nov 2, 2021
On Sat, 30 Oct 2021 17:39:45 +1030 Ron <r...@debian.org> wrote:
> Package: libvirt-daemon-system
> Version: 7.0.0-3
> Severity: important
>
> Hi,
>
> Systemd has a class of boot-time races which can result in deadlock,
> which I learned more than I ever wanted to know about when Buster to
> Bullseye upgrades started leaving me with machines that were off the
> network when they were rebooted ...  The reason for that is a bit of a
> tangle of otherwise unrelated packages, and there are many ways this
> *could* happen, but the root of it in my particular case was the libvirt
> package switching to use socket activation instead of letting the daemon
> create its own socket when it is ready to respond to requests on it.
>
> The race occurs because the .socket unit creates the libvirt control
> socket very early in the boot, before even the network-pre target is
> reached, and so long before the libvirtd.service dependencies are
> satisfied and the daemon itself can be started to handle requests.

There is nothing to fix on the libvirt / ifupdown side here.

The bug is in bit-babbler, which triggers the start of a long-running
process from a udev rules file (which it shouldn't do), and that is what
causes the deadlock in the end.

I tried to explain this to Ron on IRC, but he decided to ignore my advice.

Please ignore this bug report.
If you have further questions, feel free to contact me.

Regards,
Michael

Guido Günther
Nov 3, 2021
Hi Ron,

Sorry for the broken boot. That's always annoying.

On Sat, Oct 30, 2021 at 05:39:45PM +1030, Ron wrote:
> Package: libvirt-daemon-system
> Version: 7.0.0-3
> Severity: important
>
> Hi,
>
> Systemd has a class of boot-time races which can result in deadlock,
> which I learned more than I ever wanted to know about when Buster to
> Bullseye upgrades started leaving me with machines that were off the
> network when they were rebooted ... The reason for that is a bit of a
> tangle of otherwise unrelated packages, and there are many ways this
> *could* happen, but the root of it in my particular case was the libvirt
> package switching to use socket activation instead of letting the daemon
> create its own socket when it is ready to respond to requests on it.
>
> The race occurs because the .socket unit creates the libvirt control
> socket very early in the boot, before even the network-pre target is
> reached, and so long before the libvirtd.service dependencies are
> satisfied and the daemon itself can be started to handle requests.
>
> The deadlock in my case occurs when a udev rule for a device already
> attached at boot tries to assign that device to a VM.
>
> Prior to Bullseye, what would occur is:
>
> The udev rule calls a small script on device hot/cold plug which
> checks a config file, and if the device is allocated to a VM, then
> calls virsh to attach it to that VM.

Is that a sync call to virsh from udev via RUN?

> This 'immediately' either succeeds, fails because the desired VM
> is not actually running (yet), or fails because libvirtd is not
> running and virsh did not find its socket present.
>
> If either of the failure cases occur, the calling script fails
> gracefully, and a QEMU hook will later handle attaching the device
> if/when libvirtd and the desired VM is actually started.
>
> But in Bullseye there's a three-way race, and if the zombie socket is
> created before the udev rule runs, then virsh connects to it, but hangs
> indefinitely waiting for libvirtd.service to be able to start and
> respond to the request.

So far sounds like expected behaviour for socket activation.
It sounds like the problem comes about because:

- there's a sync call for a (potentially) long-running program invocation
  in a udev rule, which udev recommends against in its man page
  (see the sketch after this list):

      This can only be used for very short-running foreground tasks.
      Running an event process for a long period of time may block all
      further events for this or a dependent device.

  (virsh can take a long time even when libvirtd is up)

- ifupdown invokes udevadm settle which waits for the above and things
go downhill.
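
The usual way to decouple that, if it really does need to be triggered
from udev, is to have the rule pull in a unit instead of doing the work
synchronously from RUN -- roughly like the sketch below (unit and script
names made up for illustration):

    # udev rule: tag the device for systemd and pull in a templated
    # service, instead of blocking the udev event with RUN+="...virsh...".
    ACTION=="add", SUBSYSTEM=="usb", TAG+="systemd", \
        ENV{SYSTEMD_WANTS}+="vm-attach@%k.service"

    # vm-attach@.service (sketch): runs outside the udev event, so
    # "udevadm settle" is not held up waiting for virsh to return.
    [Unit]
    Description=Attach %i to its assigned VM
    After=libvirtd.service
    Wants=libvirtd.service

    [Service]
    Type=oneshot
    ExecStart=/usr/local/sbin/vm-attach %i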

I'm not into systemd's details of socket activation, but from the above
I'm not yet convinced there's anything on the libvirt side to fix. The
best argument I would buy is: worked before, don't regress on upgrades
(which could also be addressed by a mention in the NEWS file).

>
> I hope I haven't missed anything that makes this make sense in the
> context of libvirt ... trying to look at and describe this from four
> entirely independent points of view, each that doesn't directly care
> about any of the others, is a bit of a hall of mirrors with small
> parts of the problem stuck to each of them!
>
> Cheers,
> Ron

[..following up on another mail in this thread..]

> On Tue, 02 Nov 2021 15:34:01 +0100 Michael Biebl <bi...@debian.org> wrote:
> > I tried to explain this to Ron on IRC, but he decided to ignore my advice.
>
> Oh Please Michael, now you're just sounding like a child whose lolly has
> fallen in the dirt ...

Can we please keep the discussion on a purely technical level?

Cheers,
-- Guido

Ron
Nov 4, 2021

Hi Guido,

On Wed, Nov 03, 2021 at 07:15:37PM +0100, Guido Günther wrote:
> Hi Ron,
>
> Sorry for the broken boot. That's always annoying.

Thanks for looking at the details. I filed these bugs because even
though I can step around the problem in the permutation that involves
something I maintain - if we don't fill in *all* of the contributing
potholes, then I can't prevent some other combination, which I have no
control over, from making a reboot after some future upgrade crash on
the same sharp corner case.

So it really would be nice if we can make this as naturally robust
as possible. That we have "three corner" accidents like this, where
the problem would not have occurred if any one of the contributors
did not have some window for trouble, and that nobody detected and
reported this through the stable release cycle, says to me that we
ought to close every window for this that we see when we see it ...


> On Sat, Oct 30, 2021 at 05:39:45PM +1030, Ron wrote:
> > The race occurs because the .socket unit creates the libvirt control
> > socket very early in the boot, before even the network-pre target is
> > reached, and so long before the libvirtd.service dependencies are
> > satisfied and the daemon itself can be started to handle requests.
> >
> > The deadlock in my case occurs when a udev rule for a device already
> > attached at boot tries to assign that device to a VM.
> >
> > Prior to Bullseye, what would occur is:
> >
> > The udev rule calls a small script on device hot/cold plug which
> > checks a config file, and if the device is allocated to a VM, then
> > calls virsh to attach it to that VM.
>
> Is that a sync call to virsh from udev via RUN ?

It's a call to virsh attach-device - which unless I'm missing something
has no option except to be a "sync" call?

But also unless I'm really missing something, there really is no reason
that particular operation should ever block or be "long running" when
called for a local domain. Either it fails out immediately because
the local libvirtd is not running (prior to socket activation), it fails
out immediately because the requested domain is not running, or it
succeeds or fails "almost immediately" because the device could (not)
be attached to it.
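
(For reference, the call in question is just a plain attach with a
prebuilt device XML -- domain and file names below are placeholders:)

    # A single synchronous libvirt API call; with no socket present it
    # used to fail immediately rather than block.
    virsh attach-device myvm /etc/vm-attach/usb-token.xml --live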

I agree there are cases where virsh *could* be "long running" (I
wouldn't try to spin up a new VM from udev RUN :), and pathological
cases where *any* process, even the most trivial, could become
"long running" - but neither of those are involved in the failure
mode I'm currently looking at avoiding.


> > This 'immediately' either succeeds, fails because the desired VM
> > is not actually running (yet), or fails because libvirtd is not
> > running and virsh did not find its socket present.
> >
> > If either of the failure cases occur, the calling script fails
> > gracefully, and a QEMU hook will later handle attaching the device
> > if/when libvirtd and the desired VM is actually started.
> >
> > But in Bullseye there's a three-way race, and if the zombie socket is
> > created before the udev rule runs, then virsh connects to it, but hangs
> > indefinitely waiting for libvirtd.service to be able to start and
> > respond to the request.
>
> So far sounds like expected behaviour for socket activation.

Yes, I think that's the crux here. I understand the wishful thinking in
the design of socket activation, where you just fire the starting gun
early and let everything race for the finish line with no explicit
ordering, hoping that it will just shake out as an emergent property ...

But that only works if none of the dependency relationships are cyclic;
as soon as there's a cycle (which is what we have here), the emergent
property is that you can't predict what it will break ... and the only
tie-breaker is to time out and kill something.

In the case I got to analyse here, the problem doesn't depend on the
bit-babbler package, or even anything being called by udev RUN.  Any
early-start service which calls virsh for any reason, and which has an
ordering dependency requiring it to be started before anything in
libvirtd.service's After list, would fall into the same trap.

So the problem in libvirt's circle of influence isn't "a long running
service" spawned by udev, it's that it's now not safe for *anything*
to call virsh without an explicit dependency somehow declaring it must
not be required to complete before libvirtd.service is successfully
started ...
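
(In other words, with socket activation in place, anything shipping a
unit that calls virsh effectively needs to declare something like the
following to stay out of the trap -- unit name and command invented
for illustration:)

    # some-virsh-caller.service (hypothetical)
    # Only safe to call virsh once libvirtd.service itself is allowed
    # to start, not merely once its socket exists.
    [Unit]
    After=libvirtd.service
    Wants=libvirtd.service

    [Service]
    Type=oneshot
    ExecStart=/usr/bin/virsh list --all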

Otherwise libvirtd will never start, and virsh will never return.
Which might seem like a trivial problem to avoid, but as we see here,
it's quite easy to create it accidentally in the tangle of multiple
independent requirements created by multiple independent authors.

And I assume this would be true for even the most seemingly benign
uses of virsh (like version --daemon or sysinfo et al.) - there's
certainly many things being called by RUN rules which I'd expect
as more probable to become "long running" than virsh version ...


I'm not clinging tight to any particular solution to this, but it's
evidently a real trap, and the "obvious" answer to me to restore the
previous graceful and intuitive failure modes of virsh is to not have
the socket present before the daemon can actually service it - whether
that is via unit dependencies, or returning to libvirtd creating it
by itself ... maybe there are better answers for that, but I haven't
seen any suggested yet to consider?

The problem here is not the time it takes a local domain attach-device
operation to complete though - it's that if you attempt it between when
the socket is created and libvirtd is started, other things you don't
(and can't) control could create a situation where that might deadlock
and just never complete.  It never times out for "running too long" in
the case where it could complete ...

If you boil that part of this right down, it's equivalent to the classic
multi-threaded deadlock conditions created by taking multiple locks in
the wrong order. It's a symptomless problem, until there's actually
contention for the resource. But it's a real latent problem, with a
known fatal result when the race is lost.


> - ifupdown invokes udevadm settle which waits for the above and things
> go downhill.

Which in this case is the "innocent victim" - because if its involvement
hadn't meant the network entirely failed to come up, this could have
quite easily gone unnoticed for a lot longer as just a "slow boot" with
a long, pointless, but otherwise harmless delay where it all stalled.

But again that's part of the issue here, if we don't smooth over all the
rough edges, ifupdown here is just a placeholder for any other process
that might reasonably become collateral damage in the future - and even
if nothing fails, I'd rather not be adding 2-3 useless minutes to a
normal boot time.

I'm just pointing at all the places where this analysis showed an easily
fixable problem.  It doesn't even seem to be a dilemma where we need to
choose between two bad things - we can just avoid this problem without
creating some other one in its place, can't we?

Have I really missed something, or just failed at explaining some part
of it still?


> I'm not into systemd's details of socket activation but from the above
> I'm not yet convinced there's anything on the libvirt side to fix. The
> best argument I would by is: worked before, don't regress on upgrades
> (which could also be fixed by a mention in the NEWS file.

Yeah, we could put an orange cone next to the pothole. But "we told you
so" isn't my personal favourite answer to people getting repeatedly bitten
by a problem that is fairly easily avoidable ... I'm not the first to
get hit by this by a long shot, and none of the other cases I found were
using the bit-babbler package ... so I am genuinely concerned this can
even bite *me* again through someone else's accidental actions on some
future upgrade if we don't act to become robust against it. Me knowing
it can happen is not enough to prevent it happening on my systems again
if we don't stop it at all the root causes.

Or is there really some advantage to having the zombie socket exist
before we even know whether libvirtd *can* be started?

As I say, I understand the hope in the socket activation design, but in
the case of libvirtd, I don't see it really participating meaningfully
in managing the start up ordering?

There might be some advantage to using the systemd access controls et al.
but we can still have that even if the socket unit is not activated
until libvirtd.service is clear to start, can't we?

Is there some other consideration I'm missing here?


> Can we please keep the discussion on a purely technical level?

I appreciate that very much, thank you!

Cheers,
Ron