Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Bug#959867: ifup kills dhclient on if-up.d script failure

37 views
Skip to first unread message

Sergio Gelato

unread,
May 6, 2020, 6:30:03 AM5/6/20
to
Package: ifupdown
Version: 0.8.35
Severity: important

I've now run into this problem on several systems running buster. Whenever
a script in /etc/network/if-up.d/ fails (see, e.g., #959864) the dhclient
instance dies. This behaviour may actually predate buster, but is much more
noticeable now that dhclient-script sets the interface's valid_lft to
the actual, finite lifetime of the lease (cf. #834532). The result is that
the system drops off the network when the initial DHCP lease expires.

I'm not sure how well specified the behaviour of ifup is on script failure;
I couldn't find it documented. Maybe this needs to be clarified? That's also
why I've filed a bug against postfix, whose if-up.d script really shouldn't
be failing so casually.

Still, doing things halfway (killing dhclient but leaving the interface up)
doesn't feel right. I'd rather deal with an immediate failure than with a
delayed one.

Here is information about the same incident as in #959864, but with a focus
on ifup rather than postfix.

From systemctl status:

● if...@eno1.service - ifup for eno1
Loaded: loaded (/lib/systemd/system/ifup@.service; static; vendor preset: enabled)
Active: failed (Result: exit-code) since xxxxxxxxxxxxxx 11:56:04 UTC; x days ago
Process: 713 ExecStart=/bin/sh -ec ifup --allow=hotplug eno1; ifquery --state eno1 (code=exited, status=1/FAILURE)
Main PID: 713 (code=exited, status=1/FAILURE)

11:56:04 HOST dhclient[729]: DHCPACK of 192.168.1.68 from 192.168.1.67
11:56:04 HOST dhclient[729]: bound to 192.168.1.68 -- renewal in 11512 seconds.
11:56:04 HOST sh[713]: bound to 192.168.1.68 -- renewal in 11512 seconds.
11:56:04 HOST sh[713]: Sending network state change signal to nslcd...done.
11:56:04 HOST sh[713]: run-parts: /etc/network/if-up.d/postfix exited with return code 69
11:56:04 HOST sh[713]: ifup: failed to bring up eno1
11:56:04 HOST systemd[1]: if...@eno1.service: Main process exited, code=exited, status=1/FAILURE
11:56:04 HOST systemd[1]: if...@eno1.service: Failed with result 'exit-code'.

From process accounting:

dhclient-script |v3| 0.00| 0.00| 0.00| 0| 0| 2388.00| 0.00| 1138| 1131| F | 0|__ |xxxxxxxxxx 11:56:04 2020
dhclient-script |v3| 0.00| 0.00| 23.00| 0| 0| 2388.00| 0.00| 1131| 729| | 0|__ |xxxxxxxxxx 11:56:04 2020
dhclient |v3| 0.00| 0.00| 519.00| 0| 0| 8456.00| 0.00| 725| 723| | 0|__ |xxxxxxxxxx 11:55:59 2020
sh |v3| 0.00| 0.00| 519.00| 0| 0| 2388.00| 0.00| 723| 716| | 0|__ |xxxxxxxxxx 11:55:59 2020
postfix |v3| 0.00| 0.00| 0.00| 0| 0| 2388.00| 0.00| 1219| 1217| F | 0|__ |xxxxxxxxxx 11:56:04 2020
postqueue |v3| 1.00| 0.00| 3.00| 0| 0| 45576.00| 0.00| 1222| 1217| | 69|__ |xxxxxxxxxx 11:56:04 2020
postfix |v3| 0.00| 0.00| 4.00| 0| 0| 2388.00| 0.00| 1217| 1194| | 69|__ |xxxxxxxxxx 11:56:04 2020
run-parts |v3| 0.00| 0.00| 13.00| 0| 0| 2284.00| 0.00| 1194| 1193| | 1|__ |xxxxxxxxxx 11:56:04 2020
sh |v3| 0.00| 0.00| 13.00| 0| 0| 2388.00| 0.00| 1193| 716| | 1|__ |xxxxxxxxxx 11:56:04 2020
ifup |v3| 0.00| 0.00| 533.00| 0| 0| 2344.00| 0.00| 716| 713| | 1|__ |xxxxxxxxxx 11:55:59 2020
sh |v3| 0.00| 0.00| 534.00| 0| 0| 2388.00| 0.00| 713| 1|S | 1|__ |xxxxxxxxxx 11:55:59 2020
dhclient |v3| 0.00| 0.00| 520.00| 0| 0| 11416.00| 0.00| 729| 1|SF X| 0|__ |xxxxxxxxxx 11:55:59 2020

Examination of the raw process accounting data (dump-acct doesn't print out
this information) reveals that process 729 was killed by signal 15 (SIGTERM).

Guus Sliepen

unread,
May 6, 2020, 6:30:03 PM5/6/20
to
On Wed, May 06, 2020 at 12:25:57PM +0200, Sergio Gelato wrote:

> I've now run into this problem on several systems running buster. Whenever
> a script in /etc/network/if-up.d/ fails (see, e.g., #959864) the dhclient
> instance dies. This behaviour may actually predate buster, but is much more
> noticeable now that dhclient-script sets the interface's valid_lft to
> the actual, finite lifetime of the lease (cf. #834532). The result is that
> the system drops off the network when the initial DHCP lease expires.
>
> I'm not sure how well specified the behaviour of ifup is on script failure;
> I couldn't find it documented. Maybe this needs to be clarified? That's also
> why I've filed a bug against postfix, whose if-up.d script really shouldn't
> be failing so casually.
>
> Still, doing things halfway (killing dhclient but leaving the interface up)
> doesn't feel right. I'd rather deal with an immediate failure than with a
> delayed one.

I can certainly document the behaviour on script failure (as well as
documenting the order in which things are executed). Cleaning up after a
failure is currently not done. I can partially add that; the problem is
that there is no clearly defined order of what (post-)down commands/scripts should
be executed given a failure at a given stage of bringing the interface
up.

However, I can ensure that at least the interface's built-in methods
clean up properly after an error, so that a failing script after the
DHCP client is started will cause the client to be stopped orderly by
ifupdown itself, and the link brought down as well.

--
Met vriendelijke groet / with kind regards,
Guus Sliepen <gu...@sliepen.org>
signature.asc

Daniel Richard G.

unread,
Feb 6, 2024, 5:00:05 PM2/6/24
to
I was recently bitten by this as well.

It was particularly bad in my case: on a headless server, with the
network connection being my only way in---and the problem was not with
the main Ethernet interface, but with a separate WireGuard one that I
was not using (both marked as "auto").

The eno1 interface got a DHCP connection without issue. But the wg0
config had a post-up routing-setup bug that I'd missed in earlier
testing. As the syslog excerpt below shows, the failure to bring up wg0
caused *all* of the "Raise network interfaces" task to be walked back.
And as this only killed dhclient and left eno1 configured, I thought all
was fine... until the system dropped off the network twelve hours later.

(It's also clear that dhclient was killed uncleanly, because the next
time it runs, you get a "Removed stale PID file" message.)

The "FIB table does not exist" error is where things start to go wrong:

2024-02-05T22:19:06.343712-05:00 darkstar systemd[1]: Starting networking.service - Raise network interfaces...
2024-02-05T22:19:06.377118-05:00 darkstar dhclient[620]: Internet Systems Consortium DHCP Client 4.4.3-P1
2024-02-05T22:19:06.377147-05:00 darkstar ifup[620]: Internet Systems Consortium DHCP Client 4.4.3-P1
2024-02-05T22:19:06.377154-05:00 darkstar ifup[620]: Copyright 2004-2022 Internet Systems Consortium.
2024-02-05T22:19:06.377159-05:00 darkstar ifup[620]: All rights reserved.
2024-02-05T22:19:06.377164-05:00 darkstar ifup[620]: For info, please visit https://www.isc.org/software/dhcp/
2024-02-05T22:19:06.377178-05:00 darkstar dhclient[620]: Copyright 2004-2022 Internet Systems Consortium.
2024-02-05T22:19:06.377190-05:00 darkstar dhclient[620]: All rights reserved.
2024-02-05T22:19:06.377201-05:00 darkstar dhclient[620]: For info, please visit https://www.isc.org/software/dhcp/
2024-02-05T22:19:06.377213-05:00 darkstar dhclient[620]:
2024-02-05T22:19:06.675691-05:00 darkstar dhclient[620]: Listening on LPF/eno1/XX:XX:XX:XX:XX:XX
2024-02-05T22:19:06.675765-05:00 darkstar ifup[620]: Listening on LPF/eno1/XX:XX:XX:XX:XX:XX
2024-02-05T22:19:06.675783-05:00 darkstar ifup[620]: Sending on LPF/eno1/XX:XX:XX:XX:XX:XX
2024-02-05T22:19:06.675796-05:00 darkstar ifup[620]: Sending on Socket/fallback
2024-02-05T22:19:06.675831-05:00 darkstar dhclient[620]: Sending on LPF/eno1/XX:XX:XX:XX:XX:XX
2024-02-05T22:19:06.675863-05:00 darkstar dhclient[620]: Sending on Socket/fallback
2024-02-05T22:19:06.675919-05:00 darkstar dhclient[620]: DHCPREQUEST for 192.168.1.2 on eno1 to 255.255.255.255 port 67
2024-02-05T22:19:06.675956-05:00 darkstar ifup[620]: DHCPREQUEST for 192.168.1.2 on eno1 to 255.255.255.255 port 67
2024-02-05T22:19:09.219179-05:00 darkstar kernel: [ 6.436538] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
2024-02-05T22:19:09.219201-05:00 darkstar kernel: [ 6.436674] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
2024-02-05T22:19:11.467192-05:00 darkstar dhclient[620]: DHCPREQUEST for 192.168.1.2 on eno1 to 255.255.255.255 port 67
2024-02-05T22:19:11.467252-05:00 darkstar ifup[620]: DHCPREQUEST for 192.168.1.2 on eno1 to 255.255.255.255 port 67
2024-02-05T22:19:11.470106-05:00 darkstar dhclient[620]: DHCPACK of 192.168.1.2 from 192.168.1.1
2024-02-05T22:19:11.470172-05:00 darkstar ifup[620]: DHCPACK of 192.168.1.2 from 192.168.1.1
2024-02-05T22:19:11.509438-05:00 darkstar dhclient[620]: bound to 192.168.1.2 -- renewal in 21284 seconds.
2024-02-05T22:19:11.509497-05:00 darkstar ifup[620]: bound to 192.168.1.2 -- renewal in 21284 seconds.
2024-02-05T22:19:11.575017-05:00 darkstar kernel: [ 8.793246] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
2024-02-05T22:19:11.575027-05:00 darkstar kernel: [ 8.793252] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Ja...@zx2c4.com>. All Rights Reserved.
2024-02-05T22:19:11.589364-05:00 darkstar ifup[689]: Error: ipv4: FIB table does not exist.
2024-02-05T22:19:11.589373-05:00 darkstar ifup[689]: Flush terminated
2024-02-05T22:19:11.589765-05:00 darkstar ifup[583]: ifup: failed to bring up wg0
2024-02-05T22:19:11.599118-05:00 darkstar systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE
2024-02-05T22:19:11.663575-05:00 darkstar systemd[1]: networking.service: Failed with result 'exit-code'.
2024-02-05T22:19:11.664060-05:00 darkstar systemd[1]: Failed to start networking.service - Raise network interfaces.
2024-02-05T22:19:11.664996-05:00 darkstar systemd[1]: Reached target network.target - Network.
2024-02-05T22:19:11.665371-05:00 darkstar systemd[1]: Reached target network-online.target - Network is Online.
0 new messages