Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Bug#561622: Race condition between ntp and NetworkManager at entry to runlevel 2

31 views
Skip to first unread message

Carl Mascott

unread,
Dec 18, 2009, 3:00:02 PM12/18/09
to
Subject: Race condition between ntp and NetworkManager at entry to runlevel 2
Package: ntp
Version: 1:4.2.4p4+dfsg-8lenny3
Severity: important

*** Please type your report below this line ***
Related bug: 535049
Symptom: On some startups ntpq -p reports "No association ID's...".
    On other startups ntpq -p reports servers & status normally.
    I.E., problem is intermittent.
Cause: When ntpq -p reports "No association ID's..." it is because
    NetworkManager has not yet brought up the network interface
    (eth0 for me) when ntpd is trying to resolve server IP
    addresses.  If ntpd gives up on resolving addresses at
    this point it will never try again later.
Effect: ntpd does not provide time unless restarted manually.
Scope: The symptom I see is a problem with ntp.  The underlying
    cause could affect other network services as well.
Non-solution: Reorder startup scripts in /etc/rc2.d.
Correct solution: ??

Workaround: Use modified version of etch's /etc/network/if-up.d/ntp
    script as follows:
----------
#!/bin/sh

# 12/18/09  Carl Mascott    adapt script from etch to work around
#                ntpd/NetworkManager race condition upon
#                entering runlevel 2

# remove (or comment out) the next line if your network addresses change
#exit 0

case $ADDRFAM in
    NetworkManager)
        if [ -x "/etc/init.d/ntp" ]; then
            if [ -x "`which invoke-rc.d 2>/dev/null`" ]; then
                invoke-rc.d ntp restart || exit $?
            else
                /etc/init.d/ntp restart || exit $?
            fi
        fi
        ;;
esac
----------

Excerpt from syslog showing ntp giving up on name resolution 5 seconds
(in this particular example) before NetworkManager brings up eth0:
----------
Dec 18 09:26:08 ganymede ntpd[2732]: ntpd 4.2...@1.1520-o Sun Nov 22 16:14:34 UTC 2009 (1)
Dec 18 09:26:08 ganymede ntpd[2733]: precision = 1.000 usec
Dec 18 09:26:08 ganymede ntpd[2733]: Listening on interface #0 wildcard, 0.0.0.0#123 Disabled
Dec 18 09:26:08 ganymede ntpd[2733]: Listening on interface #1 wildcard, ::#123 Disabled
Dec 18 09:26:08 ganymede ntpd[2733]: Listening on interface #2 lo, ::1#123 Enabled
Dec 18 09:26:08 ganymede ntpd[2733]: bind() fd 19, family 10, port 123, scope 2, addr fe80::216:17ff:fe6c:7baa, in6_is_addr_multicast=0 flags=0x11 fails: Cannot assign requested address
Dec 18 09:26:08 ganymede ntpd[2733]: unable to create socket on eth0 (3) for fe80::216:17ff:fe6c:7baa#123
Dec 18 09:26:08 ganymede ntpd[2733]: failed to initialize interface for address fe80::216:17ff:fe6c:7baa
Dec 18 09:26:08 ganymede ntpd[2733]: Listening on interface #4 lo, 127.0.0.1#123 Enabled
Dec 18 09:26:08 ganymede ntpd[2733]: kernel time sync status 0040
Dec 18 09:26:08 ganymede ntpd[2733]: frequency initialized -109.140 PPM from /var/lib/ntp/ntp.drift
Dec 18 09:26:08 ganymede kernel: [   23.369025] warning: `ntpd' uses 32-bit capabilities (legacy support in use)
Dec 18 09:26:08 ganymede ntpd[2741]: signal_no_reset: signal 17 had flags 4000000
Dec 18 09:26:08 ganymede avahi-daemon[2352]: Joining mDNS multicast group on interface eth0.IPv6 with address fe80::216:17ff:fe6c:7baa.
Dec 18 09:26:08 ganymede avahi-daemon[2352]: New relevant interface eth0.IPv6 for mDNS.
Dec 18 09:26:08 ganymede avahi-daemon[2352]: Registering new address record for fe80::216:17ff:fe6c:7baa on eth0.*.
Dec 18 09:26:08 ganymede dhcdbd: Started up.
Dec 18 09:26:09 ganymede ntpd[2733]: Listening on interface #5 eth0, fe80::216:17ff:fe6c:7baa#123 Enabled
Dec 18 09:26:09 ganymede acpid: client connected from 2781[105:113]
Dec 18 09:26:10 ganymede NetworkManager: <info>  starting...
Dec 18 09:26:10 ganymede NetworkManager: <info>  eth0: Device is fully-supported using driver 'uli526x'.
Dec 18 09:26:10 ganymede NetworkManager: <info>  nm_device_init(): waiting for device's worker thread to start
Dec 18 09:26:10 ganymede NetworkManager: <info>  nm_device_init(): device's worker thread started, continuing.
Dec 18 09:26:10 ganymede NetworkManager: <info>  Now managing wired Ethernet (802.3) device 'eth0'.
Dec 18 09:26:10 ganymede NetworkManager: <info>  Deactivating device eth0.
Dec 18 09:26:10 ganymede avahi-daemon[2352]: Withdrawing address record for fe80::216:17ff:fe6c:7baa on eth0.
Dec 18 09:26:10 ganymede avahi-daemon[2352]: Leaving mDNS multicast group on interface eth0.IPv6 with address fe80::216:17ff:fe6c:7baa.
Dec 18 09:26:10 ganymede avahi-daemon[2352]: Interface eth0.IPv6 no longer relevant for mDNS.
Dec 18 09:26:10 ganymede NetworkManager: <info>  Will activate wired connection 'eth0' because it now has a link.
Dec 18 09:26:10 ganymede NetworkManager: <info>  SWITCH: no current connection, found better connection 'eth0'.
Dec 18 09:26:10 ganymede anacron[2821]: Anacron 2.3 started on 2009-12-18
Dec 18 09:26:10 ganymede anacron[2821]: Will run job `cron.daily' in 5 min.
Dec 18 09:26:10 ganymede anacron[2821]: Jobs will be executed sequentially
Dec 18 09:26:10 ganymede ntpd_initres[2741]: host name not found: 0.debian.pool.ntp.org
Dec 18 09:26:10 ganymede ntpd_initres[2741]: couldn't resolve `0.debian.pool.ntp.org', giving up on it
Dec 18 09:26:10 ganymede ntpd_initres[2741]: host name not found: 1.debian.pool.ntp.org
Dec 18 09:26:10 ganymede ntpd_initres[2741]: couldn't resolve `1.debian.pool.ntp.org', giving up on it
Dec 18 09:26:10 ganymede ntpd_initres[2741]: host name not found: 2.debian.pool.ntp.org
Dec 18 09:26:10 ganymede ntpd_initres[2741]: couldn't resolve `2.debian.pool.ntp.org', giving up on it
Dec 18 09:26:10 ganymede ntpd_initres[2741]: host name not found: 3.debian.pool.ntp.org
Dec 18 09:26:10 ganymede ntpd_initres[2741]: couldn't resolve `3.debian.pool.ntp.org', giving up on it
Dec 18 09:26:10 ganymede /usr/sbin/cron[2848]: (CRON) INFO (pidfile fd = 3)
Dec 18 09:26:10 ganymede /usr/sbin/cron[2849]: (CRON) STARTUP (fork ok)
Dec 18 09:26:10 ganymede /usr/sbin/cron[2849]: (CRON) INFO (Running @reboot jobs)
Dec 18 09:26:11 ganymede dhcdbd: message_handler: message handler not found under /com/redhat/dhcp/eth0 for sub-path eth0.dbus.get.reason
Dec 18 09:26:11 ganymede NetworkManager: <info>  Will activate connection 'eth0'.
Dec 18 09:26:11 ganymede NetworkManager: <info>  Device eth0 activation scheduled...
Dec 18 09:26:11 ganymede NetworkManager: <info>  Activation (eth0) started...
Dec 18 09:26:11 ganymede NetworkManager: <info>  Activation (eth0) Stage 1 of 5 (Device Prepare) scheduled...
Dec 18 09:26:11 ganymede NetworkManager: <info>  Activation (eth0) Stage 1 of 5 (Device Prepare) started...
Dec 18 09:26:11 ganymede NetworkManager: <info>  Activation (eth0) Stage 2 of 5 (Device Configure) scheduled...
Dec 18 09:26:11 ganymede NetworkManager: <info>  Activation (eth0) Stage 1 of 5 (Device Prepare) complete.
Dec 18 09:26:11 ganymede NetworkManager: <info>  Activation (eth0) Stage 2 of 5 (Device Configure) starting...
Dec 18 09:26:11 ganymede NetworkManager: <info>  Activation (eth0) Stage 2 of 5 (Device Configure) successful.
Dec 18 09:26:11 ganymede NetworkManager: <info>  Activation (eth0) Stage 3 of 5 (IP Configure Start) scheduled.
Dec 18 09:26:11 ganymede NetworkManager: <info>  Activation (eth0) Stage 2 of 5 (Device Configure) complete.
Dec 18 09:26:11 ganymede NetworkManager: <info>  Activation (eth0) Stage 3 of 5 (IP Configure Start) started...
Dec 18 09:26:11 ganymede acpid: client connected from 2871[0:0]
Dec 18 09:26:12 ganymede NetworkManager: <info>  Activation (eth0) Beginning DHCP transaction.
Dec 18 09:26:12 ganymede NetworkManager: <info>  Activation (eth0) Stage 3 of 5 (IP Configure Start) complete.
Dec 18 09:26:12 ganymede NetworkManager: <info>  DHCP daemon state is now 12 (successfully started) for interface eth0
Dec 18 09:26:13 ganymede kernel: [   28.139223] nvidia: module license 'NVIDIA' taints kernel.
Dec 18 09:26:13 ganymede kernel: [   28.403588] ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 25 (level, low) -> IRQ 25
Dec 18 09:26:13 ganymede kernel: [   28.403597] PCI: Setting latency timer of device 0000:01:00.0 to 64
Dec 18 09:26:13 ganymede kernel: [   28.403725] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  173.14.09  Wed Jun  4 23:40:50 PDT 2008
Dec 18 09:26:14 ganymede NetworkManager: <info>  DHCP daemon state is now 1 (starting) for interface eth0
Dec 18 09:26:14 ganymede dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 5
Dec 18 09:26:14 ganymede dhclient: DHCPOFFER from 192.168.1.1
Dec 18 09:26:14 ganymede dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Dec 18 09:26:14 ganymede dhclient: DHCPACK from 192.168.1.1
Dec 18 09:26:14 ganymede avahi-daemon[2352]: Joining mDNS multicast group on interface eth0.IPv4 with address 192.168.1.128.
Dec 18 09:26:14 ganymede avahi-daemon[2352]: New relevant interface eth0.IPv4 for mDNS.
Dec 18 09:26:14 ganymede avahi-daemon[2352]: Registering new address record for 192.168.1.128 on eth0.IPv4.
Dec 18 09:26:14 ganymede NetworkManager: <info>  DHCP daemon state is now 2 (bound) for interface eth0
Dec 18 09:26:14 ganymede NetworkManager: <info>  Activation (eth0) Stage 4 of 5 (IP Configure Get) scheduled...
Dec 18 09:26:14 ganymede NetworkManager: <info>  Activation (eth0) Stage 4 of 5 (IP Configure Get) started...
Dec 18 09:26:14 ganymede dhclient: bound to 192.168.1.128 -- renewal in 40863 seconds.
Dec 18 09:26:14 ganymede dhcdbd: message_handler: message handler not found under /com/redhat/dhcp/eth0 for sub-path eth0.dbus.get.domain_search
Dec 18 09:26:14 ganymede dhcdbd: message_handler: message handler not found under /com/redhat/dhcp/eth0 for sub-path eth0.dbus.get.nis_domain
Dec 18 09:26:14 ganymede dhcdbd: message_handler: message handler not found under /com/redhat/dhcp/eth0 for sub-path eth0.dbus.get.nis_servers
Dec 18 09:26:14 ganymede NetworkManager: <info>  Retrieved the following IP4 configuration from the DHCP daemon:
Dec 18 09:26:14 ganymede NetworkManager: <info>    address 192.168.1.128
Dec 18 09:26:14 ganymede NetworkManager: <info>    netmask 255.255.255.0
Dec 18 09:26:14 ganymede NetworkManager: <info>    broadcast 192.168.1.255
Dec 18 09:26:14 ganymede NetworkManager: <info>    gateway 192.168.1.1
Dec 18 09:26:14 ganymede NetworkManager: <info>    nameserver 192.168.1.1
Dec 18 09:26:14 ganymede NetworkManager: <info>    hostname 'ganymede'
Dec 18 09:26:14 ganymede NetworkManager: <info>    domain name 'local'
Dec 18 09:26:14 ganymede dhcdbd: message_handler: message handler not found under /com/redhat/dhcp/eth0 for sub-path eth0.dbus.get.interface_mtu
Dec 18 09:26:14 ganymede NetworkManager: <info>  Activation (eth0) Stage 5 of 5 (IP Configure Commit) scheduled...
Dec 18 09:26:14 ganymede NetworkManager: <info>  Activation (eth0) Stage 4 of 5 (IP Configure Get) complete.
Dec 18 09:26:14 ganymede NetworkManager: <info>  Activation (eth0) Stage 5 of 5 (IP Configure Commit) started...
Dec 18 09:26:14 ganymede avahi-daemon[2352]: Withdrawing address record for 192.168.1.128 on eth0.
Dec 18 09:26:14 ganymede avahi-daemon[2352]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.1.128.
Dec 18 09:26:14 ganymede avahi-daemon[2352]: Interface eth0.IPv4 no longer relevant for mDNS.
Dec 18 09:26:14 ganymede avahi-daemon[2352]: Joining mDNS multicast group on interface eth0.IPv4 with address 192.168.1.128.
Dec 18 09:26:14 ganymede avahi-daemon[2352]: New relevant interface eth0.IPv4 for mDNS.
Dec 18 09:26:14 ganymede avahi-daemon[2352]: Registering new address record for 192.168.1.128 on eth0.IPv4.
Dec 18 09:26:15 ganymede dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 12
Dec 18 09:26:15 ganymede dhclient: DHCPOFFER from 192.168.1.1
Dec 18 09:26:15 ganymede dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Dec 18 09:26:15 ganymede dhclient: DHCPACK from 192.168.1.1
Dec 18 09:26:15 ganymede dhclient: bound to 192.168.1.128 -- renewal in 35149 seconds.
Dec 18 09:26:15 ganymede acpid: client connected from 2871[0:0]
Dec 18 09:26:15 ganymede NetworkManager: <WARN>  nm_system_device_set_ip4_route(): Failed to set IPv4 default route on 'eth0': File exists
Dec 18 09:26:15 ganymede NetworkManager: <info>  Activation (eth0) Finish handler scheduled.
Dec 18 09:26:15 ganymede NetworkManager: <info>  Activation (eth0) Stage 5 of 5 (IP Configure Commit) complete.
Dec 18 09:26:15 ganymede NetworkManager: <info>  Activation (eth0) successful, device activated.
---------


-- System Information:
Debian Release: 5.0.3
  APT prefers stable
  APT policy: (990, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.26-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/bash

Versions of packages ntp depends on:
ii  adduser                 3.110            add and remove users and groups
ii  libc6                   2.7-18           GNU C Library: Shared libraries
ii  libcap1                 1:1.10-14        support for getting/setting POSIX.
ii  libedit2                2.11~20080614-1  BSD editline and history libraries
ii  libncurses5             5.7+20081213-1   shared libraries for terminal hand
ii  libssl0.9.8             0.9.8g-15+lenny5 SSL shared libraries
ii  lsb-base                3.2-20           Linux Standard Base 3.2 init scrip
ii  netbase                 4.34             Basic TCP/IP networking system

Versions of packages ntp recommends:
ii  perl                     5.10.0-19lenny2 Larry Wall's Practical Extraction

Versions of packages ntp suggests:
ii  ntp-doc           1:4.2.4p4+dfsg-8lenny3 Network Time Protocol documentatio

-- no debconf information


Kurt Roeckx

unread,
Dec 18, 2009, 5:20:02 PM12/18/09
to
On Fri, Dec 18, 2009 at 11:40:15AM -0800, Carl Mascott wrote:
> Subject: Race condition between ntp and NetworkManager at entry to runlevel 2
> Package: ntp
> Version: 1:4.2.4p4+dfsg-8lenny3
> Severity: important
>
> *** Please type your report below this line ***
> Related bug: 535049
> Symptom: On some startups ntpq -p reports "No association ID's...".
> On other startups ntpq -p reports servers & status normally.
> I.E., problem is intermittent.
> Cause: When ntpq -p reports "No association ID's..." it is because
> NetworkManager has not yet brought up the network interface
> (eth0 for me) when ntpd is trying to resolve server IP
> addresses. If ntpd gives up on resolving addresses at
> this point it will never try again later.
> Effect: ntpd does not provide time unless restarted manually.
> Scope: The symptom I see is a problem with ntp. The underlying
> cause could affect other network services as well.
> Non-solution: Reorder startup scripts in /etc/rc2.d.
> Correct solution: ??

I have no idea why ntpd is saying that resolving failed. But that
means it got a permanent error resolving from somewhere. I have
no idea how your configuration is set up, but I never get such
permanent error.

I can perfectly bring up my wireless after some time, and ntpd
will pick up the servers after some time. Note that it doesn't
directly get them when the network gets up, but it does get them.

Anyway, the current ntp init script has this in it:
# Required-Start: $network

I have no idea how that interacts with network manager, but I
would expect atleast some basic things to work.

Do you know what's in your /etc/resolv.conf before the network
is brought up by network manager? Do you have some sort of
local nameserver? What does "grep ^hosts /etc/nsswitch.conf"
return?

What I find weird in your log is:


> Dec 18 09:26:08 ganymede ntpd[2741]: signal_no_reset: signal 17 had flags 4000000

I've never seen that message before. It seems the child sees
an old signal halder for SIGCHLD with SA_RESTORER set. (I have
no idea why ntpd is logging this.) This is probably unrelated
to your problem.

> Workaround: Use modified version of etch's /etc/network/if-up.d/ntp
> script as follows:

There should not be a need to restart ntpd, I think something
else is broken.


Kurt


--
To UNSUBSCRIBE, email to debian-bugs-...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org

Carl Mascott

unread,
Dec 18, 2009, 6:30:02 PM12/18/09
to
To answer your questions:

/etc/resolv.conf looks like this when I boot up single-user:

domain local
search local
nameserver 192.168.1.1

Note that in single-user the network interfaces are up
and configured, and dhclient is running (but NetworkManager
is not running).  (Is it possible to boot Debian without
networking?)

There is no nameserver running on the Debian machine, but
there is one (DNSMasq) in my router (192.168.1.1).

The hosts line of /etc/nsswitch.conf:

hosts:    files mdns4_minimal [NOTFOUND=return] dns mdns4

And some extra information: the servers lines from /etc/ntp.conf:

server 0.debian.pool.ntp.org iburst dynamic
server 1.debian.pool.ntp.org iburst dynamic
server 2.debian.pool.ntp.org iburst dynamic
server 3.debian.pool.ntp.org iburst dynamic

Also note that I am using dhcp for network configuration.
The dhcp server is in my router (192.168.1.1).

---

First, I don't think that ntpd is getting a permanent error
from the router's nameserver: syslog indicates that the
network interface is down at the time, so the router is
unreachable.

Second, it seems that the 'dynamic' option in ntp.conf
should tell ntpd to keep trying to resolve the server
addresses and contact the servers.  Yet I can assure you
that if ntpq -p says "No association ID's..." right
after boot, it will still say "No association ID's..."
an hour later.  One of us should probably look at the
ntp source code.

Third, I'm mystified why you don't have this problem,
but I'm not the only one who has it (see #535049).

The good news is that /etc/network/if-up.d/ntp seems
to be a good workaround for me.



From: Kurt Roeckx <ku...@roeckx.be>
To: Carl Mascott <cmas...@yahoo.com>; 561...@bugs.debian.org
Sent: Fri, December 18, 2009 5:03:12 PM
Subject: Re: [pkg-ntp-maintainers] Bug#561622: Race condition between ntp and NetworkManager at entry to runlevel 2

Kurt Roeckx

unread,
Dec 19, 2009, 9:30:01 AM12/19/09
to
reassign 561622 libnss-mdns
thanks

On Fri, Dec 18, 2009 at 03:22:35PM -0800, Carl Mascott wrote:
> The hosts line of /etc/nsswitch.conf:
>
> hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4

Mine only had "files dns". Changing it to that breaks things
for me. Changing it to:


hosts: files mdns4_minimal [NOTFOUND=return] dns

Makes things work again.

So I believe that this is a bug in libnss-mdns, and am reassigning
it.

Kurt Roeckx

unread,
Dec 25, 2009, 8:20:01 AM12/25/09
to
On Sat, Dec 19, 2009 at 03:16:59PM +0100, Kurt Roeckx wrote:
> On Fri, Dec 18, 2009 at 03:22:35PM -0800, Carl Mascott wrote:
> > The hosts line of /etc/nsswitch.conf:
> >
> > hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4
>
> Mine only had "files dns". Changing it to that breaks things
> for me. Changing it to:
> hosts: files mdns4_minimal [NOTFOUND=return] dns
>
> Makes things work again.
>
> So I believe that this is a bug in libnss-mdns, and am reassigning
> it.

Afaik, ntpd behaves that way in case EAI_NONAME (authoritive
answer that there is no such name) is returned by getaddrinfo(),
not in case of something like EAI_FAIL or EAI_NODATA. I believe
that EAI_NONAME is returned when the DNS returned NXDOMAIN.

0 new messages