Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Bug#900028: unbound: Fails to start

83 views
Skip to first unread message

cl...@jhcloos.com

unread,
May 24, 2018, 4:20:03 PM5/24/18
to
Package: unbound
Version: 1.7.1-1
Severity: important

** I'm reporting this on a different machine that on which it occurs. **

On several -- but not all -- of my kvm instances, unbound 1.7.1-1 fails to start.

I tried manually invoking start-stop-daemon via:

:; (start-stop-daemon --start --pidfile /run/unbound.pid --name unbound --startas /usr/sbin/unbound -- "" )

(replicating what /etc/init.d/unbound does) and unbound complained:

usage: unbound [options]
start unbound daemon DNS resolver.
-h this help
-c file config file to read instead of /etc/unbound/unbound.conf
file format is described in unbound.conf(5).
-d do not fork into the background.
-v verbose (more times to increase verbosity)
Version 1.7.1
linked libs: libevent 2.1.8-stable (it uses epoll), OpenSSL 1.1.0h 27 Mar 2018
linked modules: dns64 python subnetcache respip validator iterator
BSD licensed, see LICENSE in source package for details.
Report bugs to unboun...@nlnetlabs.nl

I tried again adding "-v" as the $DAEMON_OPTS value, and that avoided
the above notice, but unbound did not run. Switching to '-dvv' got:

[1527191667] unbound[9076:0] notice: Start of unbound 1.7.1.
[1527191667] unbound[9076:0] debug: switching log to syslog
Killed

The syslog output was:

May 24 19:54:27 fine unbound: [9076:0] warning: did not exit gracefully last time (9071)
May 24 19:54:27 fine unbound: [9076:0] debug: chdir to /etc/unbound
May 24 19:54:27 fine unbound: [9076:0] debug: drop user privileges, run as unbound
May 24 19:54:27 fine unbound: [9076:0] debug: module config: "subnetcache validator iterator"
May 24 19:54:27 fine unbound: [9076:0] notice: init module 0: subnet
May 24 19:54:27 fine unbound: [9076:0] debug: subnet: option registered (8)
May 24 19:54:27 fine unbound: [9076:0] notice: init module 1: validator
May 24 19:54:27 fine unbound: [9076:0] info: adding trusted key dlv.isc.org. DNSKEY IN
May 24 19:54:27 fine unbound: [9076:0] notice: init module 2: iterator
May 24 19:54:27 fine unbound: [9076:0] debug: target fetch policy for level 0 is 3
May 24 19:54:27 fine unbound: [9076:0] debug: target fetch policy for level 1 is 2
May 24 19:54:27 fine unbound: [9076:0] debug: target fetch policy for level 2 is 1
May 24 19:54:27 fine unbound: [9076:0] debug: target fetch policy for level 3 is 0
May 24 19:54:27 fine unbound: [9076:0] debug: target fetch policy for level 4 is 0

One of the kvms didn't have unbound installed. Even there 1.7.1-1, once installed, will not run.

-- System Information:
Debian Release: buster/sid
APT prefers unstable
APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.5.0-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)

Versions of packages unbound depends on:
ii adduser 3.117
ii dns-root-data 2018013001
ii libc6 2.27-3
ii libevent-2.1-6 2.1.8-stable-4
ii libfstrm0 0.3.0-1+b1
ii libprotobuf-c1 1.2.1-2
ii libpython3.6 3.6.5-9
pn libssl1.1 <none>
ii libsystemd0 238-4
ii lsb-base 9.20170808
pn openssl <none>
ii unbound-anchor 1.7.1-1

unbound recommends no packages.

Versions of packages unbound suggests:
pn apparmor <none>

-- no debconf information

James Cloos

unread,
May 24, 2018, 4:30:02 PM5/24/18
to
I tried running it with strace(8) and got this:

sendto(9, "<29>May 24 20:17:10 unbound: [10"..., 70, MSG_NOSIGNAL, NULL, 0) = 70
getpid() = 10144
getpid() = 10144
getrandom(0x7fff9321c450, 40, 0) = -1 ENOSYS (Function not implemented)
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
getpid() = 10144
gettid() = 10144
tgkill(10144, 10144, SIGKILL) = ?
+++ killed by SIGKILL +++


If the lack of getrandom(2) is killing it, that is a grave bug.

-JimC
--
James Cloos <cl...@jhcloos.com> OpenPGP: 0x997A9F17ED7DAEA6

James Cloos

unread,
Apr 18, 2022, 3:40:03 PM4/18/22
to
closing the report like that makes no sense whatsoever.

unbound as installed did not work. at all.

blowing off that report is incomprehensible.

the bug looks to have been fixed since then, though, so closing
as fixed would have been reasonable.

Michael Tokarev

unread,
Apr 18, 2022, 5:20:04 PM4/18/22
to
18.04.2022 22:25, James Cloos wrote:
> closing the report like that makes no sense whatsoever.

Please feel free to reopen this bug report and be ready to provide
some more information to debug it and to find and fix the problem.
Lacking that it makes no sense whatsoever to keep it.

FWIW, 1.7.1-1 version of unbound does not have a code that corresponds
to the strace you provided (which I initially haven't noticed at all).
It is interesting to understand what actually happened there.
Also, getrandom() appears to be working on that kernel (4.5.0-1-amd64).
And also, when getrandom() fails for unbound with ENOSYS (this errno
is handled in a special way there), it goes nest to trying reading
bytes from /dev/urandom (this is why I said there's no code corresponding
to that strace), and only after that one fails *too*, it errors out.

Not killed, though. Here's the code:

getrandom call:
https://salsa.debian.org/dns-team/unbound/-/blob/debian/1.7.1-1/compat/getentropy_linux.c#L123

if that fails with ENOSYS here's next urandom way:
https://salsa.debian.org/dns-team/unbound/-/blob/debian/1.7.1-1/compat/getentropy_linux.c#L136

and only after everything else failed, we have:

https://salsa.debian.org/dns-team/unbound/-/blob/debian/1.7.1-1/compat/getentropy_linux.c#L187

#undef FAIL_INSTEAD_OF_TRYING_FALLBACK
#ifdef FAIL_INSTEAD_OF_TRYING_FALLBACK
raise(SIGKILL);
#endif

note the #undef here. This code (raise, which is tgkill syscall)
is not compiled in.

So it goes to a fallback and.. works.
And this is something I can confirm after downloading 1.7.1-1 version
of unbound from snapshot.debian.org and trying it. It works even when
I disable getrandom() syscall for it.

So you have quite a situation in there. Everything seem to be out of
order. Feel free to reopen it and be ready to perform some debugging.

Thanks,

/mjt
0 new messages