how to log all recursive query responses?

David Sparks

unread,

Aug 1, 2008, 8:10:57 PM8/1/08

to

I wanted to turn on the responses BIND is giving out to recursive queries. I
added the below to my named.conf but it didn't do anything. What does
resolver log do?

channel resolver_log {
file "/var/log/resolver.log" versions 3 size 100M;
severity info;
print-category yes;
print-severity yes;
print-time yes;
};
category resolver { resolver_log; };

Kevin Darcy

unread,

Aug 6, 2008, 12:30:55 AM8/6/08

to

I think the "resolver" category is only intended for debugging, so you'd
need "severity dynamic" and a fairly high debug level to capture any
information, and the information you capture would basically only be of
interest to programmers wanting to know about the internal workings of
named's "resolver" code, e.g.

06-Aug-2008 00:22:35.053 resolver: debug 3: fctx 10054e420(./NS'):
answer_response
06-Aug-2008 00:22:35.053 resolver: debug 3: fctx 10054e420(./NS'):
cache_message
06-Aug-2008 00:22:35.053 resolver: debug 3: fctx 10054e420(./NS'):
clone_results
06-Aug-2008 00:22:35.053 resolver: debug 3: fctx 10054e420(./NS'):
cancelquery
06-Aug-2008 00:22:35.053 resolver: debug 3: fctx 10054e420(./NS'): done
06-Aug-2008 00:22:35.053 resolver: debug 3: fctx 10054e420(./NS'):
stopeverything
06-Aug-2008 00:22:35.053 resolver: debug 3: fctx 10054e420(./NS'):
cancelqueries
06-Aug-2008 00:22:35.053 resolver: debug 3: fctx 10054e420(./NS'):
sendevents
06-Aug-2008 00:22:35.053 resolver: debug 3: fetch 1003c66e0 (fctx
10054e420(./NS)): destroyfetch

If you want to capture the contents of the actual *packets* that named
is generating, I'd recommend a packet capture utility such as "tcpdump".
It's not too hard to restrict the captures to responses only, where the
RD flag in the header is set to 1 (indicating that the original query
was recursive). For the PC platform, there's also WireShark, but to be
honest, I haven't played much with its filtering capabilities.

- Kevin

David Sparks

unread,

Aug 8, 2008, 3:24:33 PM8/8/08

to

> logging {
> category lame-servers { null; };
> channel default_syslog {
> syslog local2;

> severity info;
> print-category yes;
> print-severity yes;

> };
> category queries { default_syslog; };
> };

Does the above log the responses or just the queries?

I'm trying to debug why two 1000qps BIND servers side by side are giving out
different (cached?) results (one SERVFAIL, one correct answer) from a close
(one Internet hop but in the same data centre) rbldnsd server. The SERVFAIL
is incorrect and I can't figure out how named got things wrong in the first place.

The incorrect SERVFAIL also seems to be cached but I can't see anything about
the query from rndc dumpdb output.

rndc dumpdb -cache shows that the server with the correct answer has cached
values. What I don't understand is why the named that doesn't have a cached
answer doesn't resolve the query, instead it returns SERVFAIL immediately?

This only happens after named has been running hard for several days. I've
pasted an example below, ns1 gets SERVFAIL and ns2 gets the proper answer.

daves@sentinel ~ $ host -v -t a X.X.X.213.fur.ca1.sophosxl.com. ns1
Trying "X.X.X.213.fur.ca1.sophosxl.com"
Received 49 bytes from 10.99.159.11#53 in 89 ms
Trying "X.X.X.213.fur.ca1.sophosxl.com"
Using domain server:
Name: ns1
Address: 10.99.159.11#53
Aliases:

Host X.X.X.213.fur.ca1.sophosxl.com not found: 2(SERVFAIL)
Received 49 bytes from 10.99.159.11#53 in 88 ms

daves@sentinel ~ $ host -v -t a X.X.X.213.fur.ca1.sophosxl.com. ns2
Trying "X.X.X.213.fur.ca1.sophosxl.com"
Using domain server:
Name: ns2
Address: 10.99.159.12#53
Aliases:

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36177
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1

;; QUESTION SECTION:
;X.X.X.213.fur.ca1.sophosxl.com. IN A

;; ANSWER SECTION:
X.X.X.213.fur.ca1.sophosxl.com. 2100 IN A 127.0.4.2

;; AUTHORITY SECTION:
fur.ca1.sophosxl.com. 352 IN NS ca1.sophosxl.com.

;; ADDITIONAL SECTION:
ca1.sophosxl.com. 569 IN A 209.17.179.166

Received 95 bytes from 10.99.159.12#53 in 26 ms

> If you want to capture the contents of the actual *packets* that named
> is generating, I'd recommend a packet capture utility such as "tcpdump".
> It's not too hard to restrict the captures to responses only, where the
> RD flag in the header is set to 1 (indicating that the original query
> was recursive). For the PC platform, there's also WireShark, but to be
> honest, I haven't played much with its filtering capabilities.

I'm not sure how to filter on the RD flag? Will this filter be sufficient or
do I also need the query packet to figure out what happened?:

tcpdump -s 1024 src port 53 and not src host ns1

Thanks for any help!

ds

Kevin Darcy

unread,

Aug 8, 2008, 6:44:14 PM8/8/08

to

Ugh, this is a bit of a difficult problem, especially if you're at the
1000qps level (lots of data to wade through, eyeballing is not really an
option).

Looking at the data between the client and the BIND server is probably
not going to be very useful, you'll just see a question come in, and, at
some point, a SERVFAIL response going back.

To get to the root cause, you'll probably want to look at the data
passing back and forth between the BIND boxes and rbldnsd, to pinpoint
why BIND caches a SERVFAIL in the first place. Is it a timeout? Is it a
SERVFAIL response from rbldnsd? Something else?

If there is a *specific* name you want to focus on, it's possible to do
that with tcpdump, but it's rather painful, e.g.

tcpdump -v -x udp and port 53 and 'udp[20] == 3' and 'udp[21] == 102'
and 'udp[22] == 111' and 'udp[23] == 111'

would limit the capture to only packets with a Question Section
containing a first label of "foo" (3 is the label size, 102 is the ASCII
code for "f", 111 is the ASCII code for "o"). The Question Section is
copied from the original query to the response, so this should catch
responses too.

If, on the other hand, you're trying to answer the question "why do I
get a SERVFAIL, some of the time, for some names, seemingly at random?",
then I don't know that a targeted tcpdump is going to help. You might
have to capture *everything*, detect the error, and then wade through
the data later.

- Kevin

David Sparks

unread,

Aug 8, 2008, 7:56:25 PM8/8/08

to

> If, on the other hand, you're trying to answer the question "why do I
> get a SERVFAIL, some of the time, for some names, seemingly at random?",
> then I don't know that a targeted tcpdump is going to help. You might
> have to capture *everything*, detect the error, and then wade through
> the data later.

This is exactly the problem. Is there some way to get named to dump its
internal state so you can see why its returning a (cached?) SERVFAIL for what
should be a working query? When I run rndc dumpdb and look at the output I
can't find anything about the query its returning SERVFAIL for, is there some
extra cache or rndc dumpdb flag I need to use?

This problem usually only happens on the busier of two named instances and
only after a week or so of uptime. If no-one else knows of this problem I
guess I need to start collecting more data.

Thanks for you reply,

ds

JINMEI Tatuya / 神明達哉

unread,

Aug 8, 2008, 8:19:31 PM8/8/08

to

At Fri, 08 Aug 2008 16:56:25 -0700,

Which version of BIND are you using?

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.

Paul Vixie

unread,

Aug 8, 2008, 10:48:11 PM8/8/08

to

k...@chrysler.com (Kevin Darcy) writes:

> If there is a *specific* name you want to focus on, it's possible to do
> that with tcpdump, but it's rather painful, e.g.
>
> tcpdump -v -x udp and port 53 and 'udp[20] == 3' and 'udp[21] == 102'
> and 'udp[22] == 111' and 'udp[23] == 111'
>
> would limit the capture to only packets with a Question Section
> containing a first label of "foo" (3 is the label size, 102 is the ASCII
> code for "f", 111 is the ASCII code for "o"). The Question Section is
> copied from the original query to the response, so this should catch
> responses too.

yow. looks WAY painful. have you tried dnscap? its CLI language has not
changed in the last six months, so if you were waiting for it to settle
out, now's your moment. https://www.dns-oarc.net/tools/dnscap has sources.

here's the top of "man dnscap":

---

NAME
dnscap -- DNS network traffic capture utility

SYNOPSIS
dnscap [-pd1g?6f] [-i if ...] [-r file ...] [-l vlan ...] [-x pat ...]
[-X pat ...] [-m [qun]] [-s [ir]] [-h [ir]] [-e [nytfsxir]]
[-a host ...] [-z host ...] [-A host ...] [-Z host ...] [-u port]
[-w base [-k cmd]] [-t lim] [-c lim]

DESCRIPTION
dnscap is a network capture utility designed specifically for DNS traf-
fic. It normally produces binary data in pcap(3) format, either on stan-
dard output or in successive dump files (based on the -w command line
option.) This utility is similar to tcpdump(1), but has finer grained
packet recognition tailored to DNS transactions and protocol options.
dnscap is expected to be used for gathering continuous research or audit
traces.

---

the output of "dnscap -\?" is below.

---

dnscap: version V1.0-RC6 (October 2007)

usage: dnscap
[-?pd1g6f] [-i <if>]+ [-r <file>]+ [-l <vlan>]+
[-u <port>] [-m [qun]] [-e [nytfsxir]]
[-h [ir]] [-s [ir]]
[-a <host>]+ [-z <host>]+ [-A <host>]+ [-Z <host>]+
[-w <base> [-k <cmd>]] [-t <lim>] [-c <lim>]
[-x <pat>]+ [-X <pat>]+
[-B <datetime>]+ [-E <datetime>]+

options:
-? or -? print these instructions and exit
-p do not put interface in promiscuous mode
-d dump verbose trace information to stderr
-1 flush output on every packet
-g dump packets dig-style on stderr
-6 compensate for PCAP/BPF IPv6 bug
-f include fragmented packets
-i <if> select this live interface(s)
-r <file> read this pcap file
-l <vlan> select only these vlan(s)
-u <port> dns port (default: 53)
-m [qun] select messages: query, update, notify
-s [ir] select sides: initiations, responses
-h [ir] hide initiators and/or responders
-e [nytfsxir] select error/response code
n = no error
y = any error
t = truncated response
f = format error (rcode 1)
s = server failure (rcode 2)
x = nxdomain (rcode 3)
i = not implemented (rcode 4)
r = refused (rcode 5)
-a <host> want messages from these initiator(s)
-z <host> want messages from these responder(s)
-A <host> want messages not from these initiator(s)
-Z <host> want messages not from these responder(s)
-w <base> dump to <base>.<timesec>.<timeusec>
-k <cmd> kick off <cmd> when each dump closes
-t <lim> close dump or exit every/after <lim> secs
-c <lim> close dump or exit every/after <lim> pkts
-x <pat> select messages matching regex <pat>
-X <pat> select messages not matching regex <pat>
-B <datetime> begin collecting at this date and time
-X <datetime> end collecting at this date and time

---

note that dnscap is like tcpdump in that it does not reassemble IP fragments.
this is because its output format (pcap) doesn't easily represent them. if
you think you're missing something from dnscap or tcpdump because of fragments
then you need "ncap", see <https://www.dns-oarc.net/tools/ncap> for sources.
the ncaptool filtering language is deep rather than wide like dnscap. you
can still do almost all the same things, but it looks different.

---

# $Id: FILTERS,v 1.7 2008/07/29 01:20:38 edmonds Exp $

* Design notes for NCAP filter specifications

Generally, a filter specification is a sequence of filters, each being
introduced by its family name and containing one or more family-specific
rules. The point isn't to duplicate what BPF can do, but rather to make
it unnecessary to think in BPF. NCAP filter specifications are application
specific, and filter can internally generate BPF to save on context switches.

* Filter Family: "dns"

regex[=#]RE/opts
regex must match (or not) the presentation form of a message, with
options like "i" for ignore-case

initiator[=#]address[,...]
message initiator must be (or not) on this list

target[=#]address[,...]
message target must be (or not) on this list

flags[=#]tc,aa,ra,rd,qr
these message flags must be set (or unset)

rcode[=#]noerror,formerr,servfail,nxdomain,notimpl,refused
these message response codes are wanted (or unwanted)
(note, they are only checked on responses, not initiations)

opcode[=#]query,update,notify
these message opcodes are wanted (or unwanted)

* Filter Family: "icmp"

type[=#]echoreply,unreach,sourcequench,redirect,echorequest,routeradvert
type[=#]routersolicit,timxceed,paramproblem,tstamprequest,tstampreply,
type[=#]inforeq,inforeply,maskreq,maskreply,traceroute
these icmp types are wanted (or unwanted)

type[=#]<type>
where <type> is an integer in the range 0..255
this icmp type is wanted (or unwanted)

code[=#]<code>
where <code> is an integer in the range 0..255
this icmp code is wanted (or unwanted)

---

anyway the point is, tcpdump is no longer our only option for dns traffic
capture and analysis, even though libpcap and bpf is still at the heart of
everything.
--
Paul Vixie

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

David Sparks

unread,

Aug 9, 2008, 1:24:57 PM8/9/08

to

ns1 ~ # named -v
BIND 9.5.0-P2

ns1 ~ # uname -a
Linux ns1 2.6.23-gentoo-r8 #1 SMP Tue Feb 19 12:49:46 PST 2008 x86_64 Intel(R)
Xeon(R) CPU 5130 @ 2.00GHz GenuineIntel GNU/Linux

ns1 ~ # /lib/libc.so.6
GNU C Library stable release version 2.6.1, by Roland McGrath et al.
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.1.2 (Gentoo 4.1.2 p1.1).
Compiled on a Linux >>2.6.23-gentoo-r8<< system on 2008-07-02.
Available extensions:
C stubs add-on version 2.1.2
crypt add-on version 2.1 by Michael Glad and others
Gentoo patchset 1.2
GNU Libidn by Simon Josefsson
Native POSIX Threads Library by Ulrich Drepper et al
Support for some architectures added on, not maintained in glibc core.
BIND-8.2.3-T5B
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.

David Sparks

unread,

Aug 12, 2008, 7:25:27 PM8/12/08

to

>> tcpdump -v -x udp and port 53 and 'udp[20] == 3' and 'udp[21] == 102'
>> and 'udp[22] == 111' and 'udp[23] == 111'
>

> yow. looks WAY painful. have you tried dnscap? its CLI language has not
> changed in the last six months, so if you were waiting for it to settle
> out, now's your moment. https://www.dns-oarc.net/tools/dnscap has sources.

dnscap is excellent! Note that for my use case it would be preferable for the
-e flag to default to showing everything (and hence not be needed) instead of
discarding errors. Right now I execute `dnscap -e nytfsxir ...` which is
certainly not as painful as the tcpdump example above but not something you
want to type in all the time.

Overall dnscap is a great tool for debugging. I recommend it for anyone who
is looking at network streams.

Cheers,

ds