
Queries to a positively cached zone are failing (phila.gov)


Greg Chavez

Mar 14, 2006, 12:29:20 PM
I work at a large .gov gateway whose BIND servers cannot resolve any
queries for phila.gov. I see bad domains all the time, and a quick
dump of the cache and a dig here and a dig there usually point to one
or more bad name servers. I've had this problem in the past with
phila.gov, a zone that our mail servers hit very often; usually, our
mail queues will get a little high until our forwarders replace their
caches with a fresh iterative query to the zone's working name server.
Life goes on.

We are experiencing a total phila.gov blackout right now. All queries
for it time out. But this time, we have both of phila.gov's name
servers in our cache with glue:

# grep -i phila.gov named_dump.db
phila.GOV. 85957 NS DNS.phila.gov.
85957 NS DNS2.phila.gov.
DNS.phila.GOV. 85957 A 170.115.249.10
DNS2.phila.GOV. 85957 A 170.115.249.11
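The check described above (both NS records cached, each with a glue address) can be sketched in a few lines of Python. This is only an illustration: the names `parse_dump` and `glue_for` are made up for this example, and the parser assumes lines shaped like the excerpt above rather than the full named_dump.db format.

```python
# Cache excerpt copied from the grep output above.
DUMP = """\
phila.GOV. 85957 NS DNS.phila.gov.
85957 NS DNS2.phila.gov.
DNS.phila.GOV. 85957 A 170.115.249.10
DNS2.phila.GOV. 85957 A 170.115.249.11
"""


def parse_dump(text):
    """Return (owner, ttl, type, rdata) tuples from dump-style lines.

    Assumption: a line that starts with a number (the TTL) inherits the
    owner name of the previous line, as in the excerpt above.
    """
    records, owner = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(";"):
            continue  # skip blank lines and comments
        if not fields[0].isdigit():
            owner = fields.pop(0).lower()
        ttl, rtype, rdata = int(fields[0]), fields[1], fields[2]
        records.append((owner, ttl, rtype, rdata))
    return records


def glue_for(records, zone):
    """Map each NS target of `zone` to its cached A record, or None."""
    zone = zone.lower()
    addrs = {owner: rdata for owner, _, rtype, rdata in records if rtype == "A"}
    return {
        rdata.lower(): addrs.get(rdata.lower())
        for owner, _, rtype, rdata in records
        if rtype == "NS" and owner == zone
    }


GLUE = glue_for(parse_dump(DUMP), "phila.gov.")
print(GLUE)
```

A `None` value in the result would flag a name server cached without an address; here both targets have one.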

If I do digs @ either NS IP, I get answers. Digs using my forwarders
time out. Dig traces get me the NS records for the dot-gov servers.

Clearing my cache has no effect. I am utterly stumped... everything I
have ever seen before tells me that my name server *should* be seeing
this domain. What awful assumption(s) are keeping me from seeing the
problem?

BIND is a bit crusty: 9.2.2p3.

--
--Greg Chavez
--


Barry Margolin

Mar 14, 2006, 4:56:07 PM
In article <dv72aj$17ac$1...@sf1.isc.org>,
"Greg Chavez" <greg....@gmail.com> wrote:

> I work at a large .gov gateway whose BIND servers cannot resolve any
> queries for phila.gov. I see bad domains all the time, and a quick
> dump of the cache and a dig here and a dig there usually point to one
> or more bad name servers. I've had this problem in the past with
> phila.gov, a zone that our mail servers hit very often; usually, our
> mail queues will get a little high until our forwarders replace their
> caches with a fresh iterative query to the zone's working name server.
> Life goes on.
>
> We are experiencing a total phila.gov blackout right now. All queries
> for it time out. But this time, we have both of phila.gov's name
> servers in our cache with glue:
>
> # grep -i phila.gov named_dump.db
> phila.GOV. 85957 NS DNS.phila.gov.
> 85957 NS DNS2.phila.gov.
> DNS.phila.GOV. 85957 A 170.115.249.10
> DNS2.phila.GOV. 85957 A 170.115.249.11
>
> If I do digs @ either NS IP, I get answers. Digs using my forwarders
> time out. Dig traces get me the NS records for the dot-gov servers

Rather than dump your cache, you need to look at the forwarder's cache.
And what happens if you try to query the nameservers from the forwarder?

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***


Greg Chavez

Mar 14, 2006, 6:18:15 PM
On 3/14/06, Barry Margolin <bar...@alum.mit.edu> wrote:
> In article <dv72aj$17ac$1...@sf1.isc.org>,
> "Greg Chavez" <greg....@gmail.com> wrote:
>
> > We are experiencing a total phila.gov blackout right now. All queries
> > for it time out. But this time, we have both of phila.gov's name
> > servers in our cache with glue:
> >
> > # grep -i phila.gov named_dump.db
> > phila.GOV. 85957 NS DNS.phila.gov.
> > 85957 NS DNS2.phila.gov.
> > DNS.phila.GOV. 85957 A 170.115.249.10
> > DNS2.phila.GOV. 85957 A 170.115.249.11
> >
> > If I do digs @ either NS IP, I get answers. Digs using my forwarders
> > time out. Dig traces get me the NS records for the dot-gov servers
>
> Rather than dump your cache, you need to look at the forwarder's cache.
> And what happens if you try to query the nameservers from the forwarder?

What you see above *is* the forwarder's cache. There are four
forwarders total and each one has similar entries. Queries sent to
the forwarders' named process for phila.gov, whether locally or from
network clients, time out. Queries sent with dig to either of
phila.gov's name servers from the forwarders result in buttery
success:

> dig ns phila.gov @170.115.249.10

; <<>> DiG 8.3 <<>> ns phila.gov @170.115.249.10
; (1 server found)
;; res options: init recurs defnam dnsrch
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2
;; QUERY SECTION:
;; phila.gov, type = NS, class = IN

;; ANSWER SECTION:
phila.gov. 1D IN NS dns.phila.gov.
phila.gov. 1D IN NS dns2.phila.gov.

;; ADDITIONAL SECTION:
dns.phila.gov. 1D IN A 170.115.249.10
dns2.phila.gov. 1D IN A 170.115.249.11

;; Total query time: 54 msec
;; FROM: lsmns1o.gtwy.uscourts.gov to SERVER: 170.115.249.10 170.115.249.10
;; WHEN: Tue Mar 14 16:58:10 2006
;; MSG SIZE sent: 27 rcvd: 96

Furthermore, packet sniffs show that BIND *is* sending out DNS queries
to phila.gov's name servers when it is forwarded requests from
internal clients. But it gets no response.

This tells me that *if* an unknown upstream filter (not likely, at
least not on my end) is causing mischief, it's filtering at the
application layer not the network layer. So the question is, what
distinguishes a dig-fashioned query from a BIND-fashioned query?

Thanks to Barry and anyone else who wants to chime in.


Stephane Bortzmeyer

Mar 15, 2006, 4:19:41 AM
On Tue, Mar 14, 2006 at 06:18:15PM -0500,
Greg Chavez <greg....@gmail.com> wrote
a message of 62 lines which said:

> This tells me that *if* an unknown upstream filter (not likely, at
> least not on my end) is causing mischief, it's filtering at the
> application layer not the network layer.

Examining the queries in detail with Ethereal may yield some clues.

> So the question is, what distinguishes a dig-fashioned query from a
> BIND-fashioned query?

A few wild guesses:

* dig sets "rd" (recursion requested) by default ("dig +norec" to test
it)

* BIND may have been configured with a source address which is not the
default address of the machine (check the query-source option)


Ronan Flood

Mar 15, 2006, 7:32:58 AM
Stephane Bortzmeyer <bortz...@nic.fr> wrote:

> > So the question is, what distinguishes a dig-fashioned query from a
> > BIND-fashioned query?
>
> A few wild guesses:
>
> * dig sets "rd" (recursion requested) by default ("dig +norec" to test
> it)
>
> * BIND may have been configured with a source address which is not the
> default address of the machine (check the query-source option)

Also BIND will be using EDNS, so try dig +dnssec or dig +bufsize=2048 or so.
BIND should spot if the remote server doesn't support EDNS and turn it off.
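To make the difference concrete, here is a minimal Python sketch of what an EDNS0 query adds on the wire compared with a plain one. The helper names (`encode_name`, `build_query`) are hypothetical; the layout follows the standard DNS message format plus the EDNS0 OPT pseudo-record (root owner name, TYPE 41, CLASS carrying the advertised UDP payload size). It is not a reproduction of what BIND or dig actually emit byte for byte.

```python
import struct

NS = 2    # QTYPE for NS records
OPT = 41  # TYPE of the EDNS0 OPT pseudo-record


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def build_query(name, qtype, qid=0, rd=True, edns_bufsize=None):
    """Build a one-question query; add an OPT record if edns_bufsize is set."""
    flags = 0x0100 if rd else 0          # RD is the only flag a query sets here
    arcount = 1 if edns_bufsize is not None else 0
    msg = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, arcount)
    msg += encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    if edns_bufsize is not None:
        # OPT RR: root name, TYPE=41, CLASS=payload size, TTL=0 (no extended
        # rcode, version 0, no flags), RDLENGTH=0.
        msg += b"\x00" + struct.pack("!HHIH", OPT, edns_bufsize, 0, 0)
    return msg


PLAIN = build_query("phila.gov", NS)
EDNS = build_query("phila.gov", NS, edns_bufsize=1024)
print(len(PLAIN), len(EDNS))
```

The plain query comes to 27 bytes, which matches the "MSG SIZE sent: 27" reported by the DiG 8.3 output earlier in the thread; the EDNS variant differs only in ARCOUNT and the 11 extra bytes of the OPT record at the end.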

--
Ronan Flood <R.F...@noc.ulcc.ac.uk>
working for but not speaking for
Network Services, University of London Computer Centre
(which means: don't bother ULCC if I've said something you don't like)


Stephane Bortzmeyer

Mar 15, 2006, 8:03:00 AM
On Wed, Mar 15, 2006 at 12:32:58PM +0000,
Ronan Flood <ro...@noc.ulcc.ac.uk> wrote
a message of 21 lines which said:

> Also BIND will be using EDNS,

Congratulations:

% dig @dns.phila.gov NS phila.gov

; <<>> DiG 9.2.4 <<>> @dns.phila.gov NS phila.gov
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28811
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2

;; QUESTION SECTION:
;phila.gov. IN NS

;; ANSWER SECTION:
phila.gov. 86400 IN NS dns.phila.gov.
phila.gov. 86400 IN NS dns2.phila.gov.

;; ADDITIONAL SECTION:
dns.phila.gov. 86400 IN A 170.115.249.10
dns2.phila.gov. 86400 IN A 170.115.249.11

;; Query time: 121 msec
;; SERVER: 170.115.249.10#53(dns.phila.gov)
;; WHEN: Wed Mar 15 14:01:57 2006
;; MSG SIZE rcvd: 96

% dig +bufsize=1024 @dns.phila.gov NS phila.gov

; <<>> DiG 9.2.4 <<>> +bufsize=1024 @dns.phila.gov NS phila.gov
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 58304
;; flags: qr rd ra; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; Query time: 119 msec
;; SERVER: 170.115.249.10#53(dns.phila.gov)
;; WHEN: Wed Mar 15 14:02:35 2006
;; MSG SIZE rcvd: 12


Now, BIND should retry without EDNS, no?
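The retry being asked about can be sketched roughly as follows. This is a simplified illustration of the general EDNS fallback idea, not BIND's actual algorithm: `query_with_fallback` and the `send(use_edns)` transport callback are hypothetical, and the fake server only returns header-sized replies shaped like the two dig results above.

```python
import struct

FORMERR, NOTIMP = 1, 4  # rcodes a pre-EDNS server may return for an OPT record


def rcode_of(response: bytes) -> int:
    """Extract the 4-bit RCODE from the flags word of a DNS header."""
    _qid, flags = struct.unpack("!HH", response[:4])
    return flags & 0x000F


def query_with_fallback(send):
    """Try EDNS first; retry without it if the server rejects the query.

    `send(use_edns)` is a hypothetical transport: it returns the raw
    response bytes, or None if nothing came back before the timeout.
    """
    response = send(True)
    if response is None:
        return "timeout", None          # nothing to react to at all
    if rcode_of(response) in (FORMERR, NOTIMP):
        response = send(False)          # server dislikes EDNS: ask plainly
        if response is None:
            return "timeout", None
        return "plain", response
    return "edns", response


def formerr_on_edns(use_edns):
    """Fake server: FORMERR for EDNS queries, NOERROR otherwise."""
    if use_edns:
        # flags qr rd ra + FORMERR, all counts zero (12 bytes, as above)
        return struct.pack("!HHHHHH", 58304, 0x8181, 0, 0, 0, 0)
    # flags qr aa rd ra + NOERROR; record bodies omitted in this sketch
    return struct.pack("!HHHHHH", 28811, 0x8580, 1, 2, 0, 2)


OUTCOME, REPLY = query_with_fallback(formerr_on_edns)
print(OUTCOME, rcode_of(REPLY))
```

Note that in this sketch the fallback is triggered only by an explicit error reply; a query that gets no answer at all ends as a timeout without ever reaching the retry.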


Greg Chavez

Mar 15, 2006, 9:52:10 AM
On 3/15/06, Stephane Bortzmeyer <bortz...@nic.fr> wrote:
> Ronan Flood <ro...@noc.ulcc.ac.uk> wrote

> > Also BIND will be using EDNS,
>
> % dig +bufsize=1024 @dns.phila.gov NS phila.gov
>
> ; <<>> DiG 9.2.4 <<>> +bufsize=1024 @dns.phila.gov NS phila.gov
> ;; global options: printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 58304
> ;; flags: qr rd ra; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>
> ;; Query time: 119 msec
> ;; SERVER: 170.115.249.10#53(dns.phila.gov)
> ;; WHEN: Wed Mar 15 14:02:35 2006
> ;; MSG SIZE rcvd: 12
>
>
> Now, BIND should retry without EDNS, no?

First thing I did when I saw Ronan's message was to slap myself on the
head. The second thing I did was add this to named.conf:

server 170.115.249.10 { edns no;};
server 170.115.249.11 { edns no;};

The third thing I did was test it and the fourth thing I did was slap
myself again when it didn't work. Same old same old. Dig queries to
the phila.gov name servers work; queries by BIND time out.

Times out: that's an important distinction. BIND doesn't get back a
FORMERR; the remote name server *never responds* to the query.

; <<>> DiG 8.3 <<>> ns phila.gov
;; res options: init recurs defnam dnsrch
;; res_nsend to server default -- 127.0.0.1: Connection timed out

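Here are the results of my packet analysis using snoop. The Ethernet
and IP headers are effectively identical (only the IP identification
and checksum differ). The UDP source ports differ, and the DNS headers
differ only in the query ID and the RD flag. I don't know what else to
look for.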

DIG-to-phila.gov (gets NXDOMAIN response):

ETHER: ----- Ether Header -----
ETHER:
ETHER: Packet 1 arrived at 8:45:20.51
ETHER: Packet size = 74 bytes
ETHER: Destination = 0:11:43:dc:3a:38,
ETHER: Source = 0:3:ba:2:e9:e6,
ETHER: Ethertype = 0800 (IP)
ETHER:
IP: ----- IP Header -----
IP:
IP: Version = 4
IP: Header length = 20 bytes
IP: Type of service = 0x00
IP: xxx. .... = 0 (precedence)
IP: ...0 .... = normal delay
IP: .... 0... = normal throughput
IP: .... .0.. = normal reliability
IP: Total length = 60 bytes
IP: Identification = 37693
IP: Flags = 0x4
IP: .1.. .... = do not fragment
IP: ..0. .... = last fragment
IP: Fragment offset = 0 bytes
IP: Time to live = 255 seconds/hops
IP: Protocol = 17 (UDP)
IP: Header checksum = 720a
IP: Source address = 10.X.X.25, 10.X.X.25
IP: Destination address = 170.115.249.10, 170.115.249.10
IP: No options
IP:
UDP: ----- UDP Header -----
UDP:
UDP: Source port = 33102
UDP: Destination port = 53 (DNS)
UDP: Length = 40
UDP: Checksum = 1076
UDP:
DNS: ----- DNS Header -----
DNS:
DNS: Query ID = 4
DNS: Opcode: Query
DNS: RD (Recursion Desired)
DNS: 1 question(s)
DNS: Domain Name: test.phila.gov.
DNS: Class: 1 (Internet)
DNS: Type: 1 (Address)
DNS:


0: 0011 43dc 3a38 0003 ba02 e9e6 0800 4500 ..C.:8........E.
16: 003c 933d 4000 ff11 720a 0ad1 c819 aa73 .<.=@...r......s
32: f90a 814e 0035 0028 1076 0004 0100 0001 ...N.5.(.v......
48: 0000 0000 0000 0474 6573 7405 7068 696c .......test.phil
64: 6103 676f 7600 0001 0001 a.gov.....

BIND-to-phila.gov (no edns, times out):

ETHER: ----- Ether Header -----
ETHER:
ETHER: Packet 3 arrived at 8:43:16.49
ETHER: Packet size = 74 bytes
ETHER: Destination = 0:11:43:dc:3a:38,
ETHER: Source = 0:3:ba:2:e9:e6,
ETHER: Ethertype = 0800 (IP)
ETHER:
IP: ----- IP Header -----
IP:
IP: Version = 4
IP: Header length = 20 bytes
IP: Type of service = 0x00
IP: xxx. .... = 0 (precedence)
IP: ...0 .... = normal delay
IP: .... 0... = normal throughput
IP: .... .0.. = normal reliability
IP: Total length = 60 bytes
IP: Identification = 51839
IP: Flags = 0x4
IP: .1.. .... = do not fragment
IP: ..0. .... = last fragment
IP: Fragment offset = 0 bytes
IP: Time to live = 255 seconds/hops
IP: Protocol = 17 (UDP)
IP: Header checksum = 3ac8
IP: Source address = 10.X.X.25, 10.X.X.25
IP: Destination address = 170.115.249.10, 170.115.249.10
IP: No options
IP:
UDP: ----- UDP Header -----
UDP:
UDP: Source port = 32768
UDP: Destination port = 53 (DNS)
UDP: Length = 40
UDP: Checksum = 7233
UDP:
DNS: ----- DNS Header -----
DNS:
DNS: Query ID = 41108
DNS: Opcode: Query
DNS:
DNS: 1 question(s)
DNS: Domain Name: test.phila.gov.
DNS: Class: 1 (Internet)
DNS: Type: 1 (Address)
DNS:


0: 0011 43dc 3a38 0003 ba02 e9e6 0800 4500 ..C.:8........E.
16: 003c ca7f 4000 ff11 3ac8 0ad1 c819 aa73 .<..@...:......s
32: f90a 8000 0035 0028 7233 a094 0000 0001 .....5.(r3......
48: 0000 0000 0000 0474 6573 7405 7068 696c .......test.phil
64: 6103 676f 7600 0001 0001 a.gov.....
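For anyone who wants to compare the two captures mechanically, the following Python sketch decodes the UDP header and DNS query from each hex dump above and prints the fields that differ. The bytes are copied from the dumps (offsets 34 to 73 of each frame, with the Ethernet and IP headers left out); `parse_query` and `diff_queries` are names invented for this example.

```python
import struct

# UDP header (8 bytes) + DNS query (32 bytes) from each snoop dump above.
DIG_UDP = bytes.fromhex(
    "814e003500281076"                  # UDP: sport, dport, length, checksum
    "000401000001000000000000"          # DNS header
    "0474657374057068696c6103676f7600"  # QNAME test.phila.gov.
    "00010001"                          # QTYPE A, QCLASS IN
)
BIND_UDP = bytes.fromhex(
    "8000003500287233"
    "a09400000001000000000000"
    "0474657374057068696c6103676f7600"
    "00010001"
)


def parse_query(udp: bytes) -> dict:
    """Decode the fields of interest from a UDP datagram carrying a DNS query."""
    src_port, _dst_port, udp_len, _cksum = struct.unpack("!HHHH", udp[:8])
    dns = udp[8:]
    qid, flags, _qd, _an, _ns, arcount = struct.unpack("!HHHHHH", dns[:12])
    # Walk the uncompressed question name: length-prefixed labels, then 0.
    labels, pos = [], 12
    while dns[pos] != 0:
        n = dns[pos]
        labels.append(dns[pos + 1:pos + 1 + n].decode("ascii"))
        pos += 1 + n
    qtype, qclass = struct.unpack("!HH", dns[pos + 1:pos + 5])
    return {
        "src_port": src_port,
        "udp_len": udp_len,
        "id": qid,
        "rd": bool(flags & 0x0100),
        "arcount": arcount,  # would be 1 if an EDNS OPT record were present
        "qname": ".".join(labels) + ".",
        "qtype": qtype,
        "qclass": qclass,
    }


def diff_queries(a: dict, b: dict) -> dict:
    """Return {field: (a_value, b_value)} for every field that differs."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


DIG_Q, BIND_Q = parse_query(DIG_UDP), parse_query(BIND_UDP)
DIFF = diff_queries(DIG_Q, BIND_Q)
print(DIFF)
```

Both queries carry ARCOUNT 0, so neither includes an OPT record; the only decoded differences are the source port, the query ID, and the RD flag.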


These packets go through a PIX firewall before they reach the wild.
One of our network engineers was able to confirm that both of these
packets were leaving the firewall. I also got a hold of the phila.gov
folks for a time, although they have yet to come close to figuring out
how to check DNS and network logs and tables. I am hoping that they
will soon provide me with reasonable troubleshooting data.

--
--Greg Chavez
--


Greg Chavez

Mar 15, 2006, 3:00:24 PM
On Mar 15, 2006, at 14:52, Greg Chavez wrote:

> The third thing I did was test it and the fourth thing I did was slap
> myself again when it didn't work. Same old same old. Dig queries to
> the phila.gov name servers work; queries by BIND time out.
>
> Times out: that's an important distinction. BIND doesn't get back a
> FORMERR; the remote name server *never responds* to the query.
>

> These packets go through a pix firewall before they reach the wild.

Our network team and I are concentrating on the possibility that our
PIX firewall, which performs minor surgery on DNS packets for NAT
purposes, may be having trouble accepting "We don't speak EDNS"
responses from phila.gov's name servers, which may be running BIND 4.
If anybody else has any insight as far as PIX and EDNS goes or thinks
we're barking up the wrong tree, please come forward. Otherwise, I'll
close out this thread when we reach a solution.


Jeff Reasoner

Mar 15, 2006, 3:28:01 PM
PIX 6.3.3 and above allows DNS UDP datagrams larger than 512 bytes.
Upgrading will require a reboot.
Add something like:
fixup protocol dns maximum-length 4092