new bind 9.9 and root NS

dkol...@olearycomputers.com

Jul 31, 2012, 5:16:43 PM

to comp-protoc...@isc.org

Hi;

I have a client who's migrating from an old bind 9.3 installation to a
new bind 9.9. I've done the migration and everything seemed to be
running fine. Before switching the internic pointers, though, the
client gave it a good thorough thrashing and they're finding some
issues.

On the new system, the first time a domain outside of the client's
authoritative space is queried, the response takes longer than it
should. Obviously, non-cached searches will take longer, but these
are taking *way* longer:

# rndc flush
# time host www.olearycomputers.com.
www.olearycomputers.com has address 69.246.199.78
real 0m7.62s
user 0m0.00s
sys 0m0.00s

The old server beats that by more than 3 seconds:

[root]# rndc flush
[root]# time host www.olearycomputers.com.
www.olearycomputers.com has address 69.246.199.78
real 0m3.334s
user 0m0.003s
sys 0m0.003s

A dig trace on the old box looks reasonable:

# dig +trace www.olearycomputers.com
; <<>> DiG 9.3.4 <<>> +trace www.olearycomputers.com
;; global options: printcmd
[[root ns snipped]]
;; Received 512 bytes from 143.43.32.201#53(143.43.32.201) in 1 ms
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
[[remaining .com NS snipped]]
;; Received 501 bytes from 192.5.5.241#53(f.root-servers.net) in 71 ms
olearycomputers.com. 172800 IN NS ns3.no-ip.com.
olearycomputers.com. 172800 IN NS ns1.no-ip.com.
olearycomputers.com. 172800 IN NS ns4.no-ip.com.
olearycomputers.com. 172800 IN NS ns5.no-ip.com.
;; Received 211 bytes from 192.35.51.30#53(f.gtld-servers.net) in 77 ms
www.olearycomputers.com. 60 IN A 69.246.199.78
olearycomputers.com. 86400 IN NS ns5.no-ip.com.
[[etc]]
;; Received 289 bytes from 204.16.253.33#53(ns3.no-ip.com) in 34 ms

On the new box, I get nowhere:

# dig +trace www.olearycomputers.com
; <<>> DiG 9.9.1-P1-RedHat-9.9.1-2.P1.fc17 <<>> +trace www.olearycomputers.com
;; global options: +cmd
. 517932 IN NS g.root-servers.net.
. 517932 IN NS e.root-servers.net.
[[some root ns snipped]]
518025 IN RRSIG NS 8 0 518400 20120807000000 20120730230000 50398 .
ICR2HkAQdy85QN3+i3lpLqoFc11zE/ZTNiBcb9F6dyglatHsX+dvWdJS 1laG5xA//M/
OfFCALDy/xApk/Thnh20mTeEtXiiB0IEBFE17B3NgTggO gqbhk7sWt0m7SyDbXgHLbbFB
+xyLMbT3bOaUUVf7470Cnx6eTI8Q5Hco PVs=
;; Received 857 bytes from 143.43.32.170#53(143.43.32.170) in 5 ms
;; connection timed out; no servers could be reached

A straight hit to one of the root ns on the new box is equally bad:

# dig @a.root-servers.net.
; <<>> DiG 9.9.1-P1-RedHat-9.9.1-2.P1.fc17 <<>> @a.root-servers.net.
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

But on the old box, it works like a champ:

# ssh ${old} 'dig @a.root-servers.net.'
; <<>> DiG 9.3.4 <<>> @a.root-servers.net.
; (2 servers found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1160
;; flags: qr aa rd; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 14
;; QUESTION SECTION:
;. IN NS
;; ANSWER SECTION:
[[snipped]]
;; Query time: 25 msec
;; SERVER: 198.41.0.4#53(198.41.0.4)
;; WHEN: Tue Jul 31 15:50:47 2012
;; MSG SIZE rcvd: 512

Can someone tell me why the root ns don't seem to like the new bind
9.9 systems?

Thanks for any hints/tips/suggestions.

Doug O'Leary

------
Senior UNIX Admin
O'Leary Computers Inc
linkedin: http://www.linkedin.com/dkoleary
Resume: http://www.olearycomputers.com/resume.html

Michael Hoskins (michoski)

Aug 6, 2012, 2:05:42 AM

to dkol...@olearycomputers.com, comp-protoc...@isc.org

This almost sounds like an upstream firewall or proxy with faulty protocol
"fixups". If you do a query and EDNS is blocked or improperly configured,
a "fall back" will occur, which causes queries to take longer or possibly
time out.

Since it's a new IP, are you sure ACLs are allowing any to 53/tcp and
53/udp on your new name server and from your name server to any on the
same ports?

What are you seeing in named's logs?

https://kb.isc.org/article/AA-00708/55/Why-does-BIND-log-messages-about-disabling-EDNS-or-reducing-the-advertised-packet-size
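
One way to check the EDNS theory without going through named is to send the same root NS query three ways (no EDNS, EDNS advertising 512 bytes, EDNS advertising 4096 bytes) and see which ones get a reply. The following is only a rough sketch using the Python standard library; the function names build_query and probe_udp are made up for this illustration, and the target address 198.41.0.4 is the a.root-servers.net address shown in the dig output earlier in the thread. Running dig with +noedns and with +bufsize=4096 should give a comparable picture.

```python
import socket
import struct


def build_query(name, qtype=2, edns_size=None, query_id=0x1234):
    """Return a wire-format DNS query (QTYPE 2 = NS by default).

    If edns_size is given, an EDNS0 OPT record advertising that UDP
    payload size is appended to the additional section.
    """
    arcount = 1 if edns_size is not None else 0
    # Header: ID, flags (all clear, so no recursion desired), QDCOUNT=1,
    # ANCOUNT=0, NSCOUNT=0, ARCOUNT.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, arcount)
    # QNAME: length-prefixed labels ending in a zero byte; "." is just b"\x00".
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.strip(".").split(".")
        if label
    ) + b"\x00"
    packet = header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN
    if edns_size is not None:
        # OPT pseudo-RR: root owner name, TYPE 41, CLASS carries the UDP
        # payload size, TTL 0 (no extended flags), RDLENGTH 0.
        packet += b"\x00" + struct.pack("!HHIH", 41, edns_size, 0, 0)
    return packet


def probe_udp(server, packet, timeout=3.0):
    """Send packet over UDP port 53; return reply length, or None on timeout/error."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server, 53))
            reply, _ = sock.recvfrom(65535)
        except OSError:
            return None
        return len(reply)


if __name__ == "__main__":
    server = "198.41.0.4"  # a.root-servers.net, per the dig output in this thread
    for label, size in (("no EDNS", None), ("EDNS 512", 512), ("EDNS 4096", 4096)):
        result = probe_udp(server, build_query(".", edns_size=size))
        print(label, "->", "no reply" if result is None else f"{result} bytes")
```

If the plain query is answered but the EDNS ones are not, that would point toward something in the path mangling EDNS. If none of them are answered, the problem is more likely UDP port 53 itself than EDNS.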

Doug Barton

Aug 6, 2012, 2:28:17 AM

to Michael Hoskins (michoski), dkol...@olearycomputers.com, comp-protoc...@isc.org

On 08/05/2012 23:05, Michael Hoskins (michoski) wrote:

> This almost sounds like an upstream firewall or proxy with faulty protocol
> "fixups". If you do a query and EDNS is blocked or improperly configured,
> a "fall back" will occur, which causes queries to take longer or possibly
> time out.

+1

https://www.dns-oarc.net/oarc/services/replysizetest

--

I am only one, but I am one. I cannot do everything, but I can do
something. And I will not let what I cannot do interfere with what
I can do.
-- Edward Everett Hale, (1822 - 1909)

Michael Hoskins (michoski)

Aug 6, 2012, 1:56:38 PM

to Doug O'Leary, comp-protoc...@isc.org

-----Original Message-----

From: Doug O'Leary <dkol...@olearycomputers.com>
Date: Monday, August 6, 2012 9:58 AM
To: 'Doug Barton' <do...@dougbarton.us>, Mike Hoskins <mich...@cisco.com>
Cc: "comp-protoc...@isc.org" <comp-protoc...@isc.org>
Subject: RE: new bind 9.9 and root NS

>After the network admin verified there were no firewall rule differences,
>we powered off the old secondary server and re-IPed the new one with the
>old secondary's address. The old secondary is able to get to the root
>nameservers w/o issue. Once we re-IPed the new one, it still was unable
>to get to the root nameservers via dig.

Just checking the obvious; no host-based firewall on the new box? Is it
the same OS?

>I also downloaded and installed lft - layer four traceroute (wonderful
>program, that one is). Lft was unable to get *anywhere* using udp
>regardless of what the IP address of the new system is. So, there's
>something with the virtualization software, vmware, which is preventing
>udp packets. There are some web sites saying the same thing so this isn't
>completely out of the blue. The client's opening a service call with
>vmware to see if there's a resolution.

I'm serving several thousand clients using VMware + BIND, so I'm curious
to see where this goes. :-)

Which VMware product are you using, and what host platform?

Thanks!
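
For anyone wanting to confirm the "UDP only" symptom described in the quoted text, comparing the same query over UDP and over TCP from the affected guest is a quick check. This is a minimal sketch, not a diagnosis tool: the names frame_for_tcp, recv_exact and probe are hypothetical helpers written for this example, and 198.41.0.4 is the a.root-servers.net address from the dig output earlier in the thread. dig with and without +tcp should show the same thing.

```python
import socket
import struct

# Wire-format query for ". IN NS" without EDNS: 12-byte header plus question.
ROOT_NS_QUERY = (
    struct.pack("!HHHHHH", 0x4242, 0, 1, 0, 0, 0)  # ID, flags, QDCOUNT=1, rest 0
    + b"\x00"                                      # QNAME: the root
    + struct.pack("!HH", 2, 1)                     # QTYPE NS, QCLASS IN
)


def frame_for_tcp(message):
    """DNS over TCP prefixes each message with its length (2 bytes, big-endian)."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock, count):
    """Read exactly count bytes from a stream socket or raise ConnectionError."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before full message arrived")
        data += chunk
    return data


def probe(server, transport, timeout=3.0):
    """Return the reply size in bytes over 'udp' or 'tcp', or None if no reply."""
    try:
        if transport == "udp":
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
                sock.settimeout(timeout)
                sock.sendto(ROOT_NS_QUERY, (server, 53))
                return len(sock.recvfrom(65535)[0])
        with socket.create_connection((server, 53), timeout=timeout) as sock:
            sock.sendall(frame_for_tcp(ROOT_NS_QUERY))
            (length,) = struct.unpack("!H", recv_exact(sock, 2))
            return len(recv_exact(sock, length))
    except OSError:
        return None


if __name__ == "__main__":
    server = "198.41.0.4"  # a.root-servers.net, per the dig output in this thread
    for transport in ("udp", "tcp"):
        size = probe(server, transport)
        print(transport, "->", "no reply" if size is None else f"{size} bytes")
```

A reply over TCP with nothing over UDP would be consistent with the lft result above, i.e. outbound UDP being dropped somewhere between the guest and the network rather than anything BIND-specific.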