
Spotty Lookups on One of Our Networks


Martin McCormick

Oct 30, 2012, 4:10:34 PM
to bind-...@lists.isc.org
I don't think this is a BIND problem because this
particular network has some Microsoft DNS servers that are doing
exactly the same thing.

There are several domain names that are broken in this
network and the symptom is always the same:

dig +trace @localhost one.bad.domain.com.

We see all the root name servers listed. We get the TLD
servers next and, from one of them, we get authoritative DNS servers
from bad.domain.com where we should get the IP address from
one.bad.domain.com. That is where it breaks. It times out with
no authoritative servers that will talk to us. One site is
noaa.gov which is the National Oceanic and Atmospheric
Administration in the United States.

We have no trouble at all resolving them from our
network so I filled in the missing information for the
authoritative domain name servers and hard-coded one or two of
them in lookups on the problem network and, no surprise, the
lookup still times out.

One really good lead evaporated when I discovered that
this network still had a 512-byte limit on its firewall so we
thought this might be the problem but no such luck. The firewall
now passes EDNS packets just fine, but nothing has really
changed.

Any ideas as to what prevents some lookups from
resolving? Others do resolve.

We have been kicking this problem around for about a
week and the customers there are getting a bit restless. They
are connected to the same ISP we are and we are not having any
problems like this.

There seems to be no reason why some remote domains
work and others don't. I am asking on this list in hopes that
somebody has seen something like this somewhere else and found
the cause.

Thank you.

Martin McCormick WB5AGZ Stillwater, OK
Systems Engineer
OSU Information Technology Department Telecommunications Services Group

Barry Margolin

Oct 30, 2012, 4:38:02 PM
to comp-protoc...@isc.org
In article <mailman.538.1351627...@lists.isc.org>,
Martin McCormick <mar...@dc.cis.okstate.edu> wrote:

> I don't think this is a BIND problem because this
> particular network has some Microsoft DNS servers that are doing
> exactly the same thing.
>
> There are several domain names that are broken in this
> network and the symptom is always the same:
>
> dig +trace @localhost one.bad.domain.com.
>
> We see all the root name servers listed. We get the TLD
> servers next and, from one of them, we get authoritative DNS servers
> from bad.domain.com where we should get the IP address from
> one.bad.domain.com. That is where it breaks. It times out with

I'm not sure what you mean by that sentence about getting authoritative
DNS servers from X when it should be from Y. Can you post the actual dig output?

BTW, @servername doesn't mean much when using +trace, since +trace
queries the servers listed in NS records, not a resolver.

> no authoritative servers that will talk to us. One site is
> noaa.gov which is the National Oceanic and Atmospheric
> Administration in the United States.
>
> We have no trouble at all resolving them from our
> network so I filled in the missing information for the
> authoritative domain name servers and hard-coded one or two of
> them in lookups on the problem network and, no surprise, the
> lookup still times out.

What happens if you try to telnet to port 53 on the auth nameservers
from your local resolvers? What about traceroute?
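
The port 53 check can also be scripted. This is only a rough sketch: the helper name tcp_port_open is made up for illustration, and it tests TCP only, so a success here says nothing about UDP or about large or fragmented replies.

```python
import socket


def tcp_port_open(host, port=53, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refusals, timeouts, unreachable networks and lookup failures.
        return False
```

Pass it the IP address of one of the authoritative servers that time out, once from the problem network and once from a network that works, and compare the results.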

--
Barry Margolin
Arlington, MA

John Miller

Oct 30, 2012, 4:46:47 PM
to bind-...@lists.isc.org
Hi Martin,

Just to clarify, how many domain names are doing this for you? Are they
all remote domains, or are some of them okstate.edu domains?

John
--
John Miller
Systems Engineer
Brandeis University
john...@brandeis.edu

Martin McCormick

Oct 30, 2012, 5:03:26 PM
to bind-...@lists.isc.org
John Miller writes:
> Just to clarify, how many domain names are doing this for you? Are they
> all remote domains, or are some of them okstate.edu domains?

They are all remote as far as I can tell.

I will have some answers for Barry Margolin's questions a bit
later. It seems like the tier of DNS servers closest to the failed
lookup is what is failing to be reachable.

My own theory is that it is specific to port 53.

Thanks to both.

Mark Andrews

Oct 30, 2012, 6:18:28 PM
to Martin McCormick, bind-...@isc.org

In message <201210302010....@x.it.okstate.edu>, Martin McCormick writes:
> I don't think this is a BIND problem because this
> particular network has some Microsoft DNS servers that are doing
> exactly the same thing.
>
> There are several domain names that are broken in this
> network and the symptom is always the same:
>
> dig +trace @localhost one.bad.domain.com.
>
> We see all the root name servers listed. We get the TLD
> servers next and, from one of them, we get authoritative DNS servers
> from bad.domain.com where we should get the IP address from
> one.bad.domain.com. That is where it breaks. It times out with
> no authoritative servers that will talk to us. One site is
> noaa.gov which is the National Oceanic and Atmospheric
> Administration in the United States.

Newer versions of dig turn on +dnssec with +trace (you can do +trace
+nodnssec +noedns to get the old behaviour back) as that better
reflects what the nameserver does. The nameserver will retry with
a lower EDNS UDP buffer size, dig won't.

They are most probably dropping IP fragments at the firewall. Fixing
the 512 byte limit (below) is only the first step.
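
To see which reply sizes actually make it back through the firewall, something like the following could be run from the affected network. It is a sketch under assumptions: the function names are invented for illustration, and the size ladder (4096, 1232, 512, then no EDNS) is my own choice, not the exact retry schedule named uses.

```python
import random
import socket
import struct


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += struct.pack("!B", len(raw)) + raw
    return out + b"\x00"


def build_query(name, bufsize=None, dnssec_ok=False, qtype=1):
    """Build a non-recursive query; add an EDNS0 OPT record if bufsize is set."""
    arcount = 1 if bufsize is not None else 0
    header = struct.pack("!HHHHHH", random.randint(0, 0xFFFF), 0, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)
    if bufsize is None:
        return header + question
    ttl = 0x8000 if dnssec_ok else 0  # the DO bit lives in the OPT TTL field
    # OPT record: root name, type 41, class = UDP payload size, empty RDATA
    opt = struct.pack("!BHHIH", 0, 41, bufsize, ttl, 0)
    return header + question + opt


def query_with_fallback(server, name, sizes=(4096, 1232, 512, None),
                        port=53, timeout=3.0):
    """Try progressively smaller EDNS sizes; return (size, reply) or (None, None)."""
    for size in sizes:
        packet = build_query(name, bufsize=size, dnssec_ok=size is not None)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(packet, (server, port))
                reply, _ = sock.recvfrom(65535)
                # The reply may still carry the TC flag; this sketch does not check it.
                return size, reply
            except OSError:
                continue  # timed out or unreachable; try the next size
    return None, None
```

If the largest sizes time out but 512 or the plain query gets an answer, that would be consistent with fragments being dropped somewhere on the path.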
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org

Martin McCormick

Oct 31, 2012, 8:54:02 AM
to comp-protoc...@isc.org
I described a case where one of our remote campuses can't
resolve a number of remote domains. One example is noaa.gov. It
also successfully resolves other remote domains, seemingly
without rhyme or reason.

Here is a bad dig trace for noaa.gov


; <<>> DiG 9.7.7 <<>> @localhost +trace noaa.gov
; (2 servers found)
;; global options: +cmd
. 453464 IN NS b.root-servers.net.
. 453464 IN NS l.root-servers.net.
. 453464 IN NS a.root-servers.net.
. 453464 IN NS i.root-servers.net.
. 453464 IN NS j.root-servers.net.
. 453464 IN NS f.root-servers.net.
. 453464 IN NS g.root-servers.net.
. 453464 IN NS e.root-servers.net.
. 453464 IN NS h.root-servers.net.
. 453464 IN NS d.root-servers.net.
. 453464 IN NS c.root-servers.net.
. 453464 IN NS k.root-servers.net.
. 453464 IN NS m.root-servers.net.
;; Received 512 bytes from 127.0.0.1#53(127.0.0.1) in 320 ms

gov. 172800 IN NS b.gov-servers.net.
gov. 172800 IN NS a.gov-servers.net.
;; Received 133 bytes from 192.58.128.30#53(192.58.128.30) in 210 ms

noaa.gov. 86400 IN NS ns-e.noaa.gov.
noaa.gov. 86400 IN NS ns-mw.noaa.gov.
noaa.gov. 86400 IN NS ns-nw.noaa.gov.

This trace took several minutes since no successful
resolution was made.

Here is a good trace using our DNS.


; <<>> DiG 9.8.1-P1 <<>> +trace @localhost noaa.gov
; (2 servers found)
;; global options: +cmd
. 369104 IN NS d.root-servers.net.
. 369104 IN NS j.root-servers.net.
. 369104 IN NS b.root-servers.net.
. 369104 IN NS g.root-servers.net.
. 369104 IN NS i.root-servers.net.
. 369104 IN NS e.root-servers.net.
. 369104 IN NS l.root-servers.net.
. 369104 IN NS m.root-servers.net.
. 369104 IN NS h.root-servers.net.
. 369104 IN NS f.root-servers.net.
. 369104 IN NS c.root-servers.net.
. 369104 IN NS a.root-servers.net.
. 369104 IN NS k.root-servers.net.
;; Received 512 bytes from 127.0.0.1#53(127.0.0.1) in 497 ms

gov. 172800 IN NS a.gov-servers.net.
gov. 172800 IN NS b.gov-servers.net.
;; Received 133 bytes from 192.112.36.4#53(192.112.36.4) in 439 ms

noaa.gov. 86400 IN NS ns-e.noaa.gov.
noaa.gov. 86400 IN NS ns-mw.noaa.gov.
noaa.gov. 86400 IN NS ns-nw.noaa.gov.
;; Received 133 bytes from 69.36.157.30#53(69.36.157.30) in 224 ms

noaa.gov. 86400 IN A 140.90.200.21
noaa.gov. 86400 IN A 140.172.17.21
noaa.gov. 86400 IN A 129.15.96.21
noaa.gov. 86400 IN NS ns-e.noaa.gov.
noaa.gov. 86400 IN NS ns-mw.noaa.gov.
noaa.gov. 86400 IN NS ns-nw.noaa.gov.
;; Received 181 bytes from 140.90.33.237#53(140.90.33.237) in 37 ms

Carsten Strotmann

Oct 31, 2012, 9:28:56 AM
to comp-protoc...@moderators.individual.net

Hello Martin,

Martin McCormick <mar...@dc.cis.okstate.edu> writes:

> I described a case where one of our remote campuses can't
> resolve a number of remote domains. One example is noaa.gov. It
> also successfully resolves other remote domains, seemingly
> without rhyme or reason.
>
> Here is a bad dig trace for noaa.gov
>
[...]

<http://www.zonecut.net/dns> shows that
nameserver ns-e.noaa.gov is not responding

The dig +trace might "hang" if that authoritative DNS server is selected
for the query.

"ns-mw.noaa.gov" and "ns-nw.noaa.gov" operate fine. "ns-e" could mean
"east coast".

-- Carsten

Barry Margolin

Oct 31, 2012, 1:48:47 PM
to comp-protoc...@isc.org
In article <mailman.544.1351690...@lists.isc.org>,
Did the problem coincide with Hurricane Sandy? That would explain
inability to reach many east coast servers. Resolvers should work around
this by failing over to other servers (assuming the organization has
them geographically distributed, as NOAA.GOV does), but dig +trace
doesn't.
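
A rough illustration of the failover a resolver does and dig +trace does not: walk the NS list in order and stop at the first server that answers. The function names are hypothetical, and a real resolver also tracks round-trip times and server health, which this sketch ignores.

```python
import socket
import struct


def plain_query(name, query_id=0x1234):
    """Build a minimal non-recursive A query without EDNS."""
    labels = b"".join(
        struct.pack("!B", len(part)) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
        if part
    )
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def probe(server, name, port=53, timeout=3.0):
    """Return the reply bytes from one server, or None if nothing came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(plain_query(name), (server, port))
            return sock.recvfrom(4096)[0]
        except OSError:
            return None


def first_answer(servers, name, port=53, timeout=3.0):
    """Try each server in order; return (server, reply) for the first responder."""
    for server in servers:
        reply = probe(server, name, port, timeout)
        if reply is not None:
            return server, reply
    return None, None
```

Give it the IP addresses of the NOAA.GOV name servers rather than their names, since resolving the names is the very thing that is failing. If it returns (None, None), none of the servers is reachable and the problem is not one dead server.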

John Miller

Oct 31, 2012, 2:17:44 PM
to Martin McCormick, bind-...@isc.org
Martin, what do you see if you do a packet capture on the host where you're running dig?  How 'bout at the border of your network?  Obviously traffic's not making it through, but where?  Any sort of split routing paths that might be involved?

John

--
John Miller
Systems Engineer
Brandeis University
john...@brandeis.edu
(781) 736-4619

Martin McCormick

Oct 31, 2012, 3:29:39 PM
to comp-protoc...@isc.org
The system hung long enough to have timed out on every
possible DNS that it could have tried so it should have gotten
to one.

Barry Margolin writes:
> Did the problem coincide with Hurricane Sandy? That would explain
> inability to reach many east coast servers. Resolvers should work around
> this by failing over to other servers (assuming the organization has
> them geographically distributed, as NOAA.GOV does), but dig +trace
> doesn't.

Thank you very much for your suggestions.
We are more or less in a waiting mode right now as the
network staff on our remote campus check some settings on their
firewall. We know now this is almost certainly not a bind issue
as we have discovered many remote networks that seem to have no
TCP/IP connectivity from the remote campus but are perfectly
reachable from here.

We started receiving complaints about a week ago so the
hurricane is not to blame.

I will let the group know what happened as soon as we
find out, ourselves.

Martin McCormick