What's probably happening is that there's disagreement among the
authoritative servers for the domain. The name exists on some servers but
not on the others. If you ask a server in the second group you'll get an
NXDOMAIN error and cache it.
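One way to check for this kind of disagreement is to send the same non-recursive query to each authoritative server and compare the response codes. Here is a minimal sketch using only Python's standard library. The function names (build_query, parse_rcode, ask, disagreement) are made up for illustration, and the hand-rolled packet handling covers just enough of the DNS header for this check; it is not a full resolver.

```python
import socket
import struct

# Response codes from the DNS header (RFC 1035); only the common ones.
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}
QTYPES = {"A": 1, "NS": 2, "MX": 15}


def build_query(name, qtype="A", query_id=0x1234):
    """Build a minimal non-recursive DNS query packet (RD bit clear)."""
    # Header: id, flags (all zero), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    labels = [label.encode("ascii") for label in name.split(".") if label]
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPES[qtype], 1)  # class IN


def parse_rcode(packet):
    """Return the response code name from the low 4 bits of the flags word."""
    (flags,) = struct.unpack("!H", packet[2:4])
    rcode = flags & 0x000F
    return RCODES.get(rcode, "RCODE%d" % rcode)


def ask(server_ip, name, qtype="A", timeout=5.0):
    """Send one UDP query; return the rcode name, or "TIMEOUT" if no reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, qtype), (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except socket.timeout:
            return "TIMEOUT"
    return parse_rcode(reply)


def disagreement(results):
    """Given {server: rcode}, report whether the servers that answered differ."""
    answered = {code for code in results.values() if code != "TIMEOUT"}
    return len(answered) > 1
```

To use it, call ask() once per authoritative server address for the domain in question, collect the results in a dict keyed by server, and pass that to disagreement(). A mix such as NOERROR from one server and NXDOMAIN from another is the situation described above.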
>I have set "max-ncache-ttl" to 900. This seems to have helped, but it has
>not eliminated the intermittent lookup failures for a valid domain name.
>Is it wise to lower the limit to, say, 600 or 300? I know doing so will
>increase the load on the servers, but is there any other reason why I should
>not do that?
Not only will it increase the load on your servers, it will also increase
the load on all the other servers out there, because you will be sending
them more queries.
>One other question: as I stated earlier, we have 2 DNS servers that handle all
>external DNS lookups. Is there so much latency on the Internet that a
>request times out without being able to find the name on the first try? Or
>are there a lot of DNS domains and/or DNS servers that are not configured
>correctly? I sometimes find myself troubleshooting other companies' DNS
>issues because they affect our company's ability to get important email
>through to a partner. Any thoughts would be helpful.
There are lots of domains (especially reverse domains) that aren't
configured correctly.
--
Barry Margolin, bar...@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
-RCE
>If decreasing the time for the negative cache is not a good thing, any ideas
>of what I can do to help us resolve the domain names. We have mail that sits
>in the queues.
shorten your mail queue_retry time and increase queue_lifetime, since
probably the mail will punch through eventually.
> We have a current db.cache file. Could the problems be
>Internet Latency?
could be anything.
when you see these mails deferred, dig for its MX to see whether you get an
answer.
what are the deferral reasons in the maillog?
Some MXs really suck. webtv is the worst I know, zwallet's next, and
yahoo can have problems, but the consistent deferrals from MSN and Hotmail
MXs show MS really doesn't know how to spend its $billions on infrastructure
that supports its mail volumes. AOL MXs typically don't show any mail deferrals.
Len
http://MenAndMice.com/DNS-training
http://BIND8NT.MEIway.com : ISC BIND 8.2.4 for NT4 & W2K
http://IMGate.MEIway.com : Build free, hi-perf, anti-abuse mail gateways
What Barry is saying is that the symptom you describe arises because
the domains or their DNS servers are set up wrong, and it may have nothing
to do with the server that is resolving them. If high latency means you
fail to get an answer, that failure isn't cached (although BIND does
remember when a server is particularly slow to respond, so it can try
others first).
NXDOMAIN is only cached if something tells you that a domain
doesn't exist. So reducing max-ncache-ttl may help you requery
one of the servers that is working correctly, but it doesn't fix
the broken servers.
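To make the distinction concrete, here is a toy model of negative caching. It is not BIND's actual implementation: the class name NegativeCache and its methods are hypothetical. The only behaviour taken from the discussion above is that an NXDOMAIN answer is remembered for at most max-ncache-ttl seconds, while a failure to get any answer is not remembered at all.

```python
class NegativeCache:
    """Toy resolver cache that remembers NXDOMAIN answers for a capped time."""

    def __init__(self, max_ncache_ttl=900):
        self.max_ncache_ttl = max_ncache_ttl
        self._expires = {}  # name -> time at which the negative entry expires

    def record(self, name, rcode, negative_ttl, now):
        """Store the outcome of a lookup; only NXDOMAIN is cached."""
        if rcode == "NXDOMAIN":
            # The server's negative TTL is capped by max-ncache-ttl.
            ttl = min(negative_ttl, self.max_ncache_ttl)
            self._expires[name] = now + ttl
        # Any other outcome (for example a timeout) is not stored in this
        # model, so the next lookup for the name goes out again.

    def is_negative(self, name, now):
        """True if a cached NXDOMAIN for name is still valid at time now."""
        expiry = self._expires.get(name)
        if expiry is None:
            return False
        if now >= expiry:
            del self._expires[name]
            return False
        return True
```

With a cap of 900, an NXDOMAIN picked up from a broken server sticks for at most 15 minutes in this model, however long a negative TTL that server asked for. After that, the next query has another chance of reaching a server that has the name.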
Mail sitting in queues should be a DIFFERENT problem, and not
caused by NXDOMAIN responses.
If I mail fr...@nonesuchdomain.com I get an NXDOMAIN from one of
the GTLD-SERVERS.NET.
Immediately my mailer gives up and says:
<fr...@nonesuchdomain.com>: Name service error for
nonesuchdomain.com: Host not found
Why would a mailer hang on to mail for a domain that doesn't
exist? It can never be delivered (unless someone registers
nonesuchdomain.com and sets up the DNS shortly after you sent
the message ;).
Are you sure you're getting NXDOMAIN, and not some other error?
What does "mailq" show?
How can I tell?
Below are cache dumps from different times. The first two are from our
internal DNS servers that handle all external DNS queries. At that
particular point in time both DNS servers could find the domain and its NS, A
and MX records. The third one is from the same server as #1 but at a
different time. What does the LAME= mean? At that time I was not able to
look up the A record or the MX record.
1. From DNS server #1
samsung  3656   IN  NS  nic.samsung.co.kr.       ;Cr=answer [203.255.234.103]
         3656   IN  NS  red.samsung.co.kr.       ;Cr=answer [203.255.234.103]
         3656   IN  NS  green.samsung.co.kr.     ;Cr=answer [203.255.234.103]
         43161  IN  A   203.254.192.15           ;Cr=answer [203.241.135.135]
         57675  IN  MX  0 imail00.samsung.co.kr. ;Cr=answer [203.241.135.135]
2. From DNS server #2
samsung  2203   IN  NS  nic.samsung.co.kr.       ;Cr=addtnl [203.248.240.141]
         2203   IN  NS  red.samsung.co.kr.       ;Cr=addtnl [203.248.240.141]
         2203   IN  NS  green.samsung.co.kr.     ;Cr=addtnl [203.248.240.141]
         43169  IN  A   203.254.192.15           ;Cr=answer [203.241.135.130]
         43188  IN  MX  0 imail00.samsung.co.kr. ;Cr=answer [203.241.135.130]
3. From DNS server #1 at a different time
samsung  3138   IN  NS  nic.samsung.co.kr.       ;Cr=addtnl LAME=158 [203.255.234.103]
         3138   IN  NS  red.samsung.co.kr.       ;Cr=addtnl LAME=157 [203.255.234.103]
         3138   IN  NS  green.samsung.co.kr.     ;Cr=addtnl LAME=158 [203.255.234.103]
Performing an nslookup in debug mode, I get the following. These lookups
were made while the cache was in the state shown in #3 above. What does
SERVFAIL really mean? This has been an intermittent issue for the past
month or so.
Thanks for your help
RCE
> samsung.co.kr.
Server: rootdns1.agere.com
Address: 192.19.192.98
;; res_nmkquery(QUERY, samsung.co.kr, IN, A)
------------
Got answer:
HEADER:
opcode = QUERY, id = 27819, rcode = SERVFAIL
header flags: response, want recursion
questions = 1, answers = 0, authority records = 0, additional = 0
QUESTIONS:
samsung.co.kr, type = A, class = IN
------------
*** rootdns1.agere.com can't find samsung.co.kr.: Server failed
> set type=mx
> samsung.co.kr.
Server: rootdns1.agere.com
Address: 192.19.192.98
;; res_nmkquery(QUERY, samsung.co.kr, IN, MX)
------------
Got answer:
HEADER:
opcode = QUERY, id = 44586, rcode = SERVFAIL
header flags: response, want recursion
questions = 1, answers = 0, authority records = 0, additional = 0
QUESTIONS:
samsung.co.kr, type = MX, class = IN
------------
*** rootdns1.agere.com can't find samsung.co.kr.: Server failed
> set type=any
> samsung.co.kr.
Server: rootdns1.agere.com
Address: 192.19.192.98
;; res_nmkquery(QUERY, samsung.co.kr, IN, ANY)
------------
Got answer:
HEADER:
opcode = QUERY, id = 44587, rcode = NOERROR
header flags: response, want recursion
questions = 1, answers = 3, authority records = 3, additional = 3
QUESTIONS:
samsung.co.kr, type = ANY, class = IN
ANSWERS:
-> samsung.co.kr
nameserver = nic.samsung.co.kr
ttl = 3205 (53m25s)
-> samsung.co.kr
nameserver = red.samsung.co.kr
ttl = 3205 (53m25s)
-> samsung.co.kr
nameserver = green.samsung.co.kr
ttl = 3205 (53m25s)
AUTHORITY RECORDS:
-> samsung.co.kr
nameserver = nic.samsung.co.kr
ttl = 3205 (53m25s)
-> samsung.co.kr
nameserver = red.samsung.co.kr
ttl = 3205 (53m25s)
-> samsung.co.kr
nameserver = green.samsung.co.kr
ttl = 3205 (53m25s)
ADDITIONAL RECORDS:
-> nic.samsung.co.kr
internet address = 203.241.132.34
ttl = 299 (4m59s)
-> red.samsung.co.kr
internet address = 203.241.135.130
ttl = 54764 (15h12m44s)
-> green.samsung.co.kr
internet address = 203.241.135.135
ttl = 48523 (13h28m43s)
------------
Non-authoritative answer:
samsung.co.kr
nameserver = nic.samsung.co.kr
ttl = 3205 (53m25s)
samsung.co.kr
nameserver = red.samsung.co.kr
ttl = 3205 (53m25s)
samsung.co.kr
nameserver = green.samsung.co.kr
ttl = 3205 (53m25s)
Authoritative answers can be found from:
samsung.co.kr
nameserver = nic.samsung.co.kr
ttl = 3205 (53m25s)
samsung.co.kr
nameserver = red.samsung.co.kr
ttl = 3205 (53m25s)
samsung.co.kr
nameserver = green.samsung.co.kr
ttl = 3205 (53m25s)
nic.samsung.co.kr
internet address = 203.241.132.34
ttl = 299 (4m59s)
red.samsung.co.kr
internet address = 203.241.135.130
ttl = 54764 (15h12m44s)
green.samsung.co.kr
internet address = 203.241.135.135
ttl = 48523 (13h28m43s)
Looks good here!
% dig +norec mx samsung.co.kr. @a.root-servers.net.
[...snip...]
[pick one of the name servers listed, eg. ns.krnic.net]
% dig +norec mx samsung.co.kr. @ns.krnic.net.
[...snip...]
[pick one of the name servers listed, eg. nic.samsung.co.kr]
% dig +norec mx samsung.co.kr. @nic.samsung.co.kr.
; <<>> DiG 8.3 <<>> +norec mx samsung.co.kr. @nic.samsung.co.kr.
; (1 server found)
;; res options: init defnam dnsrch
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46685
;; flags: qr ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; QUERY SECTION:
;; samsung.co.kr, type = MX, class = IN
;; ANSWER SECTION:
samsung.co.kr. 1D IN MX 0 imail00.samsung.co.kr.
;; ADDITIONAL SECTION:
imail00.samsung.co.kr. 1D IN A 203.254.197.70
;; Total query time: 5900 msec
;; FROM: hypatia.dns.net to SERVER: nic.samsung.co.kr. 203.241.132.34
;; WHEN: Sat Nov 10 11:07:26 2001
;; MSG SIZE sent: 31 rcvd: 71
This answer is non-authoritative, since there is no `aa' (authoritative
answer) flag. The parent zone CO.KR delegates the zone SAMSUNG.CO.KR to
NIC.SAMSUNG.CO.KR. Since that server answers non-authoritatively for
queries for records in the zone, it is called a lame server.
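The check described here, looking for the `aa' flag in a reply from a delegated server, is easy to automate against saved dig output. The following is a rough sketch: parse_flags and looks_lame are names invented for this example, and it assumes the ";; flags:" line has the same form as in the dig output above. The result only means something when the query was sent with +norec directly to a server named in the delegation.

```python
import re


def parse_flags(dig_output):
    """Return the set of header flags from the ';; flags:' line of dig output."""
    match = re.search(r"^;; flags:([^;]*);", dig_output, re.MULTILINE)
    if match is None:
        raise ValueError("no ';; flags:' line found")
    return set(match.group(1).split())


def looks_lame(dig_output):
    """A delegated server that answers without 'aa' is behaving as a lame server."""
    return "aa" not in parse_flags(dig_output)
```

Fed the reply from nic.samsung.co.kr shown above, whose flags are "qr ra", looks_lame() returns True.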
More seriously, each of the three delegated servers was lame when I
ran these queries. Most commonly this is due to syntax errors in the
zone. In my experience if a zone is subject to this kind of problem,
the non-authoritative status of answers will intermittently reappear.
In some cases, SERVFAIL could result if the zone were truly broken.
Right now at least the servers are responding, albeit non-authoritatively.
You could check your mail configuration: perhaps your mailer is requiring
authoritative answers when doing DNS lookups (something like the AAONLY
resolver flag). Changing the configuration could allow lookups to
succeed even with this degree of brokenness.
Unrelated, for BIND gurus: Cricket's DNS & BIND (4th edition), p.476
says that the RES_AAONLY flag has not been implemented in either the BIND
resolver or name server. Is this still true?
-- Andras Salamon and...@dns.net
From BIND 8.2.5's res_debug.c:
res_debug.c: case RES_AAONLY: return "aaonly(unimpl)";
cricket
Men & Mice
DNS Software & Services
www.menandmice.com
> If decreasing the time for the negative cache is not a good thing, any ideas
> of what I can do to help us resolve the domain names. We have mail that sits
> in the queues. We have a current db.cache file. Could the problems be
> Internet Latency?
Which sites are they?
If they are OK, then you should continue adjusting your side.
If the sites are faulty, then the correction should be made on
their side.
> -RCE
--
Peter Håkanson
IPSec Sverige (At the Riverside of Gothenburg, home of Volvo)
Sorry about my e-mail address, but i'm trying to keep spam out.
Remove "icke-reklam"and "invalid" and it works.