I have a question regarding TTLs. We are seeing a discrepancy between our
NS TTL values (3600) and the parent server TTL values (172800).
For example,
http://www.dnsreport.com/tools/dnsreport.ch?domain=smokingpipes.com
shows the error (halfway down the page). This same error shows for
Network Solutions
http://www.dnsreport.com/tools/dnsreport.ch?domain=networksolutions.com
(independent example).
What problems will this cause? How do I fix it?
Thanks!
- Craig
Thanks for the info, I will make the changes. I would rather have my
TTLs less than 48 hours (172800) but apparently that is not possible.
Can you expand a little more so I understand why?
Craig
It's not the discrepancy that is bad; it's the low TTL on the NS records (and
the A records for these) that causes delays and timeouts.
The cure is simple: give them a larger TTL. Start with a day.
> Thanks!
> - Craig
--
Peter Håkanson
IPSec Sverige ( At Gothenburg Riverside )
Sorry about my e-mail address, but i'm trying to keep spam out,
remove "icke-reklam" if you feel for mailing me. Thanx.
Ray
"Craig Isdahl" <cr...@dashsystems.com> wrote in message
news:c0nfhq$30fj$1...@sf1.isc.org...
This appears to be a recently discovered problem. With .com
and .net domains, there is always a TTL of 172800 (48 hours) for all
NS records served by the parent servers. Normally, the NS records for
a domain appear both in the parent zone's zone file and at the
primary DNS server for the domain. So the NS records for example.com
may appear at both X.gtld-servers.net (the .com parent servers) and
at nsX.example.com (the example.com DNS servers).
The problem occurs when the TTL varies between the two. In theory,
the two records should be identical. But if the TTL varies *and* is
returned by the domain's DNS servers, a DNS resolver will see both a
TTL of 172800 (from X.gtld-servers.net) and perhaps 3600 (from
nsX.example.com). As a result, the DNS resolver is required to trust
the results from nsX.example.com. Therefore, it is going to expire
the record in 3600 seconds.
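
The replacement described above can be sketched as a toy cache in Python. This is only an illustration under stated assumptions: the two trust ranks, the CachedRRset class and the store() function are hypothetical names invented here, and real resolvers use finer-grained ranking than two levels.

```python
from dataclasses import dataclass

# Hypothetical trust ranks for this sketch: data answered authoritatively by
# the domain's own servers outranks a referral handed out by the parent.
REFERRAL = 1
AUTHORITATIVE = 2


@dataclass
class CachedRRset:
    ttl: int      # TTL in seconds as received
    rank: int     # REFERRAL or AUTHORITATIVE
    source: str   # who supplied the data (illustration only)


def store(cache, name, rtype, candidate):
    """Cache the candidate unless more trusted data is already held."""
    key = (name, rtype)
    current = cache.get(key)
    if current is None or candidate.rank >= current.rank:
        cache[key] = candidate
    return cache[key]


cache = {}
# First the referral from the parent arrives with the 48 hour TTL.
store(cache, "example.com", "NS",
      CachedRRset(172800, REFERRAL, "X.gtld-servers.net"))
# Then the domain's own servers return the same NS set with a 3600 second TTL.
winner = store(cache, "example.com", "NS",
               CachedRRset(3600, AUTHORITATIVE, "nsX.example.com"))
print(winner.ttl, winner.source)
```

In this sketch the authoritative copy replaces the referral, so the cached NS set now expires after 3600 seconds, and a later referral from the parent does not displace it.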
The first problem is that it is rude. The .com/.net parent servers
give out 48 hour TTLs for a reason. Forcing DNS resolvers to go back
before 48 hours is up causes extra load on the .com/.net parent
servers.
The second problem is much bigger. Let's say that you have an 1800
second TTL on the A records for ns1.example.com and ns2.example.com,
the 2 NS records for example.com. Let's say you try to get the MX
record for example.com after the 1800 seconds has passed, but before
the 3600 second TTL you gave to your NS record. If this happens, the
DNS resolver knows to go to ns1.example.com and ns2.example.com, but
it now can't get to them. The problem is that to get the A record for
ns1.example.com and ns2.example.com, the DNS resolver must go to the
NS records for example.com -- but, it can't get to them without the A
record, and you're stuck in a loop.
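
The timing in this scenario is simple arithmetic, sketched below in Python. It assumes, for illustration only, that the NS records and the name servers' A records were cached at the same moment (time 0); the function name is hypothetical.

```python
def window_without_addresses(ns_ttl, a_ttl):
    """Return (start, end), in seconds after caching, of the period in which
    the NS records are still cached but the name servers' A records have
    already expired. Return None if there is no such period."""
    if a_ttl >= ns_ttl:
        return None
    return (a_ttl, ns_ttl)


# The example from the text: 1800 second A records, 3600 second NS records.
print(window_without_addresses(ns_ttl=3600, a_ttl=1800))
# Equal TTLs leave no gap.
print(window_without_addresses(ns_ttl=3600, a_ttl=3600))
```

With the values from the example, the gap runs from 1800 to 3600 seconds after the records were cached. With equal TTLs there is no gap at all.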
This has been known to cause serious problems, specifically with
mailservers, that end up bouncing all E-mail destined to a domain. I
do not yet know which DNS servers this applies to (although I do know
that at least one version of BIND will do this), and whether an
RFC-compliant DNS server will or will not do this.
It seems that the real problem is with NS A records that have a TTL
that differs from the NS records. But, if there is a NS TTL
discrepancy, there is likely a TTL difference between the NS record
and the NS's A record.
Again, this appears to be a recently discovered issue, and delves into
the depths of DNS that few people venture into, so there isn't much
information about it yet.
-Scott
> It seems that the real problem is with NS A records that have a TTL
> that differs from the NS records. But, if there is a NS TTL
> discrepancy, there is likely a TTL difference between the NS record
> and the NS's A record.
Are you writing that if my NS records and A records for ns1.example.com
have the same TTL I'm okay in spite of what dnsreport says?
Or am I "stuck" with using 172800 for nameservers even just before the
very occasional move of a nameserver to a different IP#?
I changed my nameservers' TTLs down to 600 a few months ago before a
move and didn't ever move them back <frown>.
I'm going to change them back to 172800 now that this thread has brought
the problem to my attention (yes, I agree it was rude to leave them that
way, but I thought I should lower them before a move, to get the fastest
resolution afterwards).
I'll change both the A and NS record TTLs, as it looks like you're
saying that's the problem.
> Again, this appears to be a recently discovered issue, and delves
> into the depths of DNS that few people venture into, so there isn't
> much information about it yet.
I hope that once the issue is better understood someone will post a
complete explanation here.
Jeff
--
Jeff Lasman, nobaloney.net, P. O. Box 52672, Riverside, CA 92517 US
Professional Internet Services & Support / Consulting / Colocation
Our blists address used on lists is for list email only
Phone +1 909 324-9706, or see: "http://www.nobaloney.net/contactus.html"
> On Sunday 15 February 2004 06:44 pm, R. Scott Perry wrote:
>
> > It seems that the real problem is with NS A records that have a TTL
> > that differs from the NS records. But, if there is a NS TTL
> > discrepancy, there is likely a TTL difference between the NS record
> > and the NS's A record.
>
> Are you writing that if my NS records and A records for ns1.example.com
> have the same TTL I'm okay in spite of what dnsreport says?
Yes. Hardly anyone uses the same TTLs as the TLD servers do, but they
usually have consistent TTLs within their zones. The Internet hasn't
come to a screeching halt yet.
--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
It's not recently discovered, and it's not a problem.
RSP> [...] If this happens, the DNS resolver knows to go to
RSP> ns1.example.com and ns2.example.com, but it now can't get
RSP> to them. The problem is that to get the A record for
RSP> ns1.example.com and ns2.example.com, the DNS resolver must
RSP> go to the NS records for example.com -- but, it can't get
RSP> to them without the A record, and you're stuck in a loop.
This is why we have "additional" section processing, "glue" resource record
sets, and fallback to the nearest enclosing superdomain whose content DNS
servers are known. Far from being recently discovered, this chicken-and-egg
problem was addressed in RFC 1034.
If the glue A records time out of the cache before the NS records do,
the chicken-and-egg problem returns. So you should ensure that the TTLs
on your nameservers' A records are at least as long as the TTLs on the
NS records.
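
A rough way to check this rule against your own zone data is sketched below in Python. The tuple layout and the function name are assumptions made for this example; it works on records you supply by hand and does not query any DNS server.

```python
def short_lived_nameserver_addresses(records):
    """records: list of (owner, ttl, rtype, rdata) tuples.

    Return (ns_name, a_ttl, ns_ttl) for every NS target that has an A record
    in the list whose TTL is shorter than the TTL of the NS record."""
    a_ttls = {}
    for owner, ttl, rtype, rdata in records:
        if rtype == "A":
            name = owner.lower().rstrip(".")
            # Keep the smallest TTL seen for this name.
            a_ttls[name] = min(ttl, a_ttls.get(name, ttl))

    problems = []
    for owner, ttl, rtype, rdata in records:
        if rtype == "NS":
            target = rdata.lower().rstrip(".")
            if target in a_ttls and a_ttls[target] < ttl:
                problems.append((target, a_ttls[target], ttl))
    return problems


# Hypothetical zone data using documentation addresses.
zone = [
    ("example.com.", 3600, "NS", "ns1.example.com."),
    ("example.com.", 3600, "NS", "ns2.example.com."),
    ("ns1.example.com.", 1800, "A", "192.0.2.1"),
    ("ns2.example.com.", 3600, "A", "192.0.2.2"),
]
print(short_lived_nameserver_addresses(zone))
```

Here only ns1.example.com is reported, because its A record (1800) would expire before the NS records (3600). Name servers whose A records are not in the supplied list are skipped, since this sketch cannot know their TTLs.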
--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
Resolvers just have to detect this situation and ask the parent
server for the missing glue. This works until the child changes
the nameservers without informing the parent; then you get a broken
delegation.
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: Mark_A...@isc.org
> > If the glue A records time out of the cache before the NS records do,
> > the chicken-and-egg problem returns. So you should ensure that the TTLs
> > on your nameservers' A records are at least as long as the TTLs on the
> > NS records.
>
> Resolvers just have to detect this situation and ask the parent
> server for the missing glue.
Does BIND do this? I was under the impression it doesn't -- I've seen
plenty of times when a domain couldn't be resolved and it appeared to be
because of this situation. So I assume that when it's trying to resolve
the hostnames in the NS records, it simply uses the standard resolution
algorithm, and doesn't treat this loop as a special case.
Mark
Does "it" mean any bind8, or bind8.3.4, or any bind9, or 9.2.3, or is this
not version dependent?
Once it has found the glue from the parent, it is not going to provide it to
the recursive client anyway, is it?
As Mark said earlier, glue is not considered an answer, so there will
have to be another step that verifies with the authoritative
servers that the provided glue is correct. Am I right?
What happens if the walk-back is unsuccessful?
Will BIND ever provide the NS records without fetching their
corresponding A records?
What if the corresponding A record doesn't exist, say because of a
mistake or connectivity problems?
Ladislav
BM> If the glue A records time out of the cache before the NS
BM> records do, the chicken-and-egg problem returns.
In this situation, one falls back to the nearest enclosing superdomain whose
content DNS servers are known (i.e. for which both halves of the delegation
information are known).
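
That fallback can be sketched in Python as a walk up the name, one label at a time. This is a simplified illustration, not how any particular resolver is implemented: the `known` dictionary, the function names and the placeholder address are all assumptions made for the example.

```python
def enclosing_domains(name):
    """Yield the name itself, then each enclosing superdomain, ending with
    the root "."."""
    stripped = name.rstrip(".")
    labels = stripped.split(".") if stripped else []
    for i in range(len(labels)):
        yield ".".join(labels[i:]) + "."
    yield "."


def nearest_usable_delegation(name, known):
    """known maps a domain to {server_name: [addresses]}.

    A delegation counts as usable when at least one of its servers has a
    known address, i.e. both halves of the delegation are present."""
    for domain in enclosing_domains(name):
        servers = known.get(domain, {})
        usable = {ns: addrs for ns, addrs in servers.items() if addrs}
        if usable:
            return domain, usable
    return None


known = {
    # NS records still cached, but the servers' A records have expired.
    "example.com.": {"ns1.example.com.": [], "ns2.example.com.": []},
    # Placeholder documentation address, not the real server address.
    "com.": {"a.gtld-servers.net.": ["192.0.2.53"]},
}
print(nearest_usable_delegation("www.example.com.", known))
```

In this example the example.com delegation is skipped because no addresses are known for its servers, and the lookup falls back to the com. servers, which can supply the referral and glue again.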
BIND's particular problem with this, very probably what you report having
observed (since it is quite commonly triggered), is not a chicken-and-egg
problem. It is an interaction with BIND's "credibility" rules, and it happens
when there's a mismatch between the delegation information published by the
superdomain content DNS servers and the delegation information published by
the subdomain content DNS servers. (In detail: The situation arises when the
first half of the delegation, published by the subdomain content DNS servers,
cannot be matched up with the second half of the delegation, published by the
superdomain content DNS servers, to form the actual delegation; i.e. where the
superdomain content DNS servers publish "glue" resource records that don't
match any of the "NS" resource records that the subdomain content DNS servers
publish.) BIND gets itself out of the chicken-and-egg situation by querying
the superdomain content DNS servers only to put itself right back into it by
refusing to believe the new delegation information that it has just received
because it isn't "credible" enough.
Other resolving proxy DNS server software, lacking BIND's "credibility"
rules, doesn't have this particular problem.