
HELP! Resolving Problems


David Meier

Jun 25, 2003, 8:44:53 AM
Hi all,

I have the nice job to debug a DNS server on OpenVMS. First, I do not
know much about DNS itself, and secondly I am not a great VMS crack.
Anyway, here's the problem:

We have a leased line and are running our own DNS servers. 99.9% of
the domains can be resolved, however a few can't and they DO exist and
they ARE up and running. The problem usually occurs when some of our
clients try to send emails to external domains through our servers.
Sendmail complains that they are unknown hosts. However, it can not be
a sendmail problem since pinging and surfing to the unresolved
addresses fails as well.

Our leased line provider says that our domain servers are visible from
all over the world (however they checked that) and that it therefore
cannot be a routing problem of theirs. They say it must be either our
router or the DNS servers themselves.

Please, any input on tracking down this issue is GREATLY appreciated.

p...@icke-reklam.ipsec.nu

Jun 25, 2003, 4:34:31 PM

Tell us domain and ipaddress of nameservers and we _might_ be able to help.

--
Peter Håkanson
IPSec Sverige ( At Gothenburg Riverside )
Sorry about my e-mail address, but i'm trying to keep spam out,
remove "icke-reklam" if you feel for mailing me. Thanx.

David Meier

Jun 26, 2003, 2:07:37 AM
p...@icke-reklam.ipsec.nu wrote in message news:<bdd1f0$15h$1...@sf1.isc.org>...

> David Meier <me...@logmail.net> wrote:
> > Hi all,
>
> > I have the nice job to debug a DNS server on OpenVMS. First, I do not
> > know much about DNS itself, and secondly I am not a great VMS crack.
> > Anyway, here's the problem:
>
> > We have a leased line and are running our own DNS servers. 99.9% of
> > the domains can be resolved, however a few can't and they DO exist and
> > they ARE up and running. The problem usually occurs when some of our
> > clients try to send emails to external domains through our servers.
> > Sendmail complains that they are unknown hosts. However, it can not be
> > a sendmail problem since pinging and surfing to the unresolved
> > addresses fails as well.
>
> > Our leased line provider says that our domain servers are visible from
> > all over the world (however they checked that) and that it therefore
> > cannot be a routing problem of theirs. They say it must be either our
> > router or the DNS servers themselves.
>
> > Please, any input on tracking down this issue is GREATLY appreciated.
>
> Tell us domain and ipaddress of nameservers and we _might_ be able to help.

Here they are:

DNS1: asaxp1.actiris.ch, 195.141.214.34
DNS2: asaxp2.actiris.ch, 195.141.214.35

We also tracked down the problem some more: a mail server in our
IP range comes with its own DNS server installed, and that one can resolve
those domains. Therefore it must be a problem with DNS1 and DNS2.

Simon Waters

Jun 26, 2003, 2:38:36 PM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David Meier wrote:
>
>> Tell us domain and ipaddress of nameservers and we _might_ be able to help.
>
>
> Here they are:
>
> DNS1: asaxp1.actiris.ch, 195.141.214.34
> DNS2: asaxp2.actiris.ch, 195.141.214.35

I think he meant the domains that you can't look up, not the domain name
of the DNS servers in question?!

Hmm, you sure it is a BIND variant, as it doesn't smell like BIND, and I
already found a bug in your nameserver implementation (try 'dig . ns'),
which admittedly doesn't rule out BIND --- but I'm NOT that good, and
recent versions of BIND aren't THAT bad.

It is commonly recommended not to serve domains from the same servers as
are doing your recursive look ups - if that gives you a good excuse to
implement a nice shiny new BIND, on a couple of old PC's running an OS
you do know ;-)

-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE++z2qGFXfHI9FVgYRAneNAKCHD5WtlzLA47N/XRSOHNK176dyHwCgij3Y
ZvxZZoTdANoKHpk8Pv6nMaQ=
=lNK4
-----END PGP SIGNATURE-----


David Meier

Jun 26, 2003, 5:12:39 PM
> David Meier wrote:
>>
>>> Tell us domain and ipaddress of nameservers and we _might_ be able
>>> to help.
>>
>>
>> Here they are:
>>
>> DNS1: asaxp1.actiris.ch, 195.141.214.34
>> DNS2: asaxp2.actiris.ch, 195.141.214.35
>
> I think he meant the domains that you can't look up, not the domain
> name of the DNS servers in question?!

OK, I outed myself as a newbie... We have problems looking up pwr.ag or
klaeui.com, for example. Someone has also pointed out to me that it may have
something to do with 'glueless delegation'. I printed out a few tutorials on
that matter, which I am going to read now...

>
> Hmm, you sure it is a BIND variant, as it doesn't smell like BIND,
> and I already found a bug

Which one?

> in your nameserver implementation (try
> 'dig . ns'), which admittedly doesn't rule out BIND --- but I'm NOT
> that good, and recent versions of BIND aren't THAT bad.
>

It is an old version of bind. Being the VMS crack as I am I wasn't able to
find out which one.

> It is commonly recommended not to serve domains from the same
> servers as are doing your recursive look ups - if that gives you a
> good excuse to implement a nice shiny new BIND, on a couple of old
> PC's running an OS you do know ;-)
>

I think you got the point. Still, I would like to know what's wrong with our
DNS servers, even if only out of curiosity.

Simon Waters

Jun 27, 2003, 5:55:29 AM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David Meier wrote:
>
> We have problems looking up pwr.ag or
> klaeui.com for example. I have also someone pointing me out it may have
> something to do with 'glueless delegation'. I printed out a few tutorials on
> that matter which I am going to read now...

Hmm, weird.

pwr.ag fails because... eek it is complex... I think it may be failing
because queries for ns1.nameserver.ch to the ascio.com nameservers
return no authority records (even though they are supposedly
authoritative)!?

klaeui.com has a complex delegation which might also be causing problems
(BIND only looks so far before deciding the answer is deliberately
obtuse!). Does this problem go away if you restart the nameserver?


-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE+/BSPGFXfHI9FVgYRAtYgAJsFao2p1et7MsxD60uPHwyY9WBZQgCgwgw/
J9BeUoyRL2tlPz9pWzN2i6U=
=j+zv
-----END PGP SIGNATURE-----


David Meier

Jun 27, 2003, 8:02:37 AM
> David Meier wrote:
>>
>> We have problems looking up pwr.ag or
>> klaeui.com for example. I have also someone pointing me out it may
>> have something to do with 'glueless delegation'. I printed out a
>> few tutorials on
>> that matter which I am going to read now...
>
> Hmm, weird.
>
> pwr.ag fails because... eek it is complex... I think it may be
> failing because queries for ns1.nameserver.ch to the ascio.com
> nameservers return no authority records (even though they are
> supposedly authoritative)!?
>
> klaeui.com has a complex delegation which might also be causing
> problems (BIND only looks so far before deciding the answer is
> deliberately obtuse!). Does this problem go away if you restart the
> nameserver?

Restarting the name servers does not help. What are the tools to track down such problems or the delegation path?

David Meier
LOGNET AG
Studbachstrasse 13c
CH-8340 Hinwil
Phone: +41-1-938-8032
Fax: +41-1-938-8039

Jeff Stevens

Jun 27, 2003, 10:02:43 AM
On 6/26/2003 1:38 PM, Simon Waters wrote:

> It is commonly recommended not to serve domains from the same servers as
> are doing your recursive look ups

You mean a physically different machine, and not just another BIND
server on the same machine?
--
Jeffrey Stevens

Simon Waters

Jun 27, 2003, 10:11:56 AM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David Meier wrote:
>
> Restarting the name servers does not help. What are the tools to
> track down such problems or the delegation path?

So far I mainly used "dig".

I use "doc" which is a script by various DNS people, currently
maintained by Brad Knowles. Several good websites are around which will
also check domains.

However tools aren't always as helpful as they might seem.

For example "doc" complains about the delegation of pwr.ag, but in fact
this is just a general moan at ULTRADNS.NET whose servers produce
responses that don't look like BIND's, but which ought to work (at least
one would hope so given how much of the DNS they serve).

Curiously the servers failing to return NS records, are running
PowerDNS, and not BIND. I believe BIND wouldn't load a zone file that
broken, at least not BIND 9.

I don't know of any tools that point out over complicated DNS
delegation, but maybe the folks at menandmice.com, or Dan
(http://cr.yp.to/), have something suitable?

Generally I assume when I'm starting to get lost as to which servers
serve which domain at which point, the delegation has got too complex.
It might not be so complex as to defeat the nameservers, but it usually
defeats the administrators a long time before that!
-----BEGIN PGP SIGNATURE-----


Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE+/FCpGFXfHI9FVgYRAllVAJ9J2A4BfdjwHLssFeg0lhA3EnYVDACfYVDZ
Z9ZgAH1YRXyx8zMwAWybrbA=
=rLYr
-----END PGP SIGNATURE-----


David Meier

Jun 27, 2003, 10:57:38 AM
Thanks a lot, Simon. This will get me started. I will dive into the world of DNS this weekend.

David Meier

Jonathan de Boyne Pollard

Jun 27, 2003, 8:09:37 PM
SW> I think it may be failing because queries for
SW> ns1.nameserver.ch to the ascio.com nameservers
SW> return no authority records [...]

The "nameserver.ch." content DNS servers don't add the "NS" resource record
set for "nameserver.ch." to all of their responses, true, but this is
relatively benign. It merely means that the "ch." content DNS servers will
have to be queried afresh every 12 hours.

There is certainly a failure here (albeit that it's possibly not the one that
David Meier is experiencing - he hasn't given us enough information to
determine this). However, it is not related to the "nameserver.ch." content
DNS servers.

SW> klaeui.com has a complex delegation which might also
SW> be causing problems

Actually, what is happening with "klaeui.com." involves just two queries, one
to the root servers and one to the UltraDNS servers. The complex delegation
doesn't actually enter into it at all.

The failure here (with both "klaeui.com." and "pwr.ag.") is a combination of
an over-optimisation in some versions of Sendmail, the fact that BIND passes
through results with the AA bit set to 1 the first time around, and an
outright error in the way that the UltraDNS servers work. Its aetiology is as
follows:

1. To avoid a BIND version 4 bug in the resolution of "CNAME" queries, most
SMTP Client softwares perform an "*" query rather than a "CNAME" query when
canonicalising the domain part of an envelope recipient mailbox, filtering out
from the result those resource records of the type that it actually wanted.
Effectively, a "CNAME" lookup is disguised as a "*" lookup.

However, Sendmail in particular (and, indeed, only certain versions of
Sendmail) _also_ does this in place of issuing explicit "MX" and "A" queries
(even though the BIND version 4 bug does not affect "MX" or "A" queries, only
"CNAME" queries) if it thinks that the response to the first "*" query wasn't
cached along the way. It assumes that "any" means "all" in such
circumstances.

(Not all versions of Sendmail try to optimise their DNS query traffic by doing
this. Also, other MTS softwares, such as "qmail", do not do this. And it's
certainly debatable whether this is a reasonable optimisation to be doing in
the first place, given that best practice is for an SMTP Client to have a
local caching proxy DNS server anyway.)

2. The process of query resolution for an "*" query for "pwr.ag." stops at
the "ag." content DNS servers. In response to other types of queries, the
"ag." content DNS servers return a partial answer comprising a referral for
"pwr.ag.", as expected.

[204.74.112.1:0035] -> [0.0.0.0:0000] 73
Header: 0002 1+0+2+0, R, , query, no_error
Question: pwr.ag. IN A
Authority: pwr.ag. IN NS 86400 ns2.namecenter.ch.
Authority: pwr.ag. IN NS 86400 ns1.namecenter.ch.

However, in response to an "*" query they instead return a complete answer
(i.e. one that doesn't end in a referral) - but one where the relevant
resource record sets ("MX" and "A") are erroneously empty.

[204.74.112.1:0035] -> [0.0.0.0:0000] 123
Header: 0002 1+2+2+0, R, AUTH, query, no_error
Question: pwr.ag. IN *
Answer: pwr.ag. IN NS 86400 ns2.namecenter.ch.
Answer: pwr.ag. IN NS 86400 ns1.namecenter.ch.
Authority: ag. IN NS 86400 TLD2.ULTRADNS.NET.
Authority: ag. IN NS 86400 TLD1.ULTRADNS.NET.

Given that the "*" query will be the first one made, it will be unlikely that
any "pwr.ag." delegation information is already cached. So query resolution
will not reach the "pwr.ag." content DNS servers at all, and will instead stop
at the "ag." content DNS servers when they return that complete answer.

There's a strong argument that the "ag." content DNS servers (run by UltraDNS)
are wrong here. Certainly, a complete answer is not the response that would
be generated by following the algorithm in RFC 1034 section 4.3.2.

3. If this is the first time that the "*" query was made, as it will be in
these particular circumstances, BIND passes through the response leaving the
AA bit set to 1. Sendmail takes this to mean that the response wasn't cached
along the way, and so re-uses the response when performing the "MX" and "A"
lookups, filtering it for resource records of the desired types, instead of
making further queries.

4. The assumptions that Sendmail is making thus break. It is assuming that if
the AA bit is set to 1 in the response, "any" will have really meant "all".
But that's not true in these circumstances.

When Sendmail filters the result of the "*" query looking for "MX" and "A"
resource record sets it finds no resource records of those types. Its
disguised "MX" and "A" lookups thus return empty resource record sets, and it
thus complains that it cannot transport mail addressed to "pwr.ag." and
"klaeui.com." mailboxes.
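The four steps above can be sketched as a toy model in Python. This is only an illustration under stated assumptions: `Response`, `filter_rrset`, `sendmail_lookup` and `explicit_query` are hypothetical names invented here, not Sendmail or BIND code, and the record data is copied from the second dump shown in step 2.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    """Toy stand-in for a DNS response; not a real resolver API."""
    aa: bool                                     # AA bit as seen by the client
    answer: list = field(default_factory=list)   # (owner, type, data) tuples


# The "*" response for pwr.ag. from the dump in step 2:
# AA set, answer section holding only the two NS records.
ANY_RESPONSE = Response(
    aa=True,
    answer=[
        ("pwr.ag.", "NS", "ns2.namecenter.ch."),
        ("pwr.ag.", "NS", "ns1.namecenter.ch."),
    ],
)


def filter_rrset(response, rrtype):
    """Keep only the records of the wanted type (the 'disguised' lookup)."""
    return [rr for rr in response.answer if rr[1] == rrtype]


def explicit_query(rrtype):
    # Placeholder data (assumption): stands for whatever an explicit query
    # yields once resolution follows the referral to the pwr.ag. servers.
    return [("pwr.ag.", rrtype, "<data from the pwr.ag. servers>")]


def sendmail_lookup(any_response, rrtype, fallback):
    """Hypothetical model of the over-optimisation described in step 1.

    If the "*" response has AA set, assume "any" meant "all" and reuse it;
    otherwise fall back to an explicit query of the wanted type.
    """
    if any_response.aa:
        return filter_rrset(any_response, rrtype)
    return fallback(rrtype)


mx = sendmail_lookup(ANY_RESPONSE, "MX", explicit_query)
a = sendmail_lookup(ANY_RESPONSE, "A", explicit_query)
print(mx, a)  # both lists are empty, so the host looks unknown
```

With `aa=False` on the same response, the model falls through to `explicit_query`, which is the path the other MTAs always take.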

Note that, as mentioned, this problem relies upon a set of subtle interactions
between a specific combination of softwares. Change any one of them and the
problem goes away:

* Change Sendmail to some other MTA (or change to an appropriate version of
Sendmail), and the assumption that a "passed through" response (i.e. with the
AA bit set to 1) to an "any" query actually contains "all" records goes away.
Other MTAs instead explicitly issue "MX" and "A" queries, which will be
properly resolved because the UltraDNS servers correctly return referrals in
response to "MX" and "A" queries.

* Change BIND to some other proxy DNS server software, and "passed through"
responses from the proxy DNS server go away. ("dnscache" always sets the AA
bit to 0, for example.) Sendmail will thus never assume that "any" has in
fact meant "all", and will thus explicitly issue "MX" and "A" queries rather
than using the result from an "*" query.

* Fix the broken UltraDNS servers so that they always hand out referrals when
appropriate, _even when_ the query type is "*", and query resolution always
ends by asking the "pwr.ag." content DNS servers themselves. They _do_ return
the "MX" and "A" resource record sets in the response to an "*" query, and so
Sendmail's optimisation will happen to work.
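The "change any one of them" claim can be checked with a small truth table. The sketch below is a deliberately crude boolean model and rests on my own simplifications: the function name `mail_fails` and its three flags are hypothetical, and each flag stands for one of the three components listed above.

```python
from itertools import product


def mail_fails(mta_reuses_any, proxy_preserves_aa, parent_answers_any):
    """Toy predicate: True when the disguised MX/A lookup comes back empty.

    mta_reuses_any      -- the MTA reuses an AA=1 "*" response for MX and A
    proxy_preserves_aa  -- the proxy passes the AA bit through on first fetch
    parent_answers_any  -- the parent servers give a complete "*" answer
                           instead of a referral
    """
    # If the parent answers the "*" query itself, resolution stops there
    # and the response holds only NS records, no MX or A.
    response_has_mx_and_a = not parent_answers_any
    # The MTA only trusts the "*" response when it sees AA set to 1.
    trusts_any_response = mta_reuses_any and proxy_preserves_aa
    if trusts_any_response:
        return not response_has_mx_and_a
    # Otherwise explicit MX and A queries are issued, which get referrals.
    return False


for combo in product([True, False], repeat=3):
    print(combo, "FAILS" if mail_fails(*combo) else "ok")
```

In this model only the combination with all three flags set fails, which matches the description above; it says nothing about other failure modes.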

Jonathan de Boyne Pollard

Jun 28, 2003, 7:27:21 AM
SW> I don't know of any tools that point out over complicated DNS
SW> delegation, but maybe the folks at menandmice.com, or Dan
SW> (http://cr.yp.to/), have something suitable?

Dan Bernstein's "dnstrace" will show all of the possible paths that may be
followed to resolve a query, but one has to sit down, read, and decode the
output, the format of which isn't actually documented.

<URL:http://cr.yp.to/djbdns/debugging.html>

Moreover, one has to come up with one's own definition of what "over
complicated" is. However, the actual _amount_ of output is, of course, a
rough guide to the amount of gluelessness involved.
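As a much cruder measure than "dnstrace" output, one can count how many nameserver names in a referral arrive without an address. The Python sketch below assumes the referral has already been obtained by some other means; `glueless_nameservers` is a hypothetical helper, and the data is the "pwr.ag." referral quoted earlier in this thread, whose header counts (1+0+2+0, as I read them) show no additional records at all.

```python
def glueless_nameservers(ns_names, glue):
    """Return the NS names for which the referral carried no address.

    ns_names -- NS targets listed in the referral's authority section
    glue     -- dict mapping names to addresses from the additional section
    """
    # DNS names compare case-insensitively, so normalise before matching.
    known = {name.lower() for name in glue}
    return [ns for ns in ns_names if ns.lower() not in known]


# Referral for pwr.ag.: two authority records, empty additional section.
referral_ns = ["ns2.namecenter.ch.", "ns1.namecenter.ch."]
referral_glue = {}

missing = glueless_nameservers(referral_ns, referral_glue)
print(len(missing), "nameserver name(s) need a separate lookup:", missing)
```

Each name in `missing` costs at least one further resolution, and those resolutions may themselves hit glueless referrals, which is where the output of a real trace grows.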

Mark_A...@isc.org

Jun 30, 2003, 7:38:40 PM

To be precise the "bug" was independent of query type. If
there was an error loading the zone, named would return
SERVFAIL for negative answers. "*" queries just returned
what was available so unless you had a bad domain you wouldn't
get a negative answer.

Most mail domains have A and/or MX records. Very few had
CNAMES. The problem was described as a problem with CNAMES.

Note there was never a need for sendmail to issue the CNAME
query. Standard DNS processing would have returned the
CNAMES if they existed to MX/A queries.

Nor is that required by RFC 2308.

Upgrade to BIND 9 or the next BIND 8 release (aa was incorrectly
being preserved).

> * Fix the broken UltraDNS servers so that they always hand out referrals when
> appropriate, _even when_ the query type is "*", and query resolution always
> ends by asking the "pwr.ag." content DNS servers themselves. They _do_ return
> the "MX" and "A" resource record sets in the response to an "*" query, and so
> Sendmail's optimisation will happen to work.

--
Mark Andrews, Internet Software Consortium
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: Mark.A...@isc.org

Simon Waters

Jun 30, 2003, 7:50:01 PM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jonathan de Boyne Pollard wrote:
> SW> I think it may be failing because queries for
> SW> ns1.nameserver.ch to the ascio.com nameservers
> SW> return no authority records [...]
>
> The "nameserver.ch." content DNS servers don't add the "NS" resource record
> set for "nameserver.ch." to all of their responses, true, but this is
> relatively benign. It merely means that the "ch." content DNS servers will
> have to be queried afresh every 12 hours.

I think it is more catastrophic for some versions of BIND 9, which
assume that if a nameserver says it is authoritative and that no
nameservers exist for a domain, it is to be believed, despite the obvious
contradiction.

> he hasn't given us enough information to
> determine this). However, it is not related to the "nameserver.ch." content
> DNS servers.

I'm sure he has supplied enough information, he gave us his recursive
server IP, which can be seen to know that pwr.ag is served by
ns[12].namecenter.ch, but if you ask it to get ns1.namecenter.ch IP
address it gives SERVFAIL, which I'm pretty sure brings us back to the
answer I gave before, that the answers for the question "what is the IP
of ns1.namecenter.ch" gives a corrupt answer.

> Given that the "*" query will be the first one made, it will be unlikely that
> any "pwr.ag." delegation information is already cached.

Urm it is cached, just query the server, but it is incomplete.


-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE/AMynGFXfHI9FVgYRAjUtAJwPeILgQe8Ro0DmNIjrgOFBUOn8ygCgtSSz
7dayMlaMpFhxPluyhi7xJZk=
=vaWj
-----END PGP SIGNATURE-----


Mark_A...@isc.org

Jun 30, 2003, 8:32:16 PM

> Jonathan de Boyne Pollard wrote:
> > SW> I think it may be failing because queries for
> > SW> ns1.nameserver.ch to the ascio.com nameservers
> > SW> return no authority records [...]
> >
> > The "nameserver.ch." content DNS servers don't add the "NS" resource record
> > set for "nameserver.ch." to all of their responses, true, but this is
> > relatively benign. It merely means that the "ch." content DNS servers will
> > have to be queried afresh every 12 hours.
>
> I think it is more catastrophic for some versions of BIND 9, which
> assume if a nameserver says "it is authoritative and no nameservers"
> exist for a domain, then it is believed, despite the obvious contradiction.

No version of named depends upon the authoritative servers for the
zone returning NS records for the zone in the authority section. They
will be cached, used and preferred if returned.

You may be confusing this with named not querying servers for which
it has received an NXDOMAIN. This happens when *only* glue
address records are added and not the real records the glue records
are supposed to be copies of.

Prior to adding IPv6 support this sort of error was not highly visible.
Named looks for missing glue, as the parent has not returned glue
AAAA records (they don't exist). The NXDOMAIN response is cached
and the nameserver is not tried until the cache expires.

Usually the first query to the zone succeeds and subsequent ones fail.
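A rough illustration of that sequence, as a sketch under my own assumptions rather than named's actual code: `ToyCache` and `usable_servers` are hypothetical names, "ns1.example." is a placeholder nameserver name, and the 3600-second lifetime is an arbitrary value chosen for the example.

```python
class ToyCache:
    """Hypothetical negative cache keyed by name; not named's data structure."""

    def __init__(self):
        self.nxdomain_until = {}   # name -> time at which the entry expires

    def note_nxdomain(self, name, now, ttl):
        self.nxdomain_until[name] = now + ttl

    def is_nonexistent(self, name, now):
        expiry = self.nxdomain_until.get(name)
        return expiry is not None and now < expiry


def usable_servers(ns_names, cache, now):
    """Skip nameservers whose names are currently cached as nonexistent."""
    return [ns for ns in ns_names if not cache.is_nonexistent(ns, now)]


cache = ToyCache()
ns_names = ["ns1.example."]   # placeholder name (assumption)

# First query: the glue A record from the referral is used, so it works.
first = usable_servers(ns_names, cache, now=0)

# The AAAA lookup for the nameserver's own name then gets NXDOMAIN from
# servers that hold no real records for it; that answer is cached.
cache.note_nxdomain("ns1.example.", now=0, ttl=3600)

second = usable_servers(ns_names, cache, now=10)    # while the entry is live
third = usable_servers(ns_names, cache, now=3601)   # after the entry expires
print(first, second, third)
```

In the model the server list is empty only while the negative entry is live, which is the "first query succeeds, later ones fail" pattern.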

> > he hasn't given us enough information to
> > determine this). However, it is not related to the "nameserver.ch." content
> > DNS servers.
>
> I'm sure he has supplied enough information, he gave us his recursive
> server IP, which can be seen to know that pwr.ag is served by
> ns[12].namecenter.ch, but if you ask it to get ns1.namecenter.ch IP
> address it gives SERVFAIL, which I'm pretty sure brings us back to the
> answer I gave before, that the answers for the question "what is the IP
> of ns1.namecenter.ch" gives a corrupt answer.
>
> > Given that the "*" query will be the first one made, it will be unlikely that
> > any "pwr.ag." delegation information is already cached.
>
> Urm it is cached, just query the server, but it is incomplete.
