
Large Zone transfer from Bind 8/9 - W2K


Jeffery Jones

Sep 1, 2002, 12:32:41 AM


I have a zone file that contains much more data than the others;
it's about 190 KB. When transferring from BIND 8.3.3 or 9.2.2rc1
on W2K, all zones transfer to the secondary fine except for the large
zone file, where the transfer fails:

Aug 28 11:59:55.362 transfer of 'example.com/IN' from 192.168.16.3#53:
failed while receiving responses: end of file
Aug 28 11:59:55.362 transfer of 'example.com/IN' from 192.168.16.3#53:
end of transfer

This happens on both 8.3.3 and 9.2.2rc1. It matches the
description in this earlier message:

http://groups.google.com/groups?selm=a2ngn4%24sa9%40pub3.rc.vix.com&oe=UTF-8&output=gplain


Running the AXFR with dig results in a similar message:

;; communications error to 192.168.16.3#53: end of file

after about 64K of data has been transferred.

Although this is a potential solution to a different problem, I tried
the "transfer-format one-answer" statement, but this had no effect.

Any ideas?


Me

Sep 2, 2002, 12:16:51 AM
According to the notes, this issue was corrected in Windows 2000 Service
Pack 3. Out of curiosity, how many entries are in that file (approximately)?

Ray

"Jeffery Jones" <jeffe...@altavista.net> wrote in message
news:aks599$4g2f$1...@isrv4.isc.org...

Michael Niksch

Sep 2, 2002, 10:56:04 AM

I once had to set increased values for the max-transfer-time-in (slave)
and max-transfer-time-out (master) options, because I had to transfer a
large zone file across a slow line. Note that an AXFR for a dynamic
zone will also stop if a change is applied on the master.

--
Michael Niksch /Zurich/IBM @ IBMCH
IBM Zurich Research Laboratory n...@zurich.ibm.com
Saeumerstrasse 4 http://www.zurich.ibm.com/~nik/
CH-8803 Rueschlikon / Switzerland P: +41-1-724-8913 F: +41-1-724-8080

Jeffery Jones

Sep 3, 2002, 12:20:24 AM

On 2 Sep 2002 04:16:51 -0000, "Me" <repl...@newsgroup.only> wrote:

>According to the notes, this issue was corrected in Windows 2000 Service
>Pack 3. Out of curiosity, how many entries are in that file (approximately)?
>

The system is running Windows 2000 SP3, but I believe that fixed
only the Windows 2000 DNS. I am running Bind 8 and Bind 9 on Windows
NT.

I have about 4,500 DNS entries. The size is simply a matter of choice:
only a few entries require that the forward and reverse DNS exist and
match, so I auto-generated everything so they would all match. I
think I'll need to rethink this. I wonder if $GENERATE creates a
smaller zone transfer ... I'll need to experiment.

Jeffery Jones

Sep 3, 2002, 12:28:45 AM

On 2 Sep 2002 14:56:04 -0000, "Michael Niksch" <n...@zurich.ibm.com>
wrote:

>
>I once had to set increased values for the max-transfer-time-in (slave)
>and max-transfer-time-out (master) options, because I had to transfer a
>large zone file across a slow line. Note that an AXFR for a dynamic
>zone will also stop if a change is applied on the master.

That gave me an idea; I tried running the AXFR directly on the
server from 127.0.0.1. There was no error, but only 62K of data was
transferred. This must be a quirk with the WIN32 implementation and
how it handles chunks larger than 64K.

Simon Waters

Sep 3, 2002, 3:32:25 AM

Jeffery Jones wrote:
>
> I wonder if $GENERATE creates a
> smaller zone transfer ... I'll need to experiment.

No. $GENERATE generates the records locally on the server
loading the zone. It isn't part of the DNS protocol, so the
directive itself isn't sent, just the generated data.
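
To illustrate the point, here is a minimal Python sketch of what a server does with a simplified $GENERATE line before anything is transferred. The function name expand_generate and the host names are hypothetical, and only the bare "$" placeholder is handled, not BIND's full syntax:

```python
from typing import List


def expand_generate(start: int, stop: int, lhs: str, rrtype: str, rhs: str) -> List[str]:
    """Expand a simplified $GENERATE line into explicit record lines.

    Only the bare "$" placeholder is handled; BIND's ${offset,width,base}
    modifiers and the optional step value are left out of this sketch.
    """
    records = []
    for i in range(start, stop + 1):
        owner = lhs.replace("$", str(i))
        data = rhs.replace("$", str(i))
        records.append(f"{owner} {rrtype} {data}")
    return records


# Hypothetical directive: $GENERATE 1-3 $ PTR host$.example.com.
for line in expand_generate(1, 3, "$", "PTR", "host$.example.com."):
    print(line)
```

The secondary receives the expanded PTR records, so the size of the transfer depends on how many records are generated, not on the length of the directive in the zone file.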

As regards the original problem I try not to do W2K, do you get
any messages on loading the zone? Are the usual tools happy
(named-checkzone)?

Jeffery Jones

Sep 3, 2002, 7:25:41 PM

On 3 Sep 2002 07:32:25 -0000, Simon Waters
<Si...@wretched.demon.co.uk> wrote:

>As regards the original problem I try not to do W2K, do you get
>any messages on loading the zone? Are the usual tools happy
>(named-checkzone)?

The zone loads fine otherwise with no warnings. If I manually
copy the zone file to the backup it loads OK. named-checkzone says
it's ok also.


N/A

Sep 4, 2002, 4:15:43 AM
This talks about the problem on NT, but it seems to claim that the limit is
16K and the fix lets you get to 64K:
http://support.microsoft.com/default.aspx?scid=kb;en-us;Q302639

Ray

"Jeffery Jones" <jeffe...@altavista.net> wrote in message

news:al1da8$8914$1...@isrv4.isc.org...

Danny Mayer

Sep 4, 2002, 8:54:48 AM

At 06:39 PM 8/31/02, Jeffery Jones wrote:

> I have a zone file that contains much more data than the others;
>it's about 190 KB. When transferring from BIND 8.3.3 or 9.2.2rc1
>on W2K, all zones transfer to the secondary fine except for the large
>zone file, where the transfer fails:
>
>Aug 28 11:59:55.362 transfer of 'example.com/IN' from 192.168.16.3#53:
>failed while receiving responses: end of file
>Aug 28 11:59:55.362 transfer of 'example.com/IN' from 192.168.16.3#53:
>end of transfer
>
> This happens on both 8.3.3 and 9.2.2rc1. It matches the
>description in this earlier message:
>
>http://groups.google.com/groups?selm=a2ngn4%24sa9%40pub3.rc.vix.com&oe=UTF-8&output=gplain
>
>
> Running the AXFR with dig results in a similar message:
>
> ;; communications error to 192.168.16.3#53: end of file
>
> after about 64K of data has been transferred.
>
> Although this is a potential solution to a different problem, I tried
>the "transfer-format one-answer" statement, but this had no effect.
>
> Any ideas?

On NT the master (where you are transferring from) needs to be a BIND 9
server. The problem only occurs on BIND 8 masters on NT. I just transferred
a zone with over 30,000 records, which took 23 messages according to dig.
That's well over the 64K data limit. What's the master really running?
Check your application event log for the version number. Don't rely on
any other method.
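
For readers wondering about the 23 messages: DNS over TCP prefixes every message with a two-byte length (RFC 1035, section 4.2.2), so a single message cannot exceed 65,535 bytes and a larger zone is sent as a series of messages on the same connection. A minimal Python sketch of that framing follows. The function names are made up for illustration and this is not BIND's code:

```python
import struct
from typing import List


def frame(message: bytes) -> bytes:
    """Prefix one DNS message with its 2-byte big-endian length for TCP."""
    if len(message) > 0xFFFF:
        raise ValueError("a single DNS message cannot exceed 65535 bytes")
    return struct.pack("!H", len(message)) + message


def split_tcp_dns_stream(stream: bytes) -> List[bytes]:
    """Split a DNS-over-TCP byte stream into its individual messages.

    Raises ValueError if the stream stops in the middle of a length
    prefix or in the middle of a message (a truncated transfer).
    """
    messages = []
    offset = 0
    while offset < len(stream):
        if offset + 2 > len(stream):
            raise ValueError("stream ended inside a length prefix")
        (length,) = struct.unpack_from("!H", stream, offset)
        offset += 2
        if offset + length > len(stream):
            raise ValueError("stream ended inside a message")
        messages.append(stream[offset:offset + length])
        offset += length
    return messages
```

With this framing, roughly 190 KB of zone data cannot travel as one message; a sender has to spread it over several, and a connection closed partway through leaves the receiver with an incomplete stream.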

Danny


Jeffery Jones

Sep 4, 2002, 8:50:03 PM

On 4 Sep 2002 12:54:48 -0000, Danny Mayer <ma...@gis.net> wrote:

>On NT the master (where you are transferring from) needs to be a BIND 9
>server. The problem only occurs on BIND 8 masters on NT. I just transferred
>a zone with over 30,000 records, which took 23 messages according to dig.
>That's well over the 64K data limit. What's the master really running?
>Check your application event log for the version number. Don't rely on
>any other method.

I can see what's happening now: the diagnosis was obscured by two
different problems. The original problem occurred while
transferring from a BIND 8 master to a BIND 9 slave. Before posting
here, I tested BIND 9 as a master, but I did it over a dialup
line. When it failed, I assumed it had the same problem as BIND 8.
As Michael Niksch pointed out, transferring a large zone over a slow
line also requires increasing the max-transfer-time-out setting.

I have since gone back and confirmed that the large zone transfers
properly from a BIND 9 master at T-1 speeds. I am unable to upgrade
the BIND 8.3.3 master to BIND 9.2.0 because 9.2.0 eventually
stops resolving external zones under Win32. Does the current beta
release candidate BIND 9.2.2rc1 correct this problem under Win32?

Thanks,


Danny Mayer

Sep 4, 2002, 10:06:14 PM

At 08:47 PM 9/4/02, Jeffery Jones wrote:
> I have since gone back and confirmed that the large zone transfers
>properly from a BIND 9 master at T-1 speeds. I am unable to upgrade
>the BIND 8.3.3 master to BIND 9.2.0 because 9.2.0 eventually
>stops resolving external zones under Win32. Does the current beta
>release candidate BIND 9.2.2rc1 correct this problem under Win32?

I am unaware of any problem that would cause it to stop resolving external
zones under Win32. How do you know that it isn't? Did you run dig against
the box and get timeouts instead of a response? Did you use fully qualified
domain names? Are the authoritative nameservers lame? Do you have a
firewall problem? Are all queries to external zones timing out or just some of
them?

Danny


Danny Mayer

Sep 5, 2002, 7:05:23 PM

At 03:56 AM 9/5/02, Stanley Liu wrote:

>Danny,

>I do experience the problem Jeffery highlighted above: BIND 9.2.1 for NT
>stops resolving external zones after a while. Sometimes it stops resolving
>even local zones. If you look at the NT service console, the ISC BIND
>service is still running but it just times out (?) all queries. I ran dig
>against the box using fully qualified domain names (external and local) with
>no firewall in between. I've posted here before and a couple of guys reported
>that it was a known problem and recommended going back to BIND 8.3.3. It was
>very tempting but I've decided to give BIND 9.2.2rc1 a try and so far (about
>a week) so good.

9.2.0 did have an occasional problem with timeouts because of a problem
with the select loop but should only have been seen when there was no
activity on the machine. 9.2.1 fixed that but at the expense of making the
select timeout too short and having it become compute bound. 9.2.2rc1
should be just right. 9.3.0 will totally eliminate these problems as it's a
rewrite of that piece of code. Unfortunately it's not yet available even in a
snapshot.

>Judging from your response, Danny, you don't seem to be aware of such a
>problem. You've got me worried now (about eliminating the problem using
>BIND 9.2.2rc1).

You need to let us know.

>Regards,
>
>Stanley Liu
>stanl...@toyota.com.au

Danny


Stanley Liu

Sep 5, 2002, 7:05:38 PM

Danny,

Judging from your response, Danny, you don't seem to be aware of such a
problem. You've got me worried now (about eliminating the problem using
BIND 9.2.2rc1).

Regards,

Stanley Liu
stanl...@toyota.com.au


Jeffery Jones

Sep 5, 2002, 7:56:14 PM


The following threads have more info along with at least one other person who has
seen a similar problem.

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=agcl6i%245nvv%241%40isrv4.isc.org&rnum=1&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26q%3D%2522jeffery%2Bjones%2522%2Bbind%2Brecursive%26btnG%3DGoogle%2BSearch

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=agf518%2479b7%241%40isrv4.isc.org&rnum=2&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26q%3D%2522jeffery%2Bjones%2522%2Bbind%2Brecursive%26btnG%3DGoogle%2BSearch


The "No more recursive clients" error is suspicious since BIND 8.3.3 never misses a beat
in the same environment. Increasing the recursive-clients setting caused it to eventually
just stop resolving external domain names without even logging an error.

I was using NSLookup at the time, and it got an immediate reply
for all external domain names that indicated failure. I'll set it up again with 9.2.2 so that
I'm paged the next time it fails, get better diagnostics with dig, and dump the cache to see
if some names are present in cache but not returned.
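
A check of this kind could be scripted roughly as follows. This is a minimal Python sketch using only the standard library. The function names, the query ID and the five-second timeout are assumptions made for illustration, and the paging step is left out:

```python
import socket
import struct


def build_query(name: str, query_id: int = 1) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (RD bit set), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label preceded by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def server_answers(server: str, name: str, timeout: float = 5.0, port: int = 53) -> bool:
    """Return True if the server sends back a UDP reply with a matching ID."""
    query = build_query(name, query_id=0x1234)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts and unreachable-port errors
            return False
    return len(reply) >= 12 and reply[:2] == query[:2]


# Example call (address as used later in this thread):
# server_answers("192.168.212.3", "www.yahoo.com.")
```

The sketch only tells whether the server replies at all. It does not look at the response code, so a reply that indicates failure, like the ones NSLookup saw, would still count as an answer and would need a further check.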


Danny Mayer

Sep 7, 2002, 9:42:21 PM

At 08:13 AM 9/5/02, Jeffery Jones wrote:

>On 5 Sep 2002 02:06:14 -0000, Danny Mayer <ma...@gis.net> wrote:
>
> >

Yes, I've exchanged mail with Bjorn a number of times on various issues.


> The "No more recursive clients" error is suspicious since BIND 8.3.3
>never misses a beat in the same environment. Increasing the
>recursive-clients setting caused it to eventually just stop resolving
>external domain names without even logging an error.

The "No more recursive clients" error is not unusual in BIND 9 and is not
specific to Win32. You can increase the limit by adding a line to your
options section, for example:
    recursive-clients 2000;
The default is 1000.

> I was using NSLookup at the time, and it got an immediate reply
>for all external domain names that indicated failure. I'll set it up
>again with 9.2.2 so that I'm paged the next time it fails, get better
>diagnostics with dig, and dump the cache to see if some names are
>present in cache but not returned.

Danny


Danny Mayer

Sep 9, 2002, 7:09:38 AM

At 08:13 AM 9/5/02, Jeffery Jones wrote:

>On 5 Sep 2002 02:06:14 -0000, Danny Mayer <ma...@gis.net> wrote:
>
> >

Jeffery Jones

Sep 12, 2002, 2:34:24 PM

On 5 Sep 2002 23:05:38 -0000, "Stanley Liu" <stanl...@toyota.com.au> wrote:

>> I am unaware of any problem that would cause it to stop resolving external
>> zones under Win32. How do you know that it isn't? Did you run dig against
>> the box and get timeouts instead of a response? Did you use fully qualified
>> domain names? Are the authoritative nameservers lame? Do you have a
>> firewall problem? Are all queries to external zones timing out or just some
>> of them?

I upgraded from 8.3.3 to 9.2.2rc1 and got results similar to 9.2.1: it would run
for 3 to 24 hours then fail. I have one debug session, but unfortunately after I
dumped the cache, I realized that nothing I queried with DIG was in cache. However,
it also failed to return anything for a zone for which it was authoritative. 8.3.3 runs for
months (or at least between hotfix reboots!) without failing.

Here are some more details:


Failure #1 (after about 20 hours) -

Sep 11 19:04:12.337 socket.c:2230: fatal error:
Sep 11 19:04:12.337 select() failed: Socket operation on non-socket
Sep 11 19:04:12.337 exiting (due to fatal error in library)


Failure #2 after running for about 3 hours, nothing in log file:


>rndc status

number of zones: 287
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
server is up and running

>rndc stats

+++ Statistics Dump +++ (1031827355)
success 46762
referral 380
nxrrset 1747
nxdomain 69830
recursion 38606
failure 853
--- Statistics Dump --- (1031827355)


>dig www.yahoo.com. @192.168.212.3

; <<>> DiG 9.2.2rc1 <<>> www.yahoo.com. @192.168.212.3
;; global options: printcmd
;; connection timed out; no servers could be reached


(This was run from the machine itself and should have returned immediately, as in
the example below, which I ran after downgrading back to 8.3.3:)


; <<>> DiG 8.3 <<>> www.yahoo.com. @192.168.212.3
; (1 server found)
;; res options: init recurs defnam dnsrch
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4
;; flags: qr rd ra; QUERY: 1, ANSWER: 9, AUTHORITY: 10, ADDITIONAL: 10
;; QUERY SECTION:
;; www.yahoo.com, type = A, class = IN

.... NORMAL STUFF DELETED ....

;; Total query time: 0 msec
;; FROM: homeur to SERVER: 192.168.212.3 192.168.212.3
;; WHEN: Thu Sep 12 10:08:42 2002
;; MSG SIZE sent: 31 rcvd: 539

Danny Mayer

Sep 12, 2002, 7:42:45 PM

At 10:40 AM 9/12/02, Jeffery Jones wrote:

>On 5 Sep 2002 23:05:38 -0000, "Stanley Liu" <stanl...@toyota.com.au> wrote:
>

> >> I am unaware of any problem that would cause it to stop resolving external
> >> zones under Win32. How do you know that it isn't? Did you run dig against
> >> the box and get timeouts instead of a response? Did you use fully qualified
> >> domain names? Are the authoritative nameservers lame? Do you have a
> >> firewall problem? Are all queries to external zones timing out or just some
> >> of them?
>

> I upgraded from 8.3.3 to 9.2.2rc1 and got results similar to 9.2.1: it
>would run for 3 to 24 hours then fail.

How would it fail? What's the output of a query?

> I have one debug session, but unfortunately after I
>dumped the cache, I realized that nothing I queried with DIG was in cache.

What do you mean by a debug session? Did you compile the source code
and run it under the debugger?

> However,
>it also failed to return anything for a zone for which it was authoritative.

What did it do?

> 8.3.3 runs for
>months (or at least between hotfix reboots!) without failing.
>
> Here are some more details:
>
>
>Failure #1 (after about 20 hours) -
>
>Sep 11 19:04:12.337 socket.c:2230: fatal error:
>Sep 11 19:04:12.337 select() failed: Socket operation on non-socket
>Sep 11 19:04:12.337 exiting (due to fatal error in library)

This is a known problem that is fixed in 9.3.0.


>Failure #2 after running for about 3 hours, nothing in log file:
>
>
> >rndc status
>
>number of zones: 287
>debug level: 0
>xfers running: 0
>xfers deferred: 0
>soa queries in progress: 0
>query logging is OFF
>server is up and running
>
> >rndc stats
>
>+++ Statistics Dump +++ (1031827355)
>success 46762
>referral 380
>nxrrset 1747
>nxdomain 69830
>recursion 38606
>failure 853
>--- Statistics Dump --- (1031827355)
>
>
> >dig www.yahoo.com. @192.168.212.3
>
>; <<>> DiG 9.2.2rc1 <<>> www.yahoo.com. @192.168.212.3
>;; global options: printcmd
>;; connection timed out; no servers could be reached

You need to file a bug report to bind9...@isc.org.
Does the server respond to anything, like rndc or any TCP request like
a zone transfer?


> (This was run from the machine itself and should have returned immediately, as in
>the example below, which I ran after downgrading back to 8.3.3:)
>
>
>; <<>> DiG 8.3 <<>> www.yahoo.com. @192.168.212.3
>; (1 server found)
>;; res options: init recurs defnam dnsrch
>;; got answer:
>;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4
>;; flags: qr rd ra; QUERY: 1, ANSWER: 9, AUTHORITY: 10, ADDITIONAL: 10
>;; QUERY SECTION:
>;; www.yahoo.com, type = A, class = IN
>
> .... NORMAL STUFF DELETED ....
>
>;; Total query time: 0 msec
>;; FROM: homeur to SERVER: 192.168.212.3 192.168.212.3
>;; WHEN: Thu Sep 12 10:08:42 2002
>;; MSG SIZE sent: 31 rcvd: 539

Danny

