
Stalling slave transfers


Tom Sommer

May 8, 2013, 3:26:27 AM
to bind-...@lists.isc.org
Hi,

I have a problem with one of 3 slave servers, all set up the exact same
way, with the exact same bind version and configuration.

One slave has a problem transferring zones from the master.

The logfiles are flooded with "received notify for zone" .. "refresh in
progress, refresh check queued" lines and "rndc status" returns a
constant high number of "soa queries in progress".
After a few hours the zones are transferred, so the connection to the
master is working, but there is a major delay. I tried resetting the
slave and transferring ALL slave zones again, which worked fine
instantly. The problem still appeared again after a few hours though.

The master has three network-paths, one on external IP, one on internal
IP and one on IPv6. All 3 paths work fine, because the transfers happen
after an hour or so.

There are no hints in the master's log.
The other two slaves are running perfectly, no errors or delays
whatsoever.

Bind version 9.9.2-P2 (recently upgraded to).

Any hints would be appreciated, as I feel like I've exhausted most
options.

Thank you.
--
Tom Sommer

Cathy Almond

May 8, 2013, 6:25:04 AM
to bind-...@lists.isc.org
Have a look at this KB article (you'll need to register to view - but
registration is open to all):

https://kb.isc.org/article/AA-00726/30/Tuning-your-BIND-configuration-effectively-for-zone-transfers-particularly-with-many-frequently-updated-zones.html

Also - and this isn't covered in that article (yet) - if you're using
views, then use-alt-transfer-source defaults to 'yes'. You might want
to set it explicitly to 'no' or to define alt-transfer-source
and/or alt-transfer-source-v6.

Tom Sommer

May 8, 2013, 2:15:55 PM
to Cathy Almond, bind-...@lists.isc.org
Thank you, great resource. I think I solved it by raising
serial-query-rate; it's just odd that it's not required on the other
two servers.

Another issue has arisen now though, the logfile is filled with lots of
named[5596]: zone example.com/IN: refresh: failure trying master
1.2.3.4#53 (source 0.0.0.0#0): operation canceled

But if I do a "dig example.com @1.2.3.4" it's working just fine. Same
server as with the previous issue.

Any thoughts? Thank you.

// Tom

Tom Sommer

May 8, 2013, 2:18:08 PM
to Cathy Almond, bind-...@lists.isc.org

On 5/8/13 8:15 PM, Tom Sommer wrote:
> Another issue has arisen now though, the logfile is filled with lots of
> named[5596]: zone example.com/IN: refresh: failure trying master
> 1.2.3.4#53 (source 0.0.0.0#0): operation canceled
>
and

named[5596]: zone example.com/IN: refresh: retry limit for master
1.2.3.4#53 exceeded (source 0.0.0.0#0)

// Tom
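
When the log is flooded with messages like the two quoted above, a per-zone tally can show whether the failures are spread across all zones or concentrated on one master address. The following is a minimal Python sketch, not a BIND tool. The regular expression is modeled only on the two log lines quoted in this thread, so other BIND versions or logging setups may need a different pattern, and the function name `tally_refresh_errors` is hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the two messages quoted in this thread (assumption:
# each message is a single line in the real log, not wrapped as in the mail).
REFRESH_RE = re.compile(
    r"zone (?P<zone>\S+)/IN: refresh: "
    r"(?P<kind>failure trying|retry limit for) master "
    r"(?P<master>\S+?)#\d+"
)


def tally_refresh_errors(lines):
    """Count refresh errors per (zone, master address, kind of error)."""
    counts = Counter()
    for line in lines:
        m = REFRESH_RE.search(line)
        if m:
            counts[(m["zone"], m["master"], m["kind"])] += 1
    return counts


# Sample input: the lines from this thread, joined back into single lines.
sample = [
    "named[5596]: zone example.com/IN: refresh: failure trying master "
    "1.2.3.4#53 (source 0.0.0.0#0): operation canceled",
    "named[5596]: zone example.com/IN: refresh: retry limit for master "
    "1.2.3.4#53 exceeded (source 0.0.0.0#0)",
    "named[5596]: zone example.com/IN: refresh: failure trying master "
    "1.2.3.4#53 (source 0.0.0.0#0): operation canceled",
]

for key, n in tally_refresh_errors(sample).most_common():
    print(n, *key)
```

To run it against a real log, pass an open file object instead of `sample`; the path of that file depends on the local logging configuration.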

Cathy Almond

May 9, 2013, 5:36:46 AM
to bind-...@lists.isc.org
> Another issue has arisen now though, the logfile is filled with lots of
> named[5596]: zone example.com/IN: refresh: failure trying master
> 1.2.3.4#53 (source 0.0.0.0#0): operation canceled
>
> But if I do a "dig example.com @1.2.3.4" it's working just fine. Same
> server as with the previous issue.
>
> Any thoughts? Thank you.
>
> // Tom

I don't think you solved the problem - I think you moved it (or made it
happen faster...)

The refresh errors indicate that the master isn't responding to your
slave for some reason. That's what you'll need to investigate. I would
suggest auditing the differences between this slave and the others in
their named configurations as well as their configured IP interfaces and
routing tables.

A pair of network packet traces (slave and the non-responding auth
server) might also point you in the right direction.

Cathy

Tom Sommer

May 9, 2013, 6:55:31 AM
to Cathy Almond, bind-...@lists.isc.org

On 5/9/13 11:36 AM, Cathy Almond wrote:
> I don't think you solved the problem - I think you moved it (or made it
> happen faster...)
>
> The refresh errors indicate that the master isn't responding to your
> slave for some reason. That's what you'll need to investigate. I would
> suggest auditing the differences between this slave and the others in
> their named configurations as well as their configured IP interfaces and
> routing tables.
>
> A pair of network packet traces (slave and the non-responding auth
> server) might also point you in the right direction.
>
Right, but when I perform a "dig" from the server OS, the transfer and
network-communication work fine - so there are no signs as to why named
can't connect to the master, but the OS can.

I'll do some more digging.

Thanks.

Luther, Dan

May 9, 2013, 8:19:17 AM
to bind-...@lists.isc.org
Tom,

What happens when you "dig +tcp example.com @1.2.3.4"? Specifically I'm wondering here if the slave you're having problems with is blocking TCP port 53. Such a configuration would allow you to query the master server, but not transfer to/from it.

Dan Luther
Operations Engineer
Systems Operation Engineering
Level 3 Communications
One Technology Center, Tulsa OK 74103
e: dan.l...@level3.com
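
The TCP check suggested above can also be scripted, which makes it easy to repeat from a specific local address. This is a minimal Python sketch under stated assumptions: it only tests whether a TCP connection to port 53 can be opened and does not send a DNS query, and the function name `tcp_port_open` and its `source` parameter are hypothetical, not part of BIND or dig.

```python
import socket


def tcp_port_open(host, port=53, source=None, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    If source is given, the socket is bound to that local address first,
    so the check leaves the machine from that address.
    """
    source_address = (source, 0) if source else None
    try:
        with socket.create_connection(
            (host, port), timeout=timeout, source_address=source_address
        ):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable sources.
        return False
```

For example, `tcp_port_open("1.2.3.4")` would test the placeholder master address used in this thread, and passing `source=` with one of the slave's own addresses repeats the test from that address. A `False` result for TCP while plain UDP queries succeed would be consistent with TCP port 53 being blocked somewhere on the path.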

Tom Sommer

May 14, 2013, 2:59:35 AM
to Luther, Dan, bind-...@lists.isc.org

 

On 5/9/13 2:19 PM, Luther, Dan wrote:
> Tom,
>
> What happens when you "dig +tcp example.com @1.2.3.4"? Specifically I'm wondering here if the slave you're having problems with is blocking TCP port 53. Such a configuration would allow you to query the master server, but not transfer to/from it.

That works fine, but I think I figured out the problem, it was due to the server having acquired a 2nd (autodiscovered) IPv6 address, and it was using that as transfer source. It would be very helpful if the logfile said the actual source IP, and not just 0.0.0.0#53 or ::#0. That would help debugging a lot.

I'm down to only seeing the error "retry limit for master" and "refresh: failure trying master" on IPv6 now, and only occasionally.

It also appears the master is sending two notifies for each zone, to each slave, one on IPv4 and one on IPv6?

// Tom
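
Because the log shows only the wildcard source (0.0.0.0#0 or ::#0), it can help to ask the operating system which local address it would choose for a given destination. The sketch below uses a connected UDP socket for that. It is an approximation under stated assumptions: it reports the kernel's choice for an ordinary socket at the moment it runs, which is presumably what named gets when no explicit transfer source is configured, and the function name `source_address_for` is hypothetical.

```python
import socket


def source_address_for(dest, port=53):
    """Return the local address the OS would use to reach dest.

    dest must be an IP literal; a ':' in it selects IPv6.
    """
    family = socket.AF_INET6 if ":" in dest else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as s:
        # connect() on a UDP socket sends no packets; it only makes the
        # kernel pick a route and a source address for this destination.
        s.connect((dest, port))
        return s.getsockname()[0]
```

Calling `source_address_for` with the master's IPv4 and IPv6 addresses on the affected slave would show whether an autodiscovered IPv6 address is being selected. If it is, that address either needs to be allowed on the master or avoided by configuring an explicit transfer source on the slave.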

Tony Finch

May 15, 2013, 10:58:37 AM
to Tom Sommer, bind-...@lists.isc.org
Tom Sommer <ma...@tomsommer.dk> wrote:
>
> That works fine, but I think I figured out the problem, it was due to
> the server having acquired a 2nd (autodiscovered) IPv6 address, and it
> was using that as transfer source. It would be very helpful if the
> logfile said the actual source IP, and not just 0.0.0.0#53 or ::#0. That
> would help debugging a lot.

I have found that if you have multiple master addresses listed for a slave
zone, named will not fall back to trying later addresses if the first one
fails.

Tony.
--
f.anthony.n.finch <d...@dotat.at> http://dotat.at/
Forties, Cromarty: East, veering southeast, 4 or 5, occasionally 6 at first.
Rough, becoming slight or moderate. Showers, rain at first. Moderate or good,
occasionally poor at first.

Cathy Almond

May 17, 2013, 10:54:22 AM
to bind-...@lists.isc.org
On 15/05/13 15:58, Tony Finch wrote:
> Tom Sommer <ma...@tomsommer.dk> wrote:
>>
>> That works fine, but I think I figured out the problem, it was due to
>> the server having acquired a 2nd (autodiscovered) IPv6 address, and it
>> was using that as transfer source. It would be very helpful if the
>> logfile said the actual source IP, and not just 0.0.0.0#53 or ::#0. That
>> would help debugging a lot.
>
> I have found that if you have multiple master addresses listed for a slave
> zone, named will not fall back to trying later addresses if the first one
> fails.
>
> Tony.
>
The speed of fall-back through the masters list may depend on whether or
not you set "try-tcp-refresh no;" in named.conf.

Another contributing factor is whether the failure mode is immediate
(ICMP error or connection failure) or has to time out from named's
perspective.

