refresh_callback: zone , after bind9 upgrade

Jerry

Aug 7, 2001, 1:19:34 PM

I recently upgraded one of my slaves to bind 9.1.3. (My master is an
old bind8 box)
Now I receive several:
/opt/bind9/sbin/named[646]: refresh_callback: zone
155.62.209.in-addr.arpa/IN: failure for 10.1.1.6#53: timed out

I have noticed tons of posts here regarding the same thing, but no
real resolution. It seems harmless since the zones are getting
transferred, but I am wondering exactly what refresh_callback does and
why it's failing. This happens with a large number of zones (maybe
all, I haven't been able to sift through it yet). Eventually it also
gets an error for maximum number of retries exceeded for
refresh_callback as well for each zone.
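
For anyone who needs to sift through a log full of these, here is a minimal Python sketch that tallies refresh_callback failures per zone. It assumes each message sits on one log line in the format quoted above; the function name and the pattern are illustrative only and are not part of BIND.

```python
import re
from collections import Counter

# Pattern inferred from the single sample message in this post; other
# BIND versions may word the message differently.
PATTERN = re.compile(
    r"refresh_callback: zone (?P<zone>\S+)/(?P<cls>\w+): "
    r"failure for (?P<master>[^#\s]+)#(?P<port>\d+): (?P<reason>.+)$"
)


def tally_refresh_failures(lines):
    """Count failures per (zone, master, reason) across the given log lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line.strip())
        if match:
            counts[(match["zone"], match["master"], match["reason"])] += 1
    return counts
```

Passing an open log file to tally_refresh_failures() and printing the result of most_common() shows whether every zone is affected or only some of them.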

Does anyone know the real problem here? There are no firewall access
or permission problems, since this machine formerly transferred zones
without the error and dig axfr transfers the zone fine.

Is this something new in bind9 or an incompatibility between bind8 and
9?

Any help would be greatly appreciated.
Jerry

Mark.A...@nominum.com

Aug 7, 2001, 6:27:09 PM

A refresh_callback is the code called when a refresh query
succeeds or fails. A refresh query is the query made to
see if the zone is up to date. The server will make three
of these before it gives up and reschedules roughly retry
seconds later. The failure is that the nameserver didn't
get a response in the expected time (timed out).
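
The behaviour described above can be sketched roughly as follows. This is a simplified Python model, not BIND's actual code: the function names, the result type and the query_soa_serial callback are all hypothetical, and the serial comparison ignores the wraparound rules of serial number arithmetic.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # "The server will make three of these before it gives up"


@dataclass
class RefreshResult:
    action: str         # "transfer", "up-to-date" or "retry-later"
    next_check_in: int  # seconds until the next refresh attempt
    attempts: int       # refresh queries actually sent


def refresh_zone(
    local_serial: int,
    refresh: int,
    retry: int,
    query_soa_serial: Callable[[], Optional[int]],
) -> RefreshResult:
    """Model one refresh cycle for a slave zone.

    query_soa_serial (hypothetical) returns the master's SOA serial,
    or None when the query timed out.
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        master_serial = query_soa_serial()
        if master_serial is None:
            continue  # timed out: this is where the log message would appear
        if master_serial > local_serial:  # simplification: no wraparound handling
            return RefreshResult("transfer", refresh, attempt)
        return RefreshResult("up-to-date", refresh, attempt)
    # Every attempt timed out: reschedule roughly "retry" seconds later.
    return RefreshResult("retry-later", retry, MAX_ATTEMPTS)
```

In this model each timed-out attempt corresponds to one refresh_callback failure in the log, and a cycle where all three attempts fail ends with the zone being rescheduled after the retry interval.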

Mark

--
Mark Andrews, Nominum Inc.
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: Mark.A...@nominum.com

Barry Margolin

Aug 8, 2001, 10:58:20 AM

In article <9kppvt$q...@pub3.rc.vix.com>, <Mark.A...@nominum.com> wrote:
>
> A refresh_callback is the code called when a refresh query
> succeeds or fails. A refresh query is the query made to
> see if the zone is up to date. The server will make three
> of these before it gives up and reschedules roughly retry
> seconds later. The failure is that the nameserver didn't
> get a response in the expected time (timed out).

Is this the log message that replaced:

named[<pid>]: Err/TO getting serial# for "<domain>"
named-xfer[<pid>]: connect(<addr>) for zone <domain> failed: Connection timed out

BIND's log messages have never really been the epitome of clarity. But
from what I've seen in the newsgroup (we're not yet running BIND 9 here),
it seems like BIND 9 has made things even worse. Shouldn't a complete
rewrite have given you the opportunity to improve this?

--
Barry Margolin, bar...@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Mark.A...@nominum.com

Aug 8, 2001, 6:12:37 PM

> In article <9kppvt$q...@pub3.rc.vix.com>, <Mark.A...@nominum.com> wrote:
> >
> > A refresh_callback is the code called when a refresh query
> > succeeds or fails. A refresh query is the query made to
> > see if the zone is up to date. The server will make three
> > of these before it gives up and reschedules roughly retry
> > seconds later. The failure is that the nameserver didn't
> > get a response in the expected time (timed out).
>
> Is this the log message that replaced:
>
> named[<pid>]: Err/TO getting serial# for "<domain>"
> named-xfer[<pid>]: connect(<addr>) for zone <domain> failed: Connection timed out

Basically.

>
> BIND's log messages have never really been the epitome of clarity.

Syslog messages are not supposed to be large. We try and
get the pertinent information into the message (BIND 9.1
dropped the ball somewhat there). In this case there is
not much more we could report other than "we timed out".

The ARM does need an appendix which covers the error messages
and what they mean.

> But
> from what I've seen in the newsgroup (we're not yet running BIND 9 here),
> it seems like BIND 9 has made things even worse. Shouldn't a complete
> rewrite have given you the opportunity to improve this?

The re-write had more to do with code manageability and
correctness of operation, new protocol features, IPv6
support. Clearer error messages were further down the list.
I think we have actually achieved the primary goals of the
BIND 9 re-write.

Mark

>
> --
> Barry Margolin, bar...@genuity.net
> Genuity, Woburn, MA
> *** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
> Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
>

Barry Margolin

Aug 8, 2001, 6:45:23 PM

In article <9ksdgl$n...@pub3.rc.vix.com>, <Mark.A...@nominum.com> wrote:
> The re-write had more to do with code manageability and
> correctness of operation, new protocol features, IPv6
> support. Clearer error messages were further down the list.
> I think we have actually achieved the primary goals of the
> BIND 9 re-write.

But you were rewording the error messages anyway. It shouldn't take any
more effort to make them understandable in the process. On the contrary,
it seems to me like you took pains to make them even more inscrutable than
BIND 8's.

Mark.A...@nominum.com

Aug 8, 2001, 9:07:27 PM

> In article <9ksdgl$n...@pub3.rc.vix.com>, <Mark.A...@nominum.com> wrote:
> > The re-write had more to do with code manageability and
> > correctness of operation, new protocol features, IPv6
> > support. Clearer error messages were further down the list.
> > I think we have actually achieved the primary goals of the
> > BIND 9 re-write.
>
> But you were rewording the error messages anyway. It shouldn't take any
> more effort to make them understandable in the process. On the contrary,
> it seems to me like you took pains to make them even more inscrutable than
> BIND 8's.

Barry, we were not "re-wording" the error messages. We were
re-writing from scratch, which included adding code to emit
error messages where appropriate. Our primary goal was to
make sure that anything that stops named from performing its job
emitted an error message. Informational messages fell by the
wayside.

We have tried to keep the error messages stable in 9.1.x
so that scripts could process them without having to be
changed for each bug fix release.

We have improved the error reporting in 9.2.x and if you
have specific problems with error messages in 9.2.x there
is still time to get things changed prior to the 9.2.0
release, after which additional changes will go into
9.3.x.

Mark

>
> --
> Barry Margolin, bar...@genuity.net
> Genuity, Woburn, MA
> *** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.

Barry Margolin

Aug 9, 2001, 10:35:12 AM

In article <9ksnof$p...@pub3.rc.vix.com>, <Mark.A...@nominum.com> wrote:
> Barry, we were not "re-wording" the error messages. We were

Sure you were. You guys are presumably intimately familiar with most of
BIND 8's error messages. You could have simply reused the old wording in
cases where they applied, but instead you decided to come up with new
wording from scratch.

> re-writing from scratch, which included adding code to emit
> error messages where appropriate. Our primary goal was to
> make sure that anything that stops named from performing its job
> emitted an error message. Informational messages fell by the
> wayside.

But every time you had to emit an error message, you had the choice of
wording it in English or computerese. You apparently chose the latter in
many cases, and that's what I'm complaining about.

> We have improved the error reporting in 9.2.x and if you
> have specific problems with error messages in 9.2.x there
> is still time to get things changed prior to the 9.2.0
> release, after which additional changes will go into
> 9.3.x.

Since we're not running 9.x, I can't comment on specific cases. All I know
are the ones I've seen mentioned here in the newsgroup, and they're even
less understandable than their BIND 8 equivalents.

Kevin Darcy

Aug 9, 2001, 5:46:50 PM

Barry Margolin wrote:

> In article <9ksnof$p...@pub3.rc.vix.com>, <Mark.A...@nominum.com> wrote:

> > Barry, we were not "re-wording" the error messages. We were
>

> Sure you were. You guys are presumably intimately familiar with most of
> BIND 8's error messages. You could have simply reused the old wording in
> cases where they applied, but instead you decided to come up with new
> wording from scratch.

Barry, the "refresh_callback" warning message is a diagnostic for the
refresh_callback() routine, which didn't even *exist* in BIND 8. There is no "old
wording" which applies, because there is no equivalent part of the code. Mark has
already explained that Nominum intends to put "friendly" error messages into
BIND 9. But it's just not at the top of the priority list. You can argue about
priorities all you want, but I think it's a little disingenuous to say "simply
reuse[] the old wording".

- Kevin

pe...@icke-reklam.ipsec.nu.invalid

Aug 9, 2001, 6:28:19 PM

Barry Margolin <bar...@genuity.net> wrote:
> In article <9ksnof$p...@pub3.rc.vix.com>, <Mark.A...@nominum.com> wrote:
>> Barry, we were not "re-wording" the error messages. We were

> Sure you were. You guys are presumably intimately familiar with most of
> BIND 8's error messages. You could have simply reused the old wording in
> cases where they applied, but instead you decided to come up with new
> wording from scratch.

>> re-writing from scratch, which included adding code to emit
>> error messages where appropriate. Our primary goal was to
>> make sure that anything that stops named from performing its job
>> emitted an error message. Informational messages fell by the
>> wayside.

> But every time you had to emit an error message, you had the choice of
> wording it in English or computerese. You apparently chose the latter in
> many cases, and that's what I'm complaining about.

>> We have improved the error reporting in 9.2.x and if you
>> have specific problems with error messages in 9.2.x there
>> is still time to get things changed prior to the 9.2.0
>> release, after which additional changes will go into
>> 9.3.x.

> Since we're not running 9.x, I can't comment on specific cases. All I know
> are the ones I've seen mentioned here in the newsgroup, and they're even
> less understandable than their BIND 8 equivalents.

I would like to add a word about bind-9 and error messages.

The error messages (or messages in general) emitted by bind-8 are no
wonder of clarity, but we have got used to them.

Unfortunately, the bind-9 development folks wrote their own error messages,
and most of them have little or no familiarity with bind-8.

But reading between the lines here, there is an intention to have clear
and exact error messages; they just haven't had the time to "get it right".

That leaves us with an opportunity! Why not collect all possible messages,
put them in a file, and try to reach consensus about how they should read?

Maybe each message could even get a unique "error acronym" that is
explained in a document that comes with bind.

Including an "error acronym" (anyone remember VMS? The SYS-F-Lost-Power:
style) could sometimes be helpful :-)

Maybe we could even prepare for internationalized messages via some method.

Comments ?
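
To make the idea concrete, here is a small Python sketch of such a catalog. The codes, the wording and the render() helper are all invented for illustration; none of them are real BIND identifiers. Only the severity letters (W for warning, E for error) follow the VMS convention.

```python
# Hypothetical message catalog: one unique code per message, with room
# for translations keyed by language.
CATALOG = {
    "ZONE-W-REFRESHTMO": {
        "en": "refresh of zone {zone}: no reply from master {master} (timed out)",
    },
    "ZONE-E-REFRESHMAX": {
        "en": "refresh of zone {zone}: giving up after {attempts} attempts",
    },
}


def render(code, lang="en", **fields):
    """Format a catalog message, falling back to English if lang is missing."""
    translations = CATALOG[code]
    template = translations.get(lang, translations["en"])
    return f"{code}: {template.format(**fields)}"
```

The code stays stable across releases and languages, so scripts can match on it while the human-readable text is free to improve, and the accompanying document can explain each code in detail.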

> --
> Barry Margolin, bar...@genuity.net
> Genuity, Woburn, MA
> *** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
> Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

--
Peter Håkanson
IPSec Sverige (At the Riverside of Gothenburg, home of Volvo)
Sorry about my e-mail address, but i'm trying to keep spam out.
Remove "icke-reklam"and "invalid" and it works.

Barry Margolin

Aug 9, 2001, 7:36:04 PM

In article <9kv0ca$b...@pub3.rc.vix.com>,
Kevin Darcy <k...@daimlerchrysler.com> wrote:
>Barry, the "refresh_callback" warning message is a diagnostic for the
>refresh_callback() routine, which didn't even *exist* in BIND 8. There is no "old
>wording" which applies, because there is no equivalent part of the code. Mark has

There may not have been an equivalent piece of code, but there certainly
was an equivalent situation. Both programs perform refreshes, and they
both detect timeouts. BIND 8 said 'Err/TO getting serial# for "<domain>"'
in the analogous situation. I can't think of any reason why
refresh_callback() couldn't have used those words, or something similar,
when it detects the timeout.

>already explained that Nominum intends to put "friendly" error messages into
>BIND 9. But it's just not at the top of the priority list. You can argue about

So they think it's better to do it twice than do it right the first time
(which is actually at least the second time, since BIND 1-8 could be
considered the first time)? Do you really think that they'll find the time
and resources to go back through all the error messages, when there's
always more critical work to be done on the program? This is the type of
project that typically sits at the bottom of the priority list for years.

>priorities all you want, but I think it's a little disingenuous to say "simply
>reuse[] the old wording".

I don't think they should have reused the old wording. I think that since
they were doing a complete rewrite, and had to compose new error messages,
they should have composed understandable ones. It has nothing to do with
priorities, unless you think that "don't be cryptic" needs to be explicitly
on a priority list. It has to do with doing something well instead of
half-assed (I'm *only* talking about the error messages, not the protocol
implementation), and taking the users into consideration.

It takes no more time or resources to compose understandable error messages
than to compose cryptic ones. Going back through all the code and redoing
all the error messages -- *that* is a lot of work. I won't hold my breath
waiting for it to happen.

BTW, to the person who posted the suggestion that we go to VMS-style TLA's:
Boo, hiss!

Brad Knowles

Aug 9, 2001, 8:10:55 PM

At 11:19 PM +0000 8/9/01, Barry Margolin wrote:

> So they think it's better to do it twice than do it right the first time
> (which is actually at least the second time, since BIND 1-8 could be
> considered the first time)?

BIND 9 was a clean rewrite from scratch, using the basic
principles of proper nameserver design. Therefore, there was no code
re-use from BIND 8, nor was there much error message re-use, since I
understand that guys like Paul Vixie and Mark Andrews (who know the
BIND 8 code backwards and forwards) were not much involved in doing
BIND 9.

Making a point of going through the BIND 8 code and writing down
every single error message and the circumstances in which that error
message cropped up, so that this behaviour could be perfectly
duplicated in BIND 9, was simply not an important design criterion.

> Do you really think that they'll find the time
> and resources to go back through all the error messages, when there's
> always more critical work to be done on the program? This is the type of
> project that typically sits at the bottom of the priority list for years.

If you really feel this strongly about it, then I'd suggest you
stop bitching, and start reviewing some BIND 8 code, so as to bring
forward all your favourite error messages. I'm sure that code
contributions would be more than welcomed by the ISC and Nominum.

> I don't think they should have reused the old wording. I think that since
> they were doing a complete rewrite, and had to compose new error messages,
> they should have composed understandable ones.

This is a different issue altogether, and does not relate back to
your previous statement:

You guys are presumably intimately familiar with most of
BIND 8's error messages. You could have simply reused
the old wording in cases where they applied, but instead
you decided to come up with new wording from scratch.

I will agree that good error reporting is highly desirable.
However, I also understand how much work went into a complete,
ground-up re-write of BIND 9 based on first principles.

When you get an entire team of people involved in doing a project
like this, and they get into serious "hack mode", it is not at all
unusual to find that techies naturally write error messages for other
techies (i.e., mostly their co-workers on the team), and even if you
ask them to spend a lot of time to try and make the wording as
crystal clear as possible, the result is frequently still extremely
opaque and obtuse to any outsider (including people like you and me).

This is life. You can either learn to deal with it, and do your
part to help fix the situation, or not. The choice is up to you.

> It takes no more time or resources to compose understandable error messages
> than to compose cryptic ones.

Not at all true. Composing error messages that are actually
generally understandable is one of the most difficult tasks in
writing a program, because you basically have to summarize into a
single line the entire state of the massive million-line (or
whatever) program, and express enough information to be useful
without expressing so much information as to cause data overload.

Getting that balance just right is one of the rarest gifts I've
ever seen from any programmer.

IMO, this is something that the folks at Men & Mice do really,
really well with DNS Expert Professional -- the first DNS debugging
tool I've ever encountered that has plain English descriptions of the
error.

--
Brad Knowles, <brad.k...@skynet.be>


Barry Margolin

Aug 10, 2001, 11:00:25 AM

In article <9kv8qf$c...@pub3.rc.vix.com>,
Brad Knowles <brad.k...@skynet.be> wrote:
> If you really feel this strongly about it, then I'd suggest you
>stop bitching, and start reviewing some BIND 8 code, so as to bring
>forward all your favourite error messages. I'm sure that code
>contributions would be more than welcomed by the ISC and Nominum.

I don't have any favorite messages. Like I said before, BIND 8's messages
also were terrible. It's just that BIND 9's seem even worse!

>> I don't think they should have reused the old wording. I think that since
>> they were doing a complete rewrite, and had to compose new error messages,
>> they should have composed understandable ones.
>
> This is a different issue altogether, and does not relate back to
>your previous statement:
>
> You guys are presumably intimately familiar with most of
> BIND 8's error messages. You could have simply reused
> the old wording in cases where they applied, but instead
> you decided to come up with new wording from scratch.

I said "could have", not "should have". I specifically said that I
presumed that the people writing BIND 9 were already familiar with BIND 8,
either as developers or users. If that presumption were true, it wouldn't
have required any special effort to use similar messages, like your earlier
suggestion that they go through BIND 8 writing down all the error messages.

My point all along has been that they shouldn't have to go to any special
effort to produce messages better than BIND 8, or at least the same.
Either they already know BIND 8's messages, in which case they could just
reuse them, or they have to compose new messages from scratch, in which
case they should compose understandable ones.

> When you get an entire team of people involved in doing a project
>like this, and they get into serious "hack mode", it is not at all

Professional software development should not be done in "hack mode".

>unusual to find that techies naturally write error messages for other
>techies (i.e., mostly their co-workers on the team), and even if you
>ask them to spend a lot of time to try and make the wording as
>crystal clear as possible, the result is frequently still extremely
>opaque and obtuse to any outsider (including people like you and me).

When BIND was first written, only techies used it, so it's understandable
that the error messages were targeted to them. The Internet user base has
changed dramatically in the two decades since then, and the authors of one
of the most critical pieces of software should take that into account.

> This is life. You can either learn to deal with it, and do your
>part to help fix the situation, or not. The choice is up to you.

So if someone does something I think is silly, I'm not allowed to point it
out? I'll bet most of you don't simply "learn to deal with" all the
stupidity that comes out of Microsoft.

>> It takes no more time or resources to compose understandable error messages
>> than to compose cryptic ones.
>
> Not at all true. Composing error messages that are actually
>generally understandable is one of the most difficult tasks in

I'm not asking for bon mots, just plain English. "Error while performing
SOA query to refresh <domain>: <error message>" does not require the skills
of a technical writer. Anyone who can't do this should not be in the
business of writing software with a user interface.
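
As a rough illustration of how little is involved, this Python sketch rewrites the BIND 9.1 message quoted at the start of the thread into that wording. The function name and the pattern are hypothetical, the pattern is inferred from that single sample, and adding the master's address is an extra assumption beyond the template above.

```python
import re

# Matches the refresh_callback failure message as quoted in this thread.
BIND9_REFRESH = re.compile(
    r"refresh_callback: zone (?P<zone>\S+)/\w+: "
    r"failure for (?P<master>[^#\s]+)#\d+: (?P<reason>.+)$"
)


def plain_english(message):
    """Reword a refresh_callback failure; return other messages unchanged."""
    match = BIND9_REFRESH.search(message)
    if match is None:
        return message
    return (
        f"Error while performing SOA query to refresh {match['zone']} "
        f"(master {match['master']}): {match['reason']}"
    )
```

The information carried is the same as in the original message; only the name of the internal subroutine has been replaced by a description of what the server was trying to do.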

An error message that simply says the name of some internal subroutine that
the user has never heard of and can't possibly know the purpose of is the
exact opposite of this. Requiring users to use the source to understand
tracing messages is not unreasonable (although I wish there were a better
way), but requiring them to read the source to understand common error
messages is.