Re: Zone Immortality

Mark Andrews

Aug 21, 2008, 12:28:09 AM

> RFC 1034 states:
>
> Each secondary server is required to perform the following operations
> against the master, but may also optionally perform these operations
> against other secondary servers. This strategy can improve the transfer
> process when the primary is unavailable due to host downtime or network
> problems, or when a secondary server has better network access to an
> "intermediate" secondary than to the primary.
>
>
>
> Now, disregarding the obsolete "secondary" language (we now call them
> "slaves"), and the obviously-incorrect word "following" (which can't
> possibly be right, since this is the last paragraph of the section --
> there is nothing relevant "following"), one notes the blatant ambiguity
> of "these operations". Which operations? Minimally, checking the serial
> number and performing a zone transfer if it has increased? I think that
> much is fairly obvious.
>
> What's a little *less* obvious, and the point of my missive, is whether,
> when and/or how "these operations" might include the following, from a
> previous paragraph: "
>
> If the secondary finds it impossible to perform a serial check for the EXPIRE
> interval, it must
> assume that its copy of the zone is obsolete an[sic] discard it."
>
> The problem with slaves resetting their EXPIRE timers every time they
> perform a successful refresh from another slave is that the slaves may
> be in a reciprocal or circular -- A refreshes from B, B refreshes from
> C, C refreshes from A -- relationship, so the zone may become
> "immortal", i.e. even when it is deleted from the primary master, the
> slaves continue to keep the zone alive indefinitely by constantly
> refreshing each other.
>
> Surely it cannot have been the intent for zones to become "immortal".
> That would defeat the whole purpose of having EXPIRE in the first place.
>
> Yet, having slaves in a reciprocal or circular relationship is a
> perfectly _valid_ arrangement. It ensures that, if the primary master
> goes down for an extended period of time (i.e. multiple REFRESH cycles),
> all slaves in the set eventually converge on the latest-available
> version of the zone.
>
> I would suggest, therefore, that if this language in 1034 cannot or will
> not be clarified/updated in the foreseeable future, so as to outlaw this
> "optional" replication topology, that implementors review their
> refresh/expire logic, and if it doesn't already exist in their
> implementation, provide a way for administrators to differentiate true
> "master" replication sources, which reset the EXPIRE timer on each
> successful refresh, from "peer" sources, which do not. This would allow
> large enterprises --such as ours -- to build highly robust and available
> DNS infrastructures without having to struggle with zone-immortality
> risks/issues.
>
>
> - Kevin

There are lots of existing ways to address this.

* Outlaw loops.
* Don't reset the expire counter on refresh from peers, only
on transfer from peers.

* Use DNSSEC to make data go stale.

Extend AXFR/IXFR/SOA to add an expiry counter. The master would
always set this to SOA EXPIRE. The slave would report its
current value for that counter.

This fits well into EDNS as it is a hop-by-hop value.

It would also address the N * expiry problem that currently
exists when you have a deep transfer graph.

e.g. <EXPIRE><4><COUNTER>

The slave would add a <EXPIRE><0> option to its SOA
refresh queries and to IXFR and AXFR queries.

When an SOA refresh query finds a matching serial, or an IXFR
response reports that the zone is up to date, the slave would set
its expire counter to the maximum of the returned value and its
current expire counter.

The slave would use this value rather than the SOA expire
field to initialise its expire counter on IXFR/AXFR which
change the zone content.

The slave will perform a sanity check to ensure that the
returned value is no greater than the SOA expire field. If
it is greater then it will use the SOA expire field instead.
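
As an illustration only, the counter rules above could be sketched in
Python as follows. The function names are hypothetical and are not taken
from any draft or implementation, and applying the sanity check before
taking the maximum is an assumption about ordering.

```python
def clamp_to_soa(returned, soa_expire):
    """Sanity check: never accept a value larger than the SOA expire field."""
    return soa_expire if returned > soa_expire else returned


def counter_when_up_to_date(current, returned, soa_expire):
    """Serial matched (SOA refresh or IXFR 'up to date'): keep the larger value."""
    return max(current, clamp_to_soa(returned, soa_expire))


def counter_after_transfer(returned, soa_expire):
    """IXFR/AXFR changed the zone: start from the value the upstream reported."""
    return clamp_to_soa(returned, soa_expire)
```

With these rules a slave never learns a longer lifetime than its
upstream has left, which is what stops a loop from keeping the zone alive.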

Mark
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: Mark_A...@isc.org

--
to unsubscribe send a message to namedroppe...@ops.ietf.org with
the word 'unsubscribe' in a single line as the message text body.
archive: <http://ops.ietf.org/lists/namedroppers/>

Masataka Ohta

Aug 21, 2008, 1:52:00 AM

Kevin Darcy wrote:

> all slaves in the set eventually converge on the latest-available
> version of the zone.

A hidden assumption of 1034 is that such convergence will occur
within a period much shorter than the expiration period.

If not, you have an administrative problem a lot more serious
than a small variation of expiration period.

Masataka Ohta

Mark Andrews

Aug 26, 2008, 12:56:05 AM

> As for the "expiry counter" idea, I like it, but to be perfectly honest,
> I don't know that either the immortality problem, or the N*EXPIRE issue,
> or both, can generate enough enthusiasm to get any traction for a
> protocol extension, at this time (given the other pressing DNS-security
> concerns at the moment).
>
> - Kevin

The draft to describe this is written and posted.

draft-andrews-dnsext-expire-00.txt

Mark
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: Mark_A...@isc.org


Edward Lewis

Aug 30, 2008, 5:25:23 PM

At 14:28 +1000 8/21/08, Mark Andrews wrote:
> * Outlaw loops.
> * Don't reset the expire counter on refresh from peers only
> on transfer from peers.

I'd reset the expiry only after the successful transfer of a zone
with an incremented serial number. That would take care of loops.
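
One possible reading of this rule, sketched in Python. It assumes that
"incremented" is judged with RFC 1982 serial number arithmetic on 32-bit
SOA serials; the names serial_gt and should_reset_expiry are hypothetical.

```python
SERIAL_BITS = 32
MOD = 1 << SERIAL_BITS          # serials wrap at 2**32
HALF = 1 << (SERIAL_BITS - 1)   # half the serial space


def serial_gt(new, old):
    """True if 'new' is ahead of 'old' in RFC 1982 serial arithmetic."""
    diff = (new - old) % MOD
    return 0 < diff < HALF


def should_reset_expiry(transfer_ok, new_serial, old_serial):
    """Restart the expire timer only after a successful transfer of newer data."""
    return transfer_ok and serial_gt(new_serial, old_serial)
```

In a loop with the primary gone, no server ever sees a newer serial, so
under this rule no timer is restarted.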

> * Use DNSSEC to make data go stale.

Don't do that. Don't overload the meaning of the RRSIG temporal
fields. They are meant to protect the cryptography, not convey any
meaning of the data.

At 1:00 -0400 8/21/08, Brian Dickson wrote:
>If a zone is transferred *from* a slave, the values for the TTLs should
>be modified downwards by the local EXPIRE timer. Only transfers from a
>zone master should have the original TTL values from the SOA used for timers.

The problem is distinguishing between a master and a slave.

At 14:52 +0900 8/21/08, Masataka Ohta wrote:
>A hidden assumption of 1034 is that such convergence will occur
>within a period much shorter than the expiration period.

Many assumptions of that era no longer are valid. ;)

>If not, you have an administrative problem a lot more serious
>than a small variation of expiration period.

Not necessarily. There are applications of DNS on disjointed
networks, such as very remote places (which hardly exist anymore),
sea-going vessels, and inter-planetary applications.
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Edward Lewis +1-571-434-5468
NeuStar

Never confuse activity with progress. Activity pays more.

Peter Koch

Sep 15, 2008, 1:12:18 PM

On Wed, Aug 20, 2008 at 11:32:38PM -0400, Kevin Darcy wrote:

> Now, disregarding the obsolete "secondary" language (we now call them
> "slaves")

{it's not that easy; master and slave are actually terms that describe both
ends of an XFR operation, whereas primary and secondary refer to the root of
the XFR dependency graph and every other node, respectively. The problem
occurs in cases where several servers are both master and slave.
\end{terminology-rant}}

> Surely it cannot have been the intent for zones to become "immortal".
> That would defeat the whole purpose of having EXPIRE in the first place.

From an operational perspective I'm not sure I understand what you're trying
to achieve here. "expire" and its consequences have always appeared rather
unpredictable to me. Mark will know BIND's history much better, but I remember
seeing SERVFAIL responses or "valid" answers with just the AA bit missing.
Zone "expiry" never occurred to me as a desirable feature that I'd rely upon.
Instead it's trying to put an end to the waste of resources from repeated SOA
and maybe even XFR attempts. To that extent it's a relief to the slave system
more than a feature to securely and predictably phase out DNS zones.
I'd see that at the name server management level.

> implementation, provide a way for administrators to differentiate true
> "master" replication sources, which reset the EXPIRE timer on each
> successful refresh, from "peer" sources, which do not. This would allow

When a successful refresh with the non-primary-master doesn't influence the
expire timer, why try it in the first place?

-Peter

Mark Andrews

Sep 15, 2008, 8:26:22 PM

Peter, look at the following two common scenarios (1 & 2) and the
two solution scenarios (3 & 4). Kevin is worried about scenario 2.
I'm worried about both scenarios 1 and 2.

Scenario 1:
Simple transfer graph, no loops.

    primary:
        <no masters>
    secondary1:
        masters { primary; };
    secondary2:
        masters { primary; };
    secondary3:
        masters { secondary1; secondary2; };

When the primary falls over, when do secondary1, secondary2 and
secondary3 stop serving the zone?

    secondary1  <expire>
    secondary2  <expire>
    secondary3  <expire>*2
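
A minimal Python sketch of where the <expire>*2 for secondary3 comes
from, assuming every successful refresh restarts the timer and the graph
has no loops. The MASTERS dictionary mirrors the configuration above; the
function name is hypothetical.

```python
MASTERS = {
    "primary": [],
    "secondary1": ["primary"],
    "secondary2": ["primary"],
    "secondary3": ["secondary1", "secondary2"],
}


def expire_multiple(server, masters=MASTERS):
    """Worst-case number of <expire> intervals a server keeps serving
    after the primary dies. Only valid for loop-free graphs: with a loop
    this recursion would never terminate."""
    upstreams = masters[server]
    if not upstreams:
        return 0  # the primary itself
    # A server can refresh just before its upstream expires, so it adds
    # one more full interval on top of its slowest upstream.
    return 1 + max(expire_multiple(up, masters) for up in upstreams)
```

The multiple grows with the depth of the transfer graph, which is the
N * expiry problem for deep graphs.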

Scenario 2:
A transfer graph with a loop between secondary1 and secondary2.

    primary:
        <no masters>
    secondary1:
        masters { primary; secondary2; };
    secondary2:
        masters { primary; secondary1; };
    secondary3:
        masters { secondary1; secondary2; };

When the primary falls over, when do secondary1, secondary2 and
secondary3 stop serving the zone?

    secondary1  never (as answers from secondary2 restart timer)
    secondary2  never (as answers from secondary1 restart timer)
    secondary3  never
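
The "never" results can be found mechanically. The Python sketch below
assumes every successful refresh restarts the expire timer, and reports
each server that sits in a loop or is fed, directly or indirectly, by one.
The function name immortal_servers is hypothetical.

```python
def immortal_servers(masters):
    """Return the servers whose expire timer can be restarted forever."""
    memo = {}

    def reaches_loop(server, path):
        if server in path:
            return True  # walked back onto our own path: a loop
        if server in memo:
            return memo[server]
        path.add(server)
        result = any(reaches_loop(up, path) for up in masters[server])
        path.remove(server)
        memo[server] = result
        return result

    return {s for s in masters if reaches_loop(s, set())}


LOOPED = {
    "primary": [],
    "secondary1": ["primary", "secondary2"],
    "secondary2": ["primary", "secondary1"],
    "secondary3": ["secondary1", "secondary2"],
}
print(sorted(immortal_servers(LOOPED)))
```

For the loop-free graph of scenario 1 the same function returns an
empty set.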

Scenario 3:
A transfer graph where you can identify peers.
"peers" are "masters" that don't reset expiry timer.

    primary:
        <no masters>
    secondary1:
        masters { primary; };
        peers { secondary2; };
    secondary2:
        masters { primary; };
        peers { secondary1; };
    secondary3:
        masters { secondary1; secondary2; };

When the primary falls over, when do secondary1, secondary2 and
secondary3 stop serving the zone?

    secondary1  <expire>
    secondary2  <expire>
    secondary3  <expire>*2
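
The masters/peers distinction comes down to one decision on each
successful refresh. A hedged Python sketch follows; the "peers" statement
is the hypothetical configuration shown above, and the function name is
made up for illustration.

```python
def elapsed_after_refresh(source, masters, peers, elapsed):
    """Return the time counted against <expire> after a successful
    refresh from 'source'. 'elapsed' is the time since the expire timer
    was last restarted."""
    if source in masters:
        return 0        # a true master restarts the timer
    if source in peers:
        return elapsed  # a peer supplies data but leaves the timer running
    raise ValueError("refresh from an unconfigured source: " + source)
```

So secondary1 can still pick up newer zone data from secondary2, but
only contact with the primary extends the zone's lifetime.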

Scenario 4:
A transfer graph with loops using the EDNS EXPIRE option
from draft-andrews-dnsext-expire-00.txt.

    primary:
        <no masters>
    secondary1:
        masters { primary; secondary2; };
    secondary2:
        masters { primary; secondary1; };
    secondary3:
        masters { secondary1; secondary2; };

When the primary falls over, when do secondary1, secondary2 and
secondary3 stop serving the zone?

    secondary1  <expire>
    secondary2  <expire>
    secondary3  <expire>
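
A toy Python simulation of scenario 4, not the draft's wire format. It
assumes time moves in whole ticks, the primary dies at tick 0 with an
unchanged serial, every server refreshes on the same schedule, and each
refresh answer carries the upstream's remaining expire counter. All names
and the starting values are hypothetical.

```python
def simulate_expire_option(masters, initial, refresh, horizon):
    """Return {server: tick at which it stops serving the zone}.
    'initial' holds each secondary's remaining expire counter at tick 0."""
    remaining = dict(initial)
    expired_at = {}
    for tick in range(1, horizon + 1):
        # Time passes on every server that still holds the zone.
        for server in remaining:
            if server in expired_at:
                continue
            remaining[server] -= 1
            if remaining[server] <= 0:
                expired_at[server] = tick
        # Refresh round: the serial matches, so take the maximum of the
        # local counter and the counters reported by live upstreams.
        if tick % refresh == 0:
            snapshot = dict(remaining)
            for server in remaining:
                if server in expired_at:
                    continue
                offers = [snapshot[up] for up in masters[server]
                          if up in snapshot and up not in expired_at]
                if offers:
                    remaining[server] = max(remaining[server], max(offers))
    return expired_at


LOOPED = {
    "primary": [],
    "secondary1": ["primary", "secondary2"],
    "secondary2": ["primary", "secondary1"],
    "secondary3": ["secondary1", "secondary2"],
}
# SOA expire is 10 ticks; the secondaries last reached the primary
# (directly or indirectly) at different times before it died.
START = {"secondary1": 10, "secondary2": 7, "secondary3": 4}
print(simulate_expire_option(LOOPED, START, refresh=2, horizon=30))
```

No answer can report more than the largest counter already in the set,
and that counter only decreases, so the loop cannot extend the zone's life.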

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: Mark_A...@isc.org


Peter Koch

Sep 16, 2008, 5:21:39 AM

Hi Mark,

> Peter, look at the following two common scenarios (1 & 2) and the
> two solution scenarios (3 & 4). Kevin is worried about scenario 2.
> I'm worried about both scenarios 1 and 2.

I think I understand the observation, both that sometimes "expiry" doesn't
take effect and that it might take multiple "expire" intervals. However,
I'm not convinced that the proposed solution goes in the right direction
or isn't worse than the observation (not to name it a problem).
First, the SOA timer semantics have always been between master and slave;
redefining the expire timer to always refer to the primary master is
a major step.
Second, what are the expectations of a "controlled expire"? Is the slave
supposed to permanently deconfigure the zone? How would it be brought back to
life? We both know that people choose all kinds of interesting expire values
and with today's interpretation, they can fix stuff and slaves will sync in
again.
Third, part of which is described in <draft-koch-dns-unsolicited-queries-02.txt>,
currently expired (no pun): Even if a "controlled expire" would remove the
respective statements from a slave's configuration, you'd not be able to get
rid of the (lame) delegation this way. Even worse, while the effects Kevin
and you described keep old data around, they at least avoid 100% lame delegations,
which every now and then result in query storms.

So, a better description of the requirements would help clarify the problem
that a redefinition of the expire timer is a proposed solution to. And it
might turn out that the problem can't be solved in-band sufficiently.

-Peter

Mark Andrews

Sep 16, 2008, 8:13:48 PM

In message <20080916092...@unknown.office.denic.de>, Peter Koch writes:
> Hi Mark,
>
> > Peter, look at the following two common scenarios (1 & 2) and the
> > two solution scenarios (3 & 4). Kevin is worried about scenario 2.
> > I'm worried about both scenarios 1 and 2.
>
> I think I understand the observation, both that sometimes "expiry" doesn't
> take effect and that it might take multiple "expire" intervals. However,
> I'm not convinced that the proposed solution goes into the right direction
> or isn't worse than the observation (not to name it a problem).
> First, the SOA timer semantics have always been between master and slave,
> now redefining the expire timer to always refer to the primary master is
> a major step.

Not really. The master dies. The slaves die after the
expire period has elapsed. It doesn't matter if the slave
is direct or indirect.

This makes the more complicated transfer graphs behave the
same as the simple star transfer graph.

> Second, what are the expectations of a "controlled expire"? Is the slave
> supposed to permanently deconfigure the zone? How would it be brought back
> to life?

The slave is supposed to forget about the current zone
*contents* and continue attempting to transfer a new copy of
the zone until it succeeds. This is exactly the same as
if the slave has just been configured for a zone.

This has not changed.

> We both know that people choose all kinds of interesting expire values
> and with today's interpretation, they can fix stuff and slaves will sync in
> again.

Not if they have expired. They should re-transfer.

> Third, part of which is described in
> <draft-koch-dns-unsolicited-queries-02.txt>,
> currently expired (no pun): Even if a "controlled expire" would remove the
> respective statements from a slave's configuration, you'd not be able to get
> rid of the (lame) delegation this way.

Yes the zone would go lame. This is what is supposed to happen.

> Even worse, while the effects Kevin
> and you described keep old data around, they at least avoid 100% lame
> delegations, which every now and then result in query storms.

Only avoid them when there is a loop.

> So, a better description of the requirements would help clarify the problem
> that a redefinition of the expire timer is a proposed solution to. And it
> might turn out that the problem can't be solved in-band sufficiently.
>
> -Peter

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: Mark_A...@isc.org
