20-year timewarps: an essay

David L. Mills

Jan 1, 2002, 11:53:35 PM

Folks,

Some reflections on what may happen when folks get back to work tomorrow
(Wednesday). As for the folks on this list, whaddaya mean get back to
work? We never left.

One of my most serious goals 20 years ago when I designed what became
NTP was reliability, reliability, reliability. I considered it then and
consider it today more important than accuracy. Reliability in the face
of lies, cheats and even warps. But warps shouldn't happen in well
engineered NTP subnets with adequate redundancy, diversity and
cryptographic means.

Every critical server should have at least four sources, no two from the
same organization and, as much as possible, reachable only via diverse,
nonintersecting paths. The NTP Byzantine agreement algorithm needs four
sources so that, if one of them turns falseticker, the remaining three
truechimers can vote it out with no ties possible. Every critical
organization should run at least four low-stratum servers configured as
above, so dependent servers and clients can do the same thing. Each
critical server should run NTP symmetric mode (better yet manycast mode)
with each of the other servers at the same stratum, together with at
least one peer at the same stratum in another trusted organization. If
one of these servers loses all sources, the others can flip from time
taker to time giver. With manycast the flip is completely automatic and
without finicky configuration engineering. About a dozen CAIRN routers
are doing that right now. The reason for external servers at the same
stratum is to resist, or at least detect, when all the configuration
files have the same error or the latest download was botched or
infected.
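
The voting argument above can be sketched in a few lines. This is a minimal illustration only, not the selection code ntpd actually runs: the function name `vote`, the tuple layout, and the sample offsets are all assumptions made for the example. Each source is treated as an interval around its offset, and the largest group of overlapping intervals wins only if it is a strict majority.

```python
from typing import List, Tuple

# Hypothetical source record: (name, offset in seconds, error bound in seconds).
Source = Tuple[str, float, float]


def vote(sources: List[Source]) -> Tuple[List[str], List[str]]:
    """Return (truechimers, falsetickers) by simple majority overlap.

    A source's interval is [offset - bound, offset + bound]. The largest
    overlap always includes the lower edge of some interval, so those
    edges are the only candidate points that need checking.
    """
    best: List[str] = []
    for _, offset, bound in sources:
        point = offset - bound
        agreeing = [name for name, o, b in sources if o - b <= point <= o + b]
        if len(agreeing) > len(best):
            best = agreeing
    # Without a strict majority nothing can be trusted.
    if len(best) * 2 <= len(sources):
        return [], [name for name, _, _ in sources]
    return best, [name for name, _, _ in sources if name not in best]


if __name__ == "__main__":
    twenty_years = 20 * 365 * 86400.0  # illustrative warp, roughly 20 years
    four = [("a", 0.010, 0.02), ("b", 0.012, 0.02),
            ("c", 0.008, 0.02), ("d", twenty_years, 0.02)]
    print(vote(four))                 # three agree, the warped one is voted out
    print(vote([four[0], four[3]]))   # two sources that disagree: no majority
```

With only two sources that disagree there is no way to tell which one lies, which is the tie the four-source rule is meant to avoid.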

Everything, I mean everything should be cryptographically protected,
preferably using public key means. I don't think there is any excuse not
to, since NTP has both symmetric and public key cryptography and the
cryptographic library is probably in every machine that runs ssh. While
taking in time should be strongly authenticated, giving out time should
not, since time is after all a public value and highly useful as a
redundant norm for other machines, even if a few givers lie.

A particularly scratchy issue has been how to treat significant time
disagreements, specifically time differences greater than the panic
threshold (1000 s), step threshold (128 ms) and stepout threshold (900
s), and whether to allow steps or not. A previous post to this list
mentioned the NTS-100 warper taking over clients one at a time as the
number of warpees reached majority in each configuration. That should
never have happened unless the panic threshold was tinkered or disabled
(set to zero). Any NTP daemon that was asked to warp the time more than
panic should have unsprung and exited.
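
As a rough sketch of how the three thresholds interact, assuming the values quoted above and a much simplified decision rule (the function `clock_action` and its return strings are hypothetical, and the real daemon's state machine has more cases than this):

```python
PANIC_S = 1000.0    # panic threshold; 0 means disabled
STEP_S = 0.128      # step threshold
STEPOUT_S = 900.0   # stepout threshold


def clock_action(offset_s: float, seconds_over_step: float,
                 panic_s: float = PANIC_S, step_s: float = STEP_S,
                 stepout_s: float = STEPOUT_S) -> str:
    """Pick an action for a measured offset (simplified illustration).

    seconds_over_step is how long the offset has stayed above the
    step threshold.
    """
    magnitude = abs(offset_s)
    if panic_s > 0 and magnitude > panic_s:
        return "exit"   # asked to warp beyond panic: give up
    if magnitude > step_s:
        # Tolerate a large offset for a while in case it is a spike,
        # and step only once it has persisted for the stepout interval.
        return "step" if seconds_over_step >= stepout_s else "wait"
    return "slew"       # small offset: adjust the clock gradually


if __name__ == "__main__":
    twenty_years = 20 * 365 * 86400.0
    print(clock_action(0.05, 0.0))
    print(clock_action(5.0, 900.0))
    print(clock_action(twenty_years, 900.0))
    print(clock_action(twenty_years, 900.0, panic_s=0.0))
```

The last call shows the failure described above: with the panic threshold set to zero, a 20-year offset is treated as just another step.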

Did I understand correctly the post that clocks were slowly drifting to
Valhalla? That would require step corrections to be disabled. You really
have to stop and think about this. It is physical reality that, if a
clock is at panic (or whatever value you accept) and the maximum slew
rate is limited to 500 PPM (or whatever value you accept), but the clock
is not allowed to step, it will take some 23 days (or whatever interval
you compute) to reach consensus. During that interval the clock is not
synchronized and its time cannot be believed by others on the net. The
intent here is that you get to tinker those values to fit your beliefs,
whether or not that turns out to be good engineering.
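
The interval is simple arithmetic: the offset divided by the slew rate. A small sketch, assuming a constant slew at the maximum rate (the helper name `slew_seconds` is made up for the example):

```python
def slew_seconds(offset_s: float, max_slew_ppm: float = 500.0) -> float:
    """Seconds needed to remove offset_s by slewing at max_slew_ppm."""
    # 1 PPM corrects one microsecond of offset per second of elapsed time.
    return abs(offset_s) / (max_slew_ppm * 1e-6)


if __name__ == "__main__":
    seconds = slew_seconds(1000.0)   # offset equal to the panic threshold
    print(f"{seconds:.0f} s, about {seconds / 86400:.1f} days")
    # Offset that one full day of slewing at 500 PPM can remove:
    print(f"{86400 * 500e-6:.1f} s per day")
```

For the figures quoted above, 1000 s at 500 PPM comes to 2,000,000 s, a little over 23 days, and a single day of slewing removes about 43 s. Plug in your own values to get your own interval.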

This might scare you a bit, but one of the reasons for going to floating
doubles in NTPv4 was to avoid the kind of hazard that exists in NTPv2
and NTPv3 to the present, where certain large time errors are not
detected by the clock filter algorithm. The likelihood of this happening
is very small, but it did happen once when the clock counter of a
distant ancestor of the NTS-100 stuck a hot bit and the apparent time
was warped several years.

This is getting long, but I do it only once per year and this is a good
day.

On the NTS-100 issue. Here's the horror. So far as I know, there are
very few makers of GPS cores used in OEM equipment, including Motorola
Oncore, Trimble Palisade and Magnavox MX4200. Apologies if I've
misrepresented one or five. So, if the same core is used in several
end-user products, we can expect a massive conspiracy should somebody at
the OEM end get the sliderule on backwards. (sliderule?) It's probably a
good bet that, if TrueTime used an Oncore, then they would use it in all
of their products. Supporting this supposition is the report that a
TrueTime XL/DC also warped, but I found another one in Australia with
defunct NTP daemon. Maybe it warped and the daemon did what it was
supposed to do. I found another TrueTime GPS-VME Ozzie without warp, but
with cesium primary and TrueTime backup. It isn't running the atom
driver, so I can't tell if the TrueTime faded and the atoms rule. I have
no other TrueTime products here, but I do have a raft of Spectracom GPS
(and WWVB) receivers. None of them freaked beyond the usual jitter and
wander.

Anybody got a Trimble Palisade? I have one here, but it isn't hooked up
at the moment. There were a couple of Magnapox MX4200s cooking on the
net, but I don't know where they are now. All my Austgon GPS receivers
became goop long ago. How 'bout a Garmin? Found one in Czech Republic,
no warp.

Dave sends

M.C. van den Bovenkamp

Jan 2, 2002, 2:41:43 AM

"David L. Mills" wrote:

> good bet that, if TrueTime used an Oncore, then they would use it in all
> of their products. Supporting this supposition is the report that a
> TrueTime XL/DC also warped, but I found another one in Australia with
> defunct NTP daemon. Maybe it warped and the daemon did what it was
> supposed to do. I found another TrueTime GPS-VME Ozzie without warp, but
> with cesium primary and TrueTime backup. It isn't running the atom
> driver, so I can't tell if the TrueTime faded and the atoms rule. I have
> no other TrueTime products here, but I do have a raft of Spectracom GPS
> (and WWVB) receivers. None of them freaked beyond the usual jitter and
> wander.

My Oncore M12 (an M12 Starter Kit with version 1.3 firmware): not a
burp.

Regards,

Marco.

Bohdan Tashchuk

Jan 2, 2002, 3:52:40 AM

"David L. Mills" wrote:
>
> I have
> no other TrueTime products here, but I do have a raft of Spectracom GPS
> (and WWVB) receivers. None of them freaked beyond the usual jitter and
> wander.

For what it's worth, TrueTime has a public stratum 1 server that I sync
to. Either it didn't hiccup at all, or it was fixed by 11:50 AM PST on
1/1/02.

David L. Mills

Jan 2, 2002, 8:57:49 AM

Folks,

I should have pointed out in my essay that the principle of multiple
independent and mutually redundant sources applies to external sources,
such as GPS receivers, as well. Case in point: on
my recommendation some time ago, GTE installed three master servers in
three different regions of the country. Each server consists of two
mutually redundant HP machines, each one connected to two receivers -
one a WWVB the other a GPS - for a total of six machines and six radios.
The six HP machines each run NTP symmetric mode with the others,
although not with sanity sources outside GTE as I suggested. There are
dozens of secondary servers and a total, they tell me, of about 30,000
workstations, PCs and other gear.

Along with an NTS-100 and NTS-200, we run five mutually redundant primary
servers connected to two GPS and two WWVB receivers, with several
miscellaneous GPS, WWVB, WWV, CHU, IRIG and three cesium oscillators for
sanity and experiments. The various radios argue with each other all the
time, and some lose these arguments, especially the 20-year-old WWVB
radios. See www.eecis.udel.edu/~mills/lab.htm for the Awful Truth.

Twentysomething years ago the entire Internet was timed by a dinky Heath
WWV receiver, no longer manufactured, but still ticking here. Once in a
while it warped a few seconds one way or another. We got used to it. On
occasion some early ancestor of NTP was deployed with brutally cloned
bugs in every IP gateway. We got used to that, but learned the lessons.
My engineering attitude is conditioned on this early antiquity.

Dave
