
is there a way to "lock" the drift frequency


wayne

Nov 14, 2003, 3:01:56 PM

Is there a way to "lock" down the pll frequency to some value around
what is in the drift file?

I have found that on my two systems, any time the frequency wanders
too far from a known good value, that it is a sign of a problem and it
recovers *much* quicker if I simply kill the daemon and restart it
than to let ntpd work the problem out by itself.

I am using ntp 4.1.1b and 4.1.2a with Debian Linux.


The two most common causes of this are when my ADSL line gets
saturated for a long period of time (usually due to people trying to
download my entire 2GB website), or something to do with APM and lost
interrupts on my laptop. In the case of my ADSL line being saturated,
the computer's clock is working just fine, but the huge increases in
the delays appear to cause ntp to think that it is off. In the case
of my laptop and the dropped interrupts, the clock is off by
100-500ms, but the network is fine. In neither case is the frequency
the cause of the problem. In both cases, ntp will adjust the
frequency, the clock will be "corrected", but by the time the clock is
correct again, the frequency is so far off that ntp overshoots and the
frequency ends up oscillating for a long time before it settles down
again. During this time, the offset from true time is much larger
than normal.


I have actually written up some scripts that run from cron that check
the drift file. If frequency is off by "too much", it tries using
ntpdate to see if it can get a consistent time from the network. If
it can, and the clock is off by "too much", it kills the server and
restarts it. I have another quick script that goes through the
loopstats file, finds the times when the system seems to be running
well, and finds the median frequency from those periods. So, yes, as
the seasons change, the frequency needs to be adjusted for
temperature, but, really, the change isn't that huge.
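
As an illustration of the loopstats pass described above, here is a minimal Python sketch. It assumes the usual loopstats layout (MJD, seconds past midnight, offset in seconds, frequency in ppm, then further fields). The function name and the 5 ms "running well" threshold are made up for the example and are not taken from the original scripts.

```python
from statistics import median


def median_good_frequency(loopstats_lines, max_offset_s=0.005):
    """Return the median frequency (ppm) over samples with a small |offset|.

    Assumed field order: MJD, seconds, offset (s), frequency (ppm), ...
    Returns None when no sample passes the filter.
    """
    good = []
    for line in loopstats_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        offset_s = float(fields[2])
        freq_ppm = float(fields[3])
        if abs(offset_s) <= max_offset_s:
            good.append(freq_ppm)
    return median(good) if good else None


# Invented sample data; the third line stands for a disturbed period.
sample = [
    "52957 100.0 0.001200 -13.100 0.000500 0.010 6",
    "52957 164.0 -0.000800 -12.900 0.000400 0.012 6",
    "52957 228.0 0.250000 95.000 0.120000 4.500 6",
    "52957 292.0 0.000300 -13.400 0.000300 0.011 6",
]
print(median_good_frequency(sample))
```

The median is used rather than the mean so that a few bad samples that slip past the offset filter do not drag the estimate.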


-wayne


Terje Mathisen

Nov 14, 2003, 4:57:23 PM
wayne wrote:

>
> Is there a way to "lock" down the pll frequency to some value around
> what is in the drift file?
>
> I have found that on my two systems, any time the frequency wanders
> too far from a known good value, that it is a sign of a problem and it
> recovers *much* quicker if I simply kill the daemon and restart it
> than to let ntpd work the problem out by itself.

tinker huffpuff 36000

would allow ntpd to 'coast' past up to 10 hours of excessive one-way load.
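
For readers unfamiliar with the huff-puff filter, the idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not ntpd's code: it assumes the filter remembers the minimum delay seen over the window (36000 s, i.e. 10 hours, in the line above) and treats any delay beyond that minimum as one-way queueing, so half of the excess is removed from the measured offset. The function name is hypothetical.

```python
def huffpuff_correct(offset_s, delay_s, min_delay_s):
    """Pull the offset toward zero by half of the delay above the minimum.

    offset_s    -- measured clock offset for this sample (seconds)
    delay_s     -- measured round-trip delay for this sample (seconds)
    min_delay_s -- smallest round-trip delay seen in the window (seconds)
    """
    excess = max(delay_s - min_delay_s, 0.0)
    if offset_s > 0:
        return offset_s - excess / 2
    return offset_s + excess / 2


window_s = 36000
print(window_s / 3600, "hours of history")

# A sample taken while the link is saturated: 250 ms of extra delay.
print(huffpuff_correct(offset_s=0.125, delay_s=0.5, min_delay_s=0.25))
```

The correction only makes sense when the extra delay really is in one direction, which is why it helps with a saturated uplink but not with a clock that has actually jumped.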

Terje

--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

wayne

Nov 14, 2003, 9:03:22 PM
In <bp3j44$aut$1...@osl016lin.hda.hydro.com> Terje Mathisen <terje.m...@hda.hydro.com> writes:

> tinker huffpuff 36000
>
> would allow ntpd to 'coast' past up to 10 hours of excessive one-way load.

Thanks for the info. I've been using "tinker huffpuff 7200", and
that seems to help, but it doesn't cure the problem. It
certainly doesn't fix my laptop's APM problem.

Terje Mathisen

Nov 15, 2003, 5:25:21 AM
wayne wrote:

> In <bp3j44$aut$1...@osl016lin.hda.hydro.com> Terje Mathisen <terje.m...@hda.hydro.com> writes:
>
>
>>tinker huffpuff 36000
>>
>>would allow ntpd to 'coast' past up to 10 hours of excessive one-way load.
>
>
> Thanks for the info. I've been using "tinker huffpuff 7200", and
> that seems to help, but it doesn't seem to cure the problem. It
> certainly doesn't fix the problem with my laptop's APM problem.

:-)

Coasting past periods of widely variable basic cpu frequency is rather
hard. I believe you have to either disable APM or live with
sawtooth clock performance.

wayne

Nov 16, 2003, 7:09:16 AM
In <bp4uuh$2mt$1...@osl016lin.hda.hydro.com> Terje Mathisen <terje.m...@hda.hydro.com> writes:

> Coasting past periods of widely variable basic cpu frequency is
> rather hard, I believe you have to either disable APM, or live with a
> sawtooth clock performance.

I agree that ntp simply can't correct for APM mucking up the cpu
clock, however, having ntp react by changing the frequency just
doesn't work very well.

Over the last couple of years of watching ntp, I *know* that when
everything is running fine, the frequency is x +/- 5%. When ntp
either gets bad data from the net or bad data from the clock, it will
often try changing the frequency by more than 100% and it can take
hours or days for things to settle back down. It seems wrong to need
another program running to watch ntp and reset it when it goes off
into lala land.

This is why I'm wondering if there is a way to "lock" the drift
frequency to always be within a certain range.

Terje Mathisen

Nov 16, 2003, 9:31:22 AM
wayne wrote:

> In <bp4uuh$2mt$1...@osl016lin.hda.hydro.com> Terje Mathisen <terje.m...@hda.hydro.com> writes:
>
>
>>Coasting past periods of widely variable basic cpu frequency is
>>rather hard, I believe you have to either disable APM, or live with a
>>sawtooth clock performance.
>
>
> I agree that ntp simply can't correct for APM mucking up the cpu
> clock, however, having ntp react by changing the frequency just
> doesn't work very well.
>
> Over the last couple of years of watching ntp, I *know* that when
> everything is running fine, the frequency is x +/- 5%. When ntp
> either gets bad data from the net or bad data from the clock, it will
> often try changing the frequency by more than 100% and it can take
> hours or days for things to settle back down. It seems wrong to need
> another program running to watch ntp and reset it when it goes off
> into lala land.
>
> This is why I'm wondering if there is a way to "lock" the drift
> frequency to always be within a certain range.

The capture range (== maximum frequency offset) used to be 100 ppm, it
is currently 500 ppm.

It would seem possible to patch & recompile to only allow some
relatively small range around the initial ntp.drift value.
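
To make the idea concrete, the clamp such a patch would apply can be sketched in Python. This shows only the arithmetic; it is not a patch to ntpd, and the function name and the +/- 5 ppm window are assumptions chosen for illustration.

```python
def clamp_frequency(freq_ppm, drift_ppm, window_ppm=5.0):
    """Keep freq_ppm within drift_ppm +/- window_ppm.

    drift_ppm  -- the trusted value read from ntp.drift at startup
    window_ppm -- hypothetical allowed excursion around that value
    """
    low = drift_ppm - window_ppm
    high = drift_ppm + window_ppm
    return max(low, min(high, freq_ppm))


# With a drift file value of -13 ppm, wild excursions are cut back.
for wanted in (-12.0, 100.0, -250.0):
    print(wanted, "->", clamp_frequency(wanted, drift_ppm=-13.0))
```

Compare this with the stock capture range of 500 ppm mentioned above: a clamp this tight leaves the loop very little room to follow a real frequency change.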

wayne

Nov 16, 2003, 2:22:29 PM
In <bp8i08$f3t$1...@osl016lin.hda.hydro.com> Terje Mathisen <terje.m...@hda.hydro.com> writes:

>
> It would seem possible to patch & recompile to only allow some
> relatively small range around the initial ntp.drift value.

It is my understanding that there are a fair amount of subtle reasons
for the way ntp does the things it does. So, would such a patch be a
good idea?


Reading back through old posts, I saw a post by David Mills called
"Taming the pinball machine", which seems like it might be a related
problem. The conditions described in that post, unfortunately,
aren't quite the same as what I'm seeing and so I'm not sure if it is
appropriate to try upgrading to 4.2(?). Sadly, the topic seemed to
have quickly veered off on a tangent about ftp.

Bohdan Tashchuk

Nov 16, 2003, 4:48:59 PM
wayne wrote:

> This is why I'm wondering if there is a way to "lock" the drift
> frequency to always be within a certain range.

I agree with you.

I don't have problems as severe as you do, but from years of observing
ntpd supplied with each release of FreeBSD 3.x and 4.x (not sure about
versions) I think my system would be more stable with a limiting
algorithm that is a variation of your proposal. Something like:

only change drift frequency by 1 ppm max per hour

This would avoid the occasional positive feedback I see (about 3 or 4
times a year). When this occurs, the drift file changes something like this:

-13 starting value, very close to ideal for my hardware
0
-50
100
-250
....
ntp steps time,
sometimes this fixes oscillation, sometimes I need to reboot
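
The proposed limit of 1 ppm per hour can be sketched as a simple rate limiter. This is an illustration of the proposal only, not something ntpd implements; the function name is hypothetical, and the input sequence is the runaway drift pattern listed above.

```python
def limit_drift_change(prev_ppm, proposed_ppm, hours_elapsed,
                       max_ppm_per_hour=1.0):
    """Move from prev_ppm toward proposed_ppm, at most max_ppm_per_hour."""
    max_step = max_ppm_per_hour * hours_elapsed
    delta = proposed_ppm - prev_ppm
    delta = max(-max_step, min(max_step, delta))
    return prev_ppm + delta


# Feed the oscillating values through the limiter, one per hour.
freq = -13.0
history = []
for proposed in (0.0, -50.0, 100.0, -250.0):
    freq = limit_drift_change(freq, proposed, hours_elapsed=1.0)
    history.append(freq)
print(history)
```

With the limiter in place the stored frequency never moves more than 1 ppm from its starting value in this example, whatever the loop asks for.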

Roy

Nov 16, 2003, 10:21:08 PM
wayne wrote in message news:<x4r808y...@footbone.midwestcs.com>...

>
> Reading back through old posts, I saw a post by David Mills called
> "Taming the pinball machine", which seems like it might be a related
> problem. The conditions described in that post, unfortunately,
> aren't quite the same as what I'm seeing and so I'm not sure if it is
> appropriate to try upgrading to 4.2(?).

If you want to try it, just remember to get the tarball from November
5. The version on the downloads page is also version 4.2.0, but it is
from October 15 and doesn't have the new algorithm tweaks.

Have a great time,

roy
--
The suespammers.org mail server is located in California. Please do
not send unsolicited bulk e-mail or unsolicited commercial e-mail to
my suespammers.org address or any of my other addresses. These are my
opinions, not necessarily my employer's.

Terje Mathisen

Nov 17, 2003, 2:25:33 AM
wayne wrote:

> In <bp8i08$f3t$1...@osl016lin.hda.hydro.com> Terje Mathisen <terje.m...@hda.hydro.com> writes:
>
>
>>It would seem possible to patch & recompile to only allow some
>>relatively small range around the initial ntp.drift value.
>
>
> It is my understanding that there are a fair amount of subtle reasons
> for the way ntp does the things it does. So, would such a patch be a
> good idea?

No, you would definitely blow up quite often, in the form of
clock-stepping when regular slew operations should suffice.

I.e., I'm _not_ recommending that you actually do this!

OTOH, you did claim that your clock _in reality_ was quite stable,
varying very little around your normal ntp.drift value, right?

Assuming this is correct, you could indeed make a "broken" ntpd
specifically for your machine, which clamped the slew rate.

John Sager

Nov 17, 2003, 5:46:57 AM
In article <x4ptfut...@footbone.midwestcs.com>,

wayne <wa...@midwestcs.com> writes:
> In <bp3j44$aut$1...@osl016lin.hda.hydro.com> Terje Mathisen <terje.m...@hda.hydro.com> writes:
>
>> tinker huffpuff 36000
>>
>> would allow ntpd to 'coast' past up to 10 hours of excessive one-way load.
>
> Thanks for the info. I've been using "tinker huffpuff 7200", and
> that seems to help, but it doesn't seem to cure the problem. It
> certainly doesn't fix the problem with my laptop's APM problem.
>

A completely different way of solving the problem would be to
use the traffic shaping features of *BSD or Linux, but
that requires you to have all your ADSL traffic go through
the traffic shaping box. Essentially you set up three queues,
one for the web server traffic, one for the NTP queries and
one for everything else. You then set up the NTP queue as
the highest priority one, and the web traffic as lowest
priority. Since your web server is rate-limited by the upstream
rate (<< downstream rate), then your NTP queries should get good
service in both directions.
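
The three-queue arrangement can be modelled in a few lines of Python. This is a toy model of strict-priority scheduling, not a configuration for any real *BSD or Linux shaper. The packet format (a dict with 'proto' and 'port' keys) and the class names are invented for the example; the only hard fact used is that NTP runs over UDP port 123.

```python
from collections import deque


class PriorityShaper:
    """Strict-priority scheduler: queues are served in the order given."""

    def __init__(self, classes=("ntp", "other", "web")):
        self.order = list(classes)
        self.queues = {name: deque() for name in classes}

    def classify(self, packet):
        # Hypothetical packet format: {'proto': ..., 'port': ...}
        if packet["proto"] == "udp" and packet["port"] == 123:
            return "ntp"
        if packet["proto"] == "tcp" and packet["port"] in (80, 443):
            return "web"
        return "other"

    def enqueue(self, packet):
        self.queues[self.classify(packet)].append(packet)

    def dequeue(self):
        # Serve the highest-priority non-empty queue first.
        for name in self.order:
            if self.queues[name]:
                return self.queues[name].popleft()
        return None


shaper = PriorityShaper()
shaper.enqueue({"proto": "tcp", "port": 80})   # web download
shaper.enqueue({"proto": "tcp", "port": 22})   # everything else
shaper.enqueue({"proto": "udp", "port": 123})  # NTP query
sent = [shaper.classify(shaper.dequeue()) for _ in range(3)]
print(sent)
```

The NTP packet leaves first even though it arrived last, which is the property that keeps its delay low while the link is saturated.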

J

Tim Shoppa

Nov 17, 2003, 7:45:09 AM
wayne <wa...@midwestcs.com> wrote in message news:<x4smkqv...@footbone.midwestcs.com>...

> Is there a way to "lock" down the pll frequency to some value around
> what is in the drift file?
>
> The two most common causes of this are when my ADSL line gets
> saturated for a long period of time (usually due to people trying to
> download my entire 2GB website)

For this case, a patch to NTP's "MAXDISPERSE" value might do some good.
(Actually, it's peerdelay/2 + peerdispersion that this is compared to.)
By lowering it you could cause your peers, when they don't respond
quickly (and you would set the threshold to be a chunk above "normal"
network load) to appear unreachable. Your machine will keep its
nominal frequency and "drift" until the network load reduces and you
trust it again. When the delay comes back down and they are deemed
reachable, then you meet the problem found in the "pinball" thread, though.
(A case of "a hole in the bucket"!)

MAXDISPERSE is in include/ntp.h in the NTP source tree.

If you really don't like tweaking MAXDISPERSE you could add another
test based only on peerdelay in ntp_proto.c. It could be added near
test 8, in particular.
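
The two checks described above can be sketched in Python. This is an illustration under stated assumptions, not the code in ntp_proto.c: it assumes values in seconds and the comparison described above (peerdelay/2 + peerdispersion against MAXDISPERSE), and the function and parameter names are hypothetical. The optional delay-only limit stands in for the extra test suggested as an alternative.

```python
def peer_is_usable(delay_s, dispersion_s, max_disperse_s=16.0,
                   max_delay_s=None):
    """Return True if a peer sample would pass the sketched sanity checks.

    delay_s        -- round-trip delay to the peer (seconds)
    dispersion_s   -- peer dispersion (seconds)
    max_disperse_s -- limit on delay/2 + dispersion (16.0 as in the source)
    max_delay_s    -- optional extra limit on delay alone (hypothetical)
    """
    if max_delay_s is not None and delay_s > max_delay_s:
        return False
    return delay_s / 2 + dispersion_s < max_disperse_s


# Normal sample passes with the stock limit and with a lowered one.
print(peer_is_usable(0.2, 0.05))
print(peer_is_usable(0.2, 0.05, max_disperse_s=1.0))
# A sample taken over a saturated line fails once the limit is lowered.
print(peer_is_usable(3.0, 0.5, max_disperse_s=1.0))
```

With the stock limit of 16 seconds, even a three-second round trip passes, which is why lowering the limit is what makes congested samples drop out.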

Tim.

wayne

Nov 17, 2003, 7:51:27 AM
In <bp9t5e$775$1...@osl016lin.hda.hydro.com> Terje Mathisen <terje.m...@hda.hydro.com> writes:

> wayne wrote:
>
> Assuming this is correct, you could indeed make a "broken" ntpd
> specifically for your machine, which clamped the slew rate.


Ok, pardon my ignorance, but I thought the slew rate was something
that was only marginally related to the pll frequency.

In the case of my laptop and the poor APM/interrupt interaction, doing
a step (if the difference is large) or a high slew rate would be the
correct solution. In the case of my main server that sometimes gets
lagged by a full pipe, the correct solution is to ignore the NTP
packets and free run for a while. In neither case is the frequency of
the cpu clock significantly different so it was my understanding
that the pll frequency would also stay the same.

Craig

Nov 17, 2003, 12:47:04 PM
wayne <wa...@midwestcs.com> wrote in message
>
> I agree that ntp simply can't correct for APM mucking up the cpu
> clock, however, having ntp react by changing the frequency just
> doesn't work very well.
>
> Over the last couple of years of watching ntp, I *know* that when
> everything is running fine, the frequency is x +/- 5%. When ntp
> either gets bad data from the net or bad data from the clock, it will
> often try changing the frequency by more than 100% and it can take
> hours or days for things to settle back down. It seems wrong to need
> another program running to watch ntp and reset it when it goes off
> into lala land.
>
> This is why I'm wondering if there is a way to "lock" the drift
> frequency to always be within a certain range.

I'd also like to second this observation. I run NTP in a very
controlled lab-like environment where I have a direct connection to a
stratum 1 server, and a small class C GB-ethernet network. I'm mostly
running Linux, and nominally I see offsets under 1ms even during high
CPU/network loads.

What kills me is when the OS misses an interrupt (mostly due to a
kernel module that holds off interrupts for too long), and my offset
jumps upwards of 10ms or more (HZ=100). At this point NTP responds by
bumping the frequency up by 200% or more (-150 to 120ppm), and slowly
works it back down over an hour or so. This is with maxpoll==6,
burst, and using tinker step 0.008 & tinker stepout 8.

The step (if offset>8ms) occurs fairly soon, but the frequency still
takes a while to recover to the point that I'm back under 1ms of
offset. I too would like to have the option to heavily dampen the
frequency changes, and have NTP assume that large offsets are due to
errors on the local machine, and not the network or server. After
all, I'm talking to a stratum 1 server on a network that never
exhibits delays of more than 6ms average over any 8 samples (burst).

I'm currently using 4.1.1a.

Terje Mathisen

Nov 17, 2003, 1:11:25 PM
Craig wrote:

'maxpoll = 6' means that you cannot get the automatic damping that's
inherent in the larger poll intervals.

What about 'tinker huffpuff xxx'?

This was created specifically to survive times of asymmetric network delays.

> I'm currently using 4.1.1a.

Why not 4.2.0?

This is the current stable version, it has behaved very well here.

Craig

Nov 17, 2003, 6:41:16 PM
Terje Mathisen <terje.m...@hda.hydro.com> wrote in message news:<bpb30e$vad$1...@osl016lin.hda.hydro.com>...


It's a two edged sword. If I increase the maxpoll, it takes longer to
converge initially (the lab machines are up and down frequently), and
longer to re-converge after a missed interrupt. In my experience, it
also does not help much with the missed interrupt case.

I don't think the huffpuff filter is applicable, as I do not have
asymmetric network delays. Quite the opposite.

I do plan to move to 4.2, but nothing I've seen in the release notes
implies that the behavior I need was added. Am I wrong on this?

Over the years, I've expected someone to fork a version of NTP that
has been tweaked to satisfy the "small LAN" community, with no joy to
date. This hypothetical version of NTP (say Local NTP), would be
tailored for small switched networks characterized by fast/symmetric
delays, plenty of bandwidth, a directly connected high stratum server,
and strict requirements for fast initial convergence (assuming known
drift) and narrow offset margins (forward steps always acceptable).
Here, the server is always trusted (within reason), the network is
very reliable, and any errors or discontinuities are assumed to be
caused by the host (again within reason).

Still waiting for that someone (smarter than me),
Craig

Harlan Stenn

Nov 17, 2003, 8:16:05 PM
Try the current bk repo code for either -stable or -dev.

I *think* the changes Dave Mills just made might help.

H

Hal Murray

Nov 18, 2003, 12:30:47 AM
>Over the years, I've expected someone to fork a version of NTP that
>has been tweaked to satisfy the "small LAN" community, with no joy to
>date. This hypothetical version of NTP (say Local NTP), would be
>tailored for small switched networks characterized by fast/symmetric
>delays, plenty of bandwidth, a directly connected high stratum server,
>and strict requirements for fast initial convergence (assuming known
>drift) and narrow offset margins (forward steps always acceptable).
>Here, the server is always trusted (within reason), the network is
>very reliable, and any errors or discontinuities are assumed to be
>caused by the host (again within reason).
>
>Still waiting for that someone (smarter than me),

I'd put my effort in either of two other directions.

First would be fixing the lost interrupts on your system(s).

The other approach would be to use the cycle counter rather
than counting interrupts. Previous discussions indicate
that this gets hard in multi-processor systems. It might make
an interesting compile-time option for the kernel - safe to
use it if the SMP option is off.

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.

Terje Mathisen

Nov 18, 2003, 3:02:46 AM
Craig wrote:
> It's a two edged sword. If I increase the maxpoll, it takes longer to
> converge initially (the lab machines are up and down frequently), and
> longer to re-converge after a missed interrupt. In my experience, it
> also does not help much with the missed interrupt case.

If you actually have lost timer ticks, then you need to fix this first,
instead of relying on NTP to patch your clock for you:

Specifically, you should look into (assuming open source os) methods to
detect said lost interrupts, most probably by looking at any available
separate hw clock, like the RDTSC counter on x86.

>
> I don't think the huffpuff filter is applicable, as I do not have
> asymmetric network delays. Quite the opposite.

You can't have it both ways:

Either you have some kind of asymmetry (not necessarily in the network
itself!), or all ntp exchanges will be correct, right?

If the NTP packets are OK, then a sudden jump in offset would indeed be
a clear indication of a dropped timer tick.

> Over the years, I've expected someone to fork a version of NTP that
> has been tweaked to satisfy the "small LAN" community, with no joy to
> date. This hypothetical version of NTP (say Local NTP), would be
> tailored for small switched networks characterized by fast/symmetric
> delays, plenty of bandwidth, a directly connected high stratum server,
> and strict requirements for fast initial convergence (assuming known
> drift) and narrow offset margins (forward steps always acceptable).
> Here, the server is always trusted (within reason), the network is
> very reliable, and any errors or discontinuities are assumed to be
> caused by the host (again within reason).

So basically you want a custom ntpd that quickly 'fixes' any lost timer
ticks, instead of having an OS (+ hw & drivers) that doesn't suffer from
the problem in the first place, right?

Craig

Nov 18, 2003, 11:45:35 AM
hmu...@suespammers.org (Hal Murray) wrote in message news:<vrjbk7...@corp.supernews.com>...

> >Over the years, I've expected someone to fork a version of NTP that
> >has been tweaked to satisfy the "small LAN" community, with no joy to
> >date. This hypothetical version of NTP (say Local NTP), would be
> >tailored for small switched networks characterized by fast/symmetric
> >delays, plenty of bandwidth, a directly connected high stratum server,
> >and strict requirements for fast initial convergence (assuming known
> >drift) and narrow offset margins (forward steps always acceptable).
> >Here, the server is always trusted (within reason), the network is
> >very reliable, and any errors or discontinuities are assumed to be
> >caused by the host (again within reason).
> >
> >Still waiting for that someone (smarter than me),
>
> I'd put my effort in either of two other directions.
>
> First would be fixing the lost interrupts on your system(s).

Well yes of course this would be ideal, but is simply impractical. I
run many types of machines, with many types of processors, running
many types of OS's. They *all* exhibit this behavior (missed
interrupts).

I have the source for some of these OS's, but not most. For those I do
have source for, the problems can sometimes be traced to vendor
supplied object-only modules for boards I must use. Until the recent
versions of Linux w/ low-latency & preemptable patches, there were
many places where interrupts were held off for too long (there are
still a few). So even when it is theoretically possible to fix the
missed interrupt (i.e. I have the source), it is extremely difficult
to track these things down much less fix them. I'm not about to start
mucking around in the Linux virtual memory code, nor do I have the
skill/time to fix every potential kernel module in the system.

Even if I did fix all the problems, now I have to maintain my own
kernel & modules, and distribute them along with my app's. For my
Sun's, Alpha's, SGI's, and PowerPC boxes, I'm just out of luck.

I feel that this is what NTP was developed to do. All these machines
have horribly drifting RTC clocks feeding huge monolithic kernels
running dozens of apps interfacing with a myriad of hardware. NTP sits
quietly in the background keeping things sane. It does 95% of what I
need it to do, with a slightly different mindset it would be perfect.

>
> The other approach would be to use the cycle counter rather
> than counting interrupts. Previous discussions indicate
> that this gets hard in multi-processor systems. It might make
> an interesting compile-time option for the kernel - safe to
> use it if the SMP option is off.

Now you're talking! I've played with the nanokernel in the PPSkit for
i386 Linux that does exactly this. Unfortunately, just as you said, it
doesn't work for SMP which is what all my Linux boxes are. I also
have not been able to patch in both this and the High-Res Timers patch
at the same time (which I need). I'll revisit this when 2.6 is out, as
the HRT patch is integrated. I'm still out of luck for all my other
systems w/o source & w/o TSC's.

Craig

Hal Murray

Nov 18, 2003, 9:41:41 PM
>Well yes of course this would be ideal, but is simply impractical. I
>run many types of machines, with many types of processors, running
>many types of OS's. They *all* exhibit this behavior (missed
>interrupts).

I'll take your word for it, but I don't see lost interrupts
discussed here very often. Do you also have problems with
lost characters on serial ports? (and similar things)

What sort of IO gear are you using?

My experience is that I rarely see lost interrupts. I used to
have a recipe to provoke them. I forget what system that was.
I think it involved a lot of disk activity with DMA turned off.


I wonder if it would be reasonable to hack together a lost
interrupt fixer-upper. The idea is to have some code (could
be user level) that compares the cycle counter with the time of
day, and calls the kernel to generate a fake/extra timer interrupt
when it notices one got lost.
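
The comparison at the heart of that idea can be sketched in Python. This only shows the bookkeeping; reading a real cycle counter and injecting a timer interrupt are kernel matters that are not attempted here. The function name and the example numbers are assumptions, and a constant, known CPU frequency is assumed (which is exactly what APM breaks).

```python
def lost_ticks(cycles_elapsed, cpu_hz, ticks_counted, hz=100):
    """Estimate how many timer interrupts were lost.

    cycles_elapsed -- cycle-counter difference over the interval
    cpu_hz         -- cycle-counter frequency, assumed constant
    ticks_counted  -- timer interrupts the kernel actually counted
    hz             -- timer interrupt rate (100 gives 10 ms ticks)
    """
    expected = round(cycles_elapsed / cpu_hz * hz)
    return max(expected - ticks_counted, 0)


# One second on a 1 GHz counter, but only 99 ticks were counted.
missing = lost_ticks(1_000_000_000, 1_000_000_000, ticks_counted=99)
print(missing, "tick(s) lost,", missing * 1000 / 100, "ms behind")
```

Each lost tick at HZ=100 puts the clock 10 ms behind, which matches the size of the jumps reported earlier in the thread.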

wayne

Nov 19, 2003, 8:02:59 PM

> I'll take your word for it, but I don't see lost interrupts
> discussed here very often. Do you also have problems with
> lost characters on serial ports? (and similar things)

Lost timer interrupts will normally only cause small blips on the ntp
log. With a HZ value of 100, a jump of 10ms on the clock isn't going
to be very noticeable.

Timer interrupts happen *a lot*. Very few other devices will generate
100 interrupts per second, every second, all day long. If even a very
small percentage of them get lost, you are still going to prevent ntpd
from keeping as good a time as it could.

Timer interrupts have no buffering to fall back on. A UART generates
an interrupt when there are still several characters free in the
buffer, so a lag of 10-20ms is no big deal. Most other I/O devices
are similarly buffered.

Hal Murray

Nov 19, 2003, 8:35:31 PM
>Timer interrupts have no buffering to fall back on. A UART generates
>an interrupt when there are still several characters free in the
>buffer, so a lag of 10-20ms is no big deal. Most other I/O devices
>are similarly buffered.

Lost timer interrupts mean that some chunk of code has disabled interrupts
for 10 ms. The people I hang out with would call that broken. (and fix
it if they could) It's probably possible to build a system where that makes
sense, but it generally breaks various things, like time keeping and other
IO gear.

As for buffering on RS232 links... It depends upon what speed you are
running at. 115K bits/second is common for talking to development
systems. That's 11K bytes per second or 115 bytes in 10 ms. Most
FIFOs are not that big.
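
The arithmetic above can be checked with a short Python sketch. It assumes 10 bits on the wire per character (start bit, 8 data bits, stop bit) and, for comparison, the 16-byte receive FIFO of the common 16550 UART; the function names are made up for the example.

```python
def bytes_in_window(baud, window_s, bits_per_char=10):
    """Characters that arrive on a serial line during window_s seconds."""
    return baud / bits_per_char * window_s


def fifo_fill_time_s(baud, fifo_bytes=16, bits_per_char=10):
    """Seconds until a FIFO of fifo_bytes characters is full."""
    return fifo_bytes * bits_per_char / baud


# 115200 bit/s with interrupts held off for 10 ms.
print(bytes_in_window(115200, 0.010))
# Time for a 16-byte FIFO to fill at the same speed, in milliseconds.
print(fifo_fill_time_s(115200) * 1000)
```

At that speed a 16-byte FIFO fills in well under 2 ms, so a 10 ms interrupt blackout would overrun it several times over.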

Frederick Bruckman

Nov 20, 2003, 9:55:00 PM
In article <bec993c8.03111...@posting.google.com>,

Those both sound like good ideas. I've been running with a hack that
filters packets greater than a given delay, and it works really well
for the nailed-up ISDN connected host. I've been doing it later,
though, where the popcorn spikes are filtered. That has the drawback
that ntp thinks the clock is still synced to something, even when it
really isn't. I think I will try the MAXDISPERSE thing.

"huff-puff" doesn't cut it -- the delays aren't perfectly asymmetric,
so what happens is that ntp eventually steps the wrong way, and then later
steps back.

I suspect capping the DISPERSE/DELAY would also help intermittently
connected dial-up hosts, though I haven't tested much. What I have
observed, is several servers delivering packets with seconds-long
delays right after connecting. You don't even need a static IP to
see it, just a NAT router/IPv6 tunnel host running a persistent pppd.

--
Frederick

Tim Shoppa

Nov 21, 2003, 8:46:58 AM
fr...@immanent.net (Frederick Bruckman) wrote in message news:<6t6dneNTQLs...@ripco.com>...

Remember, if the network delays keep up for long periods of time
(more than a few ppm frequency error for a full day will qualify) then
you're likely going to need to step when the delays come back down and
the servers become reachable again, even with a tweaked MAXDISPERSE.
Although I agree that one step in the right direction seems better
than one wrong followed by one right.

Tim.

Frederick Bruckman

Nov 21, 2003, 10:56:50 AM
In article <bec993c8.0311...@posting.google.com>,

I've found that by simply filtering the packets with a delay greater
than 250ms, there won't be any stepping for at least a day, and then
when the network finally normalizes, it's still close enough that ntpd
doesn't need to step. That's way better than the result with huff-puff.

If long downloads go a couple days, though, it bounces all over the
place, again. I think it might do better if the filtering is early
enough so that the servers are declared unsynchronized, so ntpd can
just get out of the way until the network normalizes. I see that
MAXDISPERSE defaults to "16.". What scale is that in? To get the
equivalent of "delay < 250", I would have to reduce MAXDISPERSE to
less than "1."?

Frederick

Ulrich Windl

Nov 25, 2003, 4:49:44 AM
wayne <wa...@midwestcs.com> writes:

> Is there a way to "lock" down the pll frequency to some value around
> what is in the drift file?

Indirectly by using a high value for "minpoll", but see the
documentation on that and the FAQ maybe first. Large polling intervals
will "limit the slope" of the frequency correction AFAIK.

Regards,
Ulrich

Frederick Bruckman

Dec 6, 2003, 9:56:21 AM
> wayne <wa...@midwestcs.com> wrote in message news:<x4smkqv...@footbone.midwestcs.com>...
>> Is there a way to "lock" down the pll frequency to some value around
>> what is in the drift file?
>>
>> The two most common causes of this are when my ADSL line gets
>> saturated for a long period of time (usually due to people trying to
>> download my entire 2GB website)
>
> For this case, a patch to NTP's "MAXDISPERSE" value might do some good.
> (Actually, it's peerdelay/2 + peerdispersion that this is compared to.)
> By lowering it you could cause your peers, when they don't respond
> quickly (and you would set the threshold to be a chunk above "normal"
> network load) to appear unreachable. Your machine will keep its
> nominal frequency and "drift" until the network load reduces and you
> trust it again. When the delay comes back down and they are deemed
> reachable, then you meet the problem found in the "pinball" thread, though.
> (A case of "a hole in the bucket"!)
>
> MAXDISPERSE is in include/ntp.h in the NTP source tree.

I'm running with both MAXDELAY lowered to "0.5" from "1.", and MAXDISPERSE
lowered to "1." from "16.", and it turns out that that works *really* well.

With ntp-dev 1.1175, the dialup stepped once Monday after being disconnected
most of the previous weekend, but only once, and then not again all week.
Today, after being connected for less than an hour, it looks like this:

# ntpq -c rv -c pe
assID=0 status=06f4 leap_none, sync_ntp, 15 events, event_peer/strat_chg,
version="ntpd 4....@1.1175-r Sat Nov 29 08:53:26 CST 2003 (13)",
processor="i386", system="NetBSD/1.6ZF", leap=00, stratum=3,
precision=-18, rootdelay=210.157, rootdispersion=231.375, peer=20620,
refid=130.126.24.53,
reftime=c37c6158.a51fa333 Sat, Dec 6 2003 8:08:24.645, poll=10,
clock=c37c63de.f4653005 Sat, Dec 6 2003 8:19:10.954, state=4,
offset=-16.405, frequency=0.331, jitter=37.356, stability=0.053
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp-0.gw.uiuc.e 128.174.38.133   2 u  646 1024    7  193.754  -20.445   4.237
-caesar.cs.wisc. 128.105.201.11   2 u  125 1024    7  157.764   -7.136   1.425
+triangle.kansas 128.252.19.1     2 u  131 1024    7  216.851  -16.418   0.523
-ntp.immanent.ne 128.105.37.11    3 u  622 1024    7  194.389   -3.349  14.708
+ntp1.jrc.us     128.227.205.3    2 u  609 1024    7  183.259   -8.725   1.254

(There's a NAT/firewall in between itself and the internet, so it doesn't have
to deal with its own IP address changing.) If today is like the last couple of
days, the root dispersion will continue to fall throughout the day, the offsets
will improve a little, but the frequency will stay about where it is now. Its
"ntp.conf" has "tos minclock 3 minsane 3", 3 public servers, my other host, and
one instance of us.pool.ntp.org ("best of five"), with "burst iburst" only to
my home computer. Interestingly, the poll interval almost never falls after the
first day. I calculate this to be less of a burden on the public resources than
if I'd used one server, but with the poll interval frequently going to "64".
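
The burden comparison above can be checked with a little arithmetic. This is only a sketch: it assumes four public servers polled steadily every 1024 s versus a single server polled steadily every 64 s, it ignores bursts, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def polls_per_day(poll_interval_s: int, servers: int = 1) -> float:
    """Client requests sent per day at a steady poll interval (bursts ignored)."""
    return servers * SECONDS_PER_DAY / poll_interval_s


# Assumed steady state: four public servers, each polled every 1024 s.
spread = polls_per_day(1024, servers=4)
# Alternative: one public server polled every 64 s.
single = polls_per_day(64)

print(f"4 servers @ 1024 s: {spread:.1f} requests/day total, "
      f"{spread / 4:.1f} per server")
print(f"1 server  @   64 s: {single:.1f} requests/day")
```

Under those assumptions, the spread-out configuration sends fewer requests in total, and far fewer to any one server.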

I'm getting similar results with the congested ISDN router, at home.

With both, it seems the MAXDELAY drops the server at the first sign of trouble,
while the MAXDISPERSE keeps the first quasi-good packet from destabilizing. On
the ISDN router, I tried MAXDISPERSE of 1.5, but it only took a few hours of
heavy downloading before the frequency started stepping back and forth.
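
The effect described above, where samples over a delay or dispersion limit are discarded before they can disturb the clock, can be illustrated with a small stand-alone filter. This is a sketch under stated assumptions, not ntpd's code: the `Sample` class, the `accept` function and the sample values are hypothetical, and ntpd's real use of these constants is more involved.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    offset: float      # measured clock offset, seconds
    delay: float       # round-trip delay, seconds
    dispersion: float  # error bound on the sample, seconds


def accept(sample: Sample, max_delay: float = 0.5, max_disperse: float = 1.0) -> bool:
    """Return True if the sample is within both limits (hypothetical gate)."""
    return sample.delay <= max_delay and sample.dispersion <= max_disperse


# Made-up samples for illustration only.
samples = [
    Sample(offset=-0.016, delay=0.194, dispersion=0.23),  # quiet link
    Sample(offset=0.310, delay=0.820, dispersion=0.25),   # congested: delay too high
    Sample(offset=-0.120, delay=0.210, dispersion=1.40),  # dispersion too high
]

kept = [s for s in samples if accept(s)]
print(f"kept {len(kept)} of {len(samples)} samples")
```

With the default limits of 16 and 1 in place of 1.0 and 0.5, all three made-up samples would pass, which is the behaviour questioned in the next paragraph.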

It seems to me, that having MAXDISPERSE of "16." does absolutely nothing
whatsoever. Can anyone recall why the limit is set so high?

--
Frederick

Tim Shoppa

Dec 7, 2003, 8:25:24 AM
fr...@immanent.net (Frederick Bruckman) wrote in message news:<3fd1e...@corp.newsgroups.com>...

> It seems to me, that having MAXDISPERSE of "16." does absolutely nothing
> whatsoever. Can anyone recall why the limit is set so high?

Note that MAXDISPERSE isn't *just* a check on each packet; it's also used as
an input to the loop filter for missing data. I feel a little guilty for
recommending that you change it, when in retrospect maybe adding a new test
with a new variable (probably tunable in ntp.conf and with ntpdc) would be more
reasonable. But maybe it is appropriate, if you set MAXDISPERSE to a value
close to a second, to use that as the error bound on missing data? Dave's
made enough comments about parameters tuned to Allan variances of PC clocks
that I don't want to venture that far!

The 16 sec value is probably useful as a sanity check against misbehaving
servers and/or networks which occasionally send packets to deep space
and back... you will note that 16 sec is plenty to get data back and forth
to the moon!
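
The moon remark is easy to check. The sketch below assumes a mean Earth-Moon distance of about 384,400 km (an approximate figure that is not from the thread) and ignores everything except light travel time.

```python
C_KM_S = 299_792.458        # speed of light in vacuum, km/s
EARTH_MOON_KM = 384_400.0   # assumed mean Earth-Moon distance, km (approximate)

one_way = EARTH_MOON_KM / C_KM_S
round_trip = 2 * one_way

print(f"one way:    {one_way:.3f} s")
print(f"round trip: {round_trip:.3f} s")
print(f"16 s covers about {16 / round_trip:.1f} round trips")
```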

Tim.

Frederick Bruckman

Dec 7, 2003, 2:33:25 PM
In article <bec993c8.03120...@posting.google.com>,

sho...@trailing-edge.com (Tim Shoppa) writes:
> fr...@immanent.net (Frederick Bruckman) wrote in message news:<3fd1e...@corp.newsgroups.com>...
>> It seems to me, that having MAXDISPERSE of "16." does absolutely nothing
>> whatsoever. Can anyone recall why the limit is set so high?
>
> Note that MAXDISPERSE isn't *just* a check on each packet; it's also used as
> an input to the loop filter for missing data. I feel a little guilty for
> recommending that you change it, when in retrospect maybe adding a new test
> with a new variable (probably tunable in ntp.conf and with ntpdc) would be more
> reasonable. But maybe it is appropriate, if you set MAXDISPERSE to a value
> close to a second, to use that as the error bound on missing data? Dave's
> made enough comments about parameters tuned to Allan variances of PC clocks
> that I don't want to venture that far!

All I can say is, it works. The wacky frequency jumps and wild steps are gone.
The offset on the dialup can still be pulled off track quite a bit, but that
corrects itself in a short time, it's not cumulative, and it doesn't cause the
system to oscillate.



> The 16 sec value is probably useful as a sanity check against misbehaving
> servers and/or networks which occasionally send packets to deep space
> and back... you will note that 16 sec is plenty to get data back and forth
> to the moon!

You're kidding of course, but if we were really trying to synch up a client
in "deep space", different rules apply: much of the round-trip delay would
be fixed, and easily calculated, and we could subtract it before feeding
the samples to the filter.

--
Frederick

Hal Murray

Dec 8, 2003, 3:59:43 AM
>You're kidding of course, but if we were really trying to synch up a client
>in "deep space", different rules apply: much of the round-trip delay would
>be fixed, and easily calculated, and we could subtract it before feeding
>the samples to the filter.

Isn't that delay symmetric? Won't NTP do the right thing?

I'd be more interested in relativity. If I have 2 good atomic clocks,
one on Earth and the other on the moon or Mars, will they get the same
answer? If they are in sync at one point, will they stay in sync?

Isn't relativity a significant correction term for GPS?

Terje Mathisen

Dec 8, 2003, 8:49:00 AM
Hal Murray wrote:

>>You're kidding of course, but if we were really trying to synch up a client
>>in "deep space", different rules apply: much of the round-trip delay would
>>be fixed, and easily calculated, and we could subtract it before feeding
>>the samples to the filter.
>
>
> Isn't that delay symmetric? Won't NTP do the right thing?
>
> I'd be more interested in relativity. If I have 2 good atomic clocks,
> one on Earth and the other on the moon or Mars, will they get the same
> answer? If they are in sync at one point, will they stay in sync?
>
> Isn't relativity a significant correction term for GPS?

Yes: 4 us/day difference between a sat clock and one at sea level.

I.e. the sat clocks are tuned before launch to be off by those 4 us, so
that they will be correct while in orbit.

Frederick Bruckman

Dec 8, 2003, 10:25:03 AM
In article <vt8fbv5...@corp.supernews.com>,

hmu...@suespammers.org (Hal Murray) writes:
>>You're kidding of course, but if we were really trying to synch up a client
>>in "deep space", different rules apply: much of the round-trip delay would
>>be fixed, and easily calculated, and we could subtract it before feeding
>>the samples to the filter.
>
> Isn't that delay symmetric? Won't NTP do the right thing?

Not necessarily. Consider one of those ISP's that delivers downloads through
a satellite, and uploads through the phone line. Besides, what I'm suggesting
is that we filter out the samples with the largest delays and dispersions,
in order to fix currently marginal cases, like routine dialups. This will
flat out break the sat-connected dial-ups, NTP to Mars, and other cases
where NTP doesn't work anyway; all I'm saying is there are other ways to
deal with those cases, if there's a motivation to do so.
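
The symmetry question can be made concrete with the standard NTP on-wire calculation from the four timestamps. The sketch below is illustrative only: the `exchange` helper and the path delays (a slow satellite downlink and a faster phone-line uplink) are made-up values, and the true offset is set to zero so that any reported offset is pure error.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP calculation.

    t1: client send, t2: server receive, t3: server send, t4: client receive.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def exchange(true_offset, up, down, server_hold=0.001):
    """Simulate one exchange; true_offset is server clock minus client clock."""
    t1 = 100.0                     # read from the client clock
    t2 = t1 + up + true_offset     # read from the server clock
    t3 = t2 + server_hold          # read from the server clock
    t4 = t3 - true_offset + down   # read from the client clock
    return ntp_offset_delay(t1, t2, t3, t4)


# Symmetric path: a large fixed delay cancels out of the offset.
sym_offset, sym_delay = exchange(true_offset=0.0, up=0.30, down=0.30)
# Asymmetric path (assumed values): 50 ms up, 300 ms down.
asym_offset, asym_delay = exchange(true_offset=0.0, up=0.05, down=0.30)

print(f"symmetric:  offset {sym_offset:+.3f} s, delay {sym_delay:.3f} s")
print(f"asymmetric: offset {asym_offset:+.3f} s, delay {asym_delay:.3f} s")
```

The offset error is half the difference between the two one-way delays, so a symmetric delay of any size does no harm, while the asymmetric case is off by 125 ms with these numbers.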

--
Frederick

Tim Shoppa

Dec 8, 2003, 11:03:05 AM
hmu...@suespammers.org (Hal Murray) wrote in message news:<vt8fbv5...@corp.supernews.com>...

> I'd be more interested in relativity. If I have 2 good atomic clocks,
> one on Earth and the other on the moon or Mars, will they get the same
> answer? If they are in sync at one point, will they stay in sync?

They will not stay in sync, due to general relativity. For an
around-the-world flight on an airplane the GR correction is roughly
100ns (and it depends on which direction you go). I don't know off
the top of my head what the correction would be for the moon, but I think
it would be many times greater.

Tim.

Wolfgang S. Rupprecht

Dec 8, 2003, 11:31:35 AM

hmu...@suespammers.org (Hal Murray) writes:
>>You're kidding of course, but if we were really trying to synch up a client
>>in "deep space", different rules apply: much of the round-trip delay would
>>be fixed, and easily calculated, and we could subtract it before feeding
>>the samples to the filter.
>
> Isn't that delay symmetric? Won't NTP do the right thing?

Not that it really matters since speeds in space aren't that fast
compared to the speed of light, but if we want to look at the nitty
gritty then we have to look at what happened during the sometimes
significant delay between sending the initial query packet and getting
the reply. For planetary probes that delay could be hours or days.
In that time the remote object could have moved a bit. The path for the
first packet might be a bit shorter than the path for the return
packet.
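
A rough classical sketch of that effect, for a client on a probe that is receding from Earth: the reply has to catch up with the probe, so the return leg is longer than the outbound leg. The distance and speed below are made-up round numbers, Earth is treated as being at rest, and relativistic corrections are ignored.

```python
C_KM_S = 299_792.458   # speed of light in vacuum, km/s


def legs(distance_km, recession_km_s):
    """One-way light times (out, back) for a request sent by a receding probe.

    distance_km is the probe's distance from Earth when it sends the request.
    """
    out = distance_km / C_KM_S
    # When the reply leaves Earth, the probe is farther away and still receding.
    gap = distance_km + recession_km_s * out
    back = gap / (C_KM_S - recession_km_s)
    return out, back


# Hypothetical probe: 225 million km away, receding at 20 km/s.
out, back = legs(225e6, 20.0)
offset_error = (back - out) / 2   # half the path asymmetry

print(f"outbound leg: {out:.3f} s")
print(f"return leg:   {back:.3f} s")
print(f"apparent offset error: {offset_error * 1000:.1f} ms")
```

With these assumed numbers the error is on the order of tens of milliseconds, small next to the round trip itself but large by NTP standards.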

Now a doppler-shift aware ntpd could take that into account... It
could even count the beat waves and know exactly how many wavelengths
the object moved away... I wonder if there is a bit of NASA funding
in there for ntpd.

-wolfgang
--
Wolfgang S. Rupprecht http://www.wsrcc.com/wolfgang/
The above "From:" address is valid. Don't mess with it.

Tim Shoppa

Dec 8, 2003, 1:46:36 PM
Terje Mathisen <terje.m...@hda.hydro.com> wrote in message news:<br1vgc$5lg$2...@osl016lin.hda.hydro.com>...

>
> Yes: 4 us/day difference between a sat clock and one at sea level.
>
> I.e. the sat clocks are tuned before launch to be off by those 4 us, so
> that they will be correct while in orbit.

At least on the Block II GPS sats, the correction is tunable (via
a synthesizer on the satellite) after launch. Ashby's paper indicates that
this was done because of some concern that the GR predictions wouldn't
be correct:


At the time of launch of the first NTS-2 satellite
(June 1977), which contained the first Cesium clock
to be placed in orbit,
there were some who doubted that
relativistic effects were real. A frequency
synthesizer was built into the satellite clock
system so that after launch, if in fact the
rate of the clock in its final orbit was that
predicted by GR, then the synthesizer could
be turned on bringing the clock to the
coordinate rate necessary for operation. The
atomic clock was first operated for about 20
days to measure its clock rate before turning
on the synthesizer. The frequency measured
during that interval was +442.5 parts in
10^12 faster than clocks on the ground; if
left uncorrected this would have resulted in
timing errors of about 38,000 nanoseconds
per day. The difference between predicted
and measured values of the frequency shift
was only 3.97 parts in 10^12, well within
the accuracy capabilities of the orbiting
clock. This then gave about a 1% validation
of the combined motional and gravitational
shifts for a clock at 4.2 earth radii [the
radius of the satellite's orbit].

Rest of paper is at
http://xxx.lanl.gov/abs/gr-qc/9702010
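
The figures in the quoted passage are consistent with each other, as a quick check shows. The sketch uses only the two numbers quoted above (442.5 and 3.97 parts in 10^12); the variable names are just for illustration.

```python
SECONDS_PER_DAY = 86_400

fractional_shift = 442.5e-12   # measured rate excess, from the quote
residual = 3.97e-12            # predicted minus measured, from the quote

drift_ns_per_day = fractional_shift * SECONDS_PER_DAY * 1e9
agreement_pct = residual / fractional_shift * 100

print(f"uncorrected drift: {drift_ns_per_day:,.0f} ns/day")
print(f"prediction vs. measurement: {agreement_pct:.2f} %")
```

This gives a little over 38,000 ns per day and a residual just under 1%, matching the "about 38,000 nanoseconds per day" and "about a 1% validation" in the text.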

If I were program manager of the (then largely military) GPS program,
I'm not sure I'd trust a bunch of freaky long-haired GR theorists on such
matters either :-)

Tim.

David L. Mills

Dec 8, 2003, 7:30:45 PM
Wolfgang,

Funny you should ask. We haven't yet had NTP running on Mars for real,
but we have had it running in simulation on that and other paths. See
the NTP project page and punch up "NTP on the Interplanetary Internet".
Grad student Harish Nair worked out the equations and simulation; the
most ambitious was an Earth-Moon-Mars-Mars satellite path using cascaded
Keplerian elements.

Relativistic effects are negligible, only a few milliseconds corrected
to the solar system barycenter. Trick is to translate everything to the
barycenter and put a correction module between the antenna and ntpd and
teach the module to feed on either ephemeris data calculated from
Chebychev polynomials or Keplerian elements. Clock discipline now
requires an intricate iterative procedure which simultaneously
determines the position, velocity and time. The residuals are in the
noise.

The last Shuttle flight carried an NTP experiment and NASA was kind
enough to give the data to me for analysis. However, it is full of
glitches, and the crew didn't have time to work out what broke. Messages
exchanged with the crew suggested the laptop it was running on had a
very unstable clock oscillator, probably due to sleep/snooze. The
resolution of the clock oscillator was too coarse to hunt for
relativistic effects anyway.

Dave
