Win7: ntpd adjusting time backwards


Jeroen Mostert

Dec 8, 2012, 2:52:45 PM

If my event log is to be believed, ntpd is adjusting the clock to times in the
past (with pretty big intervals):

Log Name: System
Source: Microsoft-Windows-Kernel-General
Date: 2012-12-08 15:18:52
Event ID: 1
Task Category: None
Level: Information
Keywords: Time
User: PORTIA\ntp
Computer: PORTIA
Description:
The system time has changed to 2012-12-08T14:18:52.347000000Z from
2012-12-08T14:18:52.569674500Z.

As I understand it, it's not supposed to be doing this; instead, it should slow
the clock down.
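
To put a number on the step in the event above, the two timestamps can simply be subtracted. A minimal Python sketch follows; the helper names are invented for illustration, and the timestamps are the ones from the event log entry:

```python
from datetime import datetime, timezone


def parse_event_timestamp(text):
    """Split a '...T..:..:...nnnnnnnnnZ' stamp into (epoch seconds, nanoseconds)."""
    main, frac = text.rstrip("Z").split(".")
    whole = datetime.strptime(main, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    nanos = int(frac.ljust(9, "0")[:9])  # pad or trim to exactly 9 digits
    return int(whole.timestamp()), nanos


def step_ms(old, new):
    """Signed size of the clock change in milliseconds (negative = backwards)."""
    old_s, old_ns = parse_event_timestamp(old)
    new_s, new_ns = parse_event_timestamp(new)
    return ((new_s - old_s) * 1_000_000_000 + (new_ns - old_ns)) / 1_000_000


if __name__ == "__main__":
    change = step_ms("2012-12-08T14:18:52.569674500Z",
                     "2012-12-08T14:18:52.347000000Z")
    print(f"clock changed by {change:.4f} ms")
```

For this event the change works out to roughly 223 ms backwards.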

Event log entries from NTP (using the Meinberg install on Win7 64-bit):

ntpd 4.2...@1.2349-o Jul 30 11:55:08 (UTC+02:00) 2012 (2)
Raised to realtime priority class
MM timer resolution: 1..1000000 msec, set to 1 msec
Performance counter frequency 3.215 MHz
Clock interrupt period 15.600 msec (startup slew 0.2 usec/period)
Windows clock precision 1.000 msec, min. slew 6.410 ppm/s
using Windows clock directly
proto: precision = 1000.100 usec

This is just a client machine syncing with NTP pool machines, no PPS.

If my Googling indicates anything, those last two lines might indicate a problem
since NTP is supposed to be using interpolation and it doesn't. There are also
hints that the crazy huge precision value indicates a problem with a driver.
However, I've checked two other machines and they log the same thing, so maybe
this is normal.

I've tried 4.2.7p310 binaries as well, but they log nearly the same thing:

ntpd 4.2.7p310-o Oct 09 17:56:01.10 (UTC-00:00) 2012 (1): Starting
Raised to realtime priority class
Clock interrupt period 15.600 msec (startup slew -0.3 usec/period)
Performance counter frequency 3.215 MHz
MM timer resolution: 1..1000000 msec, set to 1 msec
Windows clock precision 1.000 msec, min. slew 6.410 ppm/s
using Windows clock directly
proto: precision = 1000.000 usec (-10)
proto: fuzz beneath 0.201 usec

The clock is now being adjusted forwards instead of backwards, but still with
big increments. Current "ntpq -pn" output:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+89.188.26.129   193.79.237.14    2 u   56   64  377   24.989   20.536   8.671
+91.148.192.49   193.67.79.202    2 u   35   64  377   17.968   35.173  13.029
-85.12.35.12     134.221.205.12   2 u   37   64  377   16.989   25.143  10.667
*83.98.155.30    193.79.237.14    2 u   16   64  377   18.963   34.747  13.242
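
For anyone reading along who wants to pull the numbers out of this output, here is a hedged Python sketch that splits one peer line into named fields. It assumes the ten standard `ntpq -pn` columns; the function name and dictionary keys are made up for illustration. The leading character is the tally code (`*` is the current system peer, `+` a candidate, `-` an outlier), and `reach` is an octal bitmask of the last eight polls:

```python
TALLY_CHARS = "*+-#xo."


def parse_ntpq_line(line):
    """Parse one peer line of `ntpq -pn` output into a dict (ten columns assumed)."""
    fields = line.split()
    first = fields[0]
    if first[0] in TALLY_CHARS:
        tally, remote = first[0], first[1:]
    else:
        tally, remote = " ", first  # no tally code on this line
    return {
        "tally": tally,
        "remote": remote,
        "refid": fields[1],
        "stratum": int(fields[2]),
        "poll_s": int(fields[5]),
        # reach is printed in octal; show it as eight bits, newest poll last
        "reach_bits": format(int(fields[6], 8), "08b"),
        "delay_ms": float(fields[7]),
        "offset_ms": float(fields[8]),
        "jitter_ms": float(fields[9]),
    }


if __name__ == "__main__":
    peer = parse_ntpq_line(
        "*83.98.155.30 193.79.237.14 2 u 16 64 377 18.963 34.747 13.242")
    print(peer["tally"], peer["remote"], peer["offset_ms"], peer["reach_bits"])
```

A reach of 377 decodes to eight set bits, meaning all of the last eight polls were answered, so reachability is not the problem here.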

Any hints/tips?

--
J.

unruh

Dec 8, 2012, 3:12:18 PM

On 2012-12-08, Jeroen Mostert <jmos...@xs4all.nl> wrote:
> If my event log is to be believed, ntpd is adjusting the clock to times in the
> past (with pretty big intervals):
>
> Log Name: System
> Source: Microsoft-Windows-Kernel-General
> Date: 2012-12-08 15:18:52
> Event ID: 1
> Task Category: None
> Level: Information
> Keywords: Time
> User: PORTIA\ntp
> Computer: PORTIA
> Description:
> The system time has changed to 2012-12-08T14:18:52.347000000Z from
> 2012-12-08T14:18:52.569674500Z.
>
> As I understand it, it's not supposed to be doing this, instead it should slow
> the clock down.

No. ntpd jumps the time, forward or backward, if the time is out by more
than 128 microseconds. Since ntpd puts a limit of 500PPM on the clock
rate adjust, it would take a minimum of 3 hours to fix a 1 second error.
(and a year to fix a 1 hr error) if it only
used the slew rate to fix the time error.

On Linux there are two clock rate fix mechanisms: a fine rate adjust
and a coarse one. The coarse one can change the rate up to 100000PPM so
could in principle fix a 10 hr offset in 4 days, but ntpd does not use
that mechanism. It jumps instead (i.e., it has a fine rate adjust and an
infinite rate adjust only). I do not know if Windows has the same coarse
rate adjust mechanism.
Note that both operating systems use the coarse adjust at bootup when
they calibrate the system clock rate. If your timer rate is out by a
lot, you can recalibrate by hand and readjust the system clock rate
using that coarse adjust.
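
As a rough check on the slew arithmetic above: the time needed to remove an offset by slewing alone is just the offset divided by the rate limit. A small Python sketch (the function name is invented for illustration):

```python
def slew_time_seconds(offset_s, rate_ppm):
    """Seconds needed to remove offset_s when the clock rate is bent by rate_ppm."""
    return abs(offset_s) * 1e6 / rate_ppm


if __name__ == "__main__":
    cases = [
        ("1 s at 500 ppm", 1.0, 500),
        ("1 h at 500 ppm", 3600.0, 500),
        ("10 h at 100000 ppm", 36000.0, 100000),
    ]
    for label, offset, ppm in cases:
        secs = slew_time_seconds(offset, ppm)
        print(f"{label}: {secs:.0f} s = {secs / 3600:.1f} h = {secs / 86400:.2f} days")
```

At 500 ppm this gives about 33 minutes for a 1 s error and roughly 83 days for a 1 h error, so the figures quoted above are on the generous side. The 100000 ppm case does come out at a little over 4 days.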



The main question is whether or not your jumping indicates more severe
problems, i.e. why is the clock finding itself so far out that it has to
jump? Why is the usual ntpd fixing mechanism not working properly?
Do you get this behaviour only at the beginning, or after ntpd has been
running for days?

Jeroen Mostert

Dec 8, 2012, 3:49:07 PM

On 2012-12-08 21:12, unruh wrote:
> On 2012-12-08, Jeroen Mostert<jmos...@xs4all.nl> wrote:
>> If my event log is to be believed, ntpd is adjusting the clock to times in the
>> past (with pretty big intervals):
<snip>
>> As I understand it, it's not supposed to be doing this, instead it should slow
>> the clock down.
>
> No. ntpd jumps the time, forward or backward, if the time is out by more
> than 128 microseconds.

I think you mean 128 milliseconds. That's what I'm getting from the docs, anyway.

I thought ntp could adjust the time forwards in "big" steps, but would never
adjust the time backwards. I'm guessing I got confused with the slew-only option
(-x) for servers that absolutely cannot tolerate the clock going backwards, even
if it takes forever to adjust it through slew only.

> Note that both operating systems use the coarse adjust at bootup when
> they calibrate the system clock rate. If your timer rate is out by a
> lot, you can recalibrate by hand and readjust the system clock rate
> using that coarse adjust.
>
Since this machine more or less serves as a guinea pig/model for deployment in a
network, calibrating stuff by hand is basically not going to be an option in the
production environment. I'm just going to have to count on that working out.

> The main question is whether or not your jumping indicates more severe
> problems-- ie why is the clock finding itself so far out that it has to
> jump. Why is the usual ntpd fixing mechanism not working properly.
> Do you get this behaviour only at the beginning, or after ntpd has been
> running for days?
>
At the time of the event, ntpd had been running for 19 hours (same as system
uptime). This is of course not that long in NTP terms. The first clock
adjustment (logged in the event log) came at 40 seconds uptime, then 4 hours
later, and after that it started adjusting about every hour.

With your explanation, I'm perfectly willing to blame crappy network conditions
and/or crappy pool servers on the frequent adjustments. Looking at another
machine, I see zero (0) adjustments there since ntpd was configured to use a
central, local time server (stratum 2). This machine has no time-critical
services, so as long as it's accurate to within 1 sec/day I can't complain. I
was just confused about the clock going backwards.

--
J.

David Woolley

Dec 8, 2012, 4:03:55 PM

Jeroen Mostert wrote:

>
> I thought ntp could adjust the time forwards in "big" steps, but would
> never adjust the time backwards. I'm guessing I got confused with the
> slew-only option (-x) for servers that absolutely cannot tolerate the
> clock going backwards, even if it takes forever to adjust it through
> slew only.

Even -x doesn't disable steps, it just requires a larger error (600 s)
before stepping. Also note that increasing the error before stepping
forces ntpd into a degraded mode where it runs the discipline algorithm in
user space, every second, and runs the clock at +/- 500ppm for a
fraction of each second. In the normal mode, on suitable systems, the
kernel deals with the fine detail at the clock tick rate.
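
The step-versus-slew choice discussed here can be sketched as a simple threshold test. This is a deliberately simplified model in Python, not ntpd's actual logic: the real daemon also waits out a stepout interval before stepping, which is ignored here. The function name is invented; 0.128 s is the default threshold mentioned in this thread, and the larger value in the example is the 600 s that the ntpd documentation gives for -x, as far as I can tell:

```python
def correction_action(offset_s, step_threshold_s=0.128):
    """Return 'step' if the offset magnitude exceeds the threshold, else 'slew'.

    Simplified model: ignores ntpd's stepout timer and spike suppression.
    """
    return "step" if abs(offset_s) > step_threshold_s else "slew"


if __name__ == "__main__":
    offset = -0.2227  # roughly the backwards change seen in the event log
    print("default threshold:", correction_action(offset))
    print("raised threshold: ", correction_action(offset, step_threshold_s=600.0))
```

With the default threshold, an offset of about 223 ms falls on the "step" side, which is consistent with the event log entry that started the thread.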

>
> With your explanation, I'm perfectly willing to blame crappy network
> conditions and/or crappy pool servers on the frequent adjustments.
> Looking at another machine, I see zero (0) adjustments there since ntpd

If you suffer from temporary severe asymmetric delays, you can use the
huff and puff tinker option to try and compensate.

Jeroen Mostert

Dec 8, 2012, 4:18:14 PM

On 2012-12-08 22:03, David Woolley wrote:
> If you suffer from temporary severe asymmetric delays, you can use the huff and
> puff tinker option to try and compensate.
>
Given that the adjustments come about every hour, and the machine is not doing
anything particularly interesting with the network, *and* I've seen this
behavior on other Win7-machines configured to use the NTP pool servers (with
radically different network setup/use) I doubt that would do the trick. But I'll
keep it in mind.

For now I've tried setting NTPD_USE_INTERP_DANGEROUS, because 1) it has
DANGEROUS in the name, FOR SCIENCE! and 2) it's reported that the interpolation
may actually work again on Windows 7 machines, sometimes (it didn't work well on
Vista, which is why ntpd started using the clock directly).

Though it's really too early to tell, it's interesting that the offset has kept
to <40 ms since the last restart 2 hours ago and no adjustment to the system
time has been logged, so this may be a keeper.

--
J.

David Taylor

Dec 9, 2012, 3:37:29 AM

If NTP is regularly using big steps to adjust the time, it suggests that
something is wrong. I would take a look at the drift value to see
whether the clock on your PC has a frequency which is too far in error.
You can also enable loopstats collection, and use a tool such as
Meinberg's NTP Time Server Monitor or my own NTP Plotter to display the
results:

http://www.meinbergglobal.com/english/sw/ntp.htm
http://www.satsignal.eu/software/net.htm#NTPplotter

I would recommend moving to the newer pool directive rather than the
older multiple pool server lines:

http://www.satsignal.eu/ntp/setup.html#pool

as it may use more pool servers than you would but, more importantly, it
reviews the servers from time to time and will drop badly performing
ones (i.e. broken) and replace them with new ones.

NTP will use interpolation on Windows XP and Windows-8, but normally not
in Windows Vista or Windows-7. Yes, it can be worth experimenting with
these settings as you later report.

Does the poll increase from 64 to higher values as NTP runs, or is it
stuck on 64?

I have seen one issue on Windows-7 where, at boot-up, NTP makes the
wrong choice about interpolation because the system clock at that time
is running at 15.6 ms, whereas it will later switch to 1 ms (I may have
the explanation slightly wrong). For this reason, on both my Windows-7
LAN-synced PCs I have:

NTPD_USE_SYSTEM_CLOCK=1

But I do then see: "using Windows clock directly" in the NTP events.

Please let us know how you get on.
--
Cheers,
David
Web: http://www.satsignal.eu

Jeroen Mostert

Dec 9, 2012, 5:17:44 AM

Drift is currently at -2.3, and no abnormally high/low values have been recorded.

Loopstats collection has been on since the beginning. From the period where
these big adjustments happen, I get some suspicious data:

56269 38100.218 0.000000000 19.486 0.000000238 0.120120 9
56269 38105.421 -0.054125528 19.486 0.019136264 0.112362 9
56269 38109.421 -0.054433286 19.486 0.017900666 0.105105 9
56269 39160.464 -0.072777326 19.415 0.017956687 0.101492 9
56269 41770.153 0.000000000 19.415 0.000000238 0.094937 9
56269 41775.482 -0.002025903 19.415 0.000716265 0.088805 9
56269 41905.412 -0.044579941 19.409 0.015060036 0.083092 9
56269 43888.511 -0.066275609 19.287 0.016040319 0.088960 9
56269 47068.130 0.000000000 19.287 0.000000238 0.083214 9

The 0 offsets suggest ntpd regularly thinks we're now in perfect sync, something
which is certainly not true. I don't know how to properly interpret the error
and stability values.
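
For reference, the ntpd monitoring documentation describes each loopstats line as: Modified Julian Day, seconds past UTC midnight, time offset (s), frequency offset (ppm), RMS jitter (s), Allan deviation or "stability" (ppm), and the clock discipline time constant. A minimal Python sketch that labels the fields (function name and dictionary keys are made up for illustration):

```python
from datetime import date, timedelta

MJD_EPOCH = date(1858, 11, 17)  # Modified Julian Day 0


def parse_loopstats(line):
    """Turn one loopstats line into a dict with units made explicit."""
    mjd, secs, offset, freq, jitter, wander, tc = line.split()
    return {
        "date": MJD_EPOCH + timedelta(days=int(mjd)),
        "seconds_past_midnight": float(secs),
        "offset_ms": float(offset) * 1000.0,   # time offset
        "frequency_ppm": float(freq),          # drift compensation
        "jitter_ms": float(jitter) * 1000.0,   # the "error" column
        "wander_ppm": float(wander),           # the "stability" column
        "time_constant": int(tc),
    }


if __name__ == "__main__":
    rec = parse_loopstats(
        "56269 38105.421 -0.054125528 19.486 0.019136264 0.112362 9")
    print(rec["date"], f"{rec['offset_ms']:.3f} ms", f"{rec['jitter_ms']:.3f} ms")
```

Read this way, the second sample line is an offset of about -54 ms with about 19 ms of jitter on 2012-12-08.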

> and use a tool such as Meinberg's NTP Time Server Monitor or my own NTP
> Plotter to display the results:
>
> http://www.meinbergglobal.com/english/sw/ntp.htm
> http://www.satsignal.eu/software/net.htm#NTPplotter
>
Not to look a gift horse in the mouth, but the fact that your plotter has drag
and drop for input is annoying. This (to me) is one of the least convenient
mechanisms for providing input. Since the plotter cannot meaningfully run
without input, you might as well throw an Open File dialog up when it starts.
And/or use the Windows convention of supplying a File -> Open menu. I had to
actually look in the readme to even realize it used drag and drop at all.

The lack of feedback doesn't help either. When I drag peerstats.20121209 it's
accepted but nothing happens, but it doesn't tell me why not. (Experimentally,
the reason is that you must drag all files simultaneously, and the set must
contain loopstats. Dragging peerstats individually never does anything, whether
before or after it's already seen loopstats.)

As far as the results go, practically all the graphs look crazy, with enormous
spikes (as you'd imagine). I'll have to do more research before I can
meaningfully interpret them.

> I would recommend moving to the newer pool directive rather than the older
> multiple pool server lines:
>
> http://www.satsignal.eu/ntp/setup.html#pool
>
> as it may use more pool servers than you would but, more importantly, it reviews
> the servers from time to time and will drop badly performing ones (i.e. broken)
> and replace them with a new one.
>
Thanks. For some reason, none of the documentation I've read even mentions this
directive, and that includes the setup instructions of the pool.ntp.org website,
and all manpages for ntpd.conf I've come across. This is the first time I hear
about it.

> NTP will use interpolation on Windows XP and Windows-8, but normally not in
> Windows Vista or Windows-7. Yes, it can be worth experimenting with these
> settings as you later report.
>
> Does the poll increase from 64 to higher values as NTP runs, or is it stuck on 64?
>
It increases all the way up to 1024.

> I have seen one issue on Windows-7 where, at boot-up, NTP makes the wrong choice
> about interpolation because the system clock at that time is running at 15.6 ms,
> whereas it will later switch to 1 ms (I may have the explanation slightly
> wrong). For this reason, on both my Windows-7 LAN-synced PCs I have:
>
> NTPD_USE_SYSTEM_CLOCK=1
>
> But I do then see: "using Windows clock directly" in the NTP events.
>
Yes, that doesn't appear to be the issue.

> Please let us know how you get on.

The "pool" directive had some effect (I now have 9 servers instead of 4) and
initially the offset stayed under 10 ms, but it seems that as soon as the poll
interval goes above 64, the offset starts slipping -- currently at 30 ms. I'll
keep things under observation; I get the feeling it hasn't quite stabilized yet.

--
J.

David Woolley

Dec 9, 2012, 6:47:01 AM

> The 0 offsets suggest ntpd regularly thinks we're now in perfect sync,
> something which is certainly not true. I don't know how to properly
> interpret the error and stability values.
>

Zero offset means that the timestamp it read, plus half the round trip
time, matches what it thinks is the local clock time. There may be
errors in the timestamp, and the outward and return delay times may be
different.
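
The calculation described above is the standard NTP on-wire formula. A short Python sketch, with invented example timestamps, shows how unequal outward and return delays turn into an apparent offset even when both clocks agree:

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit time   t2: server receive time
    t3: server transmit time   t4: client receive time
    Returns (offset, round_trip_delay) in the same units as the inputs.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


if __name__ == "__main__":
    # Hypothetical exchange: both clocks are perfectly in sync, but the request
    # takes 5 ms to arrive and the reply takes 25 ms (1 ms server processing).
    offset, delay = ntp_offset_and_delay(0.000, 0.005, 0.006, 0.031)
    print(f"apparent offset {offset * 1000:.1f} ms, delay {delay * 1000:.1f} ms")
```

In this made-up case the computed offset is -10 ms, half the 20 ms difference between the two path delays, although the true offset is zero.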

My guess, if you are getting a lot of zeroes, is that both the current
system and the upstream system are not able to read their local times
very precisely.

David Taylor

Dec 9, 2012, 9:14:55 AM

On 09/12/2012 10:17, Jeroen Mostert wrote:
[]
I wonder whether your 0 offset values mean that NTP has restarted itself,
perhaps after making one of the time jumps?

My NTP Plotter program will take an input file or directory from the
command-line, as documented in the read-me, so you need only set up a
batch command once and double-click it to see the current data. I'm
always open to user suggestions, and so far no-one else has asked for a
File|Open dialogue! But you are right that the program expects at least
a loopstats file (or directory) to be dropped. It finds the peerstats
automatically.

The primary documentation for NTP is the set of HTML pages, not
"manpages". I do agree that the pool directive could be more widely
publicised, and I have suggested that Meinberg incorporate that into
their default install.

If you are still seeing stepping, I would want to investigate that
further. It could be that your Internet connection is not so good (any
Wi-Fi involved?), but it didn't look like that from the stats. One
possibility may be to set maxpoll lower than 10 in the pool directive.
It should be higher than 6 (64 seconds) to avoid being /too/ unfriendly
to the servers you are using, but perhaps 8 (256 seconds) might prevent
things from getting so far out that a reset is required.
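
The minpoll and maxpoll values are exponents of two, in seconds. A tiny Python sketch of the conversion, with an invented helper for the resulting query load per server:

```python
def poll_interval_seconds(exponent):
    """ntpd poll values are log2 of the interval in seconds."""
    return 2 ** exponent


def polls_per_day(exponent):
    """Approximate number of queries sent to one server per day."""
    return 86400 / poll_interval_seconds(exponent)


if __name__ == "__main__":
    for exp in (4, 6, 8, 10):
        print(f"poll {exp:2d}: every {poll_interval_seconds(exp):4d} s, "
              f"about {polls_per_day(exp):.0f} polls/day")
```

So maxpoll 8 means one query every 256 s, four times the load of the default ceiling of 10 (1024 s) but a quarter of the load at 6 (64 s).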

I don't think we're at the bottom of this yet. Are you running any
other software which might attempt to set the time? The W32time service
is disabled and stopped? No fancy audio-visual programs being run?
Nothing which completely hogs the CPU or saturates the network
connection? Just some odd things to think about!

Jeroen Mostert

Dec 9, 2012, 9:16:46 AM

On 2012-12-09 09:37, David Taylor wrote:
> I have seen one issue on Windows-7 where, at boot-up, NTP makes the wrong choice
> about interpolation because the system clock at that time is running at 15.6 ms,
> whereas it will later switch to 1 ms (I may have the explanation slightly
> wrong). For this reason, on both my Windows-7 LAN-synced PCs I have:
>
> NTPD_USE_SYSTEM_CLOCK=1
>
> But I do then see: "using Windows clock directly" in the NTP events.
>
For what it's worth, after a reboot (with all custom overrides removed), I saw
exactly the behavior you describe. And this is after the system is already up
and running, starting ntpd manually. ntpd logs this:

ntpd 4.2.7p310-o Oct 09 17:56:01.10 (UTC-00:00) 2012 (1): Starting
Raised to realtime priority class
Clock interrupt period 15.600 msec
Performance counter frequency 14.318 MHz
MM timer resolution: 1..1000000 msec, set to 1 msec
Windows clock precision 15.600 msec, min. slew 6.410 ppm/s
HZ 64.102 using 43 msec timer 23.256 Hz 64 deep

Even though the multimedia timer has been set to 1 msec,
GetSystemTimeAsFileTime() is still returning timestamps with a 15.6 msec offset.
This should be 1 msec. Running clockres (from SysInternals) gives back that the
timer interval is indeed 1 msec. Restarting NTP:

ntpd 4.2.7p310-o Oct 09 17:56:01.10 (UTC-00:00) 2012 (1): Starting
Raised to realtime priority class
Clock interrupt period 15.600 msec (startup slew -0.1 usec/period)
Performance counter frequency 14.318 MHz
MM timer resolution: 1..1000000 msec, set to 1 msec
Windows clock precision 1.000 msec, min. slew 6.410 ppm/s
using Windows clock directly

I'm not sure what the issue is, here. Either ntpd somehow fails to set the
multimedia timer (unlikely), or there is a delay between the multimedia timer
getting set and the resolution increase in GetSystemTimeAsFileTime(), or (worst)
GetSystemTimeAsFileTime() actually can't make up its mind. I'd need to whip up
some code to test this. If it's due to a (subsecond) delay, it would be trivial
to fix this in ntpd.
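
A test along these lines could look like the following Python sketch, which spins on a clock and reports the smallest increment it sees. The function name is invented, and the clock is passed in as a parameter because which OS call backs `time.time()` depends on the platform and Python version; to exercise GetSystemTimeAsFileTime() specifically, a wrapper around that call would have to be supplied instead:

```python
import time


def observed_granularity(clock=time.time, changes=50):
    """Spin on clock() and return the smallest positive increment observed.

    Counts `changes` value changes; backwards jumps are counted but ignored.
    """
    smallest = None
    last = clock()
    seen = 0
    while seen < changes:
        now = clock()
        if now != last:
            step = now - last
            if step > 0 and (smallest is None or step < smallest):
                smallest = step
            last = now
            seen += 1
    return smallest


if __name__ == "__main__":
    print(f"smallest observed tick: {observed_granularity() * 1000:.3f} ms")
```

Running it once right after raising the timer resolution and again a second or so later would show whether the granularity change takes effect with a delay.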

--
J.

Jeroen Mostert

Dec 9, 2012, 10:00:42 AM

On 2012-12-09 15:14, David Taylor wrote:
> On 09/12/2012 10:17, Jeroen Mostert wrote:
> []
>> Drift is currently at -2.3, and no abnormally high/low values have been
>> recorded.
>>
>> Loopstats collection has been on since the beginning. From the period
>> where these big adjustments happen, I get some suspicious data:
>>
>> 56269 38100.218 0.000000000 19.486 0.000000238 0.120120 9
>> 56269 38105.421 -0.054125528 19.486 0.019136264 0.112362 9
>> 56269 38109.421 -0.054433286 19.486 0.017900666 0.105105 9
>> 56269 39160.464 -0.072777326 19.415 0.017956687 0.101492 9
>> 56269 41770.153 0.000000000 19.415 0.000000238 0.094937 9
>> 56269 41775.482 -0.002025903 19.415 0.000716265 0.088805 9
>> 56269 41905.412 -0.044579941 19.409 0.015060036 0.083092 9
>> 56269 43888.511 -0.066275609 19.287 0.016040319 0.088960 9
>> 56269 47068.130 0.000000000 19.287 0.000000238 0.083214 9
>>
>> The 0 offsets suggest ntpd regularly thinks we're now in perfect sync,
>> something which is certainly not true. I don't know how to properly
>> interpret the error and stability values.
>>
<snip>
>> The "pool" directive had some effect (I now have 9 servers instead of 4)
>> and initially the offset stayed under 10 ms, but it seems that as soon
>> as the poll interval goes above 64, the offset starts slipping --
>> currently at 30 ms. I'll keep things under observation; I get the
>> feeling it hasn't quite stabilized yet.
>
> I wonder whether you 0 offset values mean that NTP has restarted itself, perhaps
> after making one the time jumps?
>
No, the service is definitely running continuously.

> My NTP Plotter program will take an input file or directory from the
> command-line, as documented in the read-me, so you need only set up a batch
> command once and double-click it to see the current data.

Fair enough. I have no problem not getting a File -> Open menu from the likes of
gnuplot, but that's because that *only* has a command-line interface so I expect
no better. If something starts up with a GUI, though, I expect to be able to use
it without reading documentation. Funny how that goes. :-)

> I'm always open to user suggestions, and so far no-one else has asked for a
> File|Open dialogue! But you are right that the program expects at least a
> loopstats file (or directory) to be dropped. It finds the peerstats
> automatically.
>
Provided it is named "peerstats.somethingsomething", right?

Unfortunately, the easy-to-find pages for troubleshooting NTP at
http://www.ntp.org/ntpfaq/NTP-s-trouble.htm arbitrarily rename the files to
"loops" and "peers", which is what I've been using (and other folks too, I'd
wager). I'm going to remove those options, but not just yet (since I don't want
to stitch files together).

> The primary documentation for NTP is the set of HTML pages, not "manpages".

Well, you're right. http://www.eecis.udel.edu/~mills/ntp/html/confopt.html
describes this option, as well as the Meinberg docs (which are an earlier
version). The information seems to be "ungoogleable" in that you must read the
whole thing top to bottom before you know it exists, but that is of course no
excuse.

It is a shame that nothing *outside* the reference documentation (in particular
quick-start guides) seems to describe the use of this option, though. It's also
unfortunate that ntpd has been around so long (with relatively few changes) that
outdated documentation is a dime a dozen.

> If you are still seeing stepping, I would want to investigate that further. It
> could be that your Internet connection is not so good (any Wi-Fi involved?), but
> it didn't look like that from the stats. One possibility may be to set maxpoll
> lower than 10 in the pool directive. It should be higher than 6 (64 seconds) to
> avoid being /too/ unfriendly to the servers you are using, but perhaps 8 (256
> seconds) might prevent things for getting so far out that a reset is required.
>
I would figure this might fix things, but I'm holding off on that because (as
you say) this is not friendly on servers and would also not fix/identify the
root cause -- if my admittedly meager understanding of how NTP works is correct,
then it shouldn't be necessary to poll the servers very frequently unless
there's something inherently non-linearly wrong with your clock.

> I don't think we're at the bottom of this yet. Are you running any other
> software which might attempt to set the time? The W32time service is disabled
> and stopped? No fancy audio-visual programs being run? Nothing which completely
> hogs the CPU or saturates the network connection? Just some odd things to think
> about!

Nothing of the sort. It is a consumer PC used for everything, including gaming,
and although no gaming has been going on there could in theory be some exotic
driver or piece of software mucking things up in a non-obvious way. If there is,
though, I have no idea how to find it other than through an extremely tedious
bisection that's really not worth it. If it turns out that NTP cannot work
without stepping my clock every so often, I can actually live with that. Not so
much if it happens on "real" machines, of course. This is strictly a research
project.

It is worth reporting, however, that no stepping has been occurring since I
first reported the problem. This could be due to a simple restart of ntpd and
not through any options I've tinkered with. I've tinkered with a *lot* of them,
and since I'm not a professional scientist, I have made no attempt at systematic
analysis of all options separately and combined, of course, just continued
tweaking while I wasn't satisfied :-)

I'm going to leave it alone for a bit to see the results after ntpd gets some
time to stabilize. For reference, I've done all of the following so far:

- Upgraded to ntpd 4.2.7p310.
- Replaced individual "server" lines with a single "pool" line.
- Adjusted power management options within Windows to make processors run at 100%.
- Updated the BIOS. (Also scratched my nose, which I expect has the same effect.)
- Turned off explicit processor frequency stepping options in the BIOS.
- Used "bcdedit /set useplatformclock on" to force exclusive use of the HPET
(I'm guessing this does nothing at the moment since interpolation isn't used).
- Last but not least, restarted ntpd.

Offsets are still horrid (>30 ms) but I notice the drift is swinging up too, so
my guess is that ntpd hasn't fixed on a good value yet after all the tinkering
(and/or my clock is just bogus for reasons yet unknown).

If that doesn't work out, I may take apart the network traffic (including the
DSL router) to see if that has anything to do with anything. Delay and jitter on
the NTP packets seem fairly high (although I don't know if that would explain
continuous bad offsets).

--
J.

Jeroen Mostert

Dec 9, 2012, 10:10:59 AM

On 2012-12-09 16:00, Jeroen Mostert wrote:
> I'm going to leave it alone for a bit to see the results after ntpd gets some
> time to stabilize. For reference, I've done all of the following so far:
>
> - Upgraded to ntpd 4.2.7p310.
> - Replaced individual "server" lines with a single "pool" line.
> - Adjusted power management options within Windows to make processors run at 100%.
> - Updated the BIOS. (Also scratched my nose, which I expect has the same effect.)
> - Turned off explicit processor frequency stepping options in the BIOS.
> - Used "bcdedit /set useplatformclock on" to force exclusive use of the HPET
> (I'm guessing this does nothing at the moment since interpolation isn't used).
> - Last but not least, restarted ntpd.
>
And forgot:

- Disabled Teredo tunneling. Likely nothing to do with the performance, but ntpd
was logging spurious events for the Teredo interfaces appearing/disappearing.
Since I do nothing with IPv6, I saw no reason why this should be on.

--
J.

David Taylor

Dec 9, 2012, 11:00:15 AM

On 09/12/2012 15:00, Jeroen Mostert wrote:
[]
> No, the service is definitely running continuously.
[]
> Fair enough. I have no problem not getting a File -> Open menu from the
> likes of gnuplot, but that's because that *only* has a command-line
> interface so I expect no better. If something starts up with a GUI,
> though, I expect to be able to use it without reading documentation.
> Funny how that goes. :-)
>
[]
> Provided it is named "peerstats.somethingsomething", right?
>
> Unfortunately, the easy-to-find pages for troubleshooting NTP at
> http://www.ntp.org/ntpfaq/NTP-s-trouble.htm arbitrarily rename the files
> to "loops" and "peers", which is what I've been using (and other folks
> too, I'd wager). I'm going to remove those options, but not just yet
> (since I don't want to stitch files together).
>
>> The primary documentation for NTP is the set of HTML pages, not
>> "manpages".
>
> Well, you're right.
> http://www.eecis.udel.edu/~mills/ntp/html/confopt.html describes this
> option, as well as the Meinberg docs (which are an earlier version). The
> information seems to be "ungoogleable" in that you must read the whole
> thing top to bottom before you know it exists, but that is of course no
> excuse.
>
> It is a shame that nothing *outside* the reference documentation (in
> particular quick-start guides) seems to describe the use of this option,
> though. It's also unfortunate that ntpd has been around so long (with
> relatively few changes) that outdated documentation is a dime a dozen.
[]
> I would figure this might fix things, but I'm holding off on that
> because (as you say) this is not friendly on servers and would also not
> fix/identify the root cause -- if my admittedly meager understanding of
> how NTP works is correct, then it shouldn't be necessary to poll the
> servers very frequently unless there's something inherently non-linearly
> wrong with your clock.
[]
NTP restarting - I meant a logical internal restart after a step, not
the service restarting.

I recall that the NTP Plotter is fairly open to accepting various
different file names automatically because, as you say, different places
have different recommendations. Sigh! Maybe I'll add a File|Open
dialog if the program starts with no file specified, but it's not top
priority at the moment.

Nothing describes "pool"? Well, my Windows set-up page (also referenced
by Meinberg) does for a start.

As I mentioned, I haven't tried Internet servers alone recently on
Windows except for one brief test with Windows-8 (because it has more
precise time routines):

http://www.satsignal.eu/ntp/Win-8+Internet.html

and I did a test with a Linux server, WAN-only as well:

http://www.satsignal.eu/ntp/2012-11-10-11-12-raspi1_ntp-b-day.png

If precise timekeeping is important, it may be worth upgrading to
Windows-8 (although I saw an improvement on only one of two PCs tested).

Re IPv6, I turn it off on all of my PCs as I don't use it either.

It may well be that 30 - 40 ms is the best you can expect without some
local server (e.g. FreeBSD/Linux/Win-8 box on your LAN), but stepping
every hour should not normally be happening.

Jeroen Mostert

Dec 9, 2012, 11:26:15 AM

On 2012-12-09 17:00, David Taylor wrote:
> On 09/12/2012 15:00, Jeroen Mostert wrote:
>> It is a shame that nothing *outside* the reference documentation (in
>> particular quick-start guides) seems to describe the use of this option,
>> though. It's also unfortunate that ntpd has been around so long (with
>> relatively few changes) that outdated documentation is a dime a dozen.
>
> Nothing describes "pool"? Well, my Windows set-up page (also referenced by
> Meinberg) does for a start.
>
Yes indeed: "David Taylor created a detailed step-by-step walkthrough of the
installation on his website."

Pshaw! As if I'm going to read some step-by-step walkthrough! There's
screenshots below the thing, right? This David Taylor guy can't possibly have
something more interesting to say.

The above is, of course, steeped in irony, since your site is a veritable
treasure trove for people setting up NTP under Windows. I really wish I'd
stumbled across it sooner. I know I didn't find it through the Meinberg site.

> As I mentioned, I haven't tried Internet servers alone recently on Windows
> except for one brief test with Windows-8 (because it has more precise time
> routines):
>
> http://www.satsignal.eu/ntp/Win-8+Internet.html
>
> and I did a test with a Linux server, WAN-only as well:
>
> http://www.satsignal.eu/ntp/2012-11-10-11-12-raspi1_ntp-b-day.png
>
> If precise timekeeping is important, it may be worth upgrading to Windows-8
> (although I saw an improvement on only one of two PCs tested).
>
As long as Microsoft still doesn't make it possible to upgrade the kernel
without upgrading the shell, that's not going to happen, because the Windows 8
UI is a serious productivity killer for me.

The broader context is that I'm working on an NTP setup for a "real" network,
where the only demand is that the servers stay within each other's times to some
reasonable degree (I'll take anything <= 10 ms), with only very lax demands on
synchronizing to absolute time. I've already discovered that this is actually
fairly easy as long as all machines sync to a single local server with some
frequency (setting minpoll 4 maxpoll 4 is no problem on a LAN, of course, and
keeps even the most wayward machines in line). I'm still happy I ran into
problems locally, because upgrading to 4.2.7 significantly reduces jitter even
in this setup (as reported by your site as well), so I'll be sure to slipstream
that in.
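
For reference, the minpoll and maxpoll values mentioned above are exponents of two, in seconds, so "minpoll 4 maxpoll 4" pins the polling interval to 16 seconds. A minimal sketch of the conversion (the helper name is made up for illustration; 6 and 10 are ntpd's usual defaults):

```python
def poll_interval_seconds(poll_exponent: int) -> int:
    """ntpd poll values are log2 of the interval in seconds."""
    return 2 ** poll_exponent


# The LAN setting from the post, next to ntpd's usual default range.
for label, exponent in (("minpoll/maxpoll 4", 4),
                        ("default minpoll 6", 6),
                        ("default maxpoll 10", 10)):
    print(f"{label}: {poll_interval_seconds(exponent)} s")
```

The same encoding appears in the last column of loopstats, while ntpq shows the interval already converted to seconds.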

Of course, upgrading all machines to Windows 8/Windows Server 2012 just to get
better time synchronization isn't going to happen. :-)

For now the local server is a Windows Server 2008 machine, but I've already
petitioned for a dedicated Linux machine (FreeBSD would be too "out there" for
IT, I'm afraid...) If possible, we can later upgrade that to something with a
GPS, but it's probably not necessary at the moment.

> It may well be that 30 - 40 ms is the best you can expect without some local
> server (e.g. FreeBSD/Linux/Win-8 box on your LAN), but stepping every hour
> should not normally be happening.

If 30-40 ms is realistic for syncing to internet time, I'll take it. It's
slightly disappointing, but I guess the limitation is in Windows more than it is
in NTP.

--
J.

David Taylor

unread,
Dec 9, 2012, 12:18:17 PM12/9/12
to
On 09/12/2012 16:26, Jeroen Mostert wrote:
[]
> As long as Microsoft still doesn't make it possible to upgrade the
> kernel without upgrading the shell, that's not going to happen, because
> the Windows 8 UI is a serious productivity killer for me.
>
> The broader context is that I'm working on an NTP setup for a "real"
> network, where the only demand is that the servers stay within each
> other's times to some reasonable degree (I'll take anything <= 10 ms),
> with only very lax demands on synchronizing to absolute time. I've
> already discovered that this is actually fairly easy as long as all
> machines sync to a single local server with some frequency (setting
> minpoll 4 maxpoll 4 is no problem on a LAN, of course, and keeps even
> the most wayward machines in line). I'm still happy I ran into problems
> locally, because upgrading to 4.2.7 significantly reduces jitter even in
> this setup (as reported by your site as well), so I'll be sure to
> slipstream that in.
>
> Of course, upgrading all machines to Windows 8/Windows Server 2012 just
> to get better time synchronization isn't going to happen. :-)
>
> For now the local server is a Windows Server 2008 machine, but I've
> already petitioned for a dedicated Linux machine (FreeBSD would be too
> "out there" for IT, I'm afraid...) If possible, we can later upgrade
> that to something with a GPS, but it's probably not necessary at the
> moment.
[]
> If 30-40 ms is realistic for syncing to internet time, I'll take it.
> It's slightly disappointing, but I guess the limitation is in
> Windows more than it is in NTP.

Win-8 is almost the same as Win-7, just add a start-menu add-on such as
ClassicShell. Quite tolerable, then (although DOS shell, run as
Administrator takes a little more effort!).

What I would agree for your situation is to have a local server to which
you tight-lock (low min/maxpoll) the rest. But as a result of my own
recent tests, I would make that local server a Raspberry Pi or similar
(it's very low cost) running Linux and synced to the Internet, and if it
suits, make that a stratum-1 server with a GPS/PPS for a score of
dollar/pounds more.

No need to upgrade all PCs, just one server. PCs Hydra and Narvik here
are on a LAN connection.

http://www.satsignal.eu/mrtg/performance_ntp.php

and PCs Molde, Puffin, Torvik and Ystad are Wi-Fi connected (which
doesn't help timekeeping). Note the difference between XP and Windows-7.

Your suggested arrangement of a Linux time server (which can be a very
simple, low-powered PC) and tight coupling of the rest should give you
excellent results.

Jeroen Mostert

unread,
Dec 9, 2012, 12:59:10 PM12/9/12
to
Yeah, no. I categorically refuse to "upgrade" my OS (at a non-zero cost) and
then "install some extra stuff" just to make it *acceptable* again. That's just
insane. I guess if you need the new features in Windows 8, it might be worth
considering the time and effort, but at present I know of no such features. The
increased clock accuracy isn't one.

I'm waiting for Windows NT 6.3, or 7.0, whatever the marketing name ends up
being. If that still has Metro, I'll see how I'll compensate. But I'm hoping it
won't.

> What I would agree for your situation is to have a local server to which you
> tight-lock (low min/maxpoll) the rest. But as a result of my own recent tests, I
> would make that local server a Raspberry Pi or similar (it's very low cost)
> running Linux and synced to the Internet, and if it suits, make that a stratum-1
> server with a GPS/PPS for a score of dollar/pounds more.
>
I work for a Windows shop with too much money and too little time. It's going to
be a spare server we had lying around; suggesting folks get enthusiastic with
hardware that's not from Dell isn't going to fly. :-) So the Raspberry Pi is
out, and upgrading to a stratum 1 only if there's a demonstrable need.

This is all basically a spare time effort from me, to enable more accurate
performance analysis of cross-server events. It's not exactly mission critical,
but nice to have. There was a significant period of time where we had no time
syncing whatsoever, and we only started to run into problems when one machine
was 19 seconds off from actual time. I then configured w32time on all machines,
but w32time is not particularly robust. Microsoft itself admits it's just there
to make sure Kerberos doesn't completely fail, and as long as it keeps time
within 5 minutes from a domain controller they don't care. I'm now at a point
where I think machines deviating from each other for more than 10 ms is
annoying, so that won't do anymore.

> No need to upgrade all PCs, just one server. PCs Hydra and Narvik here are on a
> LAN connection.
>
> http://www.satsignal.eu/mrtg/performance_ntp.php
>
> and PCs Molde, Puffin, Torvik and Ystad are Wi-Fi connected (which doesn't help
> timekeeping). Note the difference between XP and Windows-7.
>
Statistics like those would be more than acceptable for my purposes. Your worst
stats would be good enough for me.

> Your suggested arrangement of a Linux time server (which can be a very simple,
> low-powered PC) and tight coupling of the rest should give you excellent results.

It's actually going to be far heavier than is necessary, but I'm confident it'll
get the job done.

--
J.

David Taylor

unread,
Dec 9, 2012, 2:17:35 PM12/9/12
to
On 09/12/2012 17:59, Jeroen Mostert wrote:
[]
> Yeah, no. I categorically refuse to "upgrade" my OS (at a non-zero cost)
> and then "install some extra stuff" just to make it *acceptable* again.
> That's just insane. I guess if you need the new features in Windows 8,
> it might be worth considering the time and effort, but at present I know
> of no such features. The increased clock accuracy isn't one.
>
> I'm waiting for Windows NT 6.3, or 7.0, whatever the marketing name ends
> up being. If that still has Metro, I'll see how I'll compensate. But I'm
> hoping it won't.
[]
> I work for a Windows shop with too much money and too little time. It's
> going to be a spare server we had lying around; suggesting folks get
> enthusiastic with hardware that's not from Dell isn't going to fly. :-)
> So the Raspberry Pi is out, and upgrading to a stratum 1 only if there's
> a demonstrable need.
>
> This is all basically a spare time effort from me, to enable more
> accurate performance analysis of cross-server events. It's not exactly
> mission critical, but nice to have. There was a significant period of
> time where we had no time syncing whatsoever, and we only started to run
> into problems when one machine was 19 seconds off from actual time. I
> then configured w32time on all machines, but w32time is not particularly
> robust. Microsoft itself admits it's just there to make sure Kerberos
> doesn't completely fail, and as long as it keeps time within 5 minutes
> from a domain controller they don't care. I'm now at a point where I
> think machines deviating from each other for more than 10 ms is
> annoying, so that won't do anymore.
[]
> Statistics like those would be more than acceptable for my purposes.
> Your worst stats would be good enough for me.
[]
> It's actually going to be far heavier than is necessary, but I'm
> confident it'll get the job done.

I'm on a subscription for the Windows OS, and many of my customers are
now upgrading to Windows-8, or buying new PCs with that OS, so I really
have no choice but to invest. Windows-7 is a fine OS for general purpose
work and, if asked, I don't see any need for folk to upgrade. In fact
Windows XP has many advantages as a lighter-weight OS than Win-7.

Of course, if you have a Dell spare box, use that. Linux or FreeBSD. I
used to be in a similar job myself, and actually introduced my branch of
the company to NTP, which we used on both Windows and UNIX boxes. We
were a DEC shop, but whether we ever got round to NTP on VMS I don't
recall. I do recall Spring and Autumn events involving time, so perhaps
we did not.

My aim is to be within 1 millisecond, as the events my systems record
have a resolution of 1 ms. It's nice, but not essential, to be able to
correlate events across four separate satellite data reception systems.

Glad the stats were of some help, and glad to have had the chat.

Uwe Klein

unread,
Dec 10, 2012, 4:23:28 AM12/10/12
to
David Taylor wrote:
> Glad the stats were of some help, and glad to have had the chat.

ages ago I needed to timestamp large datasets from a spectrometer rushing
in at 50 .. 100 Sample Sets/s. ( on SYSVR3.5 on a MVME167 System )
One way would have been to sync the system via DCF77 and ntp or similar.
This would have presented a significant porting effort.

My solution was to sample a GPIO pin attached to the DCF77 signal
and insert this information together with the freerunning system uptime variable
into my data sets.

A utility later extracted the DCF time and synced it to the uptime counter.
The only hard requirement was for the uptime counter to be short-term stable.
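
A rough sketch of that kind of after-the-fact conversion, assuming the utility collected pairs of (uptime counter, decoded DCF77 time) and fitted a straight line through them. The least-squares fit, the function names and the sample numbers are all illustrative assumptions, not a description of the original utility:

```python
def fit_uptime_to_wallclock(marks):
    """Least-squares line wall = rate * uptime + intercept through (uptime, wall) pairs."""
    n = len(marks)
    mean_u = sum(u for u, _ in marks) / n
    mean_w = sum(w for _, w in marks) / n
    sxx = sum((u - mean_u) ** 2 for u, _ in marks)
    sxy = sum((u - mean_u) * (w - mean_w) for u, w in marks)
    rate = sxy / sxx
    return rate, mean_w - rate * mean_u


def uptime_to_wallclock(uptime, rate, intercept):
    """Convert a free-running uptime stamp to wall-clock seconds."""
    return rate * uptime + intercept


# Hypothetical recording: one time mark per minute, seen by an uptime counter
# that read 12.5 s at the first mark and runs 50 ppm fast.
marks = [(12.5 + 60.0 * k * 1.00005, 1000.0 + 60.0 * k) for k in range(10)]
rate, intercept = fit_uptime_to_wallclock(marks)

# A data sample stamped halfway between the first two marks.
sample_uptime = 12.5 + 30.0 * 1.00005
print(round(uptime_to_wallclock(sample_uptime, rate, intercept), 6))
```

Because the fit absorbs both the start offset and a constant frequency error, the counter only has to be stable over the length of a recording, which matches the requirement stated above.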

uwe


Jeroen Mostert

unread,
Dec 10, 2012, 2:08:06 PM12/10/12
to
For what it's worth, after all of that, the offset is steadily zigzagging
between 27 and 41 ms, which I'm guessing is about the best you can hope for on a
Windows machine with Internet sync. There have not been any major time step
adjustments.

Thanks to all involved and especially David Taylor for his excellent site on NTP
on Windows.

--
J.

unruh

unread,
Dec 10, 2012, 3:17:13 PM12/10/12
to
Since a Linux machine under conditions of use of the internet for time
stamps is capable of sub-millisecond synchronization, this seems really
bad to me. I thought that clock interrupt went at 1ms intervals, and one
should be able to do that well even without interpolation, and with
interpolation even better.

Jeroen Mostert

unread,
Dec 10, 2012, 3:41:02 PM12/10/12
to
On 2012-12-10 21:17, unruh wrote:
> On 2012-12-10, Jeroen Mostert<jmos...@xs4all.nl> wrote:
<snip>
>> For what it's worth, after all of that, the offset is steadily zigzagging
>> between 27 and 41 ms, which I'm guessing is about the best you can hope for on a
>> Windows machine with Internet sync. There have not been any major time step
>> adjustments.
>
> Since a Linux machine under conditions of use of the internet for time
> stamps is capable of sub-millisecond synchronization, this seems really
> bad to me. I thought that clock interrupt went at 1ms intervals, and one
> should be able to do that well even without interpolation, and with
> interpolation even better.
>
If you say so. I have no idea how I would further diagnose a problem, if there
is one, and fix it, if there is a solution. The obvious fix would be to install
Linux, but for equally obvious reasons I'm not willing to go there. :-)

NTP (and other tools) are indeed reporting a 1 ms resolution of the clock, and I
can see the interrupt rate on core 0 is consistently above 1000 interrupts/sec,
so I'm guessing that holds up.

--
J.

David Woolley

unread,
Dec 10, 2012, 4:18:28 PM12/10/12
to
Jeroen Mostert wrote:

>>
> For what it's worth, after all of that, the offset is steadily
> zigzagging between 27 and 41 ms, which I'm guessing is about the best
> you can hope for on a Windows machine with Internet sync. There have not
> been any major time step adjustments.

Offsets should be scattered around zero. If they are all the same sign,
something is wrong.

Jeroen Mostert

unread,
Dec 10, 2012, 4:34:03 PM12/10/12
to
OK. Well, that's too bad, I guess.

Throw me a bone here, fellas. It's nice to know something is wrong, but if all I
can see is that all peers consistently report offsets in this range, the best I
can conclude is that either ntpd simply isn't successful in disciplining the
clock appropriately, or there is some network problem going on that someone who
understands the NTP protocol could probably diagnose. That someone would not yet
be me.

If I offset the offsets (pardon my math) by -33 milliseconds, I'd roughly have a
zero axis that they'd be swinging around with a range of -10/+10. According to
unruh, even that would still be horrible in terms of accuracy, so I'm not sure
if that's fair in its simplicity.

--
J.

unruh

unread,
Dec 10, 2012, 4:54:17 PM12/10/12
to
On 2012-12-10, Jeroen Mostert <jmos...@xs4all.nl> wrote:
Ah, if it really is 33 ms off on average, then yes something is wrong.
I assumed that you meant that the width of the scatter was 27-41ms, not
that the actual offset was always one sign and that far off.

However I do have to ask what this offset is.
What are your server entries in ntp.conf? Is this offset the offset from
the same server that is being used to set the time? Maybe a few lines
from /var/log/ntp/peerstats.xxxxxxx would let us see what the offset was
doing. Or using some of the plotting programs to plot the offsets
measured from the various servers.

Jeroen Mostert

unread,
Dec 10, 2012, 5:13:09 PM12/10/12
to
pool nl.pool.ntp.org iburst

> Is this offset the offset from the same server that is being used to set the
> time? Maybe a few lines from /var/log/ntp/peerstats.xxxxxxx would let us see
> what the offset was doing. Or using some of the plotting programs to plot the
> offsets measured from the various servers.
>
ntpq -np:

 remote          refid           st t when poll reach   delay  offset jitter
============================================================================
 nl.pool.ntp.org .POOL.          16 p    -   64     0   0.000   0.000  0.977
*213.136.0.252   .PPS.            1 u  800 1024   377  18.958  35.896 10.670
+213.239.154.12  193.190.230.65   2 u  572 1024   377  18.924  35.018  8.883
+81.171.44.131   193.190.230.66   2 u  528 1024   377  18.855  34.990  7.952
+46.19.33.5      193.79.237.14    2 u  381 1024   377  17.953  35.253  8.111
+95.211.111.53   221.238.121.188  3 u  674 1024   377  18.924  35.641  8.806
+195.191.113.251 193.79.237.14    2 u  694 1024   377  19.920  34.866  7.075
+178.251.121.16  193.67.79.202    2 u    7 1024   377  18.956  33.863  8.940
-83.98.201.133   153.60.75.10     2 u  703 1024   377  17.986  33.285  7.294
+192.87.106.2    192.87.106.3     2 u  484 1024   377  17.853  32.520  6.394


peerstats tail of today (cutting off the first column to prevent wrapping):

77050.630 46.19.33.5 1424 0.032273618 0.018946933 0.016009420 0.005895699
77422.629 178.251.121.16 141a 0.034412494 0.018950946 0.016068324 0.009463505
77724.626 213.136.0.252 161a 0.035895776 0.018958086 0.016099349 0.010622052
77792.626 83.98.201.133 133a 0.035527840 0.018976950 0.016341863 0.009205395
77925.626 213.239.154.12 141a 0.031416422 0.017919850 0.020178597 0.006312898
77982.626 81.171.44.131 143a 0.031503337 0.018806412 0.020621400 0.005369126
78106.626 46.19.33.5 1424 0.035252534 0.017952556 0.016150506 0.008054216
78867.627 83.98.201.133 133a 0.033285314 0.017985677 0.016441743 0.007294158
78876.629 195.191.113.251 143a 0.034866163 0.019920461 0.016225952 0.007074704
78896.628 95.211.111.53 1424 0.035640734 0.018924019 0.020330239 0.008805813
78998.630 213.239.154.12 141a 0.035017893 0.018924218 0.020397573 0.008882811
79042.628 81.171.44.131 143a 0.034990054 0.018854869 0.020404077 0.007952290
79086.627 192.87.106.2 141a 0.032519558 0.017852997 0.016296482 0.006394490
79563.628 178.251.121.16 141a 0.033863486 0.018955711 0.020809018 0.008940402

As far as I can tell the peer offsets are OK, if you just ignore the fact that
they're all too high -- and stay that way.
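
To put a number on "all too high", here is a small sketch that parses the peerstats lines quoted above (first column already cut off, so the fields are seconds, address, status, offset, delay, dispersion, jitter) and reports the mean offset, whether every sample has the same sign, and the spread left once the mean is removed. The variable names are illustrative only:

```python
from statistics import mean

# peerstats excerpt from the post, with the leading day column removed.
PEERSTATS = """\
77050.630 46.19.33.5 1424 0.032273618 0.018946933 0.016009420 0.005895699
77422.629 178.251.121.16 141a 0.034412494 0.018950946 0.016068324 0.009463505
77724.626 213.136.0.252 161a 0.035895776 0.018958086 0.016099349 0.010622052
77792.626 83.98.201.133 133a 0.035527840 0.018976950 0.016341863 0.009205395
77925.626 213.239.154.12 141a 0.031416422 0.017919850 0.020178597 0.006312898
77982.626 81.171.44.131 143a 0.031503337 0.018806412 0.020621400 0.005369126
78106.626 46.19.33.5 1424 0.035252534 0.017952556 0.016150506 0.008054216
78867.627 83.98.201.133 133a 0.033285314 0.017985677 0.016441743 0.007294158
78876.629 195.191.113.251 143a 0.034866163 0.019920461 0.016225952 0.007074704
78896.628 95.211.111.53 1424 0.035640734 0.018924019 0.020330239 0.008805813
78998.630 213.239.154.12 141a 0.035017893 0.018924218 0.020397573 0.008882811
79042.628 81.171.44.131 143a 0.034990054 0.018854869 0.020404077 0.007952290
79086.627 192.87.106.2 141a 0.032519558 0.017852997 0.016296482 0.006394490
79563.628 178.251.121.16 141a 0.033863486 0.018955711 0.020809018 0.008940402
"""

# Field 3 (zero-based) is the measured offset in seconds; convert to ms.
offsets_ms = [float(line.split()[3]) * 1000.0
              for line in PEERSTATS.splitlines() if line.strip()]

bias_ms = mean(offsets_ms)
same_sign = all(o > 0 for o in offsets_ms) or all(o < 0 for o in offsets_ms)
residuals_ms = [o - bias_ms for o in offsets_ms]

print(f"mean offset {bias_ms:.3f} ms, same sign: {same_sign}")
print(f"spread around mean: {min(residuals_ms):+.3f} .. {max(residuals_ms):+.3f} ms")
```

A persistent same-sign mean like this is the symptom David Woolley describes: the samples agree with each other, but the clock is not being steered to where they point.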

--
J.

David Taylor

unread,
Dec 11, 2012, 6:42:34 AM12/11/12
to
On 10/12/2012 20:17, unruh wrote:
[]
> Since a Linux machine under conditions of use of the internet for time
> stamps is capable of sub-millisecond synchronization, this seems really
> bad to me. I thought that clock interrupt went at 1ms intervals, and one
> should be able to do that well even without interpolation, and with
> interpolation even better.

The latest version of Windows (Win-8) can manage within 5 milliseconds
using just Internet servers:

http://www.satsignal.eu/ntp/Win-8+Internet.html

It returns a more precise clock time, but there still remains one issue
which limits its performance (the quantisation of the clock adjustment -
it reports it differently from the actual value, sigh!).

As you say, Linux timekeeping can do better than Windows, but this is
hardly news.

Cheers,
David

David Taylor

unread,
Dec 11, 2012, 6:49:15 AM12/11/12
to
What happens if the link to the Internet is rather asymmetrical? For
example, here I am stuck with 30 Mb/s down, but only 3 Mb/s up.

I also note that with my PCs being stratum-1 servers or synced to local
stratum-1, there is an offset to the WAN servers of some number of
milliseconds, typically between +3 ms and +6 ms, but the LAN servers
have a near zero average offset.

David Lord

unread,
Dec 11, 2012, 9:20:34 AM12/11/12
to
Here just now I have only MSF stratum-1 which from loop_summary
is varying widely, over last 7 days from 61+/-1070 rms=266
to 2251+/-4282 rms=527. ADSL is about 12 Mbps down, 1200 kbps up.
Other server 21+/-598 rms=166 to 177+/-794 rms=368. During that
period NetBSD kernel and userland have been updated on both
servers involving 2-3 reboots.

The different down/up latency gets lost in the noise. I should
have GPS+PPS reconnected next week but don't remember that
showing any significant offset from the server with only internet
ntp sources.

David



Rick Jones

unread,
Dec 11, 2012, 2:35:25 PM12/11/12
to
David Taylor <david-...@blueyonder.co.uk.invalid> wrote:

> What happens if the link to the Internet is rather asymmetrical? For
> example, here I am stuck with 30 Mb/s down, but only 3 Mb/s up.

Handwaving a bit... The query and the response seem to be something
like 90 bytes (including an Ethernet header). That then is 720 bits.
At 3 Mbits/s that would be 0.24 milliseconds of transmit time. It
would then be 0.024 milliseconds on the downlink. I suspect that
asymmetry is dwarfed by either queueing delays (bufferbloat) when
either/both are active and/or the rest of the delays from your system
to the server(s).
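
The arithmetic above as a small sketch, taking the 90-byte frame and the 3 and 30 Mbit/s link speeds as given. The last line assumes the standard NTP behaviour that half of a one-way delay difference appears as offset error:

```python
def serialization_delay_ms(frame_bytes: int, link_bits_per_s: float) -> float:
    """Time to clock one frame onto a link, in milliseconds."""
    return frame_bytes * 8 / link_bits_per_s * 1000.0


FRAME_BYTES = 90  # rough NTP packet size including Ethernet header, per the estimate above

up_ms = serialization_delay_ms(FRAME_BYTES, 3e6)     # 3 Mbit/s uplink
down_ms = serialization_delay_ms(FRAME_BYTES, 30e6)  # 30 Mbit/s downlink

# NTP assumes a symmetric path, so only half the one-way difference
# ends up in the computed offset.
asymmetry_error_ms = (up_ms - down_ms) / 2

print(f"uplink {up_ms:.3f} ms, downlink {down_ms:.3f} ms, "
      f"offset error {asymmetry_error_ms:.3f} ms")
```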

rick jones
--
It is not a question of half full or empty - the glass has a leak.
The real question is "Can it be patched?"
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

David Taylor

unread,
Dec 11, 2012, 2:58:38 PM12/11/12
to
On 11/12/2012 19:35, Rick Jones wrote:
> David Taylor <david-...@blueyonder.co.uk.invalid> wrote:
>
>> What happens if the link to the Internet is rather asymmetrical? For
>> example, here I am stuck with 30 Mb/s down, but only 3 Mb/s up.
>
> Handwaving a bit... The query and the response seem to be something
> like 90 bytes (including an Ethernet header). That then is 720 bits.
> At 3 Mbits/s that would be 0.24 milliseconds of transmit time. It
> would then be 0.024 milliseconds on the downlink. I suspect that
> asymmetry is dwarfed by either queueing delays (bufferbloat) when
> either/both are active and/or the rest of the delays from your system
> to the server(s).
>
> rick jones

Yes, I see what you mean, but I was thinking of the whole network, and
not simply the up and down speeds. That we have a 10:1 ratio may
suggest how the rest of that ISP's network is configured. The delays in
the cable modem, Samknows box, router and 1 Gb/s switch are likely to be
far in excess of the 0.024 ms and even the 0.24 ms!

Jeroen Mostert

unread,
Dec 11, 2012, 3:24:35 PM12/11/12
to
I noticed the drift was ever-increasing, slowly but surely. Since I muddled with
settings quite a bit, I stopped ntpd, deleted ntp.drift and restarted it.

The results are quite interesting.

56272 68581.536 0.022139147 2.356 0.006080932 0.008049 10
56272 68781.535 0.021885071 2.357 0.005698660 0.007537 10
56272 69311.535 0.020898356 2.359 0.005342012 0.007109 10
56272 71122.534 0.023597293 0.000 0.008392764 0.000000 6
56272 71458.536 0.013381279 39.825 0.012521981 0.000000 6
56272 71524.551 0.009503058 39.825 0.011793222 0.000000 6
56272 71527.552 0.000754424 39.825 0.011456980 0.000000 6
56272 71528.552 0.000233147 39.825 0.010722584 0.000000 6
56272 71930.577 -0.013119319 39.511 0.011085491 0.111141 6
56272 72003.578 -0.016005287 39.441 0.010419607 0.106838 6
56272 72207.579 -0.031178859 39.062 0.011125504 0.167193 6
56272 72331.588 -0.032740845 38.820 0.010421598 0.178267 6
56272 72607.596 -0.036786552 38.215 0.009852891 0.271267 7
56272 72611.597 -0.048063956 38.212 0.010042013 0.253749 7
56272 72667.599 -0.050119893 38.170 0.009421524 0.237821 7
56272 72960.613 -0.050045193 37.952 0.008819790 0.235492 6
56272 73291.594 -0.049274393 36.980 0.008257379 0.408236 6
56272 73419.587 -0.049355146 36.603 0.007731784 0.404411 6

So now NTP is betting very high on the drift and then shoots the clock in the
other direction as a result.

I'll give it time to stabilize. Assuming it'll stabilize. I may end up with the
same offset at the opposite sign, who knows?

--
J.

E-Mail Sent to this address will be added to the BlackLists

unread,
Dec 11, 2012, 4:10:08 PM12/11/12
to
Jeroen Mostert wrote:
> I noticed the drift was ever-increasing, slowly but surely.
> Since I muddled with settings quite a bit, I stopped ntpd,
> deleted ntp.drift and restarted it.
> The results are quite interesting.
> 56272 68581.536 0.022139147 2.356 0.006080932 0.008049 10
> 56272 68781.535 0.021885071 2.357 0.005698660 0.007537 10
> 56272 69311.535 0.020898356 2.359 0.005342012 0.007109 10
> 56272 71122.534 0.023597293 0.000 0.008392764 0.000000 6
> 56272 71458.536 0.013381279 39.825 0.012521981 0.000000 6
> 56272 71524.551 0.009503058 39.825 0.011793222 0.000000 6
> 56272 71527.552 0.000754424 39.825 0.011456980 0.000000 6
> 56272 71528.552 0.000233147 39.825 0.010722584 0.000000 6
> 56272 71930.577 -0.013119319 39.511 0.011085491 0.111141 6
> 56272 72003.578 -0.016005287 39.441 0.010419607 0.106838 6
> 56272 72207.579 -0.031178859 39.062 0.011125504 0.167193 6
> 56272 72331.588 -0.032740845 38.820 0.010421598 0.178267 6
> 56272 72607.596 -0.036786552 38.215 0.009852891 0.271267 7
> 56272 72611.597 -0.048063956 38.212 0.010042013 0.253749 7
> 56272 72667.599 -0.050119893 38.170 0.009421524 0.237821 7
> 56272 72960.613 -0.050045193 37.952 0.008819790 0.235492 6
> 56272 73291.594 -0.049274393 36.980 0.008257379 0.408236 6
> 56272 73419.587 -0.049355146 36.603 0.007731784 0.404411 6
day, second, offset, drift comp, est error, stability, poll

I looked at several of the last few days of loopstats
on a few machines; it appears your drift compensation
change rate is perhaps hundreds of times what mine are?

I'm not certain what that means, is the PC/Device experiencing
large Temperature Swings, or Cpu/Core Frequency/Power Management?


Ref: <http://dx.eng.uiowa.edu/dave/ntptemptext.php> ?

--
E-Mail Sent to this address <Blac...@Anitech-Systems.com>
will be added to the BlackLists.

Rick Jones

unread,
Dec 11, 2012, 4:23:11 PM12/11/12
to
David Taylor <david-...@blueyonder.co.uk.invalid> wrote:
> On 11/12/2012 19:35, Rick Jones wrote:
> > Handwaving a bit... The query and the response seem to be
> > something like 90 bytes (including an Ethernet header). That then
> > is 720 bits. At 3 Mbits/s that would be 0.24 milliseconds of
> > transmit time. It would then be 0.024 milliseconds on the
> > downlink. I suspect that asymmetry is dwarfed by either queueing
> > delays (bufferbloat) when either/both are active and/or the rest
> > of the delays from your system to the server(s).

> Yes, I see what you mean, but I was thinking of the whole network,
> and not simply the up and down speeds. That we have a 10:1 ratio
> may suggest how the rest of that ISP's network is configured. The
> delays in the cable modem, Samknows box, router and 1 Gb/s switch
> are likely to be far in excess of the 0.024 ms and even the 0.24 ms!

Well, I have zero direct knowledge of the internals of an ISP these
days, but my understanding of the/a rationale behind the asymmetry to
the home has to do with allocating limited
bandwidth/frequency/spectrum on the physical plant from the ISP's end
to the home.

If the ISP's internal network (beyond the "head end" or what ever it
would be called) is built on more standard ethernet-ish things, I
would expect there to be equivalent bandwidth in each direction since
to my knowledge those things have symmetric bandwidth. Of course that
doesn't guarantee symmetric routing...

rick jones
--
The glass is neither half-empty nor half-full. The glass has a leak.

David Woolley

unread,
Dec 11, 2012, 4:46:29 PM12/11/12
to
David Taylor wrote:
> On 10/12/2012 21:18, David Woolley wrote:
>> Jeroen Mostert wrote:
>>
>>>>
>>> For what it's worth, after all of that, the offset is steadily
>>> zigzagging between 27 and 41 ms, which I'm guessing is about the best
>>> you can hope for on a Windows machine with Internet sync. There have
>>> not been any major time step adjustments.
>>
>> Offsets should be scattered around zero. If they are all the same sign,
>> something is wrong.
>
> What happens if the link to the Internet is rather asymmetrical? For
> example, here I am stuck with 30 Mb/s down, but only 3 Mb/s up.

If there is an actual asymmetry in the delays (the slow uplink may have
lower delays, because it is unloaded!), the offsets will still be spread
across zero, but zero will not correspond to the true time.
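
A sketch of why that is, using the standard NTP on-wire offset and delay formulas on simulated timestamps. The 20 ms / 10 ms path and the helper names are hypothetical:

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation from the four timestamps."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, up_delay, down_delay, t1=100.0, server_hold=0.001):
    """Build the four timestamps for one client/server exchange.

    true_offset is (server clock - client clock); delays are one-way, in seconds.
    """
    t2 = t1 + up_delay + true_offset      # server receive, on the server clock
    t3 = t2 + server_hold                 # server transmit, on the server clock
    t4 = t3 - true_offset + down_delay    # client receive, on the client clock
    return t1, t2, t3, t4


# Hypothetical path: the client clock is exactly right, 20 ms up, 10 ms down.
offset, delay = ntp_offset_and_delay(*simulate_exchange(0.0, 0.020, 0.010))
print(f"measured offset {offset * 1000:.3f} ms, delay {delay * 1000:.3f} ms")
```

The measured offset comes out as half the difference between the two one-way delays even though the clocks agree, so a daemon that steers the measured offset to zero leaves the clock wrong by that amount.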

Jeroen Mostert

unread,
Dec 11, 2012, 4:59:54 PM12/11/12
to
On 2012-12-11 22:10, E-Mail Sent to this address will be added to the BlackLists
wrote:
Improbable. I've admittedly not measured, but the machine has been running
constantly for three days in a room with even heating, with no significant
change in load. I don't think there's any way temperature could explain this
much drift. CPU frequency remains constant and power management has been turned off.

I suspect some driver or other is giving me trouble, preventing ntpd from
getting decent readings from the clock. Or maybe the machine is just broken
(clockwise, I mean), I guess that happens.

For kicks, I repeated the procedure -- stop ntpd, delete the drift file, start
ntpd, forcing it to do yet another drift calibration. (The servers must love
me.) The difference isn't quite so dramatic, but still notable.

56272 76491.581 -0.039118819 30.383 0.004118758 0.429325 6
56272 76748.567 -0.038717153 29.790 0.003868185 0.453044 6
56272 76884.559 -0.035856775 29.500 0.003757023 0.436066 6
56272 76929.557 -0.023728814 29.436 0.005544073 0.408522 6
56272 77378.579 -0.030387567 0.000 0.010782393 0.000000 6
56272 77719.530 0.011532945 33.821 0.031973641 0.000000 6
56272 77719.532 0.012297079 33.821 0.029910596 0.000000 6
56272 77783.546 -0.000656742 33.821 0.028351163 0.000000 6
56272 77800.543 -0.000072661 33.821 0.026522332 0.000000 6
56272 78118.559 0.001651644 33.852 0.024816859 0.011068 6
56272 78127.560 -0.003217920 33.851 0.023277801 0.010371 6
56272 78193.560 -0.007368853 33.822 0.021823790 0.014112 6
56272 78197.561 -0.014632962 33.818 0.020575203 0.013258 6
56272 78257.562 -0.015837880 33.761 0.019251054 0.023555 7
56272 78989.594 -0.023846042 33.501 0.018228934 0.094564 7
56272 79047.605 -0.024689830 33.480 0.017055101 0.088777 7

The moral of this story is, uhm... time is not on my side.

--
J.

E-Mail Sent to this address will be added to the BlackLists

unread,
Dec 11, 2012, 7:06:19 PM12/11/12
to
Jeroen Mostert wrote:
> BlackLists wrote:
>> Jeroen Mostert wrote:
>>> I noticed the drift was ever-increasing, slowly but surely.
>>> Since I muddled with settings quite a bit, I stopped ntpd,
>>> deleted ntp.drift and restarted it.
>>> The results are quite interesting.
>>> 56272 68581.536 0.022139147 2.356 0.006080932 0.008049 10
>>> 56272 68781.535 0.021885071 2.357 0.005698660 0.007537 10
>>> 56272 69311.535 0.020898356 2.359 0.005342012 0.007109 10
>>> 56272 71122.534 0.023597293 0.000 0.008392764 0.000000 6
>>> 56272 71458.536 0.013381279 39.825 0.012521981 0.000000 6 *
>>> 56272 71524.551 0.009503058 39.825 0.011793222 0.000000 6 *
>>> 56272 71527.552 0.000754424 39.825 0.011456980 0.000000 6 *
>>> 56272 71528.552 0.000233147 39.825 0.010722584 0.000000 6 *
>>> 56272 71930.577 -0.013119319 39.511 0.011085491 0.111141 6 *
>>> 56272 72003.578 -0.016005287 39.441 0.010419607 0.106838 6 *
>>> 56272 72207.579 -0.031178859 39.062 0.011125504 0.167193 6 *
>>> 56272 72331.588 -0.032740845 38.820 0.010421598 0.178267 6 *
>>> 56272 72607.596 -0.036786552 38.215 0.009852891 0.271267 7 *
>>> 56272 72611.597 -0.048063956 38.212 0.010042013 0.253749 7 *
>>> 56272 72667.599 -0.050119893 38.170 0.009421524 0.237821 7 *
>>> 56272 72960.613 -0.050045193 37.952 0.008819790 0.235492 6 *
>>> 56272 73291.594 -0.049274393 36.980 0.008257379 0.408236 6 *
>>> 56272 73419.587 -0.049355146 36.603 0.007731784 0.404411 6 *
> 56272 77719.530 0.011532945 33.821 0.031973641 0.000000 6 *
> 56272 77719.532 0.012297079 33.821 0.029910596 0.000000 6 *
> 56272 77783.546 -0.000656742 33.821 0.028351163 0.000000 6 *
> 56272 77800.543 -0.000072661 33.821 0.026522332 0.000000 6 *
> 56272 78118.559 0.001651644 33.852 0.024816859 0.011068 6 *
> 56272 78127.560 -0.003217920 33.851 0.023277801 0.010371 6 *
> 56272 78193.560 -0.007368853 33.822 0.021823790 0.014112 6 *
> 56272 78197.561 -0.014632962 33.818 0.020575203 0.013258 6 *
> 56272 78257.562 -0.015837880 33.761 0.019251054 0.023555 7 *
> 56272 78989.594 -0.023846042 33.501 0.018228934 0.094564 7 *
> 56272 79047.605 -0.024689830 33.480 0.017055101 0.088777 7 *
>
> The moral of this story is, uhm... time is not on my side.

The drift compensation values (*) in the latest 22 minute set
{stdev (0.1357614218)} appear to be changing < 13% as much as
those in the 33 minute set above {stdev (1.0766335301)} ?
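
Those figures can be reproduced from the starred drift column with a sample standard deviation; a minimal sketch, with the values copied from the two quoted loopstats runs:

```python
from statistics import stdev

# Drift compensation (ppm) from the starred loopstats lines quoted above.
FIRST_RUN_PPM = [39.825, 39.825, 39.825, 39.825, 39.511, 39.441, 39.062,
                 38.820, 38.215, 38.212, 38.170, 37.952, 36.980, 36.603]
SECOND_RUN_PPM = [33.821, 33.821, 33.821, 33.821, 33.852, 33.851,
                  33.822, 33.818, 33.761, 33.501, 33.480]

first_sd = stdev(FIRST_RUN_PPM)    # sample standard deviation (n - 1)
second_sd = stdev(SECOND_RUN_PPM)
ratio = second_sd / first_sd

print(f"first run stdev  {first_sd:.10f} ppm")
print(f"second run stdev {second_sd:.10f} ppm")
print(f"ratio {ratio:.1%}")
```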

--
E-Mail Sent to this address <Blac...@Anitech-Systems.com>
will be added to the BlackLists.

David Lord

unread,
Dec 11, 2012, 7:20:06 PM12/11/12
to
Here if ntpd is in sync, stopped, driftfile deleted, ntpd
restarted the time taken to resync with creation of a new
driftfile can be anything from a few hours to a few days.

On the pcs that have taken a few days to resync the drift
will probably be quite large, maybe > 50 ppm but offset
after a restart will usually be < 1 ms within around
30 minutes (only slightly longer than a restart of a pc
with drift nearer to 0 ppm).

I've had pcs with system clocks so far off that ntpd is
unable to correct and sometimes a fixed manual correction
has achieved both a low drift and offset but more recent
ntpd seems to have automatic calibration which defeats such
manual correction (at least ntpd 4.2.6p5 on NetBSD-6).

What's the value of your driftfile?

pc offset(ms) frequency(ppm)
ntp0 0.275 -50.8
ntp1 -0.106 -10.9

The offsets above are subject to a fair degree of jitter but
frequency is fairly constant depending on temperature.


David

Jeroen Mostert

unread,
Dec 12, 2012, 2:35:22 AM12/12/12
to
I've just observed the same behavior on a server running Windows Server 2012
which is far better at keeping track of time -- it can take a really, really
long time before the drift stabilizes.

I've decided to terminate the NTP experiment for my home machine. Normally, it's
turned off for the night, and in the interest of conserving power I'll return to
that policy.

Even with the problems ntpd has it's still better than nothing, and as long as
it can keep the actual offset within one second (a very loose goal it has no
trouble meeting) it's acceptable. My machine isn't used for anything where
accurate time is critical, or even useful.

> What's the value of your driftfile?
>
At present, 32.980, but the value is still trending downwards.

--
J.

David Taylor

unread,
Dec 12, 2012, 3:15:12 AM12/12/12
to
So, if the PC also has a stratum-1 clock connected (e.g. PPS), to which
it was synced, then the offsets would be spread across the non-zero
difference between "computed true time from the Internet" and UTC,
wouldn't they? In this example:

C:\Users\David>ntpq -p narvik
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*pixie           .PPS.            1 u   20   32  377    0.153    0.371   0.024
+FEENIX          .PPS.            1 u   25   32  377    0.217    0.308   0.018
+Stamsund        .PPS.            1 u   25   32  377    0.221    0.420   0.050
 uk.pool.ntp.org .POOL.          16 p    - 1024    0    0.000    0.000   0.001
-utserv.mcc.ac.u 194.66.31.14     2 u  800 1024  377   23.998   10.566   7.972
-linnaeus.inf.ed 129.215.64.32    2 u    9 1024  377   29.272    3.559   3.337
-ntp.uk.syrahost 192.93.2.20      2 u   36 1024  377   27.957    4.717   1.551
-cpc2-derb13-2-0 213.248.206.227  2 u  922 1024  377   26.324    3.124   3.752
-ntp1.exa-networ 33.117.170.50    2 u  750 1024  377   42.560    4.435   1.156
-ntp2.warwicknet 195.66.241.2     2 u  660 1024  377   22.819    4.221   4.440

PC Narvik is synced to a local stratum-1 server, and the WAN servers
listed below the POOL entry, mostly have offsets in the 3-4 millisecond
region, suggesting (if I understand this correctly) that the delay
introduced by the asymmetry of the WAN may be in the 3-4 millisecond region.
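
A rough way to turn those numbers into a path-asymmetry estimate, as a sketch only. It assumes the local PPS-synced clock is effectively true time and that the WAN servers are themselves accurate, so the whole apparent offset is blamed on the path. Under the standard NTP offset formula, a constant difference between outbound and return delay shows up as an offset error of half that difference. The function name is made up for illustration.

```python
import statistics


def implied_delay_difference_ms(offsets_ms):
    """Return (typical offset, implied outbound-minus-return delay), both in ms.

    The median is used so one outlier (10.566 ms here) does not dominate.
    """
    typical = statistics.median(offsets_ms)
    return typical, 2.0 * typical


# Offsets of the six WAN servers in the ntpq billboard above, in ms.
wan_offsets = [10.566, 3.559, 4.717, 3.124, 4.435, 4.221]
typical, difference = implied_delay_difference_ms(wan_offsets)
print(f"typical offset {typical:.3f} ms, implied delay difference {difference:.3f} ms")
```

In other words, a 3-4 ms offset error would correspond to the two directions differing by roughly twice that, if the assumptions hold.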

Jeroen Mostert

unread,
Dec 12, 2012, 3:58:15 PM12/12/12
to
Here's some loopstats from a brand new machine running Windows Server 2012:

56273 56211.040 -0.000185248 20.113 0.000065610 0.002127 8
56273 57117.052 0.000487392 20.115 0.000245606 0.002073 8
56273 57381.058 0.000771702 20.115 0.000250771 0.001958 8
56273 58424.072 0.000542051 20.118 0.000248229 0.001977 8
56273 59490.107 0.000320045 20.119 0.000245105 0.001903 8
56273 61613.141 0.000101314 20.120 0.000241965 0.001801 8
56273 62411.152 -0.000139791 20.119 0.000241858 0.001691 8
56273 63476.176 -0.000219043 20.118 0.000227966 0.001611 8
56273 65585.223 -0.000361511 20.116 0.000219111 0.001795 8
56273 66112.226 -0.000879125 20.114 0.000274771 0.001787 8
56273 66382.233 -0.000775453 20.113 0.000259625 0.001694 8
56273 66645.246 -0.000818638 20.112 0.000243336 0.001610 8
56273 68744.282 -0.000901985 20.105 0.000229520 0.002861 8
56273 69007.289 -0.000850863 20.105 0.000215455 0.002693 8
56273 69542.304 -0.000541853 20.103 0.000229247 0.002548 8
56273 69801.304 -0.000579627 20.103 0.000214857 0.002391 8
56273 70070.309 -0.000507633 20.102 0.000202585 0.002244 8
56273 70335.311 -0.000527417 20.102 0.000189630 0.002107 8
56273 72431.353 -0.000577564 20.097 0.000178267 0.002512 8
56273 72968.365 -0.000503697 20.096 0.000168786 0.002377 8

This is syncing with Internet time through the NL NTP pool.

I've set up four other machines to sync with this server, two virtual and two
physical. The virtual machines are running NT 6.0, the physical machines 6.1
(client/server flavors not relevant here). The virtuals do not have the
multimedia timer enabled, since I've read that this can negatively affect
performance of the host (which makes sense, since driving all virtuals with 1000
interrupts/sec can't be easy). Even so, ntpd achieves sub-millisecond sync with
the local server on all of them almost all of the time. I've registered three
spikes on the virtual machines so far, of 2, 5 and 10 ms.

To say I'm pleased with the results would be an understatement. In fact, I'm now
tempted to try and solve the second problem, which is that no matter how
accurately ntpd syncs the clock, there is no Windows API which will allow me to
read its value with that accuracy! On the 2012 Server,
GetSystemTimePreciseAsFileTime() is available, but most of our machines are
still Windows Server 2008/2008 R2 (even with some 2003 servers still hanging
around). On the physical machines, GetSystemTimeAsFileTime() will get the time
with 1 ms resolution since the MM timer will be enabled, which is good enough.
On the virtual machines, however, GSTAFT cannot do better than the 15.625 ms
resolution it has by default.

I suppose it ought to be possible to expose enough of ntpd's internal state
through shared memory to reconstruct what it thinks of the time on demand. There
are obvious challenges (like appropriate locking and quickly detecting if ntpd
has gone away so we can switch back to GSTAFT). I don't suppose anyone has
experience with this? :-)
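
As an illustration of the "use the best API the OS offers" part, here is a hedged Python/ctypes sketch. GetSystemTimePreciseAsFileTime() and GetSystemTimeAsFileTime() are the real kernel32 functions named above; pick_reader() and filetime_to_unix_ns() are hypothetical helper names, and on non-Windows platforms the sketch simply falls back to time.time_ns() so it still runs. It does nothing about the coarse resolution of GSTAFT itself.

```python
import ctypes
import sys
import time

# 100 ns intervals between 1601-01-01 (FILETIME epoch) and 1970-01-01 (Unix epoch).
EPOCH_DIFF_100NS = 116_444_736_000_000_000


class FILETIME(ctypes.Structure):
    _fields_ = [("dwLowDateTime", ctypes.c_uint32),
                ("dwHighDateTime", ctypes.c_uint32)]


def filetime_to_unix_ns(filetime_100ns):
    """Convert a 64-bit FILETIME value to nanoseconds since the Unix epoch."""
    return (filetime_100ns - EPOCH_DIFF_100NS) * 100


def pick_reader():
    """Return (name, function) for the best available wall-clock reader."""
    if sys.platform != "win32":
        return "time.time_ns", time.time_ns
    kernel32 = ctypes.windll.kernel32
    for name in ("GetSystemTimePreciseAsFileTime", "GetSystemTimeAsFileTime"):
        try:
            func = getattr(kernel32, name)  # AttributeError if not exported
        except AttributeError:
            continue
        func.argtypes = [ctypes.POINTER(FILETIME)]
        func.restype = None

        def read(func=func):
            ft = FILETIME()
            func(ctypes.byref(ft))
            return filetime_to_unix_ns((ft.dwHighDateTime << 32) | ft.dwLowDateTime)

        return name, read
    raise OSError("no system time function found")


name, read_ns = pick_reader()
print(name, read_ns())
```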

--
J.

David Taylor

unread,
Dec 12, 2012, 5:40:29 PM12/12/12
to
On 12/12/2012 20:58, Jeroen Mostert wrote:
[]
Yes, that's looking better! I have tried two Win-8/Server-2012 PCs, but
only one showed that better performance. At least you can get that
performance without needing a Linux or FreeBSD box.

There is an issue with getting the time more precisely, and I found some
code on the Internet which I translated to Delphi and had a play with.
Results are here:

http://www.satsignal.eu/ntp/TSCtime.html

original page:

http://www.lochan.org/2005/keith-cl/useful/win32time.html

Perhaps something like that will help you? But please check it
carefully! Virtual PCs are another issue altogether, of course.

Jeroen Mostert

unread,
Dec 12, 2012, 7:22:56 PM12/12/12
to
Yes... but I'm probably getting one anyway, because, well, it's more fun that
way. :-) We had the spare server anyway. And Linux still has a better NTP track
record than Windows, even with the recent improvements.

> There is an issue with getting the time more precisely, and I found some code on
> the Internet which I translated to Delphi and had a play with. Results are here:
>
> http://www.satsignal.eu/ntp/TSCtime.html
>
> original page:
>
> http://www.lochan.org/2005/keith-cl/useful/win32time.html
>
> Perhaps something like that will help you? But please check it carefully!

That looks promising, I'll look into it. The calibration stuff looks fiddly,
though, I'd need to automate that effectively before I could put it into
production use.

We should be able to do even better by simply calling the
get_sys_time_as_filetime pointer from nt_clockstuff.c, which will point to
either GetSystemTimePreciseAsFileTime(), GetSystemTimeAsFileTime() or ntpd's own
GetInterpTimeAsFileTime() routine, depending on what the OS supports. That would
use ntpd's own interpolation if interpolation is being used, which has proven to
be highly trustworthy (in my case, at least, I understand mileage varies). The
basic approach cannot reliably get sub-millisecond timestamps (since the best
GSTAFT can do is a millisecond-accurate timestamp if the MM timer is active) but
I don't need that anyway.

In theory, I could just extract the NT code from ntpd (minus clock disciplining,
of course) and see how well it works when it's not running in a realtime thread,
but I suspect the answer is "not at all". It looks incredibly tight. Even so it
should be possible to have ntpd do the fiddly stuff with realtime threads and
keep the results that can't be replicated in shared memory.

I definitely need to effectively steal other people's code, because if I
actually dug in and spent time to understand this stuff well enough to write my
own code I suspect I might as well kiss Christmas goodbye. I probably need to
remind myself that I have already met the original goal and then some.

> Virtual PCs are another issue altogether, of course.

Yes, on the plus side, VMs running on the same host have almost identical drift
and network characteristics; on the negative side it's easy for a VM to slip out
of sync when there's load.

--
J.

Jeroen Mostert

unread,
Dec 12, 2012, 7:30:14 PM12/12/12
to
VM #1:

56273 85876.796 0.000018374 50.694 0.000021662 0.002233 4
56273 86004.805 0.000008291 50.695 0.000020574 0.002119 4
56273 86052.822 0.000033144 50.696 0.000021156 0.002054 4
56273 86180.829 -0.000007628 50.695 0.000024483 0.001949 4
56273 86196.829 0.000020398 50.696 0.000024953 0.001827 4
56273 86260.831 0.000005560 50.696 0.000023924 0.001713 4
56273 86388.856 0.000038796 50.701 0.000025276 0.002317 4

VM #2:

56273 85960.089 0.000015957 50.623 0.000131482 0.005627 4
56273 86039.794 0.000027317 50.625 0.000123056 0.005314 4
56273 86167.801 0.000089203 50.636 0.000117169 0.006288 4
56273 86183.801 0.000108884 50.637 0.000109822 0.005911 4
56273 86199.807 0.000062115 50.638 0.000104051 0.005539 4
56273 86327.809 0.000041898 50.644 0.000097593 0.005488 4
56273 86343.811 0.000038536 50.644 0.000091298 0.005138 4
56273 86375.811 0.000047414 50.646 0.000085459 0.004833 4

I mean, seriously, wow. This is just begging for some high accuracy timestamp
reading.

--
J.

E-Mail Sent to this address will be added to the BlackLists

unread,
Dec 12, 2012, 8:41:25 PM12/12/12
to
Jeroen Mostert wrote:
> GetSystemTimePreciseAsFileTime() is available, but most
> of our machines are still Windows Server 2008/2008 R2
> (even with some 2003 server still hanging around).
> On the physical machines, GetSystemTimeAsFileTime() will
> get the time with 1 ms resolution since the MM timer
> will be enabled, which is good enough.
> On the virtual machines, however, GSTAFT cannot do better
> than the 15.625 ms resolution it has by default.

ntpq -c "rv &0 precision"
should give a clue how close you can expect to get on a given machine.

--
E-Mail Sent to this address <Blac...@Anitech-Systems.com>
will be added to the BlackLists.

Jeroen Mostert

unread,
Dec 13, 2012, 12:53:57 AM12/13/12
to
On 2012-12-13 02:41, E-Mail Sent to this address will be added to the BlackLists
wrote:
> Jeroen Mostert wrote:
>> GetSystemTimePreciseAsFileTime() is available, but most
>> of our machines are still Windows Server 2008/2008 R2
>> (even with some 2003 server still hanging around).
>> On the physical machines, GetSystemTimeAsFileTime() will
>> get the time with 1 ms resolution since the MM timer
>> will be enabled, which is good enough.
>> On the virtual machines, however, GSTAFT cannot do better
>> than the 15.625 ms resolution it has by default.
>
> ntpq -c "rv&0 precision"
> should give a clue how close you can expect to get on a given machine.
>
>
I assume you mean 'ntpq -c "rv 0 precision"'. Your command gives me an error.

I'm not sure that means much. On the physical machines I get precision=-10,
which makes sense since the multimedia timer is on and the resolution of GSTAFT
is 1 ms. On the virtuals, however, I get -19 (ntpd 4.2.6p5) and -20 (ntpd
4.2.7p310). That may reflect the numerical precision of the interpolated clock,
but that doesn't mean actually reading it with that precision is useful.
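
For reference, the precision value is a power-of-two exponent in seconds, so -10 is about 0.98 ms while -19 and -20 are about 1.9 and 0.95 microseconds. A trivial sketch of the conversion (the function name is just for illustration):

```python
def precision_seconds(exponent):
    """Convert an ntpd precision exponent to seconds (2 ** exponent)."""
    return 2.0 ** exponent


for exponent in (-10, -19, -20):
    print(f"precision={exponent}: {precision_seconds(exponent) * 1e6:.3f} usec")
```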

--
J.

David Woolley

unread,
Dec 13, 2012, 3:08:22 AM12/13/12
to
Jeroen Mostert wrote:

> negatively affect performance of the host (which makes sense, since
> driving all virtuals with 1000 interrupts/sec can't be easy). Even so,

You wouldn't expect 1ms ticks. You might get 20 ticks back to back, and
then a gap of 20ms, or even larger numbers. VMs generally only get a
virtual 1 second per second as a long term average.

There has been conflicting advice on running ntpd on VMs, but I suspect
the current advice to run it is an easy way of getting coarse timing
right. Fine timing should use the host timing and any special VM
support for the OS.

Mischanko, Edward T

unread,
Dec 13, 2012, 3:53:16 AM12/13/12
to
[Mischanko, Edward T]

I have always read that NTP should not be run on Virtual Machines. NTP should only be running on the "Real" machine with a hardware system clock. If the hardware machine is in synch, then the VM on the hardware machine should also be in synch.

Jeroen Mostert

unread,
Dec 13, 2012, 3:27:02 PM12/13/12
to
On 2012-12-13 09:53, Mischanko, Edward T wrote:
> I have always read that NTP should not be run on Virtual Machines. NTP should
> only be running on the "Real" machine with a hardware system clock. If the
> hardware machine is in synch, then the VM on the hardware machine should also
> be in synch.

That can only work if there's some sort of driver or integration patch that
makes the guest OS retrieve its time directly from the host's hardware clock.
Otherwise, the guest OS maintains its own notion of time based on whatever
hardware is being virtualized.

In the case of Windows on VMWare (which is what I'm using), Windows simply
maintains time just like it does on actual hardware, which is by taking the
initial time from the CMOS and then maintaining ticks with timer interrupts.
VMWare will simply emulate the interrupts, but Windows keeps its own time. Thus
the guest OS clock is subject to losing or gaining ticks depending on the
hypervisor's scheduling, which has nothing to do with the host's clock.

VMWare has a feature that integrates time synchronization with the guest OS, but
this is not meant as an accurate timekeeper, just as a stopgap measure for
keeping up the clock if the VM lags behind too far. It will look once every
minute to see if the guest is too far behind, and if it is, it will simply step
the clock to the "correct" time. This is far more crude than what ntpd does.

It's still important to keep the host clock synchronized as well, to make sure
the VM has accurate time if it's turned off or migrated to another host in the
cluster, but (at least in the case of VMWare) you cannot count on the host clock
to keep the guest clock accurate to the kind of accuracy that NTP can achieve.
VMWare has a whitepaper covering it in more detail, see
http://www.vmware.com/files/pdf/techpaper/Timekeeping-In-VirtualMachines.pdf.


--
J.

Jeroen Mostert

unread,
Dec 13, 2012, 3:30:52 PM12/13/12
to
On 2012-12-13 09:08, David Woolley wrote:
> Jeroen Mostert wrote:
>
>> negatively affect performance of the host (which makes sense, since driving
>> all virtuals with 1000 interrupts/sec can't be easy). Even so,
>
> You wouldn't expect 1ms ticks. You might get 20 ticks back to back, and then a
> gap of 20ms, or even larger numbers. VMs generally only get a virtual 1 second
> per second as a long term average.
>
If your virtualization is that poor, get a refund. :-) However, it's certainly
true that you cannot count on high-frequency ticks in a VM. There's simply not
enough horsepower on the guest to achieve that, unless you've got a one-to-one
mapping between actual and virtual hardware (and then there's not much point to
virtualizing).

> There has been conflicting advice on running ntpd on VMs, but I suspect the
> current advice to run it is an easy way of getting coarse timing right. Fine
> timing should use the host timing and any special VM support for the OS.

I agree. In the case of Windows on VMWare, however, there doesn't seem to be any.

--
J.

E-Mail Sent to this address will be added to the BlackLists

unread,
Dec 13, 2012, 3:52:54 PM12/13/12
to
Jeroen Mostert wrote:> BlackLists wrote:
>> ntpq -c "rv&0 precision"
>> should give a clue how close you can expect to get on
>> a given machine.
>>
> I assume you mean 'ntpq -c "rv 0 precision"'.
> Your command gives me an error.

Yes, however it should work with the '&' on newer 4.2.7 flavors.

> I'm not sure that means much. On the physical machines
> I get precision=-10, which makes sense since the multimedia
> timer is on and the resolution of GSTAFT is 1 ms.
> On the virtuals, however, I get -19 (ntpd 4.2.6p5) and -20
> (ntpd 4.2.7p310).
> That may reflect the numerical precision of the interpolated
> clock, but that doesn't mean actually reading it with
> that precision is useful.

That Precision is supposed to represent the time it takes
to read the system clock.

So whatever that is, you shouldn't expect an application
to be able to "get time" more often.

--
E-Mail Sent to this address <Blac...@Anitech-Systems.com>
will be added to the BlackLists.

unruh

unread,
Dec 13, 2012, 6:06:59 PM12/13/12
to
Perhaps you should follow the advice you gave in the first sentence of
your response.

>

unruh

unread,
Dec 13, 2012, 6:08:48 PM12/13/12
to
On 2012-12-13, E-Mail Sent to this address will be added to the BlackLists <Nu...@BlackList.Anitech-Systems.invalid> wrote:
> Jeroen Mostert wrote:> BlackLists wrote:
>>> ntpq -c "rv&0 precision"
>>> should give a clue how close you can expect to get on
>>> a given machine.
>>>
>> I assume you mean 'ntpq -c "rv 0 precision"'.
>> Your command gives me an error.
>
> Yes, however it should work with the '&' on newer 4.2.7 flavors.
>
>> I'm not sure that means much. On the physical machines
>> I get precision=-10, which makes sense since the multimedia
>> timer is on and the resolution of GSTAFT is 1 ms.
>> On the virtuals, however, I get -19 (ntpd 4.2.6p5) and -20
>> (ntpd 4.2.7p310).
>> That may reflect the numerical precision of the interpolated
>> clock, but that doesn't mean actually reading it with
>> that precision is useful.
>
> That Precision is supposed to represent the time it takes
> to read the system clock.

It is also supposed to reflect the time between changes of the system

Jeroen Mostert

unread,
Dec 13, 2012, 6:25:45 PM12/13/12
to
I'm an engineer. I don't pay bills, I just foot new ones working around the
limitations caused by the people paying the bills.

--
J.

Jeroen Mostert

unread,
Dec 13, 2012, 6:37:03 PM12/13/12
to
On 2012-12-14 00:25, Jeroen Mostert wrote:
> I'm an engineer. I don't pay bills, I just foot new ones

Ugh. Of course I'm not footing the new ones either. Current system uptime is
19.5 hours; wizard needs sleep, badly.

--
J.

David Woolley

unread,
Dec 13, 2012, 7:12:04 PM12/13/12
to
E-Mail Sent to this address will be added to the BlackLists wrote:

>
> That Precision is supposed to represent the time it takes
> to read the system clock.

It is the smallest non-zero difference between two readings of the
clock. If the clock is very high resolution, it may be close to the
time to read the clock. If the clock is low resolution, it is
determined by the resolution of the clock.
>
> So whatever that is, you shouldn't expect an application
> to be able to "get time" more often.

If the clock can only be read to 10ms (no interpolation) you can expect
multiple reads.
>
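
The "smallest non-zero difference between two readings" definition can be tried out directly. This is a sketch of the idea only, not ntpd's own measurement routine; smallest_step() is a made-up name, and rounding the result to the nearest power-of-two exponent is an assumption made for display.

```python
import math
import time


def smallest_step(clock=time.time_ns, samples=200):
    """Smallest non-zero difference seen between two successive clock readings.

    With a coarse clock the inner loop spins through many identical readings
    until the value ticks over; with a fine clock it rarely spins at all.
    """
    smallest = math.inf
    for _ in range(samples):
        first = clock()
        second = clock()
        while second <= first:
            second = clock()
        smallest = min(smallest, second - first)
    return smallest


step_ns = smallest_step()
exponent = round(math.log2(step_ns * 1e-9))
print(f"smallest step {step_ns} ns, roughly 2**{exponent} s")
```

On a clock that only ticks every 10 ms, the loop returns the tick size however fast the clock can be read, which is the low-resolution case described above.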

E-Mail Sent to this address will be added to the BlackLists

unread,
Dec 13, 2012, 7:49:38 PM12/13/12
to
... without the timestamp changing though.

I guess I could have worded it better.

Whatever ntpq -c "rv 0 precision" returns;
an application is unlikely to be able to get "changing"
timestamps at any higher rate.

Better?

--
E-Mail Sent to this address <Blac...@Anitech-Systems.com>
will be added to the BlackLists.

Jan Ceuleers

unread,
Dec 22, 2012, 9:36:21 AM12/22/12
to
On 12/11/2012 12:49 PM, David Taylor wrote:

Sorry: catching up.

> What happens if the link to the Internet is rather asymmetrical? For
> example, here I am stuck with 30 Mb/s down, but only 3 Mb/s up.

The actual bitrate is not so important. True: it determines the time a
packet spends on the wire. But more important is (or can be) the amount
of time a packet spends in various queues before actually being sent.
This time varies with instantaneous network load, and with the size of
the queue. Google for "bufferbloat", and apologies if everyone here
already knows all of this.

Having said that: there can indeed be asymmetrical transmission delays
that are linked to the technology being used. My VDSL2 modem tells me
that the downstream delay is 14.1ms and the upstream delay is 4.4ms. The
ratio of these numbers is not equal to the ratio of the downstream and
upstream bitrates (which are 16544 kbit/s and 2056 kbit/s respectively).
So note also that the downstream delay is greater than the upstream
delay, although the downstream bitrate is higher.

HTH, Jan

David Taylor

unread,
Dec 22, 2012, 9:57:17 AM12/22/12
to
Jan, yes, I appreciate that it's not the rate as such, I was really
trying to show how asymmetrical (10:1) the connection provided by my ISP
was. If they are trying to pack in as many customers as possible, who
knows what the actual backbone might be like? This is a cable modem
connection, by the way, and not ADSL.

I wonder how I could get the delay figures for my own modem? The only
data I can extract is: Latency: ~20 milliseconds, RTP jitter - up: 4
milliseconds, down: ~0.12 milliseconds.

Fascinating to see you have less delay on the slower upstream!

Rob

unread,
Dec 22, 2012, 11:25:43 AM12/22/12
to
Jan Ceuleers <jan.ce...@computer.org> wrote:
> Having said that: there can indeed be asymmetrical transmission delays
> that are linked to the technology being used. My VDSL2 modem tells me
> that the downstream delay is 14.1ms and the upstream delay is 4.4ms. The
> ratio of these numbers is not equal to the ratio of the downstream and
> upstream bitrates (which are 16544 kbit/s and 2056 kbit/s respectively).
> So note also that the downstream delay is greater than the upstream
> delay, although the downstream bitrate is higher.

This is caused by interleaving.

Packets are not sent back to back sequentially, but they are split into
smaller chunks that are interleaved serially.
This means a single packet is spread out over time, meaning that a hit
of interference takes out a smaller part of each packet. And that again
means that the error correction coding that is added to each packet has
more chance of recovering the packet.
(it is better to hit 8 packets each with a small amount of interference
than to hit a single packet with a big blow)

This improves the performance for streaming, but it increases the delay
because the receiving modem has to wait longer for the packet to be
complete. As the downstream with its higher bitrate is more affected
by interference, more interleaving is usually applied to it as well.
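
A toy model of that trade-off, as a sketch. It uses a simple block interleaver over whole packets, which is a simplification of what DSL modems really do, and all names are made up for illustration. It counts how many symbols of each packet a burst of noise destroys, with and without interleaving.

```python
def interleave(packets):
    """Block interleaver: send symbol 0 of every packet, then symbol 1, ..."""
    return [packets[p][s]
            for s in range(len(packets[0]))
            for p in range(len(packets))]


def errors_per_packet(n_packets, packet_len, burst_start, burst_len, interleaved):
    """Count the symbols of each packet hit by one burst on the line."""
    # Each symbol is tagged with (packet index, position) so hits can be traced.
    packets = [[(p, s) for s in range(packet_len)] for p in range(n_packets)]
    if interleaved:
        stream = interleave(packets)
    else:
        stream = [symbol for packet in packets for symbol in packet]
    hits = [0] * n_packets
    for packet_index, _ in stream[burst_start:burst_start + burst_len]:
        hits[packet_index] += 1
    return hits


print("plain:      ", errors_per_packet(8, 8, 16, 8, interleaved=False))
print("interleaved:", errors_per_packet(8, 8, 16, 8, interleaved=True))
```

The price is latency: in the interleaved stream the last symbol of the first packet is not sent until the final row of the block, so the receiver waits almost the whole block before any packet is complete.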

David Lord

unread,
Dec 22, 2012, 12:49:31 PM12/22/12
to
My own usage is more like 15:1 down:up so my 13:1.2 Mbps down:up
data rates seem to fit quite well.

When on 2 Mbps I had problems with very high latency on uploads
and implemented altq traffic shaping which increased my maximum
upload speed from around 150 kbps to around 250 kbps. There is
also higher priority for icmp, dns and ntp.

David

David Taylor

unread,
Dec 22, 2012, 2:06:29 PM12/22/12
to
On 22/12/2012 17:49, David Lord wrote:
[]
> My own usage is more like 15:1 down:up so my 13:1.2 Mbps down:up
> data rates seem to fit quite well.
>
> When on 2 Mbps I had problems with very high latency on uploads
> and implemented altq traffic shaping which increased my maximum
> upload speed from around 150 kbps to around 250 kbps. There is
> also higher priority for icmp, dns and ntp.
>
> David

I don't have the figures to hand, but for some protocols (FTP?) there
are acknowledgement packets sent back every so often, and I think that
10:1 is somewhere near the limit. With a greater down:up ratio the
limiting factor in downstream speed is actually the upstream bandwidth!
Something like that, anyway. It matters less what your usage is.

I just happened to find a page which tells me that in the last 9 days I
have downloaded 15 GB and uploaded 6 GB.

Hal Murray

unread,
Dec 22, 2012, 7:11:49 PM12/22/12
to
In article <kb4hoe$9cm$1...@dont-email.me>,
David Taylor <david-...@blueyonder.co.uk.invalid> writes:

>I wonder how I could get the delay figures for my own modem? The only
>data I can extract is: Latency: ~20 milliseconds, RTP jitter - up: 4
>milliseconds, down: ~0.12 milliseconds.

You can measure the delays. After the typical client-server
exchange you have 4 time stamps. If you assume the network
latency is symmetric you can compute the clock offset.
If you assume both clocks are accurate you can compute
the network delays.

If you collect a bunch of data, it's reasonable to assume
that the lowest delays are when the queues are empty. Any
longer delays are due to queuing.
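
In code, that boils down to the following sketch. The offset and delay formulas are the standard NTP ones for timestamps t1 (client send), t2 (server receive), t3 (server send) and t4 (client receive); the one-way figures are only meaningful under the stated assumption that both clocks are accurate. Function names are made up for illustration.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Clock offset and round-trip delay, assuming a symmetric path."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def one_way_delays(t1, t2, t3, t4):
    """(upstream, downstream) delays, assuming both clocks are accurate."""
    return t2 - t1, t4 - t3


def floor_delays(exchanges):
    """Lowest upstream and downstream delays seen, taken as the empty-queue case."""
    ups, downs = zip(*(one_way_delays(*exchange) for exchange in exchanges))
    return min(ups), min(downs)


# Two made-up exchanges, timestamps in milliseconds.
exchanges = [(0, 20, 21, 25), (100, 130, 131, 137)]
print(offset_and_delay(*exchanges[0]))
print(floor_delays(exchanges))
```

Note that in the first made-up exchange the asymmetric path (20 ms up, 4 ms down) produces an apparent offset of 8 ms even though both clocks agree.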


>Fascinating to see you have less delay on the slower upstream!

The queuing delays depend upon the traffic. You control that.

--
These are my opinions. I hate spam.

Jan Ceuleers

unread,
Dec 23, 2012, 8:03:04 AM12/23/12
to
On 12/23/2012 01:11 AM, Hal Murray wrote:
>> Fascinating to see you have less delay on the slower upstream!
>
> The queuing delays depend upon the traffic. You control that.

The delays I quoted are fixed delays linked to the modulation, encoding
and other parameters used on my VDSL2 line (the physical layer, if you
like). Another contributor suggested that the greater downstream delay
than the upstream delay may be due to interleaving, and this sounds like
a good hypothesis.

Queuing delays come on top.

David Taylor

unread,
Dec 24, 2012, 6:52:47 AM12/24/12
to
Thanks for that suggestion Hal. I've written a program to do just that,
and I am seeing down delays of a moderately consistent 4 ms, and up
delays of between 20 and 30 ms.

The program was interesting in that measuring times to better than 15 ms
resolution on Windows requires interpolation, and I had already
developed that from code I found on the Internet. I think this was
its first use "in anger".

http://www.satsignal.eu/ntp/TSCtime.html
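
For readers who have not seen the technique, here is a minimal sketch of interpolating a coarse clock with a fine counter. It is not the TSCtime code linked above and not ntpd's interpolation; the class name and structure are assumptions made for illustration. It waits for the coarse clock to tick, records the counter at that edge, and afterwards adds the counter's elapsed time to the tick value. The counter must use the same units as the coarse clock.

```python
import time


class InterpolatedClock:
    """Coarse wall clock refined by a high-resolution monotonic counter."""

    def __init__(self, coarse=time.time, counter=time.perf_counter):
        self._coarse = coarse
        self._counter = counter
        self.resync()

    def resync(self):
        """Spin until the coarse clock ticks, then pair that tick with the counter."""
        start = self._coarse()
        now = self._coarse()
        while now == start:
            now = self._coarse()
        self._base_time = now
        self._base_count = self._counter()

    def now(self):
        """Tick value plus counter time elapsed since the tick was observed."""
        return self._base_time + (self._counter() - self._base_count)


clock = InterpolatedClock()
print(f"coarse {time.time():.6f}  interpolated {clock.now():.6f}")
```

A real implementation also has to call resync() periodically and cope with the system clock being stepped or slewed, which is where the fiddly calibration comes in.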