On 2012-12-09 15:14, David Taylor wrote:
> On 09/12/2012 10:17, Jeroen Mostert wrote:
> []
>> Drift is currently at -2.3, and no abnormally high/low values have been
>> recorded.
>>
>> Loopstats collection has been on since the beginning. From the period
>> where these big adjustments happen, I get some suspicious data:
>>
>> 56269 38100.218 0.000000000 19.486 0.000000238 0.120120 9
>> 56269 38105.421 -0.054125528 19.486 0.019136264 0.112362 9
>> 56269 38109.421 -0.054433286 19.486 0.017900666 0.105105 9
>> 56269 39160.464 -0.072777326 19.415 0.017956687 0.101492 9
>> 56269 41770.153 0.000000000 19.415 0.000000238 0.094937 9
>> 56269 41775.482 -0.002025903 19.415 0.000716265 0.088805 9
>> 56269 41905.412 -0.044579941 19.409 0.015060036 0.083092 9
>> 56269 43888.511 -0.066275609 19.287 0.016040319 0.088960 9
>> 56269 47068.130 0.000000000 19.287 0.000000238 0.083214 9
>>
>> The 0 offsets suggest ntpd regularly thinks we're now in perfect sync,
>> something which is certainly not true. I don't know how to properly
>> interpret the error and stability values.
>>
<snip>
>> The "pool" directive had some effect (I now have 9 servers instead of 4)
>> and initially the offset stayed under 10 ms, but it seems that as soon
>> as the poll interval goes above 64, the offset starts slipping --
>> currently at 30 ms. I'll keep things under observation; I get the
>> feeling it hasn't quite stabilized yet.
>
> I wonder whether your 0 offset values mean that NTP has restarted itself, perhaps
> after making one of the time jumps?
>
No, the service is definitely running continuously.
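
For anyone trying to read the loopstats lines quoted above: as far as I can tell from the ntpd monitoring documentation, the columns are the Modified Julian Day, seconds past UTC midnight, offset (s), frequency (PPM), RMS jitter (s), frequency wander (PPM) and the clock discipline time constant. Below is a minimal Python sketch that parses the quoted sample and picks out the zero-offset entries. The names LoopSample, parse_loopstats and zero_offset_samples are made up for illustration and are not part of ntpd or NTP Plotter.

```python
from dataclasses import dataclass


@dataclass
class LoopSample:
    mjd: int           # Modified Julian Day
    seconds: float     # seconds past UTC midnight
    offset_s: float    # clock offset, seconds
    freq_ppm: float    # frequency (drift) correction, PPM
    jitter_s: float    # RMS jitter, seconds
    wander_ppm: float  # frequency wander ("stability"), PPM
    tc: int            # clock discipline time constant


def parse_loopstats(text):
    """Parse loopstats lines, skipping anything without seven fields."""
    samples = []
    for line in text.splitlines():
        f = line.split()
        if len(f) < 7:
            continue
        samples.append(LoopSample(int(f[0]), float(f[1]), float(f[2]),
                                  float(f[3]), float(f[4]), float(f[5]),
                                  int(f[6])))
    return samples


def zero_offset_samples(samples):
    """Return the entries whose recorded offset is exactly zero."""
    return [s for s in samples if s.offset_s == 0.0]


# The sample lines from the quoted message, unchanged.
SAMPLE = """\
56269 38100.218 0.000000000 19.486 0.000000238 0.120120 9
56269 38105.421 -0.054125528 19.486 0.019136264 0.112362 9
56269 38109.421 -0.054433286 19.486 0.017900666 0.105105 9
56269 39160.464 -0.072777326 19.415 0.017956687 0.101492 9
56269 41770.153 0.000000000 19.415 0.000000238 0.094937 9
56269 41775.482 -0.002025903 19.415 0.000716265 0.088805 9
56269 41905.412 -0.044579941 19.409 0.015060036 0.083092 9
56269 43888.511 -0.066275609 19.287 0.016040319 0.088960 9
56269 47068.130 0.000000000 19.287 0.000000238 0.083214 9
"""

samples = parse_loopstats(SAMPLE)
for s in zero_offset_samples(samples):
    print(f"zero offset at {s.seconds:.3f} s, jitter {s.jitter_s:.9f} s")
```

On the quoted sample this flags the entries at 38100.218, 41770.153 and 47068.130 seconds, each of which also carries the same tiny jitter value (0.000000238).
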
> My NTP Plotter program will take an input file or directory from the
> command-line, as documented in the read-me, so you need only set up a batch
> command once and double-click it to see the current data.

Fair enough. I have no problem not getting a File -> Open menu from the likes of
gnuplot, but that's because that *only* has a command-line interface so I expect
no better. If something starts up with a GUI, though, I expect to be able to use
it without reading documentation. Funny how that goes. :-)

> I'm always open to user suggestions, and so far no-one else has asked for a
> File|Open dialogue! But you are right that the program expects at least a
> loopstats file (or directory) to be dropped. It finds the peerstats
> automatically.
>
Provided it is named "peerstats.somethingsomething", right?

Unfortunately, the easy-to-find pages for troubleshooting NTP at
http://www.ntp.org/ntpfaq/NTP-s-trouble.htm arbitrarily rename the files to
"loops" and "peers", which is what I've been using (and other folks too, I'd
wager). I'm going to remove those options, but not just yet (since I don't want
to stitch files together).

> The primary documentation for NTP is the set of HTML pages, not "manpages".

Well, you're right.
http://www.eecis.udel.edu/~mills/ntp/html/confopt.html
describes this option, as do the Meinberg docs (which are an earlier
version). The information seems to be "ungoogleable" in that you must read the
whole thing top to bottom before you know it exists, but that is of course no
excuse.

It is a shame that nothing *outside* the reference documentation (in particular
quick-start guides) seems to describe the use of this option, though. It's also
unfortunate that ntpd has been around so long (with relatively few changes) that
outdated documentation is a dime a dozen.

> If you are still seeing stepping, I would want to investigate that further. It
> could be that your Internet connection is not so good (any Wi-Fi involved?), but
> it didn't look like that from the stats. One possibility may be to set maxpoll
> lower than 10 in the pool directive. It should be higher than 6 (64 seconds) to
> avoid being /too/ unfriendly to the servers you are using, but perhaps 8 (256
> seconds) might prevent things from getting so far out that a reset is required.
>
I would figure this might fix things, but I'm holding off on that because (as
you say) this is not friendly on servers and would also not fix/identify the
root cause -- if my admittedly meager understanding of how NTP works is correct,
then it shouldn't be necessary to poll the servers very frequently unless
there's something inherently non-linearly wrong with your clock.
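
To put numbers on the poll settings mentioned above: the poll values are exponents of two, so 6, 8 and 10 mean 64, 256 and 1024 seconds. The sketch below also estimates how far a clock would wander between polls if a constant frequency error were left uncorrected. The 1 PPM figure is purely an assumed example, not a measurement from this machine, and the function names are hypothetical.

```python
def poll_interval_seconds(exponent: int) -> int:
    """Poll exponents are powers of two: 6 -> 64 s, 10 -> 1024 s."""
    return 2 ** exponent


def accumulated_offset_ms(freq_error_ppm: float, interval_s: float) -> float:
    """Offset in ms built up over interval_s by a constant,
    uncorrected frequency error given in PPM."""
    return freq_error_ppm * 1e-6 * interval_s * 1000.0


ASSUMED_RESIDUAL_PPM = 1.0  # assumption for illustration only

for exponent in (6, 8, 10):
    interval = poll_interval_seconds(exponent)
    drift_ms = accumulated_offset_ms(ASSUMED_RESIDUAL_PPM, interval)
    print(f"poll {exponent}: {interval} s, ~{drift_ms:.3f} ms per interval")
```

A steady error of that assumed size costs only about a millisecond even at the longest interval, which is why a purely linear drift should not by itself call for frequent polling.
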
> I don't think we're at the bottom of this yet. Are you running any other
> software which might attempt to set the time? The W32time service is disabled
> and stopped? No fancy audio-visual programs being run? Nothing which completely
> hogs the CPU or saturates the network connection? Just some odd things to think
> about!

Nothing of the sort. It is a consumer PC used for everything, including gaming,
and although no gaming has been going on there could in theory be some exotic
driver or piece of software mucking things up in a non-obvious way. If there is,
though, I have no idea how to find it other than through an extremely tedious
bisection that's really not worth it. If it turns out that NTP cannot work
without stepping my clock every so often, I can actually live with that. Not so
much if it happens on "real" machines, of course. This is strictly a research
project.

It is worth reporting, however, that no stepping has been occurring since I
first reported the problem. This could be due to a simple restart of ntpd rather
than to any of the options I've tinkered with. I've tinkered with a *lot* of them,
and since I'm not a professional scientist, I have made no attempt at systematic
analysis of all options separately and combined, of course, just continued
tweaking while I wasn't satisfied :-)

I'm going to leave it alone for a bit to see the results after ntpd gets some
time to stabilize. For reference, I've done all of the following so far:
- Upgraded to ntpd 4.2.7p310.
- Replaced individual "server" lines with a single "pool" line.
- Adjusted power management options within Windows to make processors run at 100%.
- Updated the BIOS. (Also scratched my nose, which I expect has the same effect.)
- Turned off explicit processor frequency stepping options in the BIOS.
- Used "bcdedit /set useplatformclock on" to force exclusive use of the HPET
  (I'm guessing this does nothing at the moment since interpolation isn't used).
- Last but not least, restarted ntpd.

Offsets are still horrid (>30 ms) but I notice the drift is swinging up too, so
my guess is that ntpd hasn't fixed on a good value yet after all the tinkering
(and/or my clock is just bogus for reasons yet unknown).

If that doesn't work out, I may take apart the network traffic (including the
DSL router) to see if that has anything to do with anything. Delay and jitter on
the NTP packets seem fairly high (although I don't know if that would explain
continuous bad offsets).
--
J.