It didn't take too long before everything destabilized and ntpd stopped.
In this case it just spun out and dropped sync but still remained
responsive.
Belowis the main log snippet just before it went haywire. The rest of
the logs will get posted online for download. Everything was working
pretty well (just sys_fuzz messages showing up in the logs prior to
this) and then there's a sudden set of spike detects as the clock
wobbles and falls over (no_sys_peer). It's still performing
calculations and responding to ntpq queries but it's no longer
controlling the clock and, according to the offsets with the external
servers, the clock was now over 30 seconds off.
The times below are UTC. In the remaining logs, the incident occurs
somewhere near 56454d 32900s. I'm leaving the full log intact since
0000 UTC so you can see any trends. Everything is quite stable right up
until very near 56454 32906. According to the clockstats file, the
offset jumps from -0.000022767 to 0.004875262 in one second. Ten
seconds later the offset is 0.034419235. It ramps up until it hits
0.497311954 and suddenly changes sign to -0.497750784. Then it ramps
down to zero, ramps up to 0.496989164 and the sign flips again.
The clockstats doesn't show anything else out of the ordinary. It has
the line from the SHM driver showing 16 samples received and some of
them being bad but the majority good. But this happens for quite a
while and can be seen throughout the clockstats file.
In the loops file, the PPM value was sitting stable near -75.68 when it
suddenly starts to change (dropping to -74.3, then -71.6 then -67.85)
and then a rapid jump to -139.457 where it steadily dropped to -376
until I killed the process.
The loops file doesn't seem to show anything abnormal until the event
occurs but I'm not certain. Same for the rawstats and sysstats files.
I also don't really see anything in peers.
The diagnostic log files are at
http://acarver.net/ntpd/crashdata/
Jun 11 06:16:54 sunipx2 ntpd[7545]: ts_min 1370931414.425737206 ts_prev
1370931414.425632206 ts 1370931414.462732423
Jun 11 06:16:54 sunipx2 ntpd[7545]: sys_fuzz 105000 nsec, this fuzz
0.000041586, prior 0.000088949
Jun 11 06:16:54 sunipx2 ntpd[7545]: get_systime prev result
0xd5613f56.7a3a2e65 is 0.014676266 later than 0xd5613f56.76785b81
Jun 11 09:09:02 sunipx2 ntpd[7545]: 0.0.0.0 0113 03 spike_detect +0.167383 s
Jun 11 09:11:26 sunipx2 ntpd[7545]: 0.0.0.0 0115 05 clock_sync
Jun 11 09:12:22 sunipx2 ntpd[7545]: 0.0.0.0 0113 03 spike_detect +0.157203 s
Jun 11 09:14:54 sunipx2 ntpd[7545]: 0.0.0.0 0115 05 clock_sync
Jun 11 09:15:42 sunipx2 ntpd[7545]: 0.0.0.0 0113 03 spike_detect +0.147021 s
Jun 11 09:18:14 sunipx2 ntpd[7545]: 0.0.0.0 0115 05 clock_synckk
Jun 11 09:19:02 sunipx2 ntpd[7545]: 0.0.0.0 0113 03 spike_detect +0.136838 s
Jun 11 09:21:34 sunipx2 ntpd[7545]: 0.0.0.0 0115 05 clock_sync
Jun 11 09:22:30 sunipx2 ntpd[7545]: 0.0.0.0 0113 03 spike_detect +0.166043 s
Jun 11 09:24:54 sunipx2 ntpd[7545]: 0.0.0.0 0115 05 clock_sync
Jun 11 09:25:50 sunipx2 ntpd[7545]: 0.0.0.0 0113 03 spike_detect +0.155848 s
Jun 11 09:26:59 sunipx2 ntpd[7545]: 0.0.0.0 0118 08 no_sys_peer
Jun 11 09:50:12 sunipx2 ntpd[7545]: ts_min 1370944211.340914811 ts_prev
1370944211.340809811 ts 1370944212.335866579
Jun 11 09:50:12 sunipx2 ntpd[7545]: sys_fuzz 105000 nsec, this fuzz
0.000002690, prior 0.000025571
Jun 11 09:50:12 sunipx2 ntpd[7545]: get_systime prev result
0xd5617154.564041fa is 0.001012620 later than 0xd5617154.55fde50a
Jun 11 12:44:12 sunipx2 ntpd[7545]: ts_min 1370954651.338258199 ts_prev
1370954651.338153199 ts 1370954652.344166908
Jun 11 12:44:12 sunipx2 ntpd[7545]: sys_fuzz 105000 nsec, this fuzz
0.000079607, prior 0.000012962
Jun 11 12:44:12 sunipx2 ntpd[7545]: get_systime prev result
0xd5619a1c.58628820 is 0.001006960 later than 0xd5619a1c.58208a22
Jun 11 14:26:14 sunipx2 ntpd[7545]: ntpd exiting on signal 15 (Terminated)