
Leap Second testing


Phil Fisher

May 9, 2012, 4:44:23 AM
Hi folks
Some of you may recall a question I posted several weeks ago (and to which principally Dave Hart replied) about Leap Second and a testing scenario for our systems.

I am working through this at the moment and have found something interesting that I did not expect.

I was using ntptime to set the status bits on a system running a server (CentOS 5, ntpd 4.2.2p1). This allowed me to set the LI indicator and, apparently, to clear it. The output from ntptime is shown at the end of this message. When I look at the syslog (/var/log/messages) I see what seems to be a +1 second jump at about 0200 earlier today -- my system runs BST (== GMT+1/UTC+1). Even more bizarre, I see that a leap second was inserted at about 0100!

Why bizarre? Because although I had turned on the LI bit via ntptime/adjtimex I had also cleared it almost immediately since I was testing possibilities not implementation. Subsequent checks seemed to show that the LI was clear in the status. Therefore I was not expecting any Leap Second activity.
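
For reference, the numeric arguments given to ntptime -s in this message (17 and 1) are bit masks for the kernel's timex status word. A minimal Python sketch, assuming the STA_* values from Linux <sys/timex.h> (the helper name decode_status is made up for illustration), shows why 17 reads as PLL,INS and 1 as PLL alone:

```python
# Low status bits as defined in Linux <sys/timex.h>.
STA_BITS = {
    0x0001: "PLL",
    0x0002: "PPSFREQ",
    0x0004: "PPSTIME",
    0x0008: "FLL",
    0x0010: "INS",       # insert leap second
    0x0020: "DEL",       # delete leap second
    0x0040: "UNSYNC",
    0x0080: "FREQHOLD",
}

def decode_status(status):
    """Return the names of the status bits set in `status` (hypothetical helper)."""
    return [name for bit, name in STA_BITS.items() if status & bit]

print(decode_status(17))  # ['PLL', 'INS']
print(decode_status(1))   # ['PLL']
```

So "ntptime -s 17" asks for PLL plus INS, and "ntptime -s 1" asks for PLL only, which is what clears the INS bit in the status word.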

So, returning to one of my original post questions, am I seeing the effect of Linux/NTP history here? Or is it, regretfully, just plain stupidity on my part somewhere? And even if it is stupidity, please can someone explain (via personal email if necessary) exactly how and why my stupidity occurred?

Phil.
<information>
# ntptime commands and results
$ /usr/sbin/ntptime -r
ntp_gettime() returns code 1 (INS)
time d353c908.34ae7000 Tue, May 8 2012 17:32:08.205, (.205787),
maximum error 383107 us, estimated error 1037 us ntptime=d353c908.34ae7000 unixtime=4fa94a88.205787 Tue May 8 17:32:08 2012

ntp_adjtime() returns code 1 (INS)
modes 0x0 (),
offset 1227.000 us, frequency 57.619 ppm, interval 1 s,
maximum error 383107 us, estimated error 1037 us,
status 0x1 (PLL),
time constant 6, precision 1.000 us, tolerance 512 ppm,

$ sudo /usr/sbin/ntptime -s 17
ntp_gettime() returns code 1 (INS)
time d353c930.a1626000 Tue, May 8 2012 17:32:48.630, (.630407),
maximum error 403587 us, estimated error 1037 us
ntp_adjtime() returns code 1 (INS)
modes 0x10 (STATUS),
offset 1215.000 us, frequency 57.619 ppm, interval 1 s,
maximum error 403587 us, estimated error 1037 us,
status 0x11 (PLL,INS),
time constant 6, precision 1.000 us, tolerance 512 ppm,
$ /usr/sbin/ntptime -r
ntp_gettime() returns code 1 (INS)
time d353c938.fd23d000 Tue, May 8 2012 17:32:56.988, (.988828),
maximum error 407683 us, estimated error 1037 us ntptime=d353c938.fd23d000 unixtime=4fa94ab8.988828 Tue May 8 17:32:56 2012

ntp_adjtime() returns code 1 (INS)
modes 0x0 (),
offset 1213.000 us, frequency 57.619 ppm, interval 1 s,
maximum error 407683 us, estimated error 1037 us,
status 0x11 (PLL,INS),
time constant 6, precision 1.000 us, tolerance 512 ppm,

$ sudo /usr/sbin/ntptime -s 1
ntp_gettime() returns code 1 (INS)
time d353c95c.0a4e2000 Tue, May 8 2012 17:33:32.040, (.040255),
maximum error 426115 us, estimated error 1037 us
ntp_adjtime() returns code 1 (INS)
modes 0x10 (STATUS),
offset 1202.000 us, frequency 57.619 ppm, interval 1 s,
maximum error 426115 us, estimated error 1037 us,
status 0x1 (PLL),
time constant 6, precision 1.000 us, tolerance 512 ppm,

# Other NTP info checked
ntpdc> sysinfo
system peer: xxxxxx.ipaccess.com
system peer mode: client
leap indicator: 00
stratum: 3
precision: -20
root distance: 0.00099 s
root dispersion: 0.06125 s
reference ID: [172.28.0.133]
reference time: d353ec61.5dfe7016 Tue, May 8 2012 20:02:57.367
system flags: auth monitor ntp kernel stats
jitter: 0.000351 s
stability: 0.000 ppm
broadcastdelay: 0.003998 s
authdelay: 0.000000 s

ntpq> readvar
assID=0 status=06f4 leap_none, sync_ntp, 15 events, event_peer/strat_chg,
version="ntpd 4.2...@1.1570-o Sat Dec 19 00:58:16 UTC 2009 (1)",
processor="i686", system="Linux/2.6.18-128.el5", leap=00, stratum=3,
precision=-20, rootdelay=1.005, rootdispersion=65.252, peer=39032,
refid=172.28.0.133,
reftime=d353ec61.5dfe7016 Tue, May 8 2012 20:02:57.367, poll=10,
clock=d353f418.44749276 Tue, May 8 2012 20:35:52.267, state=4,
offset=0.767, frequency=57.638, jitter=0.425, noise=0.778,
stability=0.023, tai=0

# Final check before leaving
$ /usr/sbin/ntptime -r
ntp_gettime() returns code 1 (INS)
time d353ff78.0b062000 Tue, May 8 2012 21:24:24.043, (.043062),
maximum error 981562 us, estimated error 715 us ntptime=d353ff78.b062000 unixtime=4fa980f8.043062 Tue May 8 21:24:24 2012

ntp_adjtime() returns code 1 (INS)
modes 0x0 (),
offset 16.000 us, frequency 57.638 ppm, interval 1 s,
maximum error 981562 us, estimated error 715 us,
status 0x1 (PLL),
time constant 6, precision 1.000 us, tolerance 512 ppm,

# /var/log/messages (syslog) extracts earlier today 9 May
May 8 21:28:23 rhel006 ntpd[2531]: synchronized to 172.28.0.2, stratum 2
May 9 00:59:59 rhel006 kernel: Clock: inserting leap second 23:59:60 UTC
May 9 01:40:01 rhel006 ntpd[2531]: synchronized to 172.28.0.133, stratum 2
May 9 02:01:42 rhel006 ntpd[2531]: time reset +0.997521 s
May 9 02:05:24 rhel006 ntpd[2531]: synchronized to 172.28.0.2, stratum 2
May 9 02:12:53 rhel006 ntpd[2531]: synchronized to 172.28.0.133, stratum 2

</information>
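
A side note on reading the ntptime -r lines above: each one prints the same instant twice, once as an NTP timestamp (seconds since 1900, hex seconds.fraction) and once as Unix time. The sketch below, with the hypothetical helper name ntp_hex_to_unix, shows the conversion, assuming only the well-known 2208988800-second offset between the two epochs:

```python
from datetime import datetime, timezone

NTP_UNIX_OFFSET = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01

def ntp_hex_to_unix(stamp):
    """Convert an NTP 'seconds.fraction' hex string to (unix_seconds, fraction).

    int() accepts a fraction printed without leading zeros (e.g. 'b062000'),
    since its numeric value is unchanged.
    """
    sec_hex, frac_hex = stamp.split(".")
    seconds = int(sec_hex, 16) - NTP_UNIX_OFFSET
    fraction = int(frac_hex, 16) / 2**32
    return seconds, fraction

secs, frac = ntp_hex_to_unix("d353c908.34ae7000")
print(hex(secs))                                      # 0x4fa94a88
print(datetime.fromtimestamp(secs, tz=timezone.utc))  # 2012-05-08 16:32:08+00:00
```

The result matches the unixtime=4fa94a88 field in the first ntptime -r output, and 16:32:08 UTC is the 17:32:08 shown there in BST.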
--
Phil Fisher


Dave Hart

May 9, 2012, 10:24:27 AM
On Wed, May 9, 2012 at 8:44 AM, Phil Fisher <Phil....@ipaccess.com> wrote:
> I was using ntptime to set the status bits for a system running a server
> (CentOS 5, ntpd 4.2.2p1).  This allowed me to set the LI indicator and
> apparently to clear it.  The output from ntptime is shown at the end of this
> message. When I look at the syslog (/var/log/messages) I see what
> seems to be a +1 second jump at about 0200 earlier today -- my system
> runs BST (== GMT+1/UTC+1). Even more bizarre I see that a leap
> second was inserted at about 0100!
>
> Why bizarre?  Because although I had turned on the LI bit via
> ntptime/adjtimex I had also cleared it almost immediately since I was
> testing possibilities not implementation.  Subsequent checks seemed
> to show that the LI was clear in the status.  Therefore I was not expecting
> any Leap Second activity.

It appears there are two ways a pending leap insertion is indicated by
ntp_gettime/ntp_adjtime. You were paying attention to the status word
0x10 bit. The other way is the return value of the function. Notice
even your last ntptime invocation reported (INS) regarding the return
values from ntp_gettime and ntp_adjtime.
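
To make the distinction concrete, here is a small Python sketch that puts the two indications side by side. The TIME_* return codes and the STA_INS bit are the values from Linux <sys/timex.h>; the function name leap_indicators is hypothetical, and the inputs are simply the numbers ntptime printed in this thread.

```python
# Clock-state return codes of ntp_gettime()/ntp_adjtime(), per <sys/timex.h>.
TIME_STATES = {0: "OK", 1: "INS", 2: "DEL", 3: "OOP", 4: "WAIT", 5: "ERROR"}
STA_INS = 0x0010  # status-word bit requesting a leap insertion

def leap_indicators(return_code, status):
    """Return (clock_state, ins_bit_set, consistent) for one ntptime reading."""
    state = TIME_STATES.get(return_code, "UNKNOWN")
    ins_bit_set = bool(status & STA_INS)
    # The two views agree when "state is INS" matches "INS bit is set".
    consistent = (state == "INS") == ins_bit_set
    return state, ins_bit_set, consistent

print(leap_indicators(1, 0x11))  # ('INS', True, True)   after ntptime -s 17
print(leap_indicators(1, 0x1))   # ('INS', False, False) after ntptime -s 1
```

The second reading is the situation in the final ntptime -r check: the status word no longer has the INS bit, but the return code still reports INS.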

No, I don't know why the two are not in sync. I'm not particularly
worried about it, either, unless it causes a real-world problem.

Cheers,
Dave Hart

Phil Fisher

May 9, 2012, 11:12:12 AM
Dave
Thanks for the reply.

I have now checked what has happened even more carefully and I conclude the problem originated with the installation of the adjtimex program/RPM.

It would seem that I (or someone else, I am not sure) ran adjtimex --status to set the leap indicator after it was installed (probably me, but my shell history is not good enough to tell).

Once this was done, ntptime indicates that a leap insertion has been set even if the current status shows it has not. In my opinion this is at best misleading, if not a bug. If the leap indicator is cleared then I would not expect it to be remembered somewhere else; otherwise why bother providing the facilities to clear it?

Therefore my testing scenario will have to ensure that LI has not been set at any point. I assume, from your comments both in response to this post and earlier ones, that LI can only be propagated to a downstream client by an upstream NTP server (assuming the client is not itself a stratum 1 server, of course), and only on the day that a leap second should be implemented. By "day" I understand the UTC day, which could be a whole can of worms in itself when people think in local time.

If this is correct, then using ntptime -r one can check that a LI has not already been seen: if it has, the return status will be 1 (INS) and the status will typically show as 17 (0x0011) (PLL,INS). If we try to cancel it, the results will show a return status of 1 (INS) but a status of 1 (PLL). It will be important for me to detect this (even if the consensus of NTP gurus is that it would not occur and is not a problem), since I might need to take action to stop the Linux kernel bug from occurring.
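
As a rough illustration of that check, the sketch below parses ntptime -r text and flags the case where the return code still says INS although the status word does not. It assumes the output format shown in this thread (other ntptime versions may print differently), and the names parse_ntptime and leap_insert_lingering are made up for the example.

```python
import re

STA_INS = 0x0010  # status-word bit, from Linux <sys/timex.h>
TIME_INS = 1      # return code meaning a leap insertion is pending

def parse_ntptime(text):
    """Extract (return_code, status_word) of ntp_adjtime() from ntptime output."""
    code = re.search(r"ntp_adjtime\(\) returns code (\d+)", text)
    status = re.search(r"status (0x[0-9a-fA-F]+)", text)
    if code is None or status is None:
        raise ValueError("unrecognized ntptime output")
    return int(code.group(1)), int(status.group(1), 16)

def leap_insert_lingering(text):
    """True when the return code reports INS but the INS status bit is clear."""
    code, status = parse_ntptime(text)
    return code == TIME_INS and not (status & STA_INS)

# Lines copied from the final ntptime -r check earlier in the thread.
SAMPLE = """\
ntp_adjtime() returns code 1 (INS)
modes 0x0 (),
offset 16.000 us, frequency 57.638 ppm, interval 1 s,
maximum error 981562 us, estimated error 715 us,
status 0x1 (PLL),
time constant 6, precision 1.000 us, tolerance 512 ppm,
"""

print(parse_ntptime(SAMPLE))          # (1, 1)
print(leap_insert_lingering(SAMPLE))  # True
```

On a live system the text would come from running /usr/sbin/ntptime -r, for example via subprocess.run with capture_output=True and text=True.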

I think it might well be important in real-world scenarios where an incorrect deployment has triggered an incorrect LI to downstream clients. There should be a way to clear this. ntptime -s clearly allows it, but not completely, which leads to a leap second being added incorrectly and, in older Linux kernels, possibly to a system crash when the leap second event is logged.

Phil

Dave Hart

May 9, 2012, 11:43:06 AM
On Wed, May 9, 2012 at 3:12 PM, Phil Fisher <Phil....@ipaccess.com> wrote:
> Once this was done, ntptime indicates that a leap insertion has been set even if the current status shows it has not.  In my opinion this is at best misleading, if not a bug.  If the leap indicator is cleared then I would not expect it to be remembered somewhere else; otherwise why bother providing the facilities to clear it?

It's not clear to me why the discrepancy can exist on Linux or if it
can be seen in any other implementations of the precision kernel
extensions for NTP.

> Therefore my testing scenario will have to ensure that LI has not been set at any point.  I assume, from your comments both in response to this post and earlier ones, that LI can only be propagated to a downstream client by an upstream NTP server (assuming the client is not itself a stratum 1 server, of course), and only on the day that a leap second should be implemented.  By "day" I understand the UTC day, which could be a whole can of worms in itself when people think in local time.

There is no ambiguity about when a leap insert can happen. It happens
between the last day of a UTC quarter and the first day of the
following quarter. In practice, it's always been between the last day
of a half-year and the following half-year, but the specs allow for
April 1 and October 1 as well. Local time plays no part in scheduling
the insertion. ntpd enforces the end-of-quarter requirement. Your
testing shows Linux ntp_adjtime is willing to schedule and execute a
leap insertion any day of the year.
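
The end-of-quarter rule and the irrelevance of local time can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not ntpd's actual code; the function name is_quarter_end_utc is hypothetical, BST is modelled as a fixed UTC+1 offset, and the year 2012 is taken from the dates of this thread.

```python
import calendar
from datetime import datetime, timedelta, timezone

def is_quarter_end_utc(moment):
    """True if `moment`, converted to UTC, is on the last day of Mar/Jun/Sep/Dec."""
    utc = moment.astimezone(timezone.utc)
    last_day = calendar.monthrange(utc.year, utc.month)[1]
    return utc.month in (3, 6, 9, 12) and utc.day == last_day

BST = timezone(timedelta(hours=1))  # fixed offset standing in for British Summer Time

# The kernel's "inserting leap second" line was logged at May 9 00:59:59 local time.
logged = datetime(2012, 5, 9, 0, 59, 59, tzinfo=BST)
print(logged.astimezone(timezone.utc))  # 2012-05-08 23:59:59+00:00
print(is_quarter_end_utc(logged))       # False
print(is_quarter_end_utc(datetime(2012, 6, 30, 23, 59, 59, tzinfo=timezone.utc)))  # True
```

The logged insertion lines up with the end of the UTC day, as expected, but May 8 is not the end of a quarter, so a check like this would have rejected it.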

> If this is correct, then using ntptime -r one can check that a LI has not already been seen: if it has, the return status will be 1 (INS) and the status will typically show as 17 (0x0011) (PLL,INS). If we try to cancel it, the results will show a return status of 1 (INS) but a status of 1 (PLL).  It will be important for me to detect this (even if the consensus of NTP gurus is that it would not occur and is not a problem), since I might need to take action to stop the Linux kernel bug from occurring.
>
> I think it might well be important in real-world scenarios where an incorrect deployment has triggered an incorrect LI to downstream clients.  There should be a way to clear this.  ntptime -s clearly allows it, but not completely, which leads to a leap second being added incorrectly and, in older Linux kernels, possibly to a system crash when the leap second event is logged.

If you were testing on a server while it was used by production
clients, you could well induce them to schedule a leap insertion, and
based on your results, even if that decision were reversed before
midnight UTC, I bet the insertion would proceed (as their kernels
would not deschedule the insertion when status bit 0x10 is cleared by
ntpd). It would be interesting to hear if the latest Linux kernels
have a similarly sticky notion of a pending leap insertion.

Cheers,
Dave Hart

E-Mail Sent to this address will be added to the BlackLists

May 10, 2012, 5:08:21 PM
On 3/6/2012 2:28 AM, Phil Fisher wrote:
> I would appreciate help since we need to cover the scenario where our
> 2.6.9 Linux kernel
> may crash/hang when a leap Second insert occurs (due to the printk bug).
>
> Some context to avoid repetitive simple answers along lines of upgrade XYZ:
> * we cannot move to the later 2.6.29 kernel for commercial reasons
> * we have an old NTP implementation:
> ntpd: ntpd 4.2...@1.1190-r Mon Oct 11 09:10:20 EDT 2004

RHEL 4 Linux kernel 2.6.9 , circa 2004 Oct
Linux kernel 2.6.29 , circa 2009 Mar-Jul
NTP 4.2.0 , circa 2003 Oct


On 5/9/2012 1:44 AM, Phil Fisher wrote:
> Hi folks
> I was using ntptime to set the status bits for a system running a server
> (Centos 5, ntpd 4.2.2p1).
> This allowed me to set the LI indicator and apparently to clear it.

RHEL 5 Linux kernel 2.6.18 , circa 2006 Sept
NTP 4.2.2p1 , circa 2006 Jul


While I understand you may not be allowed to update your
production systems to a current kernel and current ntp,
have you tried current versions on a test system
to see if the issue has already been fixed?

Then you could research the changes
and see whether you can make your system avoid the issue
until you can upgrade to a version that no longer has it.

RHEL 6 Linux kernel 2.6.32 , circa 2009 Dec
Linux kernel 3.3.5 , circa 2012 May
NTP 4.2.6p5 , circa 2011 Dec
NTP 4.2.7p275 , circa 2012 Apr

--
E-Mail Sent to this address <Blac...@Anitech-Systems.com>
will be added to the BlackLists.