ntp, adjtimex, etc.

david carlton

Jul 8, 2002, 5:09:21 PM
Before I start, I apologize if any of this is in the FAQ or has
recently been discussed here; I tried poking around in
groups.google.com, but I'm sure I missed some useful threads. (And I
apologize for posting so many adjtimex questions on an NTP newsgroup.)

I bought a laptop (Dell Inspiron 8200) that I'm running Linux on (Red
Hat 7.3). It has pretty serious time problems when running Linux (the
clock is off by about one percent), though not when running Windows;
this seems to be endemic to these laptops and related models.

I'm trying to manage this; it seems like perhaps a combination of NTP
and adjtimex is the way to go. I'm running an NTP server on a desktop
machine that I control, and I'm synchronizing to that; that server
seems reliable enough for my purposes.

More info before I get to my questions: the NTP RPM is ntp-4.1.1-1,
and the adjtimex RPM is adjtimex-1.12-2. NTP is being invoked as
"ntpd -U ntp -g". The laptop is connected to the internet most of the
time that it's running Linux; it's rare that it runs Linux for more
than a couple of hours at a stretch.

Here are some questions:

* Why does NTP require me to punch a hole in my firewall for it, even
when it's only acting as a client? I'm sure there's a good reason
for it to be designed that way, but I'm curious what it is.

* Is the basic idea of using adjtimex --tick <tick> --frequency <freq>
to get the clock set more or less correctly, and then using NTP to
manage small differences, a sound one?

* If so, how do I determine the correct value of <tick> and <freq>?
Here is where things get bizarre: if I run adjtimex --compare, I
sometimes get a <tick> value of 10096, and sometimes get a value of
9996! This seems really freaky to me. I haven't figured out
exactly when one happens and when the other happens; it might be the
case that it's 10096 when I'm running X and 9996 when I'm not, but
I'm not at all sure about that. (And I haven't done enough
observations to be sure that those are the only two possible
values.)

* Is there likely to be anything in my startup/shutdown scripts that
saves the kernel time variables, or should I run adjtimex myself
every time I boot? If the latter, should I try to ensure that this
happens before ntpd gets started? I guess that's not all that hard:
I could put a script in the appropriate rcN.d that saves the kernel
time variables in a file whenever it's stopped and runs adjtimex
with those values. Though that raises the question of what to do
about the fact that multiple values of the variables seem
appropriate. (If I start up in runlevel 3 (text mode) and then do
'startx', am I then in runlevel 3 or runlevel 5?)

* How important is it for me to get <freq> exactly right, and what's
the best way to calculate it? Assuming that I have <tick> right but
not <freq>, will running NTP long enough cause <freq> to be set
correctly? If so, how long is long enough?

* I want to make sure that, when the system clock isn't working well,
the hardware clock doesn't get synced to it. Am I correct in
thinking that, when NTP is working reasonably well, the hardware
clock will get synced every 11 minutes? And that this will be the
case if (though not necessarily only if) the status indicator of
adjtimex --print is 1? So, in this case, it might be reasonable for
me to never sync the hardware clock explicitly when halting, and
instead have all syncing done by this method?

* Just what does the number in /etc/ntp/drift mean, and when/how is it
used?

* Any guesses as to why the clock is behaving in this strange way?
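On the 11-minute question: the status value that adjtimex --print reports is a bit mask of the kernel's STA_* flags. As far as I know, 2.4-era kernels copy the system time to the hardware clock roughly every 11 minutes only while STA_UNSYNC (0x40) is clear, so a status of 1 (just STA_PLL) would mean that update is happening. The sketch below decodes a status word. The bit values are the ones in <linux/timex.h>; the helper names and the exact form of the 11-minute rule are assumptions for illustration.

```python
# Status bits as defined in <linux/timex.h>.
STATUS_BITS = {
    0x0001: "STA_PLL",        # PLL updates enabled
    0x0002: "STA_PPSFREQ",
    0x0004: "STA_PPSTIME",
    0x0008: "STA_FLL",
    0x0010: "STA_INS",        # insert leap second
    0x0020: "STA_DEL",        # delete leap second
    0x0040: "STA_UNSYNC",     # clock unsynchronized
    0x0080: "STA_FREQHOLD",
    0x0100: "STA_PPSSIGNAL",
    0x0200: "STA_PPSJITTER",
    0x0400: "STA_PPSWANDER",
    0x0800: "STA_PPSERROR",
    0x1000: "STA_CLOCKERR",
}
STA_UNSYNC = 0x0040


def decode_status(status: int) -> list[str]:
    """Names of the flags set in an adjtimex status word, lowest bit first."""
    return [name for bit, name in sorted(STATUS_BITS.items()) if status & bit]


def rtc_sync_expected(status: int) -> bool:
    """Assumption: the kernel's 11-minute RTC update runs whenever
    STA_UNSYNC is clear (hypothetical helper, not part of adjtimex)."""
    return not (status & STA_UNSYNC)


if __name__ == "__main__":
    for status in (0x01, 0x40, 0x41):
        print(f"status {status}: {decode_status(status)}, "
              f"11-minute RTC update expected: {rtc_sync_expected(status)}")
```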

Sorry for the long list of questions. I'm trying to figure this out
myself, but it takes a while...

David Carlton | <http://math.stanford.edu/~carlton/>
car...@math.stanford.edu | Go books: <http://www.gobooks.info/>

I fill MY industrial waste containers with old copies of the
``WATCHTOWER'' and then add HAWAIIAN PUNCH to the top.. They
look NICE in the yard--

Johan Swenker

Jul 9, 2002, 6:39:21 AM
In article <ro1fzyu...@jackfruit.Stanford.EDU>,
car...@math.stanford.edu says...

>
>I bought a laptop (Dell Inspiron 8200) that I'm running Linux on (Red
>Hat 7.3). It has pretty serious time problems when running Linux (the
>clock is off by about one percent), though not when running Windows;
>this seems to be endemic to these laptops and related models.
>
>
<skip on the other questions>

>* Any guesses as to why the clock is behaving in this strange way?
This happens because the laptop thinks it is more important to
save battery power (even when on mains) than to have accurate time
(as you might know, when computers start to think, they go wrong).

On one of my laptops the required frequency correction shifts
so madly that ntpd has to set the time (approx. 0.3 sec) every half hour.
This was the case until I accidentally noticed that the laptop has a
stable time frequency when the program xosview was running. So now
I start X-windows, just for a decent clock :(
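For scale, Johan's figures can be turned into a frequency error: an offset of about 0.3 s building up every half hour is roughly 167 ppm left uncorrected on average. That is inside the 500 ppm range ntpd can discipline, and 0.3 s is above ntpd's default 128 ms step threshold, which is consistent with a correction that keeps shifting rather than one that is simply too large. A minimal sketch of the arithmetic (the helper name is made up):

```python
def implied_drift_ppm(offset_s: float, interval_s: float) -> float:
    """Average uncorrected frequency error, in parts per million, implied by
    an offset of offset_s seconds accumulating over interval_s seconds."""
    return offset_s / interval_s * 1e6


if __name__ == "__main__":
    # Johan's numbers: about 0.3 s accumulated every half hour.
    print(f"{implied_drift_ppm(0.3, 30 * 60):.0f} ppm")
```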

Good luck, Johan

david carlton

Jul 9, 2002, 11:38:59 PM
In article <ageegp$7m1$1...@hdxl22.telecom.ptt.nl>, J.B.S...@kpn.com (Johan Swenker) writes:
> In article <ro1fzyu...@jackfruit.Stanford.EDU>,
> car...@math.stanford.edu says...

>> * Any guesses as to why the clock is behaving in this strange way?

> This happens because the laptop thinks it is more important to save
> battery power (even when on mains) than to have accurate time (as
> you might know, when computers start to think, they go wrong).

Are you sure about this? (As you say, this happens even when on
mains.) If so, is there any way that I can see within Linux what is
causing this to happen (e.g. by looking somewhere in /proc)?

I confess, it seems strange to me that some sort of power-saving
behavior would cause a systematic error of, as far as I can tell from
my observations since my last post, almost exactly 1% (setting the
kernel time variable 'tick' to 10100 rather than 10000 gets good
behavior). Then again, whatever explanation I get will end up being
strange, given the sporadic nature of this problem (I've heard today
from people on the linux-dell-laptops mailing list who have had this
problem spontaneously appear or go away recently after having their
laptop keep time correctly or incorrectly for months).
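The arithmetic behind the 10100 figure, as a sketch. It assumes HZ=100, as on 2.4-era x86 kernels, so the nominal tick is 10000 microseconds and each unit of --tick changes the clock rate by 100 ppm. Per the adjtimex man page, --frequency is in units of 2^-16 ppm (65536 per ppm). A clock that is 1% slow is about 10000 ppm off, far beyond the 500 ppm ntpd will correct, so tick has to absorb most of the error and frequency the remainder. The two --compare readings mentioned earlier, 10096 and 9996, are 100 units (1%) apart, which matches the size of the effect. The function names are hypothetical:

```python
NOMINAL_TICK_US = 10_000      # assumption: HZ = 100, so 10000 us per timer interrupt
FREQ_UNITS_PER_PPM = 65_536   # adjtimex --frequency counts in 2**-16 ppm
PPM_PER_TICK_UNIT = 1e6 / NOMINAL_TICK_US   # one unit of tick = 100 ppm
NTPD_MAX_PPM = 500            # largest frequency error ntpd will discipline


def effective_tick_us(tick: int, freq: int) -> float:
    """Microseconds the kernel effectively adds per timer interrupt."""
    return tick + (freq / FREQ_UNITS_PER_PPM) / PPM_PER_TICK_UNIT


def corrected_settings(tick: int, freq: int,
                       sys_elapsed: float, ref_elapsed: float) -> tuple[int, int]:
    """Suggest (tick, freq) for a clock that advanced sys_elapsed seconds while
    a trusted reference advanced ref_elapsed seconds under the current settings."""
    wanted = effective_tick_us(tick, freq) * ref_elapsed / sys_elapsed
    new_tick = round(wanted)  # whole microseconds go into tick...
    # ...and the sub-microsecond remainder (at most +/-50 ppm) into freq.
    new_freq = round((wanted - new_tick) * PPM_PER_TICK_UNIT * FREQ_UNITS_PER_PPM)
    return new_tick, new_freq


if __name__ == "__main__":
    # A clock that shows 99 s for every 100 s of reference time (about 1% slow):
    print(corrected_settings(10_000, 0, sys_elapsed=99.0, ref_elapsed=100.0))
    # The same error expressed in ppm, compared with what ntpd can handle:
    error_ppm = (100.0 / 99.0 - 1) * 1e6
    print(f"{error_ppm:.0f} ppm; within ntpd's range: {error_ppm <= NTPD_MAX_PPM}")
```

With these assumptions, a clock showing 100 s for every 101 s of reference time comes out at exactly tick 10100 and frequency 0.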

David Carlton | <http://math.stanford.edu/~carlton/>
car...@math.stanford.edu | I8200: <http://math.stanford.edu/~carlton/i8200/>

Johan Swenker

Jul 10, 2002, 6:12:19 AM
In article <ro1elec...@jackfruit.Stanford.EDU>, car...@math.stanford.edu
says...

>
>In article <ageegp$7m1$1...@hdxl22.telecom.ptt.nl>, J.B.S...@kpn.com (Johan Swenker) writes:
>> In article <ro1fzyu...@jackfruit.Stanford.EDU>,
>> car...@math.stanford.edu says...
>
>>> * Any guesses as to why the clock is behaving in this strange way?
>
>> This happens because the laptop thinks it is more important to save
>> battery power (even when on mains) than to have accurate time (as
>> you might know, when computers start to think, they go wrong).
>
>Are you sure about this? (As you say, this happens even when on
>mains.) If so, is there any way that I can see within Linux what is
>causing this to happen (e.g. by looking somewhere in /proc)?
>
No, I am not sure. It is, on the other hand, the only explanation I
can think of. A laptop is a tightly integrated system of power supply,
"modem", mouse emulator, keyboard, etc., linked together with firmware.
That firmware can do almost anything with the computer. My laptop,
for instance, starts a Windows program to go into hibernation (okay,
I admit, it crashes when I hit that key). So a lot goes on that
you cannot really influence.

I didn't look closely enough in /proc to find anything useful.
Maybe you will find something interesting in /proc/interrupts (the
NMI line or the ERR line).
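Following up on the /proc/interrupts idea: if ticks are being lost, the timer line (IRQ 0) should count fewer than HZ interrupts per second of real time. Below is a rough sketch that parses two snapshots of /proc/interrupts and computes the timer rate. The elapsed time has to come from an external reference such as the NTP server, because the system clock is the thing in doubt. HZ=100, the helper names, and the sample counts are all assumptions, not measurements from these laptops. The NMI and ERR counters end up in the same dictionary.

```python
def parse_interrupts(text: str) -> dict[str, int]:
    """Map each /proc/interrupts label ("0", "NMI", "ERR", ...) to its count,
    summed over all CPU columns."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the "CPU0 CPU1 ..." header line has no colon
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break  # per-CPU counts stop where controller/device names start
            total += int(field)
        counts[label.strip()] = total
    return counts


def timer_rate(before: str, after: str, ref_elapsed_s: float, irq: str = "0") -> float:
    """Timer interrupts serviced per second of *reference* time between snapshots."""
    delta = parse_interrupts(after)[irq] - parse_interrupts(before)[irq]
    return delta / ref_elapsed_s


# Made-up snapshots taken 60 s apart according to an external clock.
BEFORE = """           CPU0
  0:     100000          XT-PIC  timer
NMI:          0
ERR:          0
"""
AFTER = """           CPU0
  0:     105940          XT-PIC  timer
NMI:          0
ERR:          0
"""

if __name__ == "__main__":
    print(parse_interrupts(BEFORE))
    print(f"{timer_rate(BEFORE, AFTER, 60.0):.1f} timer interrupts/s (HZ=100 expected)")
```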

When anyone has a better explanation on what happens inside a
laptop, I would like to learn.

Regards, Johan Swenker

Michael Deutschmann

Jul 10, 2002, 1:26:05 PM
On 10 Jul 2002, Johan Swenker wrote:
> [... discussing about time deviations of laptop]

> When anyone has a better explanation on what happens inside a
> laptop, I would like to learn.

Laptops are complicated and exotic beasts, sure. But I've heard that this
particular problem is relatively simple.

Since the later model 486s, IA-32 CPUs have provided a feature called
System Management Mode. This allows the motherboard to pre-empt the
operating system (using an extra interrupt pin) and run specific BIOS
routines at any time. The OS cannot control or avoid this pre-emption.

Apparently some computer manufacturers use SMM to make the processor do
long-term monitoring of temperature, battery, whatever. So they connect
the SMM pin to a clock, and take over for brief periods several (many?)
times a second.

On such a machine, the system timer appears slow. But it's really at
normal speed -- it's just that the OS is missing a certain portion of the
ticks. Apparently laptops are most often built this way, since there's
lots of stuff to monitor. If the monitoring interrupt takes constant
time to run, then the clock looks consistently slow over the long term.

Crude, exaggerated ASCII art explaining it (monospace font please):

Ticks for normal clock:
  X    X    X    X    X    X    X    X    X    X    X    X    X

Ticks for SMM-addled clock:
  X    X         X    X         X    X         X    X         X

Ticks for genuinely slow clock:
  X     X     X     X     X     X     X     X     X     X     X
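The mechanism in the picture can be simulated, and the simulation also bears on Hal's doubt below about whether an SMI lasts long enough to lose ticks. This is a toy model under stated assumptions, not a description of the Dell BIOS. The timer fires every 10 ms (HZ=100), and an SMI starts once a second exactly on a tick edge and lasts a fixed time. While the CPU is in SMM, the interrupt controller can hold only one pending timer interrupt, so further edges during the same SMI are lost. Under those assumptions an SMI shorter than one tick period loses nothing, and a 1% loss means one lost tick per second, e.g. an SMI of somewhat more than 10 ms once a second. All numbers and the function name are hypothetical.

```python
def serviced_ticks(duration_us: int, tick_us: int,
                   smi_period_us: int, smi_length_us: int) -> int:
    """Count timer interrupts the OS gets to handle in a toy SMM model.

    Assumptions: timer edges every tick_us; an SMI starts at every multiple
    of smi_period_us and keeps the CPU in SMM for smi_length_us; only one
    timer interrupt can be latched while in SMM, later edges are dropped.
    """
    serviced = 0
    latched_in = None  # index of the SMI window that already latched an edge
    for t in range(0, duration_us, tick_us):
        window, offset = divmod(t, smi_period_us)
        if offset < smi_length_us:       # edge arrives while the CPU is in SMM
            if latched_in != window:     # first edge in this SMI is held pending
                latched_in = window
                serviced += 1            # ...and handled when SMM exits
            # any further edge during the same SMI is lost
        else:
            serviced += 1                # normal case: handled right away
    return serviced


if __name__ == "__main__":
    second = 1_000_000
    duration, tick = 100 * second, 10_000          # 100 s at HZ=100
    expected = duration // tick
    for smi_ms in (5, 15, 25):
        got = serviced_ticks(duration, tick, second, smi_ms * 1000)
        print(f"{smi_ms:2d} ms SMI once a second: {1 - got / expected:.1%} of ticks lost")
```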

There is of course a separate issue with laptops and time -- if an
OS-bypassing suspend-to-disk is performed, NTP will lose zillions of
ticks and be quite upset when restored.

Some laptops have variable processor speed, which completely breaks TSC
and busyloop methods of keeping time. But I think NTP uses that only for
fine tuning anyway.

---- Michael Deutschmann <mic...@talamasca.ocis.net>

Hal Murray

Jul 11, 2002, 3:43:11 AM

>Apparently some computer manufacturers use SMM to make the processor do
>long-term monitoring of temperature, battery, whatever. So they connect
>the SMM pin to a clock, and take over for brief periods several (many?)
>times a second.

I'm slightly surprised that it takes long enough to lose normal
clock/scheduler interrupts.


>On such a machine, the system timer appears slow. But it's really at
>normal speed -- it's just that the OS is missing a certain portion of the
>ticks. Apparently laptops are most often built this way, since there's
>lots of stuff to monitor. If the monitoring interrupt takes constant
>time to run, then the clock looks consistently slow over the long term.

This would work OK if NTP used the cycle counter rather than
depending upon the scheduler/clock interrupts. Right?


>Some laptops have variable processor speed, which completely breaks TSC
>and busyloop methods of keeping time. But I think NTP uses that only for
>fine tuning anyway.

This, of course, breaks using the cycle counter. :)
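To make Hal's point concrete: if the time-stamp counter runs at the CPU's current clock rate (as it typically did on SpeedStep-era mobile processors), converting cycles to seconds with a frequency calibrated at boot gives the wrong answer as soon as the CPU slows down. Here is a toy calculation with made-up frequencies; the function name is hypothetical:

```python
def tsc_elapsed(segments: list[tuple[int, int]], calibrated_hz: int) -> tuple[int, float]:
    """segments: (seconds, actual_cpu_hz) stretches of real time.

    Returns (true elapsed seconds, seconds computed as cycles / calibrated_hz),
    assuming the counter advances once per CPU cycle at the current clock rate.
    """
    true_s = sum(sec for sec, _ in segments)
    cycles = sum(sec * hz for sec, hz in segments)
    return true_s, cycles / calibrated_hz


if __name__ == "__main__":
    calibrated = 1_600_000_000                    # rate measured at boot (made up)
    # 30 s at full speed, then 30 s throttled to half speed.
    print(tsc_elapsed([(30, 1_600_000_000), (30, 800_000_000)], calibrated))
```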

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.

david carlton

Jul 11, 2002, 8:42:36 PM
In article <agh1a3$fj7$1...@hdxl22.telecom.ptt.nl>, J.B.S...@kpn.com (Johan Swenker) writes:

> I didn't look closely enough in /proc to find anything useful.
> Maybe you will find something interesting in /proc/interrupts (the
> NMI line or the ERR line).

Thanks for the suggestion.

David

david carlton

Jul 12, 2002, 12:19:01 AM
In article <uiqdofr...@corp.supernews.com>, hmu...@suespammers.org (Hal Murray) writes:
> Michael Deutschmann <mic...@talamasca.ocis.net> wrote:

>> Apparently some computer manufacturers use SMM to make the
>> processor do long-term monitoring of temperature, battery,
>> whatever. So they connect the SMM pin to a clock, and take over for
>> brief periods several (many?) times a second.

> I'm slightly surprised that it takes long enough to lose normal
> clock/scheduler interrupts.

Now that you mention it, I am too, especially to miss a full 1% of
them.

Still, his explanation is better than any others I've heard so far...

Jonathan Buzzard

Jul 12, 2002, 11:56:45 AM
In article <%ULGL...@khar-pern.talamasca.ocis.net>,
Michael Deutschmann <mic...@talamasca.ocis.net> writes:
> On 10 Jul 2002, Johan Swenker wrote:
>> [... discussing about time deviations of laptop]
>> When anyone has a better explanation on what happens inside a
>> laptop, I would like to learn.
>
> Laptops are complicated and exotic beasts, sure. But I've heard that this
> particular problem is relatively simple.
>
> Since the later model 486s, IA-32 CPUs have provided a feature called
> System Management Mode. This allows the motherboard to pre-empt the
> operating system (using an extra interrupt pin) and run specific BIOS
> routines at any time. The OS cannot control or avoid this pre-emption.

Nothing can; even NMIs are masked while in SMM.

> Apparently some computer manufacturers use SMM to make the processor do
> long-term monitoring of temperature, battery, whatever. So they connect
> the SMM pin to a clock, and take over for brief periods several (many?)
> times a second.

Almost all laptops and many desktop and server motherboards use SMM.
In laptops it is often used to do many of the power saving things.
It is certainly present in *all* Toshiba laptops, and most Dell, HP
and Compaq ones as well.

> On such a machine, the system timer appears slow. But it's really at
> normal speed -- it's just that the OS is missing a certain portion of the
> ticks. Apparently laptops are most often built this way, since there's
> lots of stuff to monitor. If the monitoring interrupt takes constant
> time to run, then the clock looks consistently slow over the long term.
>
>

> There is of course a separate issue with laptops and time -- if an
> OS-bypassing suspend-to-disk is performed, NTP will lose zillions of
> ticks and be quite upset when restored.

It's not that bad: ntpd complains that it has lost sync and then steps
the clock a couple of minutes after the machine comes out of suspend.

>
> Some laptops have variable processor speed, which completely breaks TSC
> and busyloop methods of keeping time. But I think NTP uses that only for
> fine tuning anyway.
>

Lots of laptops have that.

JAB.

--
Jonathan A. Buzzard Email: jona...@buzzard.org.uk
Northumberland, United Kingdom. Tel: +44(0)1661-832195

david carlton

Jul 16, 2002, 10:13:55 PM
In article <ro1fzyu...@jackfruit.Stanford.EDU>, david carlton <car...@math.stanford.edu> writes:

> * Any guesses as to why the clock is behaving in this strange way?

If anybody cares, it seems to be linked to /proc/apm: if I (or other
people with this laptop) do

while /bin/true; do cat /proc/apm > /dev/null; done

the clock lossage becomes enormous (the system clock runs at about
half speed).
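The half-speed case is worth putting in numbers, because it is beyond anything adjtimex can fix. The adjtimex man page says the kernel only accepts tick values within about 10% of nominal (900000/HZ to 1100000/HZ, i.e. 9000 to 11000 with HZ=100). A 1% loss needs a tick of about 10101, which fits; a clock running at half speed would need 20000. Here is a sketch with HZ=100 assumed and hypothetical function names:

```python
HZ = 100  # assumption: 2.4-era x86 kernel


def tick_range(hz: int = HZ) -> tuple[int, int]:
    """Tick values the kernel accepts (about +/-10% of nominal),
    per the range given in the adjtimex man page."""
    return 900_000 // hz, 1_100_000 // hz


def required_tick(rate: float, hz: int = HZ) -> float:
    """Tick needed when the uncorrected clock runs at `rate` times real time."""
    return (1_000_000 / hz) / rate


if __name__ == "__main__":
    lo, hi = tick_range()
    for rate in (0.99, 0.5):
        need = required_tick(rate)
        print(f"clock at {rate:.0%} speed needs tick {need:.0f}; "
              f"adjustable: {lo <= need <= hi}")
```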

I'll stop posting about this here now, since this issue really isn't
related to NTP at all.

David Carlton | <http://math.stanford.edu/~carlton/>
car...@math.stanford.edu | I8200: <http://math.stanford.edu/~carlton/i8200/>


lag

Jul 19, 2002, 12:45:39 AM
On Thursday 11 July 2002 21:19 david carlton wrote:

> In article <uiqdofr...@corp.supernews.com>, hmu...@suespammers.org
> (Hal Murray) writes:
>> Michael Deutschmann <mic...@talamasca.ocis.net> wrote:
>
>>> Apparently some computer manufacturers use SMM to make the
>>> processor do long-term monitoring of temperature, battery,
>>> whatever. So they connect the SMM pin to a clock, and take over for
>>> brief periods several (many?) times a second.
>
>> I'm slightly surprised that it takes long enough to lose normal
>> clock/scheduler interrupts.
>

I would also like to suggest that you try to find some sort of pattern
to this time loss. In the past I've heard that dynamically loaded
modules are sometimes not very nice to other programs (such as ntpd)
around them. So if you have a driver for, say, a sound card or video
card that is continually being thrashed, you may lose time accuracy
because the kernel is busy giving the CPU to those modules. Try
loading them all up front, if you can, and see if this helps.
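One way to look for the kind of pattern lag suggests is to log the offset from the NTP server at regular intervals while ntpd is not disciplining the clock (for example with ntpdate -q). You can then compute the drift over each interval, flag the intervals that stand out, and compare those times with module loads, X sessions, or /proc/apm activity. A minimal sketch; the sample data, threshold, and function names are invented for illustration:

```python
from statistics import median


def drift_per_interval(samples: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """samples: (reference_time_s, offset_s) pairs, offset = system minus reference.
    Returns (interval_start_s, drift_ppm) for each consecutive pair."""
    return [(t0, (o1 - o0) / (t1 - t0) * 1e6)
            for (t0, o0), (t1, o1) in zip(samples, samples[1:])]


def unusual_intervals(samples: list[tuple[float, float]],
                      threshold_ppm: float = 50.0) -> list[tuple[float, float]]:
    """Intervals whose drift differs from the median drift by more than threshold_ppm."""
    rates = drift_per_interval(samples)
    typical = median([r for _, r in rates])
    return [(t, r) for t, r in rates if abs(r - typical) > threshold_ppm]


if __name__ == "__main__":
    # Invented log: offset sampled every 10 minutes; one interval loses time much faster.
    log = [(0, 0.0), (600, -0.06), (1200, -0.12), (1800, -0.42), (2400, -0.48)]
    for start, ppm in unusual_intervals(log):
        print(f"interval starting at t={start}s: {ppm:.0f} ppm")
```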

-lag
