after ids

vit...@gmail.com

Apr 9, 2008, 8:57:28 PM

I've noticed that in [set id [after $time cmd]], the $id gets
incremented sequentially as you use it more and more (the old ids do
not get reused). For example, you would have after ids like this:

after#1
after#2
...
...
after#999999
...

Normally, this is not an issue, but what would happen in some cases,
like in a long-running web server that uses [after] for connection
time-out/clean-up? What if that server stays up for a year or more?
Would it eventually run out of its limit of ids? Is there a limit? Or
do the ids eventually wrap around to the beginning?

Eric Hassold

Apr 9, 2008, 9:34:15 PM

vit...@gmail.com wrote :

Here is a verbatim copy of the comment found in tclTimer.c:

/*
* The variable below is used to generate unique identifiers for after
* commands. This id can wrap around, which can potentially cause
* problems. However, there are not likely to be problems in practice,
* because after commands can only be requested to about a month in
* the future, and wrap-around is unlikely to occur in less than about
* 1-10 years. Thus it's unlikely that any old ids will still be
* around when wrap-around occurs.
*/

That said, replacing
int afterId;
with
Tcl_WideInt afterId;

in the ThreadSpecificData structure used to generate after handles
wouldn't hurt either. Note that in the web server scenario, if this
server uses threads, each thread will have its own afterId sequence.
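
To put rough numbers on the "1-10 years" estimate in that comment, here is a back-of-the-envelope sketch. It assumes a signed counter that starts near zero and a steady rate of new ids; the function names are made up for illustration and are not part of Tcl.

```c
#include <stdint.h>

/* Seconds until a signed counter of the given width wraps, assuming a
 * steady rate of `ids_per_second` new ids. `bits` includes the sign bit,
 * so a 32-bit int has 2^31 non-negative values to burn through. */
double seconds_until_wrap(int bits, double ids_per_second)
{
    double ids = (double)((uint64_t)1 << (bits - 1));
    return ids / ids_per_second;
}

/* Same figure expressed in years of 365.25 days. */
double years_until_wrap(int bits, double ids_per_second)
{
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;
    return seconds_until_wrap(bits, ids_per_second) / seconds_per_year;
}
```

With a 32-bit int, years_until_wrap(32, 10.0) comes out at roughly 6.8 years, and at 100 ids per second it drops to about eight months, so a busy server could plausibly see a wrap within a year. With a 64-bit counter the same rates give figures far beyond any realistic uptime.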

Eric

vit...@gmail.com

Apr 10, 2008, 1:01:30 AM

Thanks Eric

George Peter Staplin

Apr 10, 2008, 9:24:10 AM

For what it's worth I've wanted to rewrite some of the after/tclTimer.c
code for better performance.

My idea is to replace what is currently a linked list of items with a
table of pointers based on reusable integer offsets, and sort the
TimerHandler structs utilizing a double-linked list. Thus the behavior
of things like: [after info] and [after cancel] could be done in near
constant time. Also, the cleanup of timers is done in constant time
(via the use of the double-linked list, and offset table.)

Basically the after id becomes the offset into the table. You can think
of it as working like a file descriptor table in a Unix-like kernel.
This means the entire timer queue wouldn't need to be searched.

With data structures like this:

typedef struct TimerHandler {
    Tcl_Time time;                  /* When timer is to fire. */
    Tcl_TimerProc *proc;            /* Function to call. */
    ClientData clientData;          /* Argument to pass to proc. */
    Tcl_TimerToken token;           /* Identifies handler so it can be
                                     * deleted. */
    struct TimerHandler *nextPtr;   /* Next event in queue, or NULL for
                                     * end of queue. */
    struct TimerHandler *prevPtr;   /* New field: the previous event in
                                     * the queue, or NULL. */
} TimerHandler;

typedef struct TimerTable {
    int used;                       /* The number of table entries used. */
    int allocated;                  /* The number of table entries
                                     * allocated. */
    TimerHandler **table;
    TimerHandler *listHead;         /* The head of the sorted TimerHandler
                                     * linked-list. */
} TimerTable;
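
For readers who want to see the idea in motion, here is a minimal standalone sketch of the offset table plus doubly linked list. It is not Tcl code: the Timer and Table types, the slot field, and the function names are all assumptions made up for this illustration, and the lowest-free-slot scan is just the simplest way to mimic a file descriptor table.

```c
#include <stdlib.h>

typedef struct Timer {
    long when;                /* Firing time, in arbitrary units. */
    int slot;                 /* Index of this timer in the table. */
    struct Timer *nextPtr;
    struct Timer *prevPtr;
} Timer;

typedef struct Table {
    int allocated;            /* Number of table entries allocated. */
    Timer **table;            /* slot -> timer, or NULL when free. */
    Timer *listHead;          /* Head of the list sorted by `when`. */
} Table;

/* Return the lowest free slot, growing the table when it is full.
 * Returns -1 if memory cannot be allocated. */
static int find_slot(Table *t)
{
    int i, n;
    Timer **grown;

    for (i = 0; i < t->allocated; i++) {
        if (!t->table[i]) {
            return i;
        }
    }
    n = t->allocated ? t->allocated * 2 : 8;
    grown = realloc(t->table, n * sizeof(*grown));
    if (!grown) {
        return -1;
    }
    for (i = t->allocated; i < n; i++) {
        grown[i] = NULL;
    }
    i = t->allocated;
    t->table = grown;
    t->allocated = n;
    return i;
}

/* Queue a timer and return its id (the slot), or -1 on failure.
 * Insertion still walks the sorted list, as the current code does. */
int timer_add(Table *t, long when)
{
    Timer *timer, *cur, *prev = NULL;
    int slot = find_slot(t);

    if (slot < 0 || !(timer = malloc(sizeof(*timer)))) {
        return -1;
    }
    timer->when = when;
    timer->slot = slot;
    for (cur = t->listHead; cur && cur->when <= when; cur = cur->nextPtr) {
        prev = cur;
    }
    timer->prevPtr = prev;
    timer->nextPtr = cur;
    if (prev) {
        prev->nextPtr = timer;
    } else {
        t->listHead = timer;
    }
    if (cur) {
        cur->prevPtr = timer;
    }
    t->table[slot] = timer;
    return slot;
}

/* Cancel by id in constant time. Returns 1 if a timer was removed,
 * 0 if the id does not refer to a live timer. */
int timer_cancel(Table *t, int slot)
{
    Timer *timer;

    if (slot < 0 || slot >= t->allocated || !t->table[slot]) {
        return 0;
    }
    timer = t->table[slot];
    if (timer->prevPtr) {
        timer->prevPtr->nextPtr = timer->nextPtr;
    } else {
        t->listHead = timer->nextPtr;
    }
    if (timer->nextPtr) {
        timer->nextPtr->prevPtr = timer->prevPtr;
    }
    t->table[slot] = NULL;
    free(timer);
    return 1;
}
```

One consequence worth keeping in mind: because slots are reused, an id that a script held on to after its timer fired can later refer to a different timer, which is not the case with the ever-incremented ids.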

George

Alexandre Ferrieux

Apr 10, 2008, 10:35:02 AM

On Apr 10, 3:24 pm, George Peter Staplin <georgepsSPAMME...@xmission.com> wrote:
> For what it's worth I've wanted to rewrite some of the after/tclTimer.c
> code for better performance.

Interesting. Did you enter this in the tracker as a Feature Request ?

-Alex

Bruce Hartweg

Apr 10, 2008, 1:22:37 PM

George Peter Staplin wrote:

>
> For what it's worth I've wanted to rewrite some of the after/tclTimer.c
> code for better performance.
>

Mind if I ask why?
Have you really seen this as a performance issue in an
actual application, or is this just an exercise in redesign?

Bruce

Darren New

Apr 11, 2008, 11:35:23 AM

Bruce Hartweg wrote:
> Have you really seen this as a performance issue in an
> actual application or is this just an exercise in redesign.

I have, yes. I had to change from queuing up 10,000 actions at the
start to having each action figure out what the next one was and the
appropriate offset and queue that up.

--
Darren New / San Diego, CA, USA (PST)
"That's pretty. Where's that?"
"It's the Age of Channelwood."
"We should go there on vacation some time."

George Peter Staplin

Apr 12, 2008, 9:35:29 AM

Darren New wrote:
> Bruce Hartweg wrote:
>> Have you really seen this as a performance issue in an
>> actual application or is this just an exercise in redesign.
>
> I have, yes. I had to change from queuing up 10,000 actions at the
> start to having each action figure out what the next one was and the
> appropriate offset and queue that up.

It's good to know I'm not alone in wanting better performance out of
tclTimer.c.

I also have some code for TSD in tclThreadStorage.c and some
per-platform code relating to that which should improve performance by
eliminating the hash table lookups, and replacing them with integer
offsets that refer to data in the table essentially. I've been working
with Kevin B. Kenny on that. It will only use 1 native TSD slot
per-platform, which should be good considering all of the TSD that Tcl
uses.

George

Alexandre Ferrieux

Apr 12, 2008, 10:31:52 AM

On Apr 12, 3:35 pm, George Peter Staplin <georgepsSPAMME...@xmission.com> wrote:
>
> I also have some code for TSD in tclThreadStorage.c and some
> per-platform code relating to that which should improve performance by
> eliminating the hash table lookups, and replacing them with integer
> offsets that refer to data in the table essentially. I've been working
> with Kevin B. Kenny on that. It will only use 1 native TSD slot
> per-platform, which should be good considering all of the TSD that Tcl
> uses.

Good to know you're working in sync with Kevin; but you didn't answer
my previous question: is there a Tracker entry for a feature request
for all this ? The intent is not to look over your shoulder to
criticize; rather, to make sure that spot stays on the radar screen,
and possibly educate ourselves.

-Alex

Uwe Klein

Apr 12, 2008, 11:09:05 AM

Alexandre Ferrieux wrote:
> ... is there a Tracker entry for a feature request
> of all this ? The intent is not to look over your shoulder to
> criticize; rather, to make sure that spot stays ons the radar screen,
> and possibly educate ourselves.
>
> -Alex

I once proposed to have
[after cancel $afterid]
return something sensible, like 0/1
(depending on whether an event was canceled or not)
or the remaining time.

Having to catch "after cancel" to get around
expired events is rather unwieldy.

uwe

George Peter Staplin

Apr 12, 2008, 11:31:38 AM

Oh, sorry, it was only due to my busy morning and distractions that I
neglected to answer. There is no feature request for the tclTimer.c
ideas yet. Feel free to make one based on my post, and any of your
ideas relating to that. Also, pointing out that another person agreed
it's a problem in some cases would help.

The TSD code is on the radar for now. I may post to comp.lang.tcl about
it, seeing as I'm not subscribed to tcl-core. We will have to see what
happens. I personally wish more of the Tcl community was involved with
dev work on the core, so that we get (hopefully) the best ideas and
algorithms used in the core.

I also want to thank you for your efforts put into the fcopy bugs. I
think that extra nudge and your bug hunting/fixing helped a lot. So,
thank you very much for that.

Kind regards,

George

Alexandre Ferrieux

Apr 12, 2008, 12:51:34 PM

On Apr 12, 5:31 pm, George Peter Staplin <georgepsSPAMME...@xmission.com> wrote:
>
> Oh, sorry, it was only due to my busy morning and distractions that I
> neglected to answer. There is no feature request for the tclTimer.c
> ideas yet. Feel free to make one based on my post, and any of your
> ideas relating to that. Also, pointing out that another person agreed
> it's a problem in some cases would help.

I will do this with pleasure, but I would like to exchange a bit more
with you beforehand.
I confirm my interest in seeing whatever lends itself to it in the
core turned from linear to constant time.
In order to do this, one solution is your approach, namely a more
elaborate data structure than the simple linked list and
ever-incremented ID. I've done exactly this in several of my own projects.

Now in the context of Tcl, there's an alternative: define a Tcl_ObjType
for after ids. This is currently done for procnames, variable names,
indexes, and even more recently for channels -- why not for afters ?
Of course, it won't help when the after id shimmers out. But letting
this happen requires a seriously contorted setup (like hiding after
ids in strings instead of just storing them in the usual containers).

I have no string opinion yet, I'd just like to know your analysis.

-Alex

Alexandre Ferrieux

Apr 12, 2008, 12:54:06 PM

On Apr 12, 6:51 pm, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

>
> I have no string opinion yet, I'd just like to know your analysis.

Heck, I failed to obtain the proper strong rep ;-)

-Alex

George Peter Staplin

Apr 12, 2008, 6:30:31 PM

A Tcl_ObjType might be beneficial for Tcl_AfterObjCmd. Basically
instead of:
Tcl_SetObjResult(interp, Tcl_ObjPrintf("after#%d", afterPtr->id));

we would return a Tcl_Obj with a Tcl_ObjType that is just for use in
tclTimer.c. Normally this Tcl_Obj probably won't shimmer, and thus
[after cancel] and [after info] would be faster, because the
objPtr->internalRep would have an integer value that we use as an offset
with the TimerHandler **table. If the object did shimmer we can
reinterpret the string rep, and convert it back to the Tcl_ObjType.

If, say, we used a representation like after#123 (as in the current
implementation), we would do something like:

int
GetTimerFromObj(Tcl_Obj *objPtr, long *offsetPtr) {
    const char *prefix = "after#";
    char *objStr;
    int objStrLen;
    size_t len = strlen(prefix);

    /* Fast path: the internal rep is still ours. */
    if (objPtr->typePtr == &afterType) {
        *offsetPtr = objPtr->internalRep.longValue;
        return TCL_OK;
    }

    objStr = Tcl_GetStringFromObj(objPtr, &objStrLen);

    if ((size_t) objStrLen > len && !strncmp(prefix, objStr, len)) {
        long offset;

        /* Tcl's strtol-like code using objStr + len and objStrLen - len */

        /* free the internal rep */

        objPtr->typePtr = &afterType;
        objPtr->internalRep.longValue = offset;
        *offsetPtr = offset;
        return TCL_OK;
    }
    /* Note: interp is not a parameter of this function as written. */
    Tcl_SetResult(interp, "invalid id", TCL_STATIC);
    return TCL_ERROR;
}
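
The string-to-offset step that the function above leaves as a comment could look roughly like the sketch below. It uses plain C strtol as a stand-in, which is an assumption: Tcl has its own number-parsing code, and the name parse_after_id is made up for this example.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse a string of the form "after#<digits>" into *offsetPtr.
 * Returns 1 on success, 0 if the string is not a well-formed id. */
int parse_after_id(const char *str, long *offsetPtr)
{
    static const char prefix[] = "after#";
    const size_t len = sizeof(prefix) - 1;
    const char *digits;
    char *end;
    long value;

    if (strncmp(str, prefix, len) != 0) {
        return 0;
    }
    digits = str + len;
    /* strtol would accept leading whitespace or a sign; insist on a digit. */
    if (*digits < '0' || *digits > '9') {
        return 0;
    }
    errno = 0;
    value = strtol(digits, &end, 10);
    if (errno == ERANGE || *end != '\0') {
        return 0;               /* Overflow or trailing garbage. */
    }
    *offsetPtr = value;
    return 1;
}
```

A successful parse only says the string looks like an id; the caller still has to check the offset against the table bounds.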

The table is allocated using something like:

table = (void *)Tcl_Alloc(sizeof(*table) * numTimers);

Then when we want to access a TimerHandler struct for [after info] or
[after cancel]:

struct TimerTable *t = GetTimerTableSomehow();
TimerHandler *timer;
long offset;

if (GetTimerFromObj(obj, &offset) != TCL_OK) {
    return TCL_ERROR;
}

if (offset >= 0 && offset < t->allocated && t->table[offset]) {
    timer = t->table[offset];
} else {
    Tcl_SetResult(interp, "invalid id", TCL_STATIC);
    return TCL_ERROR;
}

/* Remove this link from the double-linked list. */
if (timer->prevPtr) {
    timer->prevPtr->nextPtr = timer->nextPtr;
} else {
    /* The timer was the head of the list. */
    t->listHead = timer->nextPtr;
}

if (timer->nextPtr) {
    timer->nextPtr->prevPtr = timer->prevPtr;
}

/* Free the slot so the offset can be reused. */
t->table[offset] = NULL;

----

Now say we have just run a timer, to remove it is as simple as:

t->table[timer->slot] = NULL;

Whew, that was long. Anyway, that's my basic idea.

George

George Peter Staplin

Apr 12, 2008, 6:43:38 PM

+ Delink the sorted double-linked list :)

>
> Whew, that was long. Anyway, that's my basic idea.

In hindsight it makes more sense for GetTimerFromObj to take a
Tcl_Interp * and set the result when it's an invalid object, followed by
a return of TCL_ERROR. I'll have to check on the internalRep
freeing/replacement code that's required.

The rest should be fairly simple, after a lot of review, and testing.
Feel free to run with the idea. I may not have the time and motivation
for a while. I'd be happy to review any changes though.

George

Alexandre Ferrieux

Apr 13, 2008, 5:38:27 AM

On Apr 13, 12:43 am, George Peter Staplin

> George

George, I must have been unclear, sorry.
What I meant was that the Timer internal rep could just hold one of
today's pointers to a linked list's cell.
This means a very tiny incremental change, no need to overhaul the
underlying data structure.
I was thinking about it as an alternative to your proposal; but again,
I have no clear view yet of which one is better, I'm ready to lean
either way... So I was rather expecting arguments than code (we both
know how to do all this) ;-)

-Alex

Schelte Bron

Apr 13, 2008, 7:31:26 AM

Uwe Klein wrote:
> I once proposed to have
> [after cancel $afterid]
> return something sensible like 0/1 (
> depending on an event cancled or not)
> or the remaining time.
>

It does return something sensible; an empty string. Which is exactly
what I usually use to indicate a timer is inactive. So I frequently
use:
set afterid [after cancel $afterid]

> having to catch "after cancel" to get around
> expired events is rather unwieldy.
>

Are you now describing a possible, but undesired change to the way
after works? Because currently "after cancel" doesn't generate an
error when the event has already expired.

Schelte.
--
set Reply-To [string map {nospam schelte} $header(From)]

Uwe Klein

Apr 13, 2008, 8:06:59 AM

Schelte Bron wrote:
> Uwe Klein wrote:
>
>>I once proposed to have
>>[after cancel $afterid]
>>return something sensible like 0/1 (
>> depending on an event cancled or not)
>>or the remaining time.
>>
>
> It does return something sensible; an empty string. Which is exactly
> what I usually use to indicate a timer is inactive. So I frequently
> use:
> set afterid [after cancel $afterid]
>
>
>>having to catch "after cancel" to get around
>>expired events is rather unwieldy.
>>
>
> Are you now describing a possible, but undesired change to the way
> after works? Because currently "after cancel" doesn't generate an
> error when the event has already expired.

Has this been changed?
I just retrieved the thread I posted this in (from 2002):
http://groups.google.com/group/comp.lang.tcl/browse_frm/thread/19475ec9b2e5580f

For the project this was relevant to, I had patched Tcl to suit me.

uwe

George Peter Staplin

Apr 13, 2008, 10:49:08 AM

Alexandre Ferrieux wrote:
> George, I must have been unclear, sorry.

No worries :)

> What I meant was that the Timer internal rep could just hold one of
> today's pointers to a linked list's cell.

I suppose that means that whenever the internal rep is lost for the
Timer Tcl_ObjType we would have to search the list, or possibly use a
hash table lookup to restore the internal rep, or get the TimerHandler
struct from the string rep. By using the TimerHandler **table we can
reduce the cost of searching the list in the case of the lost internal
rep, because the string rep has an integer in it that most likely
refers to an offset for use with the table.

Also, with a singly linked list rather than a double-linked list, the
removal will still require iterating a fair chunk of the list, depending
on where the TimerHandler is, to find the previous element and set its
->nextPtr to the TimerHandler's ->nextPtr.

> This means a very tiny incremental change, no need to overhaul the
> underlying data structure.
> I was thinking about it as an alternative to your proposal; but again,
> I have no clear view yet of which one is better, I'm ready to lean
> either way... So I was rather expecting arguments than code (we both
> know how to do all this) ;-)

Perhaps I misunderstood you again. How would you solve the problems I
just pointed out with an internal rep alone?

George

Alexandre Ferrieux

Apr 13, 2008, 1:33:07 PM

On Apr 13, 4:49 pm, George Peter Staplin
<georgepsSPAMME...@xmission.com> wrote:
> Alexandre Ferrieux wrote:

> > What I meant was that the Timer internal rep could just hold one of
> > today's pointers to a linked list's cell.
>
> I suppose that means that whenever the internal rep is lost for the
> Timer Tcl_ObjType we would have to search the list, or possibly use a
> hash table lookup to restore the internal rep, or get the TimerHandler
> struct from the string rep.

Yes, the idea, as usual with internal reps, is that it is just a
cache, and when lost the string rep is enough to rebuild it. Here the
most natural way of rebuilding it is what is currently at work at all
times: a linear scan of the linked list. But as you noticed, in real
life those intreps are not likely to vanish, so I'm ready to tolerate
this linear scan for such rare cases.

> Also, with a linked list rather than a double-linked list, the removal
> will still require iterating a fair chunk of the list, depending on
> where the TimerHandler is, to remove it and set the prevPtr to the
> TimerHandler's ->nextPtr.

Geez, you've got a point :-) Sorry for forgetting this part of your
proposal.
You are right, we need the doubly linked list for this to be O(1). And
that's still a very small increment, not an overhaul.

To summarize: "my" (in fact, our) counter-proposal is made of:

- an intrep holding a direct pointer to a cell in the linked list
- that list doubly linked for O(1) removal.

Apologies again for the mixup in my brain.
Now, I would really love to hear your opinion about this vs your
original proposal (with the unix-fd-like direct-offset access).

-Alex

Schelte Bron

Apr 13, 2008, 4:44:37 PM

Uwe Klein wrote:
> Schelte Bron wrote:
>> Uwe Klein wrote:

[snip]

>>>having to catch "after cancel" to get around
>>>expired events is rather unwieldy.
>>>
>> Are you now describing a possible, but undesired change to the
>> way after works? Because currently "after cancel" doesn't
>> generate an error when the event has already expired.
> Has this been changed?
> Just retrieved the thread I posted this in ) from 2002)
>
http://groups.google.com/group/comp.lang.tcl/browse_frm/thread/19475ec9b2e5580f
>
> for the project this was relevant for I had patched tcl to suit
> me.
>

That thread indicates "after info" returns an error. That is true.
But "after cancel" does not.

Kevin Kenny

Apr 13, 2008, 11:28:13 PM

Alexandre Ferrieux wrote:
> To summarize: "my" (infact, our) counter-proposal is made of:
>
> - an intrep holding a direct pointer to a cell in the linked list
> - that list doubly linked for O(1) removal.

Hmm, with only a very slightly smarter data structure, you can have
it all.

(1) Have a hash table that maps afterID to the after event's data
    structure.
(2) The after events' data structures are kept in a priority queue
    (ordered by wakeup time) that supports deletion of arbitrary
    elements. We want this queue to support insertion, deletion,
    and "find minimum" in constant time. Appropriate implementations
    include AVL-trees, B-trees, red-black-trees, and leftist heaps.
    The last have very favourable properties for [after] queues, and
    I've used them for organizing things like television schedules.
(3) Optionally, bypass the hash lookup by caching the address of the
    after event's data structure on the internal representation of
    the object that holds the afterID. This is the least important
    optimization here; the hash table lookup is amortized constant
    time, so we're getting rid of at most a constant factor with
    this.

This is something that I've been wanting to try, but I've been kind
of deferring it until I got around to the (more important, in my
estimation) implementation of 64-bit wakeup times.

At some point, we also will want to begin making a distinction between
two different sorts of timer events;

(1) wake up after a certain amount of apparent time has elapsed,
    irrespective of resets of the system clock. On most systems,
    the reference for this function is something like "clock ticks
    since bootload".
(2) wake up at a particular (UTC or local) time, tracking any clock
    changes that take place.

Right now, all we have is (2), albeit with a usage that makes callers
more expect (1). Fixing it will most likely take *two* new commands,
since in the 8.x series at least, [after] needs to continue functioning
as it currently does. More research is also needed to validate that (1)
is achievable everywhere, particularly given the fact that a given
process or thread may have a whole queue of relative-time handlers
waiting to fire.

Just a few thoughts, and I cheer anyone who's willing to tackle this mess.

--
73 de ke9tv/2, Kevin

Alexandre Ferrieux

Apr 14, 2008, 4:42:56 AM

On Apr 14, 5:28 am, Kevin Kenny <kenn...@acm.org> wrote:
>
> Hmm, with only a very slightly smarter data structure, you can have
> it all.
>
> (1) Have a hash table that maps afterID to the after event's data
> structure.
> (2) The after events' data structures are kept in a priority queue
> (ordered by wakeup time) that supports deletion of arbitrary
> elements. We want this queue to support insertion, deletion,
> and "find minimum" in constant time. Appropriate implementations
> include AVL-trees, B-trees, red-black-trees, and leftist heaps.
> The last have very favourable properties for [after] queues, and
> I've used them for organizing things like television schedules.

Hmmm, right, O(1) insertion is one thing we haven't addressed so far.
And come to think of it, it is at least as important as O(1)
deletion !
I like the heap idea (O(log(N)) instead of O(1) but that's close in
practice).

> (3) Optionally, bypass the hash lookup by caching the address of the
> after event's data structure on the internal representation of
> the object that holds the afterID. This is the least important
> optimization here; the hash table lookup is amortized constant
> time, so we're getting rid of at most a constant factor with
> this.

Yes, in the light of the "new" requirement of fast insertion, I
realize that the intrep idea (which is basically just boosting lookup)
is kinda awkward...

> This is something that I've been wanting to try, but I've been kind
> of deferring it until I got around to the (more important, in my
> estimation) implementation of 64-bit wakeup times.

OK, so I think it deserves a TIP (is there one already ?). Again to
stay on the radar... If nobody else steps forward, maybe I could write
it, but I'd appreciate your eventual involvement in the reference
implementation.

> At some point, we also will want to begin making a distinction between
> two different sorts of timer events;
>
> (1) wake up after a certain amount of apparent time has elapsed,
> irrespective of resets of the system clock. On most systems,
> the reference for this function is something like "clock ticks
> since bootload".

Hey, so you've (secretly) looked at TIP 302 after all ? ;-)

> (2) wake up at a particular (UTC or local) time, tracking any clock
> changes that take place.

"Tracking" is a bit of an exaggeration. "Undergoing" would be closer
to the truth...

> Right now, all we have is (2), albeit with a usage that makes callers
> more expect (1). Fixing it will most likely take *two* new commands,
> since in the 8.x series at least, [after] needs to continue functioning
> as it currently does.

I would be very surprised to see a single example of a beneficial, or
even depended-upon, effect of a gettimeofday()-shift on [after].
Indeed it "allows" scripts to run timely 99.99% of the time, and at
some random spots under the control of an unknowing party (sysadmin or
ntpdate), either freeze for hours or suddenly fire a burst of stale
timers.

Looks to me like preserving backwards compatibility for a Bus
Error ...
That's why I wrote 302 as a request to "fix" rather than "extend" the
after system.

> More research is also needed to validate that (1)
> is achievable everywhere, particularly given the fact that a given
> process or thread may have a whole queue of relative-time handlers
> waiting to fire.

Sorry, I don't understand this remark. Do you mean you're worried
about bursts of timer activity, when Tcl has been out of the event
loop for some time and many handlers are found to be in the expired
state ?

Isn't this strictly orthogonal to replacing a slippery timebase with a
rock-solid, hardware-based one ?

>
> Just a few thoughts, and I cheer anyone who's willing to tackle this mess.
>

I am, and have been for some time. Welcoming that long-awaited
discussion with you ;-)

-Alex

Kevin Kenny

Apr 16, 2008, 10:34:07 AM

Alexandre Ferrieux wrote:
> Hmmm, right, O(1) insertion is one thing we haven't addressed so far.
> And come to think of it, it is at least as important as O(1)
> deletion !
> I like the heap idea (O(log(N)) instead of O(1) but that's close in
> practice).

We're not going to *get* O(1) insertion and deletion; if we could,
it would be possible to sort in O(N) time! O(log N) is readily
achievable for both.
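
To spell out the sorting argument: N insertions followed by N "remove minimum" calls hand the elements back in sorted order, so a comparison-based queue with O(1) for both would sort in O(N). Below is a rough sketch of what O(log N) looks like in practice, using a plain binary min-heap where each timer remembers its own position so that an arbitrary timer can be cancelled without a search. The binary heap is a substitution chosen for brevity, not the leftist heap mentioned earlier, and all names (HeapTimer, TimerHeap, MAX_TIMERS, the heap_ functions) are invented for this example.

```c
#include <stddef.h>

#define MAX_TIMERS 1024           /* Fixed capacity, to keep the sketch short. */

typedef struct HeapTimer {
    long when;                    /* Wakeup time, arbitrary units. */
    int pos;                      /* Index in the heap array, -1 if not queued. */
} HeapTimer;

typedef struct TimerHeap {
    HeapTimer *item[MAX_TIMERS];
    int count;
} TimerHeap;

static void place(TimerHeap *h, int i, HeapTimer *t)
{
    h->item[i] = t;
    t->pos = i;
}

/* Move the element at index i towards the root until its parent is
 * not later than it. */
static void sift_up(TimerHeap *h, int i)
{
    HeapTimer *t = h->item[i];

    while (i > 0) {
        int parent = (i - 1) / 2;
        if (h->item[parent]->when <= t->when) {
            break;
        }
        place(h, i, h->item[parent]);
        i = parent;
    }
    place(h, i, t);
}

/* Move the element at index i towards the leaves until both children
 * are not earlier than it. */
static void sift_down(TimerHeap *h, int i)
{
    HeapTimer *t = h->item[i];

    for (;;) {
        int child = 2 * i + 1;
        if (child >= h->count) {
            break;
        }
        if (child + 1 < h->count
                && h->item[child + 1]->when < h->item[child]->when) {
            child++;
        }
        if (t->when <= h->item[child]->when) {
            break;
        }
        place(h, i, h->item[child]);
        i = child;
    }
    place(h, i, t);
}

/* O(log N) insertion. Returns 0 if the heap is full. */
int heap_insert(TimerHeap *h, HeapTimer *t)
{
    if (h->count >= MAX_TIMERS) {
        return 0;
    }
    place(h, h->count, t);
    h->count++;
    sift_up(h, t->pos);
    return 1;
}

/* O(log N) removal of any queued timer, found through its stored position. */
void heap_remove(TimerHeap *h, HeapTimer *t)
{
    int i = t->pos;
    HeapTimer *last;

    if (i < 0) {
        return;                   /* Not queued. */
    }
    h->count--;
    last = h->item[h->count];
    t->pos = -1;
    if (i == h->count) {
        return;                   /* It was the last array element. */
    }
    place(h, i, last);            /* Fill the hole, then restore heap order. */
    sift_up(h, i);
    sift_down(h, last->pos);
}

/* O(1) lookup of the next timer to fire, or NULL if none is queued. */
HeapTimer *heap_min(const TimerHeap *h)
{
    return h->count ? h->item[0] : NULL;
}
```

The stored position plays the same role as the table offset or the cached internal rep discussed earlier in the thread: it is what lets [after cancel] reach the element directly instead of searching the queue.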

> OK, so I think it deserves a TIP (is there one already ?). Again to
> stay on the radar... If nobody else steps forward, maybe I could write
> it, but I'd appreciate your eventual implication in the ref
> implementattion.

TIPs are not needed for changes that affect only performance and
do not change the API. Reorganizing [after] to use a better data
structure is something that we can just do as a performance enhancement.

>> At some point, we also will want to begin making a distinction between
>> two different sorts of timer events;
>>
>> (1) wake up after a certain amount of apparent time has elapsed,
>> irrespective of resets of the system clock. On most systems,
>> the reference for this function is something like "clock ticks
>> since bootload".
>
> Hey, so you've (secretly) looked at TIP 302 after all ? ;-)

Did you see my comment at the bottom? I've misgivings about
whether it's feasible. Yes, I've *looked* at it.

>> (2) wake up at a particular (UTC or local) time, tracking any clock
>> changes that take place.
>
> "Tracking" is a bit of an exaggeration. "Undergoing" would be closer
> to the truth...

Well, with a functioning NTP infrastructure, it's a non-issue.
Even today, the [after] command doesn't depend on time zone changes,
for instance; [after]s go right on ticking across the "spring
forward" and "fall back" transitions into and out of Daylight
Saving Time. It's only when the system clock lacks a reliable
reference that the problem is even noticed. (And any system
with an Internet connection has *some* reference that's just
a few packets away!)

> I would be very surprised to see a single example of a beneficial, or
> even depended-upon, effect of a gettimeofday()-shift on [after].
> Indeed it "allows" scripts to run timely 99.99% of the time, and at
> some random spots under the control of an unknowing party (sysadmin or
> ntpdate), either freeze for hours or suddenly fire a burst of stale
> timers.

I have one important one: NBC. The Tcl application that does
supervision, control, and data acquisition for most of NBC's broadcast
operations center obviously has to be tied closely to the television
schedule. (It doesn't do the "hard real-time" aspects of the job,
but can tolerate being no more than a few tens of milliseconds off
from the actual time.) [after] (and Tcl_CreateTimerHandler) events
are used extensively to cause things to happen at the transition
times. The O(N) performance hasn't been a problem in practice because
the application has other data structures that manage video events
(checking for equipment conflicts, schedule gaps, and so on), and
only a handful of events at any one time are in a state where they
need timer handlers.

The events are indeed all tied to the system clock, which is
governed by NTP using "plant time" as its reference. The plant time
is intentionally a few seconds behind UTC, because it's easier
on the operators to say that a show begins at, say, 19:00 rather
than to say that it begins at 18:59:44 (because of cuss-timer,
frame synchronizer, timebase corrector, codec and satellite
delays). The time reference is distributed
through both the video channels and through the plant as SMPTE
12M time-code, and all the control computers have time code readers.
Computers that are not tied to the equipment get their time from
NTP servers on the local network that do have time code readers.

In this scheme, the terrible crystal oscillators that most computers
use as references for the system clock when NTP discipline is
not available would be entirely unacceptable. In a system where
events are timed to the video frame (33 ms or so), you can't
stand PC clocks that drift by seconds per day! But NTP takes
care of all that. (And the app runs for months at a time; I've
seen nodes where an uncorrected clock would have been off
by many minutes.) So unlike your assertion that the
hardware clock is "rock solid" while system time is capricious,
in the NBC system, system time is traced to a primary reference
like http://www.symmttm.com/products_pfr_4065C.asp while
the hardware time is (by comparison!) $env(LC_DEITY) knows what!

Any shop where time is important enough that there are Stratum 1
servers in house is going to be in the same boat: the system clock
may drift by seconds per day, but NTP time is going to be stable
over the long run to 10**-9 or better.

(Yes, I maintain the [clock] command. It's *important* to me.)

>> More research is also needed to validate that (1)
>> is achievable everywhere, particularly given the fact that a given
>> process or thread may have a whole queue of relative-time handlers
>> waiting to fire.
>
> Sorry, I don't understand this remark. Do you mean you're worried
> about bursts of timer activity, when Tcl has been out of the event
> loop for some time and many handlers are found to be in the expired
> state ?
>
> Isn't this strictly orthogonal to replacing a slippery timebase with a
> rock-solid, hardware-based one ?

No. The problem is in finding a time base that's immune to shifts
in the system clock. On Windows, the "time since bootload" is
readily available, and Windows is usually where the problem occurs.
(Unix distributions come with NTP out of the box, and getting them
configured to Just Work is easier.) Your suggestion in TIP #302
of using times() is not workable, because times() reports CPU time,
not wall-clock time; when a process is idle, times() does not
advance. The uptime command, on many Unixes, derives its reference
from the system clock (subtracting the current value from a "time
last booted" value stored in kmem somewhere). In short, Unix
presumes that the timekeeping infrastructure works, and doesn't
provide a way to work around it. (I'm willing to be proven wrong
here, but I've never found such a beast.)

Of course, the division between [after] and [at] is still a convenience
to the programmer, and avoids a potential race condition if for
some reason the thread freezes between the time that the [after]
interval is calculated and the time that the event is actually
queued. That aspect is the part of TIP 302 that's a worthy idea.

Perhaps my experience is unusual, but it's been literally years
since I've set a system clock by the wristwatch-and-eyeball method.
Even my wristwatch (like millions of others) gets its time from
NIST; there's a WWVB receiver aboard that resynchronizes daily to
the radio signal from Fort Collins.

Alexandre Ferrieux

Apr 17, 2008, 5:02:42 PM

to ken...@acm.org

On Apr 16, 16:34, Kevin Kenny <kenn...@acm.org> wrote:
>
> > OK, so I think it deserves a TIP (is there one already ?). Again to
> > stay on the radar... If nobody else steps forward, maybe I could write
> > it, but I'd appreciate your eventual implication in the ref
> > implementattion.
>
> TIPs are not needed for changes that affect only performance and
> do not change the API. Reorganizing [after] to use a better data
> structure is something that we can just do as a performance enhancement.

OK, I stand corrected. But is that sufficient reason to ignore it for
one year and four months ?
Had somebody told me it was not TIPpable, I would have happily moved
it to a more appropriate channel...

> Did you see my comment at the bottom? I've misgivings about
> whether it's feasible. Yes, I've *looked* at it.

Two answers:

(1) no, I hadn't seen your comment at the bottom until now, because
there is no e-mail notification of TIP updates apparently (or it
failed on me, or on Gmail's antispam, etc.)

(2) I disagree with the comment. Quoting you:

> The times function in Unix is not an appropriate time base.
> It reports the user and system time (CPU time, in other words)
> of the currently executing process and its children...

What you say accurately describes what times() returns in its argument
structure. But it also has a return value, which is of type clock_t,
and, quoting the manpage:

>> times() returns the number of clock ticks that have elapsed
>> since an arbitrary point in the past.

So again, why is it not an appropriate time base ?
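
To make the distinction concrete, here is a minimal C sketch that uses only the return value of times() as an elapsed-tick counter and ignores the CPU-time fields of the struct. The helper names ticks_now and elapsed_ms_via_times are made up for illustration, and whether the return value is really unaffected by date changes depends on the platform, so treat that as an assumption to verify rather than a guarantee:

```c
#include <sys/times.h>
#include <unistd.h>

/* Read the tick counter that times() returns. The struct tms it fills in
 * holds CPU times, which are not used here. */
clock_t ticks_now(void)
{
    struct tms unused;
    return times(&unused);
}

/* Convert the difference between two tick readings to milliseconds.
 * Returns -1 if the tick rate cannot be determined. */
long elapsed_ms_via_times(clock_t start, clock_t end)
{
    long ticks_per_sec = sysconf(_SC_CLK_TCK);  /* ticks per second */
    if (ticks_per_sec <= 0) {
        return -1;
    }
    return (long)((end - start) * 1000 / ticks_per_sec);
}
```

Two readings taken around an idle sleep() would show whether the counter advances while the process uses no CPU.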

> >> (2) wake up at a particular (UTC or local) time, tracking any clock
> >> changes that take place.
>
> > "Tracking" is a bit of an exaggeration. "Undergoing" would be closer
> > to the truth...
>
> Well, with a functioning NTP infrastructure, it's a non-issue.

Yes but not everybody has this. I routinely get to handle machines
jailed in strange subnetworks with armies of firewalls around, where NTP
is not an option. Nor would it deserve any effort: on these machines,
accuracy of the 'date' command is not required. Only from time to time
a sysadmin may find too large a drift and decide to take action,
without imagining the harm this operation does to Tcl (and on Tcl
alone, since sh's sleep command uses setitimer() which is also
insensitive to date-setting).

> Even today, the [after] command doesn't depend on time zone changes,

Yeah, this red herring has been covered to death on c.l.t :-)
We both know TZ and DST are a separate layer from the true time base
ticking [after], let's dismiss that.

> > I would be very surprised to see a single example of a beneficial, or
> > even depended-upon, effect of a gettimeofday()-shift on [after].
>

> I have one important one: NBC.

> [...] So unlike your assertion that the

> hardware clock is "rock solid" while system time is capricious,
> in the NBC system, system time is traced to a primary reference

OK, I understand the specific need, but:

(1) this is closer in my view to an [at] command, rather than an
[after]

(2) even with a ticks-based, drifting time reference in [after], it
would be easily achievable with a two-resolution scheme: a periodic
task running every (say) minute, checking [clock], and posting short-
range [after]s.

(3) Conversely, even in the current implementation, with a tight NTP
hand on the system clock, if there is one single long [after] with no
intervening events, Tcl will compute one single select() timeout
value, and the syscall will sleep until its end, regardless of any
intervening NTP corrections; so it will suffer from the drift. Of
course, this won't happen if the select() is frequently interrupted by
other events (including shorter-term afters), since in this case
gettimeofday() is called each time to recompute the next target.
Is that your case ?

(4) I'd be curious to know how many dozens of cases of the stalled
periodic task there are per single NBC-like case ... But I admit the
voice of a TCT member can weigh higher than 1 ;-)
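
As a rough illustration of the two-resolution scheme in (2), here is a minimal C sketch. The function name next_sleep_ms and its parameters are hypothetical; the idea is only that the wait for a wall-clock target is cut into bounded steps, and the wall clock is re-read after each step:

```c
#include <stdint.h>

/* Length of the next sleep, in ms, while waiting for an absolute
 * wall-clock target. Called again after every wake-up, so a clock step
 * or NTP correction is noticed within at most max_step_ms.
 * Returns 0 when the target is due or already past. */
int64_t next_sleep_ms(int64_t target_wall_ms, int64_t now_wall_ms,
                      int64_t max_step_ms)
{
    int64_t remaining = target_wall_ms - now_wall_ms;
    if (remaining <= 0) {
        return 0;
    }
    return remaining < max_step_ms ? remaining : max_step_ms;
}
```

With a one-minute step, a target 100 s away is reached in two sleeps (60 s, then whatever the clock says is left), which also bounds the drift described in (3).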

> configured to Just Work is easier.) Your suggestion in TIP #302
> of using times() is not workable, because times() reports CPU time,
> not wall-clock time; when a process is idle, times() does not
> advance.

Not true, see above :-}

> Perhaps my experience is unusual, but it's been literally years
> since I've set a system clock by the wristwatch-and-eyeball method.
> Even my wristwatch (like millions of others) gets its time from
> NIST; there's a WWVB receiver aboard that resynchronizes daily to
> the radio signal from Fort Collins.

I understand, but again, if your experience is "unusual", mightn't it
be advisable to ease the life of the Neandertalian masses including
myself ? (of course with an explicit option, like the -robust in the
TIP)

-Alex

Kevin Kenny

Apr 18, 2008, 12:35:58 PM

Alexandre Ferrieux wrote:
[why ignore this for over a year?]

Uhm, because I've been working on other things? I've been *wanting*
to do a better data structure for [after] for almost ten years now,
and more important things keep coming up. Of course, if someone
*else* were to implement it, shepherding it into the Core would have
a higher priority. :)

> (2) I disagree with the comment. Quoting you:
>
> > The times function in Unix is not an appropriate time base.
> > It reports the user and system time (CPU time, in other words)
> > of the currently executing process and its children...
>
> What you say accurately describes what times() returns in its argument
> structure. But it also has a return value, which is of type clock_t,
> and, quoting the manpage:
>
> >> times() returns the number of clock ticks that have elapsed
> >> since an arbitrary point in the past.
>
> So again, why is it not an appropriate time base ?

At some point, I understood that the return value of times() was
actually calculated by subtracting the bootload time from
gettimeofday(), so it wasn't guaranteed to be monotonic. If that
is the case, it's no better than [after] is today, and in fact
is worse because it's reliable as neither an absolute nor a
relative standard. If in fact it is monotonic, and has a
stable frequency reference backing it up, then it would be
an ideal choice. (These implementation details are nasty!)

> Yes but not everybody has this. I routinely get to handle machines
> jailed in strange subnetworks with armies of firewalls around, where NTP
> is not an option. Nor would it deserve any effort: on these machines,
> accuracy of the 'date' command is not required.

Nowadays, that seems a rather strange environment. In particular,
it presumes that shared filesystems (or even rsync) are not
required to be coherent, since strange things start to happen
if the system time on a machine that is writing files drifts
too far from the system time on the machine where they live.
Blocking NTP at firewalls, and providing no alternative,
seems to be the act of a sysadmin that is actively hostile to
the users. Nevertheless, if that's what you have to deal with,
and it isn't *too* hard to support it, I suppose we can work
up something.

>> Even today, the [after] command doesn't depend on time zone changes,
> Yeah, this red herring has been covered to death on c.l.t :-)
> We both know TZ and DST are a separate layer from the true time base
> ticking [after], let's dismiss that.

The reason that it's been discussed to death is largely that it
was *not* true once upon a time. Windows prior to Windows 2000
kept the hardware clock set to local time, and there was a glitch
at "fall back" that could be avoided only by intercepting the
WM_SETTINGCHANGE message, reading the clocks, and rescheduling
everything. (Moreover, there was a race condition where the
WM_SETTINGCHANGE could awaken a process before the settings
change had actually propagated all the way!) Tcl tripped over
*that*, too. Fortunately, Win9x and WinNT 3.51 are things of the past.

>>> I would be very surprised to see a single example of a beneficial, or
>>> even depended-upon, effect of a gettimeofday()-shift on [after].
>> I have one important one: NBC.
>> [...] So unlike your assertion that the
>> hardware clock is "rock solid" while system time is capricious,
>> in the NBC system, system time is traced to a primary reference
>
> OK, I understand the specific need, but:
>
> (1) this is closer in my view to an [at] command, rather than an
> [after]

Indeed, and I'm all in favour of implementing [at] - just haven't
had the time to do it myself (and it *does* need a TIP).

> (2) even with a ticks-based, drifting time reference in [after], it
> would be easily achievable with a two-resolution scheme: a periodic
> task running every (say) minute, checking [clock], and posting short-
> range [after]s.

Assuming, as said earlier, that an appropriate timebase is available
in userland. If I'm wrong about times(), then that problem is solved.

> (3) Conversely, even in the current implementation, with a tight NTP
> hand on the system clock, if there is one single long [after] with no
> intervening events, Tcl will compute one single select() timeout
> value, and the syscall will sleep until its end, regardless of any
> intervening NTP corrections; so it will suffer from the drift. Of
> course, this won't happen if the select() is frequently interrupted by
> other events (including shorter-term afters), since in this case
> gettimeofday() is called each time to recompute the next target.
> Is that your case ?

Since the UI has a clock displaying seconds, the select() in my
process never sleeps for more than a second. (Also, the workload
is such that I'd expect it never to sleep for more than a few
seconds in any case.)

In any case, I think we're in some form of "violent agreement" -
we both agree that

- separating [after] from [at] would be a good idea, even
  irrespective of the chosen timebase.
- tying [after] to a stable relative timebase - assuming
  that we can find one - is another good idea.
- irrespective of those two points, a data structure capable
  of handling a large schedule of timers without performance
  anomalies would be another good thing, and this last bit
  can be done without a TIP.
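
On the last point, one common structure for a large timer schedule is a binary min-heap keyed on the deadline, sketched below in C. This is only an illustration: the type and function names are invented, the fixed capacity is arbitrary, and nothing here is taken from tclTimer.c or from any agreed design. Supporting [after cancel] efficiently would additionally need an id-to-slot index, which is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TIMERS 1024   /* arbitrary capacity for the sketch */

typedef struct {
    int64_t deadline_ms;  /* when the timer becomes eligible to fire */
    int     id;           /* caller-chosen identifier */
} Timer;

typedef struct {
    Timer  items[MAX_TIMERS];
    size_t count;
} TimerHeap;

static void swap_timers(Timer *a, Timer *b)
{
    Timer t = *a; *a = *b; *b = t;
}

/* Insert in O(log n). Returns 0 on success, -1 if the heap is full. */
int heap_push(TimerHeap *h, int64_t deadline_ms, int id)
{
    if (h->count == MAX_TIMERS) return -1;
    size_t i = h->count++;
    h->items[i].deadline_ms = deadline_ms;
    h->items[i].id = id;
    while (i > 0) {                       /* sift up */
        size_t parent = (i - 1) / 2;
        if (h->items[parent].deadline_ms <= h->items[i].deadline_ms) break;
        swap_timers(&h->items[parent], &h->items[i]);
        i = parent;
    }
    return 0;
}

/* Remove the earliest timer in O(log n). Returns 0, or -1 if empty. */
int heap_pop(TimerHeap *h, Timer *out)
{
    if (h->count == 0) return -1;
    *out = h->items[0];
    h->items[0] = h->items[--h->count];
    size_t i = 0;
    for (;;) {                            /* sift down */
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->count && h->items[l].deadline_ms < h->items[m].deadline_ms) m = l;
        if (r < h->count && h->items[r].deadline_ms < h->items[m].deadline_ms) m = r;
        if (m == i) break;
        swap_timers(&h->items[m], &h->items[i]);
        i = m;
    }
    return 0;
}
```

The earliest deadline is always at items[0], so computing the next select() timeout is O(1) while insertion and removal stay logarithmic.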

We disagree on priorities (who doesn't?), largely because of
different experience. I work in a lot of strange places, but
it's been years since I last set a system clock with wristwatch
and eyeball. You obviously labor under different constraints.

Alexandre Ferrieux

Apr 21, 2008, 7:02:54 AM

to Kevin Kenny

The key is that times() returns the tick count, which is the heartbeat
of the unix scheduler itself. So if it weren't monotonic, macroscopic
effects should be observed on each date-reset, which is not the case:
Tcl stands alone.
In addition, in the last 16 years I've been playing with it on SunOS,
then Solaris, AIX, IRIX, Linux, including embedded ones, it never
broke the contract.

So, unless we want to be exceedingly paranoid about documented
semantics and require a manpage saying "ticks increase evenly" on all
existing OSes before we move, I think it would be overall good for Tcl
to stop being the ugly duckling of date-shifts.

Your call :)

> Nevertheless, if that's what you have to deal with,
> and it isn't *too* hard to support it, I suppose we can work
> up something.

Gee, that feels good !

But please notice I'm not only preaching for my own little case (I
have long learnt to say [open "|sh -c {while :;do echo;sleep 1;done}
2>@ stderr" r] to get a reliable periodic task !). Instead, I feel
fundamental discomfort about the unique dependency that Tcl shows on
an accurate, incremental, clock-driving mechanism like NTP. Even if it
is relatively widespread, it is still taking too much for granted
(ever heard of a standalone machine ?). And again, no other similar
tool shows this peculiar dependency.

> Indeed, and I'm all in favour of implementing [at] - just haven't
> had the time to do it myself (and it *does* need a TIP).

OK.

> > (2) even with a ticks-based, drifting time reference in [after], it
> > would be easily achievable with a two-resolution scheme: a periodic
> > task running every (say) minute, checking [clock], and posting short-
> > range [after]s.
>
> Assuming, as said earlier, that an appropriate timebase is available
> in userland. If I'm wrong about times(), then that problem is solved.

It is.

> In any case, I think we're in some form of "violent agreement" -
> we both agree that
>
> - separating [after] from [at] would be a good idea, even
> irrespective of the chosen timebase.

Yes, your upcoming [at] TIP.

> - tying [after] to a stable relative timebase - assuming
> that we can find one - is another good idea.

That's TIP 302. In the light of the new data about times(), can you
please at least update the comment at the end of the TIP, and (even
better) write about your thoughts on this on TCLCORE (just in case you
need a hand, I understand you're at least as busy as I am). Of course
I could dedicate some time to the (still lacking) ref implementation,
but I doubt I am the best horse for the job.

> - irrespective of those two points, a data structure capable
> of handling a large schedule of timers without performance
> anomalies would be another good thing, and this last bit
> can be done without a TIP.

On the Tracker, then ?

-Alex

Fredderic

Jul 4, 2008, 1:33:39 AM

On Fri, 18 Apr 2008 12:35:58 -0400, Kevin Kenny wrote:

This is a butt-old thread, I know, for some reason a few messages
accidentally got marked as unread, so I've ended up reading it over
again... But something just jumped out at me...

>> (1) this is closer in my view to an [at] command, rather than an
>> [after]
> Indeed, and I'm all in favour of implementing [at] - just haven't
> had the time to do it myself (and it *does* need a TIP).

Would [clock alarm] make sense, rather than adding a new [at] command?
Its placement as a sub-command of [clock] also points directly at the
fact that this command is based on wall-clock time, rather than elapsed
time.

Just a thought...

Fredderic

Alexandre Ferrieux

Jul 4, 2008, 3:15:55 AM

An alias is always possible of course, but keep in mind that this wall-
clock semantics has been attached to [after] since the beginning, so
it must stay that way. So you also should invent a nice name for the
new, truly relative timers...

-Alex

Andreas Leitgeb

Jul 4, 2008, 4:57:42 AM

Alexandre Ferrieux <alexandre...@gmail.com> wrote:
>> Would [clock alarm] make sense, rather than adding a new [at] command?
>> Its placement as a sub-command of [clock] also points directly at the
>> fact that this command is based on wall-clock time, rather than elapsed
>> time.
>
> An alias is always possible of course, but keep in mind that this wall-
> clock semantics has been attached to [after] since the beginning,

But always in contradiction to documentation:
" after ms
" Ms must be an integer giving a time in milliseconds.
" The command sleeps for ms milliseconds and then returns.
" ...
" after ms ?script script script ...?
" ..., but it arranges for a Tcl command to be executed
" ms milliseconds later as an event handler.

There is no mention that the future wallclock time is calculated
and then awaited but only of a timespan that's supposed to pass.

> it must stay that way. So you also should invent a nice name for the
> new, truly relative timers...

I disagree.

PS: The whole issue is only relevant if the underlying system
clock has a tendency to run unsteadily, which was probably
the reason for not caring back when "after" was designed.

Donal K. Fellows

Jul 4, 2008, 8:51:44 AM

Andreas Leitgeb wrote:
> PS: The whole issue is only relevant if the underlying system
> clock has a tendency to run unsteadily, which was probably
> the reason for not caring back when "after" was designed.

So run ntpdate (or equivalent) already.

Donal.

Andreas Leitgeb

Jul 4, 2008, 10:33:17 AM

sorry, cannot really parse that line...

What has ntpdate got to do with the issue of system-time not
always passing uniformly (beyond being one possible cause of
it)?

Donald Arseneau

Jul 4, 2008, 9:31:26 PM

On Jul 4, 12:15 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

> > > Indeed, and I'm all in favour of implementing [at] - just haven't
> > > had the time to do it myself (and it *does* need a TIP).
>

> An alias is always possible of course, but keep in mind that this wall-
> clock semantics has been attached to [after] since the beginning, so
> it must stay that way.

This comes up repeatedly and I very strongly disagree. The [after]
command is documented to implement intervals, not wall-clock. The
fact that it has hiccups when the wall clock changes is just a
long-standing bad bug, and is no reason to retain the incorrect
behavior. Applications that rely on the current behavior have
been brought up in discussions, but they turn out to be mostly
mythical, and what few do exist should just be fixed (coding that
uses [after] with the intention of [at], even when such coding
works, is too obscure). On the other hand, almost every use of
[after] is to implement a short delta-time delay, and these break
badly once or twice a year with the current [after].

Can't we follow the principle of least surprise here?

Donald Arseneau as...@triumf.ca

Donal K. Fellows

Jul 5, 2008, 10:32:19 AM

Donald Arseneau wrote:
> This comes up repeatedly and I very strongly disagree.

This comes up repeatedly because nobody seems to want to bite the
bullet and say how to make things work in real code. Your agreement or
otherwise with this state of affairs is irrelevant (unless you write a
patch, of course).

> The [after]
> command is documented to implement intervals, not wall-clock. The
> fact that it has hiccups when the wall clock changes is just a
> long-standing bad bug, and is no reason to retain the incorrect
> behavior.

OK, how do you work out how long until the next event fires without an
absolute time source? For added bonus points, do so with arbitrary
real-time delays between processing points and without extra threads.

> On the other hand, almost every use of
> [after] is to implement a short delta-time delay, and these break
> badly once or twice a year with the current [after].

Only on Windows, which is surprised by the occurrence of timezone
changes twice a year, and Linux, which tries to emulate the stupidity
of Windows so that dual-booting doesn't break. Sane systems don't
change their clocks when DST goes into force or when they are
moved between timezones; they just change how they render the clocks.
What I've never figured out is which idiot decided this was a
(mis)feature and not a flat bug.

> Can't we follow the principle of least surprise here?

I'd like to hear how you work out when a timer is ready to fire.

Donal.

Donal K. Fellows

Jul 5, 2008, 10:34:30 AM

Andreas Leitgeb wrote:
> What has ntpdate got to do with the issue of system-time not
> always passing uniformly (beyond being one possible cause of
> it)?

If you're running ntpdate, system time will always be close to real
time (which I think we can assume passes uniformly for the sake of
this discussion) and you'll never have the problem. QED.

Donal.

Alexandre Ferrieux

Jul 5, 2008, 12:17:10 PM

On Jul 5, 4:32 pm, "Donal K. Fellows" <donal.k.fell...@man.ac.uk>
wrote:

>
> > Can't we follow the principle of least surprise here?
>
> I'd like to hear how you work out when a timer is ready to fire.

Then please, guys, contribute your insights as a discussion on tcl-core
about

TIP 302: Fix "after"'s Sensitivity To Adjustments Of System Clock
http://www.tcl.tk/cgi-bin/tct/tip/302.html

Here is a summary of what I have gathered so far.
Out of naivety, I initially shared Donald's feeling expressed in this
thread, hence the title of the TIP ("fix" and not "invent a new
variant").
But since then, discussions with Kevin yielded two things:

- Kevin showed me real-life examples that the fix would break (in
the television industry)
- I showed Kevin that both Windows and especially unix[*] had proper
syscalls to get a tickcount-from-boot that is a proper basis for the
fix.

Then discussion stopped because we were both busy, as usual.
I am not far from being able to write a ref implementation for the
TIP, but I have to admit I have shied away so far because the
coexistence with the old behavior (unavoidable as shown by Kevin)
flatly doubles the size of the code, since it means maintaining two
sorted lists of events, one based on wall clock, the other on ticks...

Anyone with more courage can step forward !

-Alex

Alexandre Ferrieux

Jul 5, 2008, 12:21:42 PM

On Jul 5, 6:17 pm, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:
> >

> - Kevin showed me real-life examples that the fix would break (in
> the television industry)
> - I showed Kevin that both Windows and especially unix[*] had proper
> syscalls to get a tickcount-from-boot that is a proper basis for the
> fix.

Missing footer:
[*] The times() function happens to *return* a value which is the
tickcount, while the struct passed to it is filled with cpu-time
counters, which do not interest us in that context.

-Alex

Andreas Leitgeb

Jul 5, 2008, 12:50:21 PM

So shall we require a connection to the internet (for running
ntpdate) as a prerequisite for running after-using Tcl scripts? ;-)

Kevin Kenny

Jul 5, 2008, 12:56:23 PM

Donal K. Fellows wrote:
> OK, how do you work out how long until the next event fires without an
> absolute time source? For added bonus points, do so with arbitrary
> real-time delays between processing points and without extra threads.

Donal,

Many people here are unquestionably arguing from ignorance, but
I do suspect after some conversations with Alexandre Ferrieux
that they are onto something with wanting a distinction between
absolute and relative times.

Absolute time, as far as Tcl is concerned, is UTC with smoothing
(http://www.cl.cam.ac.uk/~mgk25/uts.txt). It advances monotonically
and (more or less) continuously, ticking at an uneven rate to
accommodate TAI-UTC differences and NTP corrections. It *can*
be nonmonotonic or discontinuous in unusual circumstances, owing
to an act of God (a network outage causing too large an NTP jump
to recover by advancing or retarding clock frequency), an act
of the operator (adjusting system time by the
wristwatch-and-eyeball method), or an Act of Congress (Daylight Saving Time,
on operating systems that use civil time as the reference).

Its reference on Unix-like systems is gettimeofday(); on Windows
systems its ultimate reference is the system clock returned
by GetSystemTimeAsFileTime, with additional precision provided
by interpolation using the performance counter.

Relative time is not something that Tcl currently recognises,
but perhaps should. It is strictly monotonic and uniform;
it can be thought of as the independent variable of Newton's
Laws of Motion. It is not - indeed, cannot be - tied to any
concept of absolute time outside the computer, since all the
"acts" above can cause dislocations. Nevertheless, it appears
to be what people want in scheduling short-period tasks.
Moreover, the anomalies in time handling on Windows (where
near the Spring and Autumn transitions of Daylight Saving Time,
the "UTC" returned by the system can be an hour off!) appear
to be with us to stay. I've seen bugs in that area in every
Windows release from 3.1 to Vista.

We have historically (and I am an offender in that respect)
offered the answer, "make your NTP infrastructure work!"
That's certainly part of the answer, and necessary for other
reasons (such as making sure that the system clock is
reasonably synchronised with the clock of remotely-mounted
file systems). It does not address the fact that there
are timing windows at the Daylight Saving Time transitions
during which the Windows API returns times that are simply
incorrect. It also fails to address the concerns of those
who deal with systems that have only intermittent network
connectivity or lack it altogether. (Tcl gets ported to
some very strange places, indeed.)

Given these considerations, I think we can all agree that,
all else being equal, we might gain something by
distinguishing the relative timing function (today's
[after] used for short-period tasks with loose accuracy
being tolerable) and the absolute timing function (delivery
of an event at a given time in the future, accounting
for any known dislocations of the clock). The latter might
be implemented by a command analogous to 'after' that
accepts seconds (milliseconds, microseconds, choose
a convenient unit) of UTS time from the epoch instead
of milliseconds from the current time.

The chief argument against that approach, we have stated
in the past, is infeasibility. Unquestionably, we have
system calls that delay a certain length of time irrespective
of changes to the civil clock - indeed, sleep() and
select() on Unix; and Sleep and MsgWaitForMultipleObjectsEx
on Windows work that way. What we have lacked, or so we
have claimed, is a reliable way to handle multiple
relative timers - once the first timer has rung, how
do we wind the second timer with the now-shorter interval
that it needs? What is needed is a reference clock that
advances with the properties that we require (monotonicity,
uniformity, but not accuracy nor synchrony with the
outside world).
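
For what it's worth, once such a reference clock exists, "winding the second timer" reduces to storing each timer's deadline on that clock's scale and recomputing what is left at every wake-up. A minimal C sketch, with invented names (RelTimer, rel_timer_arm, rel_timer_remaining) and with the monotonic reading passed in as a plain integer so that no particular clock source is assumed:

```c
#include <stdint.h>

typedef struct {
    int64_t deadline_ticks;  /* reference-clock value at which to fire */
} RelTimer;

/* Arm a timer: record the deadline on the reference scale, not the
 * interval itself. */
void rel_timer_arm(RelTimer *t, int64_t mono_now, int64_t interval)
{
    t->deadline_ticks = mono_now + interval;
}

/* Remaining wait, recomputed whenever the event loop wakes up.
 * Returns 0 if the timer is due or overdue. */
int64_t rel_timer_remaining(const RelTimer *t, int64_t mono_now)
{
    int64_t left = t->deadline_ticks - mono_now;
    return left > 0 ? left : 0;
}
```

If two timers are armed at tick 1000 for 50 and 200 ticks, then when the first rings at tick 1050 the second reports 150 ticks remaining, whatever the civil clock did in between.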

I - and apparently you, Donal - had long believed that
no portable way existed to get such a clock, but I am
coming to suspect that the world is catching up with our
needs. On Windows, the information is available with
the GetTickCount function, which has existed since the
beginning (Windows attempts to calibrate its rate to
NTP corrections, but guarantees its monotonicity
except for the 49.7-day rollover, which we could deal
with easily). Unix-like systems are a tougher nut
to crack, but with the Posix standards being implemented
more and more widely, I suspect that the times() function
is a reasonable reference on Unix-like systems. While
its primary purpose is to retrieve CPU time for a process,
it has the side effect of returning clock ticks since
an arbitrary point in the past, advancing in monotonic
fashion. (This reference is needed when successive
calls to times() are used to compute percent-CPU-usage.)
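
The 49.7-day rollover mentioned above is the wrap of a 32-bit millisecond counter (2^32 ms). A minimal C sketch of wrap-tolerant arithmetic follows; the function names are made up, and the tick values are passed in as plain uint32_t so the code does not depend on the Windows headers:

```c
#include <stdint.h>

/* Elapsed ms between two 32-bit tick readings. Unsigned subtraction is
 * modulo 2^32, so the result is right across one wrap as long as fewer
 * than 2^32 ms (about 49.7 days) separate the readings. */
uint32_t tick_elapsed_ms(uint32_t then, uint32_t now)
{
    return (uint32_t)(now - then);
}

/* Has the deadline passed? The difference is read as a signed value, so
 * this is valid for deadlines less than 2^31 ms (about 24.9 days) away. */
int tick_deadline_passed(uint32_t deadline, uint32_t now)
{
    return (int32_t)(uint32_t)(now - deadline) >= 0;
}
```

The point is simply never to compare raw tick values with < or >, only differences.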

It remains to be seen whether the combination of
times() and GetTickCount() will work on all the platforms
that we support, but it's also something that we could
exploit anyway, by having the configurator check for
the routines and fall back upon today's usage of
absolute time if they are not available.

I don't personally have the time at the moment to
tackle such a project, but if someone else wants to
draft a TIP and attempt a reference implementation,
I'd be willing to shepherd them through the process.

Donal K. Fellows

Jul 5, 2008, 1:58:06 PM

to tcl-...@lists.sourceforge.net

Kevin Kenny wrote:
> Many people here are unquestionably arguing from ignorance, but I do
> suspect after some conversations with Alexandre Ferrieux that they
> are onto something with wanting a distinction between absolute and
> relative times.

While I could always see how to do it with single sets of timers, I
could never work out how to do it reliably with multiple event
sequences on different scales (e.g. a 13ms interval and a 1s interval)
where the event processing takes an appreciable time w.r.t. the
shorter intervals. Which happens for sure when using, say, Tk; redraws
are quite expensive. With each pending timer event assigned an
absolute time at which it becomes eligible for execution, it's pretty
simple to handle. But without it...
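
To illustrate why a per-event deadline makes this simple, here is a minimal C sketch of rescheduling a periodic timer from its previous deadline rather than from "now", so that time spent in slow handlers (a Tk redraw, say) does not pile up as drift. The function name is invented, and skipping missed slots when processing overruns is just one possible policy, assumed here for the example:

```c
#include <stdint.h>

/* Next deadline for a periodic timer, strictly after `now`.
 * Computed from the previous deadline, so handler latency does not
 * shift the schedule. Slots that were missed entirely are skipped. */
int64_t next_periodic_deadline(int64_t prev_deadline, int64_t period,
                               int64_t now)
{
    int64_t next = prev_deadline + period;
    if (next <= now) {
        int64_t missed = (now - next) / period + 1;
        next += missed * period;
    }
    return next;
}
```

For a 13 ms timer last due at 1000: if the handler finishes at 1005 the next deadline is 1013, and if a redraw delays it until 1040 the next deadline is 1052. The same arithmetic works on any scale that advances steadily, absolute or not.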

> I - and apparently you, Donal - had long believed that no portable
> way existed to get such a clock, but I am coming to suspect that the
> world is catching up with our needs. On Windows, the information is
> available with the GetTickCount function, which has existed since the
> beginning (Windows attempts to calibrate its rate to NTP corrections,
> but guarantees its monotonicity except for the 49.7-day rollover,
> which we could deal with easily). Unix-like systems are a tougher
> nut to crack, but with the Posix standards being implemented more and
> more widely, I suspect that the times() function is a reasonable
> reference on Unix-like systems.

Alas, no. On Darwin[*], times() is documented as returning the number
of clock ticks since the Unix epoch, and so is inherently coupled to
absolute time. It's also deprecated in favour of gettimeofday() (and
getrusage(), but that's by the by).

Without a monotonic reference, the problem's just about impossible
without doing something elaborate with a RTOS.

Donal.
[* I read this in the manual pages on Leopard... ]

Donal K. Fellows

Jul 5, 2008, 2:01:30 PM

Andreas Leitgeb wrote:
> So shall we require a connection to the internet (for running
> ntpdate) as a prerequisite for running after-using Tcl scripts? ;-)

No. It's only required if you also want to have an accurate clock. If
you never change the time manually (or perpetrate the Windows DST
"crime") [after] will work pretty nicely. Even poor computer clocks
don't drift *that* fast. ;-)

Donal.

Donal K. Fellows

Jul 5, 2008, 2:02:47 PM

Alexandre Ferrieux wrote:
> [*] The times() function happens to *return* a value which is the
> tickcount, while the struct passed to it is filled with cpu-time
> counters, which do not interest us in that context.

Not on all systems. Some (e.g. Darwin) use an absolute tickcount.

Donal.

Kevin Kenny

Jul 5, 2008, 7:29:42 PM

Donal K. Fellows wrote:
> Alas, no. On Darwin[*], times() is documented as returning the number
> of clock ticks since the Unix epoch, and so is inherently coupled to
> absolute time. It's also deprecated in favour of gettimeofday() (and
> getrusage(), but that's by the by).
>
> Without a monotonic reference, the problem's just about impossible
> without doing something elaborate with a RTOS.

Does Darwin support clock_gettime(CLOCK_MONOTONIC, &timespec)?
That's the current OpenGroup thinking, and I do believe that I'd
have the configurator try for that first. (Nope! Looked it
up - MacOSX does not support clock_gettime at all. What *were*
they thinking?)

In any case, on systems where there is no monotonic reference
available (I suspect the other BSD's share the problem with Darwin),
we can fall back on using gettimeofday() as the reference for
relative timers. Darwin is less likely than Windows and Linux to
be deployed in an environment with no possibility of a working
NTP infrastructure. (How many Apples do you see running factories
or labs?) And it also doesn't have the "guaranteed twice a year"
failure that Windows appears to be prone to. Alex's suggestion
makes things better on Windows and Linux, and does no harm on
HPUX and Solaris. The big hurdle is still autoconf - to recognize
the method to be used, from among

clock_gettime(CLOCK_MONOTONIC, ...)
    Solaris, BSD's, newer Linuxes. Solaris needs -lrt or -lposix4
times()
    HP-UX (lacks CLOCK_MONOTONIC), older Linuxes, older BSD's, AIX
GetTickCount()
    Windows
gettimeofday
    Ultimate fallback, no worse than today.
    Darwin, others?

I could be all wet about some of these.
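
A rough C sketch of what such a selection could look like, with compile-time #if tests standing in for the autoconf checks. The function name steady_now_ms is hypothetical, the times() branch is left out for brevity, and the Windows branch ignores the 32-bit rollover of GetTickCount():

```c
#include <stddef.h>
#include <stdint.h>
#if defined(_WIN32)
#include <windows.h>
#else
#include <sys/time.h>
#include <time.h>
#endif

/* Milliseconds from the steadiest source found at compile time.
 * The starting point is arbitrary; only differences between two
 * calls are meaningful. */
int64_t steady_now_ms(void)
{
#if defined(_WIN32)
    /* 32-bit ms counter; wraps after about 49.7 days. */
    return (int64_t)GetTickCount();
#else
#if defined(CLOCK_MONOTONIC)
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0) {
        return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
    }
#endif
    /* Ultimate fallback: wall clock, no worse than today. */
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (int64_t)tv.tv_sec * 1000 + tv.tv_usec / 1000;
#endif
}
```

On older systems the clock_gettime() branch may need an extra library at link time, as noted for Solaris above.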

sleb...@gmail.com

Jul 5, 2008, 7:33:47 PM

On Jul 6, 1:58 am, "Donal K. Fellows" <donal.k.fell...@man.ac.uk>
wrote:

Does Darwin have problems with after when daylight savings occurs? If
no then it's fine for Darwin to internally use wall-clock to implement
after while Windows and Linux use CPU ticks.

Fredderic

Jul 6, 2008, 2:25:55 AM

On Sat, 05 Jul 2008 12:56:23 -0400,
Kevin Kenny <ken...@acm.org> wrote:

All this, simply because I thought that if someone WAS to add an [at]
command to counter-point [after], then I hope to Ghod they don't call it
[at]. ;)

> Relative time is not something that Tcl currently recognises,
> but perhaps should. It is strictly monotonic and uniform;
> it can be thought of as the independent variable of Newton's
> Laws of Motion. It is not - indeed, cannot be - tied to any
> concept of absolute time outside the computer, since all the
> "acts" above can cause dislocations. Nevertheless, it appears
> to be what people want in scheduling short-period tasks.
> Moreover, the anomalies in time handling on Windows (where
> near the Spring and Autumn transitions of Daylight Saving Time,
> the "UTC" returned by the system can be an hour off!) appear
> to be with us to stay. I've seen bugs in that area in every
> Windows release from 3.1 to Vista.

That is how I thought the [after] time worked, for a very long time.
(Mostly courtesy of the documentation.) For that reason I used to write
code that uses [after] to skip a portion of the overall time,
periodically re-calculating the event time in case the clock has
changed in between.

Usually, when I want something to happen after an interval, then that's
exactly what I want. An animation, a network timeout, whatever, if I
can help it at all, I don't want it being sensitive to variations in the
system clock ticking.

Likewise, when I want something to happen [at] a specific time, it's
because the user doesn't want to have to bother with watching a real
wall-clock, or because it needs to mesh in with other events that are
happening in the rest of the system, such as tasks being started by
cron and its ilk. So I *DO* want it to keep in sync as much as
possible with what the rest of my computer believes the time to be. Tcl
having its own private idea of what time is really doesn't help there.

These really are two different use cases, and really don't fit the one
hammer. Although I agree that "normally" they're both satisfied by the
same mechanism, whatever that happens to be. But.....

As for NTP, once or twice a year I go without an internet connection at
all for a week or two, for whatever reason, so I check my clock against
my mobile phone once a day and change it if I happen to notice a
difference of more than a second (as it turns out my system clock
without NTP drifts about a second every three days or so). Also even
with NTP, I don't raise my network interfaces automatically at boot,
and by the time I do, the system's already been up at least 10 minutes,
sometimes as much as a couple hours, which doesn't help the couple Tcl
daemons that are already running at that point.

To further complicate the issue, I do dual-boot, about once every two or
three weeks - the default Wine just doesn't want to play Diablo II or
Civilisation III GE (according to WineHQ, Diablo SHOULD work, but
CivIII Gold Edition doesn't). Because I used to go several months
between boots (my system's getting a little old these days), and I'd
only use Windoze once every couple of those, Linux is set to GMT time,
not Windoze compatibility time (I probably should change that, but it
feels too much like a cruel de-evolution of the system). That means
during DST, every time I return to Linux my clock is out by an hour,
and NTP has to bring it back when it starts. Its default behaviour is
not to jump the clock unless it's wrong by at least three hours, so it
skews the clock back into sync with the rest of the world over the next
couple of hours. If I think about it, I'll eye-ball
it roughly into sync (and give the system a couple seconds to settle
back down again), before I bring up the network interface (actually
by kicking the runlevel from 3 up to 4 or 5). But usually it's not a
real big deal and letting NTP skew it for me is a lot nicer on the
various daemons and other programs running (I just have to be careful
with my TV guide firing off alarms at the wrong time ;) ). Also
sometimes if I'm going out, I'll tell the boot-loader to bring it up in
runlevel 4 for me, in which case the network interfaces WILL come up,
but it'll still take a couple hours for the clock to actually match the
one on the wall.

While this stuff is fixable (I could change NTP to jump the time if
it's more than half an hour out, or use ntpdate to jump it before
starting NTP), it is nonetheless a simple case where saying "have a
working NTP" really isn't good enough. There are times when NTP
isn't available, and there's not a lot you can do about it...

Personally, if both methods aren't going to be provided, then provide as
close to relative time as possible. The other, wall-clock time, can be
synthesised if needs be, although not nearly as efficiently as the
core could do it.

Fredderic

Alexandre Ferrieux

Jul 6, 2008, 7:38:27 AM

On Jul 5, 6:56 pm, Kevin Kenny <kenn...@acm.org> wrote:
>
> I don't personally have the time at the moment to
> tackle such a project, but if someone else wants to
> draft a TIP and attempt a reference implementation,
> I'd be willing to shepherd them through the process.

Kevin, two questions:

(1) Why are you suggesting a new TIP, while 302 already has most of
the background (at least for non-Darwin systems) ?

(2) Do I understand correctly that your suggestion now is in line with
TIP#302's initial proposal (and title), in that we could *change*
today's [after] so that, without any extra option, it would now resort
to relative time when available, so be "fixed" for 99.9%[*] of cases,
and separately add a new primitive like [at], taking an absolute
target as a parameter, and built on the current [after]'s machinery ?

-Alex

[*] the 0.1% being, so far, "Kevin's case". I'm not against the label
"*** POTENTIAL INCOMPATIBILITY ***" in the release notes if it improves
the lives of 99.9% of the Tcl world ;-)

Kevin Kenny

Jul 6, 2008, 10:48:47 AM

Alexandre Ferrieux wrote:
> Kevin, two questions:
>
> (1) Why are you suggesting a new TIP, while 302 already has most of
> the background (at least for non-Darwin systems) ?

Uhm, because this discussion has run to so many words that I
entirely forgot 302 was there?

> (2) Do I understand correctly that your suggestion now is in line with
> TIP#302's initial proposal (and title), in that we could *change*
> today's [after] so that, without any extra option, it would now resort
> to relative time when available, so be "fixed" for 99.9%[*] of cases,
> and separately add a new primitive like [at], taking an absolute
> target as a parameter, and built on the current [after]'s machinery ?

*If* the configuration issues can be addressed (it's not
obvious to me how to distinguish, for example, systems that
provide the calls but get them wrong from those that get them
right), I'm coming to believe that's the least-worst option.
I'm tired of bug reports every Hallowe'en.

Schelte Bron

Jul 6, 2008, 11:09:40 AM

Alexandre Ferrieux wrote:
> [*] the 0.1% being, so far, "Kevin's case".

For the record: I also have an application (home automation) that
allows users to specify a wall-clock time when certain events will
happen every day. At the moment I rely on the current implementation
of after to provide that functionality. If the guts of after changes
without the introduction of a new command that does what after does
now, it would complicate that part of my application quite a bit.

Schelte.
--
set Reply-To [string map {nospam schelte} $header(From)]

Andreas Leitgeb

Jul 6, 2008, 6:00:24 PM

Schelte Bron <nos...@wanadoo.nl> wrote:
> Alexandre Ferrieux wrote:
>> [*] the 0.1% being, so far, "Kevin's case".

> For the record: I also have an application (home automation) that
> allows users to specify a wall-clock time when certain events will
> happen every day.
> At the moment I rely on the current implementation
> of after to provide that functionality. If the guts of after changes
> without the introduction of a new command that does what after does
> now, it would complicate that part of my application quite a bit.

Under the assumption that the system time corresponds to wall-clock time
with a not too large deviation, you'll have no need to change.

Under the assumption that your system clock may bounce back and forth,
your environment is already unsafe. After all, it may just believe
a wrong time when the wall clock says that it's the awaited time.

Either way, your script would become simpler, because rather than
calculating the milliseconds till the target point of time for "after",
you'd just tell "at" (or whatever it would be called) the target point of
time itself.
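
For reference, the calculation that such a script does today amounts to something like the following, sketched here in C with the standard library rather than in Tcl. ms_until_next is a hypothetical helper, and the sketch assumes the system's local time zone rules are correct.

```c
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: milliseconds from `now` until the next local-time
 * occurrence of hour:minute (today if still ahead, otherwise tomorrow).
 * Returns -1 if the local time cannot be determined.
 */
int64_t ms_until_next(time_t now, int hour, int minute)
{
    struct tm *lt = localtime(&now);
    struct tm target_tm;
    time_t target;

    if (lt == NULL) {
        return -1;
    }
    target_tm = *lt;
    target_tm.tm_hour = hour;
    target_tm.tm_min = minute;
    target_tm.tm_sec = 0;
    target_tm.tm_isdst = -1;    /* let mktime() decide whether DST applies */
    target = mktime(&target_tm);

    if (target <= now) {
        /* Already past for today: ask for the same wall-clock time tomorrow. */
        target_tm.tm_mday += 1;
        target_tm.tm_hour = hour;
        target_tm.tm_min = minute;
        target_tm.tm_sec = 0;
        target_tm.tm_isdst = -1;
        target = mktime(&target_tm);
    }
    return (int64_t)difftime(target, now) * 1000;
}
```

The result is what would be handed to "after". It is only valid for the clock as it stood when it was computed, which is why a stepped clock during the wait matters.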

PS: does the system call "select" with some non-zero timeout abort
earlier, if system-time is adjusted forward during the wait?

If the automation application waits for next day at 7am, and internally
calls "select" with a timeout of 9 hours, and next day at 2am is the
switch to daylight saving time (on Windows), will current "after"
wake up at 7am or at 8am?
OK, Tcl may anticipate this particular change, but what if the user
notices meanwhile that the system clock is 10 minutes late and corrects
it (slowly or quickly wouldn't matter)? Will current Tcl notice the
effectively shorter wanted sleep time?
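
The two possible outcomes can be worked through with plain arithmetic. The sketch below is a simplified model, not a description of what select() or Tcl actually does: it assumes the clock the timer is compared against is stepped forward once during the wait, and both function names are made up for the illustration. All values are seconds counted from midnight of the first day.

```c
/*
 * Clock reading at which a purely relative timer fires: it sleeps for
 * `timeout` elapsed seconds, so a forward step of `step` seconds at
 * reading `step_at` pushes the reading it wakes up at later.
 */
long relative_fire_reading(long start, long timeout, long step_at, long step)
{
    long reading = start + timeout;     /* reading if nothing is stepped */

    if (step_at >= start && step_at < reading) {
        reading += step;
    }
    return reading;
}

/*
 * Elapsed seconds before a timer tied to the clock reading `target` fires:
 * a forward step during the wait shortens the real waiting time.
 */
long absolute_elapsed(long start, long target, long step_at, long step)
{
    long elapsed = target - start;      /* wait if nothing is stepped */

    if (step_at >= start && step_at < target) {
        elapsed -= step;
        if (elapsed < step_at - start) {
            /* The step jumped over the target: fire at the step itself. */
            elapsed = step_at - start;
        }
    }
    return elapsed;
}
```

With start at 22:00 (79200), a 9 hour timeout (32400) and a one hour step at 02:00 the next day (93600), relative_fire_reading gives 115200, i.e. 8am, while absolute_elapsed for the 7am target (111600) gives 28800, i.e. the timer fires after 8 real hours when the clock reads 7am.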

Andreas Leitgeb

Jul 7, 2008, 1:35:10 AM

Just clarifying/correcting my own questions:

Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at> asked in the PS:

> does the system call "select" with some non-zero timeout abort
> earlier, if system-time is adjusted forward during the wait?

(on linux)

> If the automation application waits for next day at 7am, and internally
> calls "select" with a timeout of 9 hours, and next day at 2am is the
> switch to daylight saving time (on Windows), will current "after"
> wake up at 7am or at 8am?

On Windows it surely doesn't call "select()". Whatever it calls
there for the actual waiting, does that care about system-time
changes during the waiting time?

Schelte Bron

Jul 7, 2008, 7:22:03 AM

Andreas Leitgeb wrote:
> Either way, your script would become simpler, because rather than
> calculating the milliseconds till the target point of time for
> "after", you'd just tell "at" (or whatever it would be called) the
> target point of time itself.
>

This would only be true if a command like "at" would be introduced
at the same time "after" is changed. However, I got the impression
that the necessity of an "at" command was being questioned. That's
why I indicated that merely changing the way after works "without
the introduction of a new command that does what after does now"
would be inconvenient to me.

I'll leave your other questions for someone who knows the answers to
them.

Ralf Fassel

Jul 7, 2008, 7:38:30 AM

* Kevin Kenny <ken...@acm.org>
| [make your NTP infrastructure work!]

| It also fails to address the concerns of those who deal with systems
| that have only intermittent network connectivity or lack it
| altogether. (Tcl gets ported to some very strange places, indeed.)

For the record: we use TCL/Windows in 24/7 production lines with no
network at all, and the absolute computer clock time is of no
importance there. What is important is that the 20ms/1s schedule
timers really run all the time and do not get delayed once or twice a
year by an hour when DST changes. So I'm all in favor of [after]
getting changed to reliably work with relative times...

R'

Donal K. Fellows

Jul 7, 2008, 8:08:41 AM

Andreas Leitgeb wrote:
> On windows it surely doesn't call "select()".

I think we use WaitForMultipleObjects() on Win.

> does that care for system-time-changes during the waiting time?

I don't know. I do know that we compare with "local absolute time"[*]
after a wait when working out what events to actually fire.

Donal.
[* Now there's an odd concept! ]

Kevin Kenny

Jul 7, 2008, 10:59:27 AM

Ralf Fassel wrote:
> For the record: we use TCL/Windows in 24/7 production lines with no
> network at all, and the absolute computer clock time is of no
> importance there. What is important is that the 20ms/1s schedule
> timers really run all the time and do not get delayed once or twice a
> year by an hour when DST changes. So I'm all in favor of [after]
> getting changed to reliably work with relative times...

I've received several private emails that seem to be
laboring under a misconception.

For the non-Windows people among us, it is important
not to be confused about Tcl. Tcl internally maintains wakeup
times for timer handlers in UTC. So please stop suggesting
that we do so. We do. (Not you, Ralf! I know that you
know better.)

The problem is that it appears virtually guaranteed that
Windows will wake up the Tcl process from its MsgWaitForMultipleObjects
call at a time where:

the hardware clock has been set back to reflect the Daylight
Saving Time change (the hardware clock is set to *local* time!)

BUT

the local-UTC offset has not been changed, so that calls to
get the UTC time are an hour off.

(or else the other way around: the hardware clock hasn't been
reset, but the local-UTC offset has. Either way, for one trip
through the Notifier, the time is an hour off).

The result is that time appears to jump an hour, first one way
then the other. When it jumps forward, all [after] handlers
for that hour fire immediately (often to bizarre effect);
when it jumps backward, the process, as has been observed,
freezes for an hour.
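
Both symptoms follow directly from comparing absolute deadlines against a clock that jumps. A minimal sketch of that comparison, with made-up function names and no claim to match the actual notifier code:

```c
#include <stddef.h>
#include <stdint.h>

/* Number of handlers whose absolute deadline is at or before `now_ms`. */
size_t count_due(const int64_t *deadline_ms, size_t n, int64_t now_ms)
{
    size_t due = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (deadline_ms[i] <= now_ms) {
            due++;
        }
    }
    return due;
}

/*
 * Milliseconds to sleep before the earliest deadline, or 0 if one is
 * already due. Assumes n >= 1.
 */
int64_t wait_for_earliest_ms(const int64_t *deadline_ms, size_t n,
                             int64_t now_ms)
{
    int64_t earliest = deadline_ms[0];
    size_t i;

    for (i = 1; i < n; i++) {
        if (deadline_ms[i] < earliest) {
            earliest = deadline_ms[i];
        }
    }
    return earliest > now_ms ? earliest - now_ms : 0;
}
```

Take handlers due 1 second, 10, 30 and 59 minutes from now. If the reported time jumps an hour forward, count_due returns 4 and everything fires at once. If it jumps an hour backward, wait_for_earliest_ms returns one hour plus one second, which is the observed freeze.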

Every new Windows release, and every new release of the VC++
runtime, claims to fix the problem. They never do. (Sometimes,
they change whether the jump is forward-then-back, or
back-then-forward. They never seem to be able to make it
atomic.)

It's *hard* to make these things work when the underlying system
is so messed up. I recommend to Ralf that he set his machines on
the factory floor to the Africa/Monrovia time zone; that's
the only time zone on Windows that means "UTC, no Daylight
Saving Time".

Ralf Fassel

Jul 7, 2008, 11:34:01 AM

* Kevin Kenny <ken...@acm.org>
| [Windows DST 'adjustments']

| The result is that time appears to jump an hour, first one way then
| the other.

Is this
first (in March/DST-on) one way then (in October/DST-off) the other
or
first (02:00:000, 1st call after DST-on) one way then (02:00:001,
2nd call after DST-on) the other
? From your statement "They never seem to be able to make it atomic."
I guess the second is the case? (I never gave this much thought
until I stumbled upon this thread :-/)

| I recommend to Ralf that he set his machines on the factory floor to
| the Africa/Monrovia time zone; that's the only time zone on Windows
| that means "UTC, no Daylight Saving Time".

There is a checkbox "adjust automatically to DST" (I don't know the
exact wording in the English Windows; in German it is "Uhr automatisch
auf Sommer/Winterzeit umstellen"), which I had hoped, when unchecked,
would prevent this DST jump? (Of course we get local time an hour
off during DST, but as I said, we don't care about local time on those
systems).

What a mess...
R'

Fredderic

Jul 9, 2008, 3:48:59 AM

On 07 Jul 2008 05:35:10 GMT,
Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at> wrote:

Seems to me that the OSs all use relative timers, though may calculate
those times from absolute times provided. And they probably do it for
the very reasons being discussed in this thread.

The only way to make an absolute time timer in an environment where the
universe can change beneath you any time it feels like it, is to
repeatedly and frequently pause, check that the universe is still where
you left it, adjust your sense of reality if it isn't, and then continue
on waiting until the next interval.

This then limits your accuracy by the frequency of these checks, but
the problem there is that the more often you check, the more time you
burn doing effectively nothing, and if the checks are an extra activity
outside of the usual ticking of the system then when the times start
getting small the granularity of your checks actually starts to impact
on the timing of the event.

But the bottom line is that the practical absolute time timers are built
on top of short relative time timers, simply because that's the only way
they can work. Otherwise they stall if the time skips backwards, and
extra checking needs to be added in case time skips forwards over the
event time. So it makes sense to me to support the one that actually
works (and also happens to be what the documentation describes), being
the relative timers, and synthesise absolute timers as a higher-level
feature in either [at] or [clock alarm] (my pick, since the source of
the time information happens to be [clock seconds]) that takes an
absolute time. This would introduce a kind of heart-beat tick as a
side-effect, which could then be tuned by the user.

Making the calculation each time you go through the main loop also gives
it the highest and most natural granularity possible, which just leaves
the question of whether you want to introduce a periodic artificial
universe-check wakeup built-in. That's not a particularly complex thing
to do, and in fact is quite trivial (if messy and inefficient) to do
from script anyhow. Having it a built in setting though, allows it to
only waste time when it's actually needed, and becomes a simple issue of
subtracting the earliest target time from the current clock time
(which I believe is what TCL does right now, from what I've read in
these threads?) and clamping the resultant interval between 0 and the
new configurable heart-beat ticker. The ticker rate could even be
configured per-timer from a system default, if desired, with the one for
the next event to fire being the one that counts (naturally).
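
The subtract-and-clamp step described above is small enough to write out. This is a sketch in C under the assumption that times are kept in milliseconds; next_wait_ms and the heartbeat parameter are names invented for the illustration, not anything in Tcl.

```c
#include <stdint.h>

/*
 * Hypothetical helper: how long to sleep before re-checking the clock.
 * The remaining time to the earliest absolute target is clamped between
 * 0 (already due) and the configurable heartbeat interval.
 */
int64_t next_wait_ms(int64_t target_ms, int64_t now_ms, int64_t heartbeat_ms)
{
    int64_t wait = target_ms - now_ms;

    if (wait < 0) {
        wait = 0;               /* target already passed: fire now */
    }
    if (wait > heartbeat_ms) {
        wait = heartbeat_ms;    /* wake early to re-check the universe */
    }
    return wait;
}
```

With a heartbeat of 1000, a target 10 seconds away yields a 1000 ms sleep, a target 500 ms away yields 500, and a target the clock has already skipped over yields 0. A clock step is therefore noticed at most one heartbeat late, at the cost of one wakeup per heartbeat while an absolute timer is pending.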

I suppose that does mean a little bit of duplicated code, but a good
deal of that should be able to be factored out into common functions,
and having a separate event source for each type of timer means that it
gets right out of the way and doesn't need to be checked for at all
when it's not needed.

Fredderic