rolloverTime := PreviousClockValue + InterruptPeriod.
"Adjust the resumption time for every process that
was suspended before the rollover. If the resumption time
was after the previous clock value but before the projected
rollover time then adjust the resumption time to be exactly
the projected rollover time (which is zero after the rollover)."
self delayedTasks do: [:task | | t |
(t := task resumptionTime) >= PreviousClockValue
ifTrue: [task resumptionTime: ((t - rolloverTime) max: 0)]]]].
timeAdjustment := PreviousClockValue - currentTime.
self delayedTasks do: [:task | task resumptionTime: (task resumptionTime - timeAdjustment)].
Note that I don't worry about the resumptionTime being beyond the PreviousClockValue; they all should be, and it won't hurt if they aren't. Also, I don't worry about the resumptionTime going negative; the delay will still fire.
Your AIX experience is interesting; my experience is only with Windows.
(currentTime between: PreviousClockValue and: PreviousClockValue + 500) ifFalse: [
timeAdjustment := PreviousClockValue - currentTime + InterruptPeriod.
self delayedTasks do: [:task | task resumptionTime: (task resumptionTime - timeAdjustment)].
].
I've really got to get a VA environment set up at home again! I did check my old notes, and discovered that we wrestled with this problem in mid-November of 2001. The IBM team working on it found that it also affected Java applications, which wasn't a major issue for us at the time. Not a lot of other details...
Tom
I am impressed that you remember much of anything about this from 2001.
As for affecting Java apps, it makes sense that Java could have the same kind of code. Asking the OS to trigger an event periodically makes sense. Using the millisecond clock to keep track of delays or when to issue callbacks looks like the only game in town. And if it didn't change with time-of-day time changes (as it seems to be on Windows) it would be pretty sound. I wonder how it is/was on OS/2?
It would be very good if others could follow my logic and see if I am correct or off in the weeds.
This is fairly easy from the dev env: display Time>millisecondClockValue, change the time from the OS, and display Time>millisecondClockValue again.
I am surprised that it does change, as I can see no good reason for it to change and can think of a good argument for it not to.
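The check described above can be sketched as a one-line workspace snippet. This is only a sketch: it assumes `Time now` and `Time millisecondClockValue` both answer printable values in your image, and the labels are made up. Evaluate it once, change the time from the OS, then evaluate it again and compare the two Transcript lines.

```smalltalk
"Evaluate before and after changing the OS time, then compare the output.
 If the millisecond clock follows time-of-day changes, its value will shift
 by roughly the same amount as the wall clock."
Transcript cr;
	show: 'Wall clock: '; show: Time now printString;
	show: ', millisecond clock: '; show: Time millisecondClockValue printString.
```

Evaluating the same line twice avoids using a Delay to space out the two readings, since Delays are the very thing under suspicion here.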
As I mentioned originally, the time changes were a Kerberos artifact. When using Kerberos tickets to provide authentication between machines in a cluster, it's vital that time be kept consistent between machines so that tickets don't expire unexpectedly. Stand-alone Windows and Unix boxes don't have such a stringent requirement on their timekeeping, so they can get by with hitting an NTP server on a regular basis.
The clock running fast was the cause of the Delays being expired prematurely (not the backward jumps) and the current code doesn't test for this (which it can't) or even test for jumping ahead.
The current code has added a new problem by ignoring small jumps backward (up to 5 minutes), which then causes delays to expire late by the amount of time of the up to 5 minute backward jump.
This can be tested by checking to see if currentTime is between PreviousClockValue and PreviousClockValue + 500.
(currentTime between: PreviousClockValue and: PreviousClockValue + 500) ifFalse: [
timeAdjustment := PreviousClockValue - currentTime + InterruptPeriod.
self delayedTasks do: [:task | task resumptionTime: (task resumptionTime - timeAdjustment)].
].
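To put numbers on the late-expiry case, here is a minimal workspace sketch. All values are made-up milliseconds, the local variables stand in for the class variables in Delay, and a single resumption time stands in for the delayedTasks collection; it is not the actual method source.

```smalltalk
| previousClockValue currentTime interruptPeriod resumptionTime timeAdjustment |
"Hypothetical values, chosen only for illustration."
previousClockValue := 600000.
interruptPeriod := 100.
resumptionTime := 610000.	"delay was due 10 seconds after the last check"
currentTime := 480000.		"clock stepped back two minutes"

"If the backward jump is ignored, the delay waits for the clock to climb back."
Transcript cr; show: 'Jump ignored, ms still to wait: ';
	show: (resumptionTime - currentTime) printString.

"Proposed check: outside the expected window, shift the resumption time."
(currentTime between: previousClockValue and: previousClockValue + 500) ifFalse: [
	timeAdjustment := previousClockValue - currentTime + interruptPeriod.
	resumptionTime := resumptionTime - timeAdjustment].
Transcript cr; show: 'Adjusted, ms still to wait: ';
	show: (resumptionTime - currentTime) printString.
```

With these numbers the first line shows 130000 (the original 10 seconds plus the two-minute jump) and the second shows 9900 (the original 10 seconds less one interrupt period).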
On Monday, August 6, 2012 12:29:50 PM UTC-4, Louis LaBrunda wrote:
The clock running fast was the cause of the Delays being expired prematurely (not the backward jumps) and the current code doesn't test for this (which it can't) or even test for jumping ahead.
Actually, Delays were being expired prematurely because of the backward jumps. As originally implemented, the delay timer was seeing that the new time was less than the old time, and therefore assuming clock rollover. Based on that assumption, it was decided that there was a major shift in time forward, and any Delays should be expired.
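A small sketch of why the two cases were confused, assuming the original test was nothing more than "new time is less than old time" as described above. The block name and the clock values are made up for illustration; this is not the original method.

```smalltalk
| looksLikeRollover |
"Hypothetical stand-in for the original check."
looksLikeRollover := [:previous :current | current < previous].

"A genuine wrap of the millisecond clock."
Transcript cr; show: 'Wrap: ';
	show: (looksLikeRollover value: 4294967200 value: 10) printString.

"A 90 ms backward step from a time correction."
Transcript cr; show: 'Step back: ';
	show: (looksLikeRollover value: 4294967290 value: 4294967200) printString.
```

Both lines show true, so a comparison like this cannot tell a small backward step from a rollover on its own.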
timeAdjustment := PreviousClockValue - currentTime + InterruptPeriod.
The current code has added a new problem by ignoring small jumps backward (up to 5 minutes), which then causes delays to expire late by the amount of time of the up to 5 minute backward jump.
In our case, at least, a timer expiring a little later is less problematic than one expiring too soon. We use delays to control timeouts on things such as torque gun controls (we build cars here), and if the delay expires too soon, the assembly line worker does not have enough time to complete his or her task on the vehicle.
This can be tested by checking to see if currentTime is between PreviousClockValue and PreviousClockValue + 500.
(currentTime between: PreviousClockValue and: PreviousClockValue + 500) ifFalse: [
timeAdjustment := PreviousClockValue - currentTime + InterruptPeriod.
self delayedTasks do: [:task | task resumptionTime: (task resumptionTime - timeAdjustment)].
].
I don't actually see how your code is materially different from what is already there, except for tighter values on the test range.
Tom
timeAdjustment := PreviousClockValue - currentTime + InterruptPeriod.
rolloverTime := PreviousClockValue + InterruptPeriod.
If you have the time (puns are flying all over this post) try some numbers. I will try to post some numbers to show the various possibilities.
| tests |
tests := OrderedCollection new.
tests
add: #('Rollover' 4294967200 10 4294967400);
add: #('Jump Back' 4294967290 4294967200 4294967395);
add: #('Jump Forward' 4294966100 42949676000 4294966205).
Transcript cr.
tests do: [:t | | pcv ct adj delayTo |
pcv := t second.
ct := t third.
adj := pcv - ct + 100.
delayTo := t fourth.
Transcript show: t first; show: ' - PreviousClockValue>'; show: pcv printString;
show: ', CurrentTime>'; show: ct printString; show: ', Adjustment>'; show: adj printString;
show: ', delayToWas>'; show: delayTo printString; show: ', delayToIs>'; show: (delayTo - adj) printString; cr.
].
Rollover - PreviousClockValue>4294967200, CurrentTime>10, Adjustment>4294967290, delayToWas>4294967400, delayToIs>110
Jump Back - PreviousClockValue>4294967290, CurrentTime>4294967200, Adjustment>190, delayToWas>4294967395, delayToIs>4294967205
Jump Forward - PreviousClockValue>4294966100, CurrentTime>42949676000, Adjustment>-38654709800, delayToWas>4294966205, delayToIs>42949676005
Run this code in a workspace, play with the numbers if you like.
My latest theory (I'm trying to think of how to test it) is that a delay ends and its process is set to ready but NOT added back to its priority queue. In the case of the program where things went back to normal, maybe the process got put back into the queue. The code at the end of Delay>checkDelayedTasks that sets processes ready and puts them back in their queue is not wrapped in a critical block and interrupts are not disabled, so maybe with some bad timing, the process may not get put back in its priority queue.
On Tuesday, August 7, 2012 4:14:26 PM UTC-4, Louis LaBrunda wrote:
Run this code in a workspace, play with the numbers if you like.
I'm guessing these numbers are actual observations you've gathered from your testing. I assume that you had some printStrings embedded in the #checkDelayedTasks method in order to gather them.
Based on that, I'm not sure I see the problem.
My latest theory (I'm trying to think of how to test it) is that a delay ends and its process is set to ready but NOT added back to its priority queue. In the case of the program where things went back to normal, maybe the process got put back into the queue. The code at the end of Delay>checkDelayedTasks that sets processes ready and puts them back in their queue is not wrapped in a critical block and interrupts are not disabled, so maybe with some bad timing, the process may not get put back in its priority queue.
While I'm loath to suggest a major architectural change, have you looked at SstCron and SstCronEntry as a way to manage your pool of tasks? Your problems may not go away, but a larger chunk of the issue is now squarely back in Instantiations code, and you're in a much better position to toss the problem to them.
Tom