
timestamp in ms and 64-bit counter


pozz

Feb 6, 2020, 7:43:35 AM
I need a timestamp in milliseconds since the Linux epoch. It is a number
that doesn't fit in a 32-bit value.

I'm using a 32-bit MCU (STM32L4R9...), so I don't have a 64-bit hw
counter. I need to create a mixed sw/hw 64-bit counter. It's very
simple: I configure a 32-bit hw timer to run at 1 kHz and increment a
uint32_t variable in the timer overflow ISR.

Now I need to implement a GetTick() function that returns a uint64_t. I
know it could be difficult because of race conditions. One solution is
to disable interrupts, but I remember another solution.

extern volatile uint32_t ticks_high;

uint64_t
GetTick(void)
{
    uint32_t h1 = ticks_high;
    uint32_t l1 = hwcnt_get();
    uint32_t h2 = ticks_high;

    if (h1 == h2) return ((uint64_t)h1 << 32) | l1;
    else          return ((uint64_t)h1 << 32);
}

Is it correct in single-tasking? In the else branch, I decided to set
the low part to zero. I think it's acceptable, because if h1 != h2, the
hw counter has just wrapped around, so it is 0... maybe 1.

What about preemptive multi-tasking? What happens if GetTick() is
preempted by another higher-priority task that calls GetTick()?

I think it's better to fix the else branch, because the higher-priority
task could take more than a few milliseconds, so the previous assumption
that the hw counter is 0 (maybe 1) can be incorrect. The fix could be:

uint64_t
GetTick(void)
{
    uint32_t h1 = ticks_high;
    uint32_t l1 = hwcnt_get();
    uint32_t h2 = ticks_high;

    if (h1 == h2) return ((uint64_t)h1 << 32) | l1;
    else          return ((uint64_t)h1 << 32) | hwcnt_get();
}

David Brown

Feb 6, 2020, 9:10:35 AM
On 06/02/2020 13:43, pozz wrote:
> I need a timestamp in millisecond in linux epoch. It is a number that
> doesn't fit in a 32-bits number.
>
> I'm using a 32-bit MCU (STM32L4R9...) so I don't have a 64-bits hw
> counter. I need to create a mixed sw/hw 64-bits counter. It's very
> simple, I configure a 32-bits hw timer to run at 1kHz and increment an
> uint32_t variable in timer overflow ISR.
>
> Now I need to implement a GetTick() function that returns a uint64_t. I
> know it could be difficult, because of race conditions. One solutions is
> to disable interrupts, but I remember another solution.
>
> extern volatile uint32_t ticks_high;
>
> uint64_t
> GetTick(void)
> {
>   uint32_t h1 = ticks_high;
>   uint32_t l1 = hwcnt_get();
>   uint32_t h2 = ticks_high;
>
>   if (h1 == h2) return ((uint64_t)h1 << 32) | l1;
>   else          return ((uint64_t)h1 << 32);
> }

I presume your second line should be "return ((uint64_t)h2 << 32);"

>
> Is it correct in single-tasking? In the else branch, I decided to set
> the low part to zero. I think it's acceptable, because if h1!=h2, hw
> counter has just wrapped-around, so it is 0... maybe 1.

That is a bit of an assumption, but is probably okay.

I tend to use a loop:

uint64_t getTick(void) {
    uint32_t high1 = ticks_high;
    while (true) {
        uint32_t low = hwcnt_get();
        uint32_t high2 = ticks_high;
        if (high1 == high2) {
            return ((uint64_t) high1 << 32) | low;
        }
        high1 = high2;
    }
}

That will be more accurate if there is a risk of multiple low-part ticks
while executing getTick() (due to interrupts, other tasks, etc.).

>
> What about preemptive multi-tasking? What happens if GetTick() is
> preempted by another higher-priority task that calls GetTick()?
>
> I think it's better to fix the else branch, because the higher-priority
> task could take more than a few milliseconds, so the previous assumption
> that hw counter is 0 (maybe 1) can be incorrect. The fix could be:
>
> uint64_t
> GetTick(void)
> {
>   uint32_t h1 = ticks_high;
>   uint32_t l1 = hwcnt_get();
>   uint32_t h2 = ticks_high;
>
>   if (h1 == h2) return ((uint64_t)h1 << 32) | l1;
>   else          return ((uint64_t)h1 << 32) | hwcnt_get();
> }

Again, fix it to use h2.

That would be better, IMHO.

The loop version is useful for faster timers and lower resolution. With
a 32-bit millisecond counter, you wrap after 98 days (IIRC). So two
reads are the most you will need unless your task is pre-empted for
about 3 months, and I suspect that would indicate a more serious design
fault!

pozz

Feb 6, 2020, 10:10:10 AM
Il 06/02/2020 15:10, David Brown ha scritto:
> On 06/02/2020 13:43, pozz wrote:
>> I need a timestamp in millisecond in linux epoch. It is a number that
>> doesn't fit in a 32-bits number.
>>
>> I'm using a 32-bit MCU (STM32L4R9...) so I don't have a 64-bits hw
>> counter. I need to create a mixed sw/hw 64-bits counter. It's very
>> simple, I configure a 32-bits hw timer to run at 1kHz and increment an
>> uint32_t variable in timer overflow ISR.
>>
>> Now I need to implement a GetTick() function that returns a uint64_t. I
>> know it could be difficult, because of race conditions. One solutions is
>> to disable interrupts, but I remember another solution.
>>
>> extern volatile uint32_t ticks_high;
>>
>> uint64_t
>> GetTick(void)
>> {
>>   uint32_t h1 = ticks_high;
>>   uint32_t l1 = hwcnt_get();
>>   uint32_t h2 = ticks_high;
>>
>>   if (h1 == h2) return ((uint64_t)h1 << 32) | l1;
>>   else          return ((uint64_t)h1 << 32);
>> }
>
> I presume your second line should be "return ((uint64_t)h2 << 32);"

Oh yes, my error.


>> Is it correct in single-tasking? In the else branch, I decided to set
>> the low part to zero. I think it's acceptable, because if h1!=h2, hw
>> counter has just wrapped-around, so it is 0... maybe 1.
>
> That is a bit of an assumption, but is probably okay.
>
> I tend to use a loop:
>
> uint64_t getTick(void) {
> uint32_t high1 = ticks_high;
> while (true) {
> uint32_t low = hwcnt_get();
> uint32_t high2 = ticks_high;
> if (high1 == high2) {
> return ((uint64_t) high1 << 32) | low;
> }
> high1 = high2;
> }
> }
>
> That will be more accurate if there is a risk of multiple low-part ticks
> while executing getTick() (due to interrupts, other tasks, etc.).

In single-tasking there aren't other tasks, and interrupts should take
less than 1 ms... at least, I usually tend to have very fast ISRs.

Anyway, your implementation is good for multi-tasking too.


>> What about preemptive multi-tasking? What happens if GetTick() is
>> preempted by another higher-priority task that calls GetTick()?
>>
>> I think it's better to fix the else branch, because the higher-priority
>> task could take more than a few milliseconds, so the previous assumption
>> that hw counter is 0 (maybe 1) can be incorrect. The fix could be:
>>
>> uint64_t
>> GetTick(void)
>> {
>>   uint32_t h1 = ticks_high;
>>   uint32_t l1 = hwcnt_get();
>>   uint32_t h2 = ticks_high;
>>
>>   if (h1 == h2) return ((uint64_t)h1 << 32) | l1;
>>   else          return ((uint64_t)h1 << 32) | hwcnt_get();
>> }
>
> Again, fix it to use h2.

Yes.


> That would be better, IMHO.
>
> The loop version is useful for faster timers and lower resolution. With
> a 32-bit millisecond counter, you wrap after 98 days (IIRC). So two
> reads are the most you will need unless your task is pre-empted for
> about 3 months, and I suspect that would indicate a more serious design
> fault!
>

Of course!

Rick C

Feb 6, 2020, 1:02:49 PM
All this seems to be more complex than it needs to be. You guys are focused on limitations when things happen fast. Do you know how slow things can happen?

A 32 bit counter incremented at 1 kHz will roll over every 3 years or so. If you can assure that the GetTick is called once every 3 years simpler code can be used.

Have a 64 bit counter value which is the 32 bit counter incremented by the 1kHz interrupt and another 32 bit counter (the high part of the 64 bits) which is incremented when needed in the GetTick code.

uint64_t
GetTick(void)
{
    static uint32_t ticks_high;
    static uint32_t ticks_last;

    uint32_t ticks_hw = hwcnt_get();

    if (ticks_last > ticks_hw) ticks_high++;
    ticks_last = ticks_hw;
    return ((uint64_t)ticks_high << 32) | ticks_hw;
}

I'm not so conversant in C and I'm not familiar with the conventions for
time variables. Clearly the time will need to be initialized by some
means, and ticks_high would need to be initialized to correspond to the
current time/date, unless this is a run-time variable only tracking time
since boot-up.

Is the "call this code at least once in 3 years" requirement reasonable?
In the systems I design that would not be a problem.

--

Rick C.

- Get 1,000 miles of free Supercharging
- Tesla referral code - https://ts.la/richard11209

Bernd Linsel

Feb 6, 2020, 2:12:56 PM
Rick C wrote:
> On Thursday, February 6, 2020 at 7:43:35 AM UTC-5, pozz wrote:
>
> All this seems to be more complex than it needs to be. You guys are focused on limitations when things happen fast. Do you know how slow things can happen?
>
> A 32 bit counter incremented at 1 kHz will roll over every 3 years or so.

2 ^ 32 / (24 * 60 * 60 * 1000) = 49.71026962... (days)

> [snipped]

regards,
Bernd


Robert Wessel

Feb 6, 2020, 2:20:08 PM
A 1 kHz 32-bit counter rolls over about every 49.7 days.

Rick C

Feb 6, 2020, 3:10:44 PM
Yes, I forgot to divide by 24 hours. Nevermind... lol

Actually, it still works, just a shorter minimum time.

--

Rick C.

+ Get 1,000 miles of free Supercharging
+ Tesla referral code - https://ts.la/richard11209

Joe Chisolm

Feb 6, 2020, 4:34:42 PM
In your timer ISR:

    low32 += 1;
    if (low32 == 0) high32 += 1;

In your GetTick():

    do {
        l1 = low32;
        h1 = high32;
    } while (l1 > low32);

    return your or'd h1 and l1

It still can be incorrect without disabling interrupts. Consider:
    timer interrupt priority 1
    dma interrupt priority 2
    your code lowest priority
In the middle of your h1 == h2 comparison the dma interrupt goes off.
In the dma routine your timer interrupt goes off.
You unwind back to your code and your h2 and possibly h1 are
incorrect.

--
Chisolm
Texas-American

David Brown

Feb 6, 2020, 4:35:56 PM
The potential problem with this is if the code is used re-entrantly -
i.e., you have multiple threads that each use GetTick() and more than
one calls it during the cross-over point. It's highly unlikely to
happen - there is only a risk once every 49 days (I got that wrong too).
But you don't want to leave these little potential race conditions in
the code - you'll never find them in testing, but they always happen at
your most important customer site.

I think it's quite likely that the code already has a 1 KHz interrupt
routine doing other things, so incrementing a "high" counter there would
not be an issue. But if there is no interrupt for other purposes, then
it is a nice idea to do the update during the GetTick call like this.
However, you need locking (or a more advanced lockfree solution, but
that would likely be a fair bit less efficient on a microcontroller. It
might be worth considering on a multi-processor system).

The easiest method is disabling interrupts in a double-checked lock. If
your system timing can't handle a few cycles of disabled interrupts
every 49 days, you should be worrying about the rest of the design.

I think this should do it:

uint64_t GetTick(void)
{
    static volatile uint32_t ticks_high;
    static volatile uint32_t ticks_last;

    uint32_t ticks_hw = hwcnt_get();

    if (ticks_last > ticks_hw) {
        uint32_t old_processor_state = disableInterrupts();
        ticks_hw = hwcnt_get();
        if (ticks_last > ticks_hw) {
            ticks_high++;
            ticks_last = ticks_hw;
        }
        restoreInterrupts(old_processor_state);
    }
    return ((uint64_t)ticks_high << 32) | ticks_hw;
}

pozz

Feb 6, 2020, 5:16:24 PM
low32 is the hw counter that runs at 1 kHz and generates an interrupt
every 2^32 ms. It isn't a software variable, it's a register.

You are supposing to use a hw counter that generates an interrupt at
1 kHz, but this is not my case.


> in your GetTick()
>
> do {
> l1 = low32;
> h1 = high32;
> while (l1 > low32);
>
> return your or'd h1 and l1
>
> It still can be incorrect with out disabling interrupts. Consider:
> timer interrupt priority 1
> dma interrupt priority 2
> your code lowest priority
> In the middle of your h1 == h2 the dma interrupt goes off.
> In the dma routine your timer interrupt goes off
> You unwind back to your code and your h2 and possibly h1 are
> incorrect.
>

I couldn't get your point. Ok, when the low-priority code comes back
after the interrupts, h1, h2 and l1 could be out of date, but they are
coherent among themselves, so the result of GetTick() is still valid,
maybe slightly inaccurate.

Rick C

Feb 6, 2020, 7:29:18 PM
Yes, you are absolutely right about that.


> I think it's quite likely that the code already has a 1 KHz interrupt
> routine doing other things, so incrementing a "high" counter there would
> not be an issue.

Actually, there is no need for a 1 kHz interrupt. I believe the OP has a hardware counter ticking at 1 kHz so the overflow event would be 49 days. There may be a faster interrupt for some other purpose, but not needed for the hardware timer which may well be run on a low power oscillator and backup battery.


> But if there is no interrupt for other purposes, then
> it is a nice idea to do the update during the GetTick call like this.
> However, you need locking (or a more advanced lockfree solution, but
> that would likely be a fair bit less efficient on a microcontroller. It
> might be worth considering on a multi-processor system).

My intent was to do something just plain simple, but in multitasking system it is not so simple. I don't typically use complications like interrupts. I do FPGA work where I roll the hardware for the peripherals as well as designing the instruction set for the processor. I seldom use multitasking other than potentially interrupts which usually don't need to use locking and such. If resources are not shared locking is not required.


> The easiest method is disabling interrupts in a double-checked lock. If
> your system timing can't handle a few cycles of disabled interrupts
> every 49 days, you should be worrying about the rest of the design.
>
> I think this should do it:
>
> uint64_t GetTick(void)
> {
> static volatile uint32_t ticks_high;
> static volatile uint32_t ticks_last;
>
> uint32_t ticks_hw = hwcnt_get();
>
> if (ticks_last > ticks_hw) {
> uint32_t old_processor_state = disableInterrupts();
> ticks_hw = hwcnt_get();
> if (ticks_last > ticks_hw) {
> ticks_high++;
> ticks_last = ticks_hw;
> }
> restoreInterrupts(old_processor_state);
> }
> return ((uint64_t)ticks_high << 32) | ticks_hw;
> }

That works.

--

Rick C.

-- Get 1,000 miles of free Supercharging
-- Tesla referral code - https://ts.la/richard11209

Clifford Heath

Feb 6, 2020, 8:37:34 PM
Joe is saying read the LSB then the MSB, then read the LSB again to be
sure it hasn't changed due to a tick. If it has, try again until it
doesn't change.

If LSB has changed, MSB might have. If LSB hasn't changed, and you
haven't waited for an entire cycle of the LSB register, then you know
MSB is good.

If you've repeated the loop until LSB cycled to exactly where you were
before, you have bigger problems.

However, Joe got it wrong. The do... while loop should use not-equal,
not >. Otherwise if you read LSB as it rolls over l1 is < low32, and
you're broken because you didn't detect that.
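A minimal sketch of Joe's read loop with that fix applied. All names here are stand-ins, not from the original posts: low32/high32 model the hardware register and the ISR-maintained high word as plain variables so the sketch is self-contained.

```c
#include <stdint.h>

/* Stand-ins (assumed names) for the hw register and ISR-kept high word. */
static volatile uint32_t low32;
static volatile uint32_t high32;

/* Joe's loop with Clifford's fix: retry whenever the low word *changed*
   (!=) between the two reads, not only when it went backwards (>). */
static uint64_t get_tick64(void)
{
    uint32_t l1, h1;
    do {
        l1 = low32;
        h1 = high32;
    } while (l1 != low32);      /* low word ticked while we read: retry */
    return ((uint64_t)h1 << 32) | l1;
}
```

With `!=`, a rollover between the two reads of low32 is always detected, including the case where the second read is larger again.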

Clifford Heath.

Richard Damon

Feb 6, 2020, 11:57:35 PM
If the two reads of the high tick counter are different, then the point
in time when the high tick counter had its higher value, and the lower
tick counter was 0 had to have occurred between the two reads of the
high tick counter, so it is a valid time point that occurred during the
call to GetTick(). The fact that it might be 'stale' compared to the
return time is no different than if it returned a totally accurate
time-stamp, but just after the return the higher-priority task
interrupted this task.

(Note, you need to use h2, not h1, to build the 64 bit time stamp, h1
with a 0 bottom word is likely outside your call window, as you likely
did the first read with the counter at max, and the second after it
wrapped to 0.)
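To make that concrete, here is the function under discussion with the fallback using h2, shown as a self-contained sketch: hwcnt_get() and ticks_high are stubbed as plain variables (assumed names), and only the single-tasking case is modeled.

```c
#include <stdint.h>

/* Self-contained stubs (assumed names) standing in for the real timer. */
static volatile uint32_t ticks_high;
static volatile uint32_t hw_count;
static uint32_t hwcnt_get(void) { return hw_count; }

static uint64_t GetTick(void)
{
    uint32_t h1 = ticks_high;
    uint32_t l1 = hwcnt_get();
    uint32_t h2 = ticks_high;

    if (h1 == h2) return ((uint64_t)h1 << 32) | l1;
    /* h1 != h2: the low counter wrapped between the two reads, so
       "h2 with a zero low word" is an instant inside this call. */
    else          return ((uint64_t)h2 << 32);
}
```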

pozz

Feb 7, 2020, 3:09:55 AM
Ok, but he was saying *his* approach (while loop) is *still* incorrect
without disabling interrupts. I don't think so.

> If LSB has changed, MSB might have. If LSB hasn't changed, and you
> haven't waited for an entire cycle of the LSB register, then you know
> MSB is good.
>
> If you've repeated the loop until LSB cycled to exactly where you were
> before, you have bigger problems.
>
> However, Joe got it wrong. The do... while loop should use not-equal,
> not >. Otherwise if you read LSB as it rolls over l1 is < low32, and
> you're broken because you didn't detect that.

Sure!

pozz

Feb 7, 2020, 3:27:39 AM
Another consideration.

Bugs that *could* happen every 49 days are critical, because they could
really happen even more rarely.

If you have a ms resolution, I think it's better to work with a 16-bit
hw counter, coupled with a 32-bit sw counter.

In this way, the potential bug could happen much more frequently (every
minute) and the 48-bit timestamp could represent a datetime until... I
don't know exactly, but I don't worry what it is.
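A sketch of that layout, with all names assumed: hw16 stands in for the 16-bit register and ticks_high32 for the software word incremented in the overflow ISR. At 1 kHz the low word wraps every 2^16 ms, about 65.5 s, so the wrap path runs roughly once a minute, and 48 bits of milliseconds last on the order of 8,900 years.

```c
#include <stdint.h>

/* Assumed names: hw16 models the 16-bit hardware register, ticks_high32
   the variable bumped in the overflow ISR (now about once a minute). */
static volatile uint16_t hw16;
static volatile uint32_t ticks_high32;

static uint16_t hwcnt16_get(void) { return hw16; }

static uint64_t get_tick48(void)
{
    uint32_t h1, h2;
    uint16_t l;
    do {
        h1 = ticks_high32;
        l  = hwcnt16_get();
        h2 = ticks_high32;
    } while (h1 != h2);                  /* ISR ran mid-read: retry */
    return ((uint64_t)h2 << 16) | l;     /* 48-bit millisecond count */
}
```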

David Brown

Feb 7, 2020, 4:43:46 AM
On 07/02/2020 01:29, Rick C wrote:
> On Thursday, February 6, 2020 at 4:35:56 PM UTC-5, David Brown
> wrote:
>> On 06/02/2020 19:02, Rick C wrote:

>
>> I think it's quite likely that the code already has a 1 KHz
>> interrupt routine doing other things, so incrementing a "high"
>> counter there would not be an issue.
>
> Actually, there is no need for a 1 kHz interrupt. I believe the OP
> has a hardware counter ticking at 1 kHz so the overflow event would
> be 49 days. There may be a faster interrupt for some other purpose,
> but not needed for the hardware timer which may well be run on a low
> power oscillator and backup battery.

We don't know that - the OP hasn't told us all the details. An
interrupt that hits every millisecond (or some other regular time), used
as part of the global timebase and for executing regular functions, is
very common. Maybe he has something like this, maybe not.

>
>
>> But if there is no interrupt for other purposes, then it is a nice
>> idea to do the update during the GetTick call like this. However,
>> you need locking (or a more advanced lockfree solution, but that
>> would likely be a fair bit less efficient on a microcontroller. It
>> might be worth considering on a multi-processor system).
>
> My intent was to do something just plain simple, but in multitasking
> system it is not so simple. I don't typically use complications like
> interrupts. I do FPGA work where I roll the hardware for the
> peripherals as well as designing the instruction set for the
> processor. I seldom use multitasking other than potentially
> interrupts which usually don't need to use locking and such. If
> resources are not shared locking is not required.

Fair enough. And we don't know if there is multi-tasking involved here
or not. If GetTick is only called from one thread, there is no problem.
Also if he has cooperative multitasking rather than pre-emptive, there
will be no problem. Your suggested solution is good (IMHO), but it is
important to understand its restrictions too.

David Brown

Feb 7, 2020, 4:46:37 AM
That is a useful thought. It is very important to write code in a way
that it can be tested. And even then, remember that testing can only
prove the /presence/ of bugs, never their /absence/.

Another trick during testing is to speed up the timers. If you can make
the 1 kHz timer run at 1 MHz for testing, you'll get similar benefits.

pozz

Feb 7, 2020, 8:29:05 AM
Il 07/02/2020 10:46, David Brown ha scritto:
> On 07/02/2020 09:27, pozz wrote:
>> Another consideration.
>>
>> Bugs that *could* happen every 49 days are critical, because they could
>> really happen even more rarely.
>>
>> If you have a ms resolution, I think it's better to work with a 16-bits
>> hw counter, coupled with a 32-bits sw counter.
>>
>> In this way, the potential bug could happen much more frequently (every
>> minute) and the 48-bits timestamp could represents datetime until... I
>> don't know exactly, but I don't worry what it is.
>
> That is a useful thought. It is very important to write code in a way
> that it can be tested. And even then, remember that testing can only
> prove the /presence/ of bugs, never to prove their /absence/.

Sure!

> Another trick during testing is to speed up the timers. If you can make
> the 1 kHz timer run at 1 MHz for testing, you'll get similar benefits.

Hmmm... most probably increasing the frequency of the timer will break
some other parts of the application (timeouts on serial lines...).

Bernd Linsel

Feb 7, 2020, 10:49:23 AM
The Linux approach is still better:
Initialize the timer count variable with a value just a few seconds
before it wraps.
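A sketch of that trick (names assumed), in the spirit of Linux initializing jiffies close to wrap: the rollover path is then exercised seconds after every boot instead of 49.7 days later.

```c
#include <stdint.h>

/* Assumed name: tick_low is the 32-bit software/hardware tick word of
   this thread; only its initialization is shown. */
#define WRAP_LEAD_MS  (5u * 1000u)        /* wrap ~5 s after startup */

static volatile uint32_t tick_low;

static void tick_init(void)
{
    /* Start 5000 ms short of 2^32 so the wrap happens almost at once. */
    tick_low = UINT32_MAX - WRAP_LEAD_MS;
}
```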

Regards,
Bernd

Rick C

Feb 7, 2020, 12:51:57 PM
On Friday, February 7, 2020 at 4:43:46 AM UTC-5, David Brown wrote:
> On 07/02/2020 01:29, Rick C wrote:
> > On Thursday, February 6, 2020 at 4:35:56 PM UTC-5, David Brown
> > wrote:
> >> On 06/02/2020 19:02, Rick C wrote:
>
> >
> >> I think it's quite likely that the code already has a 1 KHz
> >> interrupt routine doing other things, so incrementing a "high"
> >> counter there would not be an issue.
> >
> > Actually, there is no need for a 1 kHz interrupt. I believe the OP
> > has a hardware counter ticking at 1 kHz so the overflow event would
> > be 49 days. There may be a faster interrupt for some other purpose,
> > but not needed for the hardware timer which may well be run on a low
> > power oscillator and backup battery.
>
> We don't know that - the OP hasn't told us all the details. An
> interrupt that hits every millisecond (or some other regular time), used
> as part of the global timebase and for executing regular functions, is
> very common. Maybe he has something like this, maybe not.

Yes, we do know. He told us in the first post.

"I'm using a 32-bit MCU (STM32L4R9...) so I don't have a 64-bits hw counter. I need to create a mixed sw/hw 64-bits counter. It's very simple, I configure a 32-bits hw timer to run at 1kHz and increment an uint32_t variable in timer overflow ISR."

--

Rick C.

-+ Get 1,000 miles of free Supercharging
-+ Tesla referral code - https://ts.la/richard11209

Jim Jackson

Feb 7, 2020, 5:04:03 PM
>> A 32 bit counter incremented at 1 kHz will roll over every 3 years or so.
>
> 2 ^ 32 / (24 * 60 * 60 * 1000) = 49.71026962... (days)

Ah yes. Early versions of Windows NT. Crashed if they had an uptime
of 49 and a bit days - I wonder why :-)

Mind you getting early NT to stay up that long without crashing or needing
a reboot was bloody difficult.

upsid...@downunder.com

Feb 8, 2020, 1:39:48 AM
Which NT version was that ?

My NT 3.51 very seldom needed reboots. In many years I booted it only
three times a year, after Easter, Christmas and the summer vacation,
since I did not want to leave the computer unattended for a week or more
at a time.

Jim Jackson

Feb 8, 2020, 7:09:22 AM
On 2020-02-08, upsid...@downunder.com <upsid...@downunder.com> wrote:
> On Fri, 7 Feb 2020 22:03:59 -0000 (UTC), Jim Jackson
><j...@franjam.org.uk> wrote:
>
>>>> A 32 bit counter incremented at 1 kHz will roll over every 3 years or so.
>>>
>>> 2 ^ 32 / (24 * 60 * 60 * 1000) = 49.71026962... (days)
>>
>>Ah yes. Early versions of Windows NT. Crashed if they had an uptime
>>of 49 and a bit days - I wonder why :-)
>>
>>Mind you getting early NT to stay up that long without crashing or needing
>>a reboot was bloody difficult.
>
> Which NT version was that ?

Sorry, I wasn't part of the MS team, but it was a very early version,
and it was a long time ago. I remember someone did the calculation above
- and it was reported to MS support. I think there was no feedback other
than a set of patches later on.

The Windows NT file servers also went on a temporary go-slow
periodically, which had our MS team baffled for a while. Eventually
someone twigged that the sluggish performance went away when you went to
the graphical console and interrupted the screensaver - yes, the
screensaver was hogging the CPU and hurting fileserver performance!
Needless to say, the Unix team sniggered at that.

David Brown

Feb 8, 2020, 10:24:33 AM
I believe it was Windows 95 that had this problem - and it was not
discovered until about 2005, because no one had kept Windows 95 running
for 49 days.

Maybe early NT had a similar fault, of course. But people /did/ have NT
running for long uptimes from very early on, so such a bug would have
been found fairly quickly.

David Brown

Feb 8, 2020, 10:27:14 AM
That's another good approach, yes.

pozz

Feb 8, 2020, 11:00:25 AM
This helps if the bug is deterministic. If it isn't and it doesn't
happen after startup at the first wrap-around, it takes 49 days to have
another possibility to see the bug.


upsid...@downunder.com

Feb 8, 2020, 11:37:38 AM
On Sat, 8 Feb 2020 16:24:30 +0100, David Brown
<david...@hesbynett.no> wrote:

>On 07/02/2020 23:03, Jim Jackson wrote:
>>>> A 32 bit counter incremented at 1 kHz will roll over every 3 years or so.
>>>
>>> 2 ^ 32 / (24 * 60 * 60 * 1000) = 49.71026962... (days)
>>
>> Ah yes. Early versions of Windows NT. Crashed if they had an uptime
>> of 49 and a bit days - I wonder why :-)
>>
>> Mind you getting early NT to stay up that long without crashing or needing
>> a reboot was bloody difficult.
>>
>
>I believe it was Windows 95 that had this problem - and it was not
>discovered until about 2005, because no one had kept Windows 95 running
>for 49 days.

That is a more believable explanation.


>Maybe early NT had a similar fault, of course. But people /did/ have NT
>running for long uptimes from very early on, so such a bug would have
>been found fairly quickly.

Both VAX/VMS as well as Windows NT use 100 ns as the basic unit for
time of the day timing.

On Windows NT on single processor the interrupt rate was 100 Hz, on
muliprocessor 64 Hz.

Some earlier Windows versions used a 55 Hz (or was it 55 ms?) clock
interrupt rate, so I really don't understand where the 1 ms clock tick
or the 49 days comes from.

Kent Dickey

Feb 8, 2020, 12:03:31 PM
In article <r1h1lj$i5f$1...@dont-email.me>, pozz <pozz...@gmail.com> wrote:
>I need a timestamp in millisecond in linux epoch. It is a number that
>doesn't fit in a 32-bits number.
>
>I'm using a 32-bit MCU (STM32L4R9...) so I don't have a 64-bits hw
>counter. I need to create a mixed sw/hw 64-bits counter. It's very
>simple, I configure a 32-bits hw timer to run at 1kHz and increment an
>uint32_t variable in timer overflow ISR.
>
>Now I need to implement a GetTick() function that returns a uint64_t. I
>know it could be difficult, because of race conditions. One solutions is
>to disable interrupts, but I remember another solution.

This is actually a very tricky problem. I believe it is not possible to
solve it with the constraints you have laid out above. David Brown's solution
in his GetTick() function is correct, but it doesn't discuss why.

If you have a valid 64-bit counter which you can only reference 32-bits at
a time (which I'll make functions, read_high32() and read_low32(), but these
can be hardware registers, volatile globals, or real functions), then an
algorithm to read it reliably is basically your original algorithm:

uint64_t
GetTick(void)
{
    uint32_t old_high32, new_high32, low32;

    old_high32 = read_high32();
    while (1) {
        low32 = read_low32();
        new_high32 = read_high32();
        if (new_high32 == old_high32) {
            return ((uint64_t)new_high32 << 32) | low32;
        }
        old_high32 = new_high32;
    }
}

This code does not need to mask interrupts, and it works on multiple CPUs.
This works even if interrupts occur at any point for any duration, even
if the code is interrupted for more than 49 days.

However, you don't have a valid 64-bit counter you can only read 32-bits at a
time. You have a free-running hardware counter which read_low32() returns.
It counts up every 1ms, and eventually wraps from 0xffff_ffff to 0x0000_0000
and causes an interrupt (which lots of people have helpfully calculated at
about 49 days). Let's assume that interrupt calls this handler:

volatile uint32_t ticks_high = 0;

void
timer_wrap_interrupt(void)
{
    ticks_high++;
}
where by convention only this code will write to ticks_high (this is a very
important limitation). And so my function read_high32() is simply:
{ return ticks_high; }.

Unfortunately, with this design, I believe it is not possible to implement
a GetTick() function which does not sometimes fail to return a correct time.
There is a fundamental race between the interrupt and the timer value rolling
to 0 which software cannot account for.

The problem is it's possible for software to read the HW counter and see it
has rolled over from 0xffff_ffff to 0 BEFORE the interrupt occurs which
increments ticks_high. This is an inherent race: the timer wraps to 0, and
signals an interrupt. It's possible, even if for only a few cycles, to
read the register and see the zero before the interrupt is taken.

Shown more explicitly, the following are all valid states (let's assume
ticks_high is 0, read_low32() just ticked to 0xffff_fffe):

Time         read_low32()   ticks_high
-------------------------------------------------
0            0xffff_fffe    0
1ms          0xffff_ffff    0
1.99999ms    0xffff_ffff    0
2ms          0x0000_0000    0
             Interrupt is sent and is now pending
2ms+delta    0x0000_0000    1

The issue is: what is "delta", and can other code (including your GetTick()
function) run between "2ms" and "2ms+delta"? And the answer is almost
assuredly "yes". This is a problem.

The GetTick() routine above can read ticks_high==0, read_low32()==0, and
then ticks_high==0 again at around time 2ms+small_amount, and return 0,
even though a cycle or two ago read_low32() returned 0xffff_ffff. So
time appears to jump backwards 49 days when this happens.

There are a variety of solutions to this problem, but they all involve
extra work and ignoring the 32-bit rollover interrupt. So, remove
timer_wrap_interrupt(), and then do:

1) Have a single GetTick() routine, which is single-tasking (by
disabling interrupts, or a mutex if there are multiple processors).
This requires something to call GetTick() at least once every 49 days
(worst case). This is basically the Rick C./David Brown solution, but
they don't mention that you need to remove the interrupt on 32-bit overflow.

2) Use a higher interrupt rate. For instance, if we can take the interrupt
when read_low32() has carry from bit 28 to bit 29, then we can piece together
code which can work as long as GetTick() isn't delayed by more than 3-4 days.
This requires GetTick() to change using the code given under #4 below.

3) Forget the hardware counter: just take an interrupt every 1ms, and
increment a global variable uint64_t ticks64 on each interrupt, and then
GetTick just returns ticks64. This only works if the CPU hardware supports
atomic 64-bit accesses. It's not generally possible to write C code for a
32-bit processor which can guarantee 64-bit atomic ops, so it's best to have
the interrupt handler deal with two 32-bit variables ticks_low and
ticks_high, and then you still need GetTick() to have a while loop to
read the two variables.
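Option 3 can be sketched like this (names assumed; the 1 ms interrupt handler is the only writer of the pair, and the reader loops until it sees a coherent high word):

```c
#include <stdint.h>

/* Assumed names; timer_1ms_interrupt() is the only writer of the pair. */
static volatile uint32_t ticks_low, ticks_high;

static void timer_1ms_interrupt(void)
{
    if (++ticks_low == 0)         /* low word wrapped */
        ticks_high++;
}

static uint64_t GetTick(void)
{
    uint32_t hi, lo;
    do {                          /* retry if the ISR bumped the high word */
        hi = ticks_high;
        lo = ticks_low;
    } while (hi != ticks_high);
    return ((uint64_t)hi << 32) | lo;
}
```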

4) Use a regular existing interrupt which occurs at any rate, as long as it's
well over 1ms, and well under 49 days. Let's assume you have a 1-second
interrupt. This can be asynchronous to the 1ms timer. In that interrupt
handler, you sample the 32-bit hardware counter, and if you notice it
wrapping (previous read value > new value), increment ticks_high.
You need to update the global volatile variable ticks_low as well as the
current hw count. And this interrupt handler needs to be the only code
changing ticks_low and ticks_high. Then, GetTick() does the following:

uint32_t local_ticks_low, local_ticks_high;
do {    /* re-read until the pair is consistent */
    local_ticks_high = ticks_high;
    local_ticks_low = ticks_low;
} while (local_ticks_high != ticks_high);
uint64_t ticks64 = ((uint64_t)local_ticks_high << 32) | local_ticks_low;
ticks64 += (int32_t)(read_low32() - local_ticks_low);
return ticks64;

Basically, we return the ticks64 from the last regular interrupt, which could
be 1 second ago, and we add in the small delta from reading the hw counter.
Again, this requires the 1-second interrupt to be guaranteed to happen before
we get close to 49 days since the last 1-second interrupt (if it's really
a 1-second interrupt, it easily meets that criterion. If you try to pick
something irregular, like a keypress interrupt, then that won't work). It
does not depend on the exact rate of the interrupt at all.

I wrote it above with extra safety--it subtracts two 32-bit unsigned variables,
gets a 32-bit unsigned result, treats that as a 32-bit signed result, and adds
that to the 64-bit unsigned ticks count. It's not strictly necessary to do
the 32-bit signed result cast: it just makes the code more robust in case
the HW timer moves backwards slightly. Imagine some code tries to adjust the
current timer value by setting it backwards slightly (say, some code trying
to calibrate the timer with the RTC or something). Without the cast to
32-bit signed int, this slight backwards move would result in ticks64
jumping ahead 49 days, which would be bad. In C, this is pretty easy, but it
should be carefully commented so no one removes any important casts.

Kent

Richard Damon

unread,
Feb 8, 2020, 12:35:01 PM2/8/20
to
On 2/8/20 12:03 PM, Kent Dickey wrote:
> Shown more explicitly, the following are all valid states (let's assume
> ticks_high is 0, read_low32() just ticked to 0xffff_fffe):
>
> Time read_low32() ticks_high
> -------------------------------------------------
> 0 0xffff_fffe 0
> 1ms 0xffff_ffff 0
> 1.99999ms 0xffff_ffff 0
> 2ms 0x0000_0000 0
> Interrupt is sent and is now pending
> 2ms+delta 0x0000_0000 1
>
> The issue is: what is "delta", and can other code (including your GetTick()
> function) run between "2ms" and "2ms+delta"? And the answer is almost
> assuredly "yes". This is a problem.

But, as long as the timing is such that we can not do BOTH the
read_low32() and the read of ticks_high in that delta, we can't get the
wrong number.

This is somewhat a function of the processor, and how much the
instruction pipeline 'skids' when an interrupt occurs. The processor
that he mentioned, A STM32L4R9, which uses an M4 processor, doesn't have
this much of a skid, so that can't be a problem unless you do something
foolish like disable the interrupts while doing the sequence.

If we put a proper barrier instruction between the low read and the
second high read (and we may need that anyway, to avoid getting a value
cached from the first read), and declare the variables volatile so the
compiler doesn't do its own caching, then the problem doesn't occur.
Again, not a problem on his processor, as it is a single-core processor
(but we still need the volatile).

Rick C

unread,
Feb 8, 2020, 12:41:36 PM2/8/20
to
On Saturday, February 8, 2020 at 10:24:33 AM UTC-5, David Brown wrote:
> On 07/02/2020 23:03, Jim Jackson wrote:
> >>> A 32 bit counter incremented at 1 kHz will roll over every 3 years or so.
> >>
> >> 2 ^ 32 / (24 * 60 * 60 * 1000) = 49.71026962... (days)
> >
> > Ah yes. Early versions of Windows NT. Crashed if they had an uptime
> > of 49 and a bit days - I wonder why :-)
> >
> > Mind you getting early NT to stay up that long without crashing or needing
> > a reboot was bloody difficult.
> >
>
> I believe it was Windows 95 that had this problem - and it was not
> discovered until about 2005, because no one had kept Windows 95 running
> for 49 days.

49 days? You mean 49 minutes?


> Maybe early NT had a similar fault, of course. But people /did/ have NT
> running for long uptimes from very early on, so such a bug would have
> been found fairly quickly.

Never used NT, but I used W2k and it was great! W2k was widely pirated so MS started a phone home type of licensing with XP which was initially not well received, but over time became accepted. Now people reminisce about the halcyon days of XP.

Networking under W2k required a lot of manual setting up. But it was not hard to do. A web site, World of Windows Networking made it easy until it was bought and ruined with advertising and low quality content.

Now I have trouble just getting two Win10 computers to share a file directory.

--

Rick C.

+- Get 1,000 miles of free Supercharging
+- Tesla referral code - https://ts.la/richard11209

David Brown

unread,
Feb 8, 2020, 1:57:54 PM2/8/20
to
On 08/02/2020 18:41, Rick C wrote:
> On Saturday, February 8, 2020 at 10:24:33 AM UTC-5, David Brown
> wrote:
>> On 07/02/2020 23:03, Jim Jackson wrote:
>>>>> A 32 bit counter incremented at 1 kHz will roll over every 3
>>>>> years or so.
>>>>
>>>> 2 ^ 32 / (24 * 60 * 60 * 1000) = 49.71026962... (days)
>>>
>>> Ah yes. Early versions of Windows NT. Crashed if they had an
>>> uptime of 49 and a bit days - I wonder why :-)
>>>
>>> Mind you getting early NT to stay up that long without crashing
>>> or needing a reboot was bloody difficult.
>>>
>>
>> I believe it was Windows 95 that had this problem - and it was not
>> discovered until about 2005, because no one had kept Windows 95
>> running for 49 days.
>
> 49 days? You mean 49 minutes?

Oh, Win95 OSR 2 was not /too/ bad. It could keep going long enough to
play a game or two. (For actual work, I moved from Win3.1 to OS/2,
until NT 4.0 came out.)

>
>
>> Maybe early NT had a similar fault, of course. But people /did/
>> have NT running for long uptimes from very early on, so such a bug
>> would have been found fairly quickly.
>
> Never used NT, but I used W2k and it was great! W2k was widely
> pirated so MS started a phone home type of licensing with XP which
> was initially not well received, but over time became accepted. Now
> people reminisce about the halcyon days of XP.

Did you not use NT 4.0 ? It was quite solid. W2K was also good, but XP
took a few service packs before it became reliable enough for serious use.

>
> Networking under W2k required a lot of manual setting up. But it was
> not hard to do. A web site, World of Windows Networking made it easy
> until it was bought and ruined with advertising and low quality
> content.

I don't remember any significant issues with networking with W2K. I see
a lot more now with Win10, and even Win7 when people have forgotten to
turn off the automatic updates.

>
> Now I have trouble just getting to Win10 computers to share a file
> directory.
>

Indeed. And a recent update just stopped local names on the network
working properly with DNS - Windows 10 has just decided that you need to
use the full name (with the ".network.net" or whatever added), or you
have to manually force it to use that suffix automatically. The DNS and
DHCP setup we have has been working happily for many years - I don't
know what MS have done to screw it up in the latest Win 10 and Win 7
updates.

Kent Dickey

unread,
Feb 8, 2020, 2:33:47 PM2/8/20
to
In article <5jC%F.78169$8Y7....@fx05.iad>,
Richard Damon <Ric...@Damon-Family.org> wrote:
>On 2/8/20 12:03 PM, Kent Dickey wrote:
>> Shown more explicitly, the following are all valid states (let's assume
>> ticks_high is 0, read_low32() just ticked to 0xffff_fffe):
>>
>> Time read_low32() ticks_high
>> -------------------------------------------------
>> 0 0xffff_fffe 0
>> 1ms 0xffff_ffff 0
>> 1.99999ms 0xffff_ffff 0
>> 2ms 0x0000_0000 0
>> Interrupt is sent and is now pending
>> 2ms+delta 0x0000_0000 1
>>
>> The issue is: what is "delta", and can other code (including your GetTick()
>> function) run between "2ms" and "2ms+delta"? And the answer is almost
>> assuredly "yes". This is a problem.
>
>But, as long as the timing is such that we can not do BOTH the
>read_low32() and the read of ticks_high in that delta, we can't get the
>wrong number.
>
>This is somewhat a function of the processor, and how much the
>instruction pipeline 'skids' when an interrupt occurs. The processor
>that he mentioned, A STM32L4R9, which uses an M4 processor, doesn't have
>this much of a skid, so that can't be a problem unless you do something
>foolish like disable the interrupts while doing the sequence.

The interrupt skid matters for how large the window is, but the problem
happens even if the "skid" was 0.

Look at it this way: the hardware counter logic is something like:

always @(posedge clk) begin
    if (do_inc) begin
        cntr <= cntr + 1;
        if (cntr == 32'hffff_ffff) begin
            interrupt <= 1;
        end
    end
end

Then at cycle 0 cntr=ffff_ffff and do_inc=0. At cycle 1, do_inc=1 and cntr=0
and interrupt=1.

In that cycle, software could read cntr=0. The interrupt CANNOT have taken
place yet since interrupts aren't instantaneous--the signal hasn't even made
it to the interrupt controller yet; it's just that this clock module has
decided to request an interrupt. (The ARM GIC supports asynchronous
interrupts, so it takes several clocks just for it to register the interrupt).

This is always somewhat a function of the processor, but the problem is
inherent to all CPUs. A simple 6502 or 8086 or whatever has the same problem
and cannot fix it easily either.

The hardware cannot get this case right without some extreme craziness. That
would be a pre-interrupt detection circuit, prepared to drive the interrupt
early so the CPU reacts in time.

The right way to look at it--hardware interrupts are delayed tens or
hundreds of cycles always from when you think they happen to when you receive
it. Then you'll get your algorithms right.

Kent

Rick C

unread,
Feb 8, 2020, 3:12:15 PM2/8/20
to
That's why stuff that is hard on a CPU is so easy in an FPGA. Even if I have a soft core in my FPGA design I know the clock cycle timing of the CPU and it doesn't have the many cycles of delay to processing an interrupt. On my design the next clock after an interrupt is asserted the CPU is fetching the first instruction of the interrupt handler as well as saving the return address and status register.

Commercial CPUs provide a bunch of hard to use features like interrupts with priority, etc. because programmers think of the CPU cycles as being a precious commodity and so want to make a single CPU do many things. The CPU is often much smaller than the memory on the die and all of it is inordinately inexpensive really.

On an FPGA it is easier to just provide the 64 bit counter in the first place. lol But if you want, it is easy to make the counter interrupt the CPU before the counter rolls over and the software can increment the upper half at the correct time as well as make the full 64 bits an atomic read.

This thread is a perfect example of why I prefer FPGAs for most applications.

--

Rick C.

++ Get 1,000 miles of free Supercharging
++ Tesla referral code - https://ts.la/richard11209

Robert Wessel

unread,
Feb 8, 2020, 3:30:42 PM2/8/20
to
The "tick count" in the (Win32) OS was always 1000Hz (as reported by
GetTickCount(), for example). The physical ticks were massaged to
correctly update that count.

Robert Wessel

unread,
Feb 8, 2020, 3:46:28 PM2/8/20
to
On Sat, 08 Feb 2020 11:03:23 -0600, ke...@provalid.com (Kent Dickey)
wrote:
If you have an atomic 32-bit read, and a 32-bit compare-and-swap, it's
not too hard. I had to deal with this in Windows back before the OS
grew a GetTickCount64().

The basic idea is to store a 32-bit extension word*, with 8 bits that
you match to the top of the system tick, plus 24 bits of extension (IOW,
you'll get a 56-bit tick out of the deal).

Basically, you get the system tick, then read the extension word, and
compare the high bytes. If they're the same (IOW, the extension word
was set in the same epoch as the system tick is currently in), just
concatenate the low 24 bits of the extension word to the left of the
system tick, and return that.

If the high bytes are different, determine if there's been a rollover
(high byte of system tick lower than high byte of extension word).
Then update the extension word value appropriately (unconditionally
copy the system tick high byte to the extension word high byte, and
increment the low 24 bits if there was a rollover). CAS that updated
value back to extension word in memory (ignore failures). Loop to the
start of the process to re-get the extended tick.

The one thing this requires is that this routine get polled at least
once every ~49 days.


*You can fiddle the exact parameters here a bit, this is just what I
did.
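
A rough C11 sketch of this scheme (the names here are mine, not any
Windows API; extend_tick() takes the freshly read 32-bit system tick as a
parameter so the sketch stays self-contained, and the CAS is C11's
atomic_compare_exchange_strong()):

```c
#include <stdint.h>
#include <stdatomic.h>

/* Extension word: top 8 bits mirror the top byte of the 32-bit system
 * tick, low 24 bits extend it, giving a 56-bit tick overall. */
static _Atomic uint32_t ext_word;

/* Lock-free extension of a freshly read 32-bit tick to 56 bits. */
uint64_t extend_tick(uint32_t tick)
{
    for (;;) {
        uint32_t ext = atomic_load(&ext_word);

        if ((ext >> 24) == (tick >> 24)) {
            /* Same epoch: concatenate the 24 extension bits. */
            return ((uint64_t)(ext & 0x00FFFFFFu) << 32) | tick;
        }

        /* Epoch changed: bump the 24-bit extension on rollover. */
        uint32_t high24 = ext & 0x00FFFFFFu;
        if ((tick >> 24) < (ext >> 24))
            high24 = (high24 + 1u) & 0x00FFFFFFu;   /* tick wrapped */

        uint32_t new_ext = (tick & 0xFF000000u) | high24;

        /* Publish; a failure means another caller won the race, so
         * just loop and re-read the refreshed extension word. */
        atomic_compare_exchange_strong(&ext_word, &ext, new_ext);
    }
}
```

As described above, this only stays correct if something calls it at
least once every ~49 days.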

Dimiter_Popoff

unread,
Feb 8, 2020, 4:18:33 PM2/8/20
to
I think he refers to the scenario where the lower 32 bits are a
hardware timer register and the upper 32 bits are incremented by
software in an interrupt somewhere.
It is possible that you read the lower 32 bits as ffffffff then
the upper 32 bits *after* the transition ffffffff -> 0 and the
interrupt processing; there is no sensible way of knowing if that
occurred based only on reading these two 32 bit values.

If 1ms is the target the solution is easy, just do IRQ every
1 ms and handle both longwords in the IRQ service routine (while
masked). IIRC the OP suggested that in his original post.

Reading the 64 bits using 32 bit reads then is trivial, e.g. this is
how the timebase registers are read on 32 bit power (read upper, read
lower, read upper again, if changed do it all over again), I think
Kent referred to that in his previous post as well, probably more
have mentioned it on the thread.

Dimiter

======================================================
Dimiter Popoff, TGI http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/


Les Cargill

unread,
Feb 8, 2020, 5:52:07 PM2/8/20
to
This is why it's important to be able to seed the timer counter.

--
Les Cargill

Robert Wessel

unread,
Feb 9, 2020, 12:10:15 AM2/9/20
to
On Sat, 8 Feb 2020 23:18:28 +0200, Dimiter_Popoff <d...@tgi-sci.com>
wrote:
The point of the described algorithm is specifically to detect and
handle the case where the base (short) timer value overflows, but
without locks, or requiring more than a single additional word in
storage.

upsid...@downunder.com

unread,
Feb 9, 2020, 12:26:01 AM2/9/20
to
With multimedia timers enabled that may be the case; however, without
them, the sleep time granularity is 10 ms, which would indicate a 100
Hz update rate. Also the SetTimeAdjustment() win32 call assumes a 100 Hz
(or 64 Hz) update rate.

upsid...@downunder.com

unread,
Feb 9, 2020, 1:36:02 AM2/9/20
to
On Sat, 8 Feb 2020 19:57:48 +0100, David Brown
<david...@hesbynett.no> wrote:

>> Never used NT, but I used W2k and it was great! W2k was widely
>> pirated so MS started a phone home type of licensing with XP which
>> was initially not well received, but over time became accepted. Now
>> people reminisce about the halcyon days of XP.
>
>Did you not use NT 4.0 ? It was quite solid. W2K was also good, but XP
>took a few service packs before it became reliable enough for serious use.

NT 4.0 solid ??

NT4 moved graphical functions to kernel mode to speed up window
updates. When doing some operations in kernel mode on behalf of a user
mode function, the first thing that the kernel mode routine should do
is to check that the parameters passed to it were accessible from
_user_ mode. Unfortunately this was not done initially, so passing by
accident a NULL pointer to these functions crashed the whole computer,
not just the application. SP1 added these checks.

In general, each NT4 service pack introduced new bugs and soon the
next SP was released to correct the bugs introduced by the previous
SP. Thus every other SPs were actually usable.

Even NT5 beta was more stable than NT4 with most recent SP. NT5 beta
was renamed Windows 2000 before final release.

George Neuner

unread,
Feb 9, 2020, 1:49:51 AM2/9/20
to
On Sat, 08 Feb 2020 14:30:53 -0600, Robert Wessel
<robert...@yahoo.com> wrote:

Yes, but the hardware tick was at 18Hz (~55ms) up until XP and the
introduction of "multimedia" timers.

At first those "multimedia" timers were implemented by a realtime
priority thread using the CPU's cycle counter. In a quiet system you
could get down to ~50us.

However, 10+MHz HPET hardware timers were introduced in 2005 and
quickly became standard on retail systems. Support for HPET based
multimedia timers came in XPsp3 (2008).

Since Vista, if HPET is available, one channel of the timer is used to
support the system clock at 1KHz.

George

David Brown

unread,
Feb 9, 2020, 6:17:46 AM2/9/20
to
On 09/02/2020 07:35, upsid...@downunder.com wrote:
> On Sat, 8 Feb 2020 19:57:48 +0100, David Brown
> <david...@hesbynett.no> wrote:
>
>>> Never used NT, but I used W2k and it was great! W2k was widely
>>> pirated so MS started a phone home type of licensing with XP which
>>> was initially not well received, but over time became accepted. Now
>>> people reminisce about the halcyon days of XP.
>>
>> Did you not use NT 4.0 ? It was quite solid. W2K was also good, but XP
>> took a few service packs before it became reliable enough for serious use.
>
> NT 4.0 solid ??
>
> NT4 moved graphical functions to kernel mode to speed up window
> updates.

Yes. And that meant bugs in the graphics drivers could kill the whole
system, unlike in NT 3.x. And bugs in the graphics drivers were
certainly not unknown. However, with a little care it could run
reliably for long times. I don't remember ever having a software or OS
related crash or halt on our little NT 4 server.

(My NT 4 workstation eventually decided to wipe my start menu and
replace it with a single entry "eject computer", complete with icon.
And it kept asking me to insert a disk in drive C: and close the door.
But that was after many years of use and abuse.)

> When doing some operations in kernel mode on behalf of a user
> mode function, the first thing that the kernel mode routine should do
> is to check that the parameters passed to it were accessible from
> _user_ mode. Unfortunately this was not done initially, so passing by
> accident a NULL pointer to these functions crashed the whole computer,
> not just the application. SP1 added these checks.
>
> In general, each NT4 service pack introduced new bugs and soon the
> next SP was released to correct the bugs introduced by the previous
> SP. Thus every other SPs were actually usable.
>
> Even NT5 beta was more stable than NT4 with most recent SP. NT5 beta
> was renamed Windows 2000 before final release.
>

I certainly liked W2K, and found it quite reliable. But I still
remember NT 4.0 as good too.

Richard Damon

unread,
Feb 9, 2020, 3:55:48 PM2/9/20
to
I did forget the delay in the interrupt controller. With that delay, you
do have a fundamental issue between reading hardware registers and the
software counter.

A couple of solutions, some that have been mentioned:

Have a 1ms interrupt and in software keep the 64 bit counter.

I believe you can also program the counters to generate multiple
interrupts in the count cycle. If you generate one at 0 and one at the
halfway point, then knowing which interrupt was last seen lets you tell
whether a wrap is 'pending' based on the lower counter read.
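
A sketch of that bookkeeping in C (all names hypothetical; on the real
part the two ISRs would be compare-match interrupts at the halfway count
and at the wrap, and read_low32() would read the hardware counter, stubbed
here so the sketch compiles standalone; the remaining races between
reading the flag and the counter are ignored for clarity):

```c
#include <stdint.h>
#include <stdbool.h>

static uint32_t fake_low;                 /* stand-in for the hw counter */
static uint32_t read_low32(void) { return fake_low; }

static volatile uint32_t ticks_high;      /* bumped by the wrap IRQ */
static volatile bool seen_halfway;        /* set by the mid-count IRQ */

/* IRQ at count 0x8000_0000 */
void halfway_isr(void) { seen_halfway = true; }

/* IRQ at count 0 (wrap) */
void wrap_isr(void) { ticks_high++; seen_halfway = false; }

uint64_t GetTick(void)
{
    uint32_t lo = read_low32();
    uint32_t hi = ticks_high;
    /* Counter is back in the lower half but the wrap IRQ has not been
     * serviced yet: account for the pending carry by hand. */
    if (seen_halfway && lo < 0x80000000u)
        hi++;
    return ((uint64_t)hi << 32) | lo;
}
```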

Another option on that processor is to chain a couple of timers
together, so that when the lower counter rolls over the upper counter
increments automatically, and I believe it handles this so there isn't a
skew between the counters. Then reading the upper half directly from
the hardware won't have the issue.

pozz

unread,
Feb 9, 2020, 5:27:39 PM2/9/20
to
Il 06/02/2020 19:02, Rick C ha scritto:
> [...]
> Is the "call this code at least once in 3 years" requirement reasonable? In the systems I design that would not be a problem.

Of course this limitation isn't usually a real problem.

However there could be some situation where GetTick() is called after 49
days. For example, you could have an IoT device that starts sending data
(with timestamps) only after the user makes a request, and
timestamps/GetTick() are used only in the routine that sends data.

Maybe the user, after purchasing, is excited about this gadget and makes
the request multiple times every day. After some weeks, he could forget
about the gadget and remember it only after many days...


pozz

unread,
Feb 9, 2020, 5:34:26 PM2/9/20
to
Il 07/02/2020 10:43, David Brown ha scritto:
> On 07/02/2020 01:29, Rick C wrote:
>> On Thursday, February 6, 2020 at 4:35:56 PM UTC-5, David Brown
>> wrote:
>>> On 06/02/2020 19:02, Rick C wrote:
>
>>
>>> I think it's quite likely that the code already has a 1 KHz
>>> interrupt routine doing other things, so incrementing a "high"
>>> counter there would not be an issue.
>>
>> Actually, there is no need for a 1 kHz interrupt. I believe the OP
>> has a hardware counter ticking at 1 kHz so the overflow event would
>> be 49 days. There may be a faster interrupt for some other purpose,
>> but not needed for the hardware timer which may well be run on a low
>> power oscillator and backup battery.
>
> We don't know that - the OP hasn't told us all the details. An
> interrupt that hits every millisecond (or some other regular time), used
> as part of the global timebase and for executing regular functions, is
> very common. Maybe he has something like this, maybe not.

I think FreeRTOS is already configured to have a fast tick interrupt,
something like 1ms. I suspect it is used to check whether some tasks,
blocked waiting for the expiration of a timeout, must be activated.

My first idea was to implement the 64-bit ms-resolution timestamp
counter as something completely separate from the OS tick, but I think I
could add some code to the OS tick interrupt instead.


>>> But if there is no interrupt for other purposes, then it is a nice
>>> idea to do the update during the GetTick call like this. However,
>>> you need locking (or a more advanced lockfree solution, but that
>>> would likely be a fair bit less efficient on a microcontroller. It
>>> might be worth considering on a multi-processor system).
>>
>> My intent was to do something just plain simple, but in multitasking
>> system it is not so simple. I don't typically use complications like
>> interrupts. I do FPGA work where I roll the hardware for the
>> peripherals as well as designing the instruction set for the
>> processor. I seldom use multitasking other than potentially
>> interrupts which usually don't need to use locking and such. If
>> resources are not shared locking is not required.
>
> Fair enough. And we don't know if there is multi-tasking involved here
> or not. If GetTick is only called from one thread, there is no problem.
> Also if he has cooperative multitasking rather than pre-emptive, there
> will be no problem. Your suggested solution is good (IMHO), but it is
> important to understand its restrictions too.

Yes, I have a preemptive multi-tasking system (FreeRTOS).


pozz

unread,
Feb 9, 2020, 5:43:24 PM2/9/20
to
Il 08/02/2020 20:43, Ed Prochak ha scritto:
> On Thursday, February 6, 2020 at 1:02:49 PM UTC-5, Rick C wrote:
>> On Thursday, February 6, 2020 at 7:43:35 AM UTC-5, pozz wrote:
>>> I need a timestamp in millisecond in linux epoch. It is a number that
>>> doesn't fit in a 32-bits number.
>>>
>>> I'm using a 32-bit MCU (STM32L4R9...) so I don't have a 64-bits hw
>>> counter. I need to create a mixed sw/hw 64-bits counter. It's very
>>> simple, I configure a 32-bits hw timer to run at 1kHz and increment an
>>> uint32_t variable in timer overflow ISR.
>>>
>>> Now I need to implement a GetTick() function that returns a uint64_t. I
>>> know it could be difficult, because of race conditions. One solutions is
>>> to disable interrupts, but I remember another solution.
>>>
>>> extern volatile uint32_t ticks_high;
>>>
>>> uint64_t
>>> GetTick(void)
>>> {
>>> uint32_t h1 = ticks_high;
>>> uint32_t l1 = hwcnt_get();
>>> uint32_t h2 = ticks_high;
>>>
>>> if (h1 == h2) return ((uint64_t)h1 << 32) | l1;
>>> else return ((uint64_t)h1 << 32);
>>> }
>>>
>>> Is it correct in single-tasking? In the else branch, I decided to set
>>> the low part to zero. I think it's acceptable, because if h1!=h2, hw
>>> counter has just wrapped-around, so it is 0... maybe 1.
>>>
>>> What about preemptive multi-tasking? What happens if GetTick() is
>>> preempted by another higher-priority task that calls GetTick()?
>>>
>>> I think it's better to fix the else branch, because the higher-priority
>>> task could take more than a few milliseconds, so the previous assumption
>>> that hw counter is 0 (maybe 1) can be incorrect. The fix could be:
>>>
>>> uint64_t
>>> GetTick(void)
>>> {
>>> uint32_t h1 = ticks_high;
>>> uint32_t l1 = hwcnt_get();
>>> uint32_t h2 = ticks_high;
>>>
>>> if (h1 == h2) return ((uint64_t)h1 << 32) | l1;
>>> else return ((uint64_t)h1 << 32) | hwcnt_get();
>>> }
>>
>> All this seems to be more complex than it needs to be. You guys are
>> focused on limitations when things happen fast. Do you know how
>> slow things can happen?
>>
>> A 32 bit counter incremented at 1 kHz will roll over every 3 years
>> or so. If you can assure that the GetTick is called once every
>> 3 years simpler code can be used.
>>
>> Have a 64 bit counter value which is the 32 bit counter incremented
>> by the 1kHz interrupt and another 32 bit counter (the high part of
>> the 64 bits) which is incremented when needed in the GetTick code.
>>
>> uint64_t
>> GetTick(void)
>> {
>> static uint32_t ticks_high;
>> uint32_t ticks_hw= hwcnt_get();
>> static uint32_t ticks_last;
>>
>> if (ticks_last > ticks_hw) ticks_high++;
>> ticks_last = ticks_hw;
>> return ((uint64_t)ticks_high << 32) | ticks_hw;
>> }
>>
>> I'm not so conversant in C and I'm not familiar with the
>> conventions of using time variables. Clearly the time will
>> need to be initialized by some means and ticks_high would
>> need to be initialized to correspond to the current time/date,
>> unless this is a run time variable only tracking time since boot up.
>>
>> Is the "call this code at least once in 3 years" requirement reasonable?
>> In the systems I design that would not be a problem.
>>
>> --
>>
>> Rick C.
>>
>> - Get 1,000 miles of free Supercharging
>> - Tesla referral code - https://ts.la/richard11209
>
> Good points, Rick, but this conversation has me wonder:
>
> Why use a design that is handling the high 32bits in the
> application layer and the low 32bits separately in at the ISR?

I think Rick suggested his solution, where the high 32 bits are increased
in the application layer, because it is simpler. As you read, increasing
the high 32 bits in the ISR forces us to implement a trickier GetTick()
with multiple reads of the high counter.

Unfortunately his solution doesn't work as-is with a preemptive scheduler
when multiple tasks call GetTick().


> Apparently you are using this only for interval timing?
> If you are looking to maintain calendar time, then you will
> need to store the high 32bits as Rick mentioned.
>
> restoring the high 32bits from nonvolatile storage is a boot up
> issue, and storing the value may require work outside the ISR.
> But is required only once in 3 years as Rick pointed out.
> But you would have some drift anyway since you have no way to
> measure the time the system is down.

At every bootup and at a regular interval I use NTP to synchronize the
internal calendar time.


pozz

unread,
Feb 9, 2020, 6:55:13 PM2/9/20
to
Il 08/02/2020 18:03, Kent Dickey ha scritto:
> [...]
> Unfortunately, with this design, I believe it is not possible to implement
> a GetTick() function which does not sometimes fail to return a correct time.
> There is a fundamental race between the interrupt and the timer value rolling
> to 0 which software cannot account for.

Good point, Kent. Thank you for your post, which helps to fix some
critical bugs.

You're right, ISRs aren't executed immediately after the related event
occurs. We should assume ISR code runs many cycles after the interrupt
event.


> 1) Have a single GetTick() routine, which is single-tasking (by
> disabling interrupts, or a mutex if there are multiple processors).
> This requires something to call GetTick() at least once every 49 days
> (worst case). This is basically the Rich C./David Brown solution, but
> they don't mention that you need to remove the interrupt on 32-bit overflow.

I think you mentioned disabling interrupts to avoid any preemption from
the RTOS scheduler, effectively blocking the scheduler entirely.
However I know it's usually a bad idea to enable/disable interrupts
"manually" with an RTOS.
Maybe a mutex for GetTick() is a better idea, something similar to this:

uint64_t
GetTick(void)
{
    static uint32_t ticks_high;
    static uint32_t ticks_last;
    uint64_t now;

    mutex_take();

    uint32_t ticks_hw = hwcnt_get();
    if (ticks_last > ticks_hw) ticks_high++;
    ticks_last = ticks_hw;
    /* build the result before mutex_give(), so a preempting task
       can't bump ticks_high between the release and the return */
    now = ((uint64_t)ticks_high << 32) | ticks_hw;

    mutex_give();

    return now;
}

> 2) Use a higher interrupt rate. For instance, if we can take the interrupt
> when read_low32() has carry from bit 28 to bit 29, then we can piece together
> code which can work as long as GetTick() isn't delayed by more than 3-4 days.
> This require GetTick() to change using code given under #4 below.
>
> 3) Forget the hardware counter: just take an interrupt every 1ms, and
> increment a global variable uint64_t ticks64 on each interrupt, and then
> GetTick just returns ticks64. This only works if the CPU hardware supports
> atomic 64-bit accesses. It's not generally possible to write C code for a
> 32-bit processor which can guarantee 64-bit atomic ops, so it's best to have
> the interrupt handler deal with two 32-bit variables ticks_low and
> ticks_high, and then you still need the GetTicks() to have a while loop to
> read the two variables.

What about this?

static volatile uint64_t ticks64;
void timer_isr(void) {
ticks64++;
}
uint64_t GetTick(void) {
uint64_t t1 = ticks64;
uint64_t t2;
while((t2 = ticks64) - t1 > 100) {
t1 = t2;
}
return t2;
}

If dangerous things happen (an ISR executes during GetTick), t2-t1 is a
very big number. 100 represents the worst-case maximum duration (in ms)
of ISRs/tasks that could preempt/interrupt GetTick(). We could increase
100 even more.


> 4) Use a regular existing interrupt which occurs at any rate, as long as it's
> well over 1ms, and well under 49 days. Let's assume you have a 1-second
> interrupt. This can be asynchronous to the 1ms timer. In that interrupt
> handler, you sample the 32-bit hardware counter, and if you notice it
> wrapping (previous read value > new value), increment ticks_high.
> You need to update the global volatile variable ticks_low as well as the
> current hw count. And this interrupt handler needs to be the only code
> changing ticks_low and ticks_high. Then, GetTick() does the following:
>
> uint32_t local_ticks_low, local_ticks_high;
> [ while loop to read valid ticks_low and ticks_high value into the
> local_* variables ]
> uint64_t ticks64 = ((uint64_t)local_ticks_high << 32) | local_ticks_low;
> ticks64 += (int32_t)(read_low32() - local_ticks_low);
> return ticks64;

Do you mean...?

volatile uint32_t ticks_low;
volatile uint32_t ticks_high;

void interrupt_at_every_second(void)
{
uint32_t tl = get_low_ticks(); // from free-running 1ms counter
if (ticks_low > tl) {
ticks_high++;
}
ticks_low = tl;
}

uint64_t GetTick(void)
{
    uint32_t h2;

    uint32_t local_ticks_high = ticks_high;
    uint32_t local_ticks_low = ticks_low;
    while ((h2 = ticks_high) != local_ticks_high) {
        local_ticks_high = h2;
        local_ticks_low = ticks_low;
    }

    uint64_t ticks64 = ((uint64_t)local_ticks_high << 32) | local_ticks_low;
    ticks64 += (int32_t)(get_low_ticks() - local_ticks_low);
    return ticks64;
}

pozz

unread,
Feb 9, 2020, 6:58:34 PM2/9/20
to
Il 09/02/2020 21:55, Richard Damon ha scritto:
> [...]
> Another option on that processor is to chain a couple of timers
> together, so when the lower counter rolls over the upper counter counts
> automatically, and I believe it handles it so there isn't a skew between
> the counters. Then the read upper going direct to the hardware won't
> have the issue.

I *believe* too, but are we completely sure?

Richard Damon
Feb 9, 2020, 7:24:19 PM
It has been a bit since I have been through that processor's
documentation, but I remember a choice of synchronization modes, one
which made the update simultaneous, at the cost of a bit more delay from
the trigger pulse (you would typically run the counter on the system
clock with a once per millisecond trigger pulse). If I am right about
what the documentation says, it is a clear guarantee; the person
designing the system should look it up to make sure they do it right.
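Until that guarantee is confirmed in the datasheet, a defensive read works on any hardware. A minimal sketch, with the timer registers simulated as volatile variables so the logic can run on a host (on the real part they would be the chained timers' CNT registers):

```c
#include <stdint.h>

/* Hedged sketch of reading two chained 32-bit timers when the hardware
 * does NOT guarantee a coherent 64-bit snapshot: read high, low, high,
 * and retry if the high word changed in between (i.e. the low word
 * wrapped mid-read). timer_hi/timer_lo are simulated registers. */
static volatile uint32_t timer_hi, timer_lo;

uint64_t read_chained64(void)
{
    uint32_t h1, lo, h2;
    do {
        h1 = timer_hi;
        lo = timer_lo;
        h2 = timer_hi;
    } while (h1 != h2);  /* high word moved between reads: retry */
    return ((uint64_t)h1 << 32) | lo;
}
```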

Richard Damon
Feb 9, 2020, 7:31:11 PM
On 2/9/20 6:55 PM, pozz wrote:
> Il 08/02/2020 18:03, Kent Dickey ha scritto:

>> 1) Have a single GetTick() routine, which is single-tasking (by
>> disabling interrupts, or a mutex if there are multiple processors).
>> This requires something to call GetTick() at least once every 49 days
>> (worst case).  This is basically the Rich C./David Brown solution, but
>> they don't mention that you need to remove the interrupt on 32-bit
>> overflow.
>
> I think you mentioned to disable interrupts to avoid any preemption from
> RTOS scheduler, effectively blocking scheduler at all.
> However I know it's a bad idea to enable/disable interrupts "manually"
> with an RTOS.

The use of SHORT critical sections based on disabling interrupts is
almost never an issue, and most RTOSes that I know have that ability.
They are often given a name based on entering/exiting a critical section
as opposed to enable/disable the interrupts, in part to remind you that
they need to be well paired and the region short.

Rick C
Feb 9, 2020, 11:24:01 PM
Yeah, and the reliability requirements of such consumer goods are typically not so stiff, so if they are not being used for months on end a malfunction is not unexpected. I recall having routers that needed a power cycle every week or two. I have a wifi extender that seems to work pretty well, but has needed to be power cycled a few times a year. Once in 3 years is one thing. Once every 49 days is another.

Still, such a requirement in this case doesn't really solve any problems, it seems. So no need to worry about it.

--

Rick C.

--- Get 1,000 miles of free Supercharging
--- Tesla referral code - https://ts.la/richard11209

upsid...@downunder.com
Feb 10, 2020, 3:35:27 PM
On Sun, 9 Feb 2020 23:34:26 +0100, pozz <pozz...@gmail.com> wrote:

>Il 07/02/2020 10:43, David Brown ha scritto:
>> On 07/02/2020 01:29, Rick C wrote:
>>> On Thursday, February 6, 2020 at 4:35:56 PM UTC-5, David Brown
>>> wrote:
>>>> On 06/02/2020 19:02, Rick C wrote:
>>
>>>
>>>> I think it's quite likely that the code already has a 1 KHz
>>>> interrupt routine doing other things, so incrementing a "high"
>>>> counter there would not be an issue.
>>>
>>> Actually, there is no need for a 1 kHz interrupt. I believe the OP
>>> has a hardware counter ticking at 1 kHz so the overflow event would
>>> be 49 days. There may be a faster interrupt for some other purpose,
>>> but not needed for the hardware timer which may well be run on a low
>>> power oscillator and backup battery.
>>
>> We don't know that - the OP hasn't told us all the details. An
>> interrupt that hits every millisecond (or some other regular time), used
>> as part of the global timebase and for executing regular functions, is
>> very common. Maybe he has something like this, maybe not.
>
>I think FreeRTOS is already configured to have a fast interrupt,
>something similar to 1ms. I suspect it is used to check if some tasks,
>blocked waiting the expiration of a timer, must be activated.
>
>My first idea is to implement the 64-bit ms-resolution timestamp
>counter as something completely separate from the OS ticks, but I think
>I could add some code to the OS ticks interrupt.

Nice !

With a 64 bit counter with 1 ms resolution you can easily record any
event since the days of dinosaurs.

Using the VAX/VMS or Windows NT 100 ns resolution, a 64 bit counter can
be used to represent a 60 000 year long period. If one sets the zero
time at Julian Date 1 (in 4714 BCE), any historical event can be
represented with 100 ns resolution.

Many processors have clock cycle counters. On x86 architectures, there
is a 64 bit Time Stamp Counter register, which is updated every clock
cycle. Even on a 4 GHz CPU, the counter rolls over only after about 146
years. Thus the counter is sufficient to handle the counts during the
processor's lifetime.
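The rollover arithmetic is easy to verify with a one-off host-side check (not code from the thread): 2^64 cycles divided by the clock rate, converted to years.

```c
/* Rollover period of a 64-bit cycle counter at a given clock rate:
 * 2^64 cycles / hz gives seconds; divide by seconds per Julian year.
 * 2^64 is exactly representable as a double. */
static double rollover_years(double hz)
{
    double seconds = 18446744073709551616.0 / hz;  /* 2^64 cycles */
    return seconds / (365.25 * 24 * 3600);
}
```

At 4 GHz this comes out to roughly 146 years.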

Robert Wessel
Feb 10, 2020, 7:40:10 PM
As I mentioned elsewhere in the thread, if you have an atomic 32-bit
read, and a 32-bit CAS, you can do this without locks pretty simply.

Dimiter_Popoff
Feb 11, 2020, 6:02:37 AM
And I replied without having understood what you meant :-). Sorry about
that.

Dimiter

Rick C
Feb 12, 2020, 10:27:23 AM
On Monday, February 10, 2020 at 7:40:10 PM UTC-5, robert...@yahoo.com wrote:
>
> As I mentioned elsewhere in the thread, if you have an atomic 32-bit
> read, and a 32-bit CAS, you can do this without locks pretty simply.

I did a search but it didn't turn up. What's a CAS???


David Brown
Feb 12, 2020, 12:49:20 PM
On 12/02/2020 16:26, Rick C wrote:
> On Monday, February 10, 2020 at 7:40:10 PM UTC-5, robert...@yahoo.com wrote:
>>
>> As I mentioned elsewhere in the thread, if you have an atomic 32-bit
>> read, and a 32-bit CAS, you can do this without locks pretty simply.
>
> I did a search but it didn't turn up. What's a CAS???
>

Compare-and-swap. It is a common instruction for use in multi-threading
systems as a building block for atomic accesses and lock-free algorithms
(and for implementing locks):

<https://en.wikipedia.org/wiki/Compare-and-swap>


It corresponds roughly to the C code, executed atomically:

bool cas(uint32_t * p, uint32_t old, uint32_t new) {
    if (*p == old) {
        *p = new;
        return true;
    } else {
        return false;
    }
}

It is useful, but has its limits (the wikipedia page describes some, if
you are interested). In cases like this, it could be useful.

However, the OP is using an ARM - and like most (but not all) RISC cpus,
ARM does not have a CAS instruction. Instead, it has a load-link and
store-conditional pair, which is more powerful and flexible than CAS but
a little harder to use.
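For illustration (not from the thread), this is the canonical way cas() is used as a building block: a retry loop. Written in portable C11 below, which compilers lower to LOCK CMPXCHG on x86 or to an LL/SC loop on ARM:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Typical use of CAS as a building block: retry the compare-exchange
 * until no other thread has touched the value in between. */
static _Atomic uint32_t counter;

void atomic_add(uint32_t n)
{
    uint32_t old = atomic_load(&counter);
    /* on failure, atomic_compare_exchange_weak refreshes 'old'
     * with the current value, so the loop recomputes old + n */
    while (!atomic_compare_exchange_weak(&counter, &old, old + n))
        ;
}
```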

Rick C
Feb 12, 2020, 4:15:01 PM
Someone was on my case about a self designed CPU not having some instruction that is essential for multitasking. Would this be the instruction? I'm not sure I understand. When you say *p == old, where is old kept? Is there really a stored value of old or is this a way of saying *p /= new??? In that case the code could be...

bool cas(uint32_t * p, uint32_t old, uint32_t new) {
    if (*p == old) {
        *p = new;
        return true;
    } else {
        *p = new;
        return false;
    }
}

I write this because in my basic architecture memory is read/written on opposite phases of the CPU clock and all instructions are one clock cycle. The write is predetermined in the first phase of the clock, so the CPU can't have a RMW cycle. It can have a W/R cycle where the read data is the old data before the write. As long as the write is always done it can do the above in a single, non-interruptible cycle... not that I'm contemplating performing multitasking. The code is more complex than warranted for a 600 LUT CPU. Just add another CPU. lol

Giving what you wrote more thought it seems pretty clear it has to be implemented the way you have it written.

I should look it up and learn something, lol.


Robert Wessel
Feb 12, 2020, 4:44:13 PM
No, the idea is to update the word in memory only if it hasn't been
changed. The classic example is using CAS to add an item to a linked
list. You read the head pointer (that has to happen atomically, but
on most CPUs that just requires that it be aligned), construct the new
first element (most crucially the next pointer), and then if the head
pointer is unchanged, you can replace it with a pointer to the new
first item.

If the values are not equal, you don't want to update the head pointer
or you'll trash the linked list. In that case you retry the insertion
operation using the new head pointer.
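A minimal sketch of that insertion, written here with C11 atomics (the node layout and names are illustrative, not from the thread):

```c
#include <stdatomic.h>
#include <stddef.h>

/* Lock-free push onto a singly linked list, as described above: build
 * the node first, then CAS the head pointer; if another thread won the
 * race, 'old' is refreshed by the failed exchange and we relink and
 * retry with the new head. */
struct node { int value; struct node *next; };
static _Atomic(struct node *) head;

void push(struct node *n)
{
    struct node *old = atomic_load(&head);
    do {
        n->next = old;  /* link to the current first element */
    } while (!atomic_compare_exchange_weak(&head, &old, n));
}
```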

CAS is intended to be safe to use to make that update, as it's atomic
- the read of the value in memory, the compare to the old value, and
the conditional update form an atomic block, and can't be interrupted
or messed with by other CPUs in the system.

CAS is pretty easy to simulate with LL/SC. In some cases you'd be
better off adjusting the algorithm to better use LL/SC. In this case
it depends on how you're accessing the low word of the timer. If you
have only a single threaded of execution, you can fake CAS by
disabling interrupts.

What ISA is this for?

Rick C
Feb 12, 2020, 7:44:16 PM
Ok, this is more clear now. Wikipedia explains LL/SC pretty well. This is actually for multiple CPUs as much as multitasking. While you can just disable interrupts (assuming you can live with the interrupt latency issues) to make this work with a single CPU, if you are sharing the data structure with other CPUs the bus requires locking while these multiple transactions are happening. I assume the CPU has a signal to indicate a locked operation is happening to prevent other accesses from getting in and mucking up the works.

Is there a way to emulate this locking using semaphores? Someone I know is a big fan of Propeller CPUs which share memory and I don't know if they have such an instruction. They share memory by interleaving access.

> What ISA is this for?

Custom stack processor, related to the Forth VM. When designing FPGAs I want a CPU with deterministic timing, so 1 instruction = 1 clock cycle works well. Interrupt latency is zero or one depending on how you count it. The next cycle after an unmasked interrupt is asserted fetches the first instruction of the IRQ routine.

The CPU is not pipelined but the registers are aligned through the architecture to make it decode-execute/fetch rather than fetch-decode-execute. The fetch only depends on flags and instruction decode so it happens in parallel with the execute as far as timing is concerned. Someone insisted this was pipelined design because of these parallel parts.

It's nothing special, YAMC (Yet Another MISC CPU). I've never spent the time to optimize the design for speed. Instead I did some work trying to hybridize the stack design with register-like access to the stack to minimize stack juggling. Once that happened, the number of instructions for the test case I was using (an IRQ for DDS calculations) dropped by either a third or half, I forget which. The big stumbling block for me is coming up with software to help write code for it. lol


Robert Wessel
Feb 12, 2020, 7:57:59 PM
On Wed, 12 Feb 2020 16:44:12 -0800 (PST), Rick C
<gnuarm.del...@gmail.com> wrote:

>On Wednesday, February 12, 2020 at 4:44:13 PM UTC-5, robert...@yahoo.com wrote:

>> No, the idea is to not update the word in memory unless it hasn't been
>> changed. The classic example is using CAS to add an item to a linked
>> list. You read the head pointer (that has to happen atomically, but
>> on most CPUs that just requires that it be aligned), construct the new
>> first element (most crucially the next pointer), and then if the head
>> pointer is unchanged, you can replace it with a pointer to the new
>> first item.
>>
>> If the values are not equal, you don't want to update the head pointer
>> or you'll trash the linked list. In that case you retry the insertion
>> operation using the new head pointer.
>>
>> CAS is intended to be safe to use to make that update, as it's atomic
>> - the read of the value in memory, the compare to the old value, and
>> the conditional update form an atomic block, and can't be interrupted
>> or messed with by other CPUs in the system.
>>
>> CAS is pretty easy to simulate with LL/SC. In some cases you'd be
>> better off adjusting the algorithm to better use LL/SC. In this case
>> it depends on how you're accessing the low word of the timer. If you
>> have only a single threaded of execution, you can fake CAS by
>> disabling interrupts.
>
>Ok, this is more clear now. Wikipedia explains LL/SC pretty well. This is actually for multiple CPUs as much as multitasking. While you can just disable interrupts (assuming you can live with the interrupt latency issues) to make this work with a single CPU, if you are sharing the data structure with other CPUs the bus requires locking while these multiple transactions are happening. I assume the CPU has a signal to indicate a locked operation is happening to prevent other accesses from getting in and mucking up the works.
>
>Is there a way to emulate this locking using semaphores? Someone I know is a big fan of Propeller CPUs which share memory and I don't know if they have such an instruction. They share memory by interleaving access.


In the algorithm I suggested, you could just put a mutex around the
sequence that emulates the CAS. That's safe, since the extension word
is never updated from inside an interrupt handler (unless you actually
intend for that to be possible, such as if you were reading the extended
time value from inside an ISR). Even if that's slow, it's on a leg
of the code that will happen only rarely.

You still need the atomic read of the extension word (although that's
typically a non-issue, especially on a single hardware thread system).
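A sketch of that mutex-wrapped emulation (a POSIX mutex is used for illustration; on a single-core MCU an enter/exit-critical pair plays the same role):

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Emulated CAS: the mutex makes the read-compare-write sequence atomic
 * with respect to every other thread that takes the same lock. */
static pthread_mutex_t cas_lock = PTHREAD_MUTEX_INITIALIZER;

bool cas_emulated(uint32_t *p, uint32_t old, uint32_t new_val)
{
    pthread_mutex_lock(&cas_lock);
    bool ok = (*p == old);
    if (ok)
        *p = new_val;   /* only swap when the caller's 'old' is current */
    pthread_mutex_unlock(&cas_lock);
    return ok;
}
```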

David Brown
Feb 13, 2020, 3:57:44 AM
On 13/02/2020 01:44, Rick C wrote:

> Ok, this is more clear now. Wikipedia explains LL/SC pretty well.
> This is actually for multiple CPUs as much as multitasking. While
> you can just disable interrupts (assuming you can live with the
> interrupt latency issues) to make this work with a single CPU, if you
> are sharing the data structure with other CPUs the bus requires
> locking while these multiple transactions are happening. I assume
> the CPU has a signal to indicate a locked operation is happening to
> prevent other accesses from getting in and mucking up the works.
>

Yes, cpus with CAS and other locked instructions (like atomic
read-modify-write sequences) need bus lock signals. These are quite
easy to work with from the software viewpoint, and a real PITA to
implement efficiently in hardware in a multi-core system with caches.
Thus you get them in architectures like x86 that are designed to be easy
to program, but not in RISC systems that are designed for fast and
efficient implementations.

CAS can be useful even on a single cpu, if you have multiple masters
(DMA, for example). And CAS or LL/SC can be useful on a single cpu if
you have pre-emptive multi-tasking and don't want to (or can't) disable
interrupts.

On a small processor like yours, disabling interrupts around critical
regions is almost certainly the easiest and most efficient solution.

(If I were making a cpu, I'd like to have a "temporary interrupt
disable" counter as well as a global interrupt disable flag. I'd have
an instruction to set this counter to perhaps 3 to 7 counts. That's
enough time to make a CAS, or an atomic read-modify-write.)

> Is there a way to emulate this locking using semaphores? Someone I
> know is a big fan of Propeller CPUs which share memory and I don't
> know if they have such an instruction. They share memory by
> interleaving access.
>

There is a whole field of possibilities with locking, synchronisation
mechanisms, and lock-free algorithms. Generally speaking, once you have
one synchronisation primitive, you can emulate any others using it - but
the efficiency can vary enormously.

George Neuner
Feb 13, 2020, 10:54:30 AM
On Sun, 9 Feb 2020 12:17:42 +0100, David Brown
<david...@hesbynett.no> wrote:

>On 09/02/2020 07:35, upsid...@downunder.com wrote:
>> On Sat, 8 Feb 2020 19:57:48 +0100, David Brown
>> <david...@hesbynett.no> wrote:
>>
>>>> Never used NT, but I used W2k and it was great! W2k was widely
>>>> pirated so MS started a phone home type of licensing with XP which
>>>> was initially not well received, but over time became accepted. Now
>>>> people reminisce about the halcyon days of XP.
>>>
>>> Did you not use NT 4.0 ? It was quite solid. W2K was also good, but XP
>>> took a few service packs before it became reliable enough for serious use.
>>
>> NT 4.0 solid ??
>>
>> NT4 moved graphical functions to kernel mode to speed up window
>> updates.
>
>Yes. And that meant bugs in the graphics drivers could kill the whole
>system, unlike in NT 3.x. And bugs in the graphics drivers were
>certainly not unknown. However, with a little care it could run
>reliably for long times. I don't remember ever having a software or OS
>related crash or halt on our little NT 4 server.

Ditto.

I spent ~7 years in a small company as acting network admin in
addition to my regular development work. I watched over a pair of NT4
servers, a dozen NT4 workstations, and a handful of Win98 machines.

The NT servers never gave any problems. They ran 24/7 and were
rebooted only to replace a disk or install new software. We didn't
install all the service packs, so sometimes the servers would run for
more than a year without a reboot.

The workstations only rarely had problems despite being exposed to
software that was being developed on them. The machines ran 24/7 -
backups done after hours and on weekends. I can speak only to my own
experience as a developer: my workstation took a fair amount of abuse
from crashing and otherwise misbehaving software, but generally it was
rock solid and would run for months without something happening that
required a reboot to fix.


>> In general, each NT4 service pack introduced new bugs and soon the
>> next SP was released to correct the bugs introduced by the previous
>> SP. Thus every other SPs were actually usable.
>>
>> Even NT5 beta was more stable than NT4 with most recent SP. NT5 beta
>> was renamed Windows 2000 before final release.
>>
>
>I certainly liked W2K, and found it quite reliable. But I still
>remember NT 4.0 as good too.

In my experience, W2K was a bit flaky until SP2. After that, it
generally was stable.

Poster "upsidedown" (sorry, don't know your name) was right though
about the NT4 service packs. In my own experience:
- the initial OS release was a bit flaky
- SP1 was stable (at least for English speakers)
- SP2 was really flaky
- SP3 was stable
- SP4 was stable
- SP5 was a bit flaky
- SP6 was stable


I have been using Windows since 3.0 (which still ran DOS underneath).
I was quite happy with the reliability of NT4. I have had far more
problems with "more modern" versions: XP, Win7, and now Win10.

YMMV,
George

Bernd Linsel
Feb 13, 2020, 1:40:02 PM
Rick C wrote:
> On Wednesday, February 12, 2020 at 4:44:13 PM UTC-5, robert...@yahoo.com wrote:
>
> Ok, this is more clear now. Wikipedia explains LL/SC pretty well. This is actually for multiple CPUs as much as multitasking. While you can just disable interrupts (assuming you can live with the interrupt latency issues) to make this work with a single CPU, if you are sharing the data structure with other CPUs the bus requires locking while these multiple transactions are happening. I assume the CPU has a signal to indicate a locked operation is happening to prevent other accesses from getting in and mucking up the works.
>
> Is there a way to emulate this locking using semaphores? Someone I know is a big fan of Propeller CPUs which share memory and I don't know if they have such an instruction. They share memory by interleaving access.
>
>
> Custom stack processor, related to the Forth VM. When designing FPGAs I want a CPU with deterministic timing, so 1 instruction = 1 clock cycle works well. Interrupt latency is zero or one depending on how you count it. The next cycle after an unmasked interrupt is asserted fetches the first instruction of the IRQ routine.
>
> The CPU is not pipelined but the registers are aligned through the architecture to make it decode-execute/fetch rather than fetch-decode-execute. The fetch only depends on flags and instruction decode so it happens in parallel with the execute as far as timing is concerned. Someone insisted this was pipelined design because of these parallel parts.
>
> It's nothing special, YAMC (Yet Another MISC CPU). I've never spent the time to optimize the design for speed. Instead I did some work trying to hybridize the stack design with register-like access to the stack to minimize stack juggling. Once that happened, the number of instructions for the test case I was using (an IRQ for DDS calculations) dropped by either a third or half, I forget which. The big stumbling block for me is coming up with software to help write code for it. lol
>

One should mention that, at least in ARM and MIPS architectures, LL and
SC are not implemented with a global lock signal, but instead using
cache snooping (for uni- and multiprocessing systems).

LL just performs a simple load and additionally locks the (L1) data
cache line of that address (so that it cannot be replaced until SC or
another LL).

SC checks if data in that cache line has been modified since the last
LL; if so, it fails, otherwise it succeeds and writes the datum (whether
write-through or write-back is dependent on CPU cache configuration and
the virtual address).

An SC instruction targeting an address that hasn't been a LL source
before always fails and invalidates all LL atomic flags, so that their
corresponding SC's will fail. Thus, an SC to a dummy address is
exploited to implement synchronization barriers (in addition to cache
sync instructions).

The possible number of concurrent LL/SC pairs depends on the CPU model,
most support only 1 pending SC after a LL, some allow up to 8 parallel
LL/SC pairs (from different cache lines).


Finally, an example: emulated CAS on a MIPS32 CPU, which works
independent of the number of processors in the system:

// compare_and_swap
// input:   a0 = unsigned *p, a1 = unsigned old, a2 = unsigned new
// returns: v0 = 1 (success) | 0 (failure), v1 = old value from *p

    .set    nomips16, nomicromips, noreorder, nomacro
compare_and_swap:
1:  ll      v1, 0(a0)       // load linked from a0+0 into v1
    bne     v1, a1, 9f      // if v1 != a1 (old),
                            // branch forward to label 9
    move    v0, zero        // branch delay slot: load result 0,
                            // executed "while" taking the branch

    move    v0, a2          // load a copy of a2 (new) into v0
    sc      v0, 0(a0)       // store conditionally into a0+0
    beq     v0, zero, 1b    // if unsuccessful (v0 == 0),
                            // retry at label 1
    nop                     // branch delay slot: nothing to do

9:  jr      ra              // else (v0 == 1) return v0
    nop                     // jump delay slot: nothing to do

Note: this example could be further optimized for speed at the cost of
program space, reordering the opcodes so that the preferred case (a
successful CAS) executes linearly and branchless, with forward branches
likely not taken and backward branches likely taken (usable branch
prediction was only introduced with the MIPS R8).
L1 cache latency is usually 1 clock, when executing linear code, it is
hidden by prefetch and pipeline.
An L1 cache line is typically 64 bytes (16 words) wide, i.e. if the CPU
supports parallel LL/SCs, they must be at least 16 words apart,
otherwise the SC to the address of the first LL will always fail.

Regards,
Bernd