[En-Nut-Discussion] systime limit reduces valid uptime to 49 days


Michael Müller

Nov 6, 2007, 11:10:41 AM
to en-nut-d...@egnite.de
Hi,

I was quite shocked when looking at the code that calculates the
system time. Unless my calculation is complete rubbish, the time
overflows after 49 days! At least on Ethernut 1.3 with 32-bit long
variables. The comment at the head of the NutGetMillis(void) function
mentions a maximum systime of 8 years. It seems to refer to the old
systick of 62ms instead of the current default value of 1ms.

timer.c
=======

u_long NutGetTickCount(void);

...

u_long NutGetSeconds(void)
{
    return NutGetTickCount() / NutGetTickClock();
}

ostimer.c
=========
u_long NutGetTickClock(void)
{
    return NUT_TICK_FREQ; /* = 1024 */
}

This reduces the maximum count of seconds to

2^32 / 1024 [s] = 4194304 s, i.e. about 49 days!
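
As a rough cross-check of this arithmetic, the sketch below computes how long a 32-bit tick counter lasts at a given tick rate. The function names are made up for illustration and are not part of Nut/OS; 1024 ticks/s is taken from the NUT_TICK_FREQ comment above, and 16 ticks/s is an assumed stand-in for the old 62ms tick.

```c
#include <stdint.h>

/* Seconds until a 32-bit tick counter wraps at the given tick rate.
 * Hypothetical helper, not a Nut/OS function. */
uint64_t wrap_period_seconds(uint32_t ticks_per_second)
{
    /* 2^32 does not fit into 32 bits, so divide in 64 bits. */
    return (UINT64_C(1) << 32) / ticks_per_second;
}

/* The same period in whole days (86400 seconds per day). */
uint64_t wrap_period_days(uint32_t ticks_per_second)
{
    return wrap_period_seconds(ticks_per_second) / 86400u;
}
```

With 1024 ticks/s this gives 4194304 seconds, or 48 whole days. With 16 ticks/s it gives 3106 days, roughly the 8 years mentioned in the NutGetMillis() comment.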

Are there any suggestions how to handle this?
- Change the systick to 62ms again (was the reason just more precision
for the "user application" or was it useful / necessary for the OS, too?) ?
- Increase the tick variable of NutOS to a 64bit type
(unsigned long long)?


Best regards
Michael


_______________________________________________
http://lists.egnite.de/mailman/listinfo/en-nut-discussion

Bernard Fouché

Nov 6, 2007, 1:43:39 PM
to Ethernut User Chat (English)
Another possibility besides moving the tick counter to 64 bits:
periodically 'reduce' the tick counter so that it never overflows.

For instance, every day (or every 49 days!) the 32-bit tick count is set
back to zero and another variable is incremented. When one needs the time
in ms, a system function calculates a 64-bit value from the current tick
count plus this 'other variable'. So NutGetTickCount() would return a
64-bit value, but the interrupt code would still work mostly in 32 bits,
so Ethernuts running on ATmegas won't be impacted too much. (Having these
count in 64 bits would be a big penalty, since the MCU core is mostly
8 bits.)
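
A minimal sketch of this idea, using the natural 32-bit wrap as the moment to bump the second variable. All names are hypothetical and nothing here is existing Nut/OS code; the interrupt is simulated by a plain function call, and the protection a real system would need when reading both variables outside the interrupt is left out.

```c
#include <stdint.h>

volatile uint32_t sketch_tick_low;   /* 32-bit tick count, bumped on every tick */
volatile uint32_t sketch_tick_wraps; /* how often sketch_tick_low has wrapped */

/* Stand-in for the timer interrupt: 32-bit work plus a rare carry. */
void sketch_tick_isr(void)
{
    if (++sketch_tick_low == 0) {
        sketch_tick_wraps++;
    }
}

/* Combine both parts into one 64-bit tick count.
 * On real hardware the two reads must not be interrupted by the tick. */
uint64_t sketch_ticks64(void)
{
    return ((uint64_t)sketch_tick_wraps << 32) | sketch_tick_low;
}
```

The 64-bit arithmetic only happens when somebody asks for the time, not on every tick.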

But this 49-day limit is certainly a trap for long-running devices:
some function may never time out and hang the whole thing. Currently I
have apps that need (about) 1ms precision to be able to trigger I2C
operations, so moving to a tick every 62ms would be bad, but these apps
usually don't run 49 days in a row (and I use comparisons that handle
the 32-bit overflow, since I have no feature that requires a 49-day timer).

Yet another dirty trick: when the application considers itself idle, have
it reboot if the tick counter is over a particular value...

Bernard


Michael Fischer

Nov 6, 2007, 2:50:34 PM
to en-nut-d...@egnite.de
Hello Michael,

Where is the problem if the timer overflows after 49 days?
If NutOS uses the correct logic internally to check the timer
(knowing that an overflow can occur), it should be no problem.
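
To illustrate what such an overflow-aware check can look like, here is a small sketch. It is an assumption about the general technique, not a copy of what Nut/OS does internally, and the function name is invented.

```c
#include <stdint.h>

/* Returns 1 once at least timeout_ticks have passed since start_ticks.
 * Hypothetical helper for illustration only. */
int sketch_timeout_expired(uint32_t now_ticks, uint32_t start_ticks,
                           uint32_t timeout_ticks)
{
    /* Unsigned subtraction is done modulo 2^32, so a single wrap of the
     * counter between start and now cancels out. */
    return (uint32_t)(now_ticks - start_ticks) >= timeout_ticks;
}
```

This stays correct across one wrap as long as the measured interval is shorter than the full counter period.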

Best regards,

Michael

Michael Müller

Nov 7, 2007, 5:08:41 AM
to Ethernut User Chat (English)
Hi Michael,

in addition to the code parts I sent first, the interrupt function
doesn't look as if any kind of overflow handling is implemented
(everything based on NutOS 4.4.0).

/*!
 * \brief System timer interrupt handler.
 */
#if !(defined (__linux__) || defined(__APPLE__) || defined(__CYGWIN__))
#ifdef USE_TIMER
SIGNAL( SIG_TIMER )
#else
static void NutTimerIntr(void *arg)
#endif
{
    nut_ticks++;
    // nut_tick_dist[TCNT0]++;
}
#endif

To finally test this phenomenon I wrote a little test program. It first
of all sets the tick counter to one minute before overflow. Then it sets
the time by stime and outputs the time every second:


07.11.2007|07:00:00 tick count: 4294907393
07.11.2007|07:00:01 tick count: 4294908419
07.11.2007|07:00:02 tick count: 4294909445
07.11.2007|07:00:03 tick count: 4294910471
...
07.11.2007|07:00:54 tick count: 4294962797
07.11.2007|07:00:55 tick count: 4294963823
07.11.2007|07:00:56 tick count: 4294964849
07.11.2007|07:00:57 tick count: 4294965875
07.11.2007|07:00:58 tick count: 4294966901
19.09.2007|18:55:55 tick count: 631
19.09.2007|18:55:56 tick count: 1657
19.09.2007|18:55:57 tick count: 2683
19.09.2007|18:55:58 tick count: 3709
19.09.2007|18:55:59 tick count: 4735
19.09.2007|18:56:00 tick count: 5761
...


I can't explain the date, as I expected something much further in the
past, but anyway the result is invalid.

Best regards
Michael


Harald Kipp

Nov 7, 2007, 6:03:04 AM
to Ethernut User Chat (English)
Michael Müller wrote:

> Hi,
>
> I was quite shocked when looking at the code parts calculating the
> system time.
Me too, at least a bit. But, as other people here already pointed out,
this is not a general problem. Timeout calculations should still work
during overflows.

> The comment at the head of the NutGetMillis(void) function
> mentions a maximum systime of 8 years. It seems to refer to the old
> systick of 62ms instead of the current default value of 1ms.
>

Indeed, this info is outdated. Thanks for bringing this to our
attention. (Anyone out there to add a bug report at SourceForge or fix
it immediately?)

> Are there any suggestions how to handle this?
> - Change the systick to 62ms again (was the reason just more precision
> for the "user application" or was it useful / necessary for the OS, too?) ?
> - Increase the tick variable of NutOS to a 64bit type
> (unsigned long long)?
>

Initially Nut/OS ran on 3.68 MHz systems and the timer interrupt
handled a lot of things, so it was set to 62.5ms. AVRs became faster, and
the change to 1ms was made mainly to provide finer granularity for
timeout values.

My preference would be a solution that

1. avoids the 64-bit type long long, because it is not supported by all
compilers and may result in a porting nightmare.
2. avoids any additional code running in interrupt context. Such
additional code would increase interrupt latency. See also Bernard's posting.

Actually, the problem is with the calendar functions on boards without
an RTC chip. Thus, the ideal solution I can think of would be an
additional time_t variable, which holds the number of seconds since the
epoch. This variable may be updated when calling NutGetSeconds() or
similar functions, or in the idle thread. The latter has the
disadvantage that it may run the update too often. Doing the update in a
timer query routine seems to be the most economical solution, but
requires that the application calls NutGetSeconds(), time() or similar
at least once within 24 days.
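
A rough sketch of how such an update in a query routine might look. Everything here is hypothetical: the names, the fixed rate of 1024 ticks/s, and the use of uint32_t in place of time_t to keep the example compiler-neutral. The tick count is passed in as a parameter so the sketch does not depend on Nut/OS.

```c
#include <stdint.h>

#define SKETCH_TICK_FREQ 1024u      /* assumed tick rate in ticks per second */

static uint32_t sketch_last_ticks;  /* tick count seen at the last query */
static uint32_t sketch_frac_ticks;  /* ticks that do not yet fill a second */
static uint32_t sketch_epoch_secs;  /* seconds since the epoch */

/* Set the calendar time and remember the tick count it belongs to. */
void sketch_set_time(uint32_t epoch_secs, uint32_t now_ticks)
{
    sketch_epoch_secs = epoch_secs;
    sketch_last_ticks = now_ticks;
    sketch_frac_ticks = 0;
}

/* Fold the ticks elapsed since the previous query into the seconds
 * counter. Runs in thread context only, never in the interrupt. */
uint32_t sketch_get_time(uint32_t now_ticks)
{
    uint32_t elapsed = now_ticks - sketch_last_ticks; /* wrap-safe, modulo 2^32 */

    sketch_last_ticks = now_ticks;
    sketch_epoch_secs += elapsed / SKETCH_TICK_FREQ;
    sketch_frac_ticks += elapsed % SKETCH_TICK_FREQ;
    if (sketch_frac_ticks >= SKETCH_TICK_FREQ) {
        sketch_frac_ticks -= SKETCH_TICK_FREQ;
        sketch_epoch_secs++;
    }
    return sketch_epoch_secs;
}
```

The tick counter itself is never reset, so running timeouts are not disturbed. The sketch goes wrong if the counter wraps more than once between two queries, hence the need to call it regularly.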

Btw., I do not think that it is a good idea to reset nut_ticks, because
it will interfere with running timeouts.

Harald


Harald Kipp

Nov 7, 2007, 6:13:27 AM
to Ethernut User Chat (English)
Michael Müller wrote:

> 07.11.2007|07:00:58 tick count: 4294966901
> 19.09.2007|18:55:55 tick count: 631
>
>
> I can't explain the date as I expected something far more in the past
> but anyway the result is invalid.
>
Why further in the past? You explained earlier that the overflow takes
place after 49 days. Without re-reading my HP Calculator Manual on how
to do calendar calculations, it looks like 49 days to me.
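
For those without the calculator manual at hand, a small sketch that splits a number of seconds into days, hours, minutes and seconds. The helper and struct names are invented, and the 4194304 seconds in the note below assume a 32-bit counter at 1024 ticks/s.

```c
#include <stdint.h>

struct sketch_span {
    uint32_t days;
    uint32_t hours;
    uint32_t minutes;
    uint32_t seconds;
};

/* Break a duration in seconds into calendar-style parts. */
struct sketch_span sketch_split_seconds(uint32_t total)
{
    struct sketch_span s;

    s.days = total / 86400u;
    total %= 86400u;
    s.hours = total / 3600u;
    total %= 3600u;
    s.minutes = total / 60u;
    s.seconds = total % 60u;
    return s;
}
```

For 4194304 seconds this yields 48 days, 13 hours, 5 minutes and 4 seconds, which is in line with the jump of roughly 49 days seen in the log.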

As we all know, time and date calculations can be quite mind boggling. :-)

Harald


Michael Müller

Nov 7, 2007, 8:35:10 AM
to Ethernut User Chat (English)
Hi all,

I wrote a little workaround for the discussed overflow problem. It is
tailored to the default configuration to avoid too many calculations
with every function call. It also avoids larger variable types and a
change of the system tick.

/* 2^32 (range of a 32-bit long) / 1024 (ticks / s) = seconds per overflow */
#define TICK_OVERFLOW_FACTOR 0x400000UL

u_long NutGetSeconds(void)
{
    static u_long tickOverflowCounter = 0;
    static u_long lastTickCount = 0;
    u_long currentTickCount;

    currentTickCount = NutGetTickCount();

    /* overflow occurred since the last call? */
    if ( lastTickCount > currentTickCount )
        tickOverflowCounter++;

    lastTickCount = currentTickCount;

    /* use the sampled count, so a wrap between two reads cannot skew the result */
    return ( currentTickCount / NutGetTickClock() ) +
           ( tickOverflowCounter * TICK_OVERFLOW_FACTOR );
}

The test program I described earlier survived the overflow this way. I
assume that NutGetSeconds is called at least once every 49 days.

@Harald: yes the calculation of unix time vs sys ticks and everything
around is quite mind bending ;-) I somehow kept 1970 in mind...

Another affected part (I hope the only one - just trusted in the
reference search function of Eclipse) is the NutTimerProcessElapsed
function. I think it is useful to consider the overflow here, too.

void NutTimerProcessElapsed(void)
{
    NUTTIMERINFO *tn;
    u_long ticks;
    u_long ticks_new;

    // calculate ticks since last call
    ticks = NutGetTickCount();

    /* overflow in ticks? */
    if ( nut_ticks_resume > ticks )
    {
        /* hope it survives the compiler optimization */
        /* little trick to stay inside 32bit variable space */
        ticks_new = (0xffffffff - nut_ticks_resume);
        ticks_new += ticks + 1;
    }
    else
    {
        ticks_new = ticks - nut_ticks_resume;
    }

    nut_ticks_resume = ticks;
    ...


Best regards
Michael



Alain M.

Nov 8, 2007, 8:57:58 AM
to Ethernut User Chat (English)
Hi Harald,

(I am top-posting because I will not comment item by item)

I agree 100% with you. I have used this arrangement for many years:
- one long variable with a millisecond counter
- one time_t with the correct time in seconds since 1970.

Let me include an extra explanation:
The millisecond overflow works OK because of the way C handles an
unsigned long int: if you make a *subtraction* between two numbers before
and after the overflow, or if you add some other long int number, the
result *will*not*be*affected*

BUT FOR THIS TO WORK you should never compare the two numbers directly;
you have to calculate the time diff and test whether it is positive or
negative. Example:
if ( (long)(now_ms - end_time_ms) >= 0 ) // this works (unsigned counters, diff read as signed)
if ( now_ms >= end_time_ms ) // this will fail at overflow
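
The two variants can be compared side by side in a small sketch. The function names are made up, and the counters are assumed to be unsigned 32-bit values whose difference is reinterpreted as signed, which relies on the usual two's complement conversion.

```c
#include <stdint.h>

/* Wrap-safe: look at the sign of the difference. Valid as long as the
 * deadline is less than 2^31 ms away from now. */
int sketch_deadline_reached(uint32_t now_ms, uint32_t end_time_ms)
{
    return (int32_t)(now_ms - end_time_ms) >= 0;
}

/* Naive: direct comparison, gives wrong answers around the overflow. */
int sketch_deadline_reached_naive(uint32_t now_ms, uint32_t end_time_ms)
{
    return now_ms >= end_time_ms;
}
```

With now_ms = 0xFFFFFFF8 and a deadline that has already wrapped to 0x10, the first function correctly reports that the deadline is not yet reached, while the naive one claims it is.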

Alain

