Bug with clock nanoseconds

44 views
Skip to first unread message

Manuel Coutinho

unread,
Mar 30, 2009, 12:41:09 PM3/30/09
to rtems...@rtems.com

Hi

 

We have discovered a bug of RTEMS relating to the clock. Even though we only tested it for the ERC32 board (for now), we believe it is common throughout all the targets.

 

The test consists in checking if the time given by rtems_clock_get is always increasing. The code is as follows:

 

 

rtems_task Init(rtems_task_argument argument) {

    struct timespec old;

    struct timespec new;

 

    rtems_clock_get_uptime(&old);

 

    while (1) {

        rtems_clock_get_uptime(&new);

 

        if (new.tv_sec < old.tv_sec)

            failedTestExit(TEST_CASE, 10, "seconds decrease");

 

        if (new.tv_sec == old.tv_sec) {

            if (new.tv_nsec < old.tv_nsec) {

                /* this is the error given */

                printk("            new = %d %d\n", new.tv_sec, new.tv_nsec);

                printk("            old = %d %d\n", old.tv_sec, old.tv_nsec);

 

                /* commented just to continue the test */

                /*failedTestExit(TEST_CASE, 10, "second equals but nanoseconds decreased");*/

            }

        }

 

        old = new;

 

        /* stop after 10 seconds */

        if (new.tv_sec > 10)

            sucessfullTestExit(TEST_CASE);

    }

 

    sucessfullTestExit(TEST_CASE);

}

 

 

This is the output we got using the SIS simulator:

            new = 0 50002000

            old = 0 59990000

            new = 0 90002000

            old = 0 99990000

            new = 0 100003000

            old = 0 109991000

            new = 0 120003000

            old = 0 129991000

            new = 0 160002000

            old = 0 169989000

            new = 0 210002000

            old = 0 219989000

            new = 0 260002000

            old = 0 269990000

            new = 0 300001000

            old = 0 309989000

            new = 0 320003000

            old = 0 329991000

            new = 0 360002000

            old = 0 369990000

            new = 0 370002000

            old = 0 379989000

            new = 0 380001000

            old = 0 389989000

            new = 0 420002000

            old = 0 429990000

            new = 0 470002000

            old = 0 479989000

            new = 0 500004000

            old = 0 509991000

            new = 0 520001000

            old = 0 529989000

            new = 0 550003000

            old = 0 559991000

            new = 0 600002000

            old = 0 609990000

            new = 0 650002000

            old = 0 659989000

            new = 0 670001000

            old = 0 679988000

            new = 0 690002000

            old = 0 699990000

            new = 0 730002000

            old = 0 739989000

            new = 0 770001000

            old = 0 779989000

            new = 0 790004000

            old = 0 799992000

            new = 0 800001000

            old = 0 809989000

            new = 0 840001000

            old = 0 849989000

            new = 0 860004000

            old = 0 869991000

            new = 0 870001000

            old = 0 879989000

            new = 0 890004000

            old = 0 899991000

            new = 0 900001000

            old = 0 909989000

            new = 0 920004000

            old = 0 929991000

            new = 0 930001000

            old = 0 939989000

            new = 0 950004000

            old = 0 959992000

            new = 0 960001000

            old = 0 969989000

            new = 1 40002000

            old = 1 49990000

            new = 1 60002000

 

 

As you can see, the problem is raised when a clock tick occurs: the time read in “new” is older than “old”. This cannot happen since “new” is updated AFTER “old”! This problem occurs if a clock interrupt is triggered after the number of clock ticks have been read but BEFORE the nanosecond field has been read. The resulting number of clock ticks is “small” (should be plus one).

 

We can use the interrupt pending CPU registers info to determine if, while we are reading the number of clock ticks that occurred and the nanoseconds, a clock interrupt could be pending.

 

We do not have enough knowledge to say that this solution can be used for all boards (determining if an interrupt is pending). In the SPARC architecture, at least, it is possible. If it is possible in all architectures, then a solution to RTEMS would be to change the code of the _TOD_Get_uptime function to:

 

 

void _TOD_Get_uptime(struct timespec *uptime) {

    ISR_Level level;

    struct timespec offset;

    volatile uint32_t pending;

 

    /* assume uptime checked by caller */

 

    offset.tv_sec = 0;

    offset.tv_nsec = 0;

 

    _ISR_Disable(level);

    *uptime = _TOD_Uptime;

    if (_Watchdog_Nanoseconds_since_tick_handler)

        offset.tv_nsec = (*_Watchdog_Nanoseconds_since_tick_handler)();

 

    /* code added: */

    pending = isClockInterruptPending();

 

    _ISR_Enable(level);

 

    /* code added */

    /* if an interrupt occurred while interrupts were disabled and the nanoseconds is too little */

    /* means that a clock interrupt occurred BEFORE the nanoseconds were read */

    if (pending && offset.tv_nsec < rtems_configuration_get_microseconds_per_tick() / 2) {

        struct timespec clockTick = {0, rtems_configuration_get_microseconds_per_tick()*1000};

        _Timespec_Add_to(&offset, &clockTick); /* so add one clock tick to the offset */

    }

 

    /* else, the clock tick occurred AFTER the nanoseconds were read, so no problem */

 

    _Timespec_Add_to(uptime, &offset);

}

 

 

At least, with these modifications, the test passes :)!

 

Kind regards

Manuel Coutinho

Joel Sherrill

unread,
Mar 30, 2009, 12:49:10 PM3/30/09
to Manuel Coutinho, rtems...@rtems.com
Manuel Coutinho wrote:
>
> Hi
>
>
>
> We have discovered a bug of RTEMS relating to the clock. Even though
> we only tested it for the ERC32 board (for now), we believe it is
> common throughout all the targets.
>

No you have discovered a bug in the ERC32's get nanoseconds since last
tick handler.
Apparently it doesn't recognize that is has been longer than a tick
since the last clock
tick. It should determine that is has overflowed and return 1 clock
tick + partial clock
tick.

It is returning the wrong answer.

>
>
> The test consists in checking if the time given by rtems_clock_get is
> always increasing. The code is as follows:
>
>
>

> .....


....


--
Joel Sherrill, Ph.D. Director of Research & Development
joel.s...@OARcorp.com On-Line Applications Research
Ask me about RTEMS: a free RTOS Huntsville AL 35805
Support Available (256) 722-9985


_______________________________________________
rtems-users mailing list
rtems...@rtems.com
http://rtems.rtems.org/mailman/listinfo/rtems-users

Aitor Viana

unread,
Mar 31, 2009, 3:30:59 AM3/31/09
to Joel Sherrill, rtems...@rtems.com
Just tried in tsim-leon and happens the same.

I didn't completely understand what's the problem Manuel could you explain to me again? is early in the morning :D

Best regards,

Aitor

Manuel Coutinho

unread,
Mar 31, 2009, 4:30:26 AM3/31/09
to Aitor Viana, Joel Sherrill, rtems...@rtems.com

Hi. Thanks for testing in leon2…was just about to do that :)!

 

Basically, the test aims to see that the clock is always increasing. But the result is that, sometimes, it decreases.

This happens when a clock tick occurs.

 

The RTEMS algorithm to read the time consists of:

 

1. Disabling interrupts

2. Read how many clock interrupts occurred

3. Read nanoseconds elapsed

4. Enable interrupts

 

The problem occurs if an interrupt occurs between the 1 and 3 steps: -> the number of clock ticks is NOT updated (because interrupts are disabled) and the nanoseconds field (which is read directly from the CPU registers) is restarted.

 

For example, the clock is 2 clock ticks and 99999 us (a clock tick is just about to occur). Suppose you disable interrupts, read 2 clock ticks, then the CPU clock counter overflows and the nanoseconds is reset to 0 and you read 0 nanoseconds.

You end up reading 2 clock ticks and 0 nanoseconds when in fact you should read 3 clock ticks and 0 nanoseconds - you cannot stop the clock counter (nor do you wish to!).

 

Hope this explains the problem a little bit better :)

 

By the way, by glancing at the code of the remaining BSPs, specifically at the Clock_driver_nanoseconds_since_last_tick macro, I don’t believe this is a problem just with SPARC. I don’t see this problem address in any board. But, of course, I might be wrong since it was just a very brief look.

 

Kind regards

Manuel Coutinho

 


Aitor Viana

unread,
Mar 31, 2009, 5:09:28 AM3/31/09
to Manuel Coutinho, rtems...@rtems.com
Yep,

crystal clear. BTW, isClockInterruptPending() is not present in the RTEMS distribution, it should be implemented for all BSPs I guess.

Best regards,

Aitor

Manuel Coutinho

unread,
Mar 31, 2009, 5:22:00 AM3/31/09
to Aitor Viana, rtems...@rtems.com

I’m not sure that the isClockInterruptPending() is a function that can be available to all BSPs. Some CPU might not support this. I don’t know :(.

In either case, here’s the function for the ERC32 board (I know, the names can be improved):

 

#define INTERRUPT_PENDING_ADDRESS 0x01F80048

 

/**

 * RTC interrupt level

 **/

#define RTC_ISR_INTERRUPT_LEVEL 13

 

#define RTC_INTERRUPT_PENDING_MASK (1 << RTC_ISR_INTERRUPT_LEVEL)

 

uint32_t isClockInterruptPending() {

    uint32_t *pointer_pending = (uint32_t *) INTERRUPT_PENDING_ADDRESS;

    return pending (*pointer_pending ) & RTC_INTERRUPT_PENDING_MASK;

Aitor Viana

unread,
Mar 31, 2009, 5:56:51 AM3/31/09
to Manuel Coutinho, rtems...@rtems.com
The current solucion also gets a problem because really depends on the CONFIGURE_MICROSECONDS_PER_TICK parameter. If this value is small enough it is possible to get another overrun of tv_nsec while executing

 /* code added */

    /* if an interrupt occurred while interrupts were disabled and the nanoseconds is too little */

    /* means that a clock interrupt occurred BEFORE the nanoseconds were read */

    if (pending && offset.tv_nsec < rtems_configuration_get_microseconds_per_tick() / 2) {

        struct timespec clockTick = {0, rtems_configuration_get_microseconds_per_tick()*1000};

        _Timespec_Add_to(&offset, &clockTick); /* so add one clock tick to the offset */

    }


I see that _TOD_Uptime is updated every clock tick in cpukit/score/src/coretodtickle.c (see bellow) with microseconds precision. It is not enough with the next patch?


=== modified file 'cpukit/score/src/coretodgetuptime.c'

--- cpukit/score/src/coretodgetuptime.c 2009-03-31 07:56:38 +0000

+++ cpukit/score/src/coretodgetuptime.c 2009-03-31 09:49:25 +0000

@@ -20,6 +20,7 @@

 #include <rtems/score/isr.h>

 #include <rtems/score/timespec.h>

 #include <rtems/score/tod.h>

+#include <rtems/config.h>


 /*

  *  _TOD_Get_uptime

@@ -46,8 +47,8 @@


   _ISR_Disable( level );

     *uptime = _TOD_Uptime;

-    if ( _Watchdog_Nanoseconds_since_tick_handler )

-      offset.tv_nsec = (*_Watchdog_Nanoseconds_since_tick_handler)();

+//    if ( _Watchdog_Nanoseconds_since_tick_handler )

+//      offset.tv_nsec = (*_Watchdog_Nanoseconds_since_tick_handler)();


   _ISR_Enable( level );


Manuel Coutinho

unread,
Mar 31, 2009, 6:20:07 AM3/31/09
to Aitor Viana, rtems...@rtems.com

This solution is actually very robust. Only if the number of microseconds per tick is very small (~10 us) should you get problems. This is, if two clock interrupts can occur while interrupts are disabled. But we consider this to be impossible.

 

Remember that the clock ticks and nanoseconds are read while interrupts are disabled. So after the disabling of interrupts at most only one clock tick could occur.

 

The _TOD_Tickle_ticks only states that one more clock tick occurred. The nanoseconds field is not updated. Haven’t tested, but don’t believe that just by commenting those lines the problem will be solved.

Joel Sherrill

unread,
Mar 31, 2009, 6:39:54 AM3/31/09
to Aitor Viana, rtems...@rtems.com
Aitor Viana wrote:
> Yep,
>
> crystal clear. BTW, isClockInterruptPending() is not present in the
> RTEMS distribution, it should be implemented for all BSPs I guess.
>
I repeat. It is the responsibility of the clock tick device driver
to reliably and correctly report the number of nanoseconds since
the last time rtems_clock_tick was called. If that was slightly
greater than 1 tick ago, then it needs to return a number of nanoseconds
slightly greater than 1 tick.

Any code modifications outside the device driver are not correct and
will not address all cases of using this extension.

This might be a common issue with get nanoseconds handlers but it will
NOT be addressed in this way. Check the is pending bit or that the
counter overflowed in the get nanoseconds handler and return the correct
number.

--joel

> ------------------------------------------------------------------------
>
> *From:* ait...@googlemail.com <mailto:ait...@googlemail.com>
> [mailto:ait...@googlemail.com <mailto:ait...@googlemail.com>]
> *On Behalf Of *Aitor Viana
> *Sent:* Tuesday, March 31, 2009 8:31 AM
> *To:* Joel Sherrill
> *Cc:* Manuel Coutinho; rtems...@rtems.com
> <mailto:rtems...@rtems.com>
> *Subject:* Re: Bug with clock nanoseconds

> rtems...@rtems.com <mailto:rtems...@rtems.com>
> http://rtems.rtems.org/mailman/listinfo/rtems-users

Manuel Coutinho

unread,
Mar 31, 2009, 6:59:15 AM3/31/09
to Joel Sherrill, Aitor Viana, rtems...@rtems.com
Yes, I agree :)

> -----Original Message-----
> From: Joel Sherrill [mailto:joel.s...@oarcorp.com]
> Sent: Tuesday, March 31, 2009 11:40 AM
> To: Aitor Viana
> Cc: Manuel Coutinho; rtems...@rtems.com

> > Hi. Thanks for testing in leon2.was just about to do that :)!

Manuel Coutinho

unread,
Mar 31, 2009, 7:30:20 AM3/31/09
to Joel Sherrill, Aitor Viana, rtems...@rtems.com
Here are the functions for ERC32 and Leon2 - ERC32 and Leon2 already have
the is_interrupt_pending functions :D :

ERC32:

uint32_t bsp_clock_nanoseconds_since_last_tick(void) {
uint32_t clicks;

clicks = ERC32_MEC.Real_Time_Clock_Counter;

/* Down counter */
clicks = (uint32_t) (BSP_Configuration.microseconds_per_tick - clicks) *
1000;

/* if an interrupt occurred while interrupts were disabled and the
nanoseconds is too little */
/* means that a clock interrupt occurred BEFORE the nanoseconds were
read */

if (ERC32_Is_interrupt_pending(ERC32_INTERRUPT_REAL_TIME_CLOCK) &&
clicks < BSP_Configuration.microseconds_per_tick / 2)
clicks += BSP_Configuration.microseconds_per_tick * 1000;

return clicks;
}


LEON2:

uint32_t bsp_clock_nanoseconds_since_last_tick(void) {
uint32_t clicks;

clicks = LEON_REG.Timer_Counter_1;

/* Down counter */
clicks = (uint32_t) (BSP_Configuration.microseconds_per_tick - clicks) *
1000;

/* if an interrupt occurred while interrupts were disabled and the
nanoseconds is too little */
/* means that a clock interrupt occurred BEFORE the nanoseconds were
read */

if (LEON_Is_interrupt_pending(LEON_INTERRUPT_TIMER1) && clicks <
BSP_Configuration.microseconds_per_tick / 2)
clicks += BSP_Configuration.microseconds_per_tick * 1000;

return clicks;
}


I've tested and the result seams ok :)
Kind regards
Manuel Coutinho

Aitor.Via...@esa.int

unread,
Mar 31, 2009, 8:06:09 AM3/31/09
to Joel Sherrill, rtems...@rtems.com


> I repeat.  It is the responsibility of the clock tick device driver
> to reliably and correctly report the number of nanoseconds since
> the last time rtems_clock_tick was called.  If that was slightly
> greater than 1 tick ago, then it needs to return a number of nanoseconds
> slightly greater than 1 tick.


I agree but...

I think bsp_clock_nanoseconds_since_last_tick() for LEON2 and ERC32 are well implemented. They return the precise amount of nanoseconds since last tick and I dont really see the proposed workaround (see bellow)

LEON2:

uint32_t bsp_clock_nanoseconds_since_last_tick(void) {
   uint32_t clicks;

   clicks = LEON_REG.Timer_Counter_1;

   /* Down counter */
   clicks = (uint32_t) (BSP_Configuration.microseconds_per_tick - clicks) *
1000;

   /* if an interrupt occurred while interrupts were disabled and the
nanoseconds is too little */
   /* means that a clock interrupt occurred BEFORE the nanoseconds were
read */
   if (LEON_Is_interrupt_pending(LEON_INTERRUPT_TIMER1) && clicks <
BSP_Configuration.microseconds_per_tick / 2)
       clicks += BSP_Configuration.microseconds_per_tick * 1000;

   return clicks;
}

This implementation does not make sense without the framework of the _TOD_Get_uptime() function where the problem is an interrupt between getting _TOD_Uptime and getting nanoseconds since last tick.

Moreover, CONFIGURE_MICROSECONDS_PER_TICK  defines the RTEMS time granularity so I dont see the point making this ad-hoc modification in bsp_clock_nanoseconds_since_last_tick() function to make _TOD_Get_Uptime() work properly when we can do it applying this simple patch

=== modified file 'cpukit/score/src/coretodgetuptime.c'
--- cpukit/score/src/coretodgetuptime.c 2009-03-31 07:56:38 +0000
+++ cpukit/score/src/coretodgetuptime.c 2009-03-31 11:58:51 +0000
@@ -46,8 +46,8 @@

   _ISR_Disable( level );
     *uptime = _TOD_Uptime;
-    if ( _Watchdog_Nanoseconds_since_tick_handler )
-      offset.tv_nsec = (*_Watchdog_Nanoseconds_since_tick_handler)();
+//    if ( _Watchdog_Nanoseconds_since_tick_handler )
+//      offset.tv_nsec = (*_Watchdog_Nanoseconds_since_tick_handler)();

   _ISR_Enable( level );

loosing no resolution.

BTW this patch is tested and working.


Joel Sherrill

unread,
Mar 31, 2009, 8:51:09 AM3/31/09
to Aitor.Via...@esa.int, rtems...@rtems.com
It is correct. RTEMS asked the BSP a very simple question:

"How many nanoseconds has it been since you last called rtems_clock_tick?"

The driver was answering this question with the "number of nanoseconds
since the last counter hit zero" not since it last called rtems_clock_tick.

Most of the time the answers are the same but there is a small window
where it is not and this case highlights that.


> Moreover, CONFIGURE_MICROSECONDS_PER_TICK defines the RTEMS time
> granularity so I dont see the point making this ad-hoc modification in
> bsp_clock_nanoseconds_since_last_tick() function to make
> _TOD_Get_Uptime() work properly when we can do it applying this simple
> patch
>

You are losing the ability to have the timestamp include "partial ticks".
If you call get uptime 10 times in a row, you will likely get only 1 or two
(if a tick occurs) distinct values. Getting nanosecond accurate timestamps
is the entire functional goal of the handler being called.


> === modified file 'cpukit/score/src/coretodgetuptime.c'
> --- cpukit/score/src/coretodgetuptime.c 2009-03-31 07:56:38 +0000
> +++ cpukit/score/src/coretodgetuptime.c 2009-03-31 11:58:51 +0000
> @@ -46,8 +46,8 @@
>
> _ISR_Disable( level );
> *uptime = _TOD_Uptime;
> - if ( _Watchdog_Nanoseconds_since_tick_handler )
> - offset.tv_nsec = (*_Watchdog_Nanoseconds_since_tick_handler)();
> +// if ( _Watchdog_Nanoseconds_since_tick_handler )
> +// offset.tv_nsec = (*_Watchdog_Nanoseconds_since_tick_handler)();
>
> _ISR_Enable( level );
>
> loosing no resolution.
>
> BTW this patch is tested and working.
>
>

--
Joel Sherrill, Ph.D. Director of Research & Development
joel.s...@OARcorp.com On-Line Applications Research
Ask me about RTEMS: a free RTOS Huntsville AL 35805
Support Available (256) 722-9985


_______________________________________________
rtems-users mailing list
rtems...@rtems.com

http://rtems.rtems.org/mailman/listinfo/rtems-users

Joel Sherrill

unread,
Mar 31, 2009, 8:51:57 AM3/31/09
to Aitor.Via...@esa.int, rtems...@rtems.com
FWIW I would like to here from Jiri or someone else
at Gaisler. And this needs a PR to track the patches
for each BSP.

--joel

Aitor.Via...@esa.int

unread,
Mar 31, 2009, 9:38:40 AM3/31/09
to Joel Sherrill, rtems...@rtems.com, rtems-use...@rtems.org

> It is correct.  RTEMS asked the BSP a very simple question:
>
> "How many nanoseconds has it been since you last called rtems_clock_tick?"
>
> The driver was answering this question with the "number of nanoseconds
> since the last  counter hit zero" not since it last called rtems_clock_tick.


In this case I think is correct as the "counter hit zero" event corresponds to the "last called rtems_clock_tick()"

> You are losing the ability to have the timestamp include "partial ticks".
> If you call get uptime 10 times in a row, you will likely get only 1 or two
> (if a tick occurs) distinct values.  Getting nanosecond accurate timestamps
> is the entire functional goal of the handler being called.


Yes that's completely true. I want to ask something. The problem is caused because _TOD_Get_uptime() function is performing...

*uptime = _TOD_Uptime
if( _Watchdog_Nanoseconds_since_tick_handler )
offset.tv_nsec = (*_Watchdog_Nanoseconds_since_tick_handler)();


and if a clock tick interrupt occurs in between the consistency between the uptime and offset values is lost. Should not be the _TOD_Get_uptime() function preventing this?
Reply all
Reply to author
Forward
0 new messages