is ATOMIC_BLOCK preferable to cli/sti?

1,268 views
Skip to first unread message

ro...@mail.com

unread,
Sep 25, 2015, 9:49:22 AM9/25/15
to Developers
i've been working on some small changes to HardwareSerial.cpp to make access to the Rx buffer indexes atomic when the Rx buffer is >256 bytes long, and it has been suggest that i should be using

#if (SERIAL_RX_BUFFER_SIZE>256) ATOMIC_BLOCK(ATOMIC_RESTORESTATE)
{
  // do the atomic //
  // stuff in here //
}

instead of:

#if (SERIAL_RX_BUFFER_SIZE>256)
  uint8_t oldSREG = SREG;
  cli();
#endif 
// do the atomic //
// stuff in here //
#if (SERIAL_RX_BUFFER_SIZE>256)
  SREG = oldSREG;
#endif

is there any 'official' stance on which method is preferred? i did a quick check of the arduino IDE source code and found very little use of ATOMIC_BLOCK anywhere.


also, i see these lines at the end of Serial.begin are:
132|  sbi(*_ucsrb, RXEN0);
133|  sbi(*_ucsrb, TXEN0);
134|  sbi(*_ucsrb, RXCIE0);
135|  cbi(*_ucsrb, UDRIE0);
136|}

question: should line 135 in fact be cbi or is this an error? my query is based solely upon line 135 breaking the pattern, with no understanding of what exactly the code is doing!


cheers,
rob   :-)

Andrew Kroll

unread,
Sep 25, 2015, 1:11:17 PM9/25/15
to devel...@arduino.cc
ATOMIC_BLOCK is a fancy macro for it that allows you to do nice things without forgetting... like, restoring the previous state, and whatnot.

--
You received this message because you are subscribed to the Google Groups "Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to developers+...@arduino.cc.



--
Visit my github for awesome Arduino code @ https://github.com/xxxajk

Gabriel Staples

unread,
Sep 25, 2015, 1:13:52 PM9/25/15
to devel...@arduino.cc
To answer just the part about the atomic block:

ATOMIC_BLOCK(ATOMIC_RESTORESTATE)
{
  // do the atomic //
  // stuff in here //
}

Is exactly identical to:

uint8_t oldSREG = SREG;
cli();
//do the atomic stuff in here
SREG = oldSREG;

In the second option, you can even add in braces if you want, like so, just to make it look like the first option:

uint8_t oldSREG = SREG;
cli();
{
  //do the atomic stuff in here
}
SREG = oldSREG;


Which method you use is just a matter of personal preference and readability. The 1st option is simply a clever combination of macros and inline functions. Perhaps the 1st option is more readable and self-documenting.

Using the ATOMIC_BLOCK option requires including the <util/atomic.h> header file.


Let's walk through the source code. I'll put source code in blue.

Line 205 to 206:
#define ATOMIC_BLOCK(type) for ( type, __ToDo = __iCliRetVal(); __ToDo ; __ToDo = 0 )

Lines 48 to 52:
static __inline__ uint8_t __iCliRetVal(void)
{
 cli();
 return 1;
}

Line 244 to 245:
#define ATOMIC_RESTORESTATE uint8_t sreg_save __attribute__((__cleanup__(__iRestore))) = SREG

__attribute__ is a gcc directive, placed on the variable sreg_save in this case. __attribute__ is documented here: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#Common-Variable-Attributes

__cleanup__(cleanup_function) is the same thing as cleanup(cleanup_function), which is explained in the link above. Basically, when the variable sreg_save goes out of scope, the cleanup_function is automatically called, and a pointer to the variable is automatically passed as a function to the cleanup function.

The cleanup function, __iRestore, is defined on Lines 68 to 72:
static __inline__ void __iRestore(const uint8_t *__s)
{
 SREG = *__s;
 __asm__ volatile ("" ::: "memory");
}

I don't understand the __asm__ volatile thing at the moment (I'm not yet versed in assembly language), but SREG = *__s is basically the same thing as doing SREG = sreg_save, since the cleanup function attribute was placed directly after, and therefore is applied to, the sreg_save variable above.

Furthermore, since sreg_save is defined inside the for loop (follow the macros and you'll see it), its scope ends when the for loop ends.

So, let's combine it together.

Place ATOMIC_RESTORESTATE into ATOMIC_BLOCK, like this:
ATOMIC_BLOCK(ATOMIC_RESTORESTATE)

and you get: 
ATOMIC_BLOCK(uint8_t sreg_save __attribute__((__cleanup__(__iRestore))) = SREG)

Now, expand the ATOMIC_BLOCK macro and you get:
for (uint8_t sreg_save __attribute__((__cleanup__(__iRestore))) = SREG, __ToDo = __iCliRetVal(); __ToDo ; __ToDo = 0)

Apply the cleanup __attribute__ and you get a call to the __iRestore cleanup function when sreg_save goes out of scope, as follows:
for (uint8_t sreg_save = SREG, __ToDo = __iCliRetVal(); __ToDo ; __ToDo = 0)
  //for loop scope ends
SREG = *pointer_to_sreg_save;
__asm__ volatile ("" ::: "memory");

__iCliRetVal() disables interrupts (via cli()) and returns 1, so you get:
for (uint8_t sreg_save = SREG, __ToDo = 1; __ToDo ; __ToDo = 0)
  //for loop scope ends
SREG = *pointer_to_sreg_save;
__asm__ volatile ("" ::: "memory");

Now, the comma before __ToDo means that __ToDo is also a variable of type uint8_t, just like sreg_save. It is initialized to 1, then tested to see if it is 1, then set to 0 after the first iteration. The for loop ends now that __ToDo is false. The for loop runs a single time. It was simply used in the macro as a very clever way to force variable scoping, so that sreg_save would go out of scope after the end of the for loop, so that the cleanup function could restore interrupts to their previous state.

So, combining it all together, this code:
ATOMIC_BLOCK(ATOMIC_RESTORESTATE)
{
  //do atomic stuff here
}

turns into:

for (uint8_t sreg_save = SREG, __ToDo = 1; __ToDo; __ToDo = 0)
{
  //do atomic stuff here
SREG = sreg_save;

Therefore, the two techniques are identical, but I'd argue that using the ATOMIC_BLOCK macro is more readable...but I haven't seen it used at all yet in the Arduino core code either.



Thank you!

Sincerely, 

Gabriel Staples


On Fri, Sep 25, 2015 at 9:49 AM, ro...@mail.com <ro...@mail.com> wrote:

--

Gabriel Staples

unread,
Sep 25, 2015, 1:44:17 PM9/25/15
to devel...@arduino.cc
Minor corrections:

" and a pointer to the variable is automatically passed as a function parameter to the cleanup function."

And on the final result, I need to ensure it's clear:

This code:
ATOMIC_BLOCK(ATOMIC_RESTORESTATE)
{
  //do atomic stuff here
}

turns into:

for (uint8_t sreg_save = SREG, __ToDo = 1; __ToDo; __ToDo = 0)
{
  cli();
  //do atomic stuff here
SREG = sreg_save;

I forgot the highlighted portion above. It is done in the calling of __iCliRetVal(), which also sets __ToDo to 1.

And we can go one more step. The for loop is just a single iteration, so now we have essentially:

uint8_t sreg_save = SREG;
cli();
{
  //do atomic stuff here
}
SREG = sreg_save;

...which brings us full circle. 

Furthermore, I believe this is true: so long as you are using any compiler optimization other than -O0 (optimization off), there is no overhead whatsoever for using these ATOMIC_BLOCK macros.

Tom Igoe

unread,
Sep 25, 2015, 1:48:13 PM9/25/15
to Arduino Developers

What's the memory footprint difference between the two, if the atomic block function includes an additional library?

sent on the go. please excuse brevity and mistaken auto corrections

Álvaro Lopes

unread,
Sep 25, 2015, 1:51:46 PM9/25/15
to devel...@arduino.cc, Gabriel Staples
On 25/09/15 18:44, Gabriel Staples wrote:
> turns into:
>
> for (uint8_t sreg_save = SREG, __ToDo = 1; __ToDo; __ToDo = 0)
> {
> cli();
> //do atomic stuff here
> }
> SREG = sreg_save;

Does not look correct: sreg_save is local to "for" loop, and not accessible outside. Also, "__ToDo" looks like it has an external scope.

Why not something like:

do {
uint8_t sreg_save = SREG;
cli();
do {
// Do atomic stuff here
} while (0);
SREG = sreg_save;
} while (0);


Alvie

Brian Cook

unread,
Sep 25, 2015, 2:03:39 PM9/25/15
to devel...@arduino.cc
Gabriel,

To answer just the part about the atomic block:

ATOMIC_BLOCK(ATOMIC_RESTORESTATE)
{
  // do the atomic //
  // stuff in here //
}

Is exactly identical to:

uint8_t oldSREG = SREG;
cli();
//do the atomic stuff in here
SREG = oldSREG;

Incorrect.  Put a return inside.  The first works correctly.  The second does not.

In other words, ATOMIC_BLOCK guarantees no surprises.

- Brian

Brian Cook

unread,
Sep 25, 2015, 2:04:47 PM9/25/15
to devel...@arduino.cc

The optimizer does an excellent job of ensuring there is no difference.

Paul Stoffregen

unread,
Sep 25, 2015, 2:10:30 PM9/25/15
to devel...@arduino.cc
On 09/25/2015 10:48 AM, Tom Igoe wrote:

What's the memory footprint difference between the two, if the atomic block function includes an additional library?


ATOMIC_BLOCK with ATOMIC_RESTORESTATE and saving of SREG with cli() will compile to exactly the same code, if used in a similar structure where execution "falls through" to the end.

ATOMIC_BLOCK allows returning or leaving the atomic area in other ways than executing to the end.  Of course, that can also be done by explicitly manipulating the registers, but you have to be careful to add the proper SREG restore code at all exit points.  People who prefer ATOMIC_BLOCK usually feel safer delegating this responsibility to the compiler.

Even though you asked only for the "difference" (which is zero), both techniques have a "memory footprint" of one 8 bit local variable.  Typically the compiler will allocate that variable to a register.  But if the code inside the atomic section is large and complex, the compiler may temporarily push that variable onto the stack.

It should be mentioned both of these are AVR specific techniques.

Tom Igoe

unread,
Sep 25, 2015, 3:28:36 PM9/25/15
to Arduino Developers

Thanks Paul. The avr-specificity with be problematic in a higher level api designed to be portable across cores. Is that a problem in this case?

sent on the go. please excuse brevity and mistaken auto corrections

Mikael Patel

unread,
Sep 25, 2015, 4:59:51 PM9/25/15
to Developers
In Cosa a lot of effort is put into readability and maintainability. I have defined the following inline functions and macro to give a modern flavor atomic blocks.

inline uint8_t lock()
{
  uint8_t key = SREG;
  __asm__ __volatile__("cli" ::: "memory");
  return (key);
}

inline void unlock(uint8_t key)
{
  SREG = key;
  __asm__ __volatile__("" ::: "memory");
}

#define synchronized  for (uint8_t __key = lock(), i = 1; i != 0; i--, unlock(__key))

Simply prefix a block or a statement with "synchronized" and it is atomic. This is a nice clean and fast approach which hides all the details. I have also added an OOP class version where the constructor/destructor does the lock/unlocking.

class Lock {
public:
  Lock()
  {
    m_key = SREG;
    __asm__ __volatile__("cli" ::: "memory");
  }
  ~Lock()
  {
    SREG = m_key;
    __asm__ __volatile__("" ::: "memory");
  }

private:
  uint8_t m_key;
};


Define a Lock variable in a block and the code after the variable in the block is atomic. This handles both goto's and return statements something that the macro above does not without some additional help. See the linked files for more details. There are plenty of examples of the resulting style in the Cosa source code.

Cheers!

Matthijs Kooijman

unread,
Sep 26, 2015, 5:34:53 AM9/26/15
to devel...@arduino.cc
Hey folks, this discussion about using ATOMIC_BLOCK in AVR-specific code
touches upon another related subject, to which Mikael gives a possible
solution. Since this issue is really separate from (not) using
ATOMIC_BLOCK in AVR code, I've changed the subjct, so please try to
reply to this new subject only for comments to this.



Arduino has a portable interrupts() and noInterrupts() function, to
enable and disable interrupts. However, there is no portable "What is
the current interrupt state" function, so it is not currently possible
to disable interrupts, do something, and then *restore the previous
state*. You can only *enable* the interrupts afterwards, which obviously
causes all kinds of problems if interrupts were previously disabled.
This is mostly an issue for libraries and I guess libraries are not
often called with interrupts disabled, but I'm still surprised this
issue has not come up before.

So, I believe Arduino should have some way to get the current interrupt
state, and to conditionally enable interrupts based on the previous
state.

Mikael shows that Cosa has a lock() / unlock() function for this. Though
I don't really like the name "lock" (which is usually used for other
concepts in concurrent programming), the way these functions work seems
useful, so I'll just stick to these names in this discussion
(suggestions for better names welcome!)

Our current `noInterrupts()` function can be made to work like `lock()`
by letting it return the current interrupt state. A downside is that
when the return value is not used, this will add a bit of overhead
(since, at least on AVR, the read of SREG is volatile and cannot be
optimized away).

Making `interrupts()` work like `unlock()` is more tricky, since that
requires adding a parameter, breaking existing code. Converting it to an
inline function with an optional parameter could work, but that breaks C
code (only allowing C++). Adding a new function/macro that takes an
argument to exist alongside `interrupts()` is probably better, then.

The return type of the `lock()` function proposed by Mikael is
`uint8_t`, but is probably good to add a typedef for that, like
`interrupt_state_t`, in case other architectures need a different type
of this flag. Alternatively you could use a bool, but (on AVR) that
requires a bit more code to save and restore the flag and there could be
architectures that need > 1 bit of data to store.

> #define synchronized for (uint8_t __key = lock(), i = 1; i != 0; i--,
> unlock(__key))
Having a portable block like this really seems worthwile to me. It's
basically just a portable version of ATOMIC_BLOCK.

> class Lock {
A Lock variable like this, using a destructor to restore interrupt state
is also a nice thing to have. One might wonder if having three different
ways to achieve the same is a bit too much, though each of these three
will be useful in different contexts.

> Define a Lock variable in a block and the code after the variable in the
> block is atomic. This handles both goto's and return statements something
> that the macro above does not without some additional help. See the linked
> files for more details. There are plenty of examples of the resulting style
> in the Cosa source code.
FWIW, I think the ATOMIC_BLOCK macro _does_ handle gotos and returns,
becauses it uses the cleanup attribute (basically a gcc-extension to add
a "destructor" to an arbitrary variable), so you might want to consider
this one as well.

Gr.

Matthijs
signature.asc

Matthijs Kooijman

unread,
Sep 26, 2015, 5:39:06 AM9/26/15
to devel...@arduino.cc
Hey Tom,

> Thanks Paul. The avr-specificity with be problematic in a higher level api
> designed to be portable across cores. Is that a problem in this case?
Nope, the question was about AVR-specific HardwareSerial code, so that
should not be a problem.

Gr.

Matthijs
signature.asc

Paul Stoffregen

unread,
Sep 26, 2015, 6:46:51 AM9/26/15
to devel...@arduino.cc
AVR and ARM architecture have a subtle but important difference, which
should be kept in mind when designing and documenting interrupt-related
APIs.

AVR has a single global interrupt enable bit. When the AVR processor
enters an interrupt, this bit is automatically cleared. Normally, that
bit must remain cleared within the ISR. The hardware automatically sets
it when returning to the main program. Some programs do re-enable
interrupts within their ISR, but doing so is a very dangerous game.
Great care must be taken to prevent boundless recursion into ISRs, which
could quickly cause the stack to overwrite the AVR's tiny memory. While
tricks can be played in software, AVR's is fundamentally designed to
service one interrupt at a time.

ARM** has a similar bit which masks all interrupts. However, ARM uses a
totally different and separate mechanism to determine which interrupts
can be serviced while other interrupts are running. ARM does *not*
change its global interrupt mask bit when entering or exiting an
interrupt. Instead, internal state (not easily visible to the
programmer) about interrupt priority level is updated. ARM is
fundamentally designed to allow servicing many prioritized nested
interrupts.

If an API is created which exposes the enable/disable or masked/unmasked
state for all interrupts, this subtle hardware architecture difference
will be exposed to users, especially if they work with
attachInterrupt(). Even the current API is quite dangerous, where a
programmer may enable interrupts within an ISR, then disable them again
near the end, unknowingly tying their code tightly to AVR. When run on
ARM, such code will leave all interrupts locked out because ARM doesn't
manipulate the its global interrupt mask as part of the hardware entry &
exit of ISRs.

I believe a "lock" style API is much safer and more portable than
exposing the enable/disable or masked/unmasked state. The 2 functions
we have today already expose this. Hopefully a safer, more hardware
neutral API can be designed and the current API eventually deprecated.



** Technically, this applies to ARM's microcontroller architectures with
NVIC interrupt controller. I glossed over several minor details, in the
interest of keeping this message short & (hopefully) clear.



Andrew Kroll

unread,
Sep 26, 2015, 6:55:14 AM9/26/15
to devel...@arduino.cc

In my libraries I define an ATOMIC_BLOCK. AFAIK this readability hack originates from Atmel for the AVR specifically. Personally, I like the readability of it.

Paul Stoffregen

unread,
Sep 26, 2015, 7:02:15 AM9/26/15
to devel...@arduino.cc
On 09/26/2015 03:55 AM, Andrew Kroll wrote:

In my libraries I define an ATOMIC_BLOCK. AFAIK this readability hack originates from Atmel for the AVR specifically. Personally, I like the readability of it.


I'm pretty sure Dean Camera, acting as an individual long before his employment at Atmel, deserves the credit for ATOMIC_BLOCK in avr-libc.  Other than hosting the AVR Freaks forum, my understanding is Atmel wasn't involved in developing this software.


Andrew Kroll

unread,
Sep 26, 2015, 7:49:33 AM9/26/15
to devel...@arduino.cc

To be clear, I define it for ARM. If Dean did it, that's cool.

To use it in C, compile the C code with the C++ compiler, and tag the entire file with extern C. that way you get to use the macro, while retaining C linkage.

Mikael Patel

unread,
Sep 26, 2015, 8:30:46 AM9/26/15
to devel...@arduino.cc
Matthijs, thanks for suggestions. Learn something new every day! I must confess that I was not really taking portability into account. Instead focusing of how to writing code that is easy to understand, has more "natural" flow of expression, and syntactical sugar that helped reduce errors/forgetting something.

I would do just as you suggest with a typedef for the processor state. The uint8_t is very much an AVR thing.

Naming is really difficult. You could help explain the classic P() and V() for signal() and wait() on semaphore. I think they are the first letters in the Dutch words for signal() and wait() :)

The gcc-attribute __cleanup__ was something I had missed. This will help clean up some of the Cosa code though it has consequences on the synchronized macro (i.e. unlock is removed and replaced by the cleanup function).

Last but not least, Cosa is very much a C++ project. I understand that you have much more to consider with both legacy and supporting both C and C++.

Cheers! Mikael
Reply all
Reply to author
Forward
0 new messages