Use of volatile in structure definitions to force word accesses

Tim Wescott

unread,

Sep 23, 2015, 3:55:23 PM9/23/15

to

This question came up in the thread titled "Re: Ada, was: Re: Fundamental
C question about "if" statements".

I felt it deserved its own thread, because it's far off topic to what's
become a very interesting, but far off topic thread.

I, for one, would really like to see one of our resident compiler experts
answer the question at the bottom.

On Wed, 23 Sep 2015 19:36:57 +0000, Simon Clubley wrote:

> On 2015-09-23, Tim Wescott <seemyw...@myfooter.really> wrote:
>> On Wed, 23 Sep 2015 18:36:43 +0000, Simon Clubley wrote:
>>> Good luck doing that in a reliable way on ARM with gcc when you are
>>> accessing device registers.
>>>
>>> If you access a bitfield in the lower 8 bits of a register, the gcc
>>> optimiser can turn that into 8-bit ldrb/strb opcodes instead of 32-bit
>>> ldr/str opcodes and thereby causing the program to break if you need
>>> to access your registers in units of greater than 8 bits.
>>
>> That's a very interesting comment, because that's basically How It Is
>> Done here, and I've had several years worth of success on both ST and
>> TI/
>> Luminary devices. I'm not sure if that's because of the way I define
>> my bitfields (absolutely everything is defined as a struct of
>> bitfields, then a union of that struct and a 32-bit integer), or if
>> I've been lucky,
>> or what.
>>
>>
> Do your MCUs _require_ register access in units of 16/32 bits ?
>
> If so, you may be doing something which works for now, but may not work
> in the future.
>
> It might be worth having a quick look with objdump just to be sure.
>
> I have personally had this happen to me and it broke my code due to the
> register access size constraints no longer been true.
>
>> So it looks like:
>>
>> struct SPeriphRegisterBits {
>> unsigned int bit0 : 1; unsigned int bits1_10 : 10; unsigned
>> int bit11 : 1; unsigned int : 16; unsigned int
>> bits28_31 : 4;
>> };
>>
>> union UPerihRegister {
>> struct SPeriphRegisterBits bits;
>> unsigned int all;
>> }
>>
>> All of the various register definitions then get collected into a
>> structure for the peripheral, which gets declared as "extern const",
>> and defined in the linker command file.
>>
>>
> This is a bit more involved than what I was doing. I wonder if this is
> what is saving you for now.

Yes, my MCU's require access in units of 16 or 32 bits, at least in
places -- at least, if the ST data book is to be believed that's the case.

I would REALLY like one of the compiler guys to weigh in on this, because
I'm wondering if collecting all those unions into a struct and calling it
volatile isn't forcing a 32-bit access. It would be nice to know if I'm
just bamboozling the optimizer into doing what I want, or if I'm actually
_telling_ it to do the right thing.

So, the final definitions (that I left out) is

struct SPeripheralRegs
{
UPeriphRegister1 REG1;
UPeriphRegister2 REG2;
-- etc --
};

extern volatile SPeripheralRegs THIS_PERIPHERAL;

Then I access things by

THIS_PERIPHERAL.REGISTER.bits.whatever = something;

Is (oh compiler guys!!!) this directing the compiler to act as if I wrote:

UPeriphRegister A;

A.all = static_cast<volatile unsigned int>THIS_PERIPHERAL.REGISTER.all;

A.bits.whatever = something;

static_cast<volatile unsigned int>THIS_PERIPHERAL.REGISTER.all = A.all;

I could see how it might do this by the intent of the language, but I
could see that I may just be bamboozling the optimizer, rather than
intelligently directing it. So -- I'm curious, in a more-than-idly sort
of way.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Simon Clubley

unread,

Sep 23, 2015, 5:56:34 PM9/23/15

to

On 2015-09-23, Tim Wescott <seemyw...@myfooter.really> wrote:
>
> I would REALLY like one of the compiler guys to weigh in on this, because
> I'm wondering if collecting all those unions into a struct and calling it
> volatile isn't forcing a 32-bit access. It would be nice to know if I'm
> just bamboozling the optimizer into doing what I want, or if I'm actually
> _telling_ it to do the right thing.
>

I've duplicated this using a variant of your structs and caused gcc to
generate invalid code for accessing registers.

I wasn't able to duplicate the range of cases I thought I could, but I
found this in one of my header files I created some years ago:

*
* Using bitfield structs versus bitmasks to access register bitfields:
* My original plan was to use bitfield structs to access the register
* bitfields. However, gcc is generating code to access the bitfields
* at a byte, and not longword, level. As the MMIO fields are longword
* access only, this results in invalid code been generated.
*
* Marking the enclosing 32-bit struct, instead of the underlying bitfields,
* as volatile results in code been correctly generated for most cases.
* However, when the field to be modified is a byte in the least significant
* 8 bits of the register, then gcc still generates a byte level store
* even though it performed a longword level read. There may also be other
* cases which I have not yet come across.
*
* As a result, I have switched to using bitmasks to access bitfields in a
* register.
*

so it's clear that _where_ you place the volatile attribute is critical
to how the generated code looks (and how broken it is). A longword is
32 bits BTW; that's my DEC background showing. :-)

Here's a self contained example:

============================================================================
/*
* Compiled with:
*
* arm-none-eabi-gcc -g -c -O2 -mcpu=arm926ej-s -o tim.o tim.c
*/

struct SPeriphRegisterBits
{
// unsigned int bit0 : 1;
// unsigned int bits1_10 : 10;
unsigned int bit0 : 8;
unsigned int bits1_10 : 3;

unsigned int bit11 : 1;
unsigned int : 16;
unsigned int bits28_31 : 4;
};

union UPerihRegister
{
struct SPeriphRegisterBits bits;
unsigned int all;

};

struct SPeripheralRegs
{
union UPerihRegister REG1;
union UPerihRegister REG2;
};

//extern
volatile struct SPeripheralRegs THIS_PERIPHERAL;

//
//Then I access things by
//
//THIS_PERIPHERAL.REGISTER.bits.whatever = something;

volatile union UPerihRegister *ur1 = (union UPerihRegister *) 0x80001000;

volatile struct SPeriphRegisterBits *sr1 = (struct SPeriphRegisterBits *) 0x80001000;

void tim_test(void)
{
unsigned int val;

// val = ur1->bits.bit0;
// ur1->bits.bit0 = 1;

val = sr1->bit0;

sr1->bit0 = sr1->bit0 + 3;

THIS_PERIPHERAL.REG1.bits.bit0 = 3;
return;
}
============================================================================

and here's the objdump output:

============================================================================
tim.o: file format elf32-littlearm

Disassembly of section .text:

00000000 <tim_test>:
unsigned int val;

// val = ur1->bits.bit0;
// ur1->bits.bit0 = 1;

val = sr1->bit0;
0: e59f3024 ldr r3, [pc, #36] ; 2c <tim_test+0x2c>
4: e5933000 ldr r3, [r3]
8: e5d32000 ldrb r2, [r3]

sr1->bit0 = sr1->bit0 + 3;
c: e5d32000 ldrb r2, [r3]
10: e2822003 add r2, r2, #3
14: e20220ff and r2, r2, #255 ; 0xff
18: e5c32000 strb r2, [r3]

THIS_PERIPHERAL.REG1.bits.bit0 = 3;
1c: e59f300c ldr r3, [pc, #12] ; 30 <tim_test+0x30>
20: e3a02003 mov r2, #3
24: e5c32000 strb r2, [r3]
return;
}
28: e12fff1e bx lr
...
============================================================================

Note the use of ldrb and strb instead of ldr and str in the generated code.

So it's confirmed that having a 8-bit bitfield in the LSBs of the struct
causes the generated code to be broken, at least in the version of gcc
I am using.

My notes from a few years ago reflect my memory that I was able to
generate broken code for other cases as well, depending on exactly
what was marked as volatile (which admittedly was a factor I had
forgotten).

You don't happen to have any structs with the lower 8 bits being a
single bitfield by any chance do you ? It would be interesting to
see what objdump says in your case.

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world

Simon Clubley

unread,

Sep 23, 2015, 7:14:13 PM9/23/15

to

On 2015-09-23, Simon Clubley <clubley@remove_me.eisner.decus.org-Earth.UFP> wrote:
>
> I've duplicated this using a variant of your structs and caused gcc to
> generate invalid code for accessing registers.
>

I've now got gcc to generate bad code (at least as far as accessing
registers are concerned) whenever the bitfield is an 8-bit bitfield
which lies on a byte boundary.

A variant of the first example:

============================================================================
/*
* Compiled with:
*
* arm-none-eabi-gcc -g -c -O2 -mcpu=arm926ej-s -o tim.o tim.c
*/

struct SPeriphRegisterBits
{
// unsigned int bit0 : 1;
// unsigned int bits1_10 : 10;

unsigned int : 8;
unsigned int : 8;

unsigned int bits1_10 : 3;
unsigned int bit11 : 1;

unsigned int bits28_31 : 4;
unsigned int bit0 : 8;

};

union UPerihRegister
{
struct SPeriphRegisterBits bits;
unsigned int all;
};

struct SPeripheralRegs
{
union UPerihRegister REG1;
union UPerihRegister REG2;
};

//extern
volatile struct SPeripheralRegs THIS_PERIPHERAL;

//
//Then I access things by
//
//THIS_PERIPHERAL.REGISTER.bits.whatever = something;

volatile union UPerihRegister *ur1 = (union UPerihRegister *) 0x80001000;

volatile struct SPeriphRegisterBits *sr1 = (struct SPeriphRegisterBits *) 0x80001000;

void tim_test(void)
{
unsigned int val;

// val = ur1->bits.bit0;
// ur1->bits.bit0 = 1;

val = sr1->bit0;

sr1->bit0 = sr1->bit0 + 3;

THIS_PERIPHERAL.REG1.bits.bit0 = 3;
return;
}
============================================================================

and the code it generates:

============================================================================
tim.o: file format elf32-littlearm

Disassembly of section .text:

00000000 <tim_test>:
unsigned int val;

// val = ur1->bits.bit0;
// ur1->bits.bit0 = 1;

val = sr1->bit0;
0: e59f3024 ldr r3, [pc, #36] ; 2c <tim_test+0x2c>
4: e5933000 ldr r3, [r3]

8: e5d32003 ldrb r2, [r3, #3]

sr1->bit0 = sr1->bit0 + 3;

c: e5d32003 ldrb r2, [r3, #3]

10: e2822003 add r2, r2, #3
14: e20220ff and r2, r2, #255 ; 0xff

18: e5c32003 strb r2, [r3, #3]

THIS_PERIPHERAL.REG1.bits.bit0 = 3;
1c: e59f300c ldr r3, [pc, #12] ; 30 <tim_test+0x30>
20: e3a02003 mov r2, #3

24: e5c32003 strb r2, [r3, #3]

return;
}
28: e12fff1e bx lr
...
============================================================================

Do any of your structs have any 8-bit bitfields which are on a byte
boundary ? It would be interesting to see what gcc is doing in your
case.

The cross compiler I am using is gcc 4.5.{something} although, based
on the other problem reports I have come across, I suspect it's still
a problem in at least some newer versions as well.

Les Cargill

unread,

Sep 23, 2015, 7:38:59 PM9/23/15

to

Tim Wescott wrote:
<snip>

> I could see how it might do this by the intent of the language, but I
> could see that I may just be bamboozling the optimizer, rather than
> intelligently directing it. So -- I'm curious, in a more-than-idly sort
> of way.
>

objdump is faster than we are at this :)

--
Les Cargill

David Brown

unread,

Sep 24, 2015, 9:22:53 AM9/24/15

to

On my testing, this second read is ldr. (Note that the first "ldr r3,
[r3]" is just because the sr1 pointer is not made const or static.)

> sr1->bit0 = sr1->bit0 + 3;
> c: e5d32000 ldrb r2, [r3]
> 10: e2822003 add r2, r2, #3
> 14: e20220ff and r2, r2, #255 ; 0xff
> 18: e5c32000 strb r2, [r3]
>

On my testing, both accesses are 32-bit. This leads to a little more
code between the load and the store, to get the masking right.

> THIS_PERIPHERAL.REG1.bits.bit0 = 3;
> 1c: e59f300c ldr r3, [pc, #12] ; 30 <tim_test+0x30>
> 20: e3a02003 mov r2, #3
> 24: e5c32000 strb r2, [r3]

Same again - it's all 32-bit accesses, but the code is more complex for
the masking.

> return;
> }
> 28: e12fff1e bx lr
> ...
> ============================================================================
>
> Note the use of ldrb and strb instead of ldr and str in the generated code.
>
> So it's confirmed that having a 8-bit bitfield in the LSBs of the struct
> causes the generated code to be broken, at least in the version of gcc
> I am using.

All we have confirmed here is that you have a gcc version with the old
volatile bitfield bug. (Note that arguably it was not a bug, since the
C standards don't say anything about what size accesses to a bitfield
should use. gcc used the smallest access because it gave shorter and
faster code. But even if it is not technically a bug, it is still a
surprise - and leads to problems on architectures that expect accesses
to be 32-bit in such circumstances. Hence the option was created, and
made standard on targets such as ARM that need it.)

What version of gcc are you using here?

David Brown

unread,

Sep 24, 2015, 9:29:04 AM9/24/15

to

On 24/09/15 01:12, Simon Clubley wrote:
>
> The cross compiler I am using is gcc 4.5.{something} although, based
> on the other problem reports I have come across, I suspect it's still
> a problem in at least some newer versions as well.
>

That explains it - the issue was fixed in 4.6.

There are a few questionable issues in later versions of the compiler,
in regard to particularly complex bitfield structures. The trouble here
is that volatile accesses are not well defined in C - much of it is left
as "implementation defined" (meaning the compiler can choose, but it
should be consistent and preferably documented) or even "unspecified"
(meaning it has to follow certain behaviour patterns, but the compiler
can choose which to use - and it can choose arbitrarily from case to case).

As with all uses of volatile, if you try to make the code unreasonably
complicated, you run the risk of code that does not do as you hoped -
usually because you don't understand it yourself, but sometimes because
of imprecise standards definitions and sometimes because of compiler bugs.

But as long as you stick to rational usage of volatile bitfields, and
gcc 4.6 or later, you should not see any problems.

Anders....@kapsi.spam.stop.fi.invalid

unread,

Sep 24, 2015, 9:30:54 AM9/24/15

to

Tim Wescott <seemyw...@myfooter.really> wrote:

> I would REALLY like one of the compiler guys to weigh in on this, because
> I'm wondering if collecting all those unions into a struct and calling it
> volatile isn't forcing a 32-bit access.

I'm not a compiler guy, but having looked at bitfields a short time ago,
my understanding is that the language standard always allows the
compiler to access bitfields in smaller units if it feels like it. GCC
has a '-fstrict-volatile-bitfields' flag for disabling this
optimization.

-a

David Brown

unread,

Sep 24, 2015, 9:41:24 AM9/24/15

to

On 23/09/15 21:55, Tim Wescott wrote:
> This question came up in the thread titled "Re: Ada, was: Re: Fundamental
> C question about "if" statements".
>
> I felt it deserved its own thread, because it's far off topic to what's
> become a very interesting, but far off topic thread.
>

Maybe I should have read this post before answering in the other thread...

> I, for one, would really like to see one of our resident compiler experts
> answer the question at the bottom.

Let me try (see below).

It is the case for many MCU's, including most ARM devices.

>
> I would REALLY like one of the compiler guys to weigh in on this, because
> I'm wondering if collecting all those unions into a struct and calling it
> volatile isn't forcing a 32-bit access. It would be nice to know if I'm
> just bamboozling the optimizer into doing what I want, or if I'm actually
> _telling_ it to do the right thing.

As long as you are using gcc 4.6 (or any other compiler that implements
volatile bitfield accesses with exactly the field size given in the
bitfield definition), you are fine. C does not specify what size such
accesses should have, but leaves it up to the implementation. So gcc
4.5 and earlier were technically correct to the C standards - the
implementation picked a size that gave smaller and faster code, but that
unfortunately did not work on many ARM microcontrollers.

>
> So, the final definitions (that I left out) is
>
> struct SPeripheralRegs
> {
> UPeriphRegister1 REG1;
> UPeriphRegister2 REG2;
> -- etc --
> };
>
> extern volatile SPeripheralRegs THIS_PERIPHERAL;
>
> Then I access things by
>
> THIS_PERIPHERAL.REGISTER.bits.whatever = something;
>
> Is (oh compiler guys!!!) this directing the compiler to act as if I wrote:
>
> UPeriphRegister A;
>
> A.all = static_cast<volatile unsigned int>THIS_PERIPHERAL.REGISTER.all;
>
> A.bits.whatever = something;
>
> static_cast<volatile unsigned int>THIS_PERIPHERAL.REGISTER.all = A.all;
>

That's pretty much what happens, yes. With each RMW set done
independently for each bitfield access.

> I could see how it might do this by the intent of the language, but I
> could see that I may just be bamboozling the optimizer, rather than
> intelligently directing it. So -- I'm curious, in a more-than-idly sort
> of way.
>

It is implementation dependent (defined by the compiler), but gcc 4.6
onwards defines it to work in the way you expect here.

I've pasted some bits from my replies on the other thread here, in case
people prefer to reply in this thread. But I will continue to read and
comment on both threads. Whether that will be helpful or confusing
remains to be seen...

On 23/09/15 21:19, Tim Wescott wrote:
>
> So it looks like:
>
> struct SPeriphRegisterBits
> {
> unsigned int bit0 : 1;
> unsigned int bits1_10 : 10;
> unsigned int bit11 : 1;
> unsigned int : 16;
> unsigned int bits28_31 : 4;
> };
>
> union UPerihRegister
> {
> struct SPeriphRegisterBits bits;
> unsigned int all;
> }

A few comments here.

Don't use "unsigned int" - use uint32_t. Make it all absolutely clear
and fully specified.

You don't need to do this in two parts. Put it altogether, using an
anonymous struct (this is a very common extension to C99, which has
existed in gcc since close to the dawn of time, and it is now part of C11).

I prefer to make typedefs of all my types, so that I don't need union or
struct tags later. But that is a matter of style. And I strongly
dislike systems Hungarian style, such as prefixing a type with U just
because it happens to be a union (especially if you haven't used
typedef, so that you have to have the "union" tag too).

Make your padding bits explicit with a name. The C++ memory model says
that unnamed fields may not be changed by any write operations (C is
more lenient), which can cause trouble for fields. It's better to make
all your padding explicit.

I also like to add a static assertion to check that there are no
mistakes when defining structures like these. Of preference I also use
"-Wpadded" in gcc to warn of any unexpected padding - but that does not
play well with most manufacturer-supplied headers.

typedef union {
struct {
uint32_t bit0 : 1;
uint32_t bits1_10 : 10;
uint32_t bit11 : 1;
uint32_t padding : 16;
uint32_t bits28_31 : 4;
};
uint32_t all;
} periphRegister_t;

static_assert(sizeof(periphRegister_t) == 4,
"Check bitfield definition for PeriphRegister");

Having said all that, your definition will also work correctly (assuming
you've got the volatile there somewhere, and are using gcc 4.6 or above
which fixes the volatile bitfield problem).

>
> All of the various register definitions then get collected into a
> structure for the peripheral, which gets declared as "extern const", and
> defined in the linker command file.

I presume you mean "extern volatile const", and that's only for the
read-only registers. It's common to define registers like this:

#define periphRegister (*((volatile periphRegister_t*) 0x12345678))

(Registers can, of course, be collected in larger structures first.)

The advantage of that is that everything is defined in a C header file -
you don't need to mess around with linker files too. And since the
compiler knows exactly what addresses are used, it can generate more
efficient code than if they have to be sorted out at link time.

If you write:

void test1(void) {
periphRegister.bit0 = 1;
periphRegister.bit11 = 0;
}

then the compiler will (correctly) generate two independent RMW
sequences, optimising only the calculations of the address to use.

If you want both assignments to be done in a single RMW operation, you
must do so explicitly - that's what the "all" field is for.

void test2(void) {
periphRegister_t cache;
cache.all = periphRegister.all;
cache.bit0 = 1;
cache.bit11 = 0;
periphRegister.all = cache.all;
}

John Devereux

unread,

Sep 24, 2015, 1:11:18 PM9/24/15

to

The *C language* standard may allow it, but on a particular machine
architecture there can be an ABI that specifies these things. For
example the ARM eabi. The OP was using the bitfield feature for
accessing hardware registers on microcontrollers. So the architecture
will be pretty much fixed by definition and it should be safe to use
(compilers bugs aside).

--

John Devereux

Tim Wescott

unread,

Sep 24, 2015, 3:09:04 PM9/24/15

to

On Thu, 24 Sep 2015 15:28:59 +0200, David Brown wrote:

> On 24/09/15 01:12, Simon Clubley wrote:
>>
>> The cross compiler I am using is gcc 4.5.{something} although, based on
>> the other problem reports I have come across, I suspect it's still a
>> problem in at least some newer versions as well.
>>
>>
> That explains it - the issue was fixed in 4.6.
>
> There are a few questionable issues in later versions of the compiler,
> in regard to particularly complex bitfield structures. The trouble here
> is that volatile accesses are not well defined in C - much of it is left
> as "implementation defined" (meaning the compiler can choose, but it
> should be consistent and preferably documented) or even "unspecified"
> (meaning it has to follow certain behaviour patterns, but the compiler
> can choose which to use - and it can choose arbitrarily from case to
> case).
>
> As with all uses of volatile, if you try to make the code unreasonably
> complicated, you run the risk of code that does not do as you hoped -
> usually because you don't understand it yourself, but sometimes because
> of imprecise standards definitions and sometimes because of compiler
> bugs.

Or, as I alluded to in my original post, you run the risk of making code
that's complicated to confuse the optimizer enough that it throws up its
hands and does what you want at the moment, without guaranteeing that
future optimizers wouldn't be able to unwind what you did and break it by
making it "right".

Which is why I was asking about the intent of the volatile keyword WRT
unions.

Anders....@kapsi.spam.stop.fi.invalid

unread,

Sep 24, 2015, 3:32:23 PM9/24/15

to

John Devereux <jo...@devereux.me.uk> wrote:

> The *C language* standard may allow it, but on a particular machine
> architecture there can be an ABI that specifies these things.

Yep, but why learn bad habits that will break your code when you change
platforms in the next project?

-a

Grant Edwards

unread,

Sep 24, 2015, 3:47:06 PM9/24/15

to

Even if a particular C compiler's manual says that adding "volatile"
to a struct member will do something-or-other to the access width (and
testing with today's version compiler on today's host with today's
options validates that claim), I've found over the decades that
relying on features like that ends up causing far more trouble than it
avoids.

Sure, it seems cool now. But, a few years from now when it stops
working and it takes somebody two months to find the cause of the
mysterious data corruptions -- you don't want to bump into that
somebody in a dark alley.

--
Grant Edwards grant.b.edwards Yow! All this time I've
at been VIEWING a RUSSIAN
gmail.com MIDGET SODOMIZE a HOUSECAT!

Simon Clubley

unread,

Sep 24, 2015, 4:34:04 PM9/24/15

to

On 2015-09-24, David Brown <david...@hesbynett.no> wrote:
> On 24/09/15 01:12, Simon Clubley wrote:
>>
>> The cross compiler I am using is gcc 4.5.{something} although, based
>> on the other problem reports I have come across, I suspect it's still
>> a problem in at least some newer versions as well.
>>
>
> That explains it - the issue was fixed in 4.6.
>

Interesting.

One of the things which has clearly thrown me here is that I went and
looked at the comp.lang.ada thread from a few weeks ago before commenting
here about bitfields.

In that thread, the OP is using a reasonably recent version of GNAT
(the name for the Ada gcc front end) from ACT. The last time I checked,
ACT were using gcc 4.7 for their own internal branch of the gcc tree
so I thought the problem was present upto at least gcc 4.7.

However, it's possible that the Ada front end is still exposing the
problem in a way that the C front end no longer is.

Simon Clubley

unread,

Sep 24, 2015, 4:44:59 PM9/24/15

to

On 2015-09-24, David Brown <david...@hesbynett.no> wrote:
>

> If you want both assignments to be done in a single RMW operation, you
> must do so explicitly - that's what the "all" field is for.
>
> void test2(void) {
> periphRegister_t cache;
> cache.all = periphRegister.all;
> cache.bit0 = 1;
> cache.bit11 = 0;
> periphRegister.all = cache.all;
> }
>

You have to do this in Ada as well at the moment.

There's currently an Ada Issue which talks about adding the ability
to the next version of Ada to list the bitfields to be updated in
a single assignment statement so that the Ada version of the above
would become a single RMW sequence without having to use a temporary
variable.

It's here if you are interested:

http://www.ada-auth.org/cgi-bin/cvsweb.cgi/ai12s/ai12-0127-1.txt?rev=1.1

Hans-Bernhard Bröker

unread,

Sep 24, 2015, 5:47:09 PM9/24/15

to

Am 23.09.2015 um 21:55 schrieb Tim Wescott:

> I would REALLY like one of the compiler guys to weigh in on this, because
> I'm wondering if collecting all those unions into a struct and calling it
> volatile isn't forcing a 32-bit access.

As far as the language definition is concerned, "volatile" is not really
supposed to have anything to do with that.

Volatile specifies that all accesses to an object have to actually
happen, in the order specified by the source. So no optimize-out of
reads or writes, and no shuffling of volatile accesses among each other,
among other restrictions.

But it says nothing at all about the width of accesses. In th
Standardese language, the reason is that it's "implementation-defined
what constitutes access to a volatile-qualified object". One of the
meanings of that rather opaque statement is that they left open whether
access to a single element of the compound also constitutes an "access"
to the whole thing. And only such accesses would be protected by the
specified effect of "volatile". So no, the outer "volatile" is not
really specified to have any effect on referral to the inner elements.

The only way I can see to somewhat portably control access width is to
do away with the all the structs and bitfields altogether, and actually
use bitwise operators on volatile-qualified object of the necessary
size. I.e. if you write

volatile uint32_t *pointer_to_register = some_address;
*pointer_to_register |= some_mask;

then that _does_ force the compiler to actually do a 32-bit access,
because the object you're accessing is, beyond reasonable doubt,
qualified "volatile", so the rules do apply here: this object has to be
read, and it has to be written.

Bit fields can not portably achieve the same effect, because access to
them does not necessarily constitute access to the containing compound
data structure.

David Brown

unread,

Sep 24, 2015, 6:02:47 PM9/24/15

to

The intent is, I think, pretty clear - a volatile access is the same
regardless of whether or not a union is involved. The issue with
volatile bitfield access was that the size of the access was not well
defined (the standards don't give any clues), and for many purposes the
use of differently sized accesses can give significantly more efficient
code that is perfectly correct on that platform.

There are, I think, three main issues with regard to getting volatile to
work as the programmer intended. If you understand these, you should be
fine.

1. Programmers often don't understand volatile. Some believe it makes
accesses atomic - so that on an 8-bit micro, they confuse making a
uint16_t volatile with making it safe for swapping data between contexts
(different threads or interrupt functions). And some believe that
volatile accesses enforce certain orders on non-volatile accesses, or
calculations.

2. The standards are unclear - "what constitutes a volatile access is
implementation dependent". If v is a volatile variable, and your target
supports arithmetic operation directly on memory addresses, then v++ may
do a read, modify, then write, or it may do a single add to the memory
address. If you ask to read a volatile variable of a certain size, then
the compiler may read it in multiple parts, or it may read additional
data beside the actual variable, or it may read only part of the
variable (if it only needs part of it). Unless your compiler makes
guarantees in this area (such as gcc's -fstrict-volatile-bitfields), you
don't know.

3. Compilers may have bugs. The typical case is when complex volatile
operations interact with complex optimisations, and the compiler does
code movement or simplification that is not allowed for volatiles.

A key way to minimise the risks of anything going on is to keep
statements and expressions involving volatiles relatively simple. Try
to stick to a single read or a single write of a volatile at a time.
Then the code is clear to a human reader, and clear to the compiler.
(Having the volatile item deep within nested structs and unions is not a
problem, nor does it matter if the volatile qualifier is on the
innermost item, the outermost structure, a cast wrapper on the outside,
or anything inbetween.) If vp is a volatile pointer to a volatile int,
and vi is a volatile int, with foo() being a function, then writing
"vp[vi = foo()]++" may be legal C, but it is asking for trouble.

David Brown

unread,

Sep 24, 2015, 6:04:48 PM9/24/15

to

On 24/09/15 22:32, Simon Clubley wrote:
> On 2015-09-24, David Brown <david...@hesbynett.no> wrote:
>> On 24/09/15 01:12, Simon Clubley wrote:
>>>
>>> The cross compiler I am using is gcc 4.5.{something} although, based
>>> on the other problem reports I have come across, I suspect it's still
>>> a problem in at least some newer versions as well.
>>>
>>
>> That explains it - the issue was fixed in 4.6.
>>
>
> Interesting.
>
> One of the things which has clearly thrown me here is that I went and
> looked at the comp.lang.ada thread from a few weeks ago before commenting
> here about bitfields.
>
> In that thread, the OP is using a reasonably recent version of GNAT
> (the name for the Ada gcc front end) from ACT. The last time I checked,
> ACT were using gcc 4.7 for their own internal branch of the gcc tree
> so I thought the problem was present upto at least gcc 4.7.
>
> However, it's possible that the Ada front end is still exposing the
> problem in a way that the C front end no longer is.
>
> Simon.
>

I am afraid I can't answer for GNAT - the tiny amount of Ada that I have
tried does not touch on this sort of thing.

One other point is that only some targets have
-fstrict-volatile-bitfield enabled by default (ARM, for example) - on
some targets, you need to enable it manually if you want that behaviour.

Simon Clubley

unread,

Sep 24, 2015, 6:53:32 PM9/24/15

to

Yes, the OP in that thread was using ARM. I'm assuming there's something
in the Ada front end which is causing this to still be an issue for
Ada code.

David Brown

unread,

Sep 24, 2015, 7:27:14 PM9/24/15

to

On 24/09/15 23:47, Hans-Bernhard Bröker wrote:
> Am 23.09.2015 um 21:55 schrieb Tim Wescott:
>
>> I would REALLY like one of the compiler guys to weigh in on this, because
>> I'm wondering if collecting all those unions into a struct and calling it
>> volatile isn't forcing a 32-bit access.
>
> As far as the language definition is concerned, "volatile" is not really
> supposed to have anything to do with that.
>
> Volatile specifies that all accesses to an object have to actually
> happen, in the order specified by the source. So no optimize-out of
> reads or writes, and no shuffling of volatile accesses among each other,
> among other restrictions.
>
> But it says nothing at all about the width of accesses. In th
> Standardese language, the reason is that it's "implementation-defined
> what constitutes access to a volatile-qualified object". One of the
> meanings of that rather opaque statement is that they left open whether
> access to a single element of the compound also constitutes an "access"
> to the whole thing. And only such accesses would be protected by the
> specified effect of "volatile". So no, the outer "volatile" is not
> really specified to have any effect on referral to the inner elements.
>

I am not sure of that final point in regard to the specifications - but
I am very sure of how it works in all compilers I have seen. A volatile
qualifier on a struct, union or array applies equally to all members of
that struct, union or array (the same applies to the const qualifier).
So there is no difference between:

struct { volatile uint32_t a; volatile uint32_t b; } s1;

and

volatile struct { uint32_t a; uint32_t b; } s2;

> The only way I can see to somewhat portably control access width is to
> do away with the all the structs and bitfields altogether, and actually
> use bitwise operators on volatile-qualified object of the necessary
> size. I.e. if you write
>
> volatile uint32_t *pointer_to_register = some_address;
> *pointer_to_register |= some_mask;
>
> then that _does_ force the compiler to actually do a 32-bit access,
> because the object you're accessing is, beyond reasonable doubt,
> qualified "volatile", so the rules do apply here: this object has to be
> read, and it has to be written.

No, that does not force the compiler to use 32-bit accesses. It forces
the compiler to access all parts of the 32-bit object, but the compiler
is free to divide it into 4 8-bit accesses if it wants, and it can order
those accesses as it likes. It is perfectly acceptable (to the
standards) for the compiler to do 4 byte-sized RMW operations - or 4
byte-sized reads, followed by the or, then 2 16-bit writes in the
opposite order.

Of course, a compiler is not going to do that (unless alignment issues
or the bit-size of the processor forces it to break up the accesses).
But it could - according to the standards.

You are right that such full-object manipulation is less likely to bump
into compiler bugs than volatile bitfield operation, but volatile
bitfield operation is considered reliable enough for most targets and
most modern mainstream compilers (such as gcc post 4.6) - that is why it
is a common arrangement for manufacturer-supplied header files.

>
> Bit fields can not portably achieve the same effect, because access to
> them does not necessarily constitute access to the containing compound
> data structure.

We are talking here about accessing hardware registers - it is
inherently non-portable. Even portability across compilers (for those
few that now use anything other than gcc for ARM) is unlikely.

DJ Delorie

unread,

Sep 24, 2015, 11:26:33 PM9/24/15

to

Tim Wescott <seemyw...@myfooter.really> writes:
> I, for one, would really like to see one of our resident compiler experts
> answer the question at the bottom.

Well, I'm the resident compiler guy who fixed it in gcc :-)

As others have noted, you need the version of gcc that supports
-fstrict-volatile-bitfields, *and* you need to have a well-defined
(i.e. unambigious) struct (no holes, consistent types, etc), *and* you
need to have a port that supports it (ARM does), *and* it has to be a
size/alignment that the hardware supports (duh), *and* the
-fstrict-volatile-bitfields options needs to be enabled (a later version
of gcc has a third setting that means "unless the ABI says otherwise"
but the result is the same *if* your struct is well-defined) (-fs-v-b
might (should?) be the default for ports that support it, I don't recall
if they argued to change that).

So a struct like this should be OK:

volatile struct foo {
int32_t a:17;
int32_t b:5;
int32_t c:10;
};

but this would not be:

volatile struct foo {
char a:8;
int32_t b:24;
};

But, the only portable reliable way to guarantee the right type of
access is to not use bitfields.

Tim Wescott

unread,

Sep 25, 2015, 1:03:23 PM9/25/15

to

Thanks DJ, for the work as well as the comments.

I'm willing to put up with the non-portability caused by bitfields,
because at this point I don't see moving away from gcc for any reason
(well, unless some client pays me to, and if so they can pay for all the
rest, too).

David Brown

unread,

Sep 26, 2015, 11:01:27 AM9/26/15

to

As I see it, there is only one reasonable choice for an alternative to
gcc for ARM development, and that is clang/llvm. Clang is in many ways
gcc's strongest competitor - and also its closest partner, as the
developers work together on features such as the sanitizers. A key
point here is that ARM have chosen clang/llvm as their compiler for the
official ARM development tools of the future (replacing the Keil
compiler). Of course, ARM continues to support gcc development - they
want to sell cores, not compilers! But if you do change to clang in the
future, you can expect your bitfields to continue working - clang make a
point of keeping as good compatibility with gcc as possible.

Anders....@kapsi.spam.stop.fi.invalid

unread,

Sep 30, 2015, 4:41:46 AM9/30/15

to

Tim Wescott <seemyw...@myfooter.really> wrote:

> I'm willing to put up with the non-portability caused by bitfields,
> because at this point I don't see moving away from gcc for any reason
> (well, unless some client pays me to, and if so they can pay for all the
> rest, too).

I've been looking at improving my skills with more modern techniques,
and this is what I'm planning on using:
<https://github.com/andersm/register_templates>

It combines the expessiveness of bit fields with the efficiency and
control of the traditional macro approach, and should be portable. The
caveats are that it's C++14 and so needs a modern compiler. It also
absolutely needs to be compiled with optimizations turned on for
everything to be precomputed at compile time, and I've found some GCC
ports to be fiddly about that (eg. the SuperH port of GCC 5.2.0 would
not precompute the bitmask at -Os, but the RX port had no such
problems).

-a

fra...@gmail.com

unread,

Oct 1, 2015, 8:34:56 PM10/1/15

to

You can ease the optimizer's work if make
mask = (1 << width) - 1

--
Francisco

David Brown

unread,

Oct 2, 2015, 4:44:10 AM10/2/15

to

To put that in context, that would be:

template<typename reg_type>
constexpr reg_type generate_mask(unsigned int width,
unsigned int position)
{
return (width ?
((((((reg_type) 1) << 1) << (width - 1)) - 1) << position)
: 0);
}

This has the advantage that it is C++11 compatible, and does not need
C++14 (C++14 allows more general constexpr functions than C++11).

For those that are wondering about the weirdness of shifting first by 1,
then by (width - 1), it is to avoid undefined behaviour in the code.
The C and C++ standards make "a << b" undefined if b is greater than or
equal to the maximum of the width of the type a and the width of an int.
That is, on a 32-bit system "a << 32" is undefined if a is 32-bit. If
"a" is a uint16_t, then "a << 16" is defined and always 0. But even
though for "uint32_t a", "a << 32" will almost certainly be evaluated to
0, it is still undefined. "(a << 1) << 31", on the other hand, /is/
defined, and it will be 0.

#include <stdint.h>

template<typename reg_type>
constexpr reg_type generate_mask(unsigned int width,
unsigned int position)
{
return (width ?
((((((reg_type) 1) << 1) << (width - 1)) - 1) << position)
: 0);
}

static_assert(generate_mask<uint8_t>(2, 3) == 0b11000, "Check mask");
static_assert(generate_mask<uint8_t>(8, 0) == 0xff, "Check mask");
static_assert(generate_mask<uint8_t>(8, 4) == 0xf0, "Check mask");
static_assert(generate_mask<uint8_t>(0, 4) == 0x00, "Check mask");

static_assert(generate_mask<uint64_t>(32, 16) == 0x0000ffffffff0000,
"Check mask");
static_assert(generate_mask<uint64_t>(64, 0) == 0xffffffffffffffff,
"Check mask");

Anders....@kapsi.spam.stop.fi.invalid

unread,

Oct 4, 2015, 2:12:50 PM10/4/15

to

David Brown <david...@hesbynett.no> wrote:
> On 02/10/15 02:34, fra...@gmail.com wrote:

>> You can ease the optimizer's work if make
>> mask = (1 << width) - 1

Yep, good observation. With this change the mask is precomputed at all
optimization levels above 0.

> For those that are wondering about the weirdness of shifting first by 1,
> then by (width - 1), it is to avoid undefined behaviour in the code.
> The C and C++ standards make "a << b" undefined if b is greater than
> or equal to the maximum of the width of the type a and the width of an
> int.

The static assertions in the reg_t template would trigger a build
failure in this case.

-a

David Brown

unread,

Oct 4, 2015, 5:29:14 PM10/4/15

to

No, they would not trigger here. (Your code would not have this
problem, since it does the shifts one at a time - but it /would/ have
this issue if you did the shifts all in a single shift expression.)
Your use of static assertions is a good idea, of course, but you should
try to get the details right on every corner case.

In this case, if you have a 32-bit int, with width equal to 32 and
position 0, then (1 << width) is undefined - but your static assertion
will not trigger until width is /greater/ than 32.

If you are curious as to why (1u << 32) is undefined (for a 32-bit int
C), the reason is that many cpus have a rotation instruction for
calculating x << y but the implementation details will vary when y is 32
or more. On some hardware, the result will always be 0 - but on other
hardware, the result will be calculated with y % 32, and thus x << y
will be x if y is 32. By leaving the result as undefined, the C
standards let compilers use a single fast instruction - if the behaviour
had been defined, then some compilers would have to add extra checks for
x << y calculations.

glen herrmannsfeldt

unread,

Oct 4, 2015, 11:35:57 PM10/4/15

to

David Brown <david...@hesbynett.no> wrote:
> On 04/10/15 20:12, Anders....@kapsi.spam.stop.fi.invalid wrote:

(snip)

>> Yep, good observation. With this change the mask is precomputed at all
>> optimization levels above 0.

(snip)

> In this case, if you have a 32-bit int, with width equal to 32 and
> position 0, then (1 << width) is undefined - but your static assertion
> will not trigger until width is /greater/ than 32.

> If you are curious as to why (1u << 32) is undefined (for a 32-bit int
> C), the reason is that many cpus have a rotation instruction for
> calculating x << y but the implementation details will vary when y is 32
> or more.

Rotate (by one) instructions were common on some early CPUs.
That included both N bit and N+1 (rotate through carry), where
combinations of such would shift between registers.

A little later, and as processors got more powerful, the multiple
bit shifts appeared. For some, there would be a processor loop
shifting one bit at a time, such that the shift time was proportional
(plus a small constant) the the shift amount. Consider that a
36 bit machine might allow for a 2**36 bit shift.

More specifically, the 8086 allows for a 256 bit shift, as the
shift amount is in an 8 bit register. With the 80286 the shift
amount was changed to modulo 32, for one to reduce the possible
instruction execution time, but also it allows for a barrel shifter
instead of a processor loop.

For another specific case, IBM S/360 does both single (32 bit)
and double register (64 bit) shifts modulo 64. Early versions
of Hercules used the C shift operator on 32 bit values, not
realizing that it needed to check for appropriate shift values.
Figuring that out was my biggest contribution to Hercules.

> On some hardware, the result will always be 0 - but on other
> hardware, the result will be calculated with y % 32, and thus x << y
> will be x if y is 32. By leaving the result as undefined, the C
> standards let compilers use a single fast instruction - if the behaviour
> had been defined, then some compilers would have to add extra checks for
> x << y calculations.

If more processors didn't do shifts modulo the register size,
I suspect C wouldn't have specified it that way. But they
do, and so does C.

-- glen

Anders....@kapsi.spam.stop.fi.invalid

unread,

Oct 6, 2015, 8:29:52 AM10/6/15

to

David Brown <david...@hesbynett.no> wrote:
> On 04/10/15 20:12, Anders....@kapsi.spam.stop.fi.invalid wrote:
>> The static assertions in the reg_t template would trigger a build
>> failure in this case.
> No, they would not trigger here.

You are of course right, I was looking at the wrong shift.

> If you are curious as to why (1u << 32) is undefined (for a 32-bit int
> C)

Shifting unsigned integers is defined:
"The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits
are zero-filled. If E1 has an unsigned type, the value of the result is
E1*2^E2, reduced modulo one more than the maximum value representable in
the result type."
"The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has
an unsigned type or if E1 has a signed type and a non-negative value,
the value of the result is the integral part of the quotient of
E1/2^E2."

Accordingly, calculating the mask as an unsigned long long should avoid
problems and be more readable.

-a

David Brown

unread,

Oct 6, 2015, 9:51:28 AM10/6/15

to

On 06/10/15 14:29, Anders....@kapsi.spam.stop.fi.invalid wrote:
> David Brown <david...@hesbynett.no> wrote:
>> On 04/10/15 20:12, Anders....@kapsi.spam.stop.fi.invalid wrote:
>>> The static assertions in the reg_t template would trigger a build
>>> failure in this case.
>> No, they would not trigger here.
>
> You are of course right, I was looking at the wrong shift.
>
>> If you are curious as to why (1u << 32) is undefined (for a 32-bit int
>> C)
>
> Shifting unsigned integers is defined:
> "The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits
> are zero-filled. If E1 has an unsigned type, the value of the result is
> E1*2^E2, reduced modulo one more than the maximum value representable in
> the result type."
> "The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has
> an unsigned type or if E1 has a signed type and a non-negative value,
> the value of the result is the integral part of the quotient of
> E1/2^E2."

I'm guessing that is from 5.8 in the C++ standards (N3337 is the freely
available version for C++11). But you forgot the paragraph above:

"The type of the result is that of the promoted left operand. The
behavior is undefined if the right operand is negative, or greater than
or equal to the length in bits of the promoted left operand."

So the left operand (1u in this case) is promoted if necessary (i.e., if
it is smaller than "int" or "unsigned int" it gets bumped up to "int" or
"unsigned int"). If the right operand is greater than or equal to the
length in bits of the promoted left operator, the result is undefined.

Thus on a 32-bit system, ((uint16_t) 1) << 16 is defined and will be 0,
as the calculation is done as though you used a 32-bit unsigned int and
then reduced the result to 16-bit. But ((uint32_t) 1) << 32 is /not/
defined - it could reasonably be 0, it could reasonably be 1, or the
compiler could assume that the programmer does not care what the result is.

(The standards for C have similar wording, and the same effect.)

>
> Accordingly, calculating the mask as an unsigned long long should avoid
> problems and be more readable.
>

You could certainly use ((uint64_t) 1u) for the shift - that would would
up to 63 bits. But you would hit the same problem at 64 bits. It is
better to have the somewhat ugly code here - it is correct, will work
regardless of the bit size of the system, and will still give optimal
code (assuming a sane compiler and sensible compiler options).