Aliasing through a union pointer

Ian Pilcher

Nov 23, 2015, 11:14:09 AM

I am working with a "header" struct, all of whose members are
uint64_t's. I would like to treat the struct as an array of
uint64_t when calculating a checksum, byte swapping for endian
compatibility, etc.

Is casting a pointer to the struct to a pointer to union a safe
way to do this (from an aliasing point of view)? Is there a
better way to do it?

For example:

#include <stdint.h>

struct header {
    uint64_t magic;
    uint64_t version;
    uint64_t size;
    uint64_t uuid_lo;
    uint64_t uuid_hi;
    uint64_t type;
    uint64_t offset;
    uint64_t data_size;
    uint64_t cksum;
};

#define HEADER_ARRAY_SIZE (sizeof(struct header) / sizeof(uint64_t))

union header_u {
    struct header h;
    uint64_t u[HEADER_ARRAY_SIZE];
};

/* Broken if the struct has any padding */
uint64_t naive_cksum(struct header *const hdr)
{
    union header_u *const hu = (union header_u *)hdr;

    uint64_t save, sum;
    unsigned i;

    save = hu->h.cksum;
    hu->h.cksum = 0;

    for (sum = 0, i = 0; i < HEADER_ARRAY_SIZE; ++i)
        sum += hu->u[i];

    hu->h.cksum = save;

    return sum;
}

--
========================================================================
Ian Pilcher arequ...@gmail.com
-------- "I grew up before Mark Zuckerberg invented friendship" --------
========================================================================

Eric Sosman

Nov 23, 2015, 11:43:59 AM

The first difficulty that occurs to me is one that's already
occurred to you: You're screwed if the struct has any padding,
either interstitial or at the end. That seems fairly unlikely,
but is permitted by the language. As a "Just In Case" measure,
you might try asserting that

offsetof(struct header, cksum) + sizeof(uint64_t)
== sizeof(struct header)

... which would detect the presence of padding.
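
A minimal sketch of that check as a C11 compile-time assertion. The member names follow the header from the original post, with `data_size` assumed for the second size field, and `HEADER_WORDS` is a hypothetical helper macro. Note that the offsetof comparison on its own only catches padding after `cksum`; comparing the total size against the member count also catches padding between members.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

struct header {
    uint64_t magic;
    uint64_t version;
    uint64_t size;
    uint64_t uuid_lo;
    uint64_t uuid_hi;
    uint64_t type;
    uint64_t offset;
    uint64_t data_size;   /* name assumed for the second size field */
    uint64_t cksum;
};

/* Number of uint64_t members declared in struct header (hypothetical macro). */
#define HEADER_WORDS 9

/* Fails to compile if there is padding after the last member. */
static_assert(offsetof(struct header, cksum) + sizeof(uint64_t)
                  == sizeof(struct header),
              "struct header has trailing padding");

/* Fails to compile if there is padding anywhere, including between members. */
static_assert(sizeof(struct header) == HEADER_WORDS * sizeof(uint64_t),
              "struct header has padding");
```

Both checks cost nothing at run time; a failure shows up as a build error carrying the message string.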

The next thing I think of is alignment: What if the alignment
of the union type is stricter than that of the struct? If anything
this is even less likely than the presence of padding, but I still
believe it's permitted. If it happens, then the pointer conversion
at the start of the naive_cksum() function could go awry.

A more serious matter -- an Actual Problem, I believe -- is
the modification (twice!) of an object referenced through a const-
qualified pointer. (Didn't your compiler complain?) You'd better
find another way to leave the stored checksum out of the computation.
In naive_cksum() you could just initialize `sum' to `-hu->h.cksum'
instead of to zero, but you'd need to be cagier with CRC's or other
less easily-inverted algorithms (where supplying a zero `cksum' might
not be appropriate, either).

... or you could assume that `cksum' is always the final element,
and stop the loop one iteration sooner. Or you could use offsetof
to determine which array index corresponds to `cksum', and skip that
one. Or you could ditch `const'.

It seems to me that once you've ensured there's no padding --
which you ought to do in any case -- the union doesn't buy you much:
Just cast `hdr' to a `uint64_t*' and blaze away. This also has the
virtue of avoiding the (highly unlikely) alignment problem, because
the struct alignment is at least good enough for a uint64_t.
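
A sketch of the "cast and blaze away" version, combined with the offsetof idea for skipping the stored checksum so that the header is never modified. It assumes the struct has no padding; `header_cksum`, `HEADER_WORDS`, `CKSUM_INDEX` and the `data_size` member name are illustrative, not from the thread. Each read goes through a `uint64_t` lvalue to a `uint64_t` member, but stepping from one member to the next with pointer arithmetic is something the standard does not clearly bless, so treat this as "works on mainstream compilers" rather than guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

struct header {
    uint64_t magic;
    uint64_t version;
    uint64_t size;
    uint64_t uuid_lo;
    uint64_t uuid_hi;
    uint64_t type;
    uint64_t offset;
    uint64_t data_size;   /* name assumed for the second size field */
    uint64_t cksum;
};

#define HEADER_WORDS (sizeof(struct header) / sizeof(uint64_t))
#define CKSUM_INDEX  (offsetof(struct header, cksum) / sizeof(uint64_t))

/* Sum every 64-bit word of the header except the stored checksum.
   The header is only read, so the pointee can be const. */
uint64_t header_cksum(const struct header *hdr)
{
    const uint64_t *words = (const uint64_t *)hdr;
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i < HEADER_WORDS; ++i) {
        if (i != CKSUM_INDEX)
            sum += words[i];
    }
    return sum;
}
```

Skipping the index rather than zeroing the field also means the function works on headers that live in read-only memory.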

--
eso...@comcast-dot-net.invalid
"Don't be afraid of work. Make work afraid of you." -- TLM

Eric Sosman

Nov 23, 2015, 11:49:55 AM

On 11/23/2015 11:43 AM, Eric Sosman wrote:
>[...]

> A more serious matter -- an Actual Problem, I believe -- is
> the modification (twice!) of an object referenced through a const-

> qualified pointer. [...]

An even Bigger Actual Problem is the dope slap I'm about
to administer to myself for reading too hastily. "Sorry about
that, Chief."

Ian Pilcher

Nov 23, 2015, 2:22:33 PM

On 11/23/2015 10:43 AM, Eric Sosman wrote:
> The first difficulty that occurs to me is one that's already
> occurred to you: You're screwed if the struct has any padding,
> either interstitial or at the end. That seems fairly unlikely,
> but is permitted by the language. As a "Just In Case" measure,
> you might try asserting that
>
> offsetof(struct header, cksum) + sizeof(uint64_t)
> == sizeof(struct header)
>
> ... which would detect the presence of padding.

Ah nice. I'm planning on using gcc's __attribute__((packed)), so this
shouldn't be an issue, but I like the idea of the assert, or better
yet static_assert.

> It seems to me that once you've ensured there's no padding --
> which you ought to do in any case -- the union doesn't buy you much:
> Just cast `hdr' to a `uint64_t*' and blaze away. This also has the
> virtue of avoiding the (highly unlikely) alignment problem, because
> the struct alignment is at least good enough for a uint64_t.

For some reason, I thought that I wasn't allowed to do that. I have
vague memories of gcc warnings about type punning and strict aliasing.

I definitely don't get any such warnings when doing that, however, so
it looks like a case of over-thinking. (Root cause being that I'm
still not confident that I grasp the aliasing rules.)

On 11/23/2015 10:49 AM, Eric Sosman wrote:
> An even Bigger Actual Problem is the dope slap I'm about
> to administer to myself for reading too hastily. "Sorry about
> that, Chief."

At least you caught it on your own. Always more dignified that way.

Thanks for the help!

Eric Sosman

Nov 23, 2015, 2:38:54 PM

On 11/23/2015 2:22 PM, Ian Pilcher wrote:
> On 11/23/2015 10:43 AM, Eric Sosman wrote:

>> [...]

>> It seems to me that once you've ensured there's no padding --
>> which you ought to do in any case -- the union doesn't buy you much:
>> Just cast `hdr' to a `uint64_t*' and blaze away. This also has the
>> virtue of avoiding the (highly unlikely) alignment problem, because
>> the struct alignment is at least good enough for a uint64_t.
>
> For some reason, I thought that I wasn't allowed to do that. I have
> vague memories of gcc warnings about type punning and strict aliasing.

You're "lying to the compiler" whether you transform the
struct pointer to a union pointer or to a uint64_t pointer, and
I see nothing about the first lie that would make it any less
sinful than the second. So, I'd suggest lying the simpler lie.
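
For comparison, one way to avoid telling either lie (a different technique from the two pointer conversions discussed here) is to copy the header's bytes into a real `uint64_t` array with memcpy() and sum the copy. A minimal sketch, assuming no padding; `header_cksum_copy`, `HEADER_WORDS`, `CKSUM_INDEX` and the `data_size` member name are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct header {
    uint64_t magic;
    uint64_t version;
    uint64_t size;
    uint64_t uuid_lo;
    uint64_t uuid_hi;
    uint64_t type;
    uint64_t offset;
    uint64_t data_size;   /* name assumed for the second size field */
    uint64_t cksum;
};

#define HEADER_WORDS (sizeof(struct header) / sizeof(uint64_t))
#define CKSUM_INDEX  (offsetof(struct header, cksum) / sizeof(uint64_t))

uint64_t header_cksum_copy(const struct header *hdr)
{
    uint64_t words[HEADER_WORDS];   /* a genuine array object */
    uint64_t sum = 0;
    size_t i;

    /* memcpy may copy the bytes of any object, so no aliasing question arises. */
    memcpy(words, hdr, sizeof words);

    /* Zero the checksum slot in the copy; the caller's header is untouched. */
    words[CKSUM_INDEX] = 0;

    for (i = 0; i < HEADER_WORDS; ++i)
        sum += words[i];

    return sum;
}
```

For a small fixed-size object like this, compilers are usually able to optimize the copy away, though that is not guaranteed.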

"Oh! What a tangled web we weave
When first we practise to deceive!"
-- Sir Walter Scott

> On 11/23/2015 10:49 AM, Eric Sosman wrote:
> > An even Bigger Actual Problem is the dope slap I'm about
> > to administer to myself for reading too hastily. "Sorry about
> > that, Chief."
>
> At least you caught it on your own. Always more dignified that way.

"More dignified" seems an overstatement, but I'll settle
for "less undignified." Sigh.

David Brown

Nov 23, 2015, 3:46:23 PM

On 23/11/15 20:22, Ian Pilcher wrote:
> On 11/23/2015 10:43 AM, Eric Sosman wrote:
>> The first difficulty that occurs to me is one that's already
>> occurred to you: You're screwed if the struct has any padding,
>> either interstitial or at the end. That seems fairly unlikely,
>> but is permitted by the language. As a "Just In Case" measure,
>> you might try asserting that
>>
>> offsetof(struct header, cksum) + sizeof(uint64_t)
>> == sizeof(struct header)
>>
>> ... which would detect the presence of padding.
>
> Ah nice. I'm planning using gcc's __attribute__((packed)), so this
> shouldn't be an issue, but I like the idea of the assert, or better
> yet static_assert.
>

Be generous with your static assertions. They cost nothing in code size
or run-time, and protect against silly mistakes or incorrect
assumptions, as well as unexpected compiler effects. (If you don't have
C11, there are ways to make a static_assert macro.)
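
One common pre-C11 trick, sketched here with a hypothetical macro name `STATIC_ASSERT` and a made-up `struct pair`, is to declare a typedef for an array whose size is negative when the condition is false. A negative array size is a constraint violation, so the build fails.

```c
#include <stdint.h>

/* If cond is false the array size is -1 and compilation fails;
   if it is true this declares a harmless one-element array typedef. */
#define STATIC_ASSERT(cond, name) \
    typedef char static_assert_##name[(cond) ? 1 : -1]

struct pair {
    uint64_t lo;
    uint64_t hi;
};

STATIC_ASSERT(sizeof(struct pair) == 2 * sizeof(uint64_t), pair_has_no_padding);
```

The `name` argument ends up in the typedef name, so it tends to show up in the compiler's error message and serves as a poor man's diagnostic string.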

I'd advise against __attribute__((packed)) (or equivalents on other
compilers) - either it makes no difference because there is no padding,
or you will end up with structures with fields that don't match their
alignment. On some targets, that means run-time errors - on others, it
means seriously degraded code speed. So it is rarely a good idea.

Personally, I often put the padding spaces in explicitly in my structs -
especially if the exact layout is important. It also makes it easier to
manually count offsets if you want to do that. And gcc's "-Wpadded"
flag will warn you if you've got it wrong - as will your static assertions.
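
A small sketch of explicit padding plus assertions on offsets and size. The `struct entry` layout is invented for illustration; the point is that every byte of the struct is a named member, and the assertions fail at build time on any target where the layout comes out differently.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical layout: 1-byte kind, 3 explicit padding bytes,
   4-byte id, 8-byte length. */
struct entry {
    uint8_t  kind;
    uint8_t  pad0[3];   /* explicit padding so that id sits at offset 4 */
    uint32_t id;
    uint64_t length;
};

static_assert(offsetof(struct entry, id) == 4, "unexpected offset of id");
static_assert(offsetof(struct entry, length) == 8, "unexpected offset of length");
static_assert(sizeof(struct entry) == 16, "unexpected size of struct entry");
```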

Jorgen Grahn

Nov 23, 2015, 5:20:50 PM

On Mon, 2015-11-23, David Brown wrote:
...

> I'd advise against __attribute__((packed)) (or equivalents on other
> compilers) - either it makes no difference because there is no padding,
> or you will end up with structures with fields that don't match their
> alignment. On some targets, that means run-time errors - on others, it
> means seriously degraded code speed. So it is rarely a good idea.

I think I agree, but are there really cases where you say
__attribute__((packed)) and then suffer a crash because of it? The
compiler shouldn't generate a struct, and code to access it which will
crash if it's ever run.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Stephen Sprunk

Nov 23, 2015, 5:52:43 PM

On 23-Nov-15 16:20, Jorgen Grahn wrote:
> On Mon, 2015-11-23, David Brown wrote: ....

>> I'd advise against __attribute__((packed)) (or equivalents on other
>> compilers) - either it makes no difference because there is no
>> padding, or you will end up with structures with fields that don't
>> match their alignment. On some targets, that means run-time errors
>> - on others, it means seriously degraded code speed. So it is
>> rarely a good idea.
>
> I think I agree, but are there really cases where you say
> __attribute__((packed)) and then suffer a crash because of it? The
> compiler shouldn't generate a struct, and code to access it which
> will crash if it's ever run.

As long as you access a misaligned member via a packed struct that is
itself properly aligned, then GCC will generate the necessary code to
safely do so.

If the struct itself is misaligned, or if you dereference a pointer to a
misaligned member, then all bets are off.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

Nick Bowler

Nov 23, 2015, 5:54:24 PM

On Mon, 23 Nov 2015 22:20:35 +0000, Jorgen Grahn wrote:
> On Mon, 2015-11-23, David Brown wrote:
> ...
>> I'd advise against __attribute__((packed)) (or equivalents on other
>> compilers) - either it makes no difference because there is no padding,
>> or you will end up with structures with fields that don't match their
>> alignment. On some targets, that means run-time errors - on others, it
>> means seriously degraded code speed. So it is rarely a good idea.
>
> I think I agree, but are there really cases where you say
> __attribute__((packed)) and then suffer a crash because of it? The
> compiler shouldn't generate a struct, and code to access it which will
> crash if it's ever run.

Yes. Since the packed struct members will not have the correct alignment,
using the & operator on the struct members will usually result in problems
if the pointer is subsequently dereferenced.

Note that gcc's packed attribute also relaxes the alignment requirements of
the struct itself, so in practice gcc almost always has to assume that any
member accesses are potentially unaligned, even if the struct has no
padding. This usually results in worse code.
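
The effect on the struct's own alignment can be observed with C11's `_Alignof`. A sketch using two made-up structs; the expectation that the packed one reports an alignment of 1 reflects how GCC and Clang are understood to behave, not anything the C standard says.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

struct plain_pair {
    uint64_t lo;
    uint64_t hi;
};

struct packed_pair {
    uint64_t lo;
    uint64_t hi;
} __attribute__((packed));   /* GCC/Clang extension */

/* Same size: there was no padding for the attribute to remove. */
static_assert(sizeof(struct plain_pair) == sizeof(struct packed_pair),
              "packing changed the size");

/* But the packed struct now only promises byte alignment. */
static_assert(_Alignof(struct packed_pair) == 1,
              "packed struct is expected to have alignment 1");
```

So even for a struct made entirely of `uint64_t` members, adding the attribute is not a no-op: the compiler can no longer assume an object of that type starts on a `uint64_t` boundary.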

Jorgen Grahn

Nov 23, 2015, 6:10:49 PM

On Mon, 2015-11-23, Stephen Sprunk wrote:
> On 23-Nov-15 16:20, Jorgen Grahn wrote:
>> On Mon, 2015-11-23, David Brown wrote: ....
>>> I'd advise against __attribute__((packed)) (or equivalents on other
>>> compilers) - either it makes no difference because there is no
>>> padding, or you will end up with structures with fields that don't
>>> match their alignment. On some targets, that means run-time errors
>>> - on others, it means seriously degraded code speed. So it is
>>> rarely a good idea.
>>
>> I think I agree, but are there really cases where you say
>> __attribute__((packed)) and then suffer a crash because of it? The
>> compiler shouldn't generate a struct, and code to access it which
>> will crash if it's ever run.
>
> As long as you access a misaligned member via a packed struct that is
> itself properly aligned, then GCC will generate the necessary code to
> safely do so.
>
> If the struct itself is misaligned,

But that's broken code even if you don't use 'packed', right?
(Although you tend to use 'packed' when you want to pretend a char
array is a struct Foo, so the two tend to go hand in hand.)

> or if you dereference a pointer to a
> misaligned member, then all bets are off.

So saying &foo.bar suddenly becomes a bug waiting to happen, because
foo is a packed struct? Yes, I can see how that would be unavoidable.
But surely a compiler which provides 'packed' would emit a warning for
that code? In a sense foo.bar has no more of an address than a
bitfield has.

If it doesn't, then I agree about staying away from packed structs
(which I do anyway, but that strengthens my argument a lot).

Keith Thompson

Nov 23, 2015, 6:50:05 PM

Jorgen Grahn <grahn...@snipabacken.se> writes:
> On Mon, 2015-11-23, David Brown wrote:
> ...
>> I'd advise against __attribute__((packed)) (or equivalents on other
>> compilers) - either it makes no difference because there is no padding,
>> or you will end up with structures with fields that don't match their
>> alignment. On some targets, that means run-time errors - on others, it
>> means seriously degraded code speed. So it is rarely a good idea.
>
> I think I agree, but are there really cases where you say
> __attribute__((packed)) and then suffer a crash because of it? The
> compiler shouldn't generate a struct, and code to access it which will
> crash if it's ever run.

Yes -- at least on systems where a misaligned access causes a crash.
(On x86 and x86_64, it doesn't; it's just a little slower.)

For example, given:

    struct foo {
        char c;
        int x;
    } __attribute__((packed));

if a struct foo object is, say, 4-byte aligned, then the member x is
likely to be misaligned. If you refer to the x member of a struct foo
object:

struct foo obj;
obj.x;

then gcc knows it's a member of a packed struct, and will generate
whatever code is necessary to access it. But if you take the address:

int *ptr = &obj.x;
... *ptr ...

then gcc will assume that *ptr is properly aligned, and a crash is
likely.

This could be fixed by disallowing taking the address of misaligned
members (as is done for bit fields), or of *any* member of a packed
struct, or, less realistically, by generating inefficient code
for all dereferences of pointers to types that require nontrivial
alignment.

See also <http://stackoverflow.com/q/8568432/827263>.
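
A sketch of two ways to read the member without ever forming an `int *` to it. The function names are hypothetical; the struct is the one from the example above.

```c
#include <stddef.h>   /* offsetof */
#include <string.h>   /* memcpy */

struct foo {
    char c;
    int x;
} __attribute__((packed));   /* GCC/Clang extension */

/* Direct member access: the compiler knows x may be misaligned
   and emits whatever access sequence the target needs. */
int get_x_direct(const struct foo *obj)
{
    return obj->x;
}

/* Copy the bytes out instead of taking the member's address as an int *. */
int get_x_copy(const struct foo *obj)
{
    int value;
    memcpy(&value, (const unsigned char *)obj + offsetof(struct foo, x),
           sizeof value);
    return value;
}
```

Either form keeps the knowledge that `x` may be misaligned in front of the compiler; the problem case is only the one where a plain `int *` escapes.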

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

supe...@casperkitty.com

Nov 23, 2015, 7:02:03 PM

On Monday, November 23, 2015 at 5:50:05 PM UTC-6, Keith Thompson wrote:
> For example, given:
>
> struct foo {
> char c;
> int x;
> } __attribute__((packed));
>

> ... But if you take the address:

>
> int *ptr = &obj.x;
> ... *ptr ...
>
> then gcc will assume that *ptr is properly aligned, and a crash is
> likely.
>
> This could be fixed by disallowing taking the address of misaligned
> members (as is done for bit fields), or of *any* member of a packed
> struct, or, less realistically, by generating inefficient code
> for all dereferences of pointers to types that require nontrivial
> alignment.

Or by requiring that the above be written as something like:

> int __attribute__((packed)) *ptr = &obj.x;

I'm not sure if that's the right syntax for any particular compiler, but
the above would in some platforms cause all accesses made via *ptr to be
slower than they otherwise would, but not affect the performance of other
pointers to integers.

Stephen Sprunk

Nov 23, 2015, 8:05:57 PM

On 23-Nov-15 17:10, Jorgen Grahn wrote:
> On Mon, 2015-11-23, Stephen Sprunk wrote:
>> On 23-Nov-15 16:20, Jorgen Grahn wrote:
>>> I think I agree, but are there really cases where you say
>>> __attribute__((packed)) and then suffer a crash because of it?
>>> The compiler shouldn't generate a struct, and code to access it
>>> which will crash if it's ever run.
>>
>> As long as you access a misaligned member via a packed struct that
>> is itself properly aligned, then GCC will generate the necessary
>> code to safely do so.
>>
>> If the struct itself is misaligned,
>
> But that's broken code even if you don't use 'packed', right?

Right, but it's worth mentioning anyway. GCC will only generate fixup
code for members it _knows_ are misaligned due to packing, and only for
the degree of misalignment that it expects.

For instance, a 4-byte member expected to be 2-byte misaligned may not
work if actually 1- or 3-byte misaligned--or vice versa.

> (Although you tend to use 'packed' when you want to pretend a char
> array is a struct Foo, so the two tend to go hand in hand.)

In most cases where I've found this useful, the buffer was known to be
properly aligned for the struct, so I could just cast a pointer and go.
In general, one must memcpy() from the buffer into a separate struct
object, but that is still usually easier than extracting each of the
members individually from the buffer.
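
The general memcpy() approach can be sketched as follows. `struct msg` and `msg_from_buffer` are made up for illustration, and the sketch assumes the buffer already holds the fields in the host's byte order and that the struct has no padding.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical two-field message. */
struct msg {
    uint32_t id;
    uint32_t length;
};

/* buf may have any alignment; the bytes are copied into a properly
   aligned struct object, which is then returned by value. */
struct msg msg_from_buffer(const unsigned char *buf)
{
    struct msg m;
    memcpy(&m, buf, sizeof m);
    return m;
}
```

After the copy, every member access is an ordinary aligned access, so no packed attribute is needed on the struct.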

>> or if you dereference a pointer to a misaligned member, then all
>> bets are off.
>
> So saying &foo.bar suddenly becomes a bug waiting to happen, because
> foo is a packed struct?

Yes. Here be dragons.

> Yes, I can see how that would be unavoidable. But surely a compiler
> which provides 'packed' would emit a warning for that code?

> ...

> If it doesn't, then I agree about staying away from packed structs
> (which I do anyway, but that strengthens my argument a lot).

I'd hope so, but I wouldn't count on it.

> In a sense foo.bar has no more of an address than a bitfield has.

The former has an address that is merely misaligned; the latter has no
address at all because individual bits aren't addressable.

David Brown

Nov 24, 2015, 3:55:31 AM

On 23/11/15 23:20, Jorgen Grahn wrote:
> On Mon, 2015-11-23, David Brown wrote:
> ...
>> I'd advise against __attribute__((packed)) (or equivalents on other
>> compilers) - either it makes no difference because there is no padding,
>> or you will end up with structures with fields that don't match their
>> alignment. On some targets, that means run-time errors - on others, it
>> means seriously degraded code speed. So it is rarely a good idea.
>
> I think I agree, but are there really cases where you say
> __attribute__((packed)) and then suffer a crash because of it? The
> compiler shouldn't generate a struct, and code to access it which will
> crash if it's ever run.
>

The attribute tells gcc that it should generate the struct layout
exactly as /you/ tell it - /not/ as the target system normally requires
with respect to alignment. On some systems, misaligned accesses lead
to a bus fault or similar hardware exception - a crash, in most cases.
On some other systems, you will not get a hardware exception but you
will access incorrect data (for example, a 16-bit access will simply
ignore the lowest bit of the address). That means disaster of some sort.

If you use "packed", then /you/ take full responsibility for making sure
the accesses are correct. Perhaps you want to create a pointer to that
struct that is not fully aligned (that's going to involve some
implementation-dependent pointer casts). Perhaps you use memcpy() to
access the fields, or some other method that you know is safe.

And perhaps the compiler will generate combinations of smaller accesses
in order to make it safe on your particular target. That is all up to
the compiler in question - it is all implementation-dependent.

So if it is critical, you can use "packed" if you make sure it is safe
to do so on /your/ target, with /your/ compiler, using /your/ access
methods - and treat it as highly implementation-dependent.

Or just use explicit padding with a static assertion on the final size
(or particular offsets, if you prefer) to make the code clear, safe, and
portable (or at least giving clear error messages when you hit
portability issues).

Jorgen Grahn

Nov 24, 2015, 9:29:46 AM

On Tue, 2015-11-24, David Brown wrote:
> On 23/11/15 23:20, Jorgen Grahn wrote:
>> On Mon, 2015-11-23, David Brown wrote:
>> ...
>>> I'd advise against __attribute__((packed)) (or equivalents on other
>>> compilers) - either it makes no difference because there is no padding,
>>> or you will end up with structures with fields that don't match their
>>> alignment. On some targets, that means run-time errors - on others, it
>>> means seriously degraded code speed. So it is rarely a good idea.
>>
>> I think I agree, but are there really cases where you say
>> __attribute__((packed)) and then suffer a crash because of it? The
>> compiler shouldn't generate a struct, and code to access it which will
>> crash if it's ever run.
>>
>
> The attribute tells gcc that it should generate the struct layout
> exactly as /you/ tell it - /not/ as the target system normally requires
> with respect to alignment.

...

> So if it is critical, you can use "packed" if you make sure it is safe
> to do so on /your/ target, with /your/ compiler, using /your/ access
> methods - and treat it as highly implementation-dependent.

Makes sense. I just wish the gcc documentation had expressed it that
way, so I could point people to it when they abuse it.

> Or just use explicit padding with a static assertion on the final size
> (or particular offsets, if you prefer) to make the code clear, safe, and
> portable (or at least giving clear error messages when you hit
> portability issues).

I prefer reading uint8_t by uint8_t from a buffer in most cases where
I could have used a struct "overlay". But yes, your way works too in
many important cases (and importantly will not fail at runtime).
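
Byte-by-byte reading might look like the sketch below. The helper names are hypothetical and a big-endian external format is assumed; a little-endian format would just reverse the indices.

```c
#include <stdint.h>

/* Assemble a 32-bit value from four bytes, most significant first.
   Works for any buffer alignment and any host byte order. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}

/* Same idea for 64 bits, built from two 32-bit halves. */
uint64_t read_be64(const uint8_t *p)
{
    return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}
```

Since only `uint8_t` objects are ever read, alignment, padding, and aliasing all drop out of the picture.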

Keith Thompson

Nov 24, 2015, 2:51:52 PM

David Brown <david...@hesbynett.no> writes:
> On 23/11/15 23:20, Jorgen Grahn wrote:
>> On Mon, 2015-11-23, David Brown wrote:
>> ...
>>> I'd advise against __attribute__((packed)) (or equivalents on other
>>> compilers) - either it makes no difference because there is no padding,
>>> or you will end up with structures with fields that don't match their
>>> alignment. On some targets, that means run-time errors - on others, it
>>> means seriously degraded code speed. So it is rarely a good idea.
>>
>> I think I agree, but are there really cases where you say
>> __attribute__((packed)) and then suffer a crash because of it? The
>> compiler shouldn't generate a struct, and code to access it which will
>> crash if it's ever run.
>
> The attribute tells gcc that it should generate the struct layout
> exactly as /you/ tell it - /not/ as the target system normally requires
> with respect to alignment.

I wouldn't quite put it that way. Even with __attribute__((packed)),
you're not specifying the exact layout. You're just declaring the names
and types of the members, and letting the compiler determine the layout.
__attribute__((packed)) just causes the compiler to follow a different
set of rules in doing so.

> On some systems, misaligned accesses leads
> to a bus fault or similar hardware exception - a crash, in most cases.
> On some other systems, you will not get a hardware exception but you
> will access incorrect data (for example, a 16-bit access will simply
> ignore the lowest bit of the address). That means disaster of some sort.

If gcc knows that it's accessing a member of a packed struct, it will do
whatever is necessary, perhaps generating code to access it a byte at a
time. Problems occur only when you take the address of a member.

> If you use "packed", then /you/ take full responsibility for making sure
> the accesses are correct. Perhaps you want to create a pointer to that
> struct that is not fully aligned (that's going to involve some
> implementation-dependent pointer casts). Perhaps you use memcpy() to
> access the fields, or some other method that you know is safe.

No, the compiler takes *some* responsibility for making sure accesses
are correct. It's just not able to do so in all cases.

(That's not a general language rule, but it's how gcc behaves, and we're
talking about a gcc-specific feature.)

[...]

supe...@casperkitty.com

Nov 24, 2015, 3:20:25 PM

On Tuesday, November 24, 2015 at 1:51:52 PM UTC-6, Keith Thompson wrote:
> If gcc knows that it's accessing a member of a packed struct, it will do
> whatever is necessary, perhaps generating code to access it a byte at a
> time. Problems occur only when you take the address of a member.

Many compilers (I believe gcc is among them) have distinct types for "packed"
and "unpacked" pointers; problems arise when storing a value of a "packed"
pointer type into a variable of a "normal" pointer type. I'm actually
not quite sure why such stores are allowed without a cast, since the mere
act of storing a misaligned pointer into a "normal" pointer type may knock a
program off the rails even if the pointer is only used in ways which would
be legal for a "packed" pointer. For example, while memcpy does not normally
impose any alignment restrictions on its operands, a compiler may
legitimately generate code for

void intcpy(int *dest, int *src)
{
    memcpy(dest, src, sizeof(int));
}

which fails if src or dest were not int-aligned. If one has a structure
which contains a packed member of type "int" which satisfies the alignment
requirement of neither "int" nor the structure itself, there is no way in
which code could store the address of that member into a normal pointer
without the program going off the rails. If the struct itself does not require
int alignment, it would be possible for the struct to have an unusual
alignment that makes the member alignment correct, but if the struct itself
requires integer alignment that can't happen.

David Brown

Nov 24, 2015, 4:58:26 PM

memcpy is good for doing such copies - it has the advantage that if it
could be done faster using 16-bit or bigger moves, the compiler can do
that (if it is reasonably smart), with no worries about aliasing :-)

Also explicit packing "dummies" have the advantage that there is no
chance of undefined, implementation-defined, unspecified or generally
unexpected or undesired behaviour when they are accessed. (When storing
something in a struct, padding bytes get unspecified values.) This is
particularly important if you want to compare structs, or do something
like calculate a checksum over them.

And as noted before, be generous with your static assertions. You find
out problems at compile time, with no run-time costs.

David Brown

Nov 24, 2015, 5:04:40 PM

On 24/11/15 20:51, Keith Thompson wrote:
> David Brown <david...@hesbynett.no> writes:
>> On 23/11/15 23:20, Jorgen Grahn wrote:
>>> On Mon, 2015-11-23, David Brown wrote:
>>> ...
>>>> I'd advise against __attribute__((packed)) (or equivalents on other
>>>> compilers) - either it makes no difference because there is no padding,
>>>> or you will end up with structures with fields that don't match their
>>>> alignment. On some targets, that means run-time errors - on others, it
>>>> means seriously degraded code speed. So it is rarely a good idea.
>>>
>>> I think I agree, but are there really cases where you say
>>> __attribute__((packed)) and then suffer a crash because of it? The
>>> compiler shouldn't generate a struct, and code to access it which will
>>> crash if it's ever run.
>>
>> The attribute tells gcc that it should generate the struct layout
>> exactly as /you/ tell it - /not/ as the target system normally requires
>> with respect to alignment.
>
> I wouldn't quite put it that way. Even with __attribute__((packed)),
> you're not specifying the exact layout. You're just declaring the names
> and types of the members, and letting the compiler determine the layout.
> __attribute__((packed)) just causes the compiler to follow a different
> set of rules in doing so.

As often happens, your wording is more precise than mine.

>
>> On some systems, misaligned accesses leads
>> to a bus fault or similar hardware exception - a crash, in most cases.
>> On some other systems, you will not get a hardware exception but you
>> will access incorrect data (for example, a 16-bit access will simply
>> ignore the lowest bit of the address). That means disaster of some sort.
>
> If gcc knows that it's accessing a member of a packed struct, it will do
> whatever is necessary, perhaps generating code to access it a byte at a
> time. Problems occur only when you take the address of a member.

That is true with gcc, as far as I know - but it may not be the case on
other compilers with similar "packing" extensions. (I would need to
check with the old documentation, but I believe I have used a compiler
that supported packed structures but could generate code that crashed
when accessing them directly.) Also note that on some architectures,
such as the ARM, unaligned accesses are allowed (and slow, but more
efficient than multiple byte accesses) on some versions but not on
others. If code is supposed to be object-code portable between
different versions, it is easy to get this wrong.

So accessing packed members is /probably/ safe, but
implementation-dependent and something that needs to be checked.

>
>> If you use "packed", then /you/ take full responsibility for making sure
>> the accesses are correct. Perhaps you want to create a pointer to that
>> struct that is not fully aligned (that's going to involve some
>> implementation-dependent pointer casts). Perhaps you use memcpy() to
>> access the fields, or some other method that you know is safe.
>
> No, the compiler takes *some* responsibility for making sure accesses
> are correct. It's just not able to do so in all cases.
>
> (That's not a general language rule, but it's how gcc behaves, and we're
> talking about a gcc-specific feature.)
>

A good many compilers support structure packing in some way - several
using gcc attribute syntax. So while gcc is the focus here, it's worth
considering general cases too.

Eric Sosman

Nov 24, 2015, 5:31:06 PM

On 11/24/2015 4:58 PM, David Brown wrote:
> [... "packed" structs and similar underhanded artifices ...]

> Also explicit packing "dummies" have the advantage that there is no
> chance of undefined, implementation-defined, unspecified or generally
> unexpected or undesired behaviour when they are accessed. (When storing
> something in a struct, padding bytes get unspecified values.) This is
> particularly important if you want to compare structs, or do something
> like calculate a checksum over them.

It's been my experience that these questions of packing and so on
nearly always arise in connection with an externally-imposed format of
some kind: You want to use a struct to represent an IP header or a
disk partition table entry or something of that sort. If that's the
situation, I've almost always found that trying to sculpt the struct
into the desired shape is inferior to just letting the struct take its
own preferred form and using a mediating buffer to translate to and
from the imposed format. Use separate "native" and "serialized" forms,
and you're quite likely to be better off than by trying to force both
forms into one object.

Not only does the separation of forms avoid all the alignment
issues discussed elsethread, but it also offers a convenient hook for
dealing with other matters. Does the external form require a specific
endianness that may or may not match your machine's? Does it use a
weird representation for negative numbers, or for floating-point
quantities? Are text strings supposed to be a count word followed
by a payload in some strange five-bit encoding? The import() and
export() functions can handle all such idiosyncrasies cleanly and
quietly, so the rest of the program can deal with perfectly normal,
native-format structs with native-format numbers, zero-terminated
native-encoded strings, and so on. The translation functions are
also good places to compute and verify things like checksums, which
are usually specified in terms of the external format.
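
A minimal sketch of the import()/export() idea, assuming a made-up 6-byte wire format (a big-endian 32-bit id followed by big-endian 16-bit flags). The names `struct msg`, `msg_export`, `msg_import` and `MSG_WIRE_SIZE` are hypothetical and only illustrate the separation of native and serialized forms:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Native form: the compiler may lay this out however it likes. */
struct msg {
    uint32_t id;
    uint16_t flags;
};

/* Hypothetical wire format: 4-byte big-endian id, 2-byte big-endian flags. */
enum { MSG_WIRE_SIZE = 6 };

/* Serialize byte by byte, so host alignment and endianness never matter. */
void msg_export(const struct msg *m, unsigned char out[MSG_WIRE_SIZE])
{
    assert(m != NULL && out != NULL);
    out[0] = (unsigned char)(m->id >> 24);
    out[1] = (unsigned char)(m->id >> 16);
    out[2] = (unsigned char)(m->id >> 8);
    out[3] = (unsigned char)(m->id);
    out[4] = (unsigned char)(m->flags >> 8);
    out[5] = (unsigned char)(m->flags);
}

/* Deserialize from the wire bytes back into the native struct. */
void msg_import(struct msg *m, const unsigned char in[MSG_WIRE_SIZE])
{
    assert(m != NULL && in != NULL);
    m->id = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
            ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    m->flags = (uint16_t)(((unsigned)in[4] << 8) | in[5]);
}
```

The rest of the program only ever sees `struct msg`; a checksum or a sign-representation fix-up would live inside these two functions.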

Somewhere, somebody is shrieking "Efficiency! All that copying?
Why should I un-pickle an entire object just to test the one field
that tells me whether I care about the others? Eeeek!" This shrieker
is overly fearful and insufficiently imaginative; there are plenty of
ways to attack such issues if they should prove important. (Also,
for some reason the shrieker seldom yowls about the speed penalties
of using the misaligned elements in his packed structs. Go figure.)

There are many situations where it is tempting to map an imposed
format with a struct, but resisting temptation is nearly always a
virtue, and a virtue with rewards.

supe...@casperkitty.com

Nov 24, 2015, 6:27:50 PM

On Tuesday, November 24, 2015 at 4:31:06 PM UTC-6, Eric Sosman wrote:
> Not only does the separation of forms avoid all the alignment
> issues discussed elsethread, but it also offers a convenient hook for
> dealing with other matters. Does the external form require a specific
> endianness that may or may not match your machine's?

IMHO, a good systems programming language (which is what C tried to be, at
least for a while) should allow a programmer to specify what is required from
various types, and what aspects a compiler should be free to optimize as
convenient. Unless a compiler needs to run on a machine with less than a meg
of memory (which would be rare nowadays, even if the target might have less
than a kilobyte) there really shouldn't be any difficulty with a compiler
allowing code to define structures very precisely in terms of physical layout
and then generating code to access members appropriately. Compilers should
be able to do a much better job of producing efficient code from such things
than user code would be able to do, especially if code which made use of
aliasing was required to include explicit aliasing barriers [but the language
actually included the barriers that were required]. Local machine endianness
needn't be a problem, because the definition of the structure could specify
what endianness was actually needed.

Ian Collins

Nov 24, 2015, 6:58:04 PM

I agree with you here. I learned that lesson many years ago when testing
embedded code on SPARC systems. The cost of a misaligned access was way
more than the cost of unpacking the messages (and there were a lot of
them!).

The "Why should I un-pickle an entire object just to test the one
field?" problem can be solved by lazy extraction: only extract that one
field and keep the raw buffer around to be processed later.
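
A rough sketch of lazy extraction, assuming a hypothetical 5-byte header (byte 0 is a message type, bytes 1..4 a big-endian length). The offsets and function names are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire layout: byte 0 = type, bytes 1..4 = big-endian length. */
enum { RAW_TYPE_OFFSET = 0, RAW_LEN_OFFSET = 1, RAW_HEADER_SIZE = 5 };

/* Peek at the one field needed to decide whether the message matters. */
uint8_t raw_peek_type(const unsigned char *raw, size_t n)
{
    assert(raw != NULL && n >= (size_t)RAW_HEADER_SIZE);
    return raw[RAW_TYPE_OFFSET];
}

/* Decode the length only when a caller actually asks for it. */
uint32_t raw_get_length(const unsigned char *raw, size_t n)
{
    const unsigned char *p;

    assert(raw != NULL && n >= (size_t)RAW_HEADER_SIZE);
    p = raw + RAW_LEN_OFFSET;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

The raw buffer stays around untouched, so a full un-pickle can still happen later for the messages that turn out to be interesting.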

--
Ian Collins

supe...@casperkitty.com

Nov 24, 2015, 7:32:22 PM

On Tuesday, November 24, 2015 at 5:58:04 PM UTC-6, Ian Collins wrote:
> The "Why should I un-pickle an entire object just to test the one
> field?" problem can be solved by lazy extraction: only extract that one
> field and keep the raw buffer around to be processed later.

It could be solved even more nicely if a language provided built-in support
for such semantics. I would aver that:

1. While working with data in externally-imposed formats isn't necessary in
all projects, nor is it an obscure requirement that's only needed on a
few specialized systems. Such semantics often play an important role in
things like audio-visual decoding, where performance can be very
important.

2. It would be possible to formulate a list of required features that would
handle the vast majority of applicable usage cases.

3. It would greatly ease porting of code between machines with different
architectures.

4. In many cases, the most efficient way to code something will vary
enormously based not just on the target architecture, but also on context.
If a field's value will be needed twice, it may make sense to cache it
in a native-format register, or to simply fetch it when needed, but a
programmer can't be expected to know which approach will be faster.
By contrast, if a typed-memory-barrier directive were required to separate
accesses to a memory-imaged type from accesses to any other type which
might alias it, a compiler would be able to apply optimizations much more
effectively than a programmer could.

IMHO, rather than trying to push optimization at the expense of everything
else, it would be more helpful for C to offer more ways for programmers to
specify their requirements declaratively rather than imperatively, and let
the compiler figure out how to satisfy such declarative requirements rather
than trying to infer requirements from imperative-language code.

Ian Collins

Nov 25, 2015, 12:28:29 AM

The language doesn't because it can't.

Some targets simply don't support misaligned access and others support
it at great cost (a trap in the case of SPARC). On those that do,
compiler extensions (such as those offered by gcc) are the recognised
and appropriate solution. I would argue that it is important for the
programmer to be aware that they are entering the realm of
implementation defined behaviour in this case.

--
Ian Collins

Keith Thompson

Nov 25, 2015, 12:52:37 AM

supe...@casperkitty.com writes:
> On Tuesday, November 24, 2015 at 1:51:52 PM UTC-6, Keith Thompson wrote:
>> If gcc knows that it's accessing a member of a packed struct, it will do
>> whatever is necessary, perhaps generating code to access it a byte at a
>> time. Problems occur only when you take the address of a member.
>
> Many compilers (I believe gcc is among them) have distinct types for "packed"
> and "unpacked" pointers; problems arise when storing a value of a "packed"
> pointer type into a variable of a "normal" pointer type.

[...]

As far as I know, gcc doesn't support "packed" pointers.

If it did, they would be relevant to this question on Stack Overflow:
http://stackoverflow.com/q/8568432/827263
and nobody has commented on such a thing in the 4 years since I
posted it.
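
For what it's worth, here is a small sketch of the distinction, assuming gcc's (or clang's) `__attribute__((packed))` extension. The struct and function names are made up. Direct member access lets the compiler see the packing; when only an address is available, copying the bytes out with memcpy() avoids dereferencing a possibly misaligned `uint32_t *`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* gcc/clang extension: no padding, so 'value' sits at offset 1. */
struct __attribute__((packed)) rec {
    uint8_t  tag;
    uint32_t value;
};

/* Direct member access: the compiler knows the struct is packed. */
uint32_t rec_value_direct(const struct rec *r)
{
    assert(r != NULL);
    return r->value;
}

/* Via an address: copy the bytes instead of dereferencing a uint32_t *. */
uint32_t rec_value_via_address(const struct rec *r)
{
    uint32_t v;

    assert(r != NULL);
    memcpy(&v, (const unsigned char *)r + offsetof(struct rec, value), sizeof v);
    return v;
}
```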

supe...@casperkitty.com

Nov 25, 2015, 9:54:58 AM

On Tuesday, November 24, 2015 at 11:28:29 PM UTC-6, Ian Collins wrote:
> The language doesn't because it can't.
>
> Some targets simply don't support misaligned access and others support
> it at great cost (a trap in the case of SPARC). On those that do,
> compiler extensions (such as those offered by gcc) are the recognised
> and appropriate solution. I would argue that it is important for the
> programmer to be aware that they are entering the realm of
> implementation defined behaviour in this case.

If "foo" was defined as a pointer to a big-endian 32-bit integer stored using
8 bits in each of four "char" values, what processor would not be able to, at
worst, accommodate a compiler replacing "unsigned long l = *foo;" with

unsigned char *p = (unsigned char *)foo;
l = ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
((unsigned long)p[2] << 8) | p[3];

A programmer would want to avoid using image types unnecessarily, but in
cases where it's necessary to process data according to a memory image,
a compiler would often be in a better position than a programmer to make
decisions about efficiency.

There would be some possible image descriptions that would be meaningful on
some processors but not others (e.g. storing a 36-bit unsigned integer as
four 9-bit characters), but the solution for that is to say that *if code
compiles* it must behave as indicated, while allowing for the fact that not
all compilers will be able to compile all code. Having code which won't
compile on all implementations, but will run correctly on all platforms that
can compile it, would be miles better than having code which compiles on all
platforms, runs correctly on some, and arbitrarily malfunctions on the rest.

supe...@casperkitty.com

Nov 25, 2015, 11:30:21 AM

On Tuesday, November 24, 2015 at 11:52:37 PM UTC-6, Keith Thompson wrote:
> > Many compilers (I believe gcc is among them) have distinct types for "packed"
> > and "unpacked" pointers; problems arise when storing a value of a "packed"
> > pointer type into a variable of a "normal" pointer type.
> [...]
>
> As far as I know, gcc doesn't support "packed" pointers.

Oh well. The Keil compiler I use does support them; the code necessary to
access them is basically the same as accessing unaligned structure members,
so I don't see any reason gcc shouldn't be able to support them as well.

On the Keil compiler, compiling for Cortex-m0 (no unaligned-read support)

typedef struct {int i1; char c2; int __packed i2;} S1;

int use_s1(S1 *p1, int __packed *p2)
{
return p1->i1 + p1->i2 + *p2;
}

yields

;;;168 int use_s1(S1 *p1, int __packed *p2)
000000 b570 PUSH {r4-r6,lr}
;;;169 {
000002 4604 MOV r4,r0
000004 460d MOV r5,r1
;;;170 return p1->i1 + p1->i2 + *p2;
000006 1d40 ADDS r0,r0,#5
000008 f7fffffe BL __aeabi_uread4
00000c 6821 LDR r1,[r4,#0]
00000e 1844 ADDS r4,r0,r1
000010 4628 MOV r0,r5
000012 f7fffffe BL __aeabi_uread4
000016 1820 ADDS r0,r4,r0
;;;171 }
000018 bd70 POP {r4-r6,pc}

From a type-system perspective, recognizing a qualifier on a variable type
would seem like it should be simple enough. I wonder why gcc wouldn't
support that?

Tim Rentsch

Nov 25, 2015, 1:11:00 PM

Ian Pilcher <arequ...@gmail.com> writes:

> I am working with a "header" struct, all of whose members are
> uint64_t's. I would like to treat the struct as an array of
> uint64_t when calculating a checksum, byte swapping for endian
> compatibility, etc.
>
> Is casting a pointer to the struct to a pointer to union a safe
> way to do this (from an aliasing point of view)? Is there a
> better way to do it?
>
> For example:
>
> struct header {
> uint64_t magic;
> uint64_t version;
> uint64_t size;
> uint64_t uuid_lo;
> uint64_t uuid_hi;
> uint64_t type;
> uint64_t offset;
> uint64_t size;
> uint64_t cksum;
> };
>
> [..etc..]

If the definition of the struct is under your control, and if you
have the luxury of being able to use C11, this looks like a good
candidate for anonymous structs and unions:

struct header {
union {
struct {
uint64_t magic;
uint64_t version;
uint64_t size;
uint64_t uuid_lo;
uint64_t uuid_hi;
uint64_t type;
uint64_t offset;
uint64_t size;
uint64_t cksum;
};
uint64_t members_array[9];
};
};

Now the struct with union can be used just as the original struct
would, and also accessed as an array via 'members_array'.

Sadly there is no easy way to determine the number of elements
for 'members_array' automatically. It's easy to check it after
the fact, and should be, preferably with a static check. But
those are details I expect you can look after without difficulty.
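
One possible shape for that static check, as a sketch only. The member list is shortened to three fields and `HEADER_WORDS` is a hand-maintained, hypothetical constant. It assumes C11 (anonymous struct/union and static_assert) and that offsetof accepts a member of the anonymous struct, which gcc and clang do:

```c
#include <assert.h>   /* static_assert, assert */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical, shortened member list; HEADER_WORDS is kept by hand. */
#define HEADER_WORDS 3

struct header {
    union {
        struct {
            uint64_t magic;
            uint64_t version;
            uint64_t cksum;
        };
        uint64_t members_array[HEADER_WORDS];
    };
};

/* Catches an array that is too short, and any padding. */
static_assert(sizeof(struct header) == HEADER_WORDS * sizeof(uint64_t),
              "members_array does not match the member list, or padding is present");
/* Catches an array that is too long. */
static_assert(offsetof(struct header, cksum) == (HEADER_WORDS - 1) * sizeof(uint64_t),
              "cksum is not the last slot of members_array");

/* Sum every word through the array view. */
uint64_t header_sum(const struct header *h)
{
    uint64_t sum = 0;
    size_t i;

    assert(h != NULL);
    for (i = 0; i < HEADER_WORDS; ++i)
        sum += h->members_array[i];
    return sum;
}
```

Both assertions are needed: the first alone would pass if the array were made longer than the member list.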

Tim Rentsch

Nov 25, 2015, 1:39:10 PM

Eric Sosman <eso...@comcast-dot-net.invalid> writes:

> On 11/23/2015 2:22 PM, Ian Pilcher wrote:
>> On 11/23/2015 10:43 AM, Eric Sosman wrote:
>>> [...]
>>> It seems to me that once you've ensured there's no padding --
>>> which you ought to do in any case -- the union doesn't buy you much:
>>> Just cast `hdr' to a `uint64_t*' and blaze away. This also has the
>>> virtue of avoiding the (highly unlikely) alignment problem, because
>>> the struct alignment is at least good enough for a uint64_t.
>>
>> For some reason, I thought that I wasn't allowed to do that. I have
>> vague memories of gcc warnings about type punning and strict aliasing.
>
> You're "lying to the compiler" whether you transform the
> struct pointer to a union pointer or to a uint64_t pointer, and
> I see nothing about the first lie that would make it any less
> sinful than the second. [...]

That's true in the first case, but not the second: the conversion
is guaranteed to work under 6.7.2.1 p15.

Next the question arises as to whether that pointer may be used
to index all the elements. This gets us into a murky area. For
example, if the struct is in (and starts at) memory allocated by
malloc(), then we are allowed to use a pointer to the start of
the malloc()'ed block to index an array of any kind of object.
Since all the accesses use type uint64_t, there is no violation
of effective type rules. I don't know how to get a clearcut
answer to this question; certainly not just by reading the
Standard.

In any case I agree with your recommendation to convert directly
to the uint64_t pointer type, and avoid using a pointer-to-union
type.

Richard Damon

Nov 25, 2015, 2:02:35 PM

Can't you give the struct a tag, and use sizeof on the tagged struct
(divided by sizeof(uint64_t))?

The one issue is that the standard doesn't promise there won't be
padding between struct members, even when they are all the same type
(there shouldn't be a need for any, but nothing prohibits it).

Eric Sosman

Nov 25, 2015, 2:17:40 PM

On 11/25/2015 1:38 PM, Tim Rentsch wrote:
> Eric Sosman <eso...@comcast-dot-net.invalid> writes:
>
>> On 11/23/2015 2:22 PM, Ian Pilcher wrote:
>>> On 11/23/2015 10:43 AM, Eric Sosman wrote:
>>>> [...]
>>>> It seems to me that once you've ensured there's no padding --
>>>> which you ought to do in any case -- the union doesn't buy you much:
>>>> Just cast `hdr' to a `uint64_t*' and blaze away. This also has the
>>>> virtue of avoiding the (highly unlikely) alignment problem, because
>>>> the struct alignment is at least good enough for a uint64_t.
>>>
>>> For some reason, I thought that I wasn't allowed to do that. I have
>>> vague memories of gcc warnings about type punning and strict aliasing.
>>
>> You're "lying to the compiler" whether you transform the
>> struct pointer to a union pointer or to a uint64_t pointer, and
>> I see nothing about the first lie that would make it any less
>> sinful than the second. [...]
>
> That's true in the first case, but not the second: the conversion
> is guaranteed to work under 6.7.2.1 p15.

I think we're referring to different lies -- or, I'm applying
the single word "lie" to two different abuses:

- Converting the struct* to a union* might not work at all, so
applying the cast -- telling the compiler "Perform this valid
conversion" -- is lying to the compiler.

- Converting the struct* to a uint64_t* will work, no question
(for any struct whose first element is a uint64_t). The "lie"
in this case is using the converted pointer to access not only
the first element (perfectly all right), but to access later
elements as if the struct were an array of uint64_t.

So: I'll 'fess up to loose use of "lie" to cover different abuses,
but I'm allowed that liberty under the terms of my artistic lie-cense.

> Next the question arises as to whether that pointer may be used
> to index all the elements. This gets us into a murky area. For
> example, if the struct is in (and starts at) memory allocated by
> malloc(), then we are allowed to use a pointer to the start of
> the malloc()'ed block to index an array of any kind of object.
> Since all the accesses use type uint64_t, there is no violation
> of effective type rules. I don't know how to get a clearcut
> answer to this question; certainly not just by reading the
> Standard.

The Standard will tell us it's OK to manipulate an array of
uint64_t within the memory allocated for the struct, so long as we
don't step outside the allocation. But the Standard will also tell
us we might be manipulating nameless padding bytes that do not
belong to any of the struct's elements, and that's where the
trouble arises.

> In any case I agree with your recommendation to convert directly
> to the uint64_t pointer type, and avoid using a pointer-to-union
> type.

Right. The union gains nothing, and could even lose a smidgen.
Cut to the chase, I say.
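
As an illustration, here is a sketch that combines the padding check with a checksum loop. Note that it swaps in memcpy() rather than indexing through a converted `uint64_t *`, which sidesteps the murky question of whether that indexing is sanctioned. The three-member struct and the constant `HEADER_WORDS` are hypothetical stand-ins:

```c
#include <assert.h>   /* static_assert, assert */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>
#include <string.h>

/* Hypothetical, shortened version of the header struct. */
struct header {
    uint64_t magic;
    uint64_t version;
    uint64_t cksum;
};

enum { HEADER_WORDS = 3 };

/* With three 8-byte members, this size leaves no room for padding. */
static_assert(sizeof(struct header) == HEADER_WORDS * sizeof(uint64_t),
              "struct header contains padding");

/* Sum all words except the stored checksum, without modifying *hdr. */
uint64_t header_cksum(const struct header *hdr)
{
    uint64_t words[HEADER_WORDS];
    uint64_t sum = 0;
    size_t i;

    assert(hdr != NULL);
    memcpy(words, hdr, sizeof words);
    words[offsetof(struct header, cksum) / sizeof(uint64_t)] = 0;
    for (i = 0; i < HEADER_WORDS; ++i)
        sum += words[i];
    return sum;
}
```

Working on a copy also removes the save/zero/restore dance on the caller's cksum field.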

Ian Collins

Nov 25, 2015, 2:59:58 PM

supe...@casperkitty.com wrote:
> On Tuesday, November 24, 2015 at 11:28:29 PM UTC-6, Ian Collins wrote:
>> The language doesn't because it can't.
>>
>> Some targets simply don't support misaligned access and others support
>> it at great cost (a trap in the case of SPARC). On those that do,
>> compiler extensions (such as those offered by gcc) are the recognised
>> and appropriate solution. I would argue that it is important for the
>> programmer to be aware that they are entering the realm of
>> implementation defined behaviour in this case.
>
> If "foo" was defined as a pointer to a big-endian 32-bit integer stored using
> 8 bits in each of four "char" values, what processor would not be able to, at
> worst, accommodate a compiler replacing "unsigned long l = *foo;" with
>
> unsigned char *p = (unsigned char *)foo;
> l = ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
> ((unsigned long)p[2] << 8) | p[3];

I'm sure any compiler could do that, but it would have to do so for any
assignment through a pointer to a type other than (unsigned) char
unless it could prove the address was correctly aligned. An alternative
is to trap on an access violation and perform the access byte by byte
(as on SPARC). Neither should happen without the programmer making the
choice (as they do on SPARC).

--
Ian Collins

supe...@casperkitty.com

Nov 25, 2015, 4:01:42 PM

On Wednesday, November 25, 2015 at 1:59:58 PM UTC-6, Ian Collins wrote:
> I'm sure any compiler could do that, but it would have to do so for any
> assignment through a pointer to a type other than (unsigned) char
> unless it could prove the address was correctly aligned. An alternative
> is to trap on an access violation and perform the access byte by byte
> (as on SPARC). Neither should happen without the programmer making the
> choice (as they do on SPARC).

Perhaps you misunderstood my intention. The idea was not to say that all
pointers to integer types should tolerate unaligned loads, but rather that
there should be a means of defining types with precise layout and alignment
specifications, and that accesses *made through pointers to such types*
should do whatever is necessary to accommodate them. Generally code should
try to minimize use of such types within tight loops, but if it's necessary
to operate upon a large foreign-layout data structure that consists of lots
of little pieces, it may sometimes be faster to operate on the structure in
its native format than to convert to a native representation, process
everything, and then convert back, especially if much of the data will be
untouched.

In cases where a small embedded processor and a PC will exchange data in the
native format of the embedded processor, the embedded processor may not be
able to afford the code or CPU time necessary to manipulate everything using
character types, but the PC certainly would in most cases easily be able to
afford both. I would suggest that even if code designed around the embedded
systems' storage formats would be much less efficient on the PC than would
code that was designed around the PC's storage formats, there would still be
great value in being able to have the PC "leisurely stroll" the same code
that the embedded system could run.

Ian Collins

Nov 25, 2015, 6:58:21 PM

supe...@casperkitty.com wrote:
> On Wednesday, November 25, 2015 at 1:59:58 PM UTC-6, Ian Collins wrote:
>> I'm sure any compiler could do that, but it would have to do so for any
>> assignment through a pointer to a type other than (unsigned) char
>> unless it could prove the address was correctly aligned. An alternative
>> is to trap on an access violation and perform the access byte by byte
>> (as on SPARC). Neither should happen without the programmer making the
>> choice (as they do on SPARC).
>
> Perhaps you misunderstood my intention. The idea was not to say that all
> pointers to integer types should tolerate unaligned loads, but rather that
> there should be a means of defining types with precise layout and alignment
> specifications, and that accesses *made through pointers to such types*
> should do whatever is necessary to accommodate them.

I see where you are coming from, but I don't think your suggestion goes
far enough. Consider passing the address of second to f() in this snippet:

__packed struct S
{
uint8_t first;
uint32_t second;
};

void f( uint32_t* );

<snip>

> In cases where a small embedded processor and a PC will exchange data in the
> native format of the embedded processor, the embedded processor may not be
> able to afford the code or CPU time necessary to manipulate everything using
> character types, but the PC certainly would in most cases easily be able to
> afford both. I would suggest that even if code designed around the embedded
> systems' storage formats would be much less efficient on the PC than would
> code that was designed around the PC's storage formats, there would still be
> great value in being able to have the PC "leisurely stroll" the same code
> that the embedded system could run.

There is - compiler extensions! Supporting misaligned access impacts
the entire program, not just specific accesses.

I have done a lot of work similar to what you describe above as well as
running parts of the embedded code on a workstation host. My solution
on SPARC (which can't do packed structs) was to compile the embedded
code as C++ and use customised types for the structure members to handle
the packing and unpacking. A more general solution would be to use
opaque types and accessor functions.
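
A sketch of what opaque types plus accessor functions might look like in plain C, mirroring struct S above. The layout (byte 0 is `first`, bytes 1..4 hold `second` in little-endian order) and all names are assumptions made for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callers treat this as opaque; only the accessors know the layout.
   Hypothetical layout: byte 0 = first, bytes 1..4 = second (little-endian). */
struct s_wire {
    unsigned char bytes[5];
};

uint8_t s_get_first(const struct s_wire *w)
{
    assert(w != NULL);
    return w->bytes[0];
}

/* Assemble 'second' from individual bytes; no misaligned load is issued. */
uint32_t s_get_second(const struct s_wire *w)
{
    assert(w != NULL);
    return  (uint32_t)w->bytes[1]        |
           ((uint32_t)w->bytes[2] << 8)  |
           ((uint32_t)w->bytes[3] << 16) |
           ((uint32_t)w->bytes[4] << 24);
}

/* Store 'second' one byte at a time in little-endian order. */
void s_set_second(struct s_wire *w, uint32_t v)
{
    assert(w != NULL);
    w->bytes[1] = (unsigned char)v;
    w->bytes[2] = (unsigned char)(v >> 8);
    w->bytes[3] = (unsigned char)(v >> 16);
    w->bytes[4] = (unsigned char)(v >> 24);
}
```

Because nothing ever hands out a `uint32_t *` into the buffer, there is no misaligned pointer to leak into code that expects a normal one.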

At the time, enabling misaligned access on the SPARC build caused the
code to run slower than it did on the embedded target....

--
Ian Collins

Jorgen Grahn

Nov 25, 2015, 7:07:06 PM

On Tue, 2015-11-24, Eric Sosman wrote:
> On 11/24/2015 4:58 PM, David Brown wrote:
>> [... "packed" structs and similar underhanded artifices ...]
>> Also explicit packing "dummies" have the advantage that there is no
>> chance of undefined, implementation-defined, unspecified or generally
>> unexpected or undesired behaviour when they are accessed. (When storing
>> something in a struct, padding bytes get unspecified values.) This is
>> particularly important if you want to compare structs, or do something
>> like calculate a checksum over them.
>
> It's been my experience that these questions of packing and so on
> nearly always arise in connection with an externally-imposed format of
> some kind: You want to use a struct to represent an IP header or a
> disk partition table entry or something of that sort. If that's the
> situation, I've almost always found that trying to sculpt the struct
> into the desired shape is inferior to just letting the struct take its
> own preferred form and using a mediating buffer to translate to and
> from the imposed format. Use separate "native" and "serialized" forms,
> and you're quite likely to be better off than by trying to force both
> forms into one object.

When I wrote "I prefer reading uint8_t by uint8_t from a buffer in
most cases where I could have used a struct 'overlay'" upthread, I was
referring to the first part of that.

I'm not so sure about the second part -- having

struct Foo deserialise(source);
void serialise(struct Foo, target);

In many cases, yes. But in other cases I find that the code
/on top of/ that gets nicer if I use some other approach. For
example, the data may not map well to a struct. And that's not
primarily an efficiency argument.

supe...@casperkitty.com

Nov 25, 2015, 7:44:59 PM

On Wednesday, November 25, 2015 at 5:58:21 PM UTC-6, Ian Collins wrote:
> I see where you are coming from, but I don't think your suggestion goes
> far enough. Consider passing the address of second to f() in this snippet:
>
> __packed struct S
> {
> uint8_t first;
> uint32_t second;
> };
>
> void f( uint32_t* );

With regard to the above code, I'd say that the type of &thing.second should
be "uint32_t __packed *", which is not compatible with "uint32_t*", but could
be safely passed to "f" if the function had been declared

void f( uint32_t __packed* p);

or

void f( uint32_t __packed* restrict p);

Of course, declaring the argument type as "uint32_t __packed *" rather than
"uint32_t*" would degrade performance, especially in the absence of a
restrict qualifier, but (especially with "restrict") it may still be able
to outperform any user-written code for the same purpose. Consider, for
example, that if the type is little-endian, a compiler could optimize
"*p |= 768;" to simply "((uint8_t*)p)[1] |= 3;" whereas user code which
wasn't designed around the value being a multiple of 256 would more likely
have to read all four bytes, do the "or", and then write all four bytes
back, and a compiler would be far less likely to notice that it could skip
three of the reads and three of the writes.
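
To make the arithmetic concrete: 768 is 0x0300, so in a little-endian 32-bit value only the byte at index 1 can change. The sketch below writes out both the generic read-modify-write and the one-byte shortcut by hand. The function names are invented, and the buffer is assumed to hold a little-endian value:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic version: read four little-endian bytes, OR, write four back. */
void or768_full(unsigned char *p)
{
    uint32_t v;

    assert(p != NULL);
    v =  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    v |= 768u;
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

/* Shortcut: 768 == 0x0300, so only byte 1 needs to be touched. */
void or768_byte(unsigned char *p)
{
    assert(p != NULL);
    p[1] = (unsigned char)(p[1] | 3u);
}
```

Both functions leave the buffer in the same state; the second is the form a compiler that knew the layout could emit.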

> There is - compiler extensions! Supporting misaligned access impacts
> the entire program, not just specific accesses.

If the small system and big system alike both used new types whose storage
format and behavior were specified in precise terms, why would the existence
of such types need to affect any code using normal types on the PC?

> I have done a lot of work similar to what you describe above as well as
> running parts of the embedded code on a workstation host. My solution
> on SPARC (which can't do packed structs) was to compile the embedded
> code as C++ and use customised types for the structure members to handle
> the packing and unpacking. A more general solution would be to use
> opaque types and accessor functions.

I've done something similar so I could run the same code on both the embedded
target using C, and on a PC using C++; unfortunately, the two languages are
diverging from each other rather than converging.

> At the time, enabling misaligned access on the SPARC build caused the
> code to run slower than it did on the embedded target....

Was it slower because the code was trying to do unaligned reads which then
generated traps, or because the code was using sequences of byte reads and
putting them together?

Ian Collins

Nov 25, 2015, 8:13:05 PM

supe...@casperkitty.com wrote:
> On Wednesday, November 25, 2015 at 5:58:21 PM UTC-6, Ian Collins wrote:
>> I see where you are coming from, but I don't think your suggestion goes
>> far enough. Consider passing the address of second to f() in this snippet:
>>
>> __packed struct S
>> {
>> uint8_t first;
>> uint32_t second;
>> };
>
>> void f( uint32_t* );
>
> With regard to the above code, I'd say that the type of &thing.second should
> be "uint32_t __packed *", which is not compatible with "uint32_t*", but could
> be safely passed to "f" if the function had been declared
>
> void f( uint32_t __packed* p);
>
> or
>
> void f( uint32_t __packed* restrict p);

But that would preclude calling any (library) function that didn't have
the qualification. That's why I said supporting misaligned access
impacts the entire program.

<snip>

>> There is - compiler extensions! Supporting misaligned access impacts
>> the entire program, not just specific accesses.
>
> If the small system and big system alike both used new types whose storage
> format and behavior were specified in precise terms, why would the existence
> of such types need to affect any code using normal types on the PC?

Because misaligned pointers can leak.

>> I have done a lot of work similar to what you describe above as well as
>> running parts of the embedded code on a workstation host. My solution
>> on SPARC (which can't do packed structs) was to compile the embedded
>> code as C++ and use customised types for the structure members to handle
>> the packing and unpacking. A more general solution would be to use
>> opaque types and accessor functions.
>
> I've done something similar so I could run the same code on both the embedded
> target using C, and on a PC using C++; unfortunately, the two languages are
> diverging from each other rather than converging.

That depends on how much of the "new" stuff in C99 you use :)

>> At the time, enabling misaligned access on the SPARC build caused the
>> code to run slower than it did on the embedded target....
>
> Was it slower because the code was trying to do unaligned reads which then
> generated traps, or because the code was using sequences of byte reads and
> putting them together?

The former.

--
Ian Collins

supe...@casperkitty.com

Nov 26, 2015, 12:34:53 AM

On Wednesday, November 25, 2015 at 7:13:05 PM UTC-6, Ian Collins wrote:
> But that would preclude calling any (library) function that didn't have
> the qualification. That's why I said supporting misaligned access
> impacts the entire program.

And I'd say that library functions would be limited to using normal pointers;
I'm curious which library functions you think that would be a problem for.
If one wants to read a number from a file into a bit-field, one can't use
fscanf to load data into the bit-field directly--one must read it into a
temporary variable and then copy that manually. How many library routines
are there that accept a non-character pointer type and write a lot of data
to it?

> Because misaligned pointers can leak.

Only if the type system allows them to be given to code that expects normal
pointers.

> That depends on how much of the "new" stuff in C99 you use :)

The languages also have different rules about things like aliasing and
various other oddities.

>> Was it slower because the code was trying to do unaligned reads which then
>> generated traps, or because the code was using sequences of byte reads and
>> putting them together?
>
> The former.

Well don't do that. >:*3 If one tells the compiler to *expect* unaligned
data (possible in at least some compilers, and IMHO should be defined in the
Standard even if compilers were allowed to refuse compilation of programs
specifying it) the performance impact would be greatly reduced to be no
worse than (and in many cases not as bad as) the cost of user code building
up larger types from smaller ones.

Malcolm McLean

Nov 26, 2015, 12:55:48 AM

On Thursday, November 26, 2015 at 5:34:53 AM UTC, supe...@casperkitty.com wrote:
> On Wednesday, November 25, 2015 at 7:13:05 PM UTC-6, Ian Collins wrote:
> > But that would preclude calling any (library) function that didn't have
> > the qualification. That's why I said supporting misaligned access
> > impacts the entire program.
>
> And I'd say that library functions would be limited to using normal
> pointers;
> I'm curious which library functions you think that would be a problem for.
> If one wants to read a number from a file into a bit-field, one can't
> use fscanf to load data into the bit-field directly--one must read it
> into a temporary variable and then copy that manually. How many library
> routines are there that accept a non-character pointer type and write a
> lot of data to it?
>

memcpy takes a void *, which is only rarely a string buffer (strcpy()
is used instead), and is even more rarely not 4 byte aligned.
But it has to support any combination of size and alignment. It's
easy enough to patch a compiler to insert fast copies, although
for a long time you often got a simple byte-copying loop.

However often the reason memcpy() is called is because the data has
been abstracted away and passed in as a void *. So it can be hard
to apply the patch.

Ian Collins

Nov 26, 2015, 2:51:59 AM

supe...@casperkitty.com wrote:
> On Wednesday, November 25, 2015 at 7:13:05 PM UTC-6, Ian Collins wrote:
>> But that would preclude calling any (library) function that didn't have
>> the qualification. That's why I said supporting misaligned access
>> impacts the entire program.
>
> And I'd say that library functions would be limited to using normal pointers;
> I'm curious for which library functions you think that would be a problem.
> If one wants to read a number from a file into a bit-field, one can't use
> fscanf to load data into the bit-field directly--one must read it into a
> temporary variable and then copy that manually. How many library routines
> are there that accept a non-character pointer type and write a lot of data
> to it?

You don't have to. A single read or write is all that is required. If
I understand your concept correctly and you have a new qualifier, you
will need to copy data between qualified and unqualified pointers, which
is little different from copying data to and from packed external
message formats.

>> Because misaligned pointers can leak.
>
> Only if the type system allows them to be given to code that expects normal
> pointers.

See above.

>> That depends on how much of the "new" stuff in C99 you use :)
>
> The languages also have different rules about things like aliasing and
> various other oddities.

None of which make any practical difference in normal code. I mix C and
C++ often and I must admit I would have to go looking for any such
differences.

>>> Was it slower because the code was trying to do unaligned reads which then
>>> generated traps, or because the code was using sequences of byte reads and
>>> putting them together?
>>
>> The former.
>
> Well don't do that. >:*3 If one tells the compiler to *expect* unaligned
> data (possible in at least some compilers, and IMHO should be defined in the
> Standard even if compilers were allowed to refuse compilation of programs
> specifying it) the performance impact would be greatly reduced to be no
> worse than (and in many cases not as bad as) the cost of user code building
> up larger types from smaller ones.

Compilers for machines such as SPARC would have to refuse due to
hardware restrictions. So you don't gain any portability benefits over
compiler extensions.

--
Ian Collins

supe...@casperkitty.com

Nov 26, 2015, 2:39:43 PM

On Thursday, November 26, 2015 at 1:51:59 AM UTC-6, Ian Collins wrote:
> You don't have to. A single read or write is all that is required. If
> I understand your concept correctly and you have a new qualifier, you
> will need to copy data between qualified and unqualified pointers, which
> is little different from copying data to and from packed external
> message formats.

Requiring code to occasionally use a temporary variable when exchanging data
between a library routine which wants a pointer to a native integer and a
data type that holds something else would be less bad than having to do
translation all the time. Besides, in cases where the called routine applies
a restrict qualifier (which most library routines actually should) a compiler
could safely generate and use the temporary variable automatically if
needed, or could pass the address of a compiler-generated temporary if the
compiler needs one for other purposes.
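
As a small illustration of the temporary-variable pattern, assuming a
made-up struct with bit-fields and using sscanf() instead of fscanf() so
the example needs no file: the library routine gets a pointer to an
ordinary unsigned, and the result is copied into the bit-field afterwards.

```c
#include <assert.h>
#include <stdio.h>

struct flags {              /* hypothetical layout */
    unsigned mode  : 3;
    unsigned level : 5;     /* holds 0..31 */
};

/* Parse a decimal number from text into f->level by way of a
   temporary. Returns 1 on success, 0 if the text does not parse
   or the value does not fit in the field. */
int parse_level(const char *text, struct flags *f)
{
    unsigned tmp;

    assert(text != NULL && f != NULL);
    if (sscanf(text, "%u", &tmp) != 1)
        return 0;
    if (tmp > 31u)
        return 0;
    f->level = tmp;         /* the manual copy step */
    return 1;
}
```

The same shape would apply to a packed or otherwise non-native field: one
extra variable at the boundary rather than translation at every access.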

BTW, it's too bad there's no variation of "restrict" which would conveniently
indicate that two (or more) parameters may alias each other but will not
alias anything else in the universe that is used during function execution.
If memmove() had restrict-qualified parameters, it could actually behave just
as it does now while allowing callers to make more optimizations than are
presently possible. Implementations which have weak pointer-overlap
detection could not be as efficient as they can be without the qualifier.
Those which either have no ability to use relational operators with
pointers, or those where relational operators can check for arbitrary
pointer-region overlap, could be just as efficient as they are now, even
while abiding by the rules of "restrict".
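
To make "pointer-overlap detection" concrete, here is a sketch of a
memmove()-style routine for a platform where addresses converted to
uintptr_t compare meaningfully. That property is an assumption (the
Standard leaves the conversion implementation-defined), and move_bytes is
an invented name, not the library function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n bytes from src to dst, tolerating overlap. */
void *move_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    if ((uintptr_t)d <= (uintptr_t)s) {
        /* dst at or below src: a forward copy never reads a byte
           that has already been overwritten */
        while (n--)
            *d++ = *s++;
    } else {
        /* dst above src: copy backward for the same reason */
        d += n;
        s += n;
        while (n--)
            *--d = *--s;
    }
    return dst;
}
```

The direction test is the only place where the two parameters are related
to each other; everything else treats them as ordinary byte pointers.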

Philip Lantz

Nov 26, 2015, 4:06:18 PM

If you give the struct a tag, then it becomes a type definition instead
of an anonymous struct; that is, its members do not become members of
the outer struct/union; so you would also have to make the struct a
named member instead of anonymous, and that name would have to be used
at every reference.
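
A sketch of what that looks like, assuming C11 and a shortened field list;
the names tagged_header, fields and f are invented for the example. The
tag lets sizeof compute the array length, at the price of writing the
member name on every access.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

struct tagged_header {
    union {
        struct fields {               /* tagged: this declares a type */
            uint64_t magic;
            uint64_t version;
            uint64_t cksum;
        } f;                          /* so a member name is required */
        uint64_t members_array[sizeof(struct fields) / sizeof(uint64_t)];
    };                                /* the union itself stays anonymous */
};

static_assert(sizeof(struct fields) % sizeof(uint64_t) == 0,
              "struct fields is not a whole number of uint64_t");

uint64_t tagged_header_cksum(const struct tagged_header *h)
{
    return h->f.cksum;                /* note the extra ".f" */
}
```

If the struct had padding, the array would simply come out longer than the
number of fields, so the padding check discussed elsewhere in the thread is
still needed.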

Richard Damon

Nov 26, 2015, 4:09:54 PM

Ok, haven't used this feature enough to understand all the subtleties.

Tim Rentsch

Nov 27, 2015, 4:08:20 PM

Richard Damon <Ric...@Damon-Family.org> writes:

> On 11/25/15 10:10 AM, Tim Rentsch wrote:
>
>> [...] this looks like a good
>> candidate for anonymous structs and unions:
>>
>>     struct header {
>>         union {
>>             struct {
>>                 uint64_t magic;
>>                 uint64_t version;
>>                 uint64_t size;
>>                 uint64_t uuid_lo;
>>                 uint64_t uuid_hi;
>>                 uint64_t type;
>>                 uint64_t offset;
>>                 uint64_t size;
>>                 uint64_t cksum;
>>             };
>>             uint64_t members_array[9];
>>         };
>>     };
>>
>> Now the struct with union can be used just as the original struct
>> would, and also accessed as an array via 'members_array'.
>>
>> Sadly there is no easy way to determine the number of elements
>> for 'members_array' automatically. It's easy to check it after
>> the fact, and should be, preferably with a static check. But
>> those are details I expect you can look after without difficulty.
>
> Can't you give the struct a tag, and use sizeof the tag (and
> divide by sizeof uint64_t)

I discovered while constructing the example what Philip Lantz
explained - anonymous structs can't have tags. Sigh.

> The one issue is the standard doesn't promise that struct members
> won't have padding between members, even when all the same type
> (there shouldn't be a need for one, but nothing prohibits it).

Right. I didn't mention this because I thought it was obvious.
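
For the record, a sketch of the static check, assuming C11. The listing
upthread names two members 'size', which will not compile, so the second
one appears here under the placeholder name data_size; HEADER_WORDS and
header_sum are likewise invented for the example.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

#define HEADER_WORDS 9

struct header {
    union {
        struct {
            uint64_t magic;
            uint64_t version;
            uint64_t size;
            uint64_t uuid_lo;
            uint64_t uuid_hi;
            uint64_t type;
            uint64_t offset;
            uint64_t data_size;   /* placeholder for the second 'size' */
            uint64_t cksum;
        };
        uint64_t members_array[HEADER_WORDS];
    };
};

/* Fails if HEADER_WORDS is too small or the struct has padding. */
static_assert(sizeof(struct header) == HEADER_WORDS * sizeof(uint64_t),
              "struct header is not exactly HEADER_WORDS words");
/* Fails if HEADER_WORDS is too large: the last field must be the
   last array element. */
static_assert(offsetof(struct header, cksum)
                  == (HEADER_WORDS - 1) * sizeof(uint64_t),
              "cksum is not the last word of members_array");

uint64_t header_sum(const struct header *h)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < HEADER_WORDS; ++i)
        sum += h->members_array[i];
    return sum;
}
```

Both assertions are needed: an oversized array enlarges the union along
with it, so the size check alone would still pass.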

Tim Rentsch

Nov 27, 2015, 4:16:25 PM

I agree that using the pointer to access an array is, or at least
might be, a lie. My point is that it is not a lie to the compiler.

>> Next the question arises as to whether that pointer may be used
>> to index all the elements. This gets us into a murky area. For
>> example, if the struct is in (and starts at) memory allocated by
>> malloc(), then we are allowed to use a pointer to the start of
>> the malloc()'ed block to index an array of any kind of object.
>> Since all the accesses use type uint64_t, there is no violation
>> of effective type rules. I don't know how to get a clearcut
>> answer to this question; certainly not just by reading the
>> Standard.
>
> The Standard will tell us it's OK to manipulate an array of
> uint64_t within the memory allocated for the struct, so long as we
> don't step outside the allocation. But the Standard will also
> tell us we might be manipulating nameless padding bytes that do
> not belong to any of the struct's elements, and that's where the
> trouble arises.

Right. I'm assuming there will be a check to make sure that
situation doesn't occur; I didn't bring it up because I thought
it was obvious.
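
A sketch of the malloc() case with that check in place, assuming C11 and a
shortened, made-up struct (small_header, SMALL_HEADER_WORDS and
sum_malloced_header are not from the original code). Whether the indexing
is strictly conforming is the murky question above; this only shows the
shape of the code being discussed.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <stdlib.h>

struct small_header {
    uint64_t magic;
    uint64_t version;
    uint64_t cksum;
};

#define SMALL_HEADER_WORDS 3

/* The check assumed above: no padding anywhere in the struct. */
static_assert(sizeof(struct small_header)
                  == SMALL_HEADER_WORDS * sizeof(uint64_t),
              "struct small_header contains padding");

/* Returns 0 and stores the sum of all words in *out, or returns -1
   if the allocation fails. */
int sum_malloced_header(uint64_t magic, uint64_t version, uint64_t *out)
{
    struct small_header *h = malloc(sizeof *h);
    const uint64_t *w;
    uint64_t sum = 0;

    if (h == NULL)
        return -1;

    h->magic = magic;       /* every store uses an lvalue of type uint64_t */
    h->version = version;
    h->cksum = 0;

    w = (const uint64_t *)(const void *)h;  /* start of the malloc'ed block */
    for (size_t i = 0; i < SMALL_HEADER_WORDS; ++i)
        sum += w[i];        /* every read uses type uint64_t as well */

    free(h);
    *out = sum;
    return 0;
}
```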