Alignment requirements for _Atomic should be stated


Jonathan Wakely

Dec 19, 2018, 9:21:05 AM
to IA32 System V Application Binary Interface
The ABI doesn't say whether _Atomic variables should use the same alignment as non-atomic ones. This matters for cases like 64-bit integers on IA32, which might only be 4-byte aligned but cannot be lock-free unless 8-byte aligned.

Problem cases include:

_Atomic double d;
_Atomic struct { long long x; };
_Atomic struct { int a, b; };

All of these require 8-byte alignment to be lock-free, but the ABI doesn't enforce that.

There is implementation divergence between GCC and Clang, because the psABI doesn't say what to do.
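
To see how a given compiler treats these cases, the alignments can be queried directly. The following is a minimal sketch, assuming a C11 compiler that provides <stdatomic.h>; the function names are illustrative and not part of any ABI. On IA-32 the plain long long alignment is typically 4, and the divergence described above would show up as different results for the atomic variant.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Alignment the compiler gives the plain 64-bit integer type. */
size_t plain_ll_align(void)  { return _Alignof(long long); }

/* Alignment the compiler gives the _Atomic-qualified type. */
size_t atomic_ll_align(void) { return _Alignof(_Atomic long long); }

/* 0 = never lock-free, 1 = sometimes lock-free, 2 = always lock-free. */
int atomic_ll_lock_free_level(void) { return ATOMIC_LLONG_LOCK_FREE; }

/* Sanity check: qualifying with _Atomic should never lower the alignment. */
void check_atomic_ll_alignment(void)
{
    assert(atomic_ll_align() >= plain_ll_align());
}
```

Building this once per compiler for a 32-bit target (for example with -m32) and printing the three values is one way to compare implementations.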

Jonathan Wakely

Dec 19, 2018, 9:23:32 AM
to IA32 System V Application Binary Interface

Thiago Macieira

Dec 19, 2018, 1:19:09 PM
to ia32...@googlegroups.com
And https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71660 and probably more.

I agree it needs to specify what to do. I'd say it's pretty clear that _Atomic
primitives need to get their alignment increased, so they do what is expected
by the programmer.

Whether your atomic structs above also should be over-aligned or whether they
should be externally locked is a different story. The simplest might be to say
that on x86 psABI, anything above sizeof 4 requires external locking, except
for _Atomic long long and _Atomic double.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center



H.J. Lu

Dec 21, 2018, 12:55:11 PM
to IA32 System V Application Binary Interface
GCC and Clang differ on this. How exactly should ia32 psABI be updated?

--
H.J.

Florian Weimer

Jan 2, 2019, 3:17:52 AM
to H.J. Lu, IA32 System V Application Binary Interface
* H. J. Lu:

> GCC and Clang differ on this. How exactly should ia32 psABI be updated?

I believe _Atomic should change alignment, so the clang (and libstdc++
std::atomic) behavior is the right one.

Thanks,
Florian

H.J. Lu

Jan 2, 2019, 9:48:35 AM
to Florian Weimer, IA32 System V Application Binary Interface
There are 2 issues:

1. How should __atomic_load work with unaligned object?

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87237

Do we need to update psABI for this?

2. How should _Atomic impact alignment of type/object?

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65146

We can update psABI to specify that _Atomic changes alignments of long long
and double to 8 bytes. What should we do with long double?

--
H.J.
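
The second issue is about placement inside aggregates. A rough way to observe it is to compare member offsets with and without _Atomic. This is only a sketch, assuming a C11 compiler; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout question, asked with and without _Atomic on the member. */
struct plain_member  { char c; long long v; };
struct atomic_member { char c; _Atomic long long v; };

size_t plain_member_offset(void)  { return offsetof(struct plain_member, v); }
size_t atomic_member_offset(void) { return offsetof(struct atomic_member, v); }

/* Nonzero when the atomic member lands on an 8-byte boundary in the struct. */
int atomic_member_is_8_aligned(void)
{
    return atomic_member_offset() % 8 == 0;
}

/* _Atomic should never move the member to a smaller offset. */
void check_member_offsets(void)
{
    assert(atomic_member_offset() >= plain_member_offset());
}
```

If atomic_member_is_8_aligned() returns 0 on a 32-bit build, the member is not guaranteed to be suitably aligned for an 8-byte lock-free operation even when the enclosing object is.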

Florian Weimer

Jan 2, 2019, 9:58:28 AM
to H.J. Lu, IA32 System V Application Binary Interface
* H. J. Lu:

> On Wed, Jan 2, 2019 at 12:17 AM Florian Weimer <fwe...@redhat.com> wrote:
>>
>> * H. J. Lu:
>>
>> > GCC and Clang differ on this. How exactly should ia32 psABI be updated?
>>
>> I believe _Atomic should change alignment, so the clang (and libstdc++
>> std::atomic) behavior is the right one.
>>
>> Thanks,
>> Florian
>
> There are 2 issues:
>
> 1. How should __atomic_load work with unaligned object?
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87237
>
> Do we need to update psABI for this?

Personally, I think that's undefined. Is there anything else we could
do?

We could provide a diagnostic if we know at compile time that the
alignment is insufficient. But in most cases (notably the i386 long
long case), that won't be possible. -fsanitize=undefined could have a
check for this, though.

> 2. How should _Atomic impact alignment of type/object?
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65146
>
> We can update psABI to specify that _Atomic changes alignments of long long
> and double to 8 bytes.

Again, I don't think we have much choice there.

> What should we do with long double?

Alignment is increased to 16 bytes?

Thanks,
Florian

Joseph Myers

Jan 2, 2019, 10:57:00 AM
to IA32 System V Application Binary Interface, H.J. Lu
On Wed, 2 Jan 2019, Florian Weimer wrote:

> > What should we do with long double?
>
> Alignment is increased to 16 bytes?

That would imply increasing its size (since it's 12-byte for IA32), which
is not something _Atomic does for any type with GCC at present (and there
have been suggestions that the permission for an atomic type to be larger
than the corresponding non-atomic type should be removed from the C
standard, since it was intended for locks included directly in the atomic
object, which isn't an implementation approach used in practice).

--
Joseph S. Myers
jos...@codesourcery.com
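
The size implication follows from the rule that sizeof is always a multiple of the alignment, since array elements are laid out without gaps. A small sketch of that arithmetic, with a helper name chosen here purely for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align.
   align must be a nonzero power of two. */
size_t round_up_to_alignment(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

With the 12-byte IA32 long double mentioned above, round_up_to_alignment(12, 16) gives 16, so a 16-byte-aligned atomic variant would carry 4 bytes of padding.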

Thiago Macieira

Jan 2, 2019, 11:01:37 AM
to ia32...@googlegroups.com, H.J. Lu, Florian Weimer
On Wednesday, 2 January 2019 12:47:59 -02 H.J. Lu wrote:
> There are 2 issues:
>
> 1. How should __atomic_load work with unaligned object?
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87237

I'd say that for any type that the compiler can't guarantee at compile time
that it will be sufficiently aligned for an atomic instruction, it should
offload to the library and let the library decide. That includes all packed
types.

Florian Weimer

Jan 2, 2019, 11:05:39 AM
to Joseph Myers, IA32 System V Application Binary Interface, H.J. Lu
* Joseph Myers:
Oh. With that future trajectory in mind, we should perhaps make it an
error.

The __atomic built-ins already do not support long double AFAICS.

Thanks,
Florian

Joseph Myers

Jan 2, 2019, 11:08:51 AM
to IA32 System V Application Binary Interface, H.J. Lu
On Wed, 2 Jan 2019, Florian Weimer wrote:

> The __atomic built-ins already do not support long double AFAICS.

It's handled exactly the same way as any atomic type that isn't (1, 2, 4,
8, 16) bytes in size, i.e. calling a libatomic function that uses locks
(and the GCC testsuite verifies that atomic operations on _Atomic long
double and _Atomic _Complex long double work properly).

Florian Weimer

Jan 2, 2019, 11:10:04 AM
to Thiago Macieira, ia32...@googlegroups.com, H.J. Lu
* Thiago Macieira:

> On Wednesday, 2 January 2019 12:47:59 -02 H.J. Lu wrote:
>> There are 2 issues:
>>
>> 1. How should __atomic_load work with unaligned object?
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87237
>
> I'd say that for any type that the compiler can't guarantee at compile time
> that it will be sufficiently aligned for an atomic instruction, it should
> offload to the library and let the library decide. That includes all packed
> types.

We should really avoid any solution which makes __atomic built-ins or
<stdatomic.h> unusable for performance reasons. It would force
programmers to go back to inline assembly, and we absolutely and
categorically do not want that.

Would the above rule require to call into the library for an atomic load
of an int variable which is potentially not 4-byte-aligned?

Thanks,
Florian

Florian Weimer

Jan 2, 2019, 11:18:05 AM
to Joseph Myers, IA32 System V Application Binary Interface, H.J. Lu
* Joseph Myers:

> On Wed, 2 Jan 2019, Florian Weimer wrote:
>
>> The __atomic built-ins already do not support long double AFAICS.
>
> It's handled exactly the same way as any atomic type that isn't (1, 2, 4,
> 8, 16) bytes in size, i.e. calling a libatomic function that uses locks
> (and the GCC testsuite verifies that atomic operations on _Atomic long
> double and _Atomic _Complex long double work properly).

Hmm. I tried this:

long double
f (long double *p)
{
  return __atomic_load_n (p, __ATOMIC_ACQUIRE);
}

And I get:

t.c: In function ‘f’:
t.c:4:3: error: operand type ‘long double *’ is incompatible with argument 1 of ‘__atomic_load_n’
   return __atomic_load_n (p, __ATOMIC_ACQUIRE);
   ^~~~~~

GCC 8.2, x86-64 and i386, with and without -mcx16.

Thanks,
Florian

Thiago Macieira

Jan 2, 2019, 11:39:28 AM
to Florian Weimer, ia32...@googlegroups.com, H.J. Lu
On Wednesday, 2 January 2019 14:10:00 -02 Florian Weimer wrote:
> We should really avoid any solution which makes __atomic built-ins or
> <stdatomic.h> unusable for performance reasons. It would force
> programmers to go back to inline assembly, and we absolutely and
> categorically do not want that.

Agreed, but why would there be a performance reason for correct code? The only
case I can think of is when the compiler doesn't know that a type will be
properly aligned by external means. We ought to teach people to add the
necessary alignas() or __builtin_assume_aligned() so the compiler will know
the detail and thus generate the optimal code.

> Would the above rule require to call into the library for an atomic load
> of an int variable which is potentially not 4-byte-aligned?

For a 4-byte sized variable that is *under-aligned*, yes. For it to be
properly atomic, even if straddling two cachelines, it requires an external
mutex. The compiler could inline the boundary detection if it wanted to (it's
three instructions), or not. It would be up to the implementation.
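
As an illustration of the kind of boundary detection meant here, the check below tests whether an object straddles a cache line. It is only a sketch: the 64-byte line size is an assumption that does not hold for every x86 implementation, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; real code would have to query the CPU. */
#define ASSUMED_CACHE_LINE 64u

/* True when [addr, addr + size) crosses a cache-line boundary. */
bool straddles_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= ASSUMED_CACHE_LINE);
    return (addr % ASSUMED_CACHE_LINE) + size > ASSUMED_CACHE_LINE;
}
```

A 4-byte object at offset 60 within a line still fits (60 + 4 = 64), while one at offset 62 does not and would need the locked path.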

Ditto for __attribute__((packed)) int: if someone went through the trouble of
writing the attribute, they meant for the compiler to use a mutex.

For an *int*, it's UB to be under-aligned.

One thing comes to mind: complex atomics. Should their alignment be doubled
too? Like _Atomic _Complex short which is 4 bytes also getting aligned to 4
bytes, thus avoiding the under-alignment I mentioned above. And if we go this
way, should we also do the same for all custom types, like
std::pair<short,short>?

How about _Atomic _Complex double going to 16-byte alignment? That can be
atomically loaded even on 32-bit using MOVPD, but you can't guarantee SSE
support at runtime. Suggestion: if targeting -msse or higher, use MOVPD; if
not, call the __atomic_load library function and that performs a CPU check.

Joseph Myers

Jan 2, 2019, 12:04:08 PM
to IA32 System V Application Binary Interface, H.J. Lu
On Wed, 2 Jan 2019, Florian Weimer wrote:

> Hmm. I tried this:
>
> long double
> f (long double *p)
> {
>   return __atomic_load_n (p, __ATOMIC_ACQUIRE);
> }

You need to use the generic __atomic_load (or stdatomic.h). To quote
extend.texi:

The @samp{__atomic} builtins can be used with any integral scalar or
pointer type that is 1, 2, 4, or 8 bytes in length. 16-byte integral
types are also allowed if @samp{__int128} (@pxref{__int128}) is
supported by the architecture.

The four non-arithmetic functions (load, store, exchange, and
compare_exchange) all have a generic version as well. This generic
version works on any data type. It uses the lock-free built-in function
if the specific data type size makes that possible; otherwise, an
external call is left to be resolved at run time. This external call is
the same format with the addition of a @samp{size_t} parameter inserted
as the first parameter indicating the size of the object being pointed to.
All objects must be the same size.

(N2329 proposes adding extra stdatomic.h operations, and supporting
existing operations for more types, so that you can get e.g. the
equivalent of the atomic compound assignment operations with any memory
order rather than just seq_cst as at present. Implementing those
efficiently would mean new __atomic*fetch* built-in functions / argument
types supported for existing such functions, thereby e.g. making the
special handling of floating-point exceptions and rounding modes available
through built-in functions.)
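
For reference, the generic form takes a pointer to the destination instead of returning the value. The sketch below uses double rather than long double so that it stays lock-free on common x86 targets; the equivalent long double version may compile to a libatomic call and need -latomic at link time. It assumes the GCC/Clang __atomic built-ins, and the wrapper names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Generic atomic load: works for non-integral types such as double. */
double load_double_acquire(double *p)
{
    double value;
    assert(p != NULL);
    __atomic_load(p, &value, __ATOMIC_ACQUIRE);
    return value;
}

/* Generic atomic store: the new value is passed by address. */
void store_double_release(double *p, double value)
{
    assert(p != NULL);
    __atomic_store(p, &value, __ATOMIC_RELEASE);
}
```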

Joseph Myers

Jan 2, 2019, 12:05:48 PM
to ia32...@googlegroups.com, Florian Weimer, H.J. Lu
On Wed, 2 Jan 2019, Thiago Macieira wrote:

> One thing comes to mind: complex atomics. Should their alignment be doubled
> too? Like _Atomic _Complex short which is 4 bytes also getting aligned to 4

That's the existing rule in GCC for any type that's 2/4/8/16 bytes, when
_Atomic is specified with it. (With the bug we're discussing that this
currently fails to apply to double / long long inside structs on IA32.)

Florian Weimer

Jan 2, 2019, 1:42:34 PM
to Thiago Macieira, ia32...@googlegroups.com, H.J. Lu
* Thiago Macieira:

> On Wednesday, 2 January 2019 14:10:00 -02 Florian Weimer wrote:
>> We should really avoid any solution which makes __atomic built-ins or
>> <stdatomic.h> unusable for performance reasons. It would force
>> programmers to go back to inline assembly, and we absolutely and
>> categorically do not want that.
>
> Agreed, but why would there be a performance reason for correct code?
> The only case I can think of is when the compiler doesn't know that a
> type will be properly aligned by external means. We ought to teach
> people to add the necessary alignas() or __builtin_assume_aligned() so
> the compiler will know the detail and thus generate the optimal code.

On i386, it's unclear what kind of optimizations a compiler can perform
for non-naturally-aligned types, particularly with -fno-strict-aliasing.

>> Would the above rule require to call into the library for an atomic load
>> of an int variable which is potentially not 4-byte-aligned?
>
> For a 4-byte sized variable that is *under-aligned*, yes. For it to be
> properly atomic, even if straddling two cachelines, it requires an
> external mutex. The compiler could inline the boundary detection if it
> wanted to (it's three instructions), or not. It would be up to the
> implementation.
>
> Ditto for __attribute__((packed)) int: if someone went through the trouble of
> writing the attribute, they meant for the compiler to use a mutex.
>
> For an *int*, it's UB to be under-aligned.

So if you have a struct with __attribute__ ((packed)) and take an
address of an int member, you expect it to get a type of int
__attribute__ ((aligned (1))) *? I don't think this is how GCC works
today. Instead, there is -Waddress-of-packed-member. See this bug:

<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51628>

There hasn't been much movement to change the GCC type system to track
pointer alignment, if I read the bug correctly.

I'm worried the run-time check is the only realistic way to implement
what you want, given this constraint. And I really don't think we
should emit a run-time check.

> One thing comes to mind: complex atomics. Should their alignment be doubled
> too? Like _Atomic _Complex short which is 4 bytes also getting aligned to 4
> bytes, thus avoiding the under-alignment I mentioned above.

> And if we go this way, should we also do the same for all custom
> types, like std::pair<short,short>?

We cannot do this for std::pair<short, short> because that would be an
ABI event. I think it would be possible for
std::atomic<std::pair<short, short>>, but at that point, we should start
talking to C++ people. 8-)

> How about _Atomic _Complex double going to 16-byte alignment? That can be
> atomically loaded even on 32-bit using MOVPD,

I've been told that this is not true; there are SSE2 implementations
which tear loads and stores. This is different from loading and storing
8-byte doubles with the FPU.

Thanks,
Florian

Thiago Macieira

Jan 2, 2019, 8:31:48 PM
to Florian Weimer, ia32...@googlegroups.com, H.J. Lu
On Wednesday, 2 January 2019 16:42:30 -02 Florian Weimer wrote:
> * Thiago Macieira:
> > On Wednesday, 2 January 2019 14:10:00 -02 Florian Weimer wrote:
> >> We should really avoid any solution which makes __atomic built-ins or
> >> <stdatomic.h> unusable for performance reasons. It would force
> >> programmers to go back to inline assembly, and we absolutely and
> >> categorically do not want that.
> >
> > Agreed, but why would there be a performance reason for correct code?
> > The only case I can think of is when the compiler doesn't know that a
> > type will be properly aligned by external means. We ought to teach
> > people to add the necessary alignas() or __builtin_assume_aligned() so
> > the compiler will know the detail and thus generate the optimal code.
>
> On i386, it's unclear what kind of optimizations a compiler can perform
> for non-naturally-aligned types, particularly with -fno-strict-aliasing.

I don't see how that affects anything. If the library functions being called
can accept unaligned pointers for a given size, then the functions must have a
codepath that uses mutexes. Whether they implement an additional check for
lock-free atomics if the pointers align is a QoI question. Since it's just
three instructions, two of them macro-fusing (CMP+JBE), I expect it to be a
clear choice for implementations.

Anyway, let's take this struct for the remainder of the discussion:
struct S { short s1, s2; };
// sizeof(S) == 4
// alignof(S) == 2

> >> Would the above rule require to call into the library for an atomic load
> >> of an int variable which is potentially not 4-byte-aligned?
> >
> > For a 4-byte sized variable that is *under-aligned*, yes. For it to be
> > properly atomic, even if straddling two cachelines, it requires an
> > external mutex. The compiler could inline the boundary detection if it
> > wanted to (it's three instructions), or not. It would be up to the
> > implementation.

This was an example of that struct S above: it's 4 bytes but under-aligned for
atomicity. The compiler can't guarantee that it is properly aligned to use MOV
loads and stores. That means it must generate a call to the unaligned, mutex-
locking library functions that load and store, with the optional alignment
detection code.

But if the code was:
alignas(int) S data;

Then the compiler does know it's sufficiently aligned and can use direct MOV.
*Provided* that the library implementation matches the behaviour if it got
called.

Similarly for
S *ptr;
auto p = __builtin_assume_aligned(ptr, 4);
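
Put together, the two hints might look like the following in C11. This is a sketch only: it assumes the GCC/Clang __builtin_assume_aligned built-in, and the helper names are hypothetical. It shows how the alignment information is conveyed, not how any particular compiler then lowers the atomic operation.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct S { short s1, s2; };   /* 4 bytes, but only 2-byte aligned by default */

/* Over-align one object so that it is as aligned as an int. */
alignas(int) struct S aligned_data;

/* Run-time check that a pointer meets the given alignment. */
bool is_aligned_to(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return (uintptr_t)p % alignment == 0;
}

/* Promise the compiler that ptr is 4-byte aligned by external means.
   The promise is unchecked: passing a misaligned pointer is undefined. */
struct S *assume_int_aligned(struct S *ptr)
{
    return __builtin_assume_aligned(ptr, 4);
}
```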

> > Ditto for __attribute__((packed)) int: if someone went through the trouble
> > of writing the attribute, they meant for the compiler to use a mutex.
> >
> > For an *int*, it's UB to be under-aligned.
>
> So if you have a struct with __attribute__ ((packed)) and take an
> address of an int member, you expect it to get a type of int
> __attribute__ ((aligned (1))) *? I don't think this is how GCC works
> today. Instead, there is -Waddress-of-packed-member. See this bug:
>
> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51628>
>
> There hasn't been much movement to change the GCC type system to track
> pointer alignment, if I read the bug correctly.

Yes, I'd expect that the compiler be able to or the developer be forced to
track the alignment/packing of pointers it extracts from packed structures.
x86 may not need it, but other architectures do. I think it's a reasonable
thing to ask.

And if people don't, we have UB that would only trigger on crossing a
cacheline. Maybe the code in question knows that won't happen -- it may be a
32-bit value 3 bytes into a file, which was mmap()ed and therefore known to
never cross a cacheline boundary.

> I'm worried the run-time check is the only realistic way to implement
> what you want, given this constraint. And I really don't think we
> should emit a run-time check.

What's your suggestion? That we always lock? Or that we never do?

> > One thing comes to mind: complex atomics. Should their alignment be
> > doubled
> > too? Like _Atomic _Complex short which is 4 bytes also getting aligned to
> > 4
> > bytes, thus avoiding the under-alignment I mentioned above.
> >
> > And if we go this way, should we also do the same for all custom
> > types, like std::pair<short,short>?
>
> We cannot do this for std::pair<short, short> because that would be an
> ABI event. I think it would be possible for
> std::atomic<std::pair<short, short>>, but at that point, we should start
> talking to C++ people. 8-)

I meant that. And for me, std::atomic<S> should behave the same as and have
the same size and alignment as _Atomic S. In fact, libc++ implements
std::atomic<S> by way of _Atomic S.

> > How about _Atomic _Complex double going to 16-byte alignment? That can be
> > atomically loaded even on 32-bit using MOVPD,

Note: I meant MOVAPD.

> I've been told that this is not true; there are SSE2 implementations
> which tear loads and stores. This is different from loading and storing
> 8-byte doubles with the FPU.

Well, it's probably not worth it then. We could runtime-detect the CPU and match
-march= settings, but it's probably not worth the cost.

One more thing: what about interoperability with a 64-bit application running
on the same CPU? For what primitive types and for what user types do we
guarantee true atomic (lock-free) operations?

Florian Weimer

Jan 3, 2019, 8:09:57 AM
to Thiago Macieira, ia32...@googlegroups.com, H.J. Lu
* Thiago Macieira:

> On Wednesday, 2 January 2019 16:42:30 -02 Florian Weimer wrote:
>> * Thiago Macieira:
>> > On Wednesday, 2 January 2019 14:10:00 -02 Florian Weimer wrote:
>> >> We should really avoid any solution which makes __atomic built-ins or
>> >> <stdatomic.h> unusable for performance reasons. It would force
>> >> programmers to go back to inline assembly, and we absolutely and
>> >> categorically do not want that.
>> >
>> > Agreed, but why would there be a performance reason for correct code?
>> > The only case I can think of is when the compiler doesn't know that a
>> > type will be properly aligned by external means. We ought to teach
>> > people to add the necessary alignas() or __builtin_assume_aligned() so
>> > the compiler will know the detail and thus generate the optimal code.
>>
>> On i386, it's unclear what kind of optimizations a compiler can perform
>> for non-naturally-aligned types, particularly with -fno-strict-aliasing.
>
> I don't see how that affects anything. If the library functions being called
> can accept unaligned pointers for a given size, then the functions must have a
> codepath that uses mutexes. Whether they implement an additional check for
> lock-free atomics if the pointers align is a QoI question. Since it's just
> three instructions, two of them macro-fusing (CMP+JBE), I expect it to be a
> clear choice for implementations.

We are talking about compiler intrinsics that produce single
instructions. Currently, there is no function call overhead, and for
the most common memory orderings, there is not even a fencing
instruction on i386.
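
For concreteness, these are the kinds of operations in question. The sketch assumes C11 <stdatomic.h>; the wrapper names are illustrative. On x86, compilers typically lower both functions to a plain MOV, whereas a sequentially consistent store usually costs an XCHG or a MOV followed by MFENCE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Acquire load: usually a single MOV from memory on x86. */
int load_acquire(_Atomic int *p)
{
    assert(p != NULL);
    return atomic_load_explicit(p, memory_order_acquire);
}

/* Release store: usually a single MOV to memory on x86. */
void store_release(_Atomic int *p, int value)
{
    assert(p != NULL);
    atomic_store_explicit(p, value, memory_order_release);
}
```

Inspecting the output of the compiler with -O2 -S is the simplest way to confirm what a given toolchain emits.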

> Anyway, let's take this struct for the remainder of the discussion:
> struct S { short s1, s2; };
> // sizeof(S) == 4
> // alignof(S) == 2
>
>> >> Would the above rule require to call into the library for an atomic load
>> >> of an int variable which is potentially not 4-byte-aligned?
>> >
>> > For a 4-byte sized variable that is *under-aligned*, yes. For it to be
>> > properly atomic, even if straddling two cachelines, it requires an
>> > external mutex. The compiler could inline the boundary detection if it
>> > wanted to (it's three instructions), or not. It would be up to the
>> > implementation.
>
> This was an example of that struct S above: it's 4 bytes but under-aligned for
> atomicity. The compiler can't guarantee that it is properly aligned to use MOV
> loads and stores. That means it must generate a call to the unaligned, mutex-
> locking library functions that load and store, with the optional alignment
> detection code.
>
> But if the code was:
> alignas(int) S data;
>
> Then the compiler does know it's sufficiently aligned and can use direct MOV.
> *Provided* that the library implementation matches the behaviour if it got
> called.

Does this really work in C? struct S is not an atomic type, and I don't
see a way to perform an atomic load or store on it.

>> > Ditto for __attribute__((packed)) int: if someone went through the trouble
>> > of writing the attribute, they meant for the compiler to use a mutex.
>> >
>> > For an *int*, it's UB to be under-aligned.
>>
>> So if you have a struct with __attribute__ ((packed)) and take an
>> address of an int member, you expect it to get a type of int
>> __attribute__ ((aligned (1))) *? I don't think this is how GCC works
>> today. Instead, there is -Waddress-of-packed-member. See this bug:
>>
>> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51628>
>>
>> There hasn't been much movement to change the GCC type system to track
>> pointer alignment, if I read the bug correctly.
>
> Yes, I'd expect that the compiler be able to or the developer be forced to
> track the alignment/packing of pointers it extracts from packed structures.
> x86 may not need it, but other architectures do. I think it's a reasonable
> thing to ask.

The compiler would have to do this for performance reasons, so that the
__atomic built-ins can fall back to mutexes for unaligned pointers.
With the current type system, there is no reasonable way to tell apart
aligned and unaligned pointers.

But even if we could get the compiler fixed by magic, there would be a
lot of trickiness there. If the pointer uses the wrong alignment, some code
will break silently because the mutex-based approach simply is not an
adequate replacement. It does not work for process-shared mappings or
anything that's aliased at the VM level.
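
The process-shared case can be sketched as follows. Two processes update a counter in a shared anonymous mapping, which is only correct because the operation is a lock-free instruction on the shared memory itself. A fallback lock that lives in each process's private memory (assumed here to be how a library-side mutex would typically work) would not be shared between them. The sketch assumes a POSIX system with MAP_ANONYMOUS; the function name is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent and child each add `increments` to a shared counter.
   Returns the final value, or -1 on failure. */
int shared_counter_demo(int increments)
{
    assert(increments >= 0);

    void *mem = mmap(NULL, sizeof(_Atomic int), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    _Atomic int *counter = mem;   /* page-aligned, so naturally aligned */
    atomic_init(counter, 0);

    pid_t child = fork();
    if (child < 0) {
        munmap(mem, sizeof(_Atomic int));
        return -1;
    }
    if (child == 0) {
        for (int i = 0; i < increments; i++)
            atomic_fetch_add(counter, 1);
        _exit(0);
    }

    for (int i = 0; i < increments; i++)
        atomic_fetch_add(counter, 1);

    int status;
    int result = -1;
    if (waitpid(child, &status, 0) == child)
        result = atomic_load(counter);

    munmap(mem, sizeof(_Atomic int));
    return result;
}
```

With lock-free atomics, shared_counter_demo(1000) returns 2000; no such guarantee would exist if each process serialised its updates through its own private lock.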

>> I'm worried the run-time check is the only realistic way to implement
>> what you want, given this constraint. And I really don't think we
>> should emit a run-time check.
>
> What's your suggestion? That we always lock? Or that we never do?

Undefined behavior if the pointer is not naturally aligned, with both
<stdatomic.h> and __atomic built-ins.

Thanks,
Florian

H.J. Lu

Jan 3, 2019, 11:49:12 AM
to Florian Weimer, Thiago Macieira, IA32 System V Application Binary Interface
Could someone send me a patch against hjl/x86/master branch?

Thanks.

--
H.J.

Thiago Macieira

Jan 3, 2019, 1:56:27 PM
to Florian Weimer, ia32...@googlegroups.com, H.J. Lu
On Thursday, 3 January 2019 11:09:53 -02 Florian Weimer wrote:
> Undefined behavior if the pointer is not naturally aligned, with both
> <stdatomic.h> and __atomic built-ins.

Ok, makes sense.

I'd still suggest that if the pointer is to a packed or underaligned type,
then mutexes should be used. Explicitly marked as such.

That still leaves the question on what to do for std::atomic<FourBytesStruct>.

Jonathan Wakely

Mar 20, 2019, 4:44:30 PM
to IA32 System V Application Binary Interface
libstdc++ std::atomic doesn't always increase the alignment. For this example it is consistent with GCC's _Atomic, and that doesn't match Clang's _Atomic:

#include <stdio.h>

typedef struct { char d[3]; } three;

int main()
{
  printf("%zu %zu\n", sizeof(three), _Alignof(three));
  printf("%zu %zu\n", sizeof(_Atomic three), _Alignof(_Atomic three));
}

Clang increases the alignment to 4, making it lock-free. GCC doesn't.

For this example GCC and Clang seem to disagree on all architectures, not just x86 (I tested x86_64, ARM, aarch64 and power64).

Obviously that goes beyond the scope of the IA-32 psABI, but any decision here should consider this case too, as a starting point to fixing it everywhere.
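
One rule that would be consistent with the Clang result reported above is to round the size of the atomic type up to the next power of two, as long as that stays within some maximum lock-free width, and to use the rounded size as the alignment. The sketch below encodes that rule as an assumption for discussion; it is not taken from either compiler's documentation, and both the function name and the max_promoted parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Size an _Atomic type would get under a "promote to power of two" rule.
   Sizes whose next power of two exceeds max_promoted are left unchanged
   and would be handled with locks instead. */
size_t promoted_atomic_size(size_t size, size_t max_promoted)
{
    assert(size > 0);
    size_t p = 1;
    while (p < size)
        p <<= 1;
    return p <= max_promoted ? p : size;
}
```

Under this rule with a 16-byte cap, the 3-byte struct becomes 4 bytes, a 12-byte long double becomes 16 bytes, and a 24-byte struct is left alone.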

John McCall

Sep 11, 2019, 10:10:31 PM
to IA32 System V Application Binary Interface
This conversation appears to have died off, but the ABI document doesn't seem to have been updated, and it seems like both Clang and GCC are still waiting for a decision.  I'll add that, for better or worse, the x86 psABI group has quite a lot of influence here — it's likely that other ABIs will generally follow your lead.

There are two specific questions here:

1. For any complete object type T, what are the size and alignment of _Atomic(T)?  If padding bytes are included (presumably always after the value), are there assumed requirements on their contents (e.g. always being zero)?

2. Under what circumstances is the compiler permitted to inline an atomic operation (whether an intrinsic or a language-directed operation on _Atomic(T)) instead of calling into the runtime?  Or to put it another way, under what circumstances is the atomics runtime required to perform the operation in a way that correctly interoperates with obvious inlined code sequences?  (A runtime which always acquires a lock and performs byte-by-byte copies, even when the atomic object is aligned, would not permit this.)

Speaking as a Darwin maintainer (and Darwin has already decided its ABI here), I think that over-aligning and (if necessary) padding objects up to some fixed limit is sensible and clearly within the scope of what the C committee intended.  However, I can understand why you would be reluctant to make changes that would break ABI for some existing code.  Ultimately, all I really want is a decision that we in the compilers community can finally act on.

John.