Question about atomic code emitted by GCC


Roger Ferrer Ibanez

Aug 9, 2018, 11:13:41 AM
to RISC-V SW Dev
Hi all,

apologies if my question looks a bit naive; I'm definitely not an expert
in memory models.

I was toying with the code that riscv64-unknown-linux-gnu GCC 8.1
generates for the following C11 program:

-- atomic.c
#include <stdatomic.h>

atomic_int x;
void foo(void) { atomic_fetch_add_explicit(&x, 5, memory_order_acq_rel); }

-- end of atomic.c

For the atomic fetch and add, GCC emits this sequence

fence iorw,ow; amoadd.w.aq zero,a4,0(a5)

Reading the GCC source, the first fence implements the release semantics.
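At the C level, GCC's split sequence behaves roughly like a release fence followed by an acquire read-modify-write. The sketch below writes both forms in portable C11; `foo_acq_rel` and `foo_split` are hypothetical names (unlike `foo` above, they return the old value so they can be checked). The comments show the RISC-V instructions involved, assuming the C/C++ mapping table in the ISA manual's memory-model appendix for the first form and the GCC 8.1 output reported in this thread for the second. As I read the C11 fence rules, the fence form gives at least the same guarantees for `x`, but it also orders earlier accesses before every later atomic store, which is why it is the heavier option.

```c
#include <stdatomic.h>

atomic_int x;

/* One acq_rel read-modify-write. Under the appendix mapping this can
   be a single instruction:
       amoadd.w.aqrl zero,a4,0(a5)                                     */
int foo_acq_rel(int delta)
{
    return atomic_fetch_add_explicit(&x, delta, memory_order_acq_rel);
}

/* Hypothetical C-level analogue of the sequence GCC 8.1 emitted:
       fence iorw,ow          <- release half, done with a fence
       amoadd.w.aq ...        <- acquire half, done with the aq bit
   The release fence orders all earlier accesses before every later
   atomic store, not only this one, so it is stronger than the rl bit. */
int foo_split(int delta)
{
    atomic_thread_fence(memory_order_release);
    return atomic_fetch_add_explicit(&x, delta, memory_order_acquire);
}
```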

My question is, why didn't GCC just choose to emit?

amoadd.w.aq.rl zero,a4,0(a5)

(no fence intended :)
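For reference, here is a sketch of how each C11 memory order on a fetch-and-add lines up with the aq/rl bits, assuming the C/C++ mapping table in the memory-model appendix of the ISA manual. The wrapper names are hypothetical. Note that the assembler spells the combined suffix `.aqrl`, so the single-instruction form asked about above is written `amoadd.w.aqrl`.

```c
#include <stdatomic.h>

atomic_int counter;

/* Each wrapper performs the same fetch-and-add with a different memory
   order; the comment gives the AMO the appendix mapping suggests.     */

/* amoadd.w */
int add_relaxed(int v) { return atomic_fetch_add_explicit(&counter, v, memory_order_relaxed); }

/* amoadd.w.aq */
int add_acquire(int v) { return atomic_fetch_add_explicit(&counter, v, memory_order_acquire); }

/* amoadd.w.rl */
int add_release(int v) { return atomic_fetch_add_explicit(&counter, v, memory_order_release); }

/* amoadd.w.aqrl */
int add_acq_rel(int v) { return atomic_fetch_add_explicit(&counter, v, memory_order_acq_rel); }

/* amoadd.w.aqrl (same instruction as acq_rel) */
int add_seq_cst(int v) { return atomic_fetch_add_explicit(&counter, v, memory_order_seq_cst); }
```

None of these entries needs a separate FENCE, and none touches the I/O domain.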

Is this because we still have to order both the device IO/memory before
the AMO itself? I infer this from 7.1p2 of the current RISC-V ISA[1]
draft (emphasis mine)

> To provide more efficient support for release consistency [10], each
atomic instruction has two bits, aq and rl, used to specify additional
memory ordering constraints as viewed by other RISC-V harts. **The bits
order accesses to one of the two address domains, memory or I/O,
depending on which address domain the atomic instruction is accessing.**

My understanding is that fence is the only instruction that can order the two
domains. I infer that from 7.1p1 (emphasis mine)

> The base RISC-V ISA has a relaxed memory model, with the FENCE
instruction used to impose additional ordering constraints. The address
space is divided by the execution environment into memory and I/O
domains, and the FENCE instruction provides options to order accesses to
one or **both of these two address domains**.

But then I'm mildly confused by a later note in 7.3

> "The AMOs were designed to implement the C11 and C++11 memory models
efficiently. Although the FENCE R, RW instruction suffices to implement
the acquire operation and FENCE RW, W suffices to implement release,
both imply additional unnecessary ordering as compared to AMOs with the
corresponding aq or rl bit set."

The overall message makes sense to me, but I'm confused about why the spec
suggests "fence rw, w" and not "fence iorw, ow" (looks like GCC extends
'r' to include 'i' and 'w' to include 'o').
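To make the FENCE R,RW / FENCE RW,W wording concrete, here is a minimal message-passing sketch: a producer publishes a payload and then raises a flag with a release store, and a consumer reads the flag with an acquire load. The names `publish`, `try_consume`, `payload` and `ready` are hypothetical. The comments give the fence-based sequences that, as far as I know, the appendix's mapping table lists for release stores and acquire loads; only the memory bits (r, w) appear, never i or o.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical message-passing pair: a consumer that observes
   ready == 1 must also observe the payload written before it.  */
static int payload;
static atomic_int ready;

void publish(int value)
{
    payload = value;                                       /* plain store */
    /* Release store; appendix mapping:   fence rw,w ; sw */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Returns true and writes the payload to *out once it is visible. */
bool try_consume(int *out)
{
    /* Acquire load; appendix mapping:    lw ; fence r,rw */
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;        /* ordered after the acquire load */
    return true;
}
```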

Thank you very much,

[1]
https://github.com/riscv/riscv-isa-manual/releases/tag/draft-20180808-ce5e74a

--
Roger Ferrer Ibáñez - roger....@bsc.es
Barcelona Supercomputing Center - Centro Nacional de Supercomputación


Daniel Lustig

Aug 9, 2018, 5:47:13 PM
to Roger Ferrer Ibanez, RISC-V SW Dev
On 8/9/2018 8:13 AM, Roger Ferrer Ibanez wrote:
> Hi all,
>
> apologies if my question looks a bit naive; I'm definitely not an
> expert in memory models.
>
> I was toying with the code that riscv64-unknown-linux-gnu GCC 8.1
> generates for the following C11 program:
>
> -- atomic.c
> #include <stdatomic.h>
>
> atomic_int x;
> void foo(void) { atomic_fetch_add_explicit(&x, 5, memory_order_acq_rel); }
>
> -- end of atomic.c
>
> For the atomic fetch and add, GCC emits this sequence
>
> fence iorw,ow; amoadd.w.aq zero,a4,0(a5)
>
> Reading the GCC source, the first fence implements the release
> semantics.
>
> My question is, why didn't GCC just choose to emit?
>
> amoadd.w.aq.rl zero,a4,0(a5)
>
> (no fence intended :)

amoadd.w.aq.rl zero,a4,0(a5) looks sensible to me. I don't know why
GCC would emit a fence iorw,ow there. It's not wrong, but it's
overkill. The C memory model doesn't require I/O ordering to be
enforced for every synchronization operation. (And even if it did,
there's no reason it would be asymmetric like that anyway.)
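Since C11 synchronization only has to order ordinary memory, code that talks to memory-mapped devices usually asks for I/O ordering explicitly rather than relying on the atomics. A minimal sketch, assuming a GCC-compatible compiler for the `__asm__` statement; `io_full_fence` and `mmio_write32` are hypothetical helpers, and on non-RISC-V targets the fallback is only an approximation of device ordering.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical full fence covering both address domains. On RISC-V
   this is "fence iorw,iorw"; elsewhere fall back to a seq_cst thread
   fence, which is NOT guaranteed to order device I/O.                */
static inline void io_full_fence(void)
{
#if defined(__riscv)
    __asm__ volatile("fence iorw,iorw" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Hypothetical MMIO write: `reg` would normally point at a device
   register; for testing it can point at ordinary memory.           */
void mmio_write32(volatile uint32_t *reg, uint32_t value)
{
    io_full_fence();   /* order earlier memory and I/O accesses first */
    *reg = value;
}
```

Keeping the I/O fence in a separate helper means ordinary atomics can use the cheap aq/rl mapping, and only device code pays for the stronger fence.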

> Is this because we still have to order both the device IO/memory
> before the AMO itself? I infer this from 7.1p2 of the current RISC-V
> ISA[1] draft (emphasis mine)
>
>> To provide more efficient support for release consistency [10],
>> each atomic instruction has two bits, aq and rl, used to specify
>> additional memory ordering constraints as viewed by other RISC-V
>> harts. **The bits order accesses to one of the two address domains,
>> memory or I/O, depending on which address domain the atomic
>> instruction is accessing.**
>
> My understanding is that fence is the only instruction that can order the two
> domains. I infer that from 7.1p1 (emphasis mine)

Agreed.

>> The base RISC-V ISA has a relaxed memory model, with the FENCE
>> instruction used to impose additional ordering constraints. The
>> address space is divided by the execution environment into memory
>> and I/O domains, and the FENCE instruction provides options to
>> order accesses to one or **both of these two address domains**.
>
> But then I'm mildly confused by a later note in 7.3
>
>> "The AMOs were designed to implement the C11 and C++11 memory
>> models efficiently. Although the FENCE R, RW instruction suffices
>> to implement the acquire operation and FENCE RW, W suffices to
>> implement release, both imply additional unnecessary ordering as
>> compared to AMOs with the corresponding aq or rl bit set."
>
> The overall message makes sense to me, but I'm confused about why the spec
> suggests "fence rw, w" and not "fence iorw, ow" (looks like GCC
> extends 'r' to include 'i' and 'w' to include 'o').

The "additional unnecessary ordering" referenced there is in the
sense described in Appendix A.3.7.

-Dan

Jim Wilson

Aug 9, 2018, 6:24:29 PM
to Daniel Lustig, Roger Ferrer Ibanez, RISC-V SW Dev
On Thu, Aug 9, 2018 at 2:47 PM, Daniel Lustig <dlu...@nvidia.com> wrote:
> On 8/9/2018 8:13 AM, Roger Ferrer Ibanez wrote:
>> For the atomic fetch and add, GCC emits this sequence
>> fence iorw,ow; amoadd.w.aq zero,a4,0(a5)

I doubt that anyone has looked at the GCC atomic or memory model
support in a long time. We know that we have a problem with sub-word
atomics that are currently emitted as function calls, but need to be
inline expanded instead to solve a number of problems. In this case,
GCC is emitting a fence to get release semantics. It appears that GCC
has no support for adding .rl to an amo* instruction, which looks odd,
but I don't know the history here. It always uses a fence instead.
GCC does have support for adding .aq to an amo* instruction to get
acquire semantics. Maybe they were just trying to be conservative
before the memory model was developed, and the mapping to the C/C++
language standards was defined. Anyway, this looks like another
atomic problem that needs to be added to the list of things to fix.
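For concreteness, "sub-word atomics" are operations on 8- or 16-bit atomic objects like the ones below. The A extension only provides 32- and 64-bit AMOs, so the compiler must either call a library routine (what GCC did at the time, per the message above; linking with `-latomic` may be needed) or expand the operation inline, for example as an LR/SC loop on the containing aligned word with masking. The object and function names here are hypothetical.

```c
#include <stdatomic.h>

/* 8- and 16-bit atomic counters: narrower than the 32-bit minimum
   width of RISC-V AMOs, hence "sub-word".                          */
atomic_uchar  small_counter;
atomic_ushort medium_counter;

unsigned char bump8(void)
{
    /* Unsigned atomic arithmetic wraps on overflow (255 + 1 -> 0). */
    return atomic_fetch_add_explicit(&small_counter, 1, memory_order_acq_rel);
}

unsigned short bump16(void)
{
    return atomic_fetch_add_explicit(&medium_counter, 1, memory_order_acq_rel);
}
```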

Jim

Roger Ferrer Ibanez

Aug 10, 2018, 3:30:07 AM
to Daniel Lustig, RISC-V SW Dev
>> My question is, why didn't GCC just choose to emit?
>>
>> amoadd.w.aq.rl zero,a4,0(a5)
>>
>> (no fence intended :)
>
> amoadd.w.aq.rl zero,a4,0(a5) looks sensible to me. I don't know why
> GCC would emit a fence iorw,ow there. It's not wrong, but it's
> overkill. The C memory model doesn't require I/O ordering to be
> enforced for every synchronization operation. (And even if it did,
> there's no reason it would be asymmetric like that anyway.)
>

OK, I see. I thought this was trying to be very conservative, to make it
usable even in contexts with memory-mapped I/O. As you point out, it
would fail to do so if that had been the goal in the first place.

>
> The "additional unnecessary ordering" referenced there is in the
> sense described in Appendix A.3.7.

Oh, I missed that appendix.

Thank you, Dan.

Roger Ferrer Ibanez

Aug 10, 2018, 3:39:04 AM
to Jim Wilson, Daniel Lustig, RISC-V SW Dev

> acquire semantics. Maybe they were just trying to be conservative
> before the memory model was developed, and the mapping to the C/C++
> language standards was defined.

Sure, I also presumed that, given this was in flux, GCC tried to do
something "future-proof".

Thanks, Jim