implementation of sync.atomic primitives

shiv...@yelp.com

Mar 19, 2018, 12:47:54 AM

to golang-nuts

I noticed that internally, the language implementation seems to rely on the atomicity of reads to single-word values:

https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/runtime/chan.go#L160

As I understand it, this atomicity is provided by the cache coherence algorithms of modern architectures. Accordingly, the implementations in sync/atomic of word-sized loads (e.g., LoadUint32 on 386 and LoadUint64 on amd64) use ordinary MOV instructions:

https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/sync/atomic/asm_386.s#L146

https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/sync/atomic/asm_amd64.s#L103

However, word-sized stores on these architectures use special instructions:

https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/sync/atomic/asm_amd64.s#L133

Given that the APIs being implemented don't provide any global ordering guarantees, what's the reason they can't be implemented solely with MOV?

Thanks very much for your time.

Ian Lance Taylor

Mar 19, 2018, 1:55:07 AM

to shiv...@yelp.com, golang-nuts

On Sun, Mar 18, 2018 at 9:47 PM, shivaram via golang-nuts
<golan...@googlegroups.com> wrote:
>
> I noticed that internally, the language implementation seems to rely on the
> atomicity of reads to single-word values:
>
> https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/runtime/chan.go#L160

At the machine level, words like "atomicity" are overloaded with
different meanings. I think what you are saying is that the runtime
package is assuming that a load of a machine word will never read an
interleaving of two different stores of a machine word. It will always
read the value written by a single store, though exactly which store
it sees is unknown. This is true on all the processors that Go
supports.

> As I understand it, this atomicity is provided by the cache coherence
> algorithms of modern architectures. Accordingly, the implementations in
> sync/atomic of word-sized loads (e.g., LoadUint32 on 386 and LoadUint64 on
> amd64) use ordinary MOV instructions:
>
> https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/sync/atomic/asm_386.s#L146
>
> https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/sync/atomic/asm_amd64.s#L103
>
> However, word-sized stores on these architectures use special instructions:
>
> https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/sync/atomic/asm_amd64.s#L133
>
> Given that the APIs being implemented don't provide any global ordering
> guarantees, what's the reason they can't be implemented solely with MOV?

You are not giving the correct reason for why atomic.LoadUint32 and
LoadUint64 can use ordinary MOV instructions on x86 processors. The
LoadUint32, etc., functions guarantee much more than that they read a
value that is not an interleaving of multiple writes. They are also
load-acquire operations, meaning that when the function completes, the
caller will see not only the value that was loaded but also all other
values that some other processor core wrote before writing to the
address being loaded (assuming the write was done using StoreUint32,
etc.). It happens that on x86 you can implement load-acquire using a
simple MOV instruction. Most other multicore processors use a more
complex memory model, and their sync/atomic implementations are
accordingly more complex.

Ian
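
The load-acquire behaviour described above can be sketched as a small message-passing example. This is only an illustration, not code from the runtime: the `mailbox` type and its `publish` and `consume` methods are hypothetical names. The payload is written with an ordinary store, and only the flag goes through sync/atomic. Once the reader's atomic load observes the flag, it is also expected to observe the payload written before the flag was stored.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// mailbox is a hypothetical type: payload is accessed with ordinary
// loads and stores, ready only through sync/atomic.
type mailbox struct {
	payload int
	ready   uint32
}

// publish writes the payload and then sets the flag with an atomic store.
func (m *mailbox) publish(v int) {
	m.payload = v
	atomic.StoreUint32(&m.ready, 1)
}

// consume spins until the atomic load observes the flag, then reads the
// payload. The read is ordered after the writer's store to payload because
// the flag was both stored and loaded through sync/atomic.
func (m *mailbox) consume() int {
	for atomic.LoadUint32(&m.ready) == 0 {
		runtime.Gosched() // let the publisher run
	}
	return m.payload
}

func main() {
	m := &mailbox{}
	go m.publish(42)
	fmt.Println(m.consume()) // prints 42
}
```

If the flag were read and written with plain loads and stores instead, nothing would order the two accesses to `payload`, and the race detector would be expected to report the program.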

shiv...@yelp.com

Mar 19, 2018, 8:46:56 AM

to golang-nuts

Thanks, this was very helpful. If I understand correctly:

1. These ordering guarantees are part of the Go memory model? I couldn't find explicit descriptions of them in these pages:

https://golang.org/ref/mem

https://golang.org/pkg/sync/atomic/

2. The property that word-sized values are not subject to interleaving/tearing is an implementation detail, rather than a guarantee of the Go memory model?

thepud...@gmail.com

Mar 19, 2018, 11:52:19 AM

to golang-nuts

Hi Shivaram,

Regarding the memory model definition and sync/atomic, there is this github issue that you likely would be interested in:

issue #5045 "doc: define how sync/atomic interacts with memory model"

Including this comment:

=====================================================

https://github.com/golang/go/issues/5045#issuecomment-252730563

=====================================================

rsc commented on Oct 10, 2016:

Yes, I spent a while on this last winter but didn't get a chance to write it up properly yet. The short version is that I'm fairly certain the rules will be that Go's atomics guarantee sequential consistency among the atomic variables (behave like C/C++'s seqconst atomics), and that you shouldn't mix atomic and non-atomic accesses for a given memory word.

=====================================================

--thepudds
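
One way to see what sequential consistency among atomic variables would rule out is the classic "store buffering" litmus test, sketched below. The function names are hypothetical. Each goroutine stores 1 to its own variable and then loads the other one. If the atomics are sequentially consistent, at least one goroutine must see the other's store, so both loads returning 0 should never happen. It may also hint at why the amd64 stores are not plain MOVs: on x86 a plain store followed by a plain load of a different address can be reordered through the store buffer, which would permit exactly that outcome.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// storeBufferTrial runs one round of the litmus test and returns the
// value each goroutine loaded from the other goroutine's variable.
func storeBufferTrial() (r1, r2 uint32) {
	var x, y uint32
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		atomic.StoreUint32(&x, 1)
		r1 = atomic.LoadUint32(&y)
	}()
	go func() {
		defer wg.Done()
		atomic.StoreUint32(&y, 1)
		r2 = atomic.LoadUint32(&x)
	}()
	wg.Wait() // orders the writes to r1 and r2 before the return
	return r1, r2
}

// countBothZero repeats the trial n times and counts the rounds in which
// both loads returned 0, the outcome sequential consistency forbids.
func countBothZero(n int) int {
	bad := 0
	for i := 0; i < n; i++ {
		if r1, r2 := storeBufferTrial(); r1 == 0 && r2 == 0 {
			bad++
		}
	}
	return bad
}

func main() {
	fmt.Println("rounds with r1 == 0 && r2 == 0:", countBothZero(100000))
}
```

A count of zero is consistent with the rule rsc describes, but a test like this can only fail to find a counterexample; it does not prove the guarantee.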

thepud...@gmail.com

Mar 19, 2018, 12:30:39 PM

to golang-nuts

Hi Ian,

I know you were not giving any type of definitive treatise on how go treats atomics across different processors...

but is a related aspect restricting instruction reordering by the compiler itself?

I don't know what the modern Go compiler does at this point, but I think that at least circa Go 1.5 there was a nop function that seemed to be used to keep the compiler from inlining and then reordering instructions (first snippet below). I think I've also seen you make related comments more recently (e.g., the FreeBSD atomics discussion snippet included at the end of this post).

I haven't followed the more recent atomics-related changes (it seems that 1.10 might have included some work on intrinsics, such as CL 28076: "cmd/compile: intrinsify sync/atomic for amd64"?)...

And yes, on the one hand the answer is "respect the memory model and get a clean report from the race detector, etc., etc."... but of course sometimes the performance of the current compiler matters beyond mere curiosity about how the Go compiler does what it does (performance was the context in which I had looked at this more closely in the past).

Two related snippets:

====================================================

from go 1.5 https://github.com/golang/go/blob/release-branch.go1.5/src/runtime/atomic_amd64x.go#L11

====================================================

// The calls to nop are to keep these functions from being inlined.
// If they are inlined we have no guarantee that later rewrites of the
// code by optimizers will preserve the relative order of memory accesses.

//go:nosplit
func atomicload(ptr *uint32) uint32 {
	nop()
	return *ptr
}

====================================================

Ian Lance Taylor response to question on FreeBSD atomics discussion on golang-dev: https://groups.google.com/forum/#!topic/golang-dev/f3PS8hp4Jfs

====================================================

> The second issue I have is translating FreeBSD atomic operations to runtime
> atomic ops.
> If I understand it correctly then atomic_load_acq_32 has weaker requirements
> compared to runtime/internal/atomic.Load.
> On x86 the FreeBSD variant is just a compiler barrier to prevent it
> re-ordering instructions.

The Go compiler does reorder instructions. But it doesn't reorder
instructions across a non-inlined function call. On x86 a simple
memory load suffices for atomic.Load because x86 has a fairly strict
memory order in any case. Most other processors are more lenient, and
require more work in the atomic operation.

====================================================

--thepudds

Keith Randall

Mar 19, 2018, 7:54:29 PM

to golang-nuts

On Monday, March 19, 2018 at 9:30:39 AM UTC-7, thepud...@gmail.com wrote:

> Hi Ian,
>
> I know you were not giving any type of definitive treatise on how go treats atomics across different processors...
>
> but is a related aspect restricting instruction reordering by the compiler itself?

Yes, the compiler needs to treat atomic loads differently from normal loads with respect to any instruction reordering it does. So although *p and atomic.LoadUint32(p) both compile to a single MOVL on amd64, internally the compiler represents those two operations differently.
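
A small sketch of where that difference becomes visible at the source level (the function name `spinUntilStopped` is hypothetical): in a loop that polls a flag, the atomic load has to be performed on every iteration, whereas a plain `*stop` read of a variable the loop never writes is something an optimizer would be allowed to hoist out of the loop. The sketch assumes either more than one CPU or a Go release with asynchronous preemption, since the loop body contains no function call that could yield.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// spinUntilStopped counts loop iterations until another goroutine sets
// *stop to a non-zero value. The flag is read with atomic.LoadUint32, so
// the compiler has to emit a fresh load on every iteration.
func spinUntilStopped(stop *uint32) (iterations uint64) {
	for atomic.LoadUint32(stop) == 0 {
		iterations++
	}
	return iterations
}

func main() {
	var stop uint32
	go func() {
		time.Sleep(10 * time.Millisecond)
		atomic.StoreUint32(&stop, 1)
	}()
	fmt.Println("iterations before stop:", spinUntilStopped(&stop))
}
```

Replacing the atomic load with `*stop == 0` would make the program racy, and whether it then terminates would depend on what the compiler chooses to do with the load.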

Ian Lance Taylor

Mar 19, 2018, 8:59:15 PM

to shiv...@yelp.com, golang-nuts

On Mon, Mar 19, 2018 at 5:46 AM, shivaram via golang-nuts
<golan...@googlegroups.com> wrote:
>
> 2. The property that word-sized values are not subject to
> interleaving/tearing is an implementation detail, rather than a guarantee of
> the Go memory model?

My impression is that that is guaranteed by the Go memory model. But
the runtime package is in some sense not subject to the Go memory
model, since it is responsible for implementing the Go memory model.

Ian
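
The non-tearing property discussed here concerns single machine words. For a value that spans several words, one option is to publish whole snapshots through `atomic.Value`. The sketch below assumes a hypothetical two-field `pair` type whose fields must always match; a writer stores matching pairs while a reader checks every snapshot it loads.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pair is a hypothetical two-word value whose fields must always agree.
type pair struct{ a, b int }

// countTornReads has one writer publish matching pairs through an
// atomic.Value while the caller's goroutine loads snapshots, and returns
// the number of snapshots whose two fields disagree.
func countTornReads(writes int) int {
	var current atomic.Value
	current.Store(pair{0, 0})

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 1; i <= writes; i++ {
			current.Store(pair{i, i}) // each Store publishes a complete pair
		}
	}()

	torn := 0
	for {
		p := current.Load().(pair) // a whole snapshot, never a mix of two
		if p.a != p.b {
			torn++
		}
		if p.a == writes { // the final value has been observed
			break
		}
	}
	wg.Wait()
	return torn
}

func main() {
	fmt.Println("torn snapshots:", countTornReads(100000))
}
```

Assigning a `pair` variable directly from two goroutines would be a data race, and nothing in the single-word property would stop a reader from seeing `a` from one write and `b` from another.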

shiv...@yelp.com

Mar 20, 2018, 2:03:40 PM

to golang-nuts

The race detector in v1.10 considers unsynchronized reads and writes on the `int` and `bool` types to be races:

https://gist.github.com/slingamn/886ebeba32f04294028cf0a60a8cc8c0

Are these instances of the race detector being stricter than the memory model?

Caleb Spare

Mar 20, 2018, 2:18:46 PM

to shiv...@yelp.com, golang-nuts

At risk of causing more confusion, here's my understanding of the
situation after observing a lot of discussion about the memory model
over the years.

There are at least two different things one might mean when referring
to the Go memory model:

1. The written contract as specified by the language spec, the sync
and sync/atomic documentation, and https://golang.org/ref/mem.
2. The much stricter, unwritten (AFAIK), and possibly fuzzier contract
which the Go team tries to implement, which holds that races which
violate (1) shouldn't be interpreted as license for the
compiler/runtime to do arbitrarily bad things, as has been the case in
C/C++-world.

For the most part the race detector implements checks for things that
violate (1) plus some reasonable assumptions about things which ought
to be part of (1) but haven't been written down formally yet (such as
https://github.com/golang/go/issues/5045).

When Ian wrote "My impression is that that is guaranteed by the Go
memory model." I think he was referring to (2). It's certainly
intentional that data races on even word-sized values on amd64 are
considered faulty according to (1) and the race detector.



Ian Lance Taylor

Mar 20, 2018, 3:19:48 PM

to Shivaram Lingamneni, golang-nuts

On Tue, Mar 20, 2018 at 11:03 AM, shivaram via golang-nuts
<golan...@googlegroups.com> wrote:
>
> The race detector in v1.10 considers unsynchronized reads and writes on the
> `int` and `bool` types to be races:
>
> https://gist.github.com/slingamn/886ebeba32f04294028cf0a60a8cc8c0
>
> Are these instances of the race detector being stricter than the memory
> model?

No, that example is a race. In what way is it not a race?

I think that the Go memory model guarantees that a read will see
either one write or the other, not an interleaving of the writes. But
that doesn't mean that it's not a race. The memory model explains
what a race is, and it pretty clearly includes cases like your
example.

Ian
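
The point can be sketched with a hypothetical pair of functions; this is not the code from the gist. `racyRead` has the shape the race detector reports: a write and a read of the same `int` with nothing ordering them. `atomicRead` performs the same accesses through sync/atomic. Both may return either 0 or 1, so the no-tearing property holds in each, but only the second is free of a data race.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// racyRead is not called from main. Running it under `go run -race` is
// expected to produce a report, because the write in the goroutine and
// the read below are concurrent and unsynchronized.
func racyRead() int {
	n := 0
	done := make(chan struct{})
	go func() {
		n = 1 // unsynchronized write
		close(done)
	}()
	v := n // unsynchronized read: sees 0 or 1, but it is still a data race
	<-done
	return v
}

// atomicRead performs the same write and read through sync/atomic, which
// removes the data race. The result is still either 0 or 1.
func atomicRead() int64 {
	var n int64
	done := make(chan struct{})
	go func() {
		atomic.StoreInt64(&n, 1)
		close(done)
	}()
	v := atomic.LoadInt64(&n)
	<-done
	return v
}

func main() {
	fmt.Println("atomic read saw:", atomicRead())
}
```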

shiv...@yelp.com

Mar 21, 2018, 10:49:06 AM

to golang-nuts

I saw John Regehr's definitions [1] of "data race" versus "race condition", which were helpful. If I've understood correctly:

a. These examples have "data races", as defined by the race detector documentation [2]: "a data race occurs when two goroutines access the same variable concurrently and at least one of the accesses is a write."

b. but they don't have "race conditions", inasmuch as (according to a version of the Go memory model that includes the non-interleaving guarantee) the semantics are the same as those of a program where the same reads and writes are performed with the sync/atomic primitives, or under a mutex.

[1] https://blog.regehr.org/archives/490

[2] https://golang.org/doc/articles/race_detector.html
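
A rough illustration of the distinction, using hypothetical function names: the counter below is only ever touched through sync/atomic, so neither variant has a data race. The variant that increments with a separate load and store can still lose updates when two goroutines load the same value, which is a race condition in Regehr's sense. The variant that uses a single atomic add cannot.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// incrementSplit bumps the counter with a separate atomic load and atomic
// store. There is no data race, but two goroutines can load the same value
// and one of the two increments is then lost.
func incrementSplit(c *int64) {
	v := atomic.LoadInt64(c)
	atomic.StoreInt64(c, v+1)
}

// incrementAdd bumps the counter with one atomic read-modify-write.
func incrementAdd(c *int64) {
	atomic.AddInt64(c, 1)
}

// run calls inc perWorker times from each of workers goroutines and
// returns the final counter value.
func run(inc func(*int64), workers, perWorker int) int64 {
	var c int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				inc(&c)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&c)
}

func main() {
	fmt.Println("split load/store:", run(incrementSplit, 8, 10000)) // may be less than 80000
	fmt.Println("atomic add:      ", run(incrementAdd, 8, 10000))   // 80000
}
```

Both runs should be clean under the race detector, since it looks for data races rather than for lost updates of this kind.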
