Atomic pointer operations and unsafe

4,661 views
Skip to first unread message

Dmitry Vyukov

unread,
Jul 21, 2014, 5:20:34 PM7/21/14
to golang-dev, Rob 'Commander' Pike
Regarding https://codereview.appspot.com/111230043/ , moving the
discussion to golang-dev.

The context: I've received a private report about poor encoding/gob
performance on a 80-core machine. encoding/gob has a bunch of global
mutexes. By removing just one of them, I've got 2x speedup on 16-core
machine. If all mutexes are removed on the 80-core machine it can
easily make 20x difference.

I replaced the mutex with unsafe.Pointer and operate on it with
atomic.Load/StorePointer. r banned the change because of usage of
unsafe package.

This kind of scalable synchronization algorithms inherently requires
unsafe package today. This means that it is incompatible with
appengine and other safe contexts. While it is not actually unsafe.
Looks like a quite unfortunate situation to me. Today people use
80-code machines, tomorrow they will use 200-core machines. Mutexes
don't work there in any way, shape or form.

Thoughts?

Brad Fitzpatrick

unread,
Jul 21, 2014, 5:27:51 PM7/21/14
to Dmitry Vyukov, golang-dev, Rob 'Commander' Pike
Are you proposing an addition to the sync package? If so, what?




--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dmitry Vyukov

unread,
Jul 21, 2014, 5:32:46 PM7/21/14
to Brad Fitzpatrick, golang-dev, Rob 'Commander' Pike
For example, if we make atomic.FooPointer compiler assisted functions
that can accept any pointer type along with unsafe.Pointer (so that
it's backwards compatible), then it would solve the problem.
This looks like providing sugar for tricky synchronization algorithms.
But my concern is the inability to express such algorithms for e.g.
appengine.

Russ Cox

unread,
Jul 21, 2014, 5:36:58 PM7/21/14
to Dmitry Vyukov, golang-dev, Rob 'Commander' Pike
Luckily all the uses of 80-core machines I know about are by researchers, not in production environments. It is fine to send researchers patches until we really understand the whole story.

Russ

Dmitry Vyukov

unread,
Jul 21, 2014, 5:55:12 PM7/21/14
to Russ Cox, golang-dev, Rob 'Commander' Pike
On my machine the performance difference between mutex and atomic is
6x. And AWS offers a more parallel machine than mine *today*.

Rui Ueyama

unread,
Jul 21, 2014, 6:12:14 PM7/21/14
to Dmitry Vyukov, golang-dev, Rob 'Commander' Pike
I may be missing something, but as to that specific CL, did you try to make typeLock a RWMutex instead of Mutex? I think we need to acquire the writer lock only when we see a value of new type, so I believe usually we only need the reader lock.


On Mon, Jul 21, 2014 at 2:20 PM, 'Dmitry Vyukov' via golang-dev <golan...@googlegroups.com> wrote:

Henrik Johansson

unread,
Jul 21, 2014, 6:58:17 PM7/21/14
to golang-dev, Rob 'Commander' Pike, Dmitry Vyukov

Is there really any difference between the locks and the store/load functions?
They both deviate from the go mantra about sharing through communication and while the locks may be easier to handle they serve the same purpose.

If these types and functions are not really unsafe then they belong the atomic package fully.

What harm can it cause? Most other languages of similar kind have such utilities so it can't be all bad.

/Henrik

Dmitry Vyukov

unread,
Jul 22, 2014, 4:25:38 AM7/22/14
to Rui Ueyama, golang-dev, Rob 'Commander' Pike
On Tue, Jul 22, 2014 at 2:11 AM, Rui Ueyama <ru...@google.com> wrote:
> I may be missing something, but as to that specific CL, did you try to make
> typeLock a RWMutex instead of Mutex? I think we need to acquire the writer
> lock only when we see a value of new type, so I believe usually we only need
> the reader lock.

RWMutexes do not scale on high core counts.

Dmitry Vyukov

unread,
Jul 22, 2014, 4:26:03 AM7/22/14
to Henrik Johansson, golang-dev, Rob 'Commander' Pike
On Tue, Jul 22, 2014 at 2:58 AM, Henrik Johansson <dahan...@gmail.com> wrote:
> Is there really any difference between the locks and the store/load
> functions?

Yes, there is a difference. Mutexes do no scale. Atomic loads do.

Henrik Johansson

unread,
Jul 22, 2014, 4:44:27 AM7/22/14
to Dmitry Vyukov, Rob Pike, golang-dev

Ah yes indeed. I was more getting at the general principle than the runtime characteristics. They are both used to provide a consistent view of shared data.

I think this sort of performance improvement is a natural part of a runtime/library evolution and many core systems are getting more common.

I am not sure but I remember a Niagara with some 32 cores and that was some time ago. Surely systems come with even more cores pretty cheap today.

Dmitry Vyukov

unread,
Jul 22, 2014, 5:01:02 AM7/22/14
to Henrik Johansson, Rob Pike, golang-dev
I agree with you and I do not agree with Russ.

If Go wants to continue to be positioned as "the way to program modern
multicore machines" (which I believe it was initially), then it must
provide relevant means for that and its base libraries must not impose
scalability bottlenecks.

Russ Cox

unread,
Jul 22, 2014, 9:39:32 AM7/22/14
to Dmitry Vyukov, Henrik Johansson, Rob Pike, golang-dev
I don't know what you are arguing for or against.

Go is as much about writing understandable, maintainable programs as it is about multicore performance. Peppering code with uses of atomic and unsafe may help the latter but it dramatically hurts the former. As Brad hinted at, it would be much better to develop a nice, usable abstraction provided by package sync than it would be to relax the rules for using atomic and unsafe.

Remember the new introduction to the Go memory model (not yet committed): 

Use locks and channels. Don't be clever.

For thoroughness, the rest of this document explains things in more
detail, but if you need to read this document to decide if your
program is correct, you are being clever.

We need to provide ways for end programmers to avoid being clever. Making atomic operations polymorphic does the opposite: it encourages cleverness.

Russ

Ugorji

unread,
Jul 22, 2014, 10:22:00 AM7/22/14
to golan...@googlegroups.com, dvy...@google.com, dahan...@gmail.com, r...@golang.org
IIUC, Dmitry is arguing against doing nothing.

Russ last message suggested that "only researchers" are seeing these issues, and it is fine to send them patches till we understand things better. Dmitry is saying that we understand things pretty well now, and see that mutexes do not scale to multi-core, and we need to do something (compiler assisted mode for Load/StorePointer without importing unsafe, etc). Dmitry's last statement was: if go is positioned as for multi-core, ... "then it must provide relevant means for that and its base libraries must not impose scalability bottlenecks". 

I agree that we need to do something, if mutexes do not scale with the number of cores. 

Dmitry Vyukov

unread,
Jul 22, 2014, 10:22:39 AM7/22/14
to Russ Cox, Henrik Johansson, Rob Pike, golang-dev
On Tue, Jul 22, 2014 at 5:39 PM, Russ Cox <r...@golang.org> wrote:
> I don't know what you are arguing for or against.
>
> Go is as much about writing understandable, maintainable programs as it is
> about multicore performance. Peppering code with uses of atomic and unsafe
> may help the latter but it dramatically hurts the former. As Brad hinted at,
> it would be much better to develop a nice, usable abstraction provided by
> package sync than it would be to relax the rules for using atomic and
> unsafe.
>
> Remember the new introduction to the Go memory model (not yet committed):
>
> Use locks and channels. Don't be clever.
>
> For thoroughness, the rest of this document explains things in more
> detail, but if you need to read this document to decide if your
> program is correct, you are being clever.

But that principle must not be carried to an absurdity as well, right?
We already use sync/atomic in regexp, net and in some other packages
for a reason.

The problem with scalability is that you can not get away with *any*
amount of simple code. Sometimes you just need an atomic.


> We need to provide ways for end programmers to avoid being clever. Making
> atomic operations polymorphic does the opposite: it encourages cleverness.

What is required here is something where you can put an object to, and
retrieve the current version of the object. We can call it some other
way, but this is an atomic pointer as an abstraction.

Russ Cox

unread,
Jul 22, 2014, 11:08:27 AM7/22/14
to Dmitry Vyukov, Henrik Johansson, Rob Pike, golang-dev
On Tue, Jul 22, 2014 at 10:22 AM, Dmitry Vyukov <dvy...@google.com> wrote:
But that principle must not be carried to an absurdity as well, right?
We already use sync/atomic in regexp, net and in some other packages
for a reason.

I see atomics in net. I don't see any in regexp, thank goodness.

The problem with scalability is that you can not get away with *any*
amount of simple code. Sometimes you just need an atomic.

That's fine, but we need to find a way to provide them that does not require cleverness to use.

I really appreciate your focus on performance. I care more about maintainability and readability over the long term than I do about raw performance. You are much more comfortable with complex code if it means the code runs faster. If we can find something that satisfies both of us, life will be good for Go programmers.
 
> We need to provide ways for end programmers to avoid being clever. Making
> atomic operations polymorphic does the opposite: it encourages cleverness.

What is required here is something where you can put an object to, and
retrieve the current version of the object. We can call it some other
way, but this is an atomic pointer as an abstraction.

The first big problem is that doing this requires unsafe, and people basically can't use unsafe safely. If this is something we want to promote, I think we need to find a way to do it without unsafe. Converting protobuf and gob to avoid unsafe found bugs in both, and if we can't get those right, we can't promote unsafe as a day-to-day thing.

One possibility is to define, in package atomic

// An Interface provides an atomic load and store of a consistently typed value.
// Interfaces can be created as part of other data structures.
// The zero value for an Interface returns nil from Load.
// Once Store has been called, an Interface must not be copied.
type Interface struct {v interface{}}

// Load returns the value set by the most recent Store.
// It returns nil if there has been no call to Store.
func (p *Interface) Load() interface{}

// Store sets the value of the interface to v.
// All calls to Store for a given Interface must use values of the same type.
// Store of an inconsistent type panics, as does Store(nil).
func (p *Interface) Store(v interface{})

These might need to be implemented in assembly - I am not sure the interface value can be manipulated correctly using the current typed primitives. In pseudo-code:

Load:

typ = atomicload(&p.v.typ) 
if typ == nil { return nil }
val = atomicload(&p.v.val)
return {typ, val}

Store:

if v == nil { panic }
for {
    typ = atomicload(&p.v.typ)
    if typ == nil && !atomiccas(&p.v.typ, nil, v.typ) { continue }
    if typ != v.typ { panic }
    break
}
atomicstore(&p.v.val, v.val)

The restriction about having a consistently typed value for the lifetime of the Interface means that load and store can use single-word atomic operations. Using interfaces is maybe a little clunky compared to polymorphism, but interfaces are available and polymorphism is not. Interfaces also make it possible to do atomic load/store of multiword structures (with allocations being done automatically underneath).

This could be used in gob or other places without needing package unsafe.

Thoughts?
Russ

Ian Lance Taylor

unread,
Jul 22, 2014, 10:17:26 PM7/22/14
to Dmitry Vyukov, Russ Cox, Henrik Johansson, Rob Pike, golang-dev
On Tue, Jul 22, 2014 at 7:22 AM, 'Dmitry Vyukov' via golang-dev
<golan...@googlegroups.com> wrote:
>
> What is required here is something where you can put an object to, and
> retrieve the current version of the object. We can call it some other
> way, but this is an atomic pointer as an abstraction.

I haven't thought about it too hard, but from looking at the CL it
seems that you could also do it using sync.Once and a concurrent map.
I mention this because those are abstractions that may be easier to
think about and use correctly than atomic pointers.

Ian

Pieter Droogendijk

unread,
Jul 23, 2014, 3:56:54 AM7/23/14
to golan...@googlegroups.com, r...@golang.org, dahan...@gmail.com


On Tuesday, July 22, 2014 4:22:39 PM UTC+2, Dmitry Vyukov wrote:

> We need to provide ways for end programmers to avoid being clever. Making
> atomic operations polymorphic does the opposite: it encourages cleverness.

What is required here is something where you can put an object to, and
retrieve the current version of the object. We can call it some other
way, but this is an atomic pointer as an abstraction.
 
Functionally, this sounds like one of these: make(chan *Structure, 1)

Perhaps an optimization on channels with a buffer of size 1?

Dmitry Vyukov

unread,
Jul 23, 2014, 6:52:21 AM7/23/14
to Russ Cox, Henrik Johansson, Rob Pike, golang-dev
Looks good to me.

I think I even wrote something along these lines some time ago... right:
https://codereview.appspot.com/5554065/diff/3/src/pkg/sync/atomic/ifaceptr.go
It was an atomic interface that can change types dynamically
(double-word atomic loads are based on a lock-free seq-lock).

I agree that we don't need to change types for the copy-on-write case.
We can even require that the atomic.Interface is initialized to the
final type before other loads and stores:

var x atomic.Interface = atomic.MakeInterface(new map[string]string)

It will make Load/Store impl simpler and usage even stricter.

Russ Cox

unread,
Jul 23, 2014, 8:00:06 AM7/23/14
to Dmitry Vyukov, Henrik Johansson, Rob Pike, golang-dev
On Wed, Jul 23, 2014 at 6:51 AM, Dmitry Vyukov <dvy...@google.com> wrote:
Looks good to me.

I think I even wrote something along these lines some time ago... right:
https://codereview.appspot.com/5554065/diff/3/src/pkg/sync/atomic/ifaceptr.go
It was an atomic interface that can change types dynamically
(double-word atomic loads are based on a lock-free seq-lock).

It looks like you started that but never actually implemented the atomic part. The assembly is just doing a simple copy of the interface value. But I think it's fine to use the fixed-type version to start. We can always relax the constraints later if we decide it is worthwhile.
 
I agree that we don't need to change types for the copy-on-write case.
We can even require that the atomic.Interface is initialized to the
final type before other loads and stores:

var x atomic.Interface = atomic.MakeInterface(new map[string]string)

It will make Load/Store impl simpler and usage even stricter.

Requiring that kind of initialization before use makes the zero value unusable and encourages people to think = is okay on atomic.Interface, both of which seem like drawbacks to me.

I thought more yesterday and I think that atomic.Value is a better name than atomic.Interface.

If you think atomic.Value makes sense and it fixes the gob problem, I'm happy with it. We should let Rob weigh in too (he is away this week, I am away next week), but I think we're heading down a good path. Want to start a CL?

Thanks.
Russ

Dmitry Vyukov

unread,
Jul 24, 2014, 10:09:50 AM7/24/14
to Russ Cox, Henrik Johansson, Rob Pike, golang-dev
Here it is:
https://codereview.appspot.com/120140044

Implementation only for amd64 for now.
I did not yet done a benchmark, but I am sure that it will be on a par
with normal atomic load (interface->value conversion of this kind must
be fast and inlined).

Waiting for Rob.

Dmitry Vyukov

unread,
Jul 25, 2014, 4:58:44 AM7/25/14
to Pieter Droogendijk, golang-dev, Russ Cox, Henrik Johansson
Channels have put and get. This component requires peek and replace.
Channels do not provide either of them.

Dmitry Vyukov

unread,
Jul 25, 2014, 5:08:23 AM7/25/14
to Ian Lance Taylor, Russ Cox, Henrik Johansson, Rob Pike, golang-dev
I see your point. But the atomic.Value (or any other way to support
copy-on-write) allows to use absolutely any single-threaded structure
in read-mostly scenario; while hashmaps are limited to, well,
hashmaps. Also concurrent hashmaps are generally slower than
single-threaded hashmaps; they are usually forced to use linked lists
instead of arrays; and some implementations just do some kind of very
fine-grained locking which destroys the whole point of reads being
physically read-only (rather than logically read-only). And of course
implementation complexity of good concurrent hashmaps needs to be
noted separately.
Having said that, a concurrent hashmap can also be a useful component.

Henrik Johansson

unread,
Jul 25, 2014, 5:37:41 AM7/25/14
to Dmitry Vyukov, Rob Pike, Russ Cox, golang-dev, Ian Lance Taylor

So, just so I get this straight. This value serves the only purpose to allow a proper read and write across threads/goroutines? It will not provide compare and swap? I kind of, perhaps naively, thought that that was the intended use case.

Dmitry Vyukov

unread,
Jul 25, 2014, 6:04:56 AM7/25/14
to Henrik Johansson, Rob Pike, Russ Cox, golang-dev, Ian Lance Taylor
On Fri, Jul 25, 2014 at 1:37 PM, Henrik Johansson <dahan...@gmail.com> wrote:
> So, just so I get this straight. This value serves the only purpose to allow
> a proper read and write across threads/goroutines? It will not provide
> compare and swap? I kind of, perhaps naively, thought that that was the
> intended use case.


Yes. At least for now. We don't have good use cases for CAS yet.

Юрий Соколов

unread,
Aug 3, 2014, 1:44:27 AM8/3/14
to golan...@googlegroups.com
Closure lives well with true readonly hashmap.

bernhard...@gmail.com

unread,
Aug 3, 2014, 8:02:54 AM8/3/14
to golan...@googlegroups.com, dahan...@gmail.com, r...@golang.org, ia...@golang.org
Explanation please. 
I do understand that atomic pointers improve concurrent access to a data structure/object because they are non-blocking. This is, I guess, Dmitry's point of having a non-blocking synchronization vs a mutex which is blocking. And Russ, albeit reluctantly, agrees. His pseudo-code above shows an 'atomiccas' (cpu compare-and-swap synchronization - cas) instruction.

What I do not understand is how a simple atomic swap of a pointer protects an entire data structure/object. This can only work if every change to a data-structure requires a change of the pointer pointing to it, kind of a copy-on-write-with-pointer-change mechanism.

Do I have this right?

Dmitry Vyukov

unread,
Aug 4, 2014, 7:05:12 AM8/4/14
to Bernhard Spanyar, golang-dev, Henrik Johansson, Rob 'Commander' Pike, Ian Taylor
On Sun, Aug 3, 2014 at 4:02 PM, <bernhard...@gmail.com> wrote:
> Explanation please.
> I do understand that atomic pointers improve concurrent access to a data
> structure/object because they are non-blocking. This is, I guess, Dmitry's
> point of having a non-blocking synchronization vs a mutex which is blocking.
> And Russ, albeit reluctantly, agrees. His pseudo-code above shows an
> 'atomiccas' (cpu compare-and-swap synchronization - cas) instruction.

No, this is not about blocking vs non-blocking. RWMutes is also
non-blocking for the case w/o writers. But it does not scale as well.
This is about write-sharing:
http://www.1024cores.net/home/lock-free-algorithms/first-things-first


> What I do not understand is how a simple atomic swap of a pointer protects
> an entire data structure/object. This can only work if every change to a
> data-structure requires a change of the pointer pointing to it, kind of a
> copy-on-write-with-pointer-change mechanism.
>
> Do I have this right?

Yes, this is intended to be used with copy-on-write (as the patch does).

David Crawshaw

unread,
Aug 5, 2014, 8:17:25 AM8/5/14
to Dmitry Vyukov, Rui Ueyama, golang-dev, Rob 'Commander' Pike
Bringing an offline discussion with Dmitry back to golang-dev,

> RWMutexes do not scale on high core counts.

The sync.RWMutex interface can be made to scale a lot further than it
does now without changing its semantics. The read lock can be striped
out per processor, so that uncontentious reads don't all write on the
same atomic int.

This means the memory use of a RWMutex grows proportionally to the
number of procs. But I believe it is reasonable that the size of a
synchronization primitive grows with the number of procs in a machine.

bernhard...@gmail.com

unread,
Aug 7, 2014, 10:32:26 AM8/7/14
to golan...@googlegroups.com, bernhard...@gmail.com, dahan...@gmail.com, r...@golang.org, ia...@golang.org
All right, thanks for the answer, but could you elaborate a bit on why an atomic pointer prevents write-sharing (false-sharing)? Your blog shows 'Write private', 'Read private', and 'Read Shared' to be the best performing operation.
Top-of-your-head/big-picture explanation sufficient.


False-sharing: One cache line is shared by two different threads, without them really sharing any data. The threads are just using  data which sits next to each other in the same cache line.

Dmitry Vyukov

unread,
Aug 7, 2014, 10:40:39 AM8/7/14
to Bernhard Spanyar, golang-dev, Henrik Johansson, Rob 'Commander' Pike, Ian Taylor
It's simple -- with the atomic-pointer-based copy-on-write solution
there are writes in common case. Goroutines only read the pointer.
With RWMutex there are constant writes to the RWMutex internal state
to synchronize with potential (but non-existent in this case) writers.

bernhard...@gmail.com

unread,
Aug 7, 2014, 11:53:45 AM8/7/14
to golan...@googlegroups.com, bernhard...@gmail.com, dahan...@gmail.com, r...@golang.org, ia...@golang.org
These few words are pregnant with meaning '...there are writes in common case. Goroutines only read the pointer.' but I don't get it.

What I would imagine is this. You have private memory for each thread (hence the word 'private xyz' in your blog) which does not interleave data and variables with other threads in the same cache line.
Any data in the private area comes there with a copy-on-write scheme. Once this piece of data becomes the 'official and shared' state, a public pointer is switched to it, atomically. 
So you have no false sharing and copy-on-write to each threads' private memory, which is later again published by pointer switch?

Is this kind of right?

Dmitry Vyukov

unread,
Aug 7, 2014, 11:58:54 AM8/7/14
to Bernhard Spanyar, golang-dev, Henrik Johansson, Rob 'Commander' Pike, Ian Taylor
sorry, there are *no* writes in common case

bernhard...@gmail.com

unread,
Aug 7, 2014, 12:00:50 PM8/7/14
to golan...@googlegroups.com, bernhard...@gmail.com, dahan...@gmail.com, r...@golang.org, ia...@golang.org
Ok, and that means?

Dmitry Vyukov

unread,
Aug 7, 2014, 12:03:11 PM8/7/14
to Bernhard Spanyar, golang-dev, Henrik Johansson, Rob 'Commander' Pike, Ian Taylor
threads do not write to any shared data

On Thu, Aug 7, 2014 at 8:00 PM, <bernhard...@gmail.com> wrote:
> Ok, and that means?
Reply all
Reply to author
Forward
0 new messages