Writing data without synchronization


Slawomir Pryczek

Apr 14, 2020, 7:38:44 AM
to golang-nuts
Hi guys, I was wondering about some things related to multithreaded code.

1. When data is accessed or changed, it needs to go into the CPU cache first, and AFAIK the CPU caches a whole memory block (cache line), not just the single area of memory we're operating on. When we do unsynchronized writes to something tightly packed, how is it that the cache isn't poisoned and the result is still correct, even though each CPU core has its own L1 and L2 caches?

Are such patterns, where a single thread only operates on its own cell(s) of memory, considered thread-safe for slices and in general?

2. About the lazy-initialization double-check pattern: I found several sources saying it is safe, yet the race detector complains about data races (which is correct, because there is a race, done on purpose) - https://software.intel.com/en-us/node/506123
And of course the code seems to work as expected.


3. Now the question is: can I somehow disable these warnings so the race detector won't complain about safe code? I have a ton of these and they're obscuring the useful output.
4. I'm not sure about instruction ordering: will threads that see initialized=true always also see myconfig after the update, given that myconfig isn't synchronized? Would it be better to just use a pointer and check for nil, or even atomic.Value?
5. How do these patterns work on other CPUs? Are they considered unsafe on anything other than the Intel architecture?

So I'm writing some code that will do hundreds of thousands of operations per second, and these operations need to read the config (in some cases probably at multiple places). Do you think using just atomic.Value (Store/Load) will have an impact when I run it on a machine with, e.g., 96 cores? Maybe an RW mutex would be better? What do you think? I could just run some synthetic benchmarks, but those usually don't seem to have much use in real-life scenarios for multithreaded code.

Thanks

Brian Candler

Apr 14, 2020, 8:34:14 AM
to golang-nuts
On Tuesday, 14 April 2020 12:38:44 UTC+1, Slawomir Pryczek wrote:
1. When data is accessed or changed, it needs to go into the CPU cache first, and AFAIK the CPU caches a whole memory block (cache line), not just the single area of memory we're operating on. When we do unsynchronized writes to something tightly packed, how is it that the cache isn't poisoned and the result is still correct, even though each CPU core has its own L1 and L2 caches?

That's dealt with by the CPU: see https://en.wikipedia.org/wiki/Cache_coherence
 
So im writing some code which will do hundreds of thousands of operations per second and these operations need to read config (probably in some cases at multiple places). You think using just atomic.Value (Store/Load) will have impact when i run it on machine with eg. 96 cores? Maybe RW mutex would be better? What you think?

If the config values are static, it should be safe for all threads to read the same shared area unprotected.

If you need the config to change, you could create a new config object and then somehow signal to each goroutine that a new config is available (e.g. pass the pointer to the new config over a channel).  The goroutine will need to check for the update message from time to time of course.

If the config isn't too huge, you could just pass the whole config over the channel and let each goroutine have its own local copy: "share by communicating".  Just remember that some values in Go contain hidden pointers - in particular slices and maps.

There are suggested patterns for dealing with this shown in the examples under atomic.Value.

In general: I suggest avoiding premature optimisation.  You expect that reading the config safely (with atomic/mutex protection) is going to take a sizeable proportion of the runtime - but even the most experienced programmers are often wrong about their gut feeling as to where the hot spots are.  Build it in the clearest / most obvious way; then profile it; *then* decide where the important optimisations are.

Ian Lance Taylor

Apr 14, 2020, 3:25:15 PM
to Slawomir Pryczek, golang-nuts
On Tue, Apr 14, 2020 at 4:39 AM Slawomir Pryczek <slawe...@gmail.com> wrote:
>
> Hi guys, I was wondering about some things related to multithreaded code.
>
> 1. When data is accessed or changed, it needs to go into the CPU cache first, and AFAIK the CPU caches a whole memory block (cache line), not just the single area of memory we're operating on. When we do unsynchronized writes to something tightly packed, how is it that the cache isn't poisoned and the result is still correct, even though each CPU core has its own L1 and L2 caches?

If you use atomic writes, the CPU cores will communicate to either
invalidate cache lines or provide an update of the new memory
contents. On x86 this happens even if you don't do an atomic write.
On some other processors, the other CPU cores can see old data in that
memory address for some period of time.


> Are such patterns, where a single thread only operates on its own cell(s) of memory, considered thread-safe for slices and in general?
> https://play.golang.org/p/rF8jPEGDZzk

As far as I know this is safe on all modern multi-core processors.
They will use cache snooping or other techniques to make sure that at
most one core has a dirty version of a particular cache line. Of
course, if there are many cores the performance can be quite bad.


> 2. About the lazy-initialization double-check pattern: I found several sources saying it is safe, yet the race detector complains about data races (which is correct, because there is a race, done on purpose) - https://software.intel.com/en-us/node/506123
> And of course the code seems to work as expected.
>
> https://play.golang.org/p/gUzWHr-1K9H

This is unsafe in general because nothing ensures that memory changes
made before setting initialized to true are visible to other cores
that happen to see initialized set to true. That is, in your example,
some other core could see the variable initialized set to true without
seeing any values stored in the variable myconfig. That can't happen
on x86, which enforces memory ordering of writes, but it can happen on
other non-x86 processors.

Note that I think that your link to software.intel.com doesn't support
your argument even on x86, as in that link they are using atomic
writes, but your code is using ordinary writes.


> 3. Now the question is: can I somehow disable these warnings so the race detector won't complain about safe code? I have a ton of these and they're obscuring the useful output.

No. Your code isn't safe, and the race detector should and will complain.


> 4. I'm not sure about instruction ordering: will threads that see initialized=true always also see myconfig after the update, given that myconfig isn't synchronized? Would it be better to just use a pointer and check for nil, or even atomic.Value?

It would be better to use a sync.Mutex or, if absolutely necessary,
use the calls in the sync/atomic package.


> 5. How do these patterns work on other CPUs? Are they considered unsafe on anything other than the Intel architecture?

As noted, Intel architectures enforce a global write ordering. That is
not true of most other multi-core processors.


> So I'm writing some code that will do hundreds of thousands of operations per second, and these operations need to read the config (in some cases probably at multiple places). Do you think using just atomic.Value (Store/Load) will have an impact when I run it on a machine with, e.g., 96 cores? Maybe an RW mutex would be better? What do you think? I could just run some synthetic benchmarks, but those usually don't seem to have much use in real-life scenarios for multithreaded code.

I agree that synthetic benchmarks are unlikely to help. You'll have
to benchmark your real code.

There is no single answer as to what is best as it depends on things
like the number of reads compared to the number of writes, and what
happens if you use out of date data, and what is driving your
operations. For example, in Go, if your operations are being driven
by channel data, and if it doesn't matter if you use slightly out of
date configuration information, then it's easy enough to send
configuration updates on a channel and have the workers use select to
pick up either the next data item or the configuration change. I
don't know whether that makes any sense for your situation.

Ian

Slawomir Pryczek

Apr 14, 2020, 6:25:59 PM
to golang-nuts
Thanks for the very insightful posts. That's actually what I was interested in: whether x86 is advanced enough to invalidate caches on its own, whether it's taken care of by the software/compiler, or whether it's pure coincidence that this code actually works :)

Actually, since I'm using persistent connections, I'll just read everything once, on connect, and then keep a copy per thread while doing the synchronization "normally". Basically I'm not too much into micro-optimizations; in production code I've seen synchronization of short code sections lead to starvation, which is why I try to avoid it. But yes, redesigning the code is probably better than using "hacks" that can be unsafe.

Jake Montgomery

Apr 15, 2020, 12:17:43 PM
to golang-nuts


On Tuesday, April 14, 2020 at 7:38:44 AM UTC-4, Slawomir Pryczek wrote:
Hi guys, I was wondering about some things related to multithreaded code.
2. About the lazy-initialization double-check pattern: I found several sources saying it is safe, yet the race detector complains about data races (which is correct, because there is a race, done on purpose) - https://software.intel.com/en-us/node/506123
And of course the code seems to work as expected.


It is a race. Your code may 'seem' safe, and it may always work correctly on some processor and Go version combinations. But it might also work only some of the time on other combinations, failing in unexpected and hard-to-debug ways the rest of the time. I always like to suggest reading Benign Data Races: What Could Possibly Go Wrong in cases like this. (It is from the same site you reference.)

Be very careful with concurrency. Just because you found "several sources that it is safe" does not mean that it is. Even if it is safe in the case the article cites, the devil is always in the details with concurrency. It may be safe in C++, with its memory model and guarantees, but that does not necessarily mean it is safe in another language, such as Go. Also, the article you reference (https://software.intel.com/en-us/node/506123) uses a tbb::atomic<T*>, which is not a Go type; in your example you use a bool instead. Details. Also, the documentation specifically says that the example in the article only works on Intel processors, so even in C++ that code would be unsafe to run on other processors.

One final note on this kind of optimization. Sometimes the Go runtime or even the standard libraries play such 'crazy games'. Do not be fooled: it is OK for them to do it, but not for us. The runtime and libraries know what processor they are running on (if necessary) and are tied to a specific version of Go. If the language internals change in a way that breaks their 'crazy games', they can be fixed at the same time. We do not have that luxury.