var (
rec bool
mu = &sync.Mutex{}
)
func recording() bool {
mu.Lock()
defer mu.Unlock()
return rec
}
func setRecording(s bool) {
mu.Lock()
defer mu.Unlock()
rec = s
}
...
for {
v := <-data
if recording() {
record(v)
}
processData(v)
On Thu, Feb 16, 2012 at 18:14, Patrick Mylund Nielsen
I am not an expert on this but this is generally how I think about it:A context switch (changing from one thread of goroutine to another) can happen between any two machine instructions.
import "sync/atomic"
var recording int32
func isRecording() bool {
return recording > 0
}
func setRecording(shoudRecord bool) {
if shouldRecord {
atomic.CompareAndSwapInt32(&recording, 0, 1)
} else {
atomic.CompareAndSwapInt32(&recording, 1, 0)
}
}
Or am I missing something?
Thats how the corresponding java constructs mentioned earlier would have been used. So yes i guess.
Does sync/atomic work differently?
According to Chapter 8 Bus Locking, of the Intel 64 and IA-32
Architectures Software Developer’s Manual, Volume 3A
The memory-ordering model prevents loads and stores from being
reordered with locked instructions that execute earlier or later. [1]
So isRecording is guaranteed to return the correct value.
Bobby
1 - http://stackoverflow.com/questions/9027590/do-we-need-mfence-when-using-xchg
if isRecording() {
...
}
Otherwise, you're right. You can always explicitly do:
if recording > 0 {
...
}
yours,
Bobby
In practice this code will probably work, but it doesn't set a good
example. Instead of a compare-and-swap, you just want an atomic store.
recording should also be loaded atomically in isRecording.
import "sync/atomic"
var recording int32
func isRecording() bool {
return atomic.LoadInt32(&recording) != 0
}
func setRecording(shouldRecord bool) {
if shouldRecord {
atomic.StoreInt32(&recording, 1)
} else {
atomic.StoreInt32(&recording, 0)
}
}
This still doesn't avoid the tricky problem Vanja pointed out, but I'm
under the impression that Jake just wants to avoid getting his flag
somehow corrupted. For that case, this is good enough.
> The go compilers have inlining turned on by default; isRecording()
> should be inlined when used like:
>
> if isRecording() {
> ...
> }
Just because a function has been inlined does not mean it is atomic.
And it's probably not a good idea to depend on the compiler to inline
in any case.
- Evan
func record() {
...
}
if recording > 0 {
record()
}
can be rewritten as
func record() {
if recording <= 0 {
return
}
...
}
In this case, all putting a mutex around record() will do is to cause
the writer to sleep until record() is done, it won't cancel the
execution of record(). And if its not going to cancel record(), then
the only time you care about the value of recording is at the
beginning of the function, when you're deciding whether to execute it
or not. It doesn't matter if you get unscheduled immediately after
you check.
To put it another way, consider your timeline, modified to use a mutex:
> Context 1 grabs the mutex
> recording could be 1
> then you evaluate the if statement, and go into the block,
> then switch to context 2 and change recording to 0. This thread sleeps, blocking on the mutex.
> then switch back and record(v). You've recorded while recording is sleeping, waiting to set recording to 0
> context 1 drops the mutex
> context 2 wakes and acquires mutex, setting recording to 0.
So a mutex hasn't actually changed the behavior of this situation at
all. Unless I'm missing something, or am wrong :)
yours,
Bobby
# GOMAXPROCS=4 go test --bench=. --parallel=1
BenchmarkAtomicInt32-4 1000000000 0.03 ns/op
BenchmarkMutex-4 2000000000 0.05 ns/op
BenchmarkConcurrentAtomicInt32-4 1000000000 0.05 ns/op
BenchmarkConcurrentMutex-4 1000000000 0.56 ns/op
Each op is 2 million stores and reads, to be fair, but atomic _is_
much faster, and so probably preferable for the purpose we've
discussed (as long as you accept the race condition.)
Thanks Bobby and Evan!
Best,
Patrick
To answer my own question using a shoddy test
(https://gist.github.com/1849411): Yes. (I figured, but I wasn't sure
if the difference was significant.)# GOMAXPROCS=4 go test --bench=. --parallel=1
BenchmarkAtomicInt32-4 1000000000 0.03 ns/op
BenchmarkMutex-4 2000000000 0.05 ns/op
BenchmarkConcurrentAtomicInt32-4 1000000000 0.05 ns/op
BenchmarkConcurrentMutex-4 1000000000 0.56 ns/opEach op is 2 million stores and reads, to be fair, but atomic _is_
much faster, and so probably preferable for the purpose we've
discussed (as long as you accept the race condition.)
Please, rewrite the test to produce a result that isn't "garbage." :)
BenchmarkAtomicInt32-4 50000000 31.1 ns/op
BenchmarkMutex-4 20000000 90.3 ns/op
BenchmarkConcurrentAtomicInt32-4 5000000 530 ns/op
BenchmarkConcurrentMutex-4 500000 5478 ns/op
So, the conclusion is the same.
Thanks Jan.
I did say the test was shoddy, and yes, in this case optimization
works against me.Please, rewrite the test to produce a result that isn't "garbage." :)
So my first results were "correct", they just didn't run long enough.
I'm curious how big of a gap optimization makes between each, though,
and how to test something like this reliably.