Observations of the language after coding a lock-free hashtable in Go


dan....@gmail.com

Jun 30, 2014, 11:38:59 AM6/30/14
to golan...@googlegroups.com
Sharing memory directly between goroutines that may be running on separate hardware threads is firmly against the principles of Go. Nonetheless, Go is a practical language, and it lets you do nasty things contrary to the spirit of the language when you need to; that's important to me.

Over the weekend I wrote my first Go code and ran my first Go compiles, tests, and benchmarks. I implemented a hashtable for 32-bit integers that allows multiple readers concurrent access alongside a single writer. I chose it deliberately because it seemed like something Go is not suited for, and parts of it were indeed painful. (Scroll to the bottom if you just want to see the benchmarks.)

I initially wanted it to support 64-bit integers as well, but without generics there just wasn't a reasonable way of doing that. You end up with all kinds of methods ending in 32 or 64 (worse if you want more than two types), and that proliferates up the call chain, along with runtime checks to make sure you're not calling a 32-bit method on a hashtable with 64-bit keys, until it becomes easier to just have a distinct type for each. The Go team is well aware of those limitations, I think, and I fully expect to see generics in the language someday. If stodgy old Java got them (well, something that looked like them), then I think it will eventually happen for Go as well.

The other part that was painful was trying to do "volatile" accesses in a language that has no memory model of note. I had to turn every access into use of the sync/atomic package. That likely bears a runtime cost, but it wasn't prohibitive, as we'll see when I get to the benchmarks.

// Do NOT reorder this, otherwise parallel lookup could find key but see empty value
atomic.StoreInt32((*int32)(unsafe.Pointer(slot+uintptr(4))), val)
atomic.StoreInt32((*int32)(unsafe.Pointer(slot)), key)

That's kind of nasty, but still readable. All I really need here is a C++ relaxed store, followed by a release store. For reading, I would have to read the key first with an acquire load, and then I can use a relaxed load for the value. On x86 there are no memory fences or anything else needed for that; the compiler just has to emit regular loads and stores without reordering them or optimizing them away. I miss the ability to do that in Go without a runtime cost. Go will probably choose sequential consistency for the sync/atomic package, which means a full memory barrier instruction needs to be emitted with every operation, though the compiler can then optimize some of those away. Java volatile does something like that, IIRC, but it also gives you the ability to do a release store with the putOrderedXXX methods in the Unsafe class.
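For completeness, the matching read side under sync/atomic might look like the sketch below (the slot layout is as in the stores above: key at offset 0, value at offset 4; the helper names are made up):

```go
package main

import (
	"sync/atomic"
	"unsafe"
)

// Sketch of the read side matching the stores above: load the key first
// (the store to key is what publishes the slot), then the value. The
// sequentially consistent loads sync/atomic offers are stronger than the
// acquire/relaxed pair described in the text, but they are what's available.
func readSlot(slot uintptr) (key, val int32) {
	key = atomic.LoadInt32((*int32)(unsafe.Pointer(slot)))
	val = atomic.LoadInt32((*int32)(unsafe.Pointer(slot + uintptr(4))))
	return key, val
}

// slotAddr is a small convenience: the address of a slot's key word.
func slotAddr(p *int32) uintptr { return uintptr(unsafe.Pointer(p)) }
```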

Some other nastiness is storing the hashtable itself as an unsafe.Pointer:

type HashIndex struct {
	tbl unsafe.Pointer
}

When what I really wanted was:

type HashIndex struct {
	tbl *hashTable
}

Here HashIndex wraps a pointer to a fixed-size hashTable, which implements the lookup/insert methods. It has to be done this way because, to grow the HashIndex, we need to construct a new, larger fixed-size hashTable, copy the items across, and then swap it atomically with the old one. There doesn't seem to be a way with atomic.StoreXXX to store a strongly typed pointer: you need an *unsafe.Pointer as the destination, and the only way I could see to do that was to store the field as an unsafe.Pointer and cast it before using it. That makes the type system useless in this case, but it's not the end of the world, since I'm bypassing it all over the place anyway. Some getters and setters make it a little easier:

func (idx *HashIndex) table() *hashTable {
	return (*hashTable)(atomic.LoadPointer(&idx.tbl))
}

func (idx *HashIndex) setTable(tbl *hashTable) {
	atomic.StorePointer(&idx.tbl, unsafe.Pointer(tbl))
}
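The grow step described above can then be sketched as follows (a self-contained toy: the hashTable fields and the copy-instead-of-rehash are stand-ins, not the actual implementation from this post):

```go
package main

import (
	"sync/atomic"
	"unsafe"
)

// Minimal stand-ins for the types discussed above; the field names are
// hypothetical.
type hashTable struct {
	keys, vals []int32
}

type HashIndex struct {
	tbl unsafe.Pointer // always holds a *hashTable
}

func (idx *HashIndex) table() *hashTable {
	return (*hashTable)(atomic.LoadPointer(&idx.tbl))
}

func (idx *HashIndex) setTable(tbl *hashTable) {
	atomic.StorePointer(&idx.tbl, unsafe.Pointer(tbl))
}

// grow builds a larger fixed-size table, copies the entries across, and
// publishes it with a single atomic pointer swap. Concurrent readers keep
// using the old table until the new pointer becomes visible to them.
func (idx *HashIndex) grow() {
	old := idx.table()
	bigger := &hashTable{
		keys: make([]int32, 2*len(old.keys)),
		vals: make([]int32, 2*len(old.vals)),
	}
	copy(bigger.keys, old.keys) // a real implementation would rehash, not copy slots
	copy(bigger.vals, old.vals)
	idx.setTable(bigger)
}
```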

But there seems to be a bigger problem here. Go's memory model doesn't promise anything in these cases, and outside of sync/atomic there is no way to tell the compiler not to reorder or optimize out operations. So you can't safely use the byte slice returned by syscall.Mmap:

bytes := syscall.Mmap(...)
bytes[0] = 42 // Go compiler is allowed to notice that nobody reads bytes, so it can optimize this away

You'd need to do this:

atomic.StoreInt32((*int32)(unsafe.Pointer(&bytes[0])), (atomic.LoadInt32((*int32)(unsafe.Pointer(&bytes[0]))) &^ 0xff) | 42)

Not only does that make your eyes bleed, it's safe to say nobody actually does it. It gets worse if you want to access index i instead of zero, because you have to mask off the address to get the containing dword and then adjust the bit operations to mask off and change only the byte you're interested in. Yuck! So one day, one of two things will happen.

1) Go will make its memory model stronger, prohibiting the elimination of redundant writes and thereby inhibiting lots of optimizations.
2) Go will do those optimizations and break every program that uses Mmap.

I don't see either of those as a viable option. And heaven help you if you were crazy enough to read from that Mmap in Go. Optimizations that eliminate "redundant" stores don't help performance much, but eliminating redundant reads (reading into registers and reusing those registers) is the bread and butter of every compiler. So every program that reads from an Mmap is already fundamentally broken with respect to the Go memory model: a repeated read can be arbitrarily stale, depending on how the compiler cached earlier reads in registers and reused them, which it is not only fully allowed to do, it already must do.
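For concreteness, the index-i masking dance described a couple of paragraphs up might look like this hypothetical sketch. It assumes a little-endian machine and a backing array that is at least 4-byte aligned (true for Go heap allocations), and note that the load/store pair is not one atomic read-modify-write, just like the one-liner above:

```go
package main

import (
	"sync/atomic"
	"unsafe"
)

// storeByte writes one byte at index i through 32-bit atomics: find the
// aligned dword containing byte i, then splice the new byte in.
func storeByte(mem []byte, i int, v byte) {
	addr := uintptr(unsafe.Pointer(&mem[i]))
	base := addr &^ 3         // address of the containing aligned dword
	shift := uint(addr&3) * 8 // bit position of byte i within that dword (little-endian)
	p := (*uint32)(unsafe.Pointer(base))
	old := atomic.LoadUint32(p)
	atomic.StoreUint32(p, old&^(0xff<<shift)|uint32(v)<<shift)
}
```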

One obvious solution to all of these problems is to add volatile to the language. Then you could declare Mmap as returning a volatile []byte and be done with it. But that is like C++ const/volatile, and it proliferates through the type system. It's nasty and against the spirit of Go; I don't want to see it there. A volatile []byte would actually need a different SliceHeader, one with a volatile Data field. Nasty.

An alternative is a more robust atomics package that can do these kinds of things more cleanly and easily, with the Go memory model making it clear that you have to use it even if you only want to read from an mmap. Add support for all the data types, not just int32 and int64: bytes and bools too. Add special runtime support for dealing with slices and typed pointers so you don't have to subvert the type system to use them. Most people's programs would still be broken in theory, but they'd probably work in practice (or their authors would notice). At least there would be a not-so-painful way to fix them if needed, and a way to do it right for those who know they need volatile reads and writes, while the Go compiler stays free to do all the optimizations it wants. I'd love to see support for weaker, C++-style atomics as well, rather than just sequential consistency, but maybe in time.
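A more robust atomics package along these lines did eventually appear: since Go 1.19, sync/atomic has typed wrappers, including a generic atomic.Pointer[T], so a table swap like the one above no longer needs unsafe. A sketch (the type names here are hypothetical):

```go
package main

import "sync/atomic"

// With Go 1.19+'s typed atomics, the unsafe.Pointer field and the casting
// getters/setters disappear; the zero value of atomic.Pointer is ready to use.
type table struct {
	slots []int32
}

type Index struct {
	tbl atomic.Pointer[table]
}

// swapIn publishes a fresh table of the given size with one atomic store.
func (idx *Index) swapIn(size int) {
	idx.tbl.Store(&table{slots: make([]int32, size)})
}
```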

The end result of my weekend's hacking? On an i7-4770K @ 3.5ghz

BenchmarkGoMapInsert-4       20000000   110  ns/op
BenchmarkHashIndexInsert-4  100000000   25.6 ns/op
BenchmarkGoMapLookup-4       50000000   78.5 ns/op
BenchmarkHashIndexLookup-4  100000000   17.7 ns/op

About 4x faster than Go's builtin map for int32 keys/values, on both insert and lookup. And it allows any number of reader threads concurrent access alongside a writer, without any synchronization or blocking, unlike Go maps. It doesn't allow zero keys, unlike Go maps, and it doesn't allow deletes; it's just int32 keys and values. Hardly apples to apples, but the performance of pure Go code is impressive nonetheless. 200 LOC, not counting tests.

The amazing part to me? That I could set up Go on Windows and a Linux virtual machine and do all of that, in a language I'd never used before in my life, in a single weekend. From the pragmatic standpoint of being easy to use, write, compile, test, and benchmark, Go is a wonderful success.

There are still some rough edges, though, with the memory model and the lack of generics, and I hope Go will take a practical approach rather than an idealistic one. Practical languages get used. Idealistic languages (Haskell, anyone?) remain toys.


Jan Mercl

Jul 1, 2014, 4:52:22 AM7/1/14
to dan....@gmail.com, golang-nuts
On Mon, Jun 30, 2014 at 5:38 PM, <dan....@gmail.com> wrote:
> The other part that was painful was trying to do "volatile" accesses in a
> language that has no memory model of note.

No memory model of note? Are you talking about
http://golang.org/ref/mem ? If so, then you may have misunderstood
(some part) of the document.

> I had to turn every access into
> use of the sync/atomic package. That likely bears a runtime cost, but it
> wasn't prohibitive, as we'll see when I get to the benchmarks.

The point you may have missed in the Go memory model is that it allows
the compiler to _not write_ to memory at all - where it's possible -
when there's no synchronization. IOW, the above statement gets it
upside down. The memory model allows optimization of programs using
synchronization to do _less_ traffic on the memory bus - and to
explicitly "ask" for the writes when necessary.

> The end result of my weekend's hacking? On an i7-4770K @ 3.5ghz
>
> BenchmarkGoMapInsert-4 20000000 110 ns/op
> BenchmarkHashIndexInsert-4 100000000 25.6 ns/op
> BenchmarkGoMapLookup-4 50000000 78.5 ns/op
> BenchmarkHashIndexLookup-4 100000000 17.7 ns/op

I cannot be sure without seeing the code for the benchmarks, but from
the numbers I guess their methodology is probably flawed. The numbers
suggest that you're doing a variable amount of work in the benchmark,
in contrast to doing a constant amount of work b.N times. If that is
the case, the numbers are probably not very usable, because they may
include an unknown quantity: f(b.N).

> There are still some rough edges, though, with the memory model and the
> lack of generics, and I hope Go will take a practical approach rather
> than an idealistic one.

It could be useful to know which edges of the memory model you
consider rough. Maybe they are indeed rough; maybe there are reasons
for them that could be explained.

Wrt generics: Go, so far, does take the practical route and that's why
it doesn't have generics. No one said they will never come, no one
said they will. AFAICT, the more seasoned a Go developer is, the less
likely it is (s)he misses generics.

-j

Dmitry Vyukov

Jul 1, 2014, 5:16:06 AM7/1/14
to dan....@gmail.com, golang-nuts
I second most of your experience. You can see my adventures with these
awful casts here:
http://golang.org/src/pkg/sync/pool.go
https://codereview.appspot.com/107180045/diff/100001/src/pkg/regexp/freelist.go
There are some additional complications, like the fact that you can't
use ABA counters directly with pointers, or that interface{} is 2
words, so you can't update it atomically.

Fortunately, the average Go programmer writes such code roughly 0.00% of the time.

For int32/int64, you can use template code generation to generate 2
different packages, one for int32 and one for int64.
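That suggestion can be sketched with text/template (a toy version: real code would write the expanded source to files via go generate, and all names here are made up):

```go
package main

import (
	"bytes"
	"text/template"
)

// One source template, expanded once per key width. This avoids
// hand-maintained 32/64 copies at the cost of a generation step.
var hashSrc = template.Must(template.New("hash").Parse(`
type HashIndex{{.Bits}} struct {
	keys []int{{.Bits}}
	vals []int{{.Bits}}
}

func (idx *HashIndex{{.Bits}}) Add(k, v int{{.Bits}}) { /* ... */ }
`))

// generate returns the source for one width of the hashtable.
func generate(bits int) string {
	var buf bytes.Buffer
	if err := hashSrc.Execute(&buf, struct{ Bits int }{bits}); err != nil {
		panic(err)
	}
	return buf.String()
}
```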

Go memory model basically boils down to "data races are banned;
race-free programs are sequentially consistent".

Regarding the mmap, I am not sure I follow. If it's a shared mmap and
you read it concurrently from a different process, then you want the
stores to be atomic anyway. If it's a private mmap, then you need to
do something like close or flush after the writes, and that will
ensure the compiler actually emits the stores, since it does not know
where the memory came from (so it can't prove that close/flush do not
read the memory).

egon

Jul 1, 2014, 5:31:09 AM7/1/14
to golan...@googlegroups.com

On Monday, 30 June 2014 18:38:59 UTC+3, Daniel Eloff wrote:

The end result of my weekend's hacking? On an i7-4770K @ 3.5ghz

BenchmarkGoMapInsert-4       20000000   110  ns/op
BenchmarkHashIndexInsert-4  100000000   25.6 ns/op
BenchmarkGoMapLookup-4       50000000   78.5 ns/op
BenchmarkHashIndexLookup-4  100000000   17.7 ns/op

About 4x faster than Go's builtin map for int32 keys/values, on both insert and lookup. And it allows any number of reader threads concurrent access alongside a writer, without any synchronization or blocking, unlike Go maps. It doesn't allow zero keys, unlike Go maps, and it doesn't allow deletes; it's just int32 keys and values. Hardly apples to apples, but the performance of pure Go code is impressive nonetheless. 200 LOC, not counting tests.

Try comparing against these implementations: http://play.golang.org/p/8pJ7gTuuJR
 
There are still some rough edges, though, with the memory model and the lack of generics, and I hope Go will take a practical approach rather than an idealistic one. Practical languages get used. Idealistic languages (Haskell, anyone?) remain toys.

OTOH, Go is practical for a reason, and one of those reasons is leaving out features.

+ egon

egon

Jul 1, 2014, 5:33:57 AM7/1/14
to golan...@googlegroups.com


On Tuesday, 1 July 2014 12:31:09 UTC+3, egon wrote:

On Monday, 30 June 2014 18:38:59 UTC+3, Daniel Eloff wrote:

The end result of my weekend's hacking? On an i7-4770K @ 3.5ghz

BenchmarkGoMapInsert-4       20000000   110  ns/op
BenchmarkHashIndexInsert-4  100000000   25.6 ns/op
BenchmarkGoMapLookup-4       50000000   78.5 ns/op
BenchmarkHashIndexLookup-4  100000000   17.7 ns/op

About 4x faster than Go's builtin map for int32 keys/values, on both insert and lookup. And it allows any number of reader threads concurrent access alongside a writer, without any synchronization or blocking, unlike Go maps. It doesn't allow zero keys, unlike Go maps, and it doesn't allow deletes; it's just int32 keys and values. Hardly apples to apples, but the performance of pure Go code is impressive nonetheless. 200 LOC, not counting tests.

Try comparing against these implementations: http://play.golang.org/p/8pJ7gTuuJR
 

Forgot to note that the code is completely untested; I just checked that it compiles.

Daniel Eloff

Jul 1, 2014, 10:00:32 AM7/1/14
to golan...@googlegroups.com, dan....@gmail.com
Hi Dmitry,

The lock-free programming world is very small, so I suspect you must be the same Dmitry Vyukov from 1024cores.net. I have great respect for your work, and I have used Relacy and TSAN in the past, as well as your MPMC queue.

I see you took the same approach in pool.go, so I guess there's not much improvement to be had there currently. It's good to know that if the standard library does these kinds of things, my code will probably not break in the future. Things can be improved by improving the sync/atomic package; there are really no language or memory model changes necessary for lock-free programming in Go.

About template code generation: I did a bit of searching and found http://golang.org/src/cmd/go/test.go, which uses templates (via template.Must) to generate Go code. A bit nasty, but it would work. I'll probably take the copy-paste approach in this case, though, because once a hashtable is written and tested there's rarely a reason to change it again.

About the memory map, I'm hoping someone will tell me I'm wrong, and why. Let me try to explain myself better.

// Other side of memory map is another process, writing atomically
mem, _ := syscall.Mmap(...)

for {
  sequenceStart := ReadInt32(&mem[0])
  for i := 0; i < 10; i++ {
    numbers[i] = ReadInt32(&mem[4 + i*4])
  }
  sequenceEnd := ReadInt32(&mem[44])
  if sequenceStart == sequenceEnd {
    break
  }
}

This is a typical seqlock pattern, like the one the Linux kernel uses to read multiple words consistently: read the starting sequence number, read the data, then check that the ending sequence number matches; if not, a concurrent write was in progress and you need to try again.
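The matching writer side for the layout above (sequence words at byte offsets 0 and 44, ten int32 values in between; all names here are hypothetical) bumps the sequence words in the opposite order to the reads:

```go
package main

import (
	"sync/atomic"
	"unsafe"
)

// loadWord reads an int32 at a byte offset in the mapped region.
func loadWord(mem []byte, off int) int32 {
	return atomic.LoadInt32((*int32)(unsafe.Pointer(&mem[off])))
}

// writeSnapshot publishes ten values under the double-sequence scheme:
// bump the trailing sequence word first and the leading one last, the
// mirror image of the read order, so a reader that observes the two
// sequence numbers equal has seen a consistent snapshot.
func writeSnapshot(mem []byte, numbers [10]int32) {
	seq := loadWord(mem, 0) + 1
	atomic.StoreInt32((*int32)(unsafe.Pointer(&mem[44])), seq)
	for i, n := range numbers {
		atomic.StoreInt32((*int32)(unsafe.Pointer(&mem[4+i*4])), n)
	}
	atomic.StoreInt32((*int32)(unsafe.Pointer(&mem[0])), seq)
}
```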

I don't see how that kind of thing can work in Go, or basically any code that uses mmap and reads from it more than once while another process is changing it. According to the Go memory model, the compiler can assume that nobody writes to mem concurrently, so it has every reason in the world to unroll that loop, cache those reads in registers, and not read them again the second time through. In fact, the compiler even seems to be allowed to notice that if the sequence numbers don't match, it should loop forever. So it could transform the code into:

sequenceStart := ReadInt32(&mem[0])
for i := 0; i < 10; i++ {
  numbers[i] = ReadInt32(&mem[4 + i*4])
}
sequenceEnd := ReadInt32(&mem[44])
if sequenceStart != sequenceEnd {
  for {} // Infinite loop
}

It might be a braindead optimization, but I don't think the memory model prohibits it. I'm hoping you'll tell me I'm wrong :)

About the write scenario: I'm curious how the compiler knows that the memory isn't just a regular slice. If it can somehow figure out that the memory comes from somewhere else and not optimize away writes to it, then there's no problem. The stores are going to be atomic anyway for aligned natural word sizes in Go on x86/64, because the underlying mov instruction is, and there's no reason for the Go compiler to generate strange machine code. It worked in C/C++ for the same reason long before they had atomics. You just need an atomic store somewhere to ensure the ordering of writes, matched by an atomic load in the other process.

But you're right that another system call taking the slice as an argument, e.g. Munmap, could very well read that memory, so the Go compiler would have to really perform those writes, because it can't prove otherwise. That didn't occur to me. Repeated writes, like the sequence lock example but with writes instead of reads, could probably still be optimized away, because Go doesn't think there can be a concurrent reader.

What do you think? I guess repeated writes and repeated reads of a memory map only make sense with a concurrent reader/writer on the other side, so most Go programmers will never notice. For those crazy few like us, there's sync/atomic to force the result we want. That seems a lot better than what I first thought.

Cheers,
Dan

Daniel Eloff

Jul 1, 2014, 10:34:13 AM7/1/14
to golan...@googlegroups.com, dan....@gmail.com
Hi Jan,


No memory model of note? Are you talking about
http://golang.org/ref/mem ? If so, then you may have misunderstood
(some part) of the document.


What I mean by that is that the Go memory model doesn't allow for concurrent access without synchronization. It doesn't permit race conditions. Which means that when you need concurrent access to memory with something that scales better than a mutex, you're out of luck. Fortunately the sync/atomic package comes to the rescue, it's just a little painful to work with. It currently makes no guarantees about its own memory model, but will probably have to be sequential consistency by default in order to not break existing code. So I'll qualify that statement: from a lock-free programming standpoint Go has no memory model of note.
 
> I had to turn every access into
> use of the sync/atomic package. That likely bears a runtime cost, but it
> wasn't prohibitive, as we'll see when I get to the benchmarks.
 
The point you may have missed in the Go memory model is that it allows
the compiler to _not write_ to memory at all - where it's possible -
when there's no synchronization. IOW, the above statement gets it
upside down. The memory model allows optimization of programs using
synchronization to do _less_ traffic on the memory bus - and to
explicitly "ask" for the writes when necessary.

I understand that; if you look at the point I was trying to make about memory maps, wondering how the Go compiler knows it can't omit those writes, you'll see that I do. Using sync/atomic has a runtime cost: if it has to guarantee sequential consistency (a total order), then it really has to issue full memory barriers with every load and store. That carries a significant cost, even on x86/64. I think that's even stronger than what Java does with volatile (Java uses sfence/lfence instead of the full mfence, IIRC). That cost is dozens of cycles at best. I have not yet checked the assembly to see what Go actually produces, but I know they're leaning toward sequential consistency from what I've seen in public discussions on the subject. I hope that one day Go will support more relaxed atomic operations as well.


I cannot be sure without seeing the code for the benchmarks, but from
the numbers I guess their methodology is probably flawed. The numbers
suggest that you're doing a variable amount of work in the benchmark,
in contrast to doing a constant amount of work b.N times. If that is
the case, the numbers are probably not very usable, because they may
include an unknown quantity: f(b.N).

Here's the code for the lookup benchmarks. I may be new to Go, but I think they're correct. The speed advantage likely comes from being a specialized implementation for int32, with no need to distinguish a 0 key from an empty or deleted slot in the hashtable. That, and I used Python's hashing and random-probing algorithm, which is extremely efficient for sequential integer keys and one of the most heavily tuned dictionary implementations out there.

func BenchmarkGoMapLookup(b *testing.B) {
	b.StopTimer()

	m := make(map[int32]int32)
	for i := 1; i < b.N; i++ {
		m[int32(i)] = int32(i)
	}

	b.StartTimer()

	sum := int32(0)
	for i := 1; i < b.N; i++ {
		sum += m[int32(i)]
	}
}

func BenchmarkHashIndexLookup(b *testing.B) {
	b.StopTimer()

	idx := &HashIndex32{}
	for i := 1; i < b.N; i++ {
		idx.Add(int32(i), int32(i))
	}

	b.StartTimer()

	sum := int32(0)
	for i := 1; i < b.N; i++ {
		val, _ := idx.Lookup(int32(i))
		sum += val
	}
}
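One way to address the constant-work concern would be to fix the table size so each of the b.N iterations does the same amount of work regardless of b.N. A sketch against the builtin map (the HashIndex version would get the same treatment; the size constant is arbitrary):

```go
package main

import "testing"

const benchTableSize = 1 << 16

// Each iteration performs one lookup in a table whose size does not
// depend on b.N, so ns/op measures the lookup alone.
func BenchmarkGoMapLookupFixed(b *testing.B) {
	m := make(map[int32]int32, benchTableSize)
	for i := int32(1); i <= benchTableSize; i++ {
		m[i] = i
	}
	b.ResetTimer()
	var sum int32
	for i := 0; i < b.N; i++ {
		sum += m[int32(i%benchTableSize)+1]
	}
	_ = sum // keep the result live so the loop isn't optimized away
}
```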
 
It could be useful to know which edges of the memory model you
consider rough. Maybe they are indeed rough; maybe there are reasons
for them that could be explained.

Guarantees in the atomics package; atomics for all natural word sizes, not just 32/64-bit; relaxed atomics for better performance. Bonus points for some way to do atomics that doesn't involve bypassing the type system completely with unsafe.Pointer, but that would need special compiler support, or generics.
 

Wrt generics: Go, so far, does take the practical route and that's why
it doesn't have generics. No one said they will never come, no one
said they will. AFAICT, the more seasoned a Go developer is, the less
likely it is (s)he misses generics.

I hope I too learn to work around not having generics and stop missing them so much. One thing I see a lot in Go is functions taking interface{}, which is the Go equivalent of Object. When you write code like that, you unfortunately bypass the type system and turn compile-time errors into runtime cast errors. Then it really starts to feel like Python! Right now Go has more important things to worry about than generics, but when the announcement of generics in Go comes, I'll light up more than I used to at Christmas as a kid. Definitely looking forward to that; hopefully it will come to pass some day.

Cheers,
Dan

Mateusz Czapliński

Jul 1, 2014, 11:44:42 AM7/1/14
to golan...@googlegroups.com, dan....@gmail.com


On Tuesday, July 1, 2014 4:34:13 PM UTC+2, Daniel Eloff wrote:
Wrt generics: Go, so far, does take the practical route and that's why
it doesn't have generics. No one said they will never come, no one
said they will. AFAICT, the more seasoned a Go developer is, the less
likely it is (s)he misses generics.

I hope I too learn to work around not having generics and stop missing them so much. One thing I see a lot in Go is functions taking interface{}, which is the Go equivalent of Object. When you write code like that, you unfortunately bypass the type system and turn compile-time errors into runtime cast errors. Then it really starts to feel like Python! Right now Go has more important things to worry about than generics, but when the announcement of generics in Go comes, I'll light up more than I used to at Christmas as a kid. Definitely looking forward to that; hopefully it will come to pass some day.

For what it's worth (assuming you haven't seen the article yet), the main stumbling point with generics IIUC is probably most concisely summed up here:
  http://research.swtch.com/generic
Should you know of something important/interesting that would carry weight, you'd probably be more than welcome to share it. Unfortunately, on my side, this article from 2009 is all I can offer about my knowledge of the situation. That's because, from my observations, all discussions of the topic on the mailing list tend to *immediately* spiral into exploding debates over syntax, mixed with the same solutions repeated over and over, and it's really hard for a common man like me to find any signal there.

/M.

Ian Lance Taylor

Jul 1, 2014, 12:33:49 PM7/1/14
to Daniel Eloff, golang-nuts
On Tue, Jul 1, 2014 at 7:34 AM, Daniel Eloff <dan....@gmail.com> wrote:
>
> What I mean by that is that the Go memory model doesn't allow for concurrent
> access without synchronization. It doesn't permit race conditions. Which
> means that when you need concurrent access to memory with something that
> scales better than a mutex, you're out of luck. Fortunately the sync/atomic
> package comes to the rescue, it's just a little painful to work with. It
> currently makes no guarantees about its own memory model, but will probably
> have to be sequential consistency by default in order to not break existing
> code. So I'll qualify that statement: from a lock-free programming
> standpoint Go has no memory model of note.

That is correct and at this point that is intentional. Experience has
shown that very few people can write correct lock-free programs. Go
does not provide a lock-free programming model. That is unlikely to
change any time soon. I think this is a case where comprehensibility
of the programming model trumps runtime efficiency.


Your other observations about mmap appear to me to be correct. Go
does not provide a way to use a memory mapping shared among multiple
processes (as opposed to multiple goroutines in the same process). If
that ever changes my guess is that it would be through some sort of
function call interface.

Ian

Daniel Eloff

Jul 1, 2014, 12:52:53 PM7/1/14
to golan...@googlegroups.com, dan....@gmail.com
On Tuesday, July 1, 2014 10:44:42 AM UTC-5, Mateusz Czapliński wrote:

For what it's worth (assuming you haven't seen the article yet), the main stumbling point with generics IIUC is probably most concisely summed up here:
  http://research.swtch.com/generic
Should you know of something important/interesting that would carry weight, you'd probably be more than welcome to share it. Unfortunately, on my side, this article from 2009 is all I can offer about my knowledge of the situation. That's because, from my observations, all discussions of the topic on the mailing list tend to *immediately* spiral into exploding debates over syntax, mixed with the same solutions repeated over and over, and it's really hard for a common man like me to find any signal there.

Thanks for the link, it was enlightening. I guess I could say that I don't like any of the three choices presented. I have a feeling that the best option would be a mix. Code duplication/specialization for builtin types, because the alternative I currently use is to copy-paste the code and change the types by hand anyway, so it's not going to make the binaries bigger or the compiler much slower (but I'm no expert on compilers!). For everything else, maybe something like interfaces, where you're actually passing around a tuple of type and reference. That can be done with interfaces right now, but you need to insert the cast manually. Having the compiler do the cast and check the types statically is an advantage with little downside, much like Java's generics: people were already using Object and casting in Java, so it made no performance difference. The same can be said of Go with interfaces, so that part seems to have little downside.

The waters become muddy with structs. Do you treat them like builtin types and specialize? Do you treat them like interfaces and store them as references to a struct, Java-style? One way bloats your code and slows the compiler; the other gives you a runtime hit for the extra indirection. Neither choice is right or wrong in every case; it really depends. I'd like a language that lets me choose between specialization (C++ style) and boxing/unboxing (Java style). It would choose the Java way by default, but when I profile and find it's a problem, I could change it to force specialization and fix it. Actually, maybe that would just work for generics in general, without any exception for builtin types: you could use a collection with boxing in one part of the code and with specialization in another (assuming different types; if the types are the same, it should be either specialized everywhere or boxed everywhere, not a mix).

In a situation where you must choose between different tradeoffs, letting the programmer choose, with a sensible default, gives the most flexibility. We constantly make those kinds of tradeoffs; that's our job as engineers. How that would work, whether it could work, or what the syntax would look like is not for me to say. I somehow doubt I'm offering anything that hasn't been said before, but those are my thoughts, for what they're worth.

Cheers,
Dan

Daniel Eloff

Jul 1, 2014, 1:04:03 PM7/1/14
to golan...@googlegroups.com, dan....@gmail.com
That is correct and at this point that is intentional.  Experience has
shown that very few people can write correct lock-free programs.  Go
does not provide a lock-free programming model.  That is unlikely to
change any time soon.  I think this is a case where comprehensibility
of the programming model trumps runtime efficiency.

Well, it wouldn't hurt to have the memory model for the atomics package specified, but I know that's planned anyway, and it's rightly not a current priority. Having some additional options like relaxed atomics with no runtime overhead would be a dream. But as you say, most people don't and shouldn't care. I suspect the people who would care most are the actual Go runtime engineers. Otherwise I think it's fine the way it is.
 
Your other observations about mmap appear to me to be correct.  Go
does not provide a way to use a memory mapping shared among multiple
processes (as opposed to multiple goroutines in the same process).  If
that ever changes my guess is that it would be through some sort of
function call interface.

I can think of a few other low-level things that would run into that problem, such as dealing with hardware or writing device drivers, but that's way outside Go's target area as a language. mmap for sharing between goroutines doesn't get you out of the woods either, since the memory model explicitly prohibits unsynchronized access regardless of whether it is within the same address space or not.

I think that's OK too. It will likely work in the common case with synchronized access. Everyone else (all 5 people) should use the atomics package.

Cheers,
Dan

Erwin

Jul 1, 2014, 1:46:02 PM7/1/14
to Ian Lance Taylor, Daniel Eloff, golang-nuts

Your other observations about mmap appear to me to be correct.  Go
does not provide a way to use a memory mapping shared among multiple
processes (as opposed to multiple goroutines in the same process).  If
that ever changes my guess is that it would be through some sort of
function call interface.

Hmm, I was hoping to use memory mapping between processes to implement an efficient 'plug-in' system. I don't fully understand all of the details mentioned in this thread, but it looks like the compiler might optimize out reads and writes from/to the mapped memory, if it doesn't know that another process is potentially reading/writing the mapped memory as well? If so, couldn't we trick the compiler to not optimize out reads/writes by adding a goroutine just for the sake of reading/writing some values of the mapped memory?   
 

Ian Lance Taylor

unread,
Jul 1, 2014, 1:55:34 PM7/1/14
to Erwin, Daniel Eloff, golang-nuts
It's more a theoretical concern at present, but that's basically
right: the Go compiler does not know that some other process might
change the memory, so the compiler could in principle skip certain
reads and writes that have no obvious purpose. The hacky workaround
would be not a goroutine, but something like passing the mmap slice to
a cgo function call that is a no-op but that the Go compiler can't
see. That will ensure that all values are written out to memory and
that no values are cached across the call.

Ian

Erwin

unread,
Jul 1, 2014, 2:27:45 PM7/1/14
to Ian Lance Taylor, Daniel Eloff, golang-nuts

> Hmm, I was hoping to use memory mapping between processes to implement an
> efficient 'plug-in' system. I don't fully understand all of the details
> mentioned in this thread, but it looks like the compiler might optimize out
> reads and writes from/to the mapped memory, if it doesn't know that another
> process is potentially reading/writing the mapped memory as well? If so,
> couldn't we trick the compiler to not optimize out reads/writes by adding a
> goroutine just for the sake of reading/writing some values of the mapped
> memory?

It's more a theoretical concern at present, but that's basically
right: the Go compiler does not know that some other process might
change the memory, so the compiler could in principle skip certain
reads and writes that have no obvious purpose.  The hacky workaround
would be not a goroutine, but something like passing the mmap slice to
a cgo function call that is a no-op but that the Go compiler can't
see.  That will ensure that all values are written out to memory and
that no values are cached across the call.


Ok, I see. So I was worried too soon, and this theoretical future compiler behavior can be easily circumvented.   

Dmitry Vyukov

unread,
Jul 8, 2014, 1:42:01 AM7/8/14
to Daniel Eloff, golang-nuts
On Tue, Jul 1, 2014 at 6:00 PM, Daniel Eloff <dan....@gmail.com> wrote:
> Hi Dmitry,
>
> The lock-free programming world is very small, I suspect you must be the
> same Dmitry Vyukov from 1024cores.net. I have great respect for your work,
> and have used both Relacy and TSAN in the past as well as your MPMC queue.

Thanks!


> I see you took the same approach in pool.go, so I guess there's not much
> improvement to be had there currently. It's good to know that if the
> standard library does those kinds of things my code will probably not break
> in the future.

That's not to say that there are any plans to break such code. But we
maintain the compiler/runtime/stdlib as a single unit, so if some
corner-case rules change we update the stdlib at the same time.



> It can be improved by improving the atomic/sync package,
> there's really no language / memory model changes necessary for lock free
> programming in Go.
>
> About template code generation, I did a bit of searching and found:
> http://golang.org/src/cmd/go/test.go which uses Must templates to generate
> go code. A bit nasty but it would work. I'll probably take the approach of
> copy paste in this case though, because once a hashtable is written and
> tested there's rarely a reason to change it again.

I did not mean the text/template package. As far as I understand there
are some helper tools/packages specifically for code generation of
generic containers/algorithms. I've never used them, though, so can't
suggest anything in particular.

Well, if it's a synchronization algorithm with concurrent memory
accesses, sync/atomic is what the doctor ordered. sync/atomic
operations expect that the data can be read by other goroutines, and
that the data can be changed asynchronously by another goroutine.


> About the write scenario, I'm curious how the compiler knows that the memory
> isn't just a regular slice?

There is no need to distinguish between mmap-ed memory and regular
slices. If the compiler can't prove that the memory can't be accessed
by other goroutines, it must flush all writes before any opaque
function call or synchronization event.


> Because if it somehow can figure out that the
> memory is coming from somewhere else and not optimize away writes to it,
> then there's no problem. The stores are going to be atomic anyway for
> aligned natural word sizes in Go, on x86/64, because the underlying mov
> instruction is and there's no reason for the Go compiler to generate strange
> machine code. It worked in C/C++ for the same reason long before they had
> atomics. You just need an atomic store somewhere to ensure ordering of
> writes, matched by an atomic load in the other process.

Right, just take a look at:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55981



> But you're right that another system call taking the slice as an argument,
> e.g. unmap, could very well read that memory, and the Go compiler would have
> to really do those writes because it can't prove otherwise. That didn't
> occur to me. Repeated writes, like the sequence lock example but with writes
> instead of reads could probably still be optimized away because Go doesn't
> think there can be a concurrent reader.

No, if you use proper atomic operations. That's what they are for --
concurrent accesses.