Sharing memory directly between goroutines that may be running on separate hardware threads is firmly against the principles of Go. Nonetheless, Go is a practical language: it lets you do nasty things contrary to the spirit of the language when you need to, and that matters to me.
Over the weekend I wrote my first Go code, and ran my first Go compiles, tests, and benchmarks. I implemented a hashtable for 32-bit integers that allows multiple readers concurrent access alongside a single writer. I chose it deliberately because it seemed like something Go is not suited for, and parts of it were painful. (By the way, scroll to the bottom if you just want to see the benchmarks.)
I initially wanted it to support 64-bit integers as well, but without generics there just wasn't a reasonable way of doing that. You end up with all kinds of methods ending in 32 or 64 (worse if you want more than two types), and that proliferates up the call chain, with runtime checks to make sure you're not calling a 32-bit method on a hashtable with 64-bit keys, until it becomes easier to just have a distinct type for each. I think the Go team is well aware of these limitations, and I fully expect to see generics in the language someday. If stodgy old Java got them (well, something that looked like them), then I think it will eventually happen for Go as well.
The other painful part was trying to do "volatile" accesses in a language with no memory model of note. I had to turn every access into a call into the sync/atomic package. That likely carries a runtime cost, but it wasn't prohibitive, as we'll see when I get to the benchmarks.
// Do NOT reorder these stores, or a parallel lookup could find the key but see an empty value
atomic.StoreInt32((*int32)(unsafe.Pointer(slot+uintptr(4))), val)
atomic.StoreInt32((*int32)(unsafe.Pointer(slot)), key)
That's kind of nasty, but still readable. All I really need here is a C++ relaxed store followed by a release store. For reading, I would read the key first with an acquire load, and then I could use a relaxed load for the value. On x86 no memory fences or anything else are needed for that; the compiler just has to emit regular loads and stores without reordering them or optimizing them away. I miss being able to do that in Go without a runtime cost. Go will probably choose sequential consistency for the sync/atomic package, which means a full memory barrier instruction needs to be emitted with every operation, and the compiler can then optimize some of those away. Java's volatile does something like that, IIRC, but Java also gives you the ability to do a release store with the putOrderedXXX methods in the Unsafe class.
Some other nastiness is storing the hashtable itself as an unsafe.Pointer:
type HashIndex struct {
    tbl unsafe.Pointer
}
When what I really wanted was:
type HashIndex struct {
    tbl *hashTable
}
Here HashIndex wraps a pointer to a fixed-size hashTable, which implements the lookup/insert methods. It has to be done this way: to grow the HashIndex, we construct a new, larger fixed-size hashTable, copy the items across, and then swap it atomically with the old one. There doesn't seem to be a way with atomic.StoreXXX to store a strongly typed pointer to anything; you need a pointer to an unsafe.Pointer as the destination, and the only way I could see to satisfy that was to store the table as an unsafe.Pointer and cast it before each use. That makes the type system useless in this case, but it's not the end of the world, since I'm bypassing it all over the place anyway. Some getters and setters make it a little easier:
func (idx *HashIndex) table() *hashTable {
    return (*hashTable)(atomic.LoadPointer(&idx.tbl))
}

func (idx *HashIndex) setTable(tbl *hashTable) {
    atomic.StorePointer(&idx.tbl, unsafe.Pointer(tbl))
}
But there seems to be a bigger problem here. Go's memory model promises nothing in these cases, and outside of sync/atomic it has no way to tell the compiler not to reorder or optimize out operations. So to use syscall.Mmap safely, you can't use the returned byte slice directly:
bytes, _ := syscall.Mmap(...)
bytes[0] = 42 // the Go compiler is allowed to notice that nobody reads bytes and optimize this away
You'd need to do this:
atomic.StoreInt32((*int32)(unsafe.Pointer(&bytes[0])), (atomic.LoadInt32((*int32)(unsafe.Pointer(&bytes[0]))) &^ 0xff) | 42)
Not only does that make your eyes bleed, it's safe to say nobody actually does it. It gets worse if you want to access index i instead of zero, because you have to mask the address down to the containing dword and then adjust the bit operations to mask off and change only the byte you're interested in. Yuck! So one day, one of two things will happen:
1) Go makes its memory model stronger, prohibiting the elimination of redundant writes and thereby inhibiting lots of optimizations.
2) Go does those optimizations and breaks every program that uses Mmap.
I somehow don't see either of those as a viable option. And heaven help you if you were crazy enough to read from that Mmap in Go. Optimizations that eliminate "redundant" stores don't help performance much, but eliminating redundant reads, i.e. reading into registers and reusing those registers, is the bread and butter of every compiler. So every program that reads from a Mmap is already fundamentally broken with respect to the Go memory model: repeated reads from a Mmap can be arbitrarily stale, depending on how the compiler cached them in registers and reused them, which it is not only fully allowed to do, it already must do.
One obvious solution to all of these problems is to add volatile to the language. Then Mmap could be declared to return a volatile []byte and be done with it. But that is like C++ const/volatile and proliferates through the type system. It's nasty and against the spirit of Go, and I don't want to see it there. A volatile []byte would actually need to be a different SliceHeader, one with a volatile Data field. Nasty.
An alternative is a more robust atomics package that can do these kinds of things more cleanly and more easily, with the Go memory model making it clear that you have to use it even if you only want to read from a mmap. Add support for all the data types, not just int32 and int64: bytes and bools too. Add special runtime support for dealing with slices and typed pointers, so you don't have to subvert the type system to use them. Most people's programs would still be broken in theory, but they probably work in practice (or their authors would notice). At least there would be a not-so-painful way to fix them if needed, and a way to do it right for those who know they need volatile reads and writes. The Go compiler would still be free to do all the optimizations it wants. I'd love to see support for weaker, C++-style atomics as well, rather than just sequential consistency, but maybe in time.
The end result of my weekend's hacking? On an i7-4770K @ 3.5GHz:
BenchmarkGoMapInsert-4 20000000 110 ns/op
BenchmarkHashIndexInsert-4 100000000 25.6 ns/op
BenchmarkGoMapLookup-4 50000000 78.5 ns/op
BenchmarkHashIndexLookup-4 100000000 17.7 ns/op
About 4x faster than Go's built-in map for int32 keys and values, on both insert and lookup. And it allows any number of reader threads concurrent access alongside a writer, without any synchronization or blocking, unlike Go maps. It doesn't allow zero keys, unlike Go maps, and it doesn't allow deletes; it's just int32 keys and values. Hardly apples to apples, but the performance of pure Go code is impressive nonetheless. 200 LOC, not counting tests.
The amazing part to me? That I could set up Go on Windows and in a Linux virtual machine, and do all of that in a language I'd never used before, in a single weekend. From the pragmatic standpoint of being easy to use, easy to write, and easy to compile, test, and benchmark, Go is a wonderful success.
There are still some rough edges, though, with the memory model and the lack of generics, and I hope Go takes a practical approach rather than an idealistic one. Practical languages get used. Idealistic languages (Haskell, anyone?) remain toys.