Go-Disruptor


jonathan....@gmail.com

May 17, 2014, 12:50:50 AM
to golan...@googlegroups.com
For whatever it's worth, I've just posted a pre-alpha release of the LMAX Disruptor pattern written in Go: https://github.com/smartystreets/go-disruptor

(The code comes from the same team that has written and is supporting GoConvey, the BDD testing framework and web-based testing GUI.)

Using the Disruptor pattern, we were able to achieve a sustained throughput of over 225 million messages per second between goroutines on a Core i7 2.7 GHZ MacBook Pro (early 2013).  On this same hardware, the standard channels implementation maxed out at 15 million messages per second.

Granted, channels are still the preferred and best concurrency mechanism in the vast majority of use cases, but the Disruptor has a place in ultra-low latency applications where consistent latency is mission critical and often measured in nanoseconds.
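
For reference, channel throughput of the kind quoted above can be measured with a rough sketch like the following (the buffer size, message count, and helper name are arbitrary choices of this sketch, not the benchmark the author ran):

```go
package main

import (
	"fmt"
	"time"
)

// measureChannelThroughput pushes n integers through a buffered channel
// to a single consumer goroutine and returns messages per second.
// A rough measurement sketch only; results vary widely by hardware.
func measureChannelThroughput(n int) float64 {
	ch := make(chan int64, 1024)
	done := make(chan struct{})

	go func() {
		var sum int64
		for v := range ch {
			sum += v // consume each message
		}
		_ = sum
		close(done)
	}()

	start := time.Now()
	for i := 0; i < n; i++ {
		ch <- int64(i)
	}
	close(ch)
	<-done
	return float64(n) / time.Since(start).Seconds()
}

func main() {
	fmt.Printf("~%.0f msgs/sec\n", measureChannelThroughput(1000000))
}
```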

Comments and feedback are appreciated.

Nate Finch

May 17, 2014, 6:57:45 AM
to golan...@googlegroups.com
You need a link in the readme to something that describes what a disruptor is.  Probably most devs haven't heard of it.

Henrik Johansson

May 17, 2014, 7:25:00 AM
to jonathan....@gmail.com, golang-nuts

Cool, I will try it out for sure!


Nate Finch

May 17, 2014, 7:59:37 AM
to golan...@googlegroups.com
I know this is pre-alpha exploratory code, but some code comments:
  
Don't call receivers "this".  Give them a variable name like anything else (traditionally a single letter).  
Put types at the top of the file... it makes it easier to understand the functions when you've seen the types first.
Put constants and global variables at the top of the file... that's where most people will expect to find them.
Don't use bare returns unless your implementation requires them (yours doesn't ;) (in main.go startConsumers)
Don't use underscores in filenames; it's too easy to confuse them with the filename suffixes the compiler treats specially.
Numeric constants generally don't need types, unless you really need to enforce that they are explicitly not allowed to be used by a variable of a different size.  
You can put more than one type in a file... and given that all your types are pretty tiny, it would probably be a lot easier to read if we didn't have to flip between files constantly :)
Not sure why NewBarrier takes a variadic *Sequence and not just a slice.
Converting b.N to an int64 in your benchmarks isn't going to do anything.  It's set outside the benchmark, so you can't make the system set it higher than the platform bitness allows anyway.
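
A few of these points can be sketched together; the type and function names below are illustrative stand-ins, not the repository's actual code:

```go
package main

import "fmt"

// Grouping small related types at the top of one file makes the package
// easier to scan (review points: types first, more than one type per file).
type Sequence struct {
	value int64
}

type Barrier struct {
	sequences []*Sequence
}

// Use a short receiver name rather than "this".
func (s *Sequence) Load() int64 { return s.value }

// NewBarrier takes a plain slice rather than a variadic *Sequence,
// which is simpler when callers already hold a slice.
func NewBarrier(sequences []*Sequence) *Barrier {
	return &Barrier{sequences: sequences}
}

func main() {
	s := &Sequence{value: 42}
	b := NewBarrier([]*Sequence{s})
	fmt.Println(b.sequences[0].Load()) // prints 42
}
```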

Overall, pretty cool to see someone showing off the fact that Go's garbage collection doesn't have to be a speed drawback because it's so easy to avoid allocations due to the low level nature of the language.

jonathan....@gmail.com

May 17, 2014, 12:06:24 PM
to golan...@googlegroups.com
Thanks for your review Nate, I'll go through and standardize the code. Here are a few links to the Disruptor pattern:

Nate Finch

May 17, 2014, 2:45:28 PM
to golan...@googlegroups.com
As mentioned in an email exchange, my preference against underscores in file names isn't actually what the std library itself does.

Nigel Tao

May 18, 2014, 11:00:45 PM
to jonathan....@gmail.com, golang-nuts
On Sat, May 17, 2014 at 2:50 PM, <jonathan....@gmail.com> wrote:
> For whatever it's worth, I've just posted a pre-alpha release of the LMAX
> Disruptor pattern written in Go:
> https://github.com/smartystreets/go-disruptor

I haven't looked at your code in great detail, but the Go Memory Model
says that the compiler is free to optimize

func (c *Cursor) Store(value int64) {
    c.value = value
}

as a no-op if it can prove that there are no further reads of c.value
in the same goroutine. This can happen even if, at the processor
level, such a write would have had (atomic) side-effects visible from
other goroutines.

See the "there is no guarantee that the write to done will ever be
observed by main, since there are no synchronization events between
the two threads" example at http://golang.org/ref/mem

jonathan....@gmail.com

May 18, 2014, 11:39:56 PM
to golan...@googlegroups.com, jonathan....@gmail.com
Nigel,

Thanks so much for responding.  I very much appreciate having a Go contributor give feedback. I have read through the Go memory model documents a number of times along with Google Code issues related to the memory model. Right now, the kinds of strong guarantees and capabilities that I'm accustomed to in other languages and runtimes simply aren't found in Go, so there is a certain amount of "programming by coincidence" that I am working through.

That being said, the particulars of the scenario you mentioned may have a simple solution: for example, providing a function that reads the cursor value at startup (or some other time) so that it doesn't affect runtime performance but signals to the compiler that the value will be read and therefore must not be optimized away.

I sincerely appreciate your insights into these potential pitfalls as I move forward with this project.  My plan is to have a full set of unit and integration tests that assert the behavior of the entire library against multiple versions of the Go runtime on different operating systems and CPU architectures. Such a suite of tests should help to mitigate any obvious issues as the Go runtime matures and further internal optimizations occur.

Jonathan

Jan Mercl

May 19, 2014, 4:14:16 AM
to Nigel Tao, jonathan....@gmail.com, golang-nuts
On Mon, May 19, 2014 at 5:00 AM, Nigel Tao <nige...@golang.org> wrote:
> I haven't looked at your code in great detail, but the Go Memory Model
> says that the compiler is free to optimize
>
> func (c *Cursor) Store(value int64) {
> c.value = value
> }
>
> as a no-op if it can prove that there are no further reads of c.value
> in the same goroutine.

I think that cannot be statically proved above trivial cases. Consider
scenarios like

func f(c *Cursor, p *int64) {
    c.value = 42
    g(*p)
}

go f(c, &c.value)

-j

Dmitry Vyukov

May 19, 2014, 11:44:48 AM
to jonathan....@gmail.com, golang-nuts
What are the downsides as compared to channels? As far as I
understand, selects and timeouts are not supported. Is blocking
supported at all?

Have you tried to implement something with channels and with
disruptor? What are the conclusions?

jonathan....@gmail.com

May 19, 2014, 11:58:04 AM
to golan...@googlegroups.com, jonathan....@gmail.com
There are a few downsides right now:

1. The API certainly isn't as clean.
2. It's designed to be created during application startup and run until shutdown, whereas channels can be (and often are) created ad hoc and used for a short period of time.
3. Selects aren't supported. Timeouts will be supported, but they are implemented in the calling code.
4. Blocking currently isn't supported, and I'm on the fence about whether to support it at all.
5. My implementation is single-producer, multi-consumer right now, whereas channels can be many-to-many.  I'll be implementing the multi-producer scenarios shortly and handling them in virtually the same way as the Java Disruptor.
6. Running under GOMAXPROCS=1, or on a single-core processor, is an order of magnitude SLOWER than the channel implementation.

Right now working with channels is much cleaner and easier, but it's a lot slower. Also the Disruptor requires you to track a bit more, e.g. the current sequence when compared to channels. I've seen your proposal for Go channels on steroids [1] and atomic intrinsics [2] and I'm excited how it allows applications to work with the CPU instead of against it.

Dmitry Vyukov

May 19, 2014, 12:25:22 PM
to jonathan.s.oliver42, golang-nuts
On Mon, May 19, 2014 at 7:58 PM, <jonathan....@gmail.com> wrote:
> There are a few downsides right now:
>
> 1. The API certainly isn't as clean
> 2. It's designed to be created during application startup and run until
> shutdown whereas channels can (and often are) created adhoc and used for a
> short period of time.
> 3. Selects aren't supported. Timeouts will be supported, but they are
> implemented in the calling code.
> 4. Blocking currently isn't supported and I'm on the fence about whether to
> support it at all.

Usually it's only in synthetic benchmarks that blocking is not required.


> 5. My implementation is only single producer right now, multi consumer
> whereas channels can be many to many. I'll be implementing the
> multi-producer scenarios shortly and handling it in virtually the same way
> as the Java Disruptor.
> 6. Running under GOMAXPROCS @ 1 or running on a single core processor is an
> order of magnitude SLOWER than the channel implementation.

Note that this is not about GOMAXPROCS=1. This is about a situation
when a producer and the corresponding consumer are not scheduled
simultaneously. You can easily get the same effect with GOMAXPROCS=64
if you have enough producers/consumers and/or other activities.
What are the target applications for this library?

jonathan....@gmail.com

May 19, 2014, 1:09:59 PM
to golan...@googlegroups.com, jonathan.s.oliver42
If the producer is scheduled and the consumer(s) is/are not, then the producer writes to spare slots in the ring buffer. If the producer is not scheduled and the consumer(s) is/are, then the consumer(s) read from the slots in the ring buffer.  The Disruptor pattern prevents the producer from racing ahead of the consumer(s), and vice versa, through a gating concept that tracks the high-water marks of all producers and consumers.

My target application is an event-sourced application, e.g. storing all messages and rebuilding application state from all stored messages.  But another easy use case would be something like logging.  In most logging frameworks you have to multiplex the multiple incoming log streams from various goroutines into a single stream to pipe to stdout/file/network, etc. This multiplexing is most often handled with a lock. The Disruptor removes all of the locks and dramatically improves performance. In fact, log4j is moving to a Disruptor pattern within their framework to facilitate more performant logging.
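
The gating described here can be sketched as a toy single-producer/single-consumer ring buffer. Names, layout, and the spin-waiting are illustrative assumptions of this sketch, not the library's API:

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

const ringSize = 8 // power of two, so index = sequence & (ringSize-1)

// ring is a toy single-producer/single-consumer buffer. Each side may
// only advance as far as the other side's cursor (the "gate") allows.
// Cursors start at -1, meaning "nothing published/read yet".
type ring struct {
	slots    [ringSize]int64
	produced int64 // highest sequence published (the consumer's gate)
	consumed int64 // highest sequence read (the producer's gate)
}

func newRing() *ring { return &ring{produced: -1, consumed: -1} }

// publish waits until the next slot has been read, then writes to it.
// This is what keeps the producer from racing ahead of the consumer.
func (r *ring) publish(v int64) {
	next := atomic.LoadInt64(&r.produced) + 1
	for next-atomic.LoadInt64(&r.consumed) > ringSize {
		runtime.Gosched() // buffer full: wait for the consumer
	}
	r.slots[next&(ringSize-1)] = v
	atomic.StoreInt64(&r.produced, next)
}

// consume waits until at least one unread slot has been published.
func (r *ring) consume() int64 {
	next := atomic.LoadInt64(&r.consumed) + 1
	for next > atomic.LoadInt64(&r.produced) {
		runtime.Gosched() // buffer empty: wait for the producer
	}
	v := r.slots[next&(ringSize-1)]
	atomic.StoreInt64(&r.consumed, next)
	return v
}

func main() {
	r := newRing()
	go func() {
		for i := int64(0); i < 20; i++ {
			r.publish(i)
		}
	}()
	var last int64
	for i := 0; i < 20; i++ {
		last = r.consume()
	}
	fmt.Println(last) // prints 19
}
```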

Jonathan

Dmitry Vyukov

May 19, 2014, 1:14:19 PM
to jonathan....@gmail.com, golang-nuts
On Mon, May 19, 2014 at 9:09 PM, <jonathan....@gmail.com> wrote:
> If the producer is scheduled and the consumer(s) is/are not, then the
> producer writes to spare slots in the ring buffer. If the producer is not
> scheduled and the consumer(s) is/are, then the consumer(s) read from the
> slots in the ring buffer.

What happens if the ring buffer is empty?

jonathan....@gmail.com

May 19, 2014, 1:45:29 PM
to golan...@googlegroups.com, jonathan....@gmail.com
Those details are currently left to the consumer, but I will bring over a few of the defined "wait strategies" from Java, including:

Sleeping (polling)
BusySpin
Yield (runtime.Gosched)
Phased Back Off (BusySpin then Sleep)
Blocking (perhaps with a WaitGroup)
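
The strategies listed above might be sketched as interchangeable wait functions; the names and back-off thresholds below are illustrative assumptions, not the Java Disruptor's actual parameters:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// WaitStrategy decides what a consumer does on each failed poll of an
// empty buffer; spins is how many times it has already waited.
type WaitStrategy func(spins int)

// BusySpin burns CPU for minimum latency.
func BusySpin(int) {}

// Yield gives the processor to other goroutines (runtime.Gosched).
func Yield(int) { runtime.Gosched() }

// Sleep polls at a fixed interval, trading latency for CPU.
func Sleep(int) { time.Sleep(time.Millisecond) }

// PhasedBackOff busy-spins first, then yields, then sleeps.
func PhasedBackOff(spins int) {
	switch {
	case spins < 100:
		// busy spin
	case spins < 200:
		runtime.Gosched()
	default:
		time.Sleep(time.Millisecond)
	}
}

// await polls ready() with the given strategy until it reports true,
// returning the number of waits that were needed.
func await(ready func() bool, wait WaitStrategy) int {
	spins := 0
	for !ready() {
		wait(spins)
		spins++
	}
	return spins
}

func main() {
	n := 0
	spins := await(func() bool { n++; return n > 3 }, Yield)
	fmt.Println(spins) // prints 3
}
```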

Nigel Tao

May 19, 2014, 9:19:46 PM
to jonathan....@gmail.com, golang-nuts
On Mon, May 19, 2014 at 1:39 PM, <jonathan....@gmail.com> wrote:
> Thanks so much for responding. I very much appreciate having a Go
> contributor give feedback. I have read through the Go memory model documents
> a number of times along with Google Code issues related to the memory model.
> Right now, the kinds of strong guarantees and capabilities that I'm
> accustomed to in other languages and runtimes simply aren't found in Go, so
> there is a certain amount of "programming by coincidence" that I am working
> through.

Don't program by coincidence. The Go memory model is clear: if you
need to share memory, you also need explicit synchronization.

I read somewhere that one of the reasons that the Itanium architecture
wasn't successful was that compilers had to insert lots of memory
barriers to enforce C++'s stronger guarantees than the hardware had,
which obviously killed actual vs theoretical performance.


> That being said, the particulars of the scenario you mentioned may have a
> simple solution. For example, by providing a simple function that reads the
> cursor value at startup (or some other time) such that it doesn't affect
> runtime performance and it indicates to the compiler that the value will be
> read and considered such that it should not be optimized away.

Even if the value is read once (at startup), the compiler can still
optimize away writes as no-ops. See the "an aggressive compiler might
delete the entire go statement" line in http://golang.org/ref/mem


> I sincerely appreciate your insights into these potential pitfalls as I move
> forward with this project. My plan have a full set of unit and integration
> tests that assert the behavior of the entire library against multiple
> versions of the Go runtime on different operation systems and CPU
> architectures. Such a suite of tests should help to mitigate any obvious
> issues as the Go runtime matures and further internal optimizations occur.

You might catch obvious issues with existing compilers, but the Go
memory model says that you still will have non-obvious issues, and
they are the hardest ones to debug.

jonathan....@gmail.com

May 19, 2014, 10:18:25 PM
to golan...@googlegroups.com, jonathan....@gmail.com
Nigel,

Thanks again for your feedback. You're absolutely correct about the Go memory model guarantees. I'll take your comments as fair warning that any issues I face are entirely my fault and not the responsibility of the Go maintainers. Any code built atop this library can and will misbehave according to the memory model as it is documented, not as it happens to be implemented in any given release of the Go runtime.

Now, all of that being said, the Disruptor is unlike most libraries in that it should work with Go's memory model instead of against it.  It heavily employs a technique known as the "single-writer" pattern, such that any given "consuming" goroutine will only ever write to a single value, and that value becomes visible to another core when its corresponding CPU cache line is transferred in multi-core scenarios.  Again, you're 100% correct that I can't guarantee a given variable update will ever be seen by another goroutine on another core, and that's a chance I'm willing to take.

As it stands, I'm pushing 675 million messages per second from one goroutine to another on my laptop--this is the same laptop that maxed out at 15 million per second when using a channel.  Right now a "message" is a simple, incrementing integer generated by one goroutine and passed to another goroutine. I've run this on a number of different operating systems and CPU architectures (including a quad-core ARM) with excellent results.  My Nexus 5 was pushing about 17 million messages per second with no race conditions.

Thanks again for the warning. If things don't pan out the way I hope they do, then I'll know who it was that helped me see the light.

Sokolov Yura

May 20, 2014, 12:53:46 AM
to golan...@googlegroups.com
Did you compare against blocking select or nonblocking? Nonblocking select (i.e. select with default) on a single channel is much faster than any other since it doesn't allocate memory. Blocking select and nonblocking on a several channels allocates memory for some internal structures.
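
A non-blocking receive is just a select with a default case; a minimal sketch:

```go
package main

import "fmt"

// tryReceive is a non-blocking receive: a select with a default case
// returns immediately when the channel is empty, which on a single
// channel avoids the allocations a blocking or multi-channel select
// can incur.
func tryReceive(ch chan int) (int, bool) {
	select {
	case v := <-ch:
		return v, true
	default:
		return 0, false
	}
}

func main() {
	ch := make(chan int, 1)
	if _, ok := tryReceive(ch); !ok {
		fmt.Println("empty") // nothing buffered yet
	}
	ch <- 42
	v, ok := tryReceive(ch)
	fmt.Println(v, ok) // prints 42 true
}
```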

jonathan....@gmail.com

May 20, 2014, 8:02:41 AM
to golan...@googlegroups.com
I pushed a few benchmark tests [1] that can be run with "GOMAXPROCS=N go test -bench=." where N represents the number of operating system threads that will be allowed.  Note that channels get *slower* as you increase the thread count because of the contention for the underlying lock that serves as the synchronization primitive.
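
A benchmark of this kind can be sketched with the standard testing package. This is illustrative, not the repository's actual benchmark; it can be run standalone via testing.Benchmark, or dropped into a _test.go file and run with "GOMAXPROCS=N go test -bench=." as described above:

```go
package main

import (
	"fmt"
	"testing"
)

// BenchmarkChannel measures send/receive on a buffered channel with a
// single consumer goroutine. b.N is set by the framework, so there is
// no need to convert it to int64 anywhere.
func BenchmarkChannel(b *testing.B) {
	ch := make(chan int64, 1024)
	go func() {
		for range ch {
			// drain every message
		}
	}()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ch <- int64(i)
	}
	close(ch)
}

func main() {
	r := testing.Benchmark(BenchmarkChannel)
	fmt.Printf("%d iterations, %d ns/op\n", r.N, r.NsPerOp())
}
```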

On my box, the non-blocking implementation hit almost 30 million/sec, so that's a new record for channels, but that same implementation when contended over two or more threads went down to only 7.5 million/sec.  At the same time, in most applications the bottleneck certainly isn't a contended lock on the channel--it's usually something like JSON serialization, for example.  In the typical application increasing the number of cores decreases channel performance but increases application performance because it's able to bring more CPU cores to handle the problem space.  Another thing I love about channels is how easy they are to create and use.  Ranging over a channel, or simply typing "ch <- message" is amazing in its simplicity and one of the reasons I prefer Go over other languages.

My implementation of the Disruptor doesn't do well on fewer than two threads (GOMAXPROCS=2) because there's no blocking on receive; I'm going to see what it would take to implement that.  The main reason the Disruptor pattern is so fast compared to alternatives is that it was created with "mechanical sympathy" for how modern CPU architectures work. Most synchronization primitives are built around the concept of a lock, whereas the Disruptor avoids contention at all costs.
