New disruptor wait strategies


S. Kaser

Jun 2, 2012, 8:10:40 AM
to Disruptor
Hi,

We've encountered some issues with high CPU utilisation leading to
starvation when the lower-latency strategies are used and there are
not enough cores in a machine to accommodate multiple processes with
many Disruptor consumers/producers.

In my understanding, there seems to be a trade-off between CPU usage
and latency across the four available strategies:

1) Blocking - uses a classic re-entrant lock; context-switching
overhead, but low CPU usage
2) Sleeping - context-switching overhead, but low CPU usage
3) Yielding - very high CPU utilisation, low latency
4) BusySpin - highest CPU utilisation, very low latency
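To make the two ends of that trade-off concrete, here is a minimal,
self-contained sketch of the two extremes. It is an illustration only,
not the actual Disruptor implementations; the WaitLoops class and its
method names are hypothetical:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the two extremes of the CPU/latency trade-off.
class WaitLoops {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition ready = lock.newCondition();
    private volatile long cursor = -1;

    // (1) Blocking style: park the thread until the producer signals.
    // Low CPU usage, but pays a context switch on every wake-up.
    long blockingWaitFor(long sequence) throws InterruptedException {
        lock.lock();
        try {
            while (cursor < sequence) {
                ready.await();
            }
        } finally {
            lock.unlock();
        }
        return cursor;
    }

    // (4) Busy-spin style: burn a core re-checking the cursor.
    // Lowest latency, but 100% CPU on that core while waiting.
    long busySpinWaitFor(long sequence) {
        while (cursor < sequence) {
            // spin
        }
        return cursor;
    }

    // Producer side: publish a sequence and wake any blocked waiters.
    void publish(long sequence) {
        lock.lock();
        try {
            cursor = sequence;
            ready.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

The sleeping and yielding strategies sit between these two, replacing
the spin body with Thread.yield() or a short sleep.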

With (3) and (4) we see amazing performance, but our (limited number
of) cores become overloaded.
With (1) and (2) we see latencies increase, but external processes see
fewer latency spikes.

So we've come up with some "middle ground" strategies by implementing
(and testing & benchmarking) a few alternating strategies that
periodically switch between spinning/yielding and blocking/sleeping.

Examples:

[AlternatingYieldBlockWaitStrategy]
---yield---_________block__________---yield---_________block__________

[AlternatingSpinSleepWaitStrategy]
---spin----_________sleep__________---spin----_________sleep__________


You can configure the intervals by specifying nanosecond values at
construction time. At the moment the amount of time per alternating
phase is constant, as our sources typically emit data periodically,
but we will experiment with modulating the pulse widths in the near
future, as some sources behave a bit more randomly...
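The alternating-phase decision can be sketched as a small,
self-contained helper. The names here are hypothetical (the real
sources are in the pastebin links below); it assumes two fixed phase
widths supplied in nanoseconds at construction time:

```java
// Hypothetical sketch of the alternating-phase decision: given the
// elapsed wait time, decide whether the current moment falls in the
// hot (spin/yield) phase or the cold (block/sleep) phase.
class AlternatingPhase {
    private final long hotNanos;   // width of the spin/yield phase
    private final long coldNanos;  // width of the block/sleep phase

    AlternatingPhase(long hotNanos, long coldNanos) {
        this.hotNanos = hotNanos;
        this.coldNanos = coldNanos;
    }

    // True if, after elapsedNanos of waiting, we are in the hot phase.
    boolean isHotPhase(long elapsedNanos) {
        long cycle = hotNanos + coldNanos;
        return (elapsedNanos % cycle) < hotNanos;
    }
}
```

A waitFor() implementation would then call
isHotPhase(System.nanoTime() - startNanos) on each iteration and
either spin/yield or block/sleep accordingly.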

Benchmarking what we have, I find that, configured properly, they
strike a good balance of performance during bursts, running almost as
fast as the strategies they were inspired by. Furthermore, the
blocking/sleeping phases ensure that CPU utilisation stays low during
quieter periods (e.g. 25% instead of 100% per core), which keeps the
rest of the system more responsive...

Another benefit is that the ring buffer is accessed periodically, so
in theory it stays closer to the CPU caches (due to the occasional
read). This should reduce latency spikes if something arrives, say, 20
seconds later during a blocking interval (compared to a purely
blocking strategy).

Here are the sources:
AlternatingSpinBlockWaitStrategy: http://pastebin.com/KT9WmzMR
AlternatingSpinSleepWaitStrategy: http://pastebin.com/Hwdgy3fT
AlternatingYieldBlockWaitStrategy: http://pastebin.com/YiATa0pB
AlternatingYieldSleepWaitStrategy: http://pastebin.com/JpwBTgkz



Benchmark
(microsecond median of ~10 million transactions, single burst)

2012-06-02T12:43:04.673+0100: [Full GC (System) [CMS: 0K-
>213K(262144K), 0.0145464 secs] 25456K->213K(1541632K), [CMS Perm :
1500K->1498K(12288K)], 0.0146480 secs] [Times: user=0.00 sys=0.02,
real=0.01 secs]
BlockingWaitStrategy
[1965ms]

2012-06-02T12:43:25.960+0100: [Full GC (System) [CMS: 213K-
>266K(262144K), 0.0230689 secs] 611173K->266K(1541632K), [CMS Perm :
1817K->1817K(12288K)], 0.0231659 secs] [Times: user=0.03 sys=0.00,
real=0.02 secs]
SleepingWaitStrategy
[713ms]

2012-06-02T12:43:33.182+0100: [Full GC (System) [CMS: 266K-
>264K(262144K), 0.0105709 secs] 280293K->264K(1541632K), [CMS Perm :
1821K->1821K(12288K)], 0.0106275 secs] [Times: user=0.02 sys=0.00,
real=0.01 secs]
AlternatingYieldSleepWaitStrategy
[327ms]

2012-06-02T12:43:36.363+0100: [Full GC (System) [CMS: 264K-
>265K(262144K), 0.0111233 secs] 280291K->265K(1541632K), [CMS Perm :
1825K->1825K(12288K)], 0.0112026 secs] [Times: user=0.01 sys=0.00,
real=0.01 secs]
AlternatingYieldBlockWaitStrategy
[313ms]

2012-06-02T12:43:39.489+0100: [Full GC (System) [CMS: 265K-
>266K(262144K), 0.0107522 secs] 534856K->266K(1541632K), [CMS Perm :
1829K->1829K(12288K)], 0.0108127 secs] [Times: user=0.01 sys=0.00,
real=0.01 secs]
AlternatingSpinSleepWaitStrategy
[289ms]

2012-06-02T12:43:42.275+0100: [Full GC (System) [CMS: 266K-
>266K(262144K), 0.0096278 secs] 534858K->266K(1541632K), [CMS Perm :
1833K->1833K(12288K)], 0.0096959 secs] [Times: user=0.02 sys=0.00,
real=0.01 secs]
AlternatingSpinBlockWaitStrategy
[245ms]

2012-06-02T12:43:44.559+0100: [Full GC (System) [CMS: 266K-
>266K(262144K), 0.0115687 secs] 534857K->266K(1541632K), [CMS Perm :
1836K->1836K(12288K)], 0.0117470 secs] [Times: user=0.01 sys=0.00,
real=0.01 secs]
YieldingWaitStrategy
[305ms]

2012-06-02T12:43:47.660+0100: [Full GC (System) [CMS: 266K-
>267K(262144K), 0.0107414 secs] 534859K->267K(1541632K), [CMS Perm :
1838K->1838K(12288K)], 0.0108685 secs] [Times: user=0.01 sys=0.00,
real=0.01 secs]
BusySpinWaitStrategy
[205ms]

Heap
par new generation total 1279488K, used 585506K [0x04720000,
0x52f20000, 0x52f20000)
eden space 1272832K, 46% used [0x04720000, 0x282e88f8, 0x52220000)
from space 6656K, 0% used [0x52220000, 0x52220000, 0x528a0000)
to space 6656K, 0% used [0x528a0000, 0x528a0000, 0x52f20000)
concurrent mark-sweep generation total 262144K, used 267K
[0x52f20000, 0x62f20000, 0x62f20000)
concurrent-mark-sweep perm gen total 12288K, used 1842K [0x62f20000,
0x63b20000, 0x66f20000)


GC args:
-Xms1512m -Xmx1512m -Xmn1256m -XX:TargetSurvivorRatio=90
-XX:SurvivorRatio=190 -XX:MaxTenuringThreshold=1 -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
-XX:CompileThreshold=1000 -XX:MaxGCPauseMillis=20 -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:-HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=c:\temp\logs\


Performance test source can be found here:
http://pastebin.com/gRDY0yvT



Regards,
Nik

Michael Barker

Jun 3, 2012, 4:35:18 AM
to lmax-di...@googlegroups.com
Thanks for the patches. I'll have a look at the new strategies. I've
been contemplating implementing a good phased back-off strategy.

Mike.

Ravi Prakash

Nov 12, 2013, 4:13:52 AM
to lmax-di...@googlegroups.com
Hi,

The waitFor() method in the WaitStrategy interface has been modified,
so the class implementations above no longer compile against it.
Please suggest how to update them.