Result : 2762031.415 ±(99.9%) 316068.515 ops/ms
Statistics: (min, avg, max) = (2618424.889, 2762031.415, 2818755.467), stdev = 82081.990
Confidence interval (99.9%): [2445962.900, 3078099.930]
Benchmark            Mode   Samples  Mean         Mean error  Units
o.s.MyBenchmark.CAS  thrpt  5        31822.578    2484.089    ops/ms
o.s.MyBenchmark.GET  thrpt  5        2762031.415  316068.515  ops/ms
Result : 642967.171 ±(99.9%) 102030.083 ops/ms
Statistics: (min, avg, max) = (509935.187, 642967.171, 744384.949), stdev = 67486.583
Confidence interval (99.9%): [540937.088, 744997.254]
Result "testGETProtection": 642632.576 ±(99.9%) 102039.668 ops/ms
Statistics: (min, avg, max) = (509598.987, 642632.576, 744064.608), stdev = 67492.923
Confidence interval (99.9%): [540592.908, 744672.245]
Result "testGETSelector": 334.595 ±(99.9%) 16.189 ops/ms
Statistics: (min, avg, max) = (318.929, 334.595, 349.398), stdev = 10.708
Confidence interval (99.9%): [318.405, 350.784]
Benchmark                              (useLazySet)  Mode   Samples  Mean        Mean error  Units
o.s.MyBenchmark.CAS                    true          thrpt  10       31981.160   392.509     ops/ms
o.s.MyBenchmark.CAS:testCASProtection  true          thrpt  10       31687.056   394.408     ops/ms
o.s.MyBenchmark.CAS:testCASSelector    true          thrpt  10       294.104     8.928       ops/ms
o.s.MyBenchmark.CAS                    false         thrpt  10       32666.963   666.222     ops/ms
o.s.MyBenchmark.CAS:testCASProtection  false         thrpt  10       32361.241   674.443     ops/ms
o.s.MyBenchmark.CAS:testCASSelector    false         thrpt  10       305.722     23.933      ops/ms
o.s.MyBenchmark.GET                    true          thrpt  10       591208.139  75301.095   ops/ms
o.s.MyBenchmark.GET:testGETProtection  true          thrpt  10       590867.063  75316.221   ops/ms
o.s.MyBenchmark.GET:testGETSelector    true          thrpt  10       341.076     18.519      ops/ms
o.s.MyBenchmark.GET                    false         thrpt  10       642967.171  102030.083  ops/ms
o.s.MyBenchmark.GET:testGETProtection  false         thrpt  10       642632.576  102039.668  ops/ms
o.s.MyBenchmark.GET:testGETSelector    false         thrpt  10       334.595     16.189      ops/ms
On 28 March 2014 at 20:34:18, awei...@voltdb.com (awei...@voltdb.com) wrote:
Hi all,

On my current project, the last high-traffic lock I have to deal with is Selector.wakeup(), which is invoked to hand off writes to the network thread responsible for servicing the socket. The lock is split across several selector threads, but a socket is only ever serviced by one selector to allow lock-free access to the associated application state for that connection, and this results in contention when there is a hot connection.

Netty tries to reduce invocations of Selector.wakeup() by using an AtomicBoolean to track whether the selector thread might already be awake. Netty uses CAS to turn the boolean on and off. See ... and ...

I don't quite see why CAS is necessary, although maybe it is more accurate at preventing extra Selector.wakeup() calls and just as fast? If you do a CAS that is going to fail on a cache line that is already in the shared state, is it any slower than doing a volatile read? Will a failed CAS move the cache line to the exclusive state and incur extra overhead even though the value at the cache line is not going to change?

Thanks,
Ariel
--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
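To make the question concrete, here is a minimal sketch of the two guard styles being compared. It is not Netty's code: `WakeupGuard`, its method names, and the counter that stands in for the real `Selector.wakeup()` call are all hypothetical, chosen only so the difference in behaviour is easy to see.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical guard around an expensive wakeup call; an illustration, not Netty's code. */
final class WakeupGuard {
    private final AtomicBoolean wakenUp = new AtomicBoolean(false);
    // Assumption: this counter stands in for the real Selector.wakeup() call.
    private final AtomicLong wakeups = new AtomicLong();

    /** CAS variant: only the caller that wins the false -> true transition wakes the selector. */
    boolean requestWithCas() {
        if (wakenUp.compareAndSet(false, true)) {
            wakeups.incrementAndGet();
            return true;
        }
        return false;
    }

    /**
     * GET variant: a volatile read, followed by a volatile write only if the flag looked clear.
     * Two racing callers can both read false, so an occasional extra wakeup is possible.
     */
    boolean requestWithGet() {
        if (!wakenUp.get()) {
            wakenUp.set(true);
            wakeups.incrementAndGet();
            return true;
        }
        return false;
    }

    /** Called by the selector thread before it blocks again. */
    void reset() {
        wakenUp.set(false);
    }

    long wakeupCount() {
        return wakeups.get();
    }
}
```

While the flag is already set, `requestWithCas()` performs a CAS that fails and `requestWithGet()` performs only a volatile read. That hot path is the one the question is about.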
Why did I use CAS rather than GET? I don't remember, to be honest. I'm up for using GET instead if the extra wakeups are not too many. IIRC, wakeup on Linux writes a dummy byte to a pipe to wake up an epoll_wait call, so it's pretty expensive: think of a CAS versus a system call that writes to a kernel buffer from user space, plus clearing that buffer afterwards.
Usually, a fully asynchronous Netty application will not even see a CAS, because everything is run from an I/O thread. However, an application that performs a potentially long-running task will be affected by this change.
Would you be interested in investigating further? I'd be happy to help you.
Hi,

Attempting to answer my own question, failed CAS is indeed slower. JMH code http://pastebin.com/PtbTbi0G and results http://pastebin.com/CRnaY3hZ

Result : 2762031.415 ±(99.9%) 316068.515 ops/ms
Statistics: (min, avg, max) = (2618424.889, 2762031.415, 2818755.467), stdev = 82081.990
Confidence interval (99.9%): [2445962.900, 3078099.930]

Benchmark            Mode   Samples  Mean         Mean error  Units
o.s.MyBenchmark.CAS  thrpt  5        31822.578    2484.089    ops/ms
o.s.MyBenchmark.GET  thrpt  5        2762031.415  316068.515  ops/ms

I am skeptical the CAS can deliver more value in this case than the occasional extra Selector.wakeup() invocation. Well, let's benchmark!
I tried to write a Selector loop and a loop that wakes up the selector. The selector loop consumes some CPU after being woken up. I also tested using set and lazySet. Set seems to perform better. 1:3 threads since my CPU is a quad-core, but utilization was 250%.
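For readers who cannot open the pastebin links, the two operations compared in the first measurement (a CAS that always fails versus a volatile read) can be sketched in plain Java as below. This is not the JMH benchmark from the message: `FailedCasVsGet` and `countOps` are hypothetical names, and a hand-rolled timing loop like this is distorted by JIT warm-up and dead-code elimination, so it only shows what is being compared. Use JMH for real numbers.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical, crude comparison of a failing CAS and a volatile read; not a JMH benchmark. */
final class FailedCasVsGet {

    /** Calls op repeatedly for roughly durationNanos and returns how many calls were made. */
    static long countOps(BooleanSupplier op, long durationNanos) {
        long ops = 0;
        boolean sink = false;
        long start = System.nanoTime();
        while (System.nanoTime() - start < durationNanos) {
            // Batch the calls so the clock is not read on every operation.
            for (int i = 0; i < 1_000; i++) {
                sink ^= op.getAsBoolean();
                ops++;
            }
        }
        if (sink) {
            System.out.print(""); // keeps the results observable to the JIT
        }
        return ops;
    }

    public static void main(String[] args) {
        // The flag is already true, so compareAndSet(false, true) fails on every call.
        AtomicBoolean flag = new AtomicBoolean(true);
        long duration = 200_000_000L; // about 200 ms per variant
        long cas = countOps(() -> flag.compareAndSet(false, true), duration);
        long get = countOps(flag::get, duration);
        System.out.println("failed CAS calls:    " + cas);
        System.out.println("volatile read calls: " + get);
    }
}
```

The sketch is single-threaded, so it says nothing about cache-line traffic between cores, which is the effect the original question asks about.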
On 1 April 2014 at 14:48:59, Ariel Weisberg (arielw...@gmail.com) wrote:
Hi,

I tested to see whether you really benefit from CAS. According to my benchmarks, you can queue more tasks (and not by a little) without hitting the boolean's cache line as hard if the selector thread is awake for some period of time. If CAS really is beneficial, there should be a way to change the benchmark so that CAS comes out faster.
I don't see why Selector.wakeup() is only called from the network thread. If another thread in the system needs to queue a write to a socket owned by the selector, would it not put a task in the queue and then invoke wakeup? Does Netty allow writers to sockets to lock and do the writes themselves, or are you saying event processing never escapes the Netty thread?
I guess you misunderstood me… I said "we have multiple threads that may trigger the CAS operation here", which basically means that Selector.wakeup() will only be called from a "Non-IO-Thread". So basically, if someone triggers a write from outside of the "IO-Thread (EventLoop)", we put a task in a queue and wake up the Selector so that the task is picked up.
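The hand-off described here can be sketched roughly as follows. This is a simplified illustration, not Netty's implementation: `MiniEventLoop` and all of its members are hypothetical names, and the loop only drains tasks (a real loop would also process selected keys).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical single-threaded event loop; names are illustrative, not Netty's API. */
final class MiniEventLoop implements Runnable {
    private final Selector selector;
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean wakenUp = new AtomicBoolean(false);
    private final Thread ioThread = new Thread(this, "io-thread");
    private volatile boolean running = true;

    private MiniEventLoop() {
        try {
            selector = Selector.open();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static MiniEventLoop start() {
        MiniEventLoop loop = new MiniEventLoop();
        loop.ioThread.setDaemon(true);
        loop.ioThread.start();
        return loop;
    }

    boolean inEventLoop() {
        return Thread.currentThread() == ioThread;
    }

    /** Queues a task; only callers outside the I/O thread need to wake the selector. */
    void execute(Runnable task) {
        tasks.add(task);
        if (!inEventLoop() && wakenUp.compareAndSet(false, true)) {
            selector.wakeup();
        }
    }

    void shutdown() {
        running = false;
        selector.wakeup();
    }

    @Override
    public void run() {
        try {
            while (running) {
                wakenUp.set(false);
                // Re-check the queue after clearing the flag, so that a task whose
                // CAS failed just before the reset is not left waiting in the queue.
                if (tasks.isEmpty()) {
                    selector.select();
                } else {
                    selector.selectNow();
                }
                // A real loop would process selector.selectedKeys() here.
                Runnable task;
                while ((task = tasks.poll()) != null) {
                    task.run();
                }
            }
            selector.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller on any other thread would use it as `MiniEventLoop loop = MiniEventLoop.start(); loop.execute(() -> System.out.println("ran on the I/O thread"));`. A task submitted from inside the loop skips the CAS and the wakeup entirely, since the loop is already awake and will drain the queue.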
My application partitions to the core level, so there will always be a handoff to a different non-network event processing thread, or possibly a forward to a different socket if the request arrived at the wrong node. Replication will also trigger messages to other network threads. Event processing depends on shared state, and rather than lock the shared mutable state I am partitioning it so that events can be routed to the correct partition and then processed without locking on the shared mutable state.
Ariel
On Tuesday, April 1, 2014 2:04:51 AM UTC-4, Norman Maurer wrote:
Hi there,

I can only talk for Netty here and why we do it, so take this with a grain of salt :)

I think if you really want to prevent multiple wakeups you need an atomic operation. Remember that in the case of Netty we have multiple threads that may trigger the CAS operation here.

--
Norman Maurer
On 1 April 2014 at 16:43:20, awei...@voltdb.com (awei...@voltdb.com) wrote:
Hi,

That is what I expected. The statement that confused me was "as you already said most netty apps not even need to call the wakeup at all."
This was more related to the fact that many Netty apps do all their writes from within the IO-Thread (EventLoop) anyway and so do not need to wake up the selector at all. Sorry for the confusion :)
If my benchmark actually measures what it attempts to measure, then CAS is not better at protecting Selector.wakeup() from extra wakeups. This might be because the overhead of CAS is greater than the savings from the extra accuracy that CAS provides.

My intuition is that the race for the volatile field will only result in extra Selector.wakeup() calls a fraction of the time. I would need to run an end-to-end benchmark with each approach, and my guess is that the difference will barely be measurable.
Yeah… I'm just not sure that not using CAS will buy you anything either. So I think all you can do is benchmark and check. And rest assured, Trustin and I would be really interested to hear the results ;)