Hi, interesting post! Do you have any latency numbers for the strategies you tested?
--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Also, which commercial or other products did you compare against? Just curious.
Also, separately, Cloudius Systems have a nice-looking open source framework (it's not a protocol) with sound performance principles: http://www.seastar-project.org. Someone linked to it on this list a few weeks ago.
sent from my phone
Thanks Martin. The append-only log/data structure: is it backed by disk (mmap)? I'm assuming it's bounded to some size, so what happens if you get a bunch of slow consumers? I understand they won't prevent other consumers from going through the log and consuming subsequent messages, but if you're not dropping them altogether, they must be handcuffing you to some degree by preventing release/reuse of old logs? What will memory consumption look like here?
Also, what is the memory footprint of Aeron, say, in the default configuration? I'm talking purely about its own overhead, not anything arbitrary consumers would add.
The few commercial products I'm aware of typically offer some sort of failover/redundancy/persistence/durability options, which usually come with a performance hit. There's nothing wrong with running from /dev/shm, but it's a different use case.
Martin, if you had to support TCP, how would you handle TCP fragmentation? Meaning, how would you make sure the code using Aeron is notified only when the full message (or some significant part of it) is ready for parsing?
P.S. I hope this is not hijacking the thread!
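One common answer to the framing question above is length-prefixed framing: the sender writes a fixed-size length header before each message, and the receiver hands a frame to the parser only once every byte of it has arrived. This is a hypothetical sketch (all names are mine, and this is not Aeron's actual API):

```java
import java.nio.ByteBuffer;

// Length-prefixed framing over a TCP byte stream: the reader never delivers
// a partially received message to the parser.
public class LengthPrefixedFraming {
    // Returns the payload of the next complete frame, or null if more bytes
    // are needed. 'buf' is in "drain" state: position = 0, limit = bytes read.
    public static byte[] tryReadFrame(ByteBuffer buf) {
        if (buf.remaining() < 4) return null;   // length header not complete yet
        buf.mark();
        int length = buf.getInt();
        if (buf.remaining() < length) {         // payload not complete yet
            buf.reset();                        // rewind to the header
            return null;
        }
        byte[] payload = new byte[length];
        buf.get(payload);
        return payload;
    }

    // Sender side: prepend the 4-byte length header to a payload.
    public static ByteBuffer frame(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length).put(payload);
        buf.flip();
        return buf;
    }
}
```

The reader loops over `tryReadFrame` after each socket read, so one read may yield zero, one, or several complete frames regardless of how TCP segmented them.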
It is "unfortunate" that JMH does not come out of the box with support for pinning threads to cores, so that you could have more control over how you oversubscribe.
It is also unfortunate that JMH does not come with CPU performance counters for people who have CPU-bound workloads, where the injector makes a lot of sense.
On Friday, April 24, 2015 at 1:59:23 PM UTC-4, Benedict Elliott Smith wrote:
This approach is similar to, or at least addresses the same kinds of concerns as, the approach I designed for Apache Cassandra. I've had a blog post about it sitting around unpublished for months, for various reasons; I've hit publish on it today. Competition FTW :)
On 24 April 2015 at 03:10, Greg Wilkins <gr...@intalio.com> wrote:
Many months ago, I asked in this forum how to avoid the parallel slowdown in handling requests from a multiplexed HTTP/2 connection. The problem is that, to avoid HOL blocking, requests are dispatched to other threads; but by doing so, the CPU core that handles a request is most likely a different one from the one that parsed it, so its cache will be cold of all the request data.
The suggestion here was that I look at some kind of work-stealing algorithm to avoid HOL blocking while keeping request streams mostly on the same core by using a single queue per thread. Good idea, but it was too complex to implement in our environment (Jetty). We also looked at the Disruptor, and it was not a good fit either.
So we have come up with our own scheduling strategy for Jetty 9.3's HTTP/2, which we have nicknamed "Eat What You Kill"; it implements the producer-consumer pattern with mechanical sympathy.
I've written up a blog post describing the problem and our solution, which you can preview here: https://webtide.com/?p=2870&preview=1&_ppp=f8c4ae3461 I'd very much appreciate some review/feedback from this forum before I publish it, especially if I have: accidentally plagiarised an existing idea; missed something which means I'm fooling myself; badly described the whole thing; etc.
cheers
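Purely as an illustration of the hand-off described above, and under my own assumptions (this is a sketch, not Jetty's actual implementation, and all names are hypothetical): the core of an "eat what you kill" loop has the producing thread hand *production* to a spare thread and then consume the task itself, so the task's data stays hot in the cache of the core that parsed it:

```java
import java.util.concurrent.*;

// Sketch of the "eat what you kill" hand-off: the thread that produces a
// task runs it itself, and delegates further production to a spare thread,
// instead of handing the freshly parsed (cache-hot) task to a cold consumer.
public class EatWhatYouKill {
    private final ExecutorService spares = Executors.newCachedThreadPool();
    private final BlockingQueue<Runnable> source;

    public EatWhatYouKill(BlockingQueue<Runnable> source) {
        this.source = source;
    }

    public void produceAndConsume() throws InterruptedException {
        Runnable task = source.take();       // produce: parse/take the next task
        spares.execute(this::runProducer);   // hand production to a spare thread
        task.run();                          // eat what you kill: run it here
    }

    private void runProducer() {
        try {
            produceAndConsume();             // the spare becomes the new producer
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public void shutdown() {
        spares.shutdownNow();
    }
}
```

At any moment there is exactly one producer; the previous producers finish their own tasks and return to the pool, which is how the dispatch cost gets spread across cores rather than concentrated on a single selector thread.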
On Fri, Apr 24, 2015 at 10:05 PM, ymo <ymol...@gmail.com> wrote:
> It is "unfortunate" that JMH does not come out of the box with support for pinning threads to cores so that you could have more control on how you oversubscribe.
JMH does not support pinning, because there is no built-in JDK API we can use. But since JMH does provide the invariant that @Setup methods will be called by the worker threads, you can "just" use AffinityLock-s, etc.
> Also it is unfortunate that JMH does not come with cpu performance counters for people that have cpu bound workloads where the injector makes a lot of sense.
Come again? There are: -prof perf, -prof perfasm, -prof perfnorm, (heck, even) -prof xperfasm.
-Aleksey
OK, I stand corrected... I used JMH a while ago and missed all the later improvements. Thanks a lot, Aleksey! Maybe I missed it, but does it make sense to enable the perf counters per method?
I'm not sure what it means to have perf counters per method. perfasm aggregates counters per method. But I tend to think that once you need that, you have to employ a full-fledged profiler, like Solaris Studio Performance Analyzer, VTune, or something else.
-Aleksey
Out of curiosity, how much of the want for this attribute is due to the lack of a REPL?
I kind of don't understand the JMH benchmark altogether, especially the part where you have .threads(2000)?
For the sake of sanity, here is what I do:
1) run one producer on one core
2) run all consumers on other cores
3) make sure I am always on the same socket
4) make sure the kernel is not scheduling anything on those cores
It is a benchmark after all, but at least it removes all the other variables from the equation, so you only test how new consumers are processing your requests. Makes sense?
Aeron ... This thread copies the data into data structures that are append-only and can be read by multiple threads without any locks or full fences.
Hi,
This is what I hate about slides taken on their own, without the presentation they support. I think SlideShare is so wrong :-) A presentation is a presentation and a document is a document.
"Persistent" in this sense is the functional programming definition, not storage to disk. An FP persistent data structure does not mutate under the reader. For the period of time they are valid, they are persistent from the FP perspective.
Some people use the term "append-only" here.
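As an illustration of the append-only idea above (my own sketch, not Aeron's actual log format; all names are hypothetical), a single-writer append-only structure can publish entries to lock-free readers by writing the entry first and then advancing a committed count with an ordered store, so no locks or full fences are needed on either side:

```java
import java.util.concurrent.atomic.AtomicLong;

// Single-writer, bounded, append-only log. The writer stores an entry and
// then publishes it by advancing 'committed' with an ordered (store-store)
// write; readers can never observe a partially written entry.
public class AppendOnlyLog {
    private final long[] entries;
    private final AtomicLong committed = new AtomicLong(0);

    public AppendOnlyLog(int capacity) {
        entries = new long[capacity];
    }

    // Called by the single writer only. Returns false when the log is full.
    public boolean append(long value) {
        long index = committed.get();
        if (index == entries.length) return false;
        entries[(int) index] = value;     // 1. write the entry
        committed.lazySet(index + 1);     // 2. publish it (ordered store, no full fence)
        return true;
    }

    // Any number of readers, no locks: only read below the committed count.
    public long read(long index) {
        if (index >= committed.get()) throw new IndexOutOfBoundsException();
        return entries[(int) index];
    }

    public long size() {
        return committed.get();
    }
}
```

The single-writer restriction is what makes the plain-write-then-ordered-publish protocol safe; multiple writers would need CAS or exclusive regions.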
The theory that I'm putting forward is that in such cases it is better to get the same thread to do the processing rather than use an efficient hand-over to another thread. Instead, hand over to another thread to continue producing.
I suspect the improvement you are seeing isn't based on which thread does the processing, but comes down to efficiently saturating the available CPUs. If serving the servlet takes less time than a call to LockSupport.unpark() times the number of cores, then you never fully utilise all of your cores with a single network consumer. In your design, this cost is steadily spread out across all of the cores, so that you saturate them more rapidly. This is a similar optimisation to that delivered by the injector, but the injector tries to take it one step further and eliminate some of these calls entirely.
True, some people say append-only. It is one means of achieving a persistent data structure; the other major technique is path-copy. "Persistent" is the abstract term, like List is in Java, with append-only or path-copy as implementations, just as a List can be array-backed or linked nodes.
The key to persistence in this sense is immutability from the reader's perspective. Immutability makes things great to reason about, and if you consider a reasonable time/space window, then quite powerful things can be built that also afford great performance. For example, append-only can play very well with hardware prefetchers.
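To make the FP sense of "persistent" concrete, here is a minimal path-copy-style sketch (nothing to do with Aeron's internals; the class is hypothetical): a persistent cons list where an "update" returns a new version via structural sharing, and every old version remains valid and immutable:

```java
// A persistent singly linked list: "adding" returns a new version that
// shares structure with the old one; the old version persists unmodified,
// which is the FP meaning of "persistent" discussed above.
public class PersistentList<T> {
    final T head;
    final PersistentList<T> tail;

    private PersistentList(T head, PersistentList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    // The empty list is represented as null for brevity.
    public static <T> PersistentList<T> empty() {
        return null;
    }

    // Structural sharing: the new version reuses the entire old list as its tail.
    public static <T> PersistentList<T> cons(T value, PersistentList<T> list) {
        return new PersistentList<>(value, list);
    }

    public static <T> int size(PersistentList<T> list) {
        int n = 0;
        for (PersistentList<T> p = list; p != null; p = p.tail) n++;
        return n;
    }
}
```

Because no node is ever mutated, a reader holding any version can traverse it safely while writers keep producing new versions; that is the property readers rely on, whether it is achieved by path-copy (as here) or by append-only layouts.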
I think "append-only" might be the more appropriate term. The persistence property is a superset of immutability; i.e. when a persistent structure is "modified", a new version of the structure is created with the modification applied, but the key point is that the original remains unmodified and immutable before, during, and after the modification; i.e. the original version "persists" after the operation. I don't think the Aeron structures possess (or require!) this property.
I think it makes sense to describe the Aeron log structures as an append-only collection of logically immutable values, but describing them as "persistent" might confuse a few FP weenies :-)