--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://akka.io/faq/
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/groups/opt_out.
In your calculate() method you keep creating ActorSystems and spawning an actor responsible for the actual computation.
You should prefer creating one ActorSystem for the whole benchmark and keeping it, and just spawn an actor to handle each calculation request (i.e. your master). After completing the calculation you let the actor die. This alone would probably boost the performance. Also: there is no 1:1 correlation between actors and threads; you can create far more actors which, depending on the dispatcher, share a given thread pool for execution.

Kind regards,
Hendrik
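Hendrik's point can be illustrated without Akka at all, using plain JDK executors as a rough analogy: create the heavyweight container (the ActorSystem) once, and spawn cheap per-request workers against it. The class and method names below are invented for illustration, not taken from the benchmark code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ReuseDemo {
    // One long-lived "system" for the whole benchmark, analogous to a single ActorSystem.
    static final ExecutorService SYSTEM = Executors.newFixedThreadPool(16);

    // Per-request worker, analogous to spawning a short-lived master actor.
    static double calculate(int slices) throws Exception {
        Future<Double> result = SYSTEM.submit(() -> {
            double acc = 0;
            for (int i = 0; i < slices; i++) {
                acc += 4.0 * (1 - (i % 2) * 2) / (2 * i + 1); // Leibniz Pi term
            }
            return acc;
        });
        return result.get(); // the worker "dies" after completing; the pool lives on
    }

    public static void main(String[] args) throws Exception {
        System.out.println(calculate(100_000));
        SYSTEM.shutdown();
    }
}
```

The analogy: tearing down and recreating `SYSTEM` inside `calculate()` would dominate the measurement, which is exactly what recreating an ActorSystem per iteration does.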
On Tuesday, January 7, 2014 at 01:26:11 UTC+1, Rüdiger Möller wrote:
Please check out the chart here: https://plus.google.com/109956740682506252532/posts/1hKcYyPuJzh
[cut & pasted from G+]: Hey folks, I am currently writing a blog post benchmarking Akka vs. traditional threading. I use the example provided by the Akka Java tutorial computing Pi. In order to compare the ability to parallelize large amounts of tiny jobs, I use Pi-computation slices: 100,000 jobs with 1,000 iterations each.
Hardware is a dual-socket AMD Opteron, each socket with 8 real cores and 8 'virtual' ones (because the test uses floating point, I scale only to 16 threads instead of 32).
As you can see in the chart, Akka (2.0.3) performs very badly compared to threads and a homebrew actor lib.
source of akka bench is here: https://gist.github.com/RuedigerMoeller/8272966
(I added an outer loop to the original Typesafe sample.)
Is there anything I am missing, or is this 'normal' Akka performance?
Threading-style code is here: https://gist.github.com/RuedigerMoeller/8273307
I tried 2.1 with even worse results.
http://imgur.com/TAt9XOf
--
You definitely want to play around with the configuration and make sure that you are benchmarking correctly: https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java
dispatcher throughput, thread pool type and thread pool size, and mailbox type.
Also, Opterons have pretty bad cache performance for inter-core comms (Intel uses inclusive L3s for faster on-package caches)
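For reference, the knobs listed above live in the dispatcher configuration. A sketch of what such a tuning block looks like in Akka 2.x HOCON config (the dispatcher name and the values are illustrative, not recommendations):

```hocon
pi-dispatcher {
  type = Dispatcher
  executor = "fork-join-executor"
  fork-join-executor {
    # pin the pool size instead of letting it scale with available cores
    parallelism-min = 16
    parallelism-max = 16
  }
  # how many messages one actor may process before its thread is handed back
  throughput = 100
}
```

The mailbox type is configurable in the same place; which values help depends entirely on the workload, so they have to be found by measurement.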
On Tuesday, January 7, 2014 at 11:57:54 UTC+1, √ wrote:
You definitely want to play around with the configuration and make sure that you are benchmarking correctly: https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java
I know the details regarding VM warm-up and how to avoid in-place JIT'ing. I am iterating the bench 30 times and only take the average times of the last 10 iterations (20 iterations warm-up); see the main loop I added to the original Akka sample. Also, the benchmark is not in the nanoseconds range but at several hundred millis per iteration.
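The warm-up scheme described above (30 iterations, averaging only the last 10) looks roughly like this in plain Java; the benchmarked body is a placeholder, not the actual Pi code:

```java
public class BenchLoop {
    static long runOnce() {
        long t0 = System.nanoTime();
        double acc = 0;                       // placeholder for the Pi calculation
        for (int i = 0; i < 1_000_000; i++) acc += Math.sqrt(i);
        if (acc < 0) System.out.println(acc); // keep the JIT from eliding the work
        return System.nanoTime() - t0;
    }

    static double averageOfLast(int total, int measured) {
        long sum = 0;
        for (int i = 0; i < total; i++) {
            long nanos = runOnce();
            if (i >= total - measured) sum += nanos; // discard warm-up iterations
        }
        return sum / 1e6 / measured;                 // average millis of last N runs
    }

    public static void main(String[] args) {
        System.out.println(averageOfLast(30, 10) + " ms");
    }
}
```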
dispatcher throughput, thread pool type and thread pool size, and mailbox type.
Can you be more specific please (code!)? The use case is many short-running computations, with contention created when aggregating the result.
Also, Opterons have pretty bad cache performance for inter-core comms (Intel uses inclusive L3s for faster on-package caches)
Well, the other benches use the same hardware.
I am currently repeating the test on a dual-socket, 6-core Intel Xeon (so 12 cores, 24 hardware threads), with Akka still being the worst one by far.
On Tue, Jan 7, 2014 at 1:32 PM, Rüdiger Möller <moru...@gmail.com> wrote:
On Tuesday, January 7, 2014 at 11:57:54 UTC+1, √ wrote:
You definitely want to play around with the configuration and make sure that you are benchmarking correctly: https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java
I know the details regarding VM warm-up and how to avoid in-place JIT'ing. I am iterating the bench 30 times and only take the average times of the last 10 iterations (20 iterations warm-up); see the main loop I added to the original Akka sample. Also, the benchmark is not in the nanoseconds range but at several hundred millis per iteration.
There is no reason at all to use currentTimeMillis (it has accuracy problems; I've seen up to 20-30 ms); just use nanoTime.
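The granularity point in code: for short intervals, currentTimeMillis can report zero elapsed time while nanoTime still resolves the duration. A small standalone demo, not part of either benchmark:

```java
public class ClockDemo {
    public static void main(String[] args) {
        long ms0 = System.currentTimeMillis();
        long ns0 = System.nanoTime();
        double acc = 0;
        for (int i = 0; i < 10_000; i++) acc += Math.sin(i); // roughly sub-millisecond work
        long msElapsed = System.currentTimeMillis() - ms0;
        long nsElapsed = System.nanoTime() - ns0;
        // currentTimeMillis often reports 0 here; nanoTime always resolves the interval
        System.out.println("millis=" + msElapsed + " nanos=" + nsElapsed + " acc=" + acc);
    }
}
```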
dispatcher throughput, thread pool type and thread pool size, and mailbox type.
Can you be more specific please (code!)? The use case is many short-running computations, with contention created when aggregating the result.
Also, Opterons have pretty bad cache performance for inter-core comms (Intel uses inclusive L3s for faster on-package caches)
Well, the other benches use the same hardware.
I am currently repeating the test on a dual-socket, 6-core Intel Xeon (so 12 cores, 24 hardware threads), with Akka still being the worst one by far.
Numbers?
(I also think we have to remain quite reasonable here; Akka lets you scale out your computation up to ~2500 JVMs. Do the other solutions offer that?)
On Tuesday, January 7, 2014 at 13:40:50 UTC+1, √ wrote:
On Tue, Jan 7, 2014 at 1:32 PM, Rüdiger Möller <moru...@gmail.com> wrote:
On Tuesday, January 7, 2014 at 11:57:54 UTC+1, √ wrote:
You definitely want to play around with the configuration and make sure that you are benchmarking correctly: https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java
I know the details regarding VM warm-up and how to avoid in-place JIT'ing. I am iterating the bench 30 times and only take the average times of the last 10 iterations (20 iterations warm-up); see the main loop I added to the original Akka sample. Also, the benchmark is not in the nanoseconds range but at several hundred millis per iteration.
There is no reason at all to use currentTimeMillis (it has accuracy problems; I've seen up to 20-30 ms); just use nanoTime.
There is a reason. currentTimeMillis is based on global system time; nanoTime is guaranteed to be consistent within a single thread only. The "accuracy problems" usually occur when one compares System.currentTimeMillis values obtained from different threads. Since runtimes are in the range of up to >1000 ms, and the tests are run many times, I can say for sure this is not the reason why Akka seems to scale not that well. Some historical problems with huge inaccuracy of currentTimeMillis were on Windows XP. I am using CentOS 6.4, 64-bit.
dispatcher throughput, thread pool type and thread pool size, and mailbox type.
Can you be more specific please (code!)? The use case is many short-running computations, with contention created when aggregating the result.
(I also think we have to remain quite reasonable here; Akka lets you scale out your computation up to ~2500 JVMs. Do the other solutions offer that?)
Network-connected JVMs :-).
The computation is just a placeholder for the use case of "high rates of small events", which is typical for many real-time systems. Scaling out frequently does not pay off because the network (+ decoding/encoding) becomes the bottleneck. Scaling is not about saturating many CPUs but about getting more throughput ;-)
nanoTime is supposedly monotonic, where is your reference to the "same thread" claim?
It's guaranteed to be monotonic as seen from a single thread, not across threads. currentTimeMillis is guaranteed to be monotonic across threads, so it's more expensive and requires some fencing etc. to be generated by HotSpot. There is a video by Cliff Click out there where he goes into that in great detail ..
Anyway, the results are not skewed by that for sure.
No, but I will try. I am not interested in presenting skewed benchmarks. Abstraktor is not a competing project; it's just my playground lean actor impl to get a raw feeling of what should be possible.
| Single-machine performance is only interesting if you are after single points of failure.
Both things are important: single-machine performance AND remote messaging throughput + latency.
Regarding remoting/failover there are much faster options than actors/Akka today.
I appreciate your vision of making this transparent to the application. It's a great idea, but I think you are still not there for the very high-end kind of application, no offence. I have built large high-performance distributed systems, so I know what I am talking about.
However, regarding concurrent programming, actors can improve performance and maintainability today; that's why I am currently investigating/benchmarking local performance only.
I will incorporate your proposals into the test.
regards,
Rüdiger
On Tue, Jan 7, 2014 at 4:22 PM, Rüdiger Möller <moru...@gmail.com> wrote:
nanoTime is supposedly monotonic, where is your reference to the "same thread" claim?
It's guaranteed to be monotonic as seen from a single thread, not across threads. currentTimeMillis is guaranteed to be monotonic across threads, so it's more expensive and requires some fencing etc. to be generated by HotSpot. There is a video by Cliff Click out there where he goes into that in great detail ..
Anyway, the results are not skewed by that, for sure.
I'm not sure a hand-wavy reference to Cliff is going to quench my thirst for facts, even though Cliff is a great guy.
No, but I will try. I am not interested in presenting skewed benchmarks. Abstraktor is not a competing project; it's just my playground lean actor impl to get a raw feeling of what should be possible.
| Single-machine performance is only interesting if you are after single points of failure.
Both things are important: single-machine performance AND remote messaging throughput + latency.
Yep, my argument was that without remote you have a SPOF.
Regarding remoting/failover there are much faster options than actors/Akka today.
I appreciate your vision of making this transparent to the application. It's a great idea, but I think you are still not there for the very high-end kind of application, no offence. I have built large high-performance distributed systems, so I know what I am talking about.
However, regarding concurrent programming, actors can improve performance and maintainability today; that's why I am currently investigating/benchmarking local performance only.
I will incorporate your proposals into the test.
I'm not sure a hand-wavy reference to Cliff is going to quench my thirst for facts even though Cliff is a great guy.
Again: your obsession with time measurement makes sense when measuring small numbers of ticks and adding them up, but not in the context of a longer-running test. You can easily copy the snippet and add nanoTime measurement. It will not make a significant difference.
No, but I will try. I am not interested in presenting skewed benchmarks. Abstraktor is not a competing project; it's just my playground lean actor impl to get a raw feeling of what should be possible.
The default settings are definitely not optimal for your use case (as default settings rarely are).
| Single-machine performance is only interesting if you are after single points of failure.
Both things are important: single-machine performance AND remote messaging throughput + latency.
Yep, my argument was that without remote you have a SPOF.
Regarding remoting/failover there are much faster options than actors/Akka today.
Reference?
I appreciate your vision of making this transparent to the application. It's a great idea, but I think you are still not there for the very high-end kind of application, no offence. I have built large high-performance distributed systems, so I know what I am talking about.
What are you basing this opinion on, and what benchmark/setting are you comparing?
However, regarding concurrent programming, actors can improve performance and maintainability today; that's why I am currently investigating/benchmarking local performance only.
I will incorporate your proposals into the test.
Great, let us know what the results were.
I'm not sure a hand-wavy reference to Cliff is going to quench my thirst for facts even though Cliff is a great guy.
Again: your obsession with time measurement makes sense when measuring small numbers of ticks and adding them up, but not in the context of a longer-running test. You can easily copy the snippet and add nanoTime measurement. It will not make a significant difference.
No, but I will try. I am not interested in presenting skewed benchmarks. Abstraktor is not a competing project; it's just my playground lean actor impl to get a raw feeling of what should be possible.
The default settings are definitely not optimal for your use case (as default settings rarely are).
That's why I preferred talking back here :-)
| Single-machine performance is only interesting if you are after single points of failure.
Both things are important: single-machine performance AND remote messaging throughput + latency.
Yep, my argument was that without remote you have a SPOF.
Regarding remoting/failover there are much faster options than actors/Akka today.
Reference?
I'm thinking of IBM WLLM or Informatica UM. Does Akka offer reliable UDP messaging with several million msgs/sec throughput? From what I have seen, it's challenging enough to get this on localhost. I have used the former; it's really blazing fast (at least with kernel-bypass networking equipment).
I appreciate your vision of making this transparent to the application. It's a great idea, but I think you are still not there for the very high-end kind of application, no offence. I have built large high-performance distributed systems, so I know what I am talking about.
What are you basing this opinion on, and what benchmark/setting are you comparing?
Benchmarks and some public bragging with not-so-impressive numbers .. ;-). Also, I can see in single-threaded benchmarks that Akka's message passing adds significant overhead compared to a single-threaded executor and the bytecode-weaving-based proxying used in other libs.
One can even do remote messaging faster than Akka does inter-thread messaging, so there definitely is room for improvement.
Does Akka provide reliable UDP messaging (NAK-based, not acknowledged)?
That's what you need for high-end throughput + failover IMO. Doing typed-actor message passing via JDK proxies is, well .. you should know yourself :-)
However, regarding concurrent programming, actors can improve performance and maintainability today; that's why I am currently investigating/benchmarking local performance only.
I will incorporate your proposals into the test.
Great, let us know what the results were.
Don't expect too much; as I pointed out, from my POV the problem is already in Akka's basic message-passing performance, so Akka has a hard time breaking even when scaling. We'll see.
Current AMD Opteron(tm) and Athlon(tm) 64 processors provide power management mechanisms that independently adjust the performance state ("P-state") and power state ("C-state") of the processor[1][2]; these state changes can affect a processor core's Time Stamp Counter (TSC), which some operating systems may use as part of their timekeeping algorithms. Most modern operating systems are well aware of the effect of these state changes on the TSC and the potential for TSC drift[3] across multiple processor cores, and properly account for it.
I don't think these apply to current systems though (the above posts are really old) and I don't even see a definitive answer for old systems -- unfortunately Björn is not here to ask him.
Tuning the dispatcher is not skewing the benchmarks. The whole idea of dispatchers is that you can tune subsystems of your actor system to particular load characteristics. The default throughput setting hits a particular point in the fairness-throughput tradeoff spectrum, which is not the best for batch workloads.
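The fairness-throughput tradeoff can be modeled without Akka: a dispatcher with throughput = n drains up to n messages from one actor's mailbox before releasing the thread, trading fairness between actors for fewer thread handoffs. A toy single-threaded model (all names invented):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class ThroughputDemo {
    // Drain up to `throughput` messages from one mailbox before yielding the thread.
    static int drainBatch(Queue<Integer> mailbox, int throughput) {
        int processed = 0;
        Integer msg;
        while (processed < throughput && (msg = mailbox.poll()) != null) {
            processed++;             // stand-in for actor.receive(msg)
        }
        return processed;            // the thread now moves on to the next actor
    }

    public static void main(String[] args) {
        Queue<Integer> mailbox = new ArrayDeque<>();
        for (int i = 0; i < 250; i++) mailbox.add(i);
        // A low throughput (fair) means many handoffs; a high one means few.
        int batches = 0;
        while (drainBatch(mailbox, 100) > 0) batches++;
        System.out.println("batches=" + batches); // 3 batches of <=100 for 250 messages
    }
}
```

With throughput = 5 the same 250 messages would need 50 handoffs; batch workloads like the Pi benchmark favor the larger setting.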
| Single-machine performance is only interesting if you are after single points of failure.
Both things are important: single-machine performance AND remote messaging throughput + latency.
Yep, my argument was that without remote you have a SPOF.
Regarding remoting/failover there are much faster options than actors/Akka today.
As for failover, if you are limited to a software implementation, the speed of remote failover is bounded by a timeout period; it does not matter what software framework (Akka or other) is used. If you have any side-channel information, maybe hardware solutions, e.g. link-failure notifications or hardware watchdogs, the game is different -- but that is apples to oranges.
I appreciate your vision of making this transparent to the application. It's a great idea, but I think you are still not there for the very high-end kind of application, no offence. I have built large high-performance distributed systems, so I know what I am talking about.
It is a bit of a strawman. For any particular use case the fastest implementation is a custom hand-tuned one designed by an expert, and I don't doubt that you can beat Akka in many particular scenarios. In fact, for every system there is always one more benchmark that it cannot beat. It all depends how many resources you have to throw at your problem (and at maintaining it over time).
However, regarding concurrent programming, actors can improve performance and maintainability today; that's why I am currently investigating/benchmarking local performance only.
I will incorporate your proposals into the test.
We could in theory play around with the example and fine-tune (I am very tempted to try it now), but the problem is that we are preparing a release and we cannot really allocate any time to this particular benchmark. Play around with the dispatcher settings a bit and see how it works out -- try tuning the throughput setting in particular.
Current AMD Opteron(tm) and Athlon(tm) 64 processors provide power management mechanisms that independently adjust the performance state ("P-state") and power state ("C-state") of the processor[1][2]; these state changes can affect a processor core's Time Stamp Counter (TSC), which some operating systems may use as part of their timekeeping algorithms. Most modern operating systems are well aware of the effect of these state changes on the TSC and the potential for TSC drift[3] across multiple processor cores, and properly account for it.
I don't think these apply to current systems though (the above posts are really old) and I don't even see a definitive answer for old systems -- unfortunately Björn is not here to ask him.
All my machines have power saving turned off. The Intel boxes run with hyperthreading disabled ..
Tuning the dispatcher is not skewing the benchmarks. The whole idea of dispatchers is that you can tune subsystems of your actor system to particular load characteristics. The default throughput setting hits a particular point in the fairness-throughput tradeoff spectrum, which is not the best for batch workloads.
I agree. I just wanted to state that I am not interested in presenting "Bad Akka", as from some of the comments it looked like you felt offended ;-).
| Single-machine performance is only interesting if you are after single points of failure.
Both things are important: single-machine performance AND remote messaging throughput + latency.
Yep, my argument was that without remote you have a SPOF.
Regarding remoting/failover there are much faster options than actors/Akka today.
As for failover, if you are limited to a software implementation, the speed of remote failover is bounded by a timeout period; it does not matter what software framework (Akka or other) is used. If you have any side-channel information, maybe hardware solutions, e.g. link-failure notifications or hardware watchdogs, the game is different -- but that is apples to oranges.
Disagree. You can run systems redundantly with total message ordering and always take the fastest response. This is zero-latency failover. It needs a decent reliable UDP messaging stack, of course.
I appreciate your vision of making this transparent to the application. It's a great idea, but I think you are still not there for the very high-end kind of application, no offence. I have built large high-performance distributed systems, so I know what I am talking about.
It is a bit of a strawman. For any particular use case the fastest implementation is a custom hand-tuned one designed by an expert, and I don't doubt that you can beat Akka in many particular scenarios. In fact, for every system there is always one more benchmark that it cannot beat. It all depends how many resources you have to throw at your problem (and at maintaining it over time).
Mostly agree. However, there's no excuse for not using the fastest possible option in basic mechanics like queued message dispatch. I have a reasonable suspicion this is the case (I will have to investigate).
However, regarding concurrent programming, actors can improve performance and maintainability today; that's why I am currently investigating/benchmarking local performance only.
I will incorporate your proposals into the test.
We could in theory play around with the example and fine-tune (I am very tempted to try it now), but the problem is that we are preparing a release and we cannot really allocate any time to this particular benchmark. Play around with the dispatcher settings a bit and see how it works out -- try tuning the throughput setting in particular.
As long as the benchmark processes 1 million independent Pi-computation slices concurrently, any tuning would be fair (and welcome). I am not so sure regarding "batching" optimizations, as this actually reduces the number of messages processed. However, an adaptive batching dispatcher could boost things a lot (I know this from my network-related work), but at the cost of increased latency. This test is not about batching but about processing many tiny units of work, e.g. market data ;-)
regards,
rüdiger
Sounds just like a minor comms mishap.
How is this failover and not "competing consumers"? (i.e. you have to notice someone is down before failing over, death and delay is indistinguishable in distributed systems)
I'm not sure I follow. Clearly there's a tradeoff between fairness and throughput due to platform artifacts.
On Jan 7, 2014 6:54 PM, "Rüdiger Möller" <moru...@gmail.com> wrote:
>
>
>
> On Tuesday, January 7, 2014 at 17:52:28 UTC+1, √ wrote:
>>
>>
>>
>> Sounds just like a minor comms mishap.
>
>
> Sorry, I cannot figure out what this means .. I am a native German
I am a native Swede; you misunderstood each other.
>
>>
>>
>> How is this failover and not "competing consumers"? (i.e. you have to notice someone is down before failing over, death and delay is indistinguishable in distributed systems)
>>
>
>
> Maybe I named it wrong; however, one can do delayless failover this way. It doesn't affect any client, as the second instance keeps responding, so there is no delay in processing.
But it requires that work stealing is OK, which is a subset of cases.
>
>>
>> I'm not sure I follow. Clearly there's a tradeoff between fairness and throughput due to platform artifacts.
>>
>
> It's more basic: using proxies for typed actors is really wasteful.
Provably untrue; you MUST have a logical proxy since it is a distributed model. The wastefulness of said proxy is implementation-dependent, and as such you cannot make any claim about the efficiency of an unstated implementation, or in general.
Untyped actors on the other hand replace message dispatch with 'instanceof' chaining, which prevents any HotSpot call optimization in the case of direct calls (both actors share the same thread => a direct method call is done instead of queuing).
Which is what you want, since otherwise you're synchronous, i.e. a malicious or broken recipient can prevent progress of the sender's logic, leading to extremely brittle systems. See http://blog.ometer.com/2011/07/24/callbacks-synchronous-and-asynchronous
Is Akka doing direct dispatch in the case of typed actors on the same dispatcher thread? (If not: thank god my bench is not covering this ;-))
No, it doesn't, for the reasons mentioned above. Any distributed model based on synchrony seems like a bad idea.
Cheers,
V
Now; you asked for ways of improving the Akka actor performance, we have provided the relevant information for you to do so.
Let's stay on topic.
Cheers,
V
No, my point is that using currentTimeMillis to obtain durations _at all_ is to be considered bad practice due to the shoddy accuracy.
You're comparing apples to oranges, i.e. a transport with a model of computation.
Akka's remoting transport is pluggable, so you could implement a UM version of it if you so wish. Or even that WLLM!
Reference?
That's what you need for high-end throughput + failover IMO. Doing typed-actor message passing via JDK proxies is, well .. you should know yourself :-)
Absolutely; TypedActors used to be based on AspectWerkz proxies but were repurposed to use JDK proxies due to the use case. You are of course free, if you want, to use a JVM that ships with extreme-performance JDK proxies. :)
I've seen differences of up to 4 orders of magnitude just from configuration changes. As you can imagine, I have spent quite some time tuning Akka.
>> How is this failover and not "competing consumers"? (i.e. you have to notice someone is down before failing over, death and delay is indistinguishable in distributed systems)
>>
>
>
> Maybe I named it wrong; however, one can do delayless failover this way. It doesn't affect any client, as the second instance keeps responding, so there is no delay in processing.
But it requires that work stealing is OK, which is a subset of cases.
Provably untrue; you MUST have a logical proxy since it is a distributed model. The wastefulness of said proxy is implementation-dependent, and as such you cannot make any claim about the efficiency of an unstated implementation, or in general.
Untyped actors on the other hand replace message dispatch with 'instanceof' chaining, which prevents any HotSpot call optimization in the case of direct calls (both actors share the same thread => a direct method call is done instead of queuing).
Which is what you want, since otherwise you're synchronous, i.e. a malicious or broken recipient can prevent progress of the sender's logic, leading to extremely brittle systems. See http://blog.ometer.com/2011/07/24/callbacks-synchronous-and-asynchronous
Is Akka doing direct dispatch in the case of typed actors on the same dispatcher thread? (If not: thank god my bench is not covering this ;-))
No, it doesn't, for the reasons mentioned above. Any distributed model based on synchrony seems like a bad idea.
Right, just so I'm clear - running your tests, I see something on the order of a 10% performance penalty for Akka vs your solution using all sorts of excitement with countdown latches and thread parking. Are you seeing a difference of more than 10%? I can't see your results, so I can't see what differences you're observing. If you're seeing something out of line with my results, we should be looking at mine. If you're seeing performance that agrees with my experience, I think we can probably agree that a 10% performance penalty in exchange for not having to do explicit management is a worthwhile exchange in a non-negligible set of use cases.
No, my point is that using currentTimeMillis to obtain durations _at all_ is to be considered bad practice due to its shoddy accuracy.
currentTimeMillis is wall-clock time. When measuring durations > 500 ms, accuracy issues are not a problem. Modern OSes + VMs have better accuracy than older ones. I'll change to nanos just in case.
You're comparing apples to oranges, i.e. a transport with a model of computation.
Akka's remoting transport is pluggable, so you could implement a UM version of it if you so wish. Or even that WLLM!
Ok, I wasn't aware of that pluggability feature. Good.
Reference?
From the post above (Intel Xeon, 2 sockets x 6 cores):

==========================================
1m jobs, each performing a 100-Pi-slice loop

AKKA
  average 1 threads : 1914
Abstraktor
  average 1 threads : 1349
synced Threading
  average 1 threads : 800

"Synced threading" schedules runnables to an executor, which obviously is fastest. The Abstraktor prototype pushes methods onto a ConcurrentLinkedQueue and executes calls via reflection; this already produces significant overhead. Akka has an overhead of >2 times the single-threaded case.
This would not be a problem if they scaled infinitely (threading does not scale at all in the 1-million-message case). But they don't, because the queues passing inter-thread messages create contention (to a lesser extent compared to threading). Both Abstraktor and Akka stop scaling at a certain number of CPU cores.
So if the basic overhead is too high, the break-even never comes!
Even worse (I don't know the exact reason): the default Akka queues seem to produce more contention than my prototype-ish plain polled CLQ. So Akka comes with the highest dispatch overhead and scales out worst due to contention: double fail. You should do something about that. Fast message passing is at the core of the system; it's not a good idea to relax on efficiency in such a critical part of your system.
That's what you need for high-end throughput + failover IMO. Doing typed-actor message passing via JDK proxies is, well .. you should know yourself :-)
Absolutely; TypedActors used to be based on AspectWerkz proxies but were repurposed to use JDK proxies due to the use case. You are of course free, if you want, to use a JVM that ships with extreme-performance JDK proxies. :)
400 lines of bytecode weaving can fix that.
I've seen differences of up to 4 orders of magnitude just from configuration changes. As you can imagine, I have spent quite some time tuning Akka.
Ok, I just have to stop posting in order to do the test now ... :-)
- rüdiger
>> How is this failover and not "competing consumers"? (i.e. you have to notice someone is down before failing over, death and delay is indistinguishable in distributed systems)
>>
>
>
> Maybe I named it wrong; however, one can do delayless failover this way. It doesn't affect any client, as the second instance keeps responding, so there is no delay in processing.
But it requires that work stealing is OK, which is a subset of cases.
How is that work stealing? Consider N receivers in identical state responding to the same requests (multicast, so requests are not sent twice). N receivers respond, but the requestor just takes the first response and ignores the other responses.
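The scheme described above, where N identical replicas answer every request and the caller keeps only the first reply, can be sketched with CompletableFuture as a local stand-in for multicast receivers (replica IDs and latencies are invented):

```java
import java.util.concurrent.CompletableFuture;

public class FirstResponse {
    // Simulate a replica that computes the same answer with its own latency.
    static CompletableFuture<String> replica(String id, long delayMs) {
        return CompletableFuture.supplyAsync(() -> {
            try { Thread.sleep(delayMs); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "answer-from-" + id;
        });
    }

    static String firstAnswer() throws Exception {
        // All replicas receive the request (the multicast); the caller takes the
        // first reply that arrives and ignores the rest.
        CompletableFuture<Object> first = CompletableFuture.anyOf(
                replica("A", 5), replica("B", 50), replica("C", 200));
        return (String) first.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstAnswer());
    }
}
```

If one replica dies, the others keep answering, so the caller never observes a failover delay; the cost is running every request N times.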
Provably untrue; you MUST have a logical proxy since it is a distributed model. The wastefulness of said proxy is implementation-dependent, and as such you cannot make any claim about the efficiency of an unstated implementation, or in general.
I can make the claim that it is awfully slow with the only usable server Java VM on the market on the most frequently used hardware platform :-). You can roll your own proxy implementation with reasonable effort.
Untyped actors, on the other hand, replace message dispatch with 'instanceof' chaining, which prevents any HotSpot call optimization in the case of direct calls (both actors share the same thread => a direct method call is done instead of queuing).
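The "instanceof chaining" dispatch style being criticized looks roughly like this (an illustrative sketch, not Akka's actual API; all names are invented):

```java
// Untyped-actor-style dispatch: every incoming message is tested against
// a chain of instanceof checks. The branch is data-dependent, so the JIT
// cannot turn it into a single direct (devirtualized, inlinable) call.
abstract class Msg {}
final class Compute extends Msg { final int n; Compute(int n) { this.n = n; } }
final class Shutdown extends Msg {}

class UntypedStyleActor {
    int acc;
    boolean alive = true;

    void onReceive(Msg m) {
        if (m instanceof Compute) {
            acc += ((Compute) m).n;
        } else if (m instanceof Shutdown) {
            alive = false;
        }
        // every additional message type extends the chain further
    }
}
```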
Which is what you want, since otherwise you're synchronous, i.e. a malicious or broken recipient can prevent progress of the sender's logic, leading to extremely brittle systems. See http://blog.ometer.com/2011/07/24/callbacks-synchronous-and-asynchronous
I think an actor framework should not support synchronous callbacks at all. You don't need them.

In contradiction to the blog post above, in abstractor callbacks do not come in on a different thread but put a message on the actor's queue (except when sharing a thread with the caller).
Is Akka doing direct dispatch in the case of typed actors on the same dispatcher thread (if not: thank god my bench is not covering this ;-)) )?
No, it doesn't, for the reasons mentioned above. Any distributed model based on synchrony seems like a bad idea.
Uhh, that's a very academic point of view.
The speed difference between a direct call and a message being queued is >1000x. One can keep the contract of Actors but optimize the dispatch in case they share the same thread/dispatcher.

As long as synchronous results are forbidden, this does not affect the functionality or behaviour of an Actor. Yes, it *may* happen that the receiver blocks due to ill behaviour. If the same ill Actor gets messages queued, it will get a queue overflow in most cases anyway. I'd consider this a bug that needs a fix. The performance tradeoff is massive and forces coarse-grained actor design, which in turn creates harder-to-balance apps. I see your reasons; for me this is a no-go out of practical considerations.
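The same-thread optimization being proposed can be sketched as follows (my own hedged reconstruction of the idea, not code from either library; all names are invented):

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Same-thread dispatch optimization: when the sender already runs on the
// receiver's dispatcher thread, invoke the message handler directly,
// skipping the queue hand-off; cross-thread sends still go through the
// mailbox. The async contract is preserved as long as no synchronous
// result is returned to the caller.
class SameThreadActor {
    private final Thread owner = Thread.currentThread();
    private final ConcurrentLinkedQueue<Runnable> mailbox = new ConcurrentLinkedQueue<>();

    void tell(Runnable msg) {
        if (Thread.currentThread() == owner) {
            msg.run();            // same thread: direct call, no queuing
        } else {
            mailbox.offer(msg);   // cross-thread: enqueue as usual
        }
    }

    // Called by the owning dispatcher thread to process cross-thread messages.
    void drainMailbox() {
        Runnable m;
        while ((m = mailbox.poll()) != null) m.run();
    }
}
```

This is exactly the trade-off debated here: the fast path gains a direct (inlinable) call at the price of letting a blocking receiver stall its same-thread sender.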
On Tue, Jan 7, 2014 at 7:59 PM, Rüdiger Möller <moru...@gmail.com> wrote:
>> How is this failover and not "competing consumers"? (i.e. you have to notice someone is down before failing over; death and delay are indistinguishable in distributed systems)
>
> Maybe I named it wrong, however one can do delayless failover this way. It doesn't affect any client as the second instance keeps responding, so there is no delay in processing.
>
> How is that work stealing? Consider N receivers in identical state responding to the same requests (multicast, so requests are not sent twice). N receivers respond, but the requestor just takes the first response and ignores the others.

That's called "hot standby", which may be fine for some use cases but not all: consider a message called "LaunchMissiles".
On Tuesday, January 7, 2014 8:41:41 PM UTC+1, √ wrote:
On Tue, Jan 7, 2014 at 7:59 PM, Rüdiger Möller <moru...@gmail.com> wrote:
> That's called "hot standby", which may be fine for some use cases but not all: consider a message called "LaunchMissiles".

:)) I admit having had some personal experience with the "transaction of death". It gets even better in an event-sourced system, where restarting involves a message replay incl. the transaction of death ..
> Is Akka doing direct dispatch in the case of typed actors on the same dispatcher thread?
>
> No, it doesn't, for the reasons mentioned above. Any distributed model based on synchrony seems like a bad idea.
>
> Uhh, that's a very academic point of view.
>
> Blanket statement. Completely depends on what type of call (the morphicity of the callsite, which invoke instruction), the implementation of the queue, the message being enqueued, etc. The Disruptor project has shown that you can get quite extreme "speed" with "message passing".
Direct (HotSpot-compiled) method dispatch from a generated proxy still dwarfs any queue-based dispatch. It's not only the queuing: the absence of inlining, the handcrafted dispatch, the allocation of queue entries, and the cache misses due to object allocation also hurt. Direct dispatch is allocation-free.
As long as synchronous results are forbidden, this does not affect the functionality or behaviour of an Actor. Yes, it *may* happen that the receiver blocks due to ill behaviour.

> Which is not an appropriate solution for non-academic software, if I may say so.

I'd consider it a bug which should be fixed pre-production. There are classes of errors which cannot and should not get "repaired" at runtime, at least not at such a high price.
> If you want maximum single-threaded performance, just use normal code. No need for multithreading at all, just use one thread per logical partition of operations.
Valid point. The downside is that one needs to decide at programming time which work is done single-threaded. With the "direct dispatch" option, one may start with a more fine-grained actor design and later move some of the "local" actors to other dispatchers (statically by config or dynamically) if needed. Additionally, dynamic load balancing is applicable: e.g. just do a "split" of an overloaded dispatcher thread into 2 different ones. With "always queue" actors, the price for this "maybe" split is paid even if it turns out your heap of actors consumes only 30% of a thread.
BTW: with new config results look much better :)
- ruediger
That looks like you are using a ThreadPoolExecutor, but instead of guessing it would be much nicer if you could just publish the complete config matching these plots.
BTW: do I read that correctly that on Xeon Akka (with its full semantics) scales exactly as well as your “cut corners” prototype? ;-) (which would not surprise me at all … )
On Friday, January 10, 2014 11:05:49 AM UTC+1, rkuhn wrote:
> That looks like you are using a ThreadPoolExecutor, but instead of guessing it would be much nicer if you could just publish the complete config matching these plots.
I use the config someone posted above
// Create an Akka system
ActorSystem system = ActorSystem.create("PiSystem", ConfigFactory.parseString(
"akka {\n" +
" actor.default-dispatcher {\n" +
" fork-join-executor {\n" +
" parallelism-min = 2\n" +
" parallelism-factor = 0.4\n" +

On Fri, Jan 10, 2014 at 11:58 AM, Rüdiger Möller <moru...@gmail.com> wrote:
Cool, I recommend tuning the parallelism-factor between 0.3 and 1.0 to find the optimum.