Aggregate counts in multi-jvm testing

alex.f....@gmail.com

Dec 24, 2015, 12:15:59 PM
to Akka User List
Is there any way to combine variables from different JVMs in multi-jvm testing?

More concretely, suppose each JVM has a counter variable (which, e.g., counts the number of jobs that have been sent to it), and I want to check that their sum equals some number (e.g. the total number of jobs sent out).

Thanks!
Alex

Konrad Malawski

Dec 24, 2015, 4:19:37 PM
to akka...@googlegroups.com, alex.f....@gmail.com
You're in a distributed system – how would you do it there?
"The Akka way" of doing it is messaging :-)

Send a message to the "cluster actor", which each JVM has, 
from the same sender,
collect the replies,
once you got all of them (and you know, because you know how many questions you've sent),
you know the total number => success!
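
A rough sketch of the ask-and-collect steps above, in plain Java rather than Akka. `CountingNode` and `CountAggregator` are hypothetical names, and `CompletableFuture` only stands in for actor messaging, so treat this as an illustration of the pattern, not of any Akka API.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Hypothetical stand-in for the counting actor on one JVM; not an Akka class.
final class CountingNode {
    private final long jobsReceived;

    CountingNode(long jobsReceived) {
        this.jobsReceived = jobsReceived;
    }

    // Models "send a question, get a reply later": the reply arrives asynchronously.
    CompletableFuture<Long> askCount() {
        return CompletableFuture.supplyAsync(() -> jobsReceived);
    }
}

final class CountAggregator {
    // Ask every node, wait until all replies are in, then sum them.
    static long totalCount(List<CountingNode> nodes) {
        List<CompletableFuture<Long>> replies = nodes.stream()
                .map(CountingNode::askCount)
                .collect(Collectors.toList());
        // We know how many questions were sent, so we know when every reply has arrived.
        CompletableFuture.allOf(replies.toArray(new CompletableFuture[0])).join();
        return replies.stream().mapToLong(CompletableFuture::join).sum();
    }
}
```

In a real multi-jvm test, each future would presumably be a reply message from the actor on that JVM, and the test would assert on the sum once the expected number of replies has arrived.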

Happy hakking!

-- 
Cheers,
Konrad 'ktoso’ Malawski
Akka @ Typesafe
--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at https://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

Guido Medina

Dec 25, 2015, 5:22:03 AM
to Akka User List
Or use a CRDT from any distributed data system. I can mention a couple: Hazelcast, Riak, etc.

Basically you want a distributed counter.
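
For readers unfamiliar with the idea, here is a minimal grow-only counter (G-Counter) sketch in plain Java. It is not the implementation used by Hazelcast, Riak or Akka; the class and method names are illustrative assumptions. Each node increments only its own slot, and replicas are merged by taking the per-node maximum.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal G-Counter sketch; names are illustrative, not a library API.
final class GCounter {
    // One slot per node: each node only ever increments its own slot.
    private final Map<String, Long> counts = new HashMap<>();

    void increment(String nodeId) {
        counts.merge(nodeId, 1L, Long::sum);
    }

    // Merging takes the per-node maximum, so merging is commutative,
    // associative and idempotent (re-merging the same state changes nothing).
    GCounter merge(GCounter other) {
        GCounter merged = new GCounter();
        merged.counts.putAll(this.counts);
        other.counts.forEach((node, n) -> merged.counts.merge(node, n, Long::max));
        return merged;
    }

    // The counter's value is the sum over all nodes' slots.
    long value() {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

Because the merge is idempotent, replicas can exchange state in any order and any number of times and still converge on the same total.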

Konrad Malawski

Dec 25, 2015, 8:51:28 AM
to akka...@googlegroups.com, Guido Medina
Without going much deeper – I certainly wouldn't add any of these to a project just to get a CRDT counter ;-)

Guido Medina

Dec 25, 2015, 8:59:22 AM
to Akka User List
I will need distributed data for my project in the future, so it's good you mentioned it, because I didn't know Akka had it.

Konrad Malawski

Dec 25, 2015, 9:13:18 AM
to akka...@googlegroups.com, Guido Medina
Having consulted with a number of customers who ran, or had previously run, Hazelcast clusters alongside
Akka apps, I remain unconvinced about what Hazelcast brought to the table (other than "yet another cluster, with a different
and detached lifecycle to manage"). I'd avoid adding more separate, independent clusters if possible.

alex.f....@gmail.com

Dec 25, 2015, 10:43:17 AM
to Akka User List, oxy...@gmail.com
Exactly, I wouldn't want to add another piece just to get those counters for the test. So I'm definitely gonna use Akka's distributed data for this.

@Konrad 'ktoso' Malawski: I'm curious about your comment on Hazelcast (because I'm actually thinking of using it). Suppose I need to maintain distributed data; what should I use instead? Akka does provide distributed data, but it's still experimental. Moreover, it doesn't seem to provide functionality like locking or queries. The data is also replicated on all nodes, which can be too much redundancy (or is it configurable?).

Alex Reisberg

Dec 25, 2015, 11:47:22 AM
to akka...@googlegroups.com
I see. Thanks a lot!

alex.f....@gmail.com

Dec 27, 2015, 10:38:14 AM
to Akka User List
Just realized that, due to some confusion, I posted the "I see. Thanks a lot!" here (it was meant for a different thread). I'm still curious to hear more about the comment on Hazelcast.

Thanks!

Konrad Malawski

Dec 27, 2015, 10:44:50 AM
to akka...@googlegroups.com, alex.f....@gmail.com
Just @Konrad or @ktoso is fine ;-)

To answer properly, we'd have to sit down together and think about your requirements,
deployment strategy, etc. The short story is: Hazelcast maintains its own cluster, and it has some
kind of failure detection. Akka is a cluster, and it has some kind of failure detection. Both are *in process*,
in the same JVM.

Now: what happens if Hazelcast decides there's a partition, but Akka still sees the other node just fine?
I've seen real deployments where this, or similar situations, caused all kinds of chaos and inconsistent state.
Sure, one can blame it on the dev team ("well, of course one has to take these things into account!"), but that's
not a helpful stance, as it simply gets exponentially harder. It depends on what you'd use Hazelcast for, though.

Since your question was "I need to sum a counter from different nodes": that's trivial to implement using Akka,
so I would not add such a heavy additional dependency. If you need CRDTs, you can try Distributed Data.

The gist of this message is: think twice before adding another in-process cluster to your project because you need
a small piece of functionality (which you could easily implement using Akka); it comes with a large deployment
and understandability cost.


alex.f....@gmail.com

Dec 27, 2015, 12:15:28 PM
to Akka User List, alex.f....@gmail.com
@Konrad: Thanks for the great answer! Disagreement on failure can indeed be a pain.

What I'm trying to do is essentially one of the first exercises one would do with Akka: a chat app with online/offline notification. For passing messages around, one option is to use Akka's distributed pubsub. The list of online people (and their friends) can be implemented using Akka's distributed data. But now, there's one thing that I'm not too happy about: both distributed pubsub and distributed data seem to replicate all data onto all nodes instead of spreading out to the cluster. And this was the reason why I was thinking of Hazelcast. What would a pure Akka implementation of this be?

Thanks!
Alex

Guido Medina

Dec 28, 2015, 5:04:50 AM
to Akka User List
I was planning to have Hazelcast start/stop with each ActorSystem programmatically; Hazelcast's API is as friendly as Akka's.

But again, if Akka has its own in-memory grid then there's no need. Also consider whether you only need a counter or other things from a toolkit like Hazelcast, and whether the things you need can be accomplished with Akka alone.

HTH,

Guido.

Konrad Malawski

Dec 28, 2015, 5:12:25 AM
to akka...@googlegroups.com, Guido Medina
API simplicity is one thing; runtime, failure and recovery semantics are another, and that's where the dragons lie.
My only point here is that it's tricky to have two clusters running in the same JVM, because there are many more
failure scenarios you suddenly have to think about:
- what if Akka detects a node as unreachable, but Hazelcast doesn't
- what if Akka downs a node, but Hazelcast doesn't
- what if Hazelcast can't connect to the other node, but Akka still can without problems

Those are where the problems show up. Easy-to-use APIs are nice, but they don't address these problems;
it's up to the developers to guard against them. So I'd avoid including any other clustering solution unless you really need it.
For "just a counter" it's definitely not worth such an increase in complexity, IMHO.
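
To make the scenarios above a bit more concrete, here is a small sketch in plain Java (hypothetical names, no Akka or Hazelcast calls) that compares the sets of nodes two in-process clusters each consider reachable and reports the nodes they disagree on. How each cluster actually computes its view is assumed, not shown.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative helper; the two input sets stand for each cluster's own membership view.
final class MembershipViews {
    // Nodes that exactly one of the two clusters considers reachable.
    static Set<String> disagreement(Set<String> reachableA, Set<String> reachableB) {
        Set<String> onlyInOne = new TreeSet<>(reachableA);
        onlyInOne.addAll(reachableB);          // union of both views
        Set<String> both = new TreeSet<>(reachableA);
        both.retainAll(reachableB);            // intersection of both views
        onlyInOne.removeAll(both);             // symmetric difference
        return onlyInOne;
    }
}
```

Any non-empty result is a state the application has to handle explicitly, which is the extra burden described above.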

Konrad Malawski

Dec 28, 2015, 5:16:45 AM
to akka...@googlegroups.com, Guido Medina

> But now, there's one thing that I'm not too happy about: both distributed pubsub and distributed data seem to replicate all data onto all nodes instead of spreading out to the cluster.

That's not true: pubsub only sends data to nodes that have subscribers for a given topic.

If all people are in all chat rooms, then yes, it will be sent to all nodes. But is that really the case? Some people are in `apples chat room` and you can co-locate them, others are in `oranges chat room` and you can co-locate them, decreasing the amount of traffic and network hits a lot. With a distributed map it's a bit weird to do such things.
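
The routing idea can be sketched roughly as follows. This is plain Java with a hypothetical `TopicRegistry` class, not the Akka distributed pubsub API; it only illustrates that a publish targets the nodes hosting subscribers for that topic and nobody else.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative topic registry; names are assumptions, not an Akka class.
final class TopicRegistry {
    // topic -> nodes that host at least one subscriber for it
    private final Map<String, Set<String>> nodesByTopic = new HashMap<>();

    void subscribe(String topic, String node) {
        nodesByTopic.computeIfAbsent(topic, t -> new HashSet<>()).add(node);
    }

    // A publish only targets nodes with subscribers for that topic.
    Set<String> targetsFor(String topic) {
        return Collections.unmodifiableSet(
                nodesByTopic.getOrDefault(topic, Collections.emptySet()));
    }
}
```

With subscribers of one chat room co-located on one node, a message published to that room goes to a single node rather than to the whole cluster.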

> And this was the reason why I was thinking of Hazelcast. What would a pure akka implementation of this be?

Exactly what you explained and I don't see a problem with it.

"Let's add a distributed map" does the same thing: all nodes see the updates, so what would the upside be? The downsides in terms of increased complexity I've already explained.

Note: I don't mean to say bad things about Hazelcast here. In the scenario we're talking about, though, I don't see the need for it, but I do see a lot of mental and complexity cost associated with introducing it.

Hope this helps!

-- Konrad

alex.f....@gmail.com

Dec 28, 2015, 3:08:04 PM
to Akka User List, oxy...@gmail.com
@Konrad: Great! It seems that I totally misunderstood what I saw in the Akka source (I only took a cursory look and jumped to conclusions too quickly).

So just to confirm my understanding: suppose I have a topic named "apple"; then pubsub only sends the subscription data of "apple" to nodes with "apple" subscribers? From the source, I see that on each node there's a Topic actor that takes care of local subscribers (and maybe this is what you refer to as co-location?). But then, a Publisher can be on any node, and still needs access to the list of Topic actors. So this list of Topic actors must then be replicated across the cluster, right?

So it seems that if the number of topics is small and the number of subscribers for each topic is large, then this is an efficient way of doing things. However, if the number of topics is large, and each topic might only have a couple of subscribers (e.g., in the chat example, one topic per user), then a large amount of data (i.e. the list of Topic actors) has to be replicated across the cluster? As a general question, is pubsub the right tool to deal with this kind of problem?

I'm a complete newbie (only started with akka recently), and your answers have been great help. Thanks a lot!

Alex

PS. This seems to be very off topic from the original question.  Should I start a new thread?

Konrad Malawski

Dec 29, 2015, 9:40:00 AM
to akka...@googlegroups.com, alex.f....@gmail.com, oxy...@gmail.com
Let's hop on a new thread with this, would you mind opening one? Thanks :)

alex.f....@gmail.com

Dec 29, 2015, 6:06:29 PM
to Akka User List, alex.f....@gmail.com, oxy...@gmail.com