How to handle long GC in AKKA Actor model system


seetaramayya vadali

May 15, 2015, 9:51:43 AM
to akka...@googlegroups.com

Hi Akka Team,

I am not an expert in the Akka ecosystem. I am facing the following challenge; please advise me on what to do.

I have an Akka cluster with two nodes (say node1 and node2). Both of them are seed nodes (I really don’t care which one is the leader). Node2 is a very big machine, and a GC takes 30 minutes, during which the machine hangs. Please don’t suggest making the application distributed; I know about that option (I am working in a crawling environment, so it would take ages to move to a distributed environment).

If I set `auto-down-unreachable-after = 1800s` in the Akka configuration, my understanding is that there is a high chance that two separate clusters will form in case of a network partition. That means messages from node1 to node2 (or vice versa) will not be delivered.

What do you suggest in this scenario?

What is important to me:

· The startup order of node1 and node2 should not matter. (In my application, message delivery is guaranteed.)

· If a GC happens (the system hangs for 30 minutes), node2 should automatically be available in the cluster again after the GC.

I hope I explained this well. Thanks a lot for your help.

 

Regards,
Seeta Vadali

Martynas Mickevičius

May 16, 2015, 5:27:22 AM
to akka...@googlegroups.com
Hi Seeta,

There is an `acceptable-heartbeat-pause` setting for both remoting and clustering. You may want to try increasing it. But 30 minutes is a really long acceptable heartbeat pause, which may make the cluster very slow to react and make progress when a node really goes down.
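
As a rough sketch of where those settings live, assuming the Akka 2.3-era configuration layout with classic remoting: the `1800 s` value below is only the 30-minute figure from the question, not a recommendation, and the exact keys should be checked against the `reference.conf` of the Akka version in use.

```
# application.conf (sketch; values are illustrative, taken from the 30-minute GC figure)
akka {
  cluster {
    failure-detector {
      # Margin of lost/delayed heartbeats tolerated before a member is considered unreachable
      acceptable-heartbeat-pause = 1800 s
    }
  }

  remote {
    # Failure detector for the remote transport connection
    transport-failure-detector {
      acceptable-heartbeat-pause = 1800 s
    }
    # Failure detector used for remote death watch
    watch-failure-detector {
      acceptable-heartbeat-pause = 1800 s
    }
  }
}
```

With a margin this large, a node that has really crashed would also go unnoticed for roughly the same length of time, which is the trade-off described above.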

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.



--
Martynas Mickevičius
Typesafe
Reactive Apps on the JVM

Soumya Simanta

May 16, 2015, 9:05:57 PM
to akka...@googlegroups.com
How do you know that GC is taking 30 minutes? I have seen a GC pause that long. Just curious.

seetaramayya vadali

May 18, 2015, 10:04:26 AM
to akka...@googlegroups.com
Thanks, Martynas, for pointing me to the heartbeat configuration property. I'll try it and get back to you.

@Soumya, I didn't understand your question. Answering based on my understanding: if you have a terribly huge heap (well, not that huge, it is just 1 TB), the JVM has a very hard time cleaning it up.

Stuart Small

May 18, 2015, 1:44:59 PM
to akka...@googlegroups.com
I think he meant "I have never seen a GC pause that long".  I'm in the same boat, that's an insanely long pause.

So when I say this, keep in mind I am a Java guy who is pretty new to Akka and Scala, so this isn't expert advice. :) But tuning Akka doesn't seem like the appropriate place for this. Either there is something in your application layer that requires this massive heap and could be re-evaluated, or there is tuning you can do at the JVM layer that will help it handle GCs better.

I'm going to guess the majority of that heap is living in swap and that's why your GCs are so long?

Soumya Simanta

May 18, 2015, 2:18:17 PM
to akka...@googlegroups.com



> @Soumya, I didn't understand your question. Answering based on my understanding: if you have a terribly huge heap (well, not that huge, it is just 1 TB), the JVM has a very hard time cleaning it up.

My question is: how do you know that the GC pause is that long? Have you tried using VisualVM/YourKit to profile your JVM?
Can you try with a smaller heap size, say 64G? How much "actual" RAM do you have on your box?
 

Soumya Simanta

May 18, 2015, 2:20:44 PM
to akka...@googlegroups.com



> So when I say this, keep in mind I am a Java guy who is pretty new to Akka and Scala, so this isn't expert advice. :) But tuning Akka doesn't seem like the appropriate place for this. Either there is something in your application layer that requires this massive heap and could be re-evaluated, or there is tuning you can do at the JVM layer that will help it handle GCs better.

I agree. If this is a GC issue, one needs to look carefully at the application. But a GC pause of 30 minutes still looks like an eternity to me. Not impossible, but very unlikely IMO.

> I'm going to guess the majority of that heap is living in swap and that's why your GCs are so long?

Interesting. Let's see how much physical RAM is on the machine.

 

 

seetaramayya vadali

May 18, 2015, 2:39:56 PM
to akka...@googlegroups.com
Guys, maybe I didn't stress the point about distribution well. I agree with you all, but please see the third line of my first post. I am a bit afraid that our discussion is getting sidetracked from the actual question.

It's my fault; I didn't explain it well.

  • Akka is NOT causing the GC problems (our application, written in Java, has been in production for the last 6 or 7 years).
  • With my (and my current team's) interest in Akka and Scala, we are slowly introducing them.
  • As we move in the Akka direction, we are encountering various challenges. One of them is the one I described here.
  • We are not taking advantage of Akka features such as scalability and fault tolerance. We are using it just to broadcast messages between 3 JVMs (one of them has an 880 GB heap and 1 TB of RAM).

I cannot change the infrastructure all of a sudden. As a developer, I tell them about the advantages of a distributed setup, but the decision is not in my hands. I hope I explained this well.

seetaramayya vadali

May 18, 2015, 2:48:24 PM
to akka...@googlegroups.com
Hi Martynas,

After adding `acceptable-heartbeat-pause` and removing `auto-down-unreachable-after`, both nodes formed different clusters. I am still exploring the failure-detector part of the configuration. If anyone has any other suggestions, it would save me some time :) .
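
One thing that may be worth checking when two nodes end up in separate clusters is the `seed-nodes` list. The Akka cluster documentation describes the first entry in that list as special: it is the node that must be running when the cluster is initially formed, precisely to avoid separate islands. If each node lists itself first, each may form its own cluster. A minimal sketch, where the actor system name `ClusterSystem`, the host names `node1`/`node2` and port 2551 are made-up placeholders and the `akka.tcp` scheme assumes Akka 2.3-style remoting:

```
# Same list, in the same order, on BOTH nodes (sketch with hypothetical names)
akka {
  cluster {
    seed-nodes = [
      "akka.tcp://ClusterSystem@node1:2551",   # first seed node: forms the initial cluster
      "akka.tcp://ClusterSystem@node2:2551"
    ]
  }
}
```

Under this layout node1 has to be up when the cluster is first formed, so it does not by itself satisfy a requirement that the startup order never matters.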

Thanks a lot for every response in this thread.
  
Regards,
Seeta

Michael Frank

May 18, 2015, 3:12:57 PM
to akka...@googlegroups.com
On 05/18/15 11:48, seetaramayya vadali wrote:
> Hi Martynas,
>
> After adding `acceptable-heartbeat-pause` and removing `auto-down-unreachable-after`, both nodes formed different clusters. I am still exploring the failure-detector part of the configuration. If anyone has any other suggestions, it would save me some time :) .
>
> Thanks a lot for every response in this thread.

* What java version are you using?
* How many cpu cores are available to you?
* What sort of garbage collection tuning have you done and what are your current GC parameters?

a 30 minute stop-the-world GC pause seems crazy, but i've never seen a host with 1TB of memory :)

-Michael

seetaramayya vadali

May 18, 2015, 3:19:21 PM
to akka...@googlegroups.com
  • We are using Java 7
  • 32 cores, I guess (I am not sure, though)
  • Considering Azul

I didn't understand how GC tuning matters for a node joining the cluster. Is there any relation?






--
Regards,
Seeta Ramayya Vadali

Patrik Nordwall

May 18, 2015, 4:37:47 PM
to akka...@googlegroups.com


On 18 May 2015 at 21:19, seetaramayya vadali <srva...@gmail.com> wrote:

>   • We are using Java 7
>   • 32 cores, I guess (I am not sure, though)
>   • Considering Azul
>
> I didn't understand how GC tuning matters for a node joining the cluster. Is there any relation?

You will not have a smooth experience if you are trying to use Akka Cluster across those machines. An Akka Cluster should be used between tightly coupled nodes, and something that is unresponsive for 30 minutes cannot fall into that category.

I would recommend something else for the communication. For example HTTP or an external message broker (Kafka, ActiveMQ, ...)

/Patrik

PS. I join the club of those who have never seen an attempt to use a heap that big. That is the root cause of the problem, but you have made it clear that you cannot change that.


Michael Frank

May 19, 2015, 2:31:30 PM
to akka...@googlegroups.com
patrik answered this much better than i could have :)  i was thinking that your experience would be better if you could tune the JVM to be more responsive.  but you're right, it didn't directly address your original question, sorry for the confusion.

i will note, since you're on java7, you might look into the G1 collector in order to reduce your GC pauses: http://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html.

some documentation on tuning G1: http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/G1GettingStarted/index.html

-Michael

seetaramayya vadali

May 20, 2015, 4:29:32 AM
to akka...@googlegroups.com
Thanks a lot for your valuable inputs. I understood what to do.

Regards,
Seeta 