Scalability in presence of large numbers of actors stopping

Chris Marshall

Apr 13, 2012, 7:38:01 AM
to akka...@googlegroups.com
I have just tried an experiment in which I created a large number of actors (~120k) in very quick succession. Each actor should stop itself roughly ten minutes after being created, when it next receives a system-wide "ping" from its parent. The ping happens once per minute, whereas actors are being created at a rate of 50-200/s, so I would expect any given ping to result in around 3,500 actors being stopped.
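As a rough check of the figures above, the sketch below multiplies the creation rate by the ping interval. It assumes a steady creation rate, so that one ping interval's worth of actors crosses the ten-minute threshold between consecutive pings. The names `StopsPerPing` and `stopsPerPing` are made up for illustration. Under that assumption the quoted 50-200/s range gives 3,000 to 12,000 stops per ping, and the ~3,500 estimate corresponds to about 58 actors per second, near the low end of the range.

```scala
// Back-of-the-envelope check: how many actors should each ping stop?
// Assumption (not from the post): the creation rate is steady, so the actors
// created during one ping interval all become eligible to stop on the same ping.
object StopsPerPing {
  // The ping fires once per minute.
  val pingIntervalSeconds = 60

  // Actors created during one ping interval at the given creation rate.
  def stopsPerPing(actorsPerSecond: Int): Int =
    actorsPerSecond * pingIntervalSeconds

  def main(args: Array[String]): Unit = {
    println(stopsPerPing(50))  // low end of the quoted rate: 3000
    println(stopsPerPing(200)) // high end of the quoted rate: 12000
  }
}
```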

The system was running very smoothly until the actors started to stop themselves; at that point my PC ground to a halt with the CPU maxed out on the process. JConsole showed that all of the dispatcher threads were doing this:

Name: eucleia-akka.actor.default-dispatcher-25
State: RUNNABLE
Total blocked: 6,000  Total waited: 44,098

Stack trace: 
scala.collection.immutable.VectorIterator.initFrom(Vector.scala:621)
scala.collection.immutable.VectorPointer$class.initFrom(Vector.scala:727)
scala.collection.immutable.VectorIterator.initFrom(Vector.scala:621)
scala.collection.immutable.Vector.initIterator(Vector.scala:61)
scala.collection.immutable.Vector.iterator(Vector.scala:68)
scala.collection.immutable.Vector.iterator(Vector.scala:36)
scala.collection.IterableLike$class.exists(IterableLike.scala:78)
scala.collection.immutable.Vector.exists(Vector.scala:36)
scala.collection.SeqLike$class.contains(SeqLike.scala:401)
scala.collection.immutable.Vector.contains(Vector.scala:36)
akka.event.ActorClassification$class.dissociateAsMonitor$1(EventBus.scala:285)
akka.event.ActorClassification$class.dissociate(EventBus.scala:292)
akka.actor.LocalDeathWatch.dissociate(ActorRefProvider.scala:558)
akka.actor.LocalDeathWatch.publish(ActorRefProvider.scala:561)
akka.actor.LocalDeathWatch.publish(ActorRefProvider.scala:558)
akka.actor.ActorCell.doTerminate(ActorCell.scala:692)
akka.actor.ActorCell.terminate$1(ActorCell.scala:580)
akka.actor.ActorCell.systemInvoke(ActorCell.scala:597)
akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:191)
akka.dispatch.Mailbox.processMailbox(Mailbox.scala:180)
akka.dispatch.Mailbox.run(Mailbox.scala:161)
akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:505)
akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:997)
akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1495)
akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)


Name: eucleia-akka.actor.default-dispatcher-27
State: RUNNABLE
Total blocked: 6,607  Total waited: 53,355

Stack trace: 
akka.event.ActorClassification$class.dissociateAsMonitor$1(EventBus.scala:280)
akka.event.ActorClassification$class.dissociate(EventBus.scala:292)
akka.actor.LocalDeathWatch.dissociate(ActorRefProvider.scala:558)
akka.actor.LocalDeathWatch.publish(ActorRefProvider.scala:561)
akka.actor.LocalDeathWatch.publish(ActorRefProvider.scala:558)
akka.actor.ActorCell.doTerminate(ActorCell.scala:692)
akka.actor.ActorCell.terminate$1(ActorCell.scala:580)
akka.actor.ActorCell.systemInvoke(ActorCell.scala:597)
akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:191)
akka.dispatch.Mailbox.processMailbox(Mailbox.scala:180)
akka.dispatch.Mailbox.run(Mailbox.scala:161)
akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:505)
akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:997)
akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1495)
akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

After approximately 15 minutes, the system went from 90% CPU usage down to 25% (at this point, there were still roughly 110k actors running). Over the following 5 minutes, the actors gradually stopped at a rate of ~3.5k/minute and then, very quickly (in a matter of seconds), the remaining 90k actors shut down.

I'm not entirely clear what the system was doing during the period it was maxed out. Originally I thought that stopping an actor involved some O(N) traversal for each actor being stopped, but now I'm not so sure because of how it sped up.
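The stack traces above show `Vector.contains` being called from `dissociateAsMonitor`. The sketch below is not Akka's code. It is a minimal model, using plain `Int` IDs in place of actor references, of how a linear membership check on every stop adds up when many actors stop together. The names `LinearScanCost`, `scanSteps` and `totalSteps` are hypothetical, and the removal order is chosen to be the worst case.

```scala
// Hypothetical model of a per-stop linear membership check. Not Akka's code.
object LinearScanCost {
  // Number of elements a front-to-back scan inspects before it finds `target`
  // (or the whole length if `target` is absent).
  def scanSteps(xs: Vector[Int], target: Int): Int = {
    val idx = xs.indexOf(target)
    if (idx >= 0) idx + 1 else xs.length
  }

  // Total elements inspected when n watched IDs are removed one at a time,
  // with each removal preceded by a linear membership check.
  def totalSteps(n: Int): Long = {
    var watched = Vector.range(0, n)
    var steps = 0L
    // Worst case: the ID being removed is always the last element.
    for (id <- (n - 1) to 0 by -1) {
      steps += scanSteps(watched, id)
      watched = watched.filterNot(_ == id)
    }
    steps
  }

  def main(args: Array[String]): Unit = {
    println(totalSteps(1000)) // 500500, i.e. 1000 * 1001 / 2
    println(totalSteps(2000)) // 2001000: twice the actors, about 4x the work
  }
}
```

In this model each check also gets cheaper as the collection shrinks, which would be consistent with shutdown accelerating towards the end, though that is only one possible reading of the behaviour described.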

 - Can anyone explain the behaviour?
 - Are there any suggestions for how I might architect the system to avoid it?

Chris

Viktor Klang

Apr 13, 2012, 7:44:09 AM
to akka...@googlegroups.com
As I replied on Twitter, it's already been fixed in master (a while back).
It couldn't be included in 2.0.1 because of binary compatibility. Sorry guys: those of you who value performance over binary compat will have to build a custom version of the "release-2.0" branch by reverting this commit: https://github.com/akka/akka/commit/2a3c6d14bda05a761702a532d7a2a4666de01733

Cheers,


Chris

--
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To post to this group, send email to akka...@googlegroups.com.
To unsubscribe from this group, send email to akka-user+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/akka-user?hl=en.

√iktor Ҡlang

Apr 13, 2012, 7:57:31 AM
to akka...@googlegroups.com
Which raises the question about the cost of binary compatibility.

This particular fix will give much better performance for cleaning up dying actors, but anyone who has extended ActorClassification and manually tinkered with the mappings will suffer from binary incompatibility.

Should we include this fix for RC2 and document the incompatibility, or shall everyone suffer?

Cheers,
--
Viktor Klang

Akka Tech Lead
Typesafe - The software stack for applications that scale

Twitter: @viktorklang

amulya rattan

Apr 13, 2012, 8:06:38 AM
to akka...@googlegroups.com
Viktor,

Could you explain why this binary incompatibility exists? Hasn't Akka always maintained its binary compatibility?

~Amulya


√iktor Ҡlang

Apr 13, 2012, 8:08:26 AM
to akka...@googlegroups.com
On Fri, Apr 13, 2012 at 2:06 PM, amulya rattan <talk2...@gmail.com> wrote:
> Viktor,
>
> Could you explain why this binary incompatibility exists?

See the commit I linked to.

> Hasn't Akka always maintained its binary compatibility?

I'd say that it has _never_ actively maintained binary compatibility.
Nobody has ever been forced to upgrade.

Cheers,

amulya rattan

Apr 13, 2012, 8:33:46 AM
to akka...@googlegroups.com
Understood. Is that the only change resulting in the performance gain? Seems like a very trivial change for fixing a major performance issue.
About having to include the fix in RC2 or not, I, for one, would hope that the changes go in with binary incompatibility properly documented.

√iktor Ҡlang

Apr 13, 2012, 8:41:42 AM
to akka...@googlegroups.com
On Fri, Apr 13, 2012 at 2:33 PM, amulya rattan <talk2...@gmail.com> wrote:
> Understood. Is that the only change resulting in the performance gain? Seems like a very trivial change for fixing a major performance issue.

It's beyond trivial to check out the release-2.0 branch, do a git revert of the commit I linked, and then package akka-actor.jar and use that.

> About having to include the fix in RC2 or not, I, for one, would hope that the changes go in with binary incompatibility properly documented.

We're going to stand by our word that we're being binary compatible between .x releases for Akka 2.0, so it won't be included in RC2.
If you are bitten by this performance issue, it's fairly easy to build it yourself as described above.
If you are a Typesafe Subscription customer, we can always discuss the possibility of cutting an official, supported but binary-incompatible version of 2.0.1.