Sharding problem when restarting Cluster


Moax76

Aug 6, 2014, 4:41:21 AM
to akka...@googlegroups.com
Hi,

We have a project that is using Akka Persistence (2.3.4) with sharding in a cluster (with 2 nodes).
It has three different types of AbstractPersistentActor, all using sharding.

When the cluster is running, everything works fine, and stopping and restarting the cluster usually works as well. But from time to time the following happens when trying to start the cluster:

When starting the first node (server1), the log gets flooded with entries like this:

    ERROR akka.actor.OneForOneStrategy akka://NCSSystem/user/data/connectionCoordinator/singleton/coordinator : requirement failed: Region Actor[akka.tcp://NCSSystem@server2/user/data/connection#789496176] not registered: State(Map(),Map(),Set())
    java.lang.IllegalArgumentException: requirement failed: Region Actor[akka.tcp://NCSSystem@server2/user/data/connection#789496176] not registered: State(Map(),Map(),Set())
            at scala.Predef$.require(Predef.scala:233) ~[scala-library-2.10.4.jar:na]
            at akka.contrib.pattern.ShardCoordinator$Internal$State.updated(ClusterSharding.scala:1115) ~[akka-contrib_2.10-2.3.4.jar:2.3.4]
            at akka.contrib.pattern.ShardCoordinator$$anonfun$receiveRecover$1.applyOrElse(ClusterSharding.scala:1236) ~[akka-contrib_2.10-2.3.4.jar:2.3.4]
            at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) ~[scala-library-2.10.4.jar:na]
            at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) ~[scala-library-2.10.4.jar:na]
            at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) ~[scala-library-2.10.4.jar:na]
            at akka.persistence.Eventsourced$$anonfun$akka$persistence$Eventsourced$$recoveryBehavior$1.applyOrElse(Eventsourced.scala:168) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
            at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
            at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
            at akka.persistence.Recovery$class.withCurrentPersistent(Recovery.scala:176) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
            at akka.contrib.pattern.ShardCoordinator.withCurrentPersistent(ClusterSharding.scala:1192) ~[akka-contrib_2.10-2.3.4.jar:2.3.4]
            at akka.persistence.Recovery$State$class.processPersistent(Recovery.scala:33) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
            at akka.persistence.Recovery$$anon$1.processPersistent(Recovery.scala:95) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
            at akka.persistence.Recovery$$anon$1.aroundReceive(Recovery.scala:101) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
            at akka.persistence.Recovery$class.aroundReceive(Recovery.scala:256) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
            at akka.contrib.pattern.ShardCoordinator.akka$persistence$Eventsourced$$super$aroundReceive(ClusterSharding.scala:1192) ~[akka-contrib_2.10-2.3.4.jar:2.3.4]
            at akka.persistence.Eventsourced$$anon$1.aroundReceive(Eventsourced.scala:35) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
            at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:369) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
            at akka.contrib.pattern.ShardCoordinator.aroundReceive(ClusterSharding.scala:1192) ~[akka-contrib_2.10-2.3.4.jar:2.3.4]
            at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) [akka-actor_2.10-2.3.4.jar:na]
            at akka.actor.ActorCell.invoke(ActorCell.scala:487) [akka-actor_2.10-2.3.4.jar:na]
            at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) [akka-actor_2.10-2.3.4.jar:na]
            at akka.dispatch.Mailbox.run(Mailbox.scala:220) [akka-actor_2.10-2.3.4.jar:na]
            at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) [akka-actor_2.10-2.3.4.jar:na]
            at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [scala-library-2.10.4.jar:na]
            at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [scala-library-2.10.4.jar:na]
            at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library-2.10.4.jar:na]
            at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library-2.10.4.jar:na]

This happens for all three coordinators, and it does not look like it will ever stop logging. The connection#789496176 suffix is not the same in every entry.

The only way I have found to recover from this situation is to manually delete all journal and snapshot entries for the coordinators.

Any help or hint would be appreciated.

Best regards,
Morten Kjetland

Konrad Malawski

Aug 6, 2014, 6:40:44 AM
to Akka User List
Hi Morten, 
thanks for reporting!
Which journal plugin are you using?

It looks like during replay it gets a ShardHomeAllocated event without getting a ShardRegionProxyRegistered event first, which makes it blow up (we must first register the region, then allocate the shard).

One reason could be that the persist of ShardRegionProxyRegistered never succeeded...?
Would you be able to verify whether your journal contains these events (or if ShardRegionProxyRegistered is missing)?
It would be great to track down the root of this problem. It could be a bug on our side, but it's hard to pinpoint exactly yet.
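The ordering requirement Konrad describes can be sketched in a few lines. This is a simplified, self-contained illustration (the names CoordinatorState, regionRegistered, and shardHomeAllocated are made up for the example; the real logic lives in akka.contrib.pattern.ShardCoordinator.Internal.State.updated):

```scala
// Hedged sketch of the coordinator's invariant: a shard may only be
// allocated to a region that was registered first. Replaying the
// allocation event before (or without) the registration event trips
// the require, which is exactly the IllegalArgumentException in the log.
final case class CoordinatorState(
    shards: Map[String, String],   // shardId -> region
    regions: Set[String]) {        // registered regions

  def regionRegistered(region: String): CoordinatorState =
    copy(regions = regions + region)

  def shardHomeAllocated(shard: String, region: String): CoordinatorState = {
    require(regions.contains(region), s"Region $region not registered: $this")
    copy(shards = shards + (shard -> region))
  }
}

val s0 = CoordinatorState(Map.empty, Set.empty)
// Correct replay order: register, then allocate.
val s1 = s0.regionRegistered("regionA").shardHomeAllocated("shard1", "regionA")
// Wrong order (allocation replayed first) throws IllegalArgumentException.
val failed = scala.util.Try(s0.shardHomeAllocated("shard1", "regionA"))
```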

--
Cheers,
Konrad 'ktoso' Malawski
hAkker @ Typesafe


Morten Kjetland

Aug 7, 2014, 5:30:59 AM
to akka-user
Hi,

Turns out there was a bug in our homebrew jdbc-snapshot implementation.

The loaded SelectedSnapshot was populated with Option(state) instead of just the state, so the following lines in ShardCoordinator were never executed:

 case SnapshotOffer(_, state: State) ⇒
   log.debug("receiveRecover SnapshotOffer {}", state)
   persistentState = state

The snapshot was therefore never applied, so when the coordinator started receiving events with sequence numbers after the snapshot, it blew up.
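The failure mode is easy to reproduce in isolation. The following is a self-contained sketch with mock types (State, SnapshotOffer, and receiveRecover here are stand-ins, not the real Akka classes): the coordinator matches on the snapshot with a typed pattern, so offering Some(state) instead of state silently falls through to the next case and the snapshot is never applied.

```scala
// Mock stand-ins for the Akka types involved.
final case class State(shards: Map[String, String])
final case class SnapshotOffer(metadata: String, snapshot: Any)

// Simplified receiveRecover: only a bare State matches the typed pattern.
def receiveRecover(msg: Any): String = msg match {
  case SnapshotOffer(_, state: State) => s"snapshot applied: ${state.shards.size} shards"
  case other                          => s"fell through: $other"
}

val state = State(Map("shard1" -> "regionA"))
// Correct snapshot store behavior: offer the state itself.
val ok = receiveRecover(SnapshotOffer("meta", state))
// The buggy store offered Option(state): Some(State) is not a State,
// so the typed pattern does not match and the snapshot is dropped.
val buggy = receiveRecover(SnapshotOffer("meta", Option(state)))
```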

Thanks a lot for helping me in the right direction.

Best regards,
Morten Kjetland


On Wed, Aug 6, 2014 at 2:12 PM, Morten Kjetland <m...@kjetland.com> wrote:
Thanks for the response,

We are using a homebrew jdbc journal.

I checked the journal and ShardRegionProxyRegistered is written to it.
But I was unable to reproduce the problem now.
It might be a problem related to snapshotting in combination with a bug in our jdbc journal.
I'll try to reproduce it later and check the db again.

I just saw that https://github.com/dnvriend/akka-persistence-jdbc was worked on during the summer, so I'll try to use that one instead of our own, and see if the problem goes away.

Best regards,
Morten Kjetland




Konrad Malawski

Aug 7, 2014, 5:45:36 AM
to Akka User List
Great to hear you've found the problem!
We'll provide a TCK for journal plugins with the next (minor) release, so I suggest grinding your custom plugin through it to see if it's really valid :-)
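For later readers: the TCK Konrad mentions shipped as the akka-persistence-tck module. Assuming that artifact is on the test classpath, wiring a custom journal into it looks roughly like this. This is a non-runnable sketch; the exact class name and override shape vary between Akka versions, so check the TCK docs for the version you use:

```scala
// build.sbt (test dependency, version to match your Akka release):
//   "com.typesafe.akka" %% "akka-persistence-tck-experimental" % "2.3.x" % "test"
import akka.persistence.journal.JournalSpec
import com.typesafe.config.ConfigFactory

// Point the TCK at the plugin under test; it then runs a battery of
// write/replay/ordering checks against it, which is exactly the kind of
// bug (wrong replay order, wrapped snapshots) seen in this thread.
class MyJdbcJournalSpec extends JournalSpec {
  lazy val config = ConfigFactory.parseString(
    """akka.persistence.journal.plugin = "my-jdbc-journal"""")
}
```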

Happy hakking!

Richard Bowker

Nov 10, 2014, 8:42:08 AM
to akka...@googlegroups.com
I have seen a similar problem when restarting nodes in a cluster using sharding.

After restarting, the node with the shard coordinator went into an infinite error loop.

I was using akka 2.3.6 and "com.github.krasserm" %% "akka-persistence-cassandra" % "0.3.4" as the journal/persistence store.

A section of the error log is below. I did not know how to recover from this other than manually deleting all the Akka keyspaces from the database, which obviously isn't ideal!

any thoughts?

thanks

[ERROR] [11/10/2014 13:16:21.969] [ClusterSystem-akka.actor.default-dispatcher-17] [akka://ClusterSystem/user/sharding/PollServiceCoordinator/singleton/coordinator] requirement failed: Region Actor[akka.tcp://Cluste...@172.31.18.169:2552/user/sharding/PollService#546005322] not registered: State(Map(test47 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test6 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test30 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test42 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test29 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test18 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test14 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test36 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test25 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test28 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test43 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test32 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test20 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test15 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test33 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test22 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test0 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test44 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test11 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test41 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test16 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test9 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test34 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test45 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test38 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test8 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test19 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test35 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test5 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test24 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test2 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828]),Map(Actor[akka://ClusterSystem/user/sharding/PollService#1625036981] -> Vector(test6, test18, test28, test20, test33, test44, test16, test34, test8, test24, test5, test11, test19, test25, test30, test38, test47), Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828] -> Vector(test42, test36, test32, test15, test0, test41, test23, test45, test35, test2, test9, test14, test22, test29, test37, test43)),Set(Actor
java.lang.IllegalArgumentException: requirement failed: Region Actor[akka.tcp://Cluste...@172.31.18.169:2552/user/sharding/PollService#546005322] not registered: State(Map(test47 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test6 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test30 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test42 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test29 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test18 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test14 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test36 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test25 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test28 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test43 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test20 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test15 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test33 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test22 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test0 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test44 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test11 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test41 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test37 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test16 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test9 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test23 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test34 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test45 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test38 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test8 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test19 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test35 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test5 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test24 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test2 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828]),Map(Actor[akka://ClusterSystem/user/sharding/PollService#1625036981] -> Vector(test6, test18, test28, test20, test33, test44, test16, test34, test8, test24, test5, test11, test19, test25, test30, test38, test47), Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828] -> Vector(test42, test36, test32, test15, test0, test41, test23, test45, test35, test2, test9, test14, test22, test29, test37, test43)),Set(Actor[akka.tcp://Cluste...@172.31.24.129:49813/user/sharding/PollService#1785638107]))
        at scala.Predef$.require(Predef.scala:219)
        at akka.contrib.pattern.ShardCoordinator$Internal$State.updated(ClusterSharding.scala:1115)
        at akka.contrib.pattern.ShardCoordinator$$anonfun$receiveRecover$1.applyOrElse(ClusterSharding.scala:1236)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at akka.persistence.Eventsourced$$anonfun$akka$persistence$Eventsourced$$recoveryBehavior$1.applyOrElse(Eventsourced.scala:168)
        at akka.persistence.Recovery$class.runReceive(Recovery.scala:48)
        at akka.contrib.pattern.ShardCoordinator.runReceive(ClusterSharding.scala:1192)
        at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33)
        at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33)
        at akka.persistence.Recovery$class.withCurrentPersistent(Recovery.scala:185)
        at akka.contrib.pattern.ShardCoordinator.withCurrentPersistent(ClusterSharding.scala:1192)
        at akka.persistence.Recovery$State$class.processPersistent(Recovery.scala:33)
        at akka.persistence.Recovery$$anon$1.processPersistent(Recovery.scala:104)
        at akka.persistence.Recovery$$anon$1.aroundReceive(Recovery.scala:110)
        at akka.persistence.Recovery$class.aroundReceive(Recovery.scala:265)
        at akka.contrib.pattern.ShardCoordinator.akka$persistence$Eventsourced$$super$aroundReceive(ClusterSharding.scala:1192)
        at akka.persistence.Eventsourced$$anon$1.aroundReceive(Eventsourced.scala:35)
        at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:369)
        at akka.contrib.pattern.ShardCoordinator.aroundReceive(ClusterSharding.scala:1192)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.Mailbox.run(Mailbox.scala:220)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

[The same error and stack trace repeats at 13:16:21.978 and onwards.]

Patrik Nordwall

Nov 10, 2014, 11:04:18 AM
to akka...@googlegroups.com
Hi Richard,

That is not good. We have seen a similar issue a few times and tracked it down to bugs in the journal implementations. It happens when events are replayed in the wrong order.

Is there a way we can reproduce this?

Regards,
Patrik

Patrik Nordwall
Typesafe Reactive apps on the JVM
Twitter: @patriknw

Richard Bowker

Nov 10, 2014, 11:18:53 AM
to akka...@googlegroups.com
Hi Patrik, unfortunately not. In fact it's only happened once to me so far, so it may be a difficult one to reproduce.

I will of course get back to you if I can find a trigger.

Thanks!



Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test34 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], 

test45 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test38 -> Actor

[akka://ClusterSystem/user/sharding/PollService#1625036981], test8 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test19 -> Actor

[akka://ClusterSystem/user/sharding/PollService#1625036981], test35 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test5 -> 

Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test24 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test2 -> Actor

[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828]),Map(Actor[akka://ClusterSystem/user/sharding/PollService#1625036981] -> Vector(test6, 

test18, test28, test20, test33, test44, test16, test34, test8, test24, test5, test11, test19, test25, test30, test38, test47), Actor

[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828] -> Vector(test42, test36, test32, test15, test0, test41, test23, test45, test35, test2, 
requirement failed: Region Actor[akka.tcp://ClusterSystem@172.31.18.169:2552/user/sharding/PollService#546005322] not registered: State(Map(test47 -> Actor

[akka://ClusterSystem/user/sharding/PollService#1625036981], test6 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test30 -> Actor

[akka://ClusterSystem/user/sharding/PollService#1625036981], test42 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test29 -> 

Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test18 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], 

test14 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test36 -> Actor

[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test25 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test28 -> 

Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test43 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], 

test32 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test20 -> Actor

[akka://ClusterSystem/user/sharding/PollService#1625036981], test15 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test33 -> 

Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test22 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test0 

-> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test44 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], 

test11 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test41 -> Actor


[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test16 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test9 -> 


[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test34 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test45 -> 

Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test38 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test8 

-> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test19 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test35 -> Actor

[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test5 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test24 -> 

Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test2 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828]),Map

(Actor[akka://ClusterSystem/user/sharding/PollService#1625036981] -> Vector(test6, test18, test28, test20, test33, test44, test16, test34, test8, test24, test5, test11, 

test19, test25, test30, test38, test47), Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828] -> Vector(test42, test36, test32, test15, 

test0, test41, test23, test45, test35, test2, test9, test14, test22, test29, test37, test43)),Set(Actor

java.lang.IllegalArgumentException: requirement failed: Region Actor[akka.tcp://ClusterSystem@172.31.18.169:2552/user/sharding/PollService#546005322] not registered: 

State(Map(test47 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test6 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test30 

-> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test42 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], 

test29 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test18 -> Actor

[akka://ClusterSystem/user/sharding/PollService#1625036981], test14 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test36 -> 

Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test25 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], 

test28 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test43 -> Actor


[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test20 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test15 -> 

Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test33 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], 


[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test44 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test11 -> 

Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test41 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], 

test37 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test16 -> Actor

[akka://ClusterSystem/user/sharding/PollService#1625036981], test9 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test23 -> 

Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test34 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], 

test45 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test38 -> Actor

[akka://ClusterSystem/user/sharding/PollService#1625036981], test8 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test19 -> Actor

[akka://ClusterSystem/user/sharding/PollService#1625036981], test35 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test5 -> 

Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test24 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test2 -> Actor

[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828]),Map(Actor[akka://ClusterSystem/user/sharding/PollService#1625036981] -> Vector(test6, 

test18, test28, test20, test33, test44, test16, test34, test8, test24, test5, test11, test19, test25, test30, test38, test47), Actor

[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828] -> Vector(test42, test36, test32, test15, test0, test41, test23, test45, test35, test2, 

--
>>>&g
...

Patrik Nordwall

unread,
Nov 10, 2014, 11:35:24 AM11/10/14
to akka...@googlegroups.com
On Mon, Nov 10, 2014 at 5:18 PM, Richard Bowker <mechajoh...@googlemail.com> wrote:
Hi Patrik, unfortunately not. In fact it's only happened once to me so far, so it may be a difficult one to reproduce.

I will of course get back to you if I can find a trigger.

Thanks. That is what makes these bugs problematic. I don't know much about Cassandra. If this happens again, would it be possible to export the data from Cassandra so we could analyze (replay) it? Assuming you don't have any sensitive information in it.

/Patrik



--

Richard Bowker

unread,
Nov 10, 2014, 11:36:55 AM11/10/14
to akka...@googlegroups.com
sure, no problem.

Richard Bowker

unread,
Nov 11, 2014, 7:34:03 AM11/11/14
to akka...@googlegroups.com
Hi Patrik, I have managed to reproduce it twice more. We have Typesafe support, so I will get in touch with them to discuss how best to send the repro setup, as it's not a simple attachment!

thanks

Rich

Patrik Nordwall

unread,
Nov 11, 2014, 8:40:28 AM11/11/14
to akka...@googlegroups.com
On Tue, Nov 11, 2014 at 1:34 PM, Richard Bowker <mechajoh...@googlemail.com> wrote:
Hi Patrik, I have managed to reproduce it twice more. We have Typesafe support, so I will get in touch with them to discuss how best to send the repro setup, as it's not a simple attachment!

Excellent!
Thanks
 

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

Anders Båtstrand

unread,
Jun 9, 2015, 5:27:18 AM6/9/15
to akka...@googlegroups.com
Is there any news about this? I have experienced the same, using Akka 2.3.10 and Cassandra (latest version of the plugin).

Best regards,

Anders

Brandon Arp

unread,
Jun 9, 2015, 5:58:11 PM6/9/15
to akka...@googlegroups.com
I am seeing this as well with Akka 2.3.10.

Patrik Nordwall

unread,
Jun 10, 2015, 4:27:55 AM6/10/15
to akka...@googlegroups.com
We need logs (debug level) and a description of the scenario. Perhaps it is best that you create a GitHub issue so we can discuss it over there.

/Patrik
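For reference, debug-level logging of the kind Patrik asks for can be switched on in `application.conf` with standard Akka 2.3 settings; a minimal sketch (which `actor.debug` flags you enable depends on what you want to capture):

```
akka {
  loglevel = "DEBUG"
  actor.debug {
    # log actor lifecycle changes (restarts, stops)
    lifecycle = on
    # log messages that end up unhandled
    unhandled = on
  }
  # log remoting lifecycle events (association up/down)
  remote.log-remote-lifecycle-events = on
}
```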

--
/Patrik

GG

unread,
Jun 15, 2015, 4:41:25 PM6/15/15
to akka...@googlegroups.com
Was there a GitHub issue created for this? I am seeing something that looks very similar, and I don't want to create a duplicate ticket if one already exists.

GG

unread,
Jun 15, 2015, 9:07:31 PM6/15/15
to akka...@googlegroups.com
A little more detail on my issue: we've found that if we simply move our LevelDB data out of the way, the issue goes away, which seems to align with Patrik's earlier post indicating a possible problem in the persistence implementation. We are currently using the LevelDB plugin in native mode. There seems to be some issue during replay where a Region Actor failed to register with a "requirement failed", similar to Richard's stack trace above:



2015/06/16 00:00:00.472 [DEBUG] [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)] resolve of path sequence [/user/sharding/ReferralView#-947611826] failed 
2015/06/16 00:00:00.472 [DEBUG] [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)] resolve of path sequence [/user/sharding/ReferralView#-947611826] failed 
2015/06/16 00:00:00.472 [ERROR] [ClusterSystem-akka.actor.default-dispatcher-22][OneForOneStrategy] requirement failed: Shard [57] already allocated
java.lang.IllegalArgumentException: requirement failed: Shard [57] already allocated: State(Map(
  67 -> Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 12 -> Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
  23 -> Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 40 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
  68 -> Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 48 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
  57 -> Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 25 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
  69 -> Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 53 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
  42 -> Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 27 -> Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
  97 -> Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 91 -> Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996]),
Map(
  Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#-1086032504] -> Vector(),
  Actor[akka://ClusterSystem/user/sharding/ReferralView#-1934388853] -> Vector(),
  Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996] -> Vector(42, 97, 67, 27, 91),
  Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985] -> Vector(12, 23, 68, 69, 57),
  Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538] -> Vector(48, 53, 25, 40)),
Set())
at scala.Predef$.require(Predef.scala:219) ~[referrals:1.0]
at akka.contrib.pattern.ShardCoordinator$Internal$State.updated(ClusterSharding.scala:1119) ~[referrals:1.0]
at akka.contrib.pattern.ShardCoordinator$$anonfun$receiveRecover$1.applyOrElse(ClusterSharding.scala:1242) ~[referrals:1.0]
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) ~[referrals:1.0]
at akka.persistence.Eventsourced$$anonfun$akka$persistence$Eventsourced$$recoveryBehavior$1.applyOrElse(Eventsourced.scala:168) ~[referrals:1.0]
at akka.persistence.Recovery$class.runReceive(Recovery.scala:48) ~[referrals:1.0]
at akka.contrib.pattern.ShardCoordinator.runReceive(ClusterSharding.scala:1195) ~[referrals:1.0]
at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33) ~[referrals:1.0]
at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33) ~[referrals:1.0]
at akka.persistence.Recovery$class.withCurrentPersistent(Recovery.scala:185) ~[referrals:1.0]
at akka.contrib.pattern.ShardCoordinator.withCurrentPersistent(ClusterSharding.scala:1195) ~[referrals:1.0]
at akka.persistence.Recovery$State$class.processPersistent(Recovery.scala:33) ~[referrals:1.0]
at akka.persistence.Recovery$$anon$1.processPersistent(Recovery.scala:104) ~[referrals:1.0]
at akka.persistence.Recovery$$anon$1.aroundReceive(Recovery.scala:110) ~[referrals:1.0]
at akka.persistence.Recovery$class.aroundReceive(Recovery.scala:265) ~[referrals:1.0]
at akka.contrib.pattern.ShardCoordinator.akka$persistence$Eventsourced$$super$aroundReceive(ClusterSharding.scala:1195) ~[referrals:1.0]
at akka.persistence.Eventsourced$$anon$1.aroundReceive(Eventsourced.scala:35) ~[referrals:1.0]
at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:369) ~[referrals:1.0]
at akka.contrib.pattern.ShardCoordinator.aroundReceive(ClusterSharding.scala:1195) ~[referrals:1.0]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) [referrals:1.0]
at akka.actor.ActorCell.invoke(ActorCell.scala:487) [referrals:1.0]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) [referrals:1.0]
at akka.dispatch.Mailbox.run(Mailbox.scala:220) [referrals:1.0]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) [referrals:1.0]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [referrals:1.0]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [referrals:1.0]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [referrals:1.0]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [referrals:1.0]

Also similar to Richard's post, our cluster seems to get into an infinite loop, printing these errors along with some other failure messages indefinitely.


Patrik Nordwall

unread,
Jun 16, 2015, 4:08:22 AM6/16/15
to akka...@googlegroups.com
How did you use the LevelDB journal? It can't really be used in a clustered system. It is possible to use a shared journal for demos or testing, but that must not be used in production.

/Patrik
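For context, the test-only shared LevelDB setup discussed here looks roughly like the following in `application.conf` (a sketch based on the Akka 2.3 Persistence docs; the `dir` paths are illustrative):

```
# route all persistence through a single shared LevelDB store
# (single point of failure -- demo/testing only, as Patrik notes)
akka.persistence.journal.plugin = "akka.persistence.journal.leveldb-shared"
akka.persistence.journal.leveldb-shared.store {
  # native = off uses the pure-Java LevelDB port
  native = off
  dir = "target/shared-journal"
}
akka.persistence.snapshot-store.local.dir = "target/snapshots"
```

The store actor itself still has to be started on one node and its `ActorRef` distributed to the others, which is exactly what makes it unsuitable for production.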


GG

unread,
Jun 16, 2015, 1:44:06 PM6/16/15
to akka...@googlegroups.com
Patrik,

Thanks for your reply. We are using LevelDB in a cluster following the shared LevelDB store instructions in the Akka Persistence docs. My understanding is that it shouldn't be used in production, as it's a single point of failure. We eventually want to move to a more available storage system, but for developing and testing our application this was the fastest way to get started.
If the issue at hand can, with 100% certainty, be blamed on our usage of a shared LevelDB, then we can move forward and invest in another persistence implementation. If, on the other hand, it's the result of a bug in Akka remoting or clustering, then we'll need to dig in and resolve that issue before we can confidently use those technologies in production.

Do you have any insight on this, Patrik? If the solution is to move to another persistence layer, we're considering Cassandra, Dynamo and Kafka (in roughly that order) for production. Do you have any insight into the maturity of any of those implementations?

Thanks



On Tuesday, June 16, 2015 at 1:08:22 AM UTC-7, Patrik Nordwall wrote:
How did you use the LevelDB journal? It can't really be used in a clustered system. It is possible to use a shared journal for demos or testing, but that must not be used in production.

/Patrik


Patrik Nordwall

unread,
Jun 16, 2015, 4:05:39 PM6/16/15
to akka...@googlegroups.com
On Tue, Jun 16, 2015 at 7:44 PM, GG <gr...@makewonder.com> wrote:
Patrick,

Thanks for your reply. We are using leveldb in a cluster system following the SharedLevelDb store instructions in the akka persistence docs. My understanding is that it shouldn't be used in production as it's a single point of failure.  We eventually want to move to a more available storage system but for developing and testing our application, this was the fastest way to get started.
If the issue at hand can, with 100% certainty, be blamed on our usage of a shared leveldb then we can move forward and invest in another persistence implementation. If, on the other hand, it's the result of a bug in akka remoting or clustering then we'll need to dig into and resolve that issue before we can confidently use those technologies in production.

I can't be 100% sure, of course, and if you want me to investigate it we have to do that in the Typesafe support channel (contact in...@typesafe.com if you are not a subscriber).
 

 Do you have any insight on this Patrik? If the solution is to move to another persistence layer, we're considering Cassandra, Dynamo and Kafka (in roughly that order) as our production impls. Do you have any insight into the maturity of any of those impls?

Cassandra should be a good first choice.

/Patrik
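Switching to the Cassandra journal, as recommended here, is mostly a configuration change; a minimal sketch for the akka-persistence-cassandra plugin of that era (the key names and defaults vary between plugin versions, so treat these as assumptions and check the plugin's `reference.conf`):

```
akka.persistence.journal.plugin = "cassandra-journal"
akka.persistence.snapshot-store.plugin = "cassandra-snapshot-store"

# illustrative contact point; the plugin defaults to localhost
cassandra-journal.contact-points = ["127.0.0.1"]
```

Unlike the shared LevelDB store, every node talks to the Cassandra cluster directly, so there is no single store actor to distribute.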
 

Thanks




GG

unread,
Jun 17, 2015, 12:56:16 PM
to akka...@googlegroups.com
Alright. We'll give Cassandra a try. Thanks for the help Patrik.


On Tuesday, June 16, 2015 at 1:05:39 PM UTC-7, Patrik Nordwall wrote:
On Tue, Jun 16, 2015 at 7:44 PM, GG <gr...@makewonder.com> wrote:
Patrik,

Thanks for your reply. We are using leveldb in a cluster system following the SharedLevelDb store instructions in the akka persistence docs. My understanding is that it shouldn't be used in production as it's a single point of failure.  We eventually want to move to a more available storage system but for developing and testing our application, this was the fastest way to get started.
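For anyone reading along, the shared-journal setup described above is configured roughly like this in Akka 2.3 (a sketch based on the persistence docs; the store directory is an assumption):

```hocon
# Journal plugin that proxies all writes to a single shared LevelDB store
# actor. For demos/tests only -- the shared store is a single point of failure.
akka.persistence.journal.plugin = "akka.persistence.journal.leveldb-shared"

# Settings for the node that hosts the shared store actor.
akka.persistence.journal.leveldb-shared.store {
  native = on                     # use the native LevelDB library
  dir = "target/shared-journal"   # assumed path, adjust as needed
}
```

The other nodes then need a reference to the shared store actor (via `SharedLeveldbJournal.setStore`) before persistence can be used.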
If the issue at hand can, with 100% certainty, be blamed on our usage of a shared leveldb then we can move forward and invest in another persistence implementation. If, on the other hand, it's the result of a bug in akka remoting or clustering then we'll need to dig into and resolve that issue before we can confidently use those technologies in production.

I can't be 100% sure, of course, and if you want me to investigate it we would have to do that in the Typesafe support channel (contact in...@typesafe.com if you are not a subscriber).
 

 Do you have any insight on this Patrik? If the solution is to move to another persistence layer, we're considering Cassandra, Dynamo and Kafka (in roughly that order) as our production impls. Do you have any insight into the maturity of any of those impls?

Cassandra should be a good first choice.

/Patrik
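For reference, switching to the Cassandra journal plugin of that era (akka-persistence-cassandra) is mostly a configuration change; something like the following sketch (plugin id and keys may vary by plugin version, and the contact point is illustrative):

```hocon
# Route the persistence journal to the Cassandra plugin instead of LevelDB.
akka.persistence.journal.plugin = "cassandra-journal"

cassandra-journal {
  # Cassandra nodes to contact on startup (illustrative address)
  contact-points = ["127.0.0.1"]
  # Keyspace where journal tables are created
  keyspace = "akka"
}
```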
 

Thanks



On Tuesday, June 16, 2015 at 1:08:22 AM UTC-7, Patrik Nordwall wrote:
How did you use the LevelDB journal? It can't really be used in a clustered system. It is possible to use it for demos or testing with a shared journal, but that must not be used in production.

/Patrik

On Tue, Jun 16, 2015 at 3:07 AM, GG <gr...@makewonder.com> wrote:
A little more detail on my issue: we've found that if we simply move our leveldb out of the way, the issue goes away, which seems to align with Patrik's earlier post indicating a possible problem in the persistence impl. We are currently using the leveldb plugin in native mode. There seems to be some issue during replay where a Region Actor fails to register with a "requirement failed", similar to Richard's stack trace above:



2015/06/16 00:00:00.472 [DEBUG] [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)] resolve of path sequence [/user/sharding/ReferralView#-947611826] failed 
2015/06/16 00:00:00.472 [DEBUG] [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)] resolve of path sequence [/user/sharding/ReferralView#-947611826] failed 
2015/06/16 00:00:00.472 [ERROR] [ClusterSystem-akka.actor.default-dispatcher-22][OneForOneStrategy] requirement failed: Shard [57] already allocated: State(Map(67 -> Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 12 -> Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 23 -> Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 40 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 68 -> Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 48 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 57 -> Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 25 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 69 -> Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 53 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 42 -> Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 27 -> Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 97 -> Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 91 -> Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996]),Map(Actor[akka.tcp://Cluste...@172.31.15.250:9599/user/sharding/ReferralView#-1086032504] -> Vector(), Actor[akka://ClusterSystem/user/sharding/ReferralView#-1934388853] -> Vector(), Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996] -> Vector(42, 97, 67, 27, 91), Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985] -> Vector(12, 23, 68, 69, 57), Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538] -> Vector(48, 53, 25, 40)),Set()) java.lang.IllegalArgumentException: requirement failed: 
Shard [57] already allocated: State(Map(67 -> Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 12 -> Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 23 -> Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 40 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 68 -> Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 48 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 57 -> Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 25 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 69 -> Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 53 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 42 -> Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 27 -> Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 97 -> Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 91 -> Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996]),Map(Actor[akka.tcp://Cluste...@172.31.15.250:9599/user/sharding/ReferralView#-1086032504] -> Vector(), Actor[akka://ClusterSystem/user/sharding/ReferralView#-1934388853] -> Vector(), Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996] -> Vector(42, 97, 67, 27, 91), Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985] -> Vector(12, 23, 68, 69, 57), Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538] -> Vector(48, 53, 25, 40)),Set())
...

Patrik Nordwall

unread,
Jun 17, 2015, 3:29:00 PM
to akka...@googlegroups.com
You're welcome. Let me know if you see the same problem with Cassandra, then I will give it more attention.
/Patrik

On Wed, 17 Jun 2015 at 18:56, GG <gr...@makewonder.com> wrote:
Alright. We'll give Cassandra a try. Thanks for the help Patrik.


Richard Bowker

unread,
Jun 17, 2015, 3:33:07 PM
to akka...@googlegroups.com

It's been a while! But just for reference, Patrik investigated my issue at the time and we came to the conclusion I had accidentally created two clusters writing to the same database (I had not set up my seed nodes in a resilient way as I was only doing a prototype). Once I fixed this the issue was never seen again.
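The accidental-two-clusters scenario Richard describes typically happens when each node effectively seeds itself. A resilient setup lists the same fixed seed nodes in the configuration of every node, for example (Akka 2.3 syntax; the addresses are illustrative):

```hocon
akka.cluster {
  # Every node must use the identical list. Only the first entry is allowed
  # to join itself when bootstrapping a brand-new cluster, which prevents
  # two independent clusters from forming.
  seed-nodes = [
    "akka.tcp://ClusterSystem@10.0.0.1:2552",
    "akka.tcp://ClusterSystem@10.0.0.2:2552"
  ]
}
```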


Anders Båtstrand

unread,
Jun 23, 2015, 9:06:59 AM
to akka...@googlegroups.com
This just happened to me again, after a problem with the clustering (see separate thread). The nodes did not agree on who the leader was, or on the cluster size. It seems two clusters writing to the same database is the cause of this.
>>>> Actor[akka.tcp://Cluste...@172.31.18.169:2552/user/sharding/PollService#546005322] not registered: State(Map(test47 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test6 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test30 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test42 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test29 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test18 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test14 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test36 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test25 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test28 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test43 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test32 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test20 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test15 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test33 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test22 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test0 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test44 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test11 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test41 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test37 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test16 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test9 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test23 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test34 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test45 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test38 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test8 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test19 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test35 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test5 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test24 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test2 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828]),Map(Actor[akka://ClusterSystem/user/sharding/PollService#1625036981] -> Vector(test6, test18, test28, test20, test33, test44, test16, test34, test8, test24, test5, test11, test19, test25, test30, test38, test47), Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828] -> Vector(test42, test36, test32, test15, test0, test41, test23, test45, test35, test2, test9, test14, test22, test29, test37, test43)),Set(Actor[akka.tcp://Cluste...@172.31.24.129:49813/user/sharding/PollService#1785638107]))
>>>> java.lang.IllegalArgumentException: requirement failed: Region Actor[akka.tcp://Cluste...@172.31.18.169:2552/user/sharding/PollService#546005322] not registered: State(Map(test47 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test6 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test30 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test42 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test29 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test18 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test14 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test36 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test25 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test28 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test43 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test32 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test20 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test15 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test33 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test22 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test0 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test44 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test11 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test41 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test37 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test16 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test9 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test23 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test34 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test45 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test38 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test8 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test19 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test35 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828], test5 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test24 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test2 -> Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828]),Map(Actor[akka://ClusterSystem/user/sharding/PollService#1625036981] -> Vector(test6, test18, test28, test20, test33, test44, test16, test34, test8, test24, test5, test11, test19, test25, test30, test38, test47), Actor[akka.tcp://Cluste...@172.31.21.9:2552/user/sharding/PollService#1716980828]

Jim Hazen

unread,
Aug 4, 2015, 2:57:51 PM
to Akka User List
I see this issue happen whenever AWS has a network hiccup.  I have a multi-node cluster behind a LB and akka cluster sharding along with akka persistence writing to a Dynamo journal.  I'm currently on akka 2.3.11, which means the same shared Dynamo table used to store my persistent actors is also being used to store cluster information.  I know of no way to prevent this until akka 2.4.

I have my min-nr-of-members set to (nodes / 2) + 1.  Things seem to work fine during clean node restarts and code deploys.

However, I run into the same problem as the OP when AWS suffers an intermittent network partition. The nodes within the akka cluster can't fully communicate, yet the LB is able to reach all nodes. Cluster state is persisted into the same location, because that's unavoidable while using akka persistence for other things. Eventually cluster sharding gets upset and panics, causing the error below to be repeated constantly until the full cluster is shut down and started back up cleanly.

What should a developer do to protect against split-brain issues when using cluster sharding? min-nr-of-members appears to only be checked during cluster startup. Once the cluster is up and participating, what happens automatically when it detects that membership has dropped below min-nr-of-members? I can attempt to guard against possible issues in application land by subscribing to the cluster events and taking some action. I'm not sure there's anything I can do to prevent the cluster sharding internals from running into this state, however, since writing cluster state to a shared journal is unavoidable and network issues are unavoidable.
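The (nodes / 2) + 1 rule mentioned above can be expressed as a small quorum guard. This is only a sketch of the decision logic; in a real system the reachable-member count would come from subscribed cluster membership events, and none of these names are Akka APIs:

```java
// Sketch: does this side of a partition still hold a strict majority?
// All names are illustrative; the counts would come from cluster events.
public class QuorumCheck {
    // Smallest member count that constitutes a strict majority.
    static int minNrOfMembers(int totalNodes) {
        return totalNodes / 2 + 1;
    }

    // True iff the reachable members form a majority of the full cluster.
    static boolean hasQuorum(int reachableMembers, int totalNodes) {
        return reachableMembers >= minNrOfMembers(totalNodes);
    }

    public static void main(String[] args) {
        // A 5-node cluster split 3/2: only the 3-node side keeps quorum,
        // so only that side should keep the shard coordinator running.
        System.out.println(hasQuorum(3, 5)); // true
        System.out.println(hasQuorum(2, 5)); // false
    }
}
```

The point of the strict majority is that at most one side of any partition can satisfy it, so at most one side keeps writing coordinator state to the shared journal.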


2015-08-02 05:33:30.138 05:33:30.138UTC [Device] ERROR akka.actor.OneForOneStrategy DeviceSvc-akka.actor.default-dispatcher-3 akka://DeviceSvc/user/sharding/UserDeviceIndexCoordinator/singleton/coordinator - requirement failed: Shard [2] already allocated: State(Map(-2 -> Actor[akka.tcp://Devi...@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], 0 -> Actor[akka.tcp://Devi...@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], 2 -> Actor[akka.tcp://Devi...@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], -1 -> Actor[akka.tcp://Devi...@172.31.13.57:8108/user/sharding/UserDeviceIndex#855444773], 3 -> Actor[akka.tcp://Devi...@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203]),Map(Actor[akka.tcp://Devi...@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203] -> Vector(2, 3, -2, 0), Actor[akka.tcp://Devi...@172.31.13.57:8108/user/sharding/UserDeviceIndex#855444773] -> Vector(-1)),Set())

> java.lang.IllegalArgumentException: requirement failed: Shard [2] already allocated: State(Map(-2 -> Actor[akka.tcp://Devi...@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], 0 -> Actor[akka.tcp://Devi...@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], 2 -> Actor[akka.tcp://Devi...@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], -1 -> Actor[akka.tcp://Devi...@172.31.13.57:8108/user/sharding/UserDeviceIndex#855444773], 3 -> Actor[akka.tcp://Devi...@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203]),Map(Actor[akka.tcp://Devi...@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203] -> Vector(2, 3, -2, 0), Actor[akka.tcp://Devi...@172.31.13.57:8108/user/sharding/UserDeviceIndex#855444773] -> Vector(-1)),Set())
>        at scala.Predef$.require(Predef.scala:219) ~[org.scala-lang.scala-library-2.11.6.jar:na]
>        at akka.contrib.pattern.ShardCoordinator$Internal$State.updated(ClusterSharding.scala:1119) ~[com.typesafe.akka.akka-contrib_2.11-2.3.11.jar:2.3.11]
>        at akka.contrib.pattern.ShardCoordinator$$anonfun$receiveRecover$1.applyOrElse(ClusterSharding.scala:1242) ~[com.typesafe.akka.akka-contrib_2.11-2.3.11.jar:2.3.11]
>        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) ~[org.scala-lang.scala-library-2.11.6.jar:na]
>        at akka.persistence.Eventsourced$$anonfun$akka$persistence$Eventsourced$$recoveryBehavior$1.applyOrElse(Eventsourced.scala:168) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
>        at akka.persistence.Recovery$class.runReceive(Recovery.scala:48) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
>        at akka.contrib.pattern.ShardCoordinator.runReceive(ClusterSharding.scala:1195) ~[com.typesafe.akka.akka-contrib_2.11-2.3.11.jar:2.3.11]
>        at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
>        at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
>        at akka.persistence.Recovery$class.withCurrentPersistent(Recovery.scala:185) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
>        at akka.contrib.pattern.ShardCoordinator.withCurrentPersistent(ClusterSharding.scala:1195) ~[com.typesafe.akka.akka-contrib_2.11-2.3.11.jar:2.3.11]
>        at akka.persistence.Recovery$State$class.processPersistent(Recovery.scala:33) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]