Using:
* Akka 2.4.6 project
* with cluster sharding
* Cassandra persistence
* Reactive Kafka consumer

Setup:
* 5 nodes
* 4 nodes consume from Kafka
* 1 node on standby
* each node has cluster persistence plus persistence for two other kinds of entities
* one entity type is set up with 256 shards per node
* remember-entities is on
* subscribing to and logging LeaderChanged cluster messages

When performing a rolling release, each node:
* is brought down,
* has the project upgraded,
* is started back up,
* waits 5 min before moving on to the next node.

The situation is that when the leader node is restarted, leadership goes from Node 1 to Node 2, and around 10 s later leadership goes back to Node 1 again.
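For context, a setup like the one above (256 shards, remember-entities on) roughly corresponds to a region start along these lines. This is a minimal sketch only; `MyEntity`, `EntityEnvelope`, and the extractor logic are hypothetical placeholders, not the actual project code:

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}

// Hypothetical entity and envelope types; the real project's will differ.
class MyEntity extends Actor {
  def receive = { case msg => /* handle and persist */ () }
}
final case class EntityEnvelope(entityId: String, payload: Any)

object ShardingSetup {
  val numberOfShards = 256 // matches the shard count described above

  val extractEntityId: ShardRegion.ExtractEntityId = {
    case EntityEnvelope(id, payload) => (id, payload)
  }
  // Hash-based shard id, the common approach; the project may do this differently.
  val extractShardId: ShardRegion.ExtractShardId = {
    case EntityEnvelope(id, _) => (math.abs(id.hashCode) % numberOfShards).toString
  }

  def start(system: ActorSystem) =
    ClusterSharding(system).start(
      typeName        = "MyEntity",
      entityProps     = Props[MyEntity],
      settings        = ClusterShardingSettings(system).withRememberEntities(true),
      extractEntityId = extractEntityId,
      extractShardId  = extractShardId)
}
```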
Can anybody tell me whether this behaviour affects sharding hand-off/rebalancing?
Is there a way to keep leadership from flip-flopping back and forth? The whole cluster slows down during this situation, and I'm concerned it might get worse as the load grows.
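The LeaderChanged logging mentioned in the setup can be done with a plain cluster-event subscriber. A minimal sketch, where the actor name and log messages are illustrative rather than the project's actual code:

```scala
import akka.actor.{Actor, ActorLogging, Props}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.{InitialStateAsEvents, LeaderChanged, MemberEvent}

class LeaderWatcher extends Actor with ActorLogging {
  private val cluster = Cluster(context.system)

  override def preStart(): Unit =
    // Receive current state as events, then live LeaderChanged/member events.
    cluster.subscribe(self, initialStateMode = InitialStateAsEvents,
      classOf[LeaderChanged], classOf[MemberEvent])

  override def postStop(): Unit = cluster.unsubscribe(self)

  def receive = {
    case LeaderChanged(leader) => log.info("Leader changed to {}", leader)
    case evt: MemberEvent      => log.info("Member event: {}", evt)
  }
}

// system.actorOf(Props[LeaderWatcher], "leader-watcher")
```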
Patrik Nordwall
Akka Tech Lead
Lightbend - Reactive apps on the JVM
Twitter: @patriknw
Apologies, that last comment was a bit ambiguous.

The slowdown might be the result of quite a few moving pieces, not necessarily cluster-sharding related, but that's what I'm trying to find out :)

Each node reads from Kafka (a dedicated partition), fetches some intermediary data from Cassandra, and passes the data to its pertaining shard, where it gets persisted to Cassandra. Each node keeps metrics on the current committed offset that was consumed, and when the leader node is restarted, the offset metrics on the remaining nodes don't move forward ("Kafka lag") until the leader node has resumed its activity. This could also be caused by a Kafka consumer rebalance, although I did not observe the same amount of Kafka lag when another node was restarted.

I suspect that the leader node is also (by coincidence) the coordinator node, hence my previous assumption that there was a relation between the two, leading me to believe that the shard hand-off is in limbo.
Is it possible to subscribe to sharding-related messages, just like cluster messages? That might help me figure out what's happening.
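For what it's worth, I'm not aware of a publish/subscribe stream for sharding events comparable to cluster events, but in the 2.4 line a shard region can be queried directly for its current state (worth verifying these query messages exist in exactly 2.4.6). A hedged sketch, assuming `region` is the ActorRef returned by `ClusterSharding(system).start(...)`:

```scala
import akka.actor.ActorRef
import akka.cluster.sharding.ShardRegion
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Ask the local shard region which shards/entities it currently hosts.
def logShardState(region: ActorRef): Unit = {
  implicit val timeout: Timeout = Timeout(5.seconds)
  (region ? ShardRegion.GetShardRegionState)
    .mapTo[ShardRegion.CurrentShardRegionState]
    .foreach { state =>
      state.shards.foreach { s =>
        println(s"shard ${s.shardId}: ${s.entityIds.size} entities")
      }
    }
}
```

Polling this around a rolling restart might show whether shards are stuck mid-hand-off on the node that hosts the coordinator.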