Problems with Persistence when rolling nodes.


kraythe

Nov 18, 2016, 11:49:54 AM
to Akka User List
Greetings. I am currently using the community MySQL plugin ("com.github.mauricio" %% "mysql-async" % "0.2.16") for Akka Persistence. However, when I perform a rolling restart I get exceptions like the following:

Nov 18 10:11:20 vtest-app01 application-9001.log:  2016-11-18 16:11:19 +0000 - [ERROR] - [PersistentShardCoordinator] akka.tcp://a...@10.77.21.34:2551/system/sharding/UserActivityActorCoordinator/singleton/coordinator -  Failed to persist event type [akka.cluster.sharding.ShardCoordinator$Internal$ShardRegionRegistered] with sequence number [837] for persistenceId [/sharding/UserActivityActorCoordinator].
Nov 18 10:11:20 vtest-app01 application-9001.log:  com.github.mauricio.async.db.mysql.exceptions.MySQLException: Error 1062 - #23000 - Duplicate entry '3-837' for key 'PRIMARY'
Nov 18 10:11:20 vtest-app01 application-9001.log:   at com.github.mauricio.async.db.mysql.MySQLConnection.onError(MySQLConnection.scala:124)
Nov 18 10:11:20 vtest-app01 application-9001.log:   at com.github.mauricio.async.db.mysql.codec.MySQLConnectionHandler.channelRead0(MySQLConnectionHandler.scala:105)
Nov 18 10:11:20 vtest-app01 application-9001.log:   at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)

Now, I grant that this may be an issue with the persistence plugin itself, but I am wondering whether there is something else I am doing wrong in the shutdown. The steps I use to shut down the rolling node are:

1. Send each shard region a graceful shutdown instance and wait for them all to terminate.
2. Send all of the top-level actors a PoisonPill.
3. Issue a cluster leave and wait for the member to be removed.
4. After a 10-second delay, terminate the actor system.

Is there another persistence plugin that would potentially suit my needs better? I would prefer to store the journal and snapshots in our main SQL-based RDBMS if possible, but I am open to options.

Thanks

Roland Kuhn

Nov 18, 2016, 11:58:57 AM
to akka-user
Hi Robert,

I cannot comment on whether mysql-async has issues, but assuming that it does not, this points towards two actors with the quoted persistenceId being active at the same time.
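To see why that produces exactly the duplicate-key error above: the journal keys every event by (persistenceId, sequenceNr), so two live incarnations of the same persistent actor recover to the same sequence number and then race to insert the same key. A minimal sketch of that failure mode (assuming the Akka 2.4 classic Java persistence API; the class itself is illustrative, only the persistenceId comes from the log above):

import akka.persistence.UntypedPersistentActor;

public class CoordinatorLikeActor extends UntypedPersistentActor {
    private long state = 0;

    @Override
    public String persistenceId() {
        // A fixed ID: cluster-wide uniqueness of the running instance is the
        // deployment's job. The sharding coordinator singleton uses this one.
        return "/sharding/UserActivityActorCoordinator";
    }

    @Override
    public void onReceiveRecover(Object event) {
        if (event instanceof Long) state = (Long) event;
    }

    @Override
    public void onReceiveCommand(Object command) {
        // persist() assigns the next sequenceNr. If two incarnations are
        // alive, both assign the same number and the second journal insert
        // fails with a primary-key violation like "Duplicate entry '3-837'".
        persist(state + 1, evt -> state = evt);
    }
}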

Regards,

Roland


kraythe

Nov 18, 2016, 12:29:05 PM
to Akka User List
Roland,

I am wondering how that can happen. When the new node joins, I was under the impression that cluster sharding would rebalance, shutting the actors down on the old node and starting them on the new one. Do I have to do something specific other than cluster.join() to make this happen? Your advice is much appreciated.

-- Robert

Konrad Malawski

Nov 18, 2016, 12:31:45 PM
to akka...@googlegroups.com, kraythe
That depends on whether you down them or not.
Please read up on downing (the split brain resolver docs are also a great read):
don't use auto-downing, but do use cluster.leave() or .down() to remove nodes;
rebalancing then happens.
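
A minimal sketch of those two paths (assuming the classic Cluster Java extension; the method names here are illustrative):

import akka.actor.ActorSystem;
import akka.actor.Address;
import akka.cluster.Cluster;

public class NodeRemoval {
    // Graceful path: hand off shards/singletons, then remove self
    // from the cluster membership.
    static void leaveSelf(ActorSystem system) {
        Cluster cluster = Cluster.get(system);
        cluster.leave(cluster.selfAddress());
    }

    // Forceful path: mark a crashed/unreachable member as down so that
    // sharding can rebalance its shards elsewhere. Deciding *when* this is
    // safe is exactly what a split brain resolver does; naive auto-downing
    // turns a network partition into a split brain.
    static void downCrashed(ActorSystem system, Address crashedNode) {
        Cluster.get(system).down(crashedNode);
    }
}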

-- 
Konrad `ktoso` Malawski
Akka @ Lightbend

kraythe

Nov 18, 2016, 12:35:03 PM
to Akka User List, kra...@gmail.com
Indeed, I issue a cluster.leave() when the node leaves and wait for MemberRemoved before terminating the actor system. Alas, I don't have access to the SBR because we are not a commercial customer and can't be until our product starts generating revenue. A snippet of the receive loop is below:

@Override
public void onReceive(final Object message) throws Throwable {
    if (message instanceof SignalMessages && SignalMessages.BEGIN_SHUTDOWN.equals(message)) {
        shardRegions.forEach(ref -> {
            context().watch(ref);
            ref.tell(ShardRegion.gracefulShutdownInstance(), getSelf());
        });
    } else if (message instanceof Terminated && !shardRegions.isEmpty()) {
        shardRegions.remove(((Terminated) message).getActor());
        if (!shardRegions.isEmpty()) return;
        actors.forEach(actorSelection -> actorSelection.tell(PoisonPill.getInstance(), getSelf()));
        log.info("Actors Shut down.");
        cluster.leave(cluster.selfAddress());
    } else if (message instanceof MemberRemoved) {
        // wait 10 seconds for graceful stop.
        context().system().scheduler().scheduleOnce(Duration.create(10, SECONDS), self(),
                SignalMessages.STOP_ACTOR_SYSTEM, context().dispatcher(), self());
    } else if (message instanceof SignalMessages && SignalMessages.STOP_ACTOR_SYSTEM.equals(message)) {
        system.terminate();
        future.complete(null);
    } else {
        unhandled(message);
    }
}
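
One thing the snippet does not show is how MemberRemoved reaches this actor: it only arrives if the actor subscribed to cluster domain events. A sketch of that missing piece (the class name is illustrative, assuming a classic UntypedActor):

import akka.actor.UntypedActor;
import akka.cluster.Cluster;
import akka.cluster.ClusterEvent;
import akka.cluster.ClusterEvent.MemberRemoved;

public class ShutdownCoordinator extends UntypedActor {
    @Override
    public void preStart() {
        // Without this subscription the MemberRemoved branch never fires.
        // initialStateAsEvents() delivers current membership as individual
        // events rather than one CurrentClusterState snapshot.
        Cluster.get(context().system()).subscribe(
                self(), ClusterEvent.initialStateAsEvents(), MemberRemoved.class);
    }

    @Override
    public void postStop() {
        Cluster.get(context().system()).unsubscribe(self());
    }

    @Override
    public void onReceive(Object message) throws Throwable {
        // ... the receive loop shown above ...
        unhandled(message);
    }
}

Note also that the MemberRemoved branch above matches any member's removal; comparing ((MemberRemoved) message).member().address() with cluster.selfAddress() would keep another node's departure from terminating this system prematurely.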



-- Robert