Role of Coordinators in Lagom


Joo Lee

Oct 11, 2017, 7:51:30 AM
to Lagom Framework Users
Hi guys,

I am having an issue with the coordinators in one of my Lagom services. Believe it or not, I have lost some of the messages that belong to a certain coordinator (/sharding/kafkaProducer-someService-xyzEventCoordinator). Not only that, incorrect messages have been written to the messages table for this same persistence_id with duplicated sequence numbers, which gave us error messages like the following:


Invalid replayed event [sequenceNr=9, writerUUID=b363c43e-8988-4b11-a141-9b2e219a7956]. There was already a newer writer whose last replayed event was [sequenceNr=8, writerUUID=xxxxxx5472-fe40-4c14-9f62-ee6796c45a20] for the same persistenceId [/sharding/xyzProcessorCoordinator].Perhaps, the old writer kept journaling messages after the new writer created, or duplicate persistentId for different entities?
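To make the error above more concrete, here is a minimal sketch of the kind of check that could produce it during replay. It is a simplified toy model, not Akka's actual replay logic: the names `Event` and `find_replay_conflicts` are hypothetical, and the only assumption is that each journaled event carries a sequence number and the UUID of the writer that persisted it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Event:
    """One journaled event: its sequence number and the writer that stored it."""
    sequence_nr: int
    writer_uuid: str


def find_replay_conflicts(events: Iterable[Event]) -> List[str]:
    """Return a description for every event that comes from a superseded writer.

    Events are examined in replay order. Once a second writer shows up, the
    first one counts as "old"; any later event from an old writer is a conflict.
    """
    conflicts: List[str] = []
    old_writers: Set[str] = set()
    current: Optional[Event] = None  # last event accepted from the current writer

    for event in events:
        if current is None or event.writer_uuid == current.writer_uuid:
            # Same writer as before (or the very first event): accept it.
            current = event
        elif event.writer_uuid in old_writers:
            # A writer that was already superseded is still producing events.
            conflicts.append(
                f"sequenceNr={event.sequence_nr} from old writer "
                f"{event.writer_uuid}; newer writer {current.writer_uuid} "
                f"already replayed sequenceNr={current.sequence_nr}"
            )
        else:
            # A new writer takes over; remember the previous one as old.
            old_writers.add(current.writer_uuid)
            current = event
    return conflicts


# Two writers interleaving on one persistenceId, as in the error message.
journal = [Event(7, "writer-a"), Event(8, "writer-b"), Event(9, "writer-a")]
for problem in find_replay_conflicts(journal):
    print(problem)
```

In this model the third event is flagged because writer-a kept journaling after writer-b had taken over, which is the situation the error message describes.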


My understanding of a coordinator is that it is responsible for processing event streams of a certain type for some kind of writer / producer.

I have three questions:

1. What are the xyzCoordinators really doing in Lagom?
2. Are they important enough to be using PersistentEntity? Why does Lagom persist their events in the journal?
3. (Most important) What would be the implications / side effects if we just deleted all the messages that belong to the particular coordinator we are having this problem with from the messages table and restarted the service?

I am using Cassandra for the persistence.


Many Thanks,

oo

Joo Lee

Oct 11, 2017, 7:57:54 AM
to Lagom Framework Users
In fact, I just realized that I have NOT lost any messages. I only thought so because I saw a very small number of messages for that particular coordinator persistence_id; in fact, the messages simply stop on the last day I deployed that service.

Still, I am wondering what the role of the coordinator is and whether it is safe to delete the messages.

Thanks,

Daniel Stoner

Oct 12, 2017, 3:41:42 AM
to Lagom Framework Users
This could be leading you down the wrong path, but my understanding of Akka persistent sharding is that the co-ordinator messages record the state of the cluster over time. For instance, whenever a node goes up or down and that affects a persistenceId (e.g. forces it to move to another node), this is stored. It is used on startup so that the co-ordinator can work out: 'Oh wait a minute, node 6 was handling this persistenceId and it never said it had stopped doing so - I had better check with node 6 that it isn't still handling it' (or, in reality, keel over complaining about the situation node 6 left for it).

We only cleared out the co-ordinator messages when we completely stopped the service for downtime and started a fresh cluster. This was not uncommon, as the version of Akka we used corrupted its co-ordinator messages fairly often.
If you always keep your service 'up' on at least 1 node, then you shouldn't delete this data, or you can definitely end up with the same persistenceId on many nodes, as happened to us many times by accident when we went about clearing things.
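As a rough illustration of why deleting this data on a live cluster is risky, here is a toy model of the behaviour described above. It is only a sketch under stated assumptions: `ToyCoordinator` and its `allocate` method are hypothetical names, the journal is a plain list, and none of this mirrors Akka's real coordinator implementation.

```python
from typing import Dict, List, Tuple

# One journal entry: (persistence id, node that was told to handle it).
AllocationEvent = Tuple[str, str]


class ToyCoordinator:
    """Remembers which node handles which id by replaying a journal on start."""

    def __init__(self, journal: List[AllocationEvent]) -> None:
        self.journal = journal
        self.allocations: Dict[str, str] = {}
        # Recovery: rebuild the allocation table from the stored events.
        for persistence_id, node in journal:
            self.allocations[persistence_id] = node

    def allocate(self, persistence_id: str, candidate_node: str) -> str:
        """Return the node for an id, allocating (and journaling) if unknown."""
        if persistence_id not in self.allocations:
            self.journal.append((persistence_id, candidate_node))
            self.allocations[persistence_id] = candidate_node
        return self.allocations[persistence_id]


# Node 6 is given the entity, and the decision is journaled.
journal: List[AllocationEvent] = []
first = ToyCoordinator(journal)
first.allocate("entity-42", "node-6")

# A coordinator restarted with the journal intact still points at node 6.
with_history = ToyCoordinator(journal)
print(with_history.allocate("entity-42", "node-2"))

# A coordinator restarted after the journal was wiped knows nothing, so it
# hands the same id to node 2 while node 6 may still be running it.
without_history = ToyCoordinator([])
print(without_history.allocate("entity-42", "node-2"))
```

If the whole cluster is stopped first, no node is still running the entity, so in this model starting from an empty journal is harmless, which matches the advice to clear the data only during full downtime.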

Thanks,
Dan