Should I use the saga pattern to deal with replication lag across multiple services in CQRS?

Jocki Hireplace
Jun 13, 2020, 3:36:32 PM
to microservices

I tried to apply microservices and CQRS to a system that contains services such as a candidate service, an interviewer service, a scheduler service (to manage appointments), and others. When a new candidate joins a room, the command handler raises CandidateAssignedToRoomEvent. Upon receiving this event, the candidate service, interviewer service, scheduler service, and other services each create a new replica in their own database. This is usually followed by a CandidateCalledEvent, raised by the scheduler to start the interview.


While this has worked fine so far in development, I found that in production CandidateCalledEvent is sometimes raised almost immediately after CandidateAssignedToRoomEvent. The services that process CandidateCalledEvent require the replica (created by the CandidateAssignedToRoomEvent handler) to already exist in their own database, so in some cases (around 10% of total requests) the handler for CandidateCalledEvent fails because the CandidateAssignedToRoomEvent handler hasn't written the replica to the service's private database yet. The failed event handlers are then retried, and they usually succeed after one or two retries.
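
To make the race concrete, here is a minimal Python sketch of a consumer that treats "replica not written yet" as an expected, transient condition and retries it. It is an illustration under assumptions, not the actual system: `ReplicaNotReadyError`, `CandidateService`, `handle_with_retry`, and the `before_retry` hook (used only to simulate the late event arriving) are hypothetical names, and the in-memory dict stands in for the service's private database.

```python
import time


class ReplicaNotReadyError(Exception):
    """Hypothetical error: the local replica has not been written yet (transient)."""


class CandidateService:
    """Hypothetical consumer; the dict stands in for the service's private database."""

    def __init__(self):
        self.replicas = {}  # candidate_id -> local copy of the candidate

    def on_candidate_assigned_to_room(self, candidate_id, room_id):
        # Handler for CandidateAssignedToRoomEvent: create the local replica.
        self.replicas[candidate_id] = {"room_id": room_id, "called": False}

    def on_candidate_called(self, candidate_id):
        # Handler for CandidateCalledEvent: needs the replica to exist already.
        replica = self.replicas.get(candidate_id)
        if replica is None:
            raise ReplicaNotReadyError(candidate_id)
        replica["called"] = True


def handle_with_retry(handler, *args, max_attempts=3, delay_seconds=0.0,
                      before_retry=None):
    """Run handler, retrying only on ReplicaNotReadyError.

    Returns the number of attempts used. The error is re-raised (and would
    then be worth alerting on) only once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            handler(*args)
            return attempt
        except ReplicaNotReadyError:
            if attempt == max_attempts:
                raise
            if before_retry is not None:
                before_retry()  # simulation hook, not part of a real consumer
            time.sleep(delay_seconds)


service = CandidateService()
# CandidateCalledEvent arrives first; the assignment lands before the retry.
attempts = handle_with_retry(
    service.on_candidate_called, "c-1",
    before_retry=lambda: service.on_candidate_assigned_to_room("c-1", "room-7"),
)
print(attempts)
```

Separating the expected "not ready yet" case from other exceptions is one way the monitoring alert could be limited to retries that are actually exhausted; whether that is acceptable depends on how the message broker and alerting are configured.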


Should I implement the saga pattern for this kind of case? I feel that an event like InitialCandidateHasBeenReplicatedEvent doesn't represent the business domain and makes the communication more complex, because I have to wait for all services to raise this event before CandidateCalledEvent can be raised. But I also don't like the error alerts from the monitoring tool (because this is not an "actual" error anyway). Should I just leave it as is and rely on the retries?
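
For comparison, the coordination alternative mentioned above could look roughly like the following Python sketch. It is closer to a small coordinator than a full saga (there are no compensating actions), and everything except the event names from this post is an assumption: `ReplicationGate`, its method names, and the `raise_candidate_called` callback are hypothetical.

```python
class ReplicationGate:
    """Hypothetical coordinator: releases CandidateCalledEvent only after every
    required service has reported InitialCandidateHasBeenReplicatedEvent."""

    def __init__(self, required_services, raise_candidate_called):
        self.required = frozenset(required_services)
        self.raise_candidate_called = raise_candidate_called  # callback(candidate_id)
        self.acks = {}         # candidate_id -> set of services that have acked
        self.released = set()  # candidates whose CandidateCalledEvent was raised

    def on_initial_candidate_has_been_replicated(self, candidate_id, service):
        # Ignore duplicates after release and acks from unknown services.
        if candidate_id in self.released or service not in self.required:
            return
        acked = self.acks.setdefault(candidate_id, set())
        acked.add(service)
        if acked == self.required:
            self.released.add(candidate_id)
            del self.acks[candidate_id]
            self.raise_candidate_called(candidate_id)


raised = []
gate = ReplicationGate({"candidate", "interviewer", "scheduler"}, raised.append)
gate.on_initial_candidate_has_been_replicated("c-1", "candidate")
gate.on_initial_candidate_has_been_replicated("c-1", "interviewer")
print(raised)  # still empty: the scheduler has not acknowledged yet
gate.on_initial_candidate_has_been_replicated("c-1", "scheduler")
print(raised)  # now contains "c-1"
```

The sketch shows the extra cost being weighed here: every consumer has to publish an acknowledgement, and the coordinator has to know the full list of consumers up front.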
