Merging Events from Occasionally-connected Clients


@nbarnwell

Apr 24, 2012, 12:32:17 PM
to ddd...@googlegroups.com
I have a requirement for a "smart client" app using web services to talk to servers. It's a fairly complex collaborative domain involving many other systems so having thought it through we believe a DDD/CQRS/ES style approach is suitable. I've attended Greg's excellent 3-day course but am now a little fuzzy on the details of synchronising occasionally-connected clients. I understand the fundamental concept of "merging" event streams, but have questions on where and how that happens.

Currently my plan is to keep the Domain logic on the server and send commands synchronously from the client via WCF to have them handled, whereupon the events would come back one way or another. However, this means the application would have to be "read only" when offline for some reason (network connection, server issues etc.), and of course there is a performance hit on the network round-trip. As such I'd like to overcome my fears over implementing a full disconnectable system with event merging etc.

I've today watched (twice) Greg's video "In The Brain of Greg Young: CQRS, not just for server systems" and so now I think I can formulate a couple of specific questions:

Resolving conflicting events
Since the essence of events is that they happened and can't be denied, if a client receives events from other clients (i.e. via the "store and forward" subscription-based approach Greg mentions in his video) that conflict with events the client has already stored locally, it's not possible for the client to resolve the conflict by deleting "losing" events. So what can we do? Users of both clients will have gone about their day assuming their commands were processed successfully (they were, after all). From the video it looks like we build some way of discovering and displaying a UI for the commands that caused the conflicts with incoming events, but we can't delete the events and try again with a different command, we surely have to expect the user to issue some sort of compensating command? Meanwhile, now we're reconnected our events are making their way out to other clients, which leads me to...

Two clients receive each other's events and resolve at the same time, causing more conflicts
How might we handle/prevent the situation where a client comes online, starts seeing conflict notifications in their UI and starts resolving them, meanwhile commands they have already stored locally are being sent up to the server and propagated to those other clients, who will themselves see conflicts? I'm thinking that upon connectivity being restored we either:
  1. Find commands that caused conflicts (and any subsequent commands for the same aggregate roots) and hold them back from the server until their conflicts are resolved.
  2. Have some sort of algorithm that means the "other" clients know not to display conflicts based on timestamps or event IDs or somesuch, resulting in only the client causing the conflicts having to deal with them, though their commands still get sent up.

Resolving conflicts on the server
My last question (for now) is on how to resolve conflicts that happen on the server. Depending on the answers to question 2 it's possible this never happens because conflicts are resolved before those commands are sent to the server or the server simply ignores them, but if that isn't the case, how do users resolve these conflicts?

Many thanks in advance.

Neil.

Henrik Feldt

Apr 24, 2012, 2:34:40 PM
to ddd...@googlegroups.com
[1] The client won't be able to delete events that it received from the message broker, but it can perhaps merge them by altering what commands it wants applied. You'll have two cases. In the first, the client's commands were disjoint and their resulting runs/uncommitted events are non-conflicting. So, [2] you can now send the commands to the server for processing (which might fail as well, but that's another issue). But you're only sending commands where you know, from a domain standpoint, that their runs/resulting events won't conflict.

Formulated differently, you wouldn't send the chunk of the event stream that was created offline inside the client as authoritative events to other subscribers, but instead send their respective causing commands to the server, which would in turn publish the same chunk of events that you have computed offline (because your business logic is deterministic).

The other case is when you receive a batch of new events upon restoring connectivity. You consider the above, and then you send the commands you are interested in executing to the server (their runs are meant to create a resulting batch of events A), but concurrently, conflicting events B happen. In this case the server would perform the same check for disjointedness for all events in A and B; those events that conflict would cause the same feedback to the user that you'd have locally upon connectivity restoration.

Your job is to excavate enough of your domain to know what events are conflicting, and to what extent commands sourced from later or earlier histories are allowed to be applied to the domain.

Example;

You are a GP; as you exit the building on your way to a patient you download their journal (i.e. all of their medical history), but before you get to see the patient, another doctor prescribes new medicine B for illness B_1, so now you are acting on stale information.

You meet with the patient and decide on medicine C for illness C_1; a recommendation that makes sense given that you found no medication interactions in this patient's journal. However, because chemistry moves in mysterious pathways, B and C would interact in a bad way.

You get back to your office and your client starts synchronizing its state with the server -- running the comparison check for each of the MedicationSubscribed events -- and because it finds *any* other medication prescribed for your patient, it knows that the medication you prescribed needs to be manually checked for possible negative interactions. Your client has not sent its events (that it used to update its UI and app state) to the server, so for each of the conflicting events, it queries you by showing which command (a prescription for the patient) results in a conflict with the server's MedicationSubscribed event. You notice the interaction and prescribe a different compound with another active substance as a result, and your SubscribeMedicine command is successfully applied to your domain.
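A minimal sketch of the client-side check in the example above, under stated assumptions: MedicationSubscribed and SubscribeMedicine are the names used in this message, but their fields and the helper commands_needing_review are hypothetical and only illustrate the "any other medication for the same patient" rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MedicationSubscribed:
    """Event already accepted by the server (fields are assumed)."""
    patient_id: str
    medicine: str


@dataclass(frozen=True)
class SubscribeMedicine:
    """Command recorded offline on the client, not yet sent (fields are assumed)."""
    patient_id: str
    medicine: str


def commands_needing_review(pending, server_events):
    """Pair each pending command with the server events it may conflict with.

    The rule is deliberately coarse: any other medication for the same
    patient forces a manual check.
    """
    review = []
    for command in pending:
        others = [
            event for event in server_events
            if event.patient_id == command.patient_id
            and event.medicine != command.medicine
        ]
        if others:
            review.append((command, others))
    return review


pending = [SubscribeMedicine("patient-1", "C"), SubscribeMedicine("patient-2", "D")]
server_events = [MedicationSubscribed("patient-1", "B")]

for command, others in commands_needing_review(pending, server_events):
    print(command.medicine, "needs a manual check against",
          [event.medicine for event in others])
```

Commands that come back from this check would be shown to the user; the rest could be sent to the server as they are.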

In the future, as weak AI using credulous reasoning becomes a part of your domain model your system becomes more intelligent about querying you and doesn't query when it knows your recommendations wouldn't interact with what happened when you were offline.

On Tue, Apr 24, 2012 at 6:32 PM, @nbarnwell <mai...@neilbarnwell.co.uk> wrote:
Resolving conflicting events
Since the essence of events is that they happened and can't be denied, if a client receives events from other clients (i.e. via the "store and forward" subscription-based approach Greg mentions in his video) that conflict with events the client has already stored locally, it's not possible for the client to resolve the conflict by deleting "losing" events.

[1]
 
So what can we do? Users of both clients will have gone about their day assuming their commands were processed successfully (they were, after all). From the video it looks like we build some way of discovering and displaying a UI for the commands that caused the conflicts with incoming events, but we can't delete the events and try again with a different command, we surely have to expect the user to issue some sort of compensating command? Meanwhile, now we're reconnected our events are making their way out to other clients, which leads me to...

[2]
 

gregor...@gmail.com

Apr 25, 2012, 1:48:49 AM
to ddd...@googlegroups.com, ddd...@googlegroups.com
This example is a bit off.

When the GP is disconnected he will not be batching commands to send but events to send to the server. This is a system that is interacting with the real world. The backend must accept that this has been done (it may cause another authoritative business event like "possibility of adverse reaction found", but it must accept it). This is a small but important distinction. Most systems involving the real world work this way.


Henrik Feldt

Apr 25, 2012, 2:12:44 AM
to ddd...@googlegroups.com

So what would the event versions look like, and how would the server handle the merge in the example above?

Greg Young

Apr 25, 2012, 2:19:39 AM
to ddd...@googlegroups.com
The server must accept that the thing happened in the real world. It would be a downstream event processor (think SEP/CEP). It would raise another event saying that there is a patient with a possible adverse reaction happening. It's not a command any more because the server *must* accept it.
--
Le doute n'est pas une condition agréable, mais la certitude est absurde.

Henrik Feldt

Apr 25, 2012, 4:38:40 AM
to ddd...@googlegroups.com
Are you saying the server would be a downstream to the client-authority in this case?

How would you handle the versioning of the events? If all of your events are state deltas, and the AR has already progressed beyond what the events from the clients are aimed at - how would you renumber the events and what numbers would you send the events to the server with?

@nbarnwell

Apr 25, 2012, 5:43:57 AM
to ddd...@googlegroups.com
Thanks guys, I think these answers combined answer the questions I had very well.

In my design I had only planned for concurrency checks at the point the events generated by a domain object are saved (i.e. after generating but before saving them, check if there are any newer events since the "expected version" and do conflict checking against them, throwing the RealConcurrencyException if there are). It sounds though like I need another place to call into concurrency checking in my system - i.e. when events turn up from a remote source (either as they are received by the server or the client).

This would mean that as each event turns up at the client, we check whether it conflicts with any local events and display a UI of the commands that raised the conflicting local events. We do a similar thing at the server: as events arrive from clients, check whether they conflict with events the server already has.
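A rough sketch of the save-time check described above, assuming an in-memory store where a stream's version is simply its event count. RealConcurrencyException is the name from this message; InMemoryEventStore, the tuple-shaped events and the same_type rule are hypothetical stand-ins for real domain rules.

```python
class RealConcurrencyException(Exception):
    """Raised when new events genuinely conflict with events saved since the expected version."""


class InMemoryEventStore:
    def __init__(self):
        self._streams = {}

    def append(self, stream_id, expected_version, new_events, conflicts_with):
        """Append new_events, checking them against anything saved after expected_version."""
        stream = self._streams.setdefault(stream_id, [])
        newer = stream[expected_version:]  # events the writer has not seen yet
        for theirs in newer:
            for mine in new_events:
                if conflicts_with(mine, theirs):
                    raise RealConcurrencyException(f"{mine!r} conflicts with {theirs!r}")
        stream.extend(new_events)
        return len(stream)  # the new version


def same_type(mine, theirs):
    # Assumed rule for the example: two events of the same type conflict.
    return mine[0] == theirs[0]


store = InMemoryEventStore()
store.append("customer-1", 0, [("NameChanged", "Ann")], same_type)
# Stale expected version, but the events are disjoint, so the write merges.
store.append("customer-1", 0, [("PhoneChanged", "555")], same_type)
try:
    store.append("customer-1", 1, [("PhoneChanged", "777")], same_type)
except RealConcurrencyException as error:
    print("conflict:", error)
```

The same conflicts_with rule could be called from the second place mentioned above, when events arrive from a remote source, so that both checks share one definition of "conflict".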

Nils Kilden-Pedersen

Apr 25, 2012, 8:52:54 AM
to ddd...@googlegroups.com
On Wed, Apr 25, 2012 at 1:19 AM, Greg Young <gregor...@gmail.com> wrote:
The server must accept that the thing happened in the real world. It would be a downstream event processor (think SEP/CEP). It would raise another event saying that there is a patient with a possible adverse reaction happening. It's not a command any more because the server *must* accept it.

If the disconnected client doesn't have access to the central repository, how can this be done? Isn't this essentially saying that there can be no conflict resolution, instead last man (or rather disconnected man) wins?

Greg Young

Apr 25, 2012, 10:44:01 AM
to ddd...@googlegroups.com
People seem to be getting very confused in many of these scenarios.

Sometimes there is a command.
Sometimes there is not (client raises events).

Using the Dr example from earlier, would your preference be to tell the Dr to jump into his time machine and unprescribe the medication he prescribed when he was not connected?! No, it must be that this really happened, and the system may not like it (it may raise an event saying it was bad) but must accept that it occurred.

I like to use the example of a warehouse. I checked out some stuff you didn't know was checked in yet... Does that mean the boxes didn't leave?

Greg

@nbarnwell

Apr 25, 2012, 11:29:16 AM
to ddd...@googlegroups.com
Nope, no confusion - I understand and agree 100%. :)

I'm personally just a bit concerned by the elephant in the room - the actual conflict detection process. For example:

1) Assuming no conflicts are found, where in the "final" event stream do "my" events go, relative to "theirs"? Do I append, try and use timestamps (eek!) to sequence them correctly? I presume I can't really use version numbers because "my" events and "their" events will have the same version numbers.
2) It is entirely possible that while "my" event conflicts with the first of "their" events, there may be another event coming that inherently resolves the conflict. In that scenario I have to check each of "my" events with *all* of theirs and keep an eye on whether any potential conflicts are resolved by subsequent events of "theirs", rather than throwing a concurrency exception immediately upon detection.

Are there any popular solutions to this problem? I've tried TDD'ing (I'm not great at TDD yet but I'm trying) the interface for an IConflictResolver which currently stands at "FindConflicts(IEnumerable<Event> common, Event mine, IEnumerable<Event> theirs)", but I'm still unsure what the implementation should look like. I guess I could just press on with TDDing something that works for me (i.e. my tests pass) but it does feel like some clever bloke somewhere might have solved this problem already.
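For discussion, here is one possible shape for that resolver, sketched in Python rather than C#. It is not a known standard solution. The conflicts and resolves rules are assumed to come from the domain, and the MedicationWithdrawn event is made up for the example. It covers point 2 above: a conflict with one of "their" events is dropped if a later one of "their" events resolves it.

```python
def find_conflicts(common, mine, theirs, conflicts, resolves):
    """Return the events in `theirs` that conflict with `mine` and stay unresolved.

    `conflicts(common, mine, their)` and `resolves(their, later)` are
    domain-supplied predicates; `common` is the shared history both sides saw.
    """
    theirs = list(theirs)
    unresolved = []
    for index, their in enumerate(theirs):
        if not conflicts(common, mine, their):
            continue
        # A later event of theirs may inherently resolve this conflict.
        if any(resolves(their, later) for later in theirs[index + 1:]):
            continue
        unresolved.append(their)
    return unresolved


def medication_conflicts(common, mine, their):
    # Assumed rule: another medicine subscribed for the same patient conflicts.
    return (their["type"] == "MedicationSubscribed"
            and their["patient"] == mine["patient"]
            and their["medicine"] != mine["medicine"])


def withdrawal_resolves(their, later):
    # Assumed rule: withdrawing that same medicine resolves the conflict.
    return (later["type"] == "MedicationWithdrawn"
            and later["patient"] == their["patient"]
            and later["medicine"] == their["medicine"])


mine = {"type": "MedicationSubscribed", "patient": "p1", "medicine": "C"}
theirs = [
    {"type": "MedicationSubscribed", "patient": "p1", "medicine": "B"},
    {"type": "MedicationWithdrawn", "patient": "p1", "medicine": "B"},
]

print(find_conflicts([], mine, theirs, medication_conflicts, withdrawal_resolves))
print(find_conflicts([], mine, theirs[:1], medication_conflicts, withdrawal_resolves))
```

With the withdrawal present the first call finds nothing to report; without it, the second call returns the conflicting event. Sequencing (point 1) is left out of this sketch.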

Many thanks in advance.

Henrik Feldt

Apr 27, 2012, 2:11:21 AM
to ddd...@googlegroups.com

It would be awesome to get a reply from Greg on this! :)

Greg Young

Apr 27, 2012, 2:22:15 AM
to ddd...@googlegroups.com
The issue here is that you are considering yourself to be the book of record instead of a downstream event processor. In the case where clients raise events, *they* are the book of record. Most downstream event processors I have dealt with do not use event sourcing (this is likely where the issue is coming into play). They store small amounts of state and basically do CEP off the event stream (often with state machines). In the case where they say "hey, I said I disagree with it, now I agree with it": yep, it can happen.

These systems are not as tricky to build in many scenarios as people make them out to be if the rule is followed (and businesses can generally understand why). The system makes the best decisions that it can given the information that it has. 

If you want to be book of record make the clients send up commands.

Let's go through a quick example:

Delivery driver syncs and says he delivered a package (right now we think it's in the warehouse). That's ok.
Later we get a sync saying it left the warehouse. 

Very often we also put timeouts on this kind of stuff (we expect it within 24 hours or else raise an event).

If, however, you got "driver A delivered package" and "driver B delivered package", you raise a duplicate delivery event.

These types of systems are tricky to build but have their places in terms of systems. 
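A small sketch of the delivery example as a downstream processor that keeps a little state per package. All names here (DeliveryProcessor, PackageState, the event names DuplicateDeliveryDetected and WarehouseExitOverdue) are hypothetical; only the behaviour follows the example: accept out-of-order facts, time out after 24 hours, flag duplicate deliveries.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PackageState:
    left_warehouse: bool = False
    delivered_at: Optional[datetime] = None
    drivers: list = field(default_factory=list)
    timeout_raised: bool = False


class DeliveryProcessor:
    EXPECTED_WITHIN = timedelta(hours=24)

    def __init__(self):
        self.packages = {}
        self.raised = []  # events this processor raises itself

    def handle(self, kind, package_id, at, driver=None):
        """Accept a fact from a client; it is never rejected."""
        state = self.packages.setdefault(package_id, PackageState())
        if kind == "LeftWarehouse":
            state.left_warehouse = True
        elif kind == "Delivered":
            state.drivers.append(driver)
            if state.delivered_at is None:
                state.delivered_at = at
            if len(state.drivers) > 1:
                self.raised.append(("DuplicateDeliveryDetected", package_id))

    def check_timeouts(self, now):
        """Raise an event for deliveries whose warehouse exit never showed up."""
        for package_id, state in self.packages.items():
            overdue = (state.delivered_at is not None
                       and not state.left_warehouse
                       and not state.timeout_raised
                       and now - state.delivered_at > self.EXPECTED_WITHIN)
            if overdue:
                state.timeout_raised = True
                self.raised.append(("WarehouseExitOverdue", package_id))


t0 = datetime(2012, 4, 27, 9, 0)
processor = DeliveryProcessor()
processor.handle("Delivered", "pkg-1", t0, driver="A")  # before we knew it left
processor.handle("LeftWarehouse", "pkg-1", t0 + timedelta(hours=2))
processor.handle("Delivered", "pkg-1", t0 + timedelta(hours=3), driver="B")
processor.handle("Delivered", "pkg-2", t0, driver="A")
processor.check_timeouts(t0 + timedelta(hours=25))
print(processor.raised)
```

Note that nothing is rejected: the processor only records what it was told and raises its own events when the picture looks wrong.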

Kliment Mamykin

Jun 12, 2013, 12:51:16 AM
to ddd...@googlegroups.com
The distinction Greg makes about whether the mobile app is the book of record may become clearer once you separate the aggregates of the domain. For the GP example (I have little knowledge of the actual domain, but we all go to the doctor, don't we :)), it may be that the mobile app domain is all about Visits to patients. The doctor records a new visit and prescribes medications. There is no chance of a concurrency exception: the doctor is the only one working on that visit. All changes in the app are events; they are about the visits and can't/shouldn't be "declined" by the server. The server here acts as a downstream processor, maybe even a separate BC. The server processes visit events and builds a Patient Treatment Plan (or something like that) aggregate, which detects conflicting medications and raises ConflictMedicationDetectedEvent.
The mobile client may even subscribe to the PatientTreatmentPlan events and rebuild a read model on the client, to help detect potential conflicts earlier.
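To make that concrete, a minimal sketch under assumptions: ConflictMedicationDetectedEvent and the PatientTreatmentPlan aggregate are the names used above, while MedicationPrescribed, its fields and the KNOWN_INTERACTIONS table are invented for illustration.

```python
from dataclasses import dataclass, field

# Assumed lookup of medicine pairs known to interact badly.
KNOWN_INTERACTIONS = {frozenset({"B", "C"})}


@dataclass(frozen=True)
class MedicationPrescribed:
    """Visit event raised by the mobile app (hypothetical shape)."""
    patient_id: str
    visit_id: str
    medicine: str


@dataclass(frozen=True)
class ConflictMedicationDetectedEvent:
    patient_id: str
    medicines: frozenset


@dataclass
class PatientTreatmentPlan:
    patient_id: str
    medicines: list = field(default_factory=list)

    def apply(self, event):
        """Always record the prescription; return any conflict events it triggers."""
        raised = [
            ConflictMedicationDetectedEvent(
                self.patient_id, frozenset({existing, event.medicine}))
            for existing in self.medicines
            if frozenset({existing, event.medicine}) in KNOWN_INTERACTIONS
        ]
        self.medicines.append(event.medicine)
        return raised


plan = PatientTreatmentPlan("patient-1")
print(plan.apply(MedicationPrescribed("patient-1", "visit-1", "B")))
print(plan.apply(MedicationPrescribed("patient-1", "visit-2", "C")))
```

The second prescription is still recorded; the plan only adds a conflict event next to it, which is the "can't be declined" behaviour described above.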