What is your favorite solution for Eventual Consistency and Race Conditions?

Andrew Holm

Jun 23, 2016, 12:52:37 PM
to DDD/CQRS
So I have been learning CQRS for the past 6 months and I noticed that one of the challenges with implementing CQRS in your system is dealing with eventual consistency and race conditions.

Scenario: You are on a page in your UI with a form and a submit button. When you click submit, a web request hits your web API layer, which fires off a command asynchronously and returns a 200 or 202 to the client. The command will do some work and eventually update Data A to be Data B. On your UI you get back the 200/202 response, route to the next page, and query your data on the read side to display it or do some calculation. You are expecting Data B to show up, but it turns out it is still Data A. If you refresh the page and the query still shows Data A, the UI takes you back to your form to resubmit the information you already submitted (the command is still being processed and has not yet updated the field).

I am curious what other people use to solve this problem. The solutions I have found so far include:

1. WebSockets/SignalR approach - after your Data field has been updated, fire off a SignalR event with Data B as the value, or simply send an event saying "B is done processing, requery".
       Cons: I am new to SignalR, but from what I have heard it is not reliable: connections drop and there is no message retry, so messages can be lost and the user will be stuck waiting for an event that never makes it back to the client.
2. Polling - create some kind of state in the UI to requery until Data A turns into Data B, then display, calculate, or continue with your workflow.
       Cons: Creating state on the UI means that it is limited to that device. If the user refreshes the page or switches devices, your polling mechanism is gone and you could still find yourself in an invalid state.
3. Arbitrary Delay - after sending the web request you try to buy time by throwing a spinner up for 30 seconds, or by sending the user to a congratulations page, hoping that the command completes in that window of time.
       Cons: You still run the risk of a race condition, depending on the speed of your server/connection. On the other hand, the delay could be too long and hurt your user experience.
4. Versioning Data - you could attach a version number to the data, so that {Data = A, Version = 1}. Then, after you submit, if you query the Data field and the version is still 1, you know your command hasn't gone through yet and you don't redirect your user back to the form.
       Cons: You would still have to know when to re-query, using one of the above methods.
5. Flag in Local Storage - you could do somewhat of a hack and write a flag to local storage saying "this user has submitted the form; if they refresh, don't navigate back to the form, keep moving on and trust that it will be successful". You could put an expiration date on the flag so that it doesn't last forever. This would work if the user refreshed the page.
       Cons: The flag would either last forever or have an expiration date, in which case it is just an uglier version of Arbitrary Delay. Not to mention that you lose the ability to handle the case where the form command failed, and it still wouldn't work if the user switched devices.
6. Synchronous server - you COULD (not saying you should) have your web API wait synchronously for the command to finish and listen for the event before returning the 200.
       Cons: This ties up threads on your server and can cause some major performance issues.
7. Combo - currently at my company we use a few of these together. We have congratulation/verification screens to delay the user, and we also have SignalR with a fallback route and sometimes fallback polling in case SignalR fails. We haven't gone into production yet, though, so I am not sure how our app will handle high volume/slow connections.
8. ?
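
To make options 2 and 4 concrete, here is a minimal Python sketch of polling against a versioned read model. Everything in it is an assumption for illustration: `poll_until_version`, `FakeReadModel`, and the `{"data", "version"}` document shape are hypothetical names, not part of any particular framework.

```python
import time
from typing import Callable, Optional


def poll_until_version(
    fetch: Callable[[], dict],
    min_version: int,
    timeout_s: float = 5.0,
    interval_s: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Re-query until the read model reports at least min_version.

    Returns the fresh document, or None if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        doc = fetch()
        if doc["version"] >= min_version:
            return doc  # the command's effect is now visible on the read side
        if time.monotonic() >= deadline:
            return None  # still stale; the caller decides what to show
        sleep(interval_s)


class FakeReadModel:
    """Hypothetical stand-in for the read side: stale for the first N queries."""

    def __init__(self, stale_queries: int) -> None:
        self.remaining = stale_queries

    def query(self) -> dict:
        if self.remaining > 0:
            self.remaining -= 1
            return {"data": "A", "version": 1}
        return {"data": "B", "version": 2}
```

The version check replaces the guess "is it B yet?" with a comparison the client can make without knowing the new value. It does not by itself address the refresh/device-switch concern: the expected version would still have to live somewhere that survives a reload.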

I am really interested in how other people have solved this issue. I am sure there is some better UI workflow pattern that would avoid a lot of the pain points behind eventual consistency, and I would love to hear about that or any other solutions people have found for this very common problem. If you have found specific libraries that facilitate your solution, I would like to hear about those as well. Also, if I have misrepresented one of the solutions above or missed some key to making them more viable, please do correct me.

Thanks!
        

Michael Yeaney

Jun 23, 2016, 2:06:09 PM
to ddd...@googlegroups.com
Just a few thoughts, but ones I've used successfully over time when working with eventually consistent systems (not necessarily CQRS-specific).

1. Stop re-querying the raw datastore every time for every UI action/view. Instead, implement read-caching + workflow states indicating the form data is in-processing for the current user and tell them that (put it in a "pending update" state).  Note the "current-user" part...this is half-way implementing weak session consistency, whereby the user who submitted the change sees the status of "pending", but all other users see the old data until it's updated.

2. Assume the change will work, and "fake" the UI update for that user (again, sort-of-weak-session-consistency). This depends greatly on the data in question, but is used quite often in apps like Gmail (reports success immediately, but may report an error in a few seconds/minutes and restore the UI).

These patterns rely on the fact that there is rarely an actual "human" requirement for two users to see the exact same data changes at the exact same instant on different devices. Coupling these patterns with optimistic concurrency checks can give some very nice results, but as you pointed out the domain needs to be tolerant of the workflow.

I'm sure there are other ways to look at this that solidify it into CQRS terms (such as firing domain events to update the "pending" state to "completed", etc.), but the basic ideas are still the same.
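
A rough Python sketch of the "pending for the current user" idea, under stated assumptions: `SessionView` and its method names are hypothetical, and the two dictionaries stand in for whatever read cache and workflow-state store you actually use.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class SessionView:
    """Read cache plus per-user pending writes (weak session consistency)."""

    committed: Dict[str, Any] = field(default_factory=dict)           # visible to everyone
    pending: Dict[Tuple[str, str], Any] = field(default_factory=dict)  # (user, key) -> value

    def submit(self, user: str, key: str, value: Any) -> None:
        # Record the in-flight change for the submitting user only.
        self.pending[(user, key)] = value

    def read(self, user: str, key: str) -> dict:
        # The submitter sees their own change marked "pending";
        # every other user keeps seeing the committed value.
        if (user, key) in self.pending:
            return {"value": self.pending[(user, key)], "status": "pending"}
        return {"value": self.committed.get(key), "status": "committed"}

    def on_applied(self, user: str, key: str, value: Any) -> None:
        # E.g. driven by a domain event: promote the change for all users.
        self.committed[key] = value
        self.pending.pop((user, key), None)

    def on_rejected(self, user: str, key: str) -> None:
        # The command failed: drop the optimistic value and restore the old view.
        self.pending.pop((user, key), None)
```

`on_rejected` is the piece that lets the UI "report an error and restore" after having shown the change optimistically.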

Johnny KNOBLAUCH

Jun 25, 2016, 6:00:18 PM
to ddd...@googlegroups.com
Hi everyone,

Thanks for your question!
I ask myself this question a lot, and I will have to find a solution soon for my own project ;-)
From my theoretical point of view:

1) The command MUST be handled synchronously so that the domain can validate and reject it. That way the UI is informed directly in the API response! Otherwise it's an event.
2) The write API can return more information than just OK or not OK, for example a description of the error.
3) Use the read API, backed by the read model, to search data.
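
Points 1 and 2 could look roughly like the Python sketch below. The `Account` aggregate, the `CommandResult` type, and the 200/422 status mapping are all hypothetical choices made for the example, not something prescribed by CQRS.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CommandResult:
    accepted: bool
    error: Optional[str] = None  # human-readable reason when rejected


class Account:
    """Hypothetical aggregate that validates commands before changing state."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def withdraw(self, amount: int) -> CommandResult:
        if amount <= 0:
            return CommandResult(False, "amount must be positive")
        if amount > self.balance:
            return CommandResult(False, "insufficient funds")
        self.balance -= amount
        return CommandResult(True)


def handle_withdraw(account: Account, amount: int) -> Tuple[int, dict]:
    """Write-API handler: runs the command synchronously and reports the outcome."""
    result = account.withdraw(amount)
    if result.accepted:
        return 200, {"ok": True}
    # Rejection reaches the client in the same response, with a description.
    return 422, {"ok": False, "error": result.error}
```

Because validation happens before the response is sent, the client never has to discover a rejected command by polling the read side.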

That's all I know... I hope it will be useful ;-)

Happy coding !
Johnny

Ben Kloosterman

Jun 25, 2016, 8:36:40 PM
to ddd...@googlegroups.com
IMHO commands should block, and you need a very good argument to make them async, so hold the 200 response until the command completes (no different from all the other XML services out there). For a fully synchronous domain you don't even have to wait for the event. It also solves the weird threading issues you need to handle when an object is processing 2 commands at once (locking is likely to give worse performance). Re your point 6: with async/await on a modern web server, waiting would not hold a thread. If your domain uses caching (you need an identity map cache!) it will be fast. If you go above 10K transactions per second, then look at callbacks for I/O in the domain, or in-memory domains. (As an example of how fast you can get synchronously on 1 thread, look at the LMAX work.)

If needed, get the facade/service to validate the result and the command via a query. I don't think doing async posts directly to a domain via HTTP is a good idea.
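
As an analogy for the "async/await would not hold a thread" point, here is a small Python `asyncio` sketch (the thread is about .NET, so treat this only as an illustration of the same idea). `process_command` and `handle_submit` are hypothetical names, and `asyncio.sleep` stands in for real I/O such as a database write.

```python
import asyncio
import threading
from typing import List, Tuple


async def process_command(payload: str) -> dict:
    # Stand-in for I/O-bound domain work; awaiting yields the thread to other requests.
    await asyncio.sleep(0.01)
    return {"data": payload, "version": 2, "thread": threading.get_ident()}


async def handle_submit(payload: str) -> Tuple[int, dict]:
    # The response is held until the command completes, but no thread is blocked
    # while waiting: the event loop serves other handlers in the meantime.
    result = await process_command(payload)
    return 200, result


async def main(n_requests: int) -> List[Tuple[int, dict]]:
    # Many in-flight requests, each awaiting its own command, on one thread.
    return await asyncio.gather(*(handle_submit(f"B{i}") for i in range(n_requests)))
```

Running `asyncio.run(main(100))` completes all requests on the single event-loop thread, which is the sense in which holding the response need not cost a thread per request.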

Ben

Peter Hageus

Jun 26, 2016, 5:58:30 AM
to ddd...@googlegroups.com
It's already been said, but commands really should be synchronous.

My favourite pattern for eventually consistent read models atm is returning a version number/position in the command result. Then pass that along with the query, and block until the projection is at the requested version. Depending on your underlying storage and architecture, this could be done very efficiently (wait handles) or by polling. It’s rarely needed, but handy in some cases.
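
A minimal in-process Python sketch of this pattern, assuming a single projection guarded by a `threading.Condition` in place of wait handles. The `Projection` class and its `apply`/`query` methods are hypothetical names; a real system would key the wait on its own event-store position.

```python
import threading
from typing import Any, Dict


class Projection:
    """Read model that lets a query wait until a given version has been applied."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._version = 0
        self._state: Dict[str, Any] = {}

    def apply(self, version: int, key: str, value: Any) -> None:
        # Called by the projector as events arrive; wakes any waiting queries.
        with self._cond:
            self._state[key] = value
            self._version = version
            self._cond.notify_all()

    def query(self, key: str, min_version: int = 0, timeout_s: float = 1.0) -> Any:
        # Block until the projection has caught up to the version returned
        # by the command, or fail after the timeout.
        with self._cond:
            caught_up = self._cond.wait_for(
                lambda: self._version >= min_version, timeout=timeout_s
            )
            if not caught_up:
                raise TimeoutError(f"projection has not reached version {min_version}")
            return self._state.get(key)
```

The client flow would be: send the command, receive the new version in the result, then call `query(key, min_version=that_version)`. Queries that do not need read-your-own-writes pass no version and never wait.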

/Peter
