Need advice on using (or not) eventual consistency


Werner Clausen

Jan 23, 2012, 7:31:11 AM
to ddd...@googlegroups.com
Hi,
I'm looking for some advice on how to interpret a customer's reaction to a CQRS-ES discussion.

One of our fairly large customers needs a change: a system needs a complete rewrite. Our expertise is reservation systems, and in particular this customer's domain, which is car rental. The domain is rather complex, and the future requirements include a very responsive UI and a better foundation for expanding the domain. Their current system is based on one database, where some things just can't be done during the daytime because of database locking etc. A lot of common tables are being read and updated at the same time, both internally and by external partners using different channels. The talk with the customer soon revolved around CQRS.

But today in a meeting, one of their domain experts told us that consistency is of great importance, and he was reluctant to embrace the eventual consistency inherent in CQRS. For example, when a new reservation is created, it must not take 5 minutes before the availability lists are updated. And it is not acceptable that it might take 5 minutes before that reservation shows up in some statistics printout. A reservation could cost as much as $50,000 (covering air, car and accommodation). If that reservation is created at time = t, it is not acceptable that it isn't on some accounting list pulled at time = t + 5 min. (1 minute would be considered all right, but certainly not 5 minutes.)
In our experience with CQRS, 5 minutes really is a long time, and any read-cache update that is more than 1 minute behind is probably due to some error. But from a design point of view, it is not possible to guarantee that it won't take 5 minutes. We do not have enough experience using CQRS in such a large installation, so my question is: should we interpret their demands to mean that CQRS isn't for them, regardless of all the goodies it brings to other concerns? I guess we can't be the first to see a reserved attitude towards eventual consistency... Is the above reaction something you see a lot when trying to define what "eventual" really means? From my brief intro here, what would you recommend (if possible at all) as our next step?

Werner
 

@yreynhout

Jan 23, 2012, 8:13:11 AM
to DDD/CQRS
A few things come to mind, including
- running the availability/accounting list denormalizer/projection
inside the same tx. This has other disadvantages, but at least you'd
be taking this decision consciously.
- what they use the availability/accounting lists for (e.g. some
process using this list as input) ... digging deeper into the why.
- how do they know the list is "up to date" today? And do the
advantages outweigh the disadvantages?
- not talking about eventual consistency as a term at all (they hear
inconsistency, which is an entirely different thing). The discussion is
better oriented towards getting SLAs and important/not important
ratios right (e.g. You: "If role A does X, what does that have an
effect on?" Them: "Y." You: "How soon after X happens does this
*change* need to be reflected in Y?" etc.).
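
To illustrate the first option in the list, here is a minimal sketch in Python using the standard-library sqlite3 module. The table names and the functions `open_store` and `reserve_car` are made up for illustration and are not taken from any real system. The event and the availability projection are written in one transaction, so the list can never lag behind the reservation:

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with a hypothetical event log and read model."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (
            seq     INTEGER PRIMARY KEY AUTOINCREMENT,
            type    TEXT NOT NULL,
            payload TEXT NOT NULL
        );
        CREATE TABLE availability (
            car_class TEXT PRIMARY KEY,
            free      INTEGER NOT NULL CHECK (free >= 0)
        );
        """
    )
    return conn


def reserve_car(conn: sqlite3.Connection, car_class: str, reservation_id: str) -> None:
    """Append the event and update the projection atomically."""
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the event and the availability row always change together.
    with conn:
        conn.execute(
            "INSERT INTO events (type, payload) VALUES (?, ?)",
            (
                "ReservationCreated",
                json.dumps({"id": reservation_id, "car_class": car_class}),
            ),
        )
        cur = conn.execute(
            "UPDATE availability SET free = free - 1 WHERE car_class = ?",
            (car_class,),
        )
        if cur.rowcount != 1:
            raise LookupError(f"unknown car class: {car_class}")
```

If no car is free, the CHECK constraint makes the update fail and the event insert is rolled back with it. The price is that every write now also pays for the projection update, so this is probably best kept to the few lists that genuinely need it.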

Regards,
Yves.

Nils Kilden-Pedersen

Jan 23, 2012, 10:03:30 AM
to ddd...@googlegroups.com
On Mon, Jan 23, 2012 at 6:31 AM, Werner Clausen <item...@hotmail.com> wrote:
> Our experience in CQRS, is that 5 minutes really is a long time

I would imagine that in most systems you're talking about millisecond latencies. However, IMO, it's not the average time that's interesting with eventual consistency, but the possibility of some event listener process being unavailable, which can lead to delays of much more than 5 minutes.
 
> ... and any read-cache update that is more than 1 minute behind, is probably due to some error. But from a design point of view, it is not possible to guarantee that it won't take 5 minutes.

Right, that's an architectural concern, so you need a properly architected system to minimize the chances of a failure in one of your denormalizer/read-model event listener processes.
 
> We do not have enough experience using CQRS in such a large installation, so my question here is: Should we interpret their demands in such way, that CQRS isn't something for them?

Quite the contrary. It sounds like they have problems scaling with the current system.
 
> Regardless of all the goodies it brings to other concerns? I guess we can't be the first to see some reserved attitude towards eventual consistency...Is the above reaction something you see a lot when trying to define what "eventual" really means? From my brief intro here, how would you (if possible at all) recommend our next step?

It sounds like you've already been told that anything less than a minute is acceptable. That's a nice wide margin for the average/median/non-failure case, so they seem to be on board with the concept already. Now all you have to do is find out the cost of an unacceptable delay (> 1 minute) and architect the system accordingly.

Peter Ritchie

Jan 23, 2012, 11:07:32 AM
to ddd...@googlegroups.com
One of my favourite discussions RE statistics is to ask a stakeholder to describe what they do when they do "reporting". The conversation usually goes something like "I generate the report, print it off, then read it later". This is usually a good segue into not needing a whole lot of horsepower to deliver what they view as real-time statistics... YMMV

If they're really dead set against data latency, they're probably looking for notifications about the data. They can't possibly report, consume, and act on "real-time" data within 5 minutes; it's futile to try to keep them in that scenario. See if what they're really looking for is close-to-real-time notifications.

Apart from that, you don't have to have *all* data consistent. If you break down the data and identify a subset that should be closer to real-time consistent (with the rest being more EC), you can probably alleviate some of the pressure on the system.

Cheers -- Peter

Julian Dominguez

Jan 23, 2012, 1:44:53 PM
to ddd...@googlegroups.com
Or, if they are OK with some EC (which, by your description, they are),
you can put in place a mechanism where the denormalizer periodically
notifies that it has caught up reading the messages. If this notification
does not arrive within a minute, for example, it might mean that the
denormalizer is down and you need to degrade the SLA (whereas in a
traditional architecture you would already be in trouble as soon as
the app starts misbehaving or comes under high demand).
Also, you can immediately spin up new instances of the denormalizer if
you are hosting the app in a cloud, and keep up with the required SLA.
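
A rough sketch of that "caught up" notification, in Python. The class name `CatchUpMonitor`, its methods, and the 60-second default are assumptions made up for this example; how the denormalizer actually delivers the notification (message bus, database row, HTTP ping) is left open:

```python
import time
from typing import Callable


class CatchUpMonitor:
    """Tracks when the denormalizer last reported that it had caught up."""

    def __init__(
        self,
        max_silence_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_silence = max_silence_seconds
        self._clock = clock
        # Treat startup as "caught up" so the monitor gets one grace period.
        self._last_caught_up = clock()

    def notify_caught_up(self) -> None:
        """Called by the denormalizer each time it has processed all pending messages."""
        self._last_caught_up = self._clock()

    def is_degraded(self) -> bool:
        """True when no caught-up notification arrived within the allowed window."""
        return self._clock() - self._last_caught_up > self._max_silence
```

The read side could check `is_degraded()` before serving an availability or accounting list, and show a "data may be stale" warning or trigger the spin-up of another denormalizer instance when it returns True. The clock is injectable only so that the behaviour can be tested without waiting.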

Julian


Werner Clausen

Jan 25, 2012, 3:24:02 AM
to ddd...@googlegroups.com
@All,
 
Thanks a bunch for all your answers. I printed them all and we discussed them internally. Yesterday evening and again this morning, we had a session with the customer, and the project is back on track. That talk also brought out some interesting facts about their way of doing things. In the end, it looks like the major fear of eventual consistency is in reality just concern about a few very specific issues, which I think we can handle quite easily. But as Yves suggested, I'll probably never use the words "eventual consistency" with anyone non-technical again :)
 
Your advice and answers are really appreciated. Hopefully, some day I'll be able to give something back to this community.
 
Werner
 

Greg Young

Jan 25, 2012, 3:49:05 AM
to ddd...@googlegroups.com
This is more often than not the case. Usually the process leads to the
discovery of many domain concepts along the way... You can make
things fully consistent for a specific use case if needed, but
generally there should be a business understanding of this as well
(making it explicit). A canonical example would be ordering concert
tickets through Ticketmaster, where individual seats enter a
reserved state for a period of time (aka reservation pattern, aka
pessimistic locking).
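
A minimal in-memory sketch of that reservation pattern, in Python. The class `SeatHolds`, its method names, and the 300-second hold are hypothetical choices for illustration; a real system would keep the holds in durable, shared storage:

```python
from typing import Dict, Optional, Tuple


class SeatHolds:
    """Time-limited holds: a seat is reserved for one holder until the hold expires."""

    def __init__(self, hold_seconds: float = 300.0) -> None:
        self._hold_seconds = hold_seconds
        # seat id -> (holder id, expiry time in seconds)
        self._holds: Dict[str, Tuple[str, float]] = {}

    def try_hold(self, seat: str, holder: str, now: float) -> bool:
        """Place or renew a hold; fail if someone else holds the seat and it has not expired."""
        current = self._holds.get(seat)
        if current is not None:
            owner, expires_at = current
            if owner != holder and now < expires_at:
                return False
        self._holds[seat] = (holder, now + self._hold_seconds)
        return True

    def holder_of(self, seat: str, now: float) -> Optional[str]:
        """Return the current holder, or None if the seat is free or the hold has expired."""
        current = self._holds.get(seat)
        if current is None or now >= current[1]:
            return None
        return current[0]
```

The point of the sketch is that the hold duration becomes an explicit business rule (how long may a customer keep a seat before paying?) rather than a side effect of database locking.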

--
Le doute n'est pas une condition agréable, mais la certitude est absurde.

Werner Clausen

Jan 26, 2012, 2:59:06 AM
to ddd...@googlegroups.com
Very true. The most interesting thing for me at the moment is that
the business concerns the customer has/had are, as you say, true domain
concepts that really should be addressed. Most likely, these issues would never
have surfaced in an "old school" approach. This is something of an eye-opener,
as this is the first time I have seen this effect up close. Very interesting...
 
Thanks. 
 
Werner