CQRS/ES: Mitigate risk that user makes decisions based on stale views?


urbanhusky

Jun 30, 2016, 7:45:35 AM
to DDD/CQRS
One of the advantages of an event sourced system, which is often emphasised, is being able to add new views (read models) at a later point and have all the data available based on the historical events in the system.
Similarly, changes to views can be achieved by resetting the read model(s) and restarting the affected projection(s) instead of dealing with complex data migrations.

There is a slight catch, however: it takes a while until the read model projection has caught up to a reasonably recent state.
We can deal with eventual consistency when it comes to small delays, since most systems and processes have always worked with stale data at some point anyway - for example: when a user opens a view, the data presented to them might already have changed, or be in the process of changing, while they are viewing it.
However, larger delays might cause serious trouble - especially if the user is not aware of just how stale the data is and bases a decision on it.

I think that, ideally, you would deploy such changes in parallel on a secondary system and switch over once the projections have caught up (blue/green deployment, sourced from the same event stream).
What other options could be explored here?

Alerting the user in case the view they use is not up-to-date (metric: projection has not hit the end of the event stream yet, or there are at least X events left to process)?
This would not even be possible in all views - or how would you inform your BI people that their DWH or OLAP cube isn't "ready" yet?
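
To make the metric above concrete, here is a minimal sketch in Python. It assumes the projection can report its own checkpoint and the current head position of the event stream; `ProjectionStatus`, `events_behind` and `max_lag` are hypothetical names, not part of any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectionStatus:
    """Hypothetical snapshot of how far a projection has progressed."""
    checkpoint: int   # position of the last event the projection processed
    stream_head: int  # position of the newest event in the stream

    @property
    def events_behind(self) -> int:
        # Never report a negative lag if the head was read slightly earlier.
        return max(0, self.stream_head - self.checkpoint)


def is_stale(status: ProjectionStatus, max_lag: int = 0) -> bool:
    """True if more than max_lag events are still left to process."""
    return status.events_behind > max_lag
```

With `max_lag=0` the view counts as stale until the projection reaches the end of the stream; a larger value corresponds to the "at least X events left" variant.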

Switching over on the view model level instead of on the entire deployment could be another option - as long as you have control over the old and new views and can switch between them on demand.
This would boil down to enabling features (i.e. views) only after the projections for the new features have caught up. After a while you'd end up accumulating all the old versions and needing to switch between them.
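
As a rough illustration of switching at the view level, the sketch below picks the newest version of a view whose projection has caught up and falls back to an older one otherwise. `ViewVersion`, `caught_up` and `active_view` are hypothetical names, and how `caught_up` gets set is left open.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ViewVersion:
    """Hypothetical descriptor for one deployed version of a view."""
    name: str
    caught_up: bool  # True once this version's projection has caught up


def active_view(versions: List[ViewVersion]) -> Optional[ViewVersion]:
    """Return the newest caught-up version.

    `versions` is assumed to be ordered from oldest to newest.
    Returns None if no version is ready to serve yet.
    """
    for version in reversed(versions):
        if version.caught_up:
            return version
    return None
```

Old versions can only be dropped once a newer one is being served, which is where the accumulation mentioned above comes from.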

I ask because we don't have the luxury of blue/green deployment and we don't deploy centrally either - we effectively only provide the new builds. When (and if) they are deployed is up to other people - some might update often and in small increments, others might update only rarely. The actual deployment should also be as automatic as possible - so needing to switch deployments manually is not something we can rely on (not every region has a competent operations team).


urbanhusky

Jul 28, 2016, 10:06:01 AM
to DDD/CQRS
Since we're using EventStore, we know when a projection (i.e. a subscription) hits the end of the event stream (live processing) or when it falls behind again (subscription dropped).
We have metrics that track this state, so every request against the read model repository can also return metadata about the health/freshness of the data it contains. For our web API this means adding headers to the response, which the application on top uses to guide the user or temporarily lock them out (similar to toggling maintenance mode in some applications).
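
A minimal sketch of that header idea, in Python for illustration only: the header names `X-Projection-Live` and `X-Projection-Events-Behind`, the `lock_threshold` value and both function names are assumptions, not what our API actually uses.

```python
def health_headers(is_live: bool, events_behind: int) -> dict:
    """Build response headers describing the projection's state.

    Header names are hypothetical and chosen only for this example.
    """
    return {
        "X-Projection-Live": "true" if is_live else "false",
        "X-Projection-Events-Behind": str(events_behind),
    }


def client_action(headers: dict, lock_threshold: int = 1000) -> str:
    """Decide what the application on top does with a response.

    Returns "show" for live data, "warn" for slightly stale data and
    "lock" when the projection is too far behind to be trusted.
    """
    if headers["X-Projection-Live"] == "true":
        return "show"
    if int(headers["X-Projection-Events-Behind"]) >= lock_threshold:
        return "lock"
    return "warn"
```

Keeping the decision on the client side means the API stays a plain data source, and each application can pick its own threshold for locking users out.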

We even estimate how long it will take for the projection to catch up, which gives us a reference point for when to recheck and fetch the latest version of the data.
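
One simple way to produce such an estimate is to divide the remaining events by the recently observed throughput. The sketch below assumes exactly that; it is not necessarily how our estimate is computed, and the function and parameter names are hypothetical.

```python
from typing import Optional


def estimate_catch_up_seconds(
    events_behind: int,
    events_processed: int,
    elapsed_seconds: float,
) -> Optional[float]:
    """Estimate the seconds until the projection reaches the stream end.

    events_processed / elapsed_seconds is the throughput observed over a
    recent measurement window. Returns None when no throughput has been
    observed yet, because no meaningful estimate exists in that case.
    """
    if events_behind <= 0:
        return 0.0
    if events_processed <= 0 or elapsed_seconds <= 0:
        return None
    rate = events_processed / elapsed_seconds  # events per second
    return events_behind / rate
```

The estimate ignores events that are appended while the projection is catching up, so on a busy stream it is better treated as a lower bound for the recheck interval.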

This does not inform direct consumers sitting on top of a "reporting" SQL database, but nothing stops us from simply writing the metrics into that database as well.
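
For the reporting database, the idea could look roughly like the following sketch. It uses SQLite via Python's standard `sqlite3` module purely as a stand-in for the real reporting database; the table `projection_status` and its columns are hypothetical, and `INSERT OR REPLACE` is SQLite syntax that would need the equivalent upsert on another engine.

```python
import sqlite3


def write_projection_status(
    conn: sqlite3.Connection,
    projection: str,
    is_live: bool,
    events_behind: int,
) -> None:
    """Upsert one status row per projection into the reporting database."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS projection_status (
            projection    TEXT PRIMARY KEY,
            is_live       INTEGER NOT NULL,
            events_behind INTEGER NOT NULL,
            updated_at    TEXT NOT NULL
        )
        """
    )
    # The primary key on `projection` makes this replace any earlier row.
    conn.execute(
        "INSERT OR REPLACE INTO projection_status "
        "VALUES (?, ?, ?, datetime('now'))",
        (projection, int(is_live), events_behind),
    )
    conn.commit()
```

Direct consumers can then join against or filter on `projection_status` before trusting a report, for example by checking `is_live = 1` for the projections that feed it.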

I'm confident that this is a good solution. The system would automatically pause and resume certain views whenever the projections lag behind or catch up - which should usually only happen right after an update or when restoring from a backup.