change of read model with and without event sourcing


Christian Setzkorn

Jun 11, 2014, 3:54:27 AM
to ddd...@googlegroups.com
One advantage of event sourcing is that, if one needs to adapt the read model, one can just drop the whole read model and republish all the events. I have two questions regarding this.

(1) Let us say the read model is persisted as a denormalised flat table in a relational database. If I now intend to add another flat table and I know the relevant events, is it common practice to just re-publish those events?

(2) If I use CQRS without event sourcing, how do I manage changes to the read model? Would I have to query the domain's write model to create a particular new table? This obviously requires downtime.

Any feedback would be very much appreciated. Many thanks.

@yreynhout

Jun 11, 2014, 5:52:12 AM
to ddd...@googlegroups.com
(2) Be append only (depending on the relational store I doubt it'll require downtime). Yes, querying the domain's write model storage would seem reasonable, as long as the shape of the data matches to some extent and you didn't lose the data you were interested in in the first place.

Philip Jander

Jun 11, 2014, 6:39:40 AM
to ddd...@googlegroups.com

Hi Christian,

regarding (1):
Whatever part of your software hosts/manages/projects the read model should be subscribed to the event store and should keep track of which events have been processed/projected.
So if you reset a read model, the projection will detect that the current state is stale (and pathetically so, since it has been reset) and request all events relevant for bringing the projection up to date.
The answer is to use a pull model (query all missing events) instead of a push model ("republish events").

Whether or not you use event sourcing, your life will be a lot easier if you design your software such that read models are automatically updated whenever they are out of sync, and can regenerate themselves, including any DB/table structure, from scratch.
This way, you can just reset/delete any read model (e.g. drop the tables in a DB) and wait for it to come back.
Once your regeneration time becomes too long (should never happen with (2) except for BI/statistical analysis data; some millions of events for (1)) you can use double buffering to regenerate readmodels behind the scenes.
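The checkpoint-tracking, self-regenerating projection described above might be sketched roughly like this (a hypothetical in-memory stand-in for both the event store and the read model table; all names are made up for illustration):

```python
# Hypothetical sketch only: in-memory stand-ins for an event store and a
# read model. The projection keeps its own checkpoint and pulls the events
# it has missed, so resetting it forces a full rebuild from history.

class EventStore:
    """Append-only event log; subscribers pull by position."""
    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, position):
        # Pull model: hand out every event at or after `position`.
        return list(enumerate(self._events))[position:]


class CustomerListProjection:
    """Tracks which events it has processed; can regenerate from scratch."""
    def __init__(self, store):
        self._store = store
        self._checkpoint = 0   # position of the next unprocessed event
        self._rows = {}        # stand-in for the read model table

    def catch_up(self):
        # The projection detects it is stale and requests all missing events.
        for position, event in self._store.read_from(self._checkpoint):
            self._apply(event)
            self._checkpoint = position + 1

    def reset(self):
        # Drop state and checkpoint; the next catch_up rebuilds everything.
        self._rows, self._checkpoint = {}, 0

    def _apply(self, event):
        if event["type"] == "CustomerRegistered":
            self._rows[event["id"]] = {"name": event["name"]}
        elif event["type"] == "CustomerRenamed":
            self._rows[event["id"]]["name"] = event["name"]


store = EventStore()
store.append({"type": "CustomerRegistered", "id": 1, "name": "Alice"})
store.append({"type": "CustomerRenamed", "id": 1, "name": "Alice B."})

view = CustomerListProjection(store)
view.catch_up()
view.reset()      # e.g. the table layout changed
view.catch_up()   # the read model "comes back" on its own
```

Note the publisher never pushes anything and knows nothing about the subscriber's state; the projection alone decides when and what to pull.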

Also, regarding (1), I would challenge the idea of having "the read model persisted as denormalised flat table in a relational database". I use many different read models for many different applications, using different storage technologies:
 - lists, statistical data and indices are commonly kept in memory only for rapid access; RAM is cheap after all ;)
 - historical data is only ever projected on-request from the event store,
 - detail records are kept in a key-value store (small systems: in memory, medium systems: any kind of DB, large systems: sharded),
 - data for reporting is kept in 1NF relational databases,
 - ad-hoc querying is usually done from the reporting database; I love MS Access frontends to MS-SQL DBs here.
It pays to have the projection isolated from the data persistence so you can configure read models to use different storage methods.
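One hedged way to picture that isolation (a sketch, not from the thread; event shapes and names are assumptions): keep the projection step a pure function over events, so the same logic can feed an in-memory dict, a key-value store, or a reporting database.

```python
# Hypothetical sketch: the projection step is a pure fold over events,
# so the same logic can be wired to different storage back-ends
# (a plain dict here; a key-value store or reporting DB would work too).

def project_order_totals(state, event):
    """Pure projection step: fold one event into the read model state."""
    if event["type"] == "OrderPlaced":
        customer = event["customer"]
        state[customer] = state.get(customer, 0) + event["amount"]
    return state

# The storage choice is a configuration detail, not part of the projection.
store = {}   # small system: keep it in RAM

events = [
    {"type": "OrderPlaced", "customer": "acme", "amount": 100},
    {"type": "OrderPlaced", "customer": "acme", "amount": 50},
]
for e in events:
    store = project_order_totals(store, e)
```

Swapping `store` for a database-backed mapping changes nothing in the projection logic, which is the point of the isolation.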

Cheers
Phil
--
You received this message because you are subscribed to the Google Groups "DDD/CQRS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dddcqrs+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Christian Setzkorn

Jun 11, 2014, 6:58:47 AM
to ddd...@googlegroups.com
Philip thanks for this detailed reply. I have 2 questions - hope you do not mind:

(1) I thought the read model should be totally independent of the write model? But if you have a pull model, as you suggested, is the read model not somehow coupled to the write/domain model (i.e. it has to tell the write/domain model: give me all events x and y)? I thought that a bus/publisher-subscriber pattern (in memory or not) decouples things. Hence, there is no bidirectional communication taking place.

(2) You mention 'reporting database'. Are you suggesting that the reporting database is separate from the read model and if so how is it kept up to date?

Thanks.



Philip Jander

Jun 11, 2014, 1:18:04 PM
to ddd...@googlegroups.com

On 11.06.2014 12:58, Christian Setzkorn wrote:
Philip thanks for this detailed reply. I have 2 questions - hope you do not mind:
sure.


(1) I thought the read model should be totally independent of the write model? But if you have a pull model, as you suggested, is the read model not somehow coupled to the write/domain model (i.e. it has to tell the write/domain model: give me all events x and y)? I thought that a bus/publisher-subscriber pattern (in memory or not) decouples things. Hence, there is no bidirectional communication taking place.

The event store is *not* the write model. Published events are the output of the write model operating in response to a command. Structurally, events decouple the write model and the read model. Both depend on a common abstraction: the events.
(Side note: a slight drawback of non-event-sourced CQRS (your (2)): in that case, projections run directly from the write-side database into read models, without a structural abstraction in between.)

As far as control flow is concerned, a pull model decouples better: if you use a push strategy, the publishing side has to keep track of/ensure that all interested parties get all the events. Also, if you need to rerun a projection, the publishing side needs to be actively involved ("republish"). If distributing, you need heavy middleware to guarantee delivery. IMHO this is quite unfavourable design.

In a pull model, the publishing side (here: the event store) offers any events for retrieval and offers subscriptions to a pointer to the latest chunk of events/transaction/whatever you call it. Subscribers are responsible for knowing which events they have processed so far. If a subscriber missed some notifications, it just retrieves the missing events later on. If you have a new subscriber (or reset a subscriber's read model storage), you just retrieve all of the relevant history once for initial processing before resuming normal operations. Your publisher does not need to be aware of subscribers' states at all -> decoupling.

In particular, I dislike the term "republish events". Events get published, after that they are just that - published (aka publicly available).

The independence of write and read model is a different topic altogether. That is mostly about keeping non-functional requirements of one model from corrupting the implementation of the other. E.g. the write model benefits from data normalization, the read models benefit from data de-normalization; the write model has to be (at least partially) transactionally consistent, the system can benefit from eventually consistent read models; the write model is mostly concerned with invariants/business logic, the read models are concerned with delivering data, ...

I hope this clears this up a little bit.




(2) You mention 'reporting database'. Are you suggesting that the reporting database is separate from the read model and if so how is it kept up to date?
Not specifically, but generally yes ;)
The main point is that there is no "one read model". Instead, I suggest that each client (where client might for example mean a specific reporting engine) should have a read model designed for optimally providing data to that client. It follows that there will be multiple independent read models in any non-trivial system. For each read model you can design the "data structure (the read model in the strict sense of the term)", "projection/subscription mechanism", "persistence mechanism" and "hosting mechanism" independently.

To give you some extreme examples from my systems:

 - The "persistence mechanism" for historical info on some specific entity is "none". This means that such read models are projected on the fly. This is sensible since such requests are rare and involve specific event streams, and therefore can be projected in a performant way.
 - The "persistence mechanism" for statistical data is "in-memory" on a specific statistics hosting process. This is sensible since statistical data is usually a very compact data set, not requiring a database. The projection takes some time but involves many events. Therefore a specific process, which can have a long uptime, hosts this data set. Upon restart it needs to process nearly all events, which may take some time, but use cases involving statistical data are not harmed by a longer downtime after maintenance (in contrast to many other use cases). If this becomes too long, using a memento to store a snapshot of the data set is easy. If the structure of the read model changes (i.e. new statistics are defined), I delete the snapshot and take the one-time downtime hit.
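A minimal sketch of that memento idea (hypothetical; the snapshot file name and event shapes are assumptions): persist `(checkpoint, state)` so a restart only replays the tail of the history, and delete the file to take the one-time full rebuild.

```python
# Hypothetical sketch of a memento/snapshot for an in-memory statistics
# read model: store (checkpoint, stats) so a restart replays only the
# events after the checkpoint instead of the full history.

import json
import os

SNAPSHOT_FILE = "stats.snapshot.json"   # assumed location

def load_statistics(all_events):
    checkpoint, stats = 0, {"orders": 0}
    if os.path.exists(SNAPSHOT_FILE):          # resume from the memento
        with open(SNAPSHOT_FILE) as f:
            saved = json.load(f)
            checkpoint, stats = saved["checkpoint"], saved["stats"]
    for event in all_events[checkpoint:]:      # replay only the tail
        if event["type"] == "OrderPlaced":
            stats["orders"] += 1
        checkpoint += 1
    return checkpoint, stats

def save_snapshot(checkpoint, stats):
    with open(SNAPSHOT_FILE, "w") as f:
        json.dump({"checkpoint": checkpoint, "stats": stats}, f)

# If the structure of the statistics changes, delete the snapshot and
# take the one-time hit of replaying everything:
#   os.remove(SNAPSHOT_FILE)
```

Because the events remain the source of truth, the snapshot is always disposable; deleting it can never lose information.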

See my previous message for other examples.

It is really important to mentally disentangle concepts such as read model, read model storage, read model projection, read model hosting, event publishing, event generation, event storage, subscription to events, retrieval of event data, etc.
Otherwise it's easy to end up with a bloated monolith again, only now with CQRS and event sourcing making it even more bloated. Believe me, I've got some experience with that :)

Cheers
Phil

Christian Setzkorn

Jun 11, 2014, 2:00:12 PM
to ddd...@googlegroups.com

Excellent! Thanks, very helpful reply.

Christian Setzkorn

Jun 11, 2014, 2:02:07 PM
to ddd...@googlegroups.com

Sorry one more question. Am I right to assume that the write model only exists if there is no event sourcing?


Kijana Woodard

Jun 11, 2014, 2:55:32 PM
to ddd...@googlegroups.com
"The event store is *not* the write model. Published events are the output of the write model operating in response to a command"

Christian Setzkorn

Jun 11, 2014, 3:08:55 PM
to ddd...@googlegroups.com

I think I understand this. IMHO the write model is the persisted domain model if there is no event store. However, if there is an event store, the domain model's aggregate roots are hydrated via the events and/or the snapshots/mementos. The domain model also reacts to commands whilst generating domain events. Please correct me if I am wrong.

Maxim Kovtun

Jun 11, 2014, 3:43:43 PM
to ddd...@googlegroups.com

In the case of DDD, the domain model is the write model: the logic which generates events in reaction to commands.

João Bragança

Jun 11, 2014, 3:47:23 PM
to ddd...@googlegroups.com
In particular, I dislike the term "republish events". Events get published, after that they are just that - published (aka publicly available). 

+1

I think I understand this. IMHO the write model is the persisted domain model if there is no event store. However, if there is an event store, the domain model's aggregate roots are hydrated via the events and/or the snapshots/mementos. The domain model also reacts to commands whilst generating domain events. Please correct me if I am wrong.

basically

I can't stress this enough. CQRS only means using one interface for writing and another for reading. There is nothing that says you can't use the same underlying persistence model for both. The point of separating the two is so that later, if you want to go all out on denormalization, you can without a ton of effort.
 


Christian Setzkorn

Jun 11, 2014, 3:51:56 PM
to ddd...@googlegroups.com

Thanks.

Philip Jander

Jun 11, 2014, 4:44:47 PM
to ddd...@googlegroups.com

On 11.06.2014 21:08, Christian Setzkorn wrote:

I think I understand this. IMHO the write model is the persisted domain model if there is no event store. However, if there is an event store, the domain model's aggregate roots are hydrated via the events and/or the snapshots/mementos. The domain model also reacts to commands whilst generating domain events. Please correct me if I am wrong.


The "write model" is only about logic. It doesn't say anything about persistence. The read model is about the structure of data; it also doesn't say anything about persistence. The basic idea of CQRS is that both don't go together well, since "logic" and "structure of data" don't really have a lot in common. Hence one should stop forcing the write model to be concerned with data structures needed only for answering queries but irrelevant for logic. And one should stop forcing the read model to execute logic only necessary for maintaining invariants and actually quite a hindrance to performantly providing data. Therefore one should use two separate models.

Again, nothing in this is about persistence. It's all about code. Equating "model" to "storage" is wrong (and takes some time to unlearn).

You cannot "persist" the write model. You can try to persist data representing the internal state of a write model. But (imho) that's a brittle approach since your storage now is tightly coupled to all its implementation details. This is (just one reason) why persistent OO is so difficult. And it blurs the boundary between logic and persistence. Unfortunately this approach is ubiquitous as a default.

Event sourcing gets around this by defining a language for describing the state of a model independently of its implementation. Actually it's more like the language is defined up front and the write model implementation uses that language. The language is the events, in terms of the problem domain. The description is the history of the system.

I like to say that event sourcing is *the* generic form of persisting a system's state. Anything else is an optimization towards smaller data and faster persistence, for which one pays with reduced (sometimes to zero) changeability and hence maintainability of the software, as well as loss of information.
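As an illustration only (a hypothetical `Account` example, not from the thread): the write model is pure logic, hydrated by replaying the event language, and its only output is new events; nothing of its internal state is persisted.

```python
# Hypothetical sketch: a write model whose state is described only by
# events. Hydration replays history; command handlers enforce invariants
# and emit new events, which are the only thing ever persisted.

class Account:
    def __init__(self, history):
        self.balance = 0
        for event in history:          # hydrate from the event language
            self._apply(event)

    def _apply(self, event):
        if event["type"] == "Deposited":
            self.balance += event["amount"]
        elif event["type"] == "Withdrawn":
            self.balance -= event["amount"]

    # Command handler: check the invariant, then output new events.
    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")   # the invariant
        event = {"type": "Withdrawn", "amount": amount}
        self._apply(event)
        return [event]


history = [{"type": "Deposited", "amount": 100}]
account = Account(history)
new_events = account.withdraw(30)
# Appending `new_events` to the stored history is the only persistence
# concern; the Account class itself is never "saved".
```

Replaying `history + new_events` into a fresh `Account` reproduces the same state, which is exactly the sense in which the events, not the object, describe the system.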

Cheers
Phil

Christian Setzkorn

Jun 11, 2014, 4:58:26 PM
to ddd...@googlegroups.com

Excellent reply again, making things much clearer. Thanks! Yeah, I have to unlearn a few things (my ORM 'domain' models have tended to be quite data-centric/anemic so far, with no real separation between reading and writing concerns). Thanks again.

Ashley Aitken

Jun 23, 2014, 8:32:55 AM
to ddd...@googlegroups.com

Thanks, Philip, for the very useful posts.

On Thursday, 12 June 2014 01:18:04 UTC+8, Philip Jander wrote:

As far as control flow is concerned, a pull model decouples better: if you use a push strategy, the publishing side has to keep track of/ensure that all interested parties get all the events. Also, if you need to rerun a projection, the publishing side needs to be actively involved ("republish"). If distributing, you need heavy middleware to guarantee delivery. IMHO this is quite unfavourable design. In a pull model, the publishing side (here: the event store) offers any events for retrieval and offers subscriptions to a pointer to the latest chunk of events/transaction/whatever you call it. Subscribers are responsible for knowing which events they have processed so far. If a subscriber missed some notifications, it just retrieves the missing events later on. If you have a new subscriber (or reset a subscriber's read model storage), you just retrieve all of the relevant history once for initial processing before resuming normal operations. Your publisher does not need to be aware of subscribers' states at all -> decoupling.

I can understand how pull for the read model better decouples the read from the write and avoids "heavy middleware" for publish-subscribe to events.  This is what Akka Persistence provides for CQRS.  However, I am interested in the "offer of subscriptions to a pointer to the latest chunk of events" and event store "notifications."  I guess this relates to a specific type of event store and removes the need for any polling etc.

Akka Persistence currently doesn't have such notifications (AFAIK). They are adding event streams that can merge events from multiple sources and have multiple consumers, but I don't feel this is equivalent. Akka Persistence provides pluggable event stores that can be distributed, which does away with the need for heavy distributed message queues and publish-subscribe, but I think they need event store notifications.

Any thoughts on this?

Sorry to base this post around Akka Persistence; if anyone can correct me on my claims about Akka Persistence, please do so. I am new to this approach and these technologies and am just trying to work out how to effectively and efficiently develop a read model using Akka Persistence.

Thanks,
Ashley.