Read model, aggregates and the book of record


David Reynolds

Jan 16, 2017, 5:05:10 AM
to DDD/CQRS
Hi,

I've been reading some threads that are similar to a problem I'm having but haven't found a solution yet. 


If I have a couple of aggregates, AggregateA and AggregateB, each has an event that's emitted when it is created. These events are picked up by projection managers that write the data into a read model. In order to denormalize some of the data to speed up reads, details of AggregateA need to be written into the projection for AggregateB. Should this data be looked up in the read model at the time the projection writes its data, or should it be passed in the event? If it's passed in the event, should an instance of AggregateA be passed to AggregateB and expose some public properties?

Thanks

David Reynolds

Jan 16, 2017, 5:06:15 AM
to DDD/CQRS
I meant to add that I have the same issue in process managers as in projection managers. Where should the related aggregate's data be retrieved from?

Boris Guéry

Jan 16, 2017, 5:32:37 AM
to ddd...@googlegroups.com
If you are using a Process Manager, then the intermediate state is stored within the Process Manager itself.
The state is updated according to the transitions you define.

You define one possible initial state, started by an event of your choice (whether from Aggregate A or Aggregate B), build an intermediate state and wait for the second event. Once you have it, you should be able to reach your final state, which is nothing but the data you want.
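
As a rough sketch of that shape -- all of the type names below (AggregateACreated, LinkAggregatesProcess, LinkAggregates and so on) are made up for illustration, not taken from any particular framework:

using System;

// Either creation event may start the process; the process manager records
// the intermediate state it has seen so far and waits for the rest.
public record AggregateACreated(Guid AggregateAId);
public record AggregateBCreated(Guid AggregateBId);
public record LinkAggregates(Guid AggregateAId, Guid AggregateBId);

public class LinkAggregatesProcess
{
    private Guid? _aggregateAId;
    private Guid? _aggregateBId;

    public void When(AggregateACreated e) => _aggregateAId = e.AggregateAId;
    public void When(AggregateBCreated e) => _aggregateBId = e.AggregateBId;

    public bool IsComplete => _aggregateAId.HasValue && _aggregateBId.HasValue;

    // The final state is "nothing but the data you want" -- here, a command
    // built purely from what the process has already observed.
    public LinkAggregates BuildCommand()
    {
        if (!IsComplete)
            throw new InvalidOperationException("Still waiting for the second event");
        return new LinkAggregates(_aggregateAId.Value, _aggregateBId.Value);
    }
}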


David Reynolds

Jan 16, 2017, 10:25:39 AM
to DDD/CQRS
Thanks Boris. I get that with process managers. What about aggregates? Perhaps it's a slightly different problem, but I think it's fairly similar. If I want to define a relationship between AggregateA and AggregateB, I could have a method on AggregateA such as AddRelationship(). Would you load an instance of A and B and pass B into A when calling AddRelationship(AggregateB aggregate), or would you somehow do this with events? One of the things that seems to be a "rule" is that aggregates shouldn't expose internal state, so how do I get the data out of B to put into the event that gets emitted from AggregateA? Use something like a memento, or is there a better way to do this with events?

Danil Suits

Jan 16, 2017, 11:34:14 PM
to DDD/CQRS
One of the things that seems to be a "rule" is that aggregates shouldn't expose internal state

Write-only data stores aren't useful.

My mind currently resolves the contradiction this way: the responsibility of aggregates is to maintain the business invariant when changing state.  Therefore, if you aren't changing state, you don't need the aggregate.  Therefore, if you need some of the state that is managed by aggregateB to change the state of aggregateA, you just pass a copy of aggregateB's recent state as a command argument without involving the aggregate "aggregateB" at all.

That's possibly a bit heavy for something as simple as AddRelationship; depending on which invariant you need to satisfy you may be able to get away with passing the identifier of aggregateB without needing to include its recent state as well.

But if you do need aggregateB's recent state, then you load that state from the book of record without using it to rehydrate the aggregate. Copy it into a read-only model, or an immutable value type, or something that makes it suitably clear that the data is not subject to change in this transaction (you could even mark it to make clear to the consumer of the code that you are looking at a copy of aggregateB's state, and that some other thread may be changing that state in the book of record while the aggregateA command is processing).
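
A minimal sketch of that, assuming C#-style aggregates -- the snapshot type, the property names and the particular invariant are all placeholders for illustration:

using System;
using System.Collections.Generic;

// An immutable copy of the slice of aggregateB's state that aggregateA needs.
// The caller loads it from the book of record; aggregateB itself is never
// rehydrated or involved.
public record AggregateBSnapshot(Guid Id, string SomeStateWeNeed);

public record RelationshipAdded(Guid AggregateAId, Guid AggregateBId);

public class AggregateA
{
    private readonly Guid _id;
    private readonly HashSet<Guid> _relatedIds = new HashSet<Guid>();

    public AggregateA(Guid id) => _id = id;

    // The other aggregate's state arrives as plain, read-only data --
    // possibly already stale, exactly as the parenthesis above warns.
    public RelationshipAdded AddRelationship(AggregateBSnapshot other)
    {
        if (_relatedIds.Contains(other.Id))
            throw new InvalidOperationException("Relationship already exists");

        _relatedIds.Add(other.Id);
        return new RelationshipAdded(_id, other.Id);
    }
}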

 

Boris Guéry

Jan 18, 2017, 4:18:36 AM
to ddd...@googlegroups.com
I don't really understand what you are trying to achieve.

If you absolutely need to keep AR-A and AR-B mutually consistent in a common transaction, it may be a sign that something is wrong with your design.

Note that a Process Manager may be nothing more than an Aggregate in its own right; you may even want to simply event-source it.

I think you have to define clearly what you are trying to achieve: is it enforcing an invariant (which is currently spread across two aggregates), or are you trying to build a view model based on events emitted from two distinct aggregates, which is not a business rule in its own right?



David Reynolds

Jan 18, 2017, 2:57:44 PM
to DDD/CQRS
The problem I'm having with the whole CQRS pattern is how to make the right data available at the right times. As far as I know aggregates are the source of truth. They have an internal state that they use to enforce invariants, and they emit events containing some of that data. They shouldn't expose their internal state. Process managers are similar but respond to events and after making decisions they send commands. Projection managers handle events and write data to the read model.

In our project Location is an aggregate. It is used a lot, and if we made all locations entities in a larger aggregate we would run into significant concurrency issues. We do need to have relationships between locations. I have an AddRelationship method on the Location. The location then emits a RelationshipAddedToLocationEvent. The problem I have is where and when to get the data I need for the aggregate, the process manager and the projections.

If the aggregate needs to enforce an invariant about itself but needs data from the second location to do so, how does it get that data? Should the two locations be loaded in a command handler and the second passed to the method on the first? If aggregates aren't meant to expose their state then that doesn't make sense. Is it just that aggregates don't expose their data for modification? Or should I query the read model for data about the second location?

What data should the RelationshipAddedToLocationEvent contain? Should it only have the two location ids, or data from both locations? If a process manager handles this event and needs to make a decision about which command to send based on data from both locations, how does it get the data? If only the ids are in the event, would it need to go to the read model? If all the data it needs is in the event then there is no need to do this.

It's the same for projections. If, for performance reasons, I want to denormalise the data and write information about both locations to a single row, then what's in the event is important. If it only contains ids then I would need to query the read model again.

I've read lots on the subject and the advice seems to be inconsistent and contradictory. If the aggregates are the truth then surely I should be using them wherever I can in order to make business decisions. At the moment my events are full of ids and I'm constantly having to go to the read model in command handlers, event handlers, process managers and projections to get the extra data I need.

While my example is about a specific case, I'm having these issues at a more general level with my whole system. I have Contract as an aggregate which lots of parts of my system need in order to make decisions, both in aggregates and process managers, and I have similar problems with that. Maybe I'm missing something very obvious in all this, but any advice would be gratefully accepted.

Danil Suits

Jan 18, 2017, 11:26:34 PM
to DDD/CQRS
Thoughts from a novice


 As far as I know aggregates are the source of truth

The "book of record" is the source of truth -- aggregates constrain the changes that can be made to the book of record.


 Process managers are similar but respond to events and after making decisions they send commands.

So far, I find that process managers align more closely with read models.  Rinat's observation that a process manager is an evolution from a human being staring at a view influenced my thinking there quite a bit.


If the aggregate needs to enforce an invariant about itself but needs data from the second location to do so, how does it get that data?


The simple (simplistic) answer here is that the aggregate knows about its own state, and the state passed to it as part of the command.  In other words, the data is an argument.
 

 Should the two locations be loaded in a command handler and the second passed to the method on the first? If aggregates aren't meant to expose their state then that doesn't make sense. Is it just that aggregates don't expose their data for modification? Or should I query the read model for data about the second location?


Possible answers include:
  • the producer of the original command message copies the required state from the read model into the command message.
  • the command handler finds an identifier in the command message, and queries the read model for the required state (a sketch of this option follows the list)
  • the command handler finds an identifier in the command message, queries the event store for a history of the aggregate, and rehydrates the necessary state itself
  • the aggregate finds the identifier of the other aggregate, and queries a domain service to fetch the read model
  • the aggregate finds the identifier of the other aggregate, and queries the event store, and rehydrates as before
  • screw it, just assume that the command was already checked, run it, and detect inconsistencies rather than trying to prevent them.
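
For what it's worth, a rough sketch of the second option -- the repository and read-model interfaces, the Location stub and all of the names here are assumptions for illustration, not a real library:

using System;
using System.Threading.Tasks;

public record AddRelationship(Guid LocationId, Guid RelatedLocationId, string RelationshipType);
public record RelatedLocationDetails(Guid Id, string Country, string LocationType);

public interface ILocationRepository
{
    Task<Location> Load(Guid id);
    Task Save(Location location);
}

public interface ILocationReadModel
{
    Task<RelatedLocationDetails> GetLocation(Guid id);
}

public class AddRelationshipHandler
{
    private readonly ILocationRepository _locations;
    private readonly ILocationReadModel _readModel;

    public AddRelationshipHandler(ILocationRepository locations, ILocationReadModel readModel)
    {
        _locations = locations;
        _readModel = readModel;
    }

    public async Task Handle(AddRelationship cmd)
    {
        var location = await _locations.Load(cmd.LocationId);

        // Enrich the thin command: fetch the related location's details from
        // the read side. As noted just below, that state may be changing
        // underneath us while this command is processed.
        var related = await _readModel.GetLocation(cmd.RelatedLocationId);

        location.AddRelationship(related, cmd.RelationshipType);
        await _locations.Save(location);
    }
}

// Stub so the sketch stands on its own; the real aggregate would enforce
// the invariants before recording an event.
public class Location
{
    public void AddRelationship(RelatedLocationDetails related, string relationshipType) { }
}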

The issue, of course, is that all but the last of these are vulnerable to the problem that while you are checking the current state, somebody else could be changing the copy of that state in the book of record.


So if you've got a graph, and you want the write model to ensure that the graph is immediately consistent after each write, then the graph needs to be contained within a single aggregate boundary.


In other words, there's a trade off here -- and you need to be weighing the costs of the alternatives (see Greg Young on Set Validation: http://codebetter.com/gregyoung/2010/08/12/eventual-consistency-and-set-validation/ ).  There's no magic.


The closest thing to magic I have seen comes from analyzing the state and discovering that the aggregate boundaries don't have to be where you have placed them -- if the Location aggregate is separable into "state that must be immediately consistent with the graph" and "stuff that doesn't care about the graph at all", then maybe you have a graph aggregate (overlapping all of the locations) and a NeoLocation aggregate, which can be modified independently of each other.


 If a process manager handles this event and needs to make a decision about which command to send based on data from both locations, how does it get the data?

All of my reading suggests that process managers don't make decisions; they are finite state machines - you query the state to discover what outstanding work is to be done.  One part of the riddle is that state machines run in the past -- they are looking at events that have already happened; they don't know anything about the state of "now", only "then".  The process manager may copy state it has read in events (again, from the past) into the command it dispatches, but you wouldn't normally expect it to include data from other sources.
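
A sketch of that idea, with the states and the follow-up command invented purely for illustration (only the RelationshipAddedToLocationEvent name comes from the earlier messages):

using System;
using System.Collections.Generic;

public record RelationshipAddedToLocationEvent(Guid LocationId, Guid RelatedLocationId);
public record RelationshipConfirmed(Guid LocationId, Guid RelatedLocationId);
public record RequestConfirmation(Guid LocationId, Guid RelatedLocationId);

public class RelationshipFollowUpProcess
{
    private enum State { NotStarted, AwaitingConfirmation, Done }

    private State _state = State.NotStarted;
    private RelationshipAddedToLocationEvent _added;

    // The machine only reacts to things that have already happened; it knows
    // "then", not "now".
    public void When(RelationshipAddedToLocationEvent e)
    {
        _added = e;
        _state = State.AwaitingConfirmation;
    }

    public void When(RelationshipConfirmed e)
    {
        _state = State.Done;
    }

    // No decision beyond reading the recorded state: the data copied into the
    // dispatched command came from past events, not from other sources.
    public IEnumerable<object> PendingCommands()
    {
        if (_state == State.AwaitingConfirmation)
            yield return new RequestConfirmation(_added.LocationId, _added.RelatedLocationId);
    }
}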

One idea that may help -- command messages and event messages are part of the API; it's perfectly normal and reasonable that a command handler might need to adapt a message to match the current implementation of the aggregate.


I've read lots on the subject and the advice seems to be inconsistent and contradictory.

Yup -- I'm with you there.  Lots of people touching different parts of the elephant.

One idea to keep in a back pocket -- if you can't find a satisfactory way to model the business, it may be that either (a) you've over-constrained the business or (b) the business itself is broken, and needs to be fixed.

Last thought -- if you can be specific about the problem you are trying to solve, and can go into detail about the problem space, where your invariant comes from, and so on, this audience tends to want to dig pretty hard into that. What's a Location? Is it a digital thing or a real-world thing? Is your aggregate actually the final authority, or is it just recording messages from somewhere else? What kinds of relationships are there between Locations? What constrains the lifecycle of a relation? It's much more effective, here, to ask for a solution to a very specific problem and discover generalizations from the discussion than to try to discuss a general solution that might apply to your specific problem.
 

David Reynolds

Jan 31, 2017, 1:32:44 PM
to DDD/CQRS
Thanks for taking the time to answer.

One specific problem has to do with Locations. Locations are a representation of a physical location in the real world. In our case these are car dealerships, head offices and a few other types. There are relationships between dealerships and their head offices but not with other location types.

We have a screen to allow users to enter locations and a separate screen to add the relationships. These can be between dealerships and other dealerships, or dealerships and head offices. They are one-way relationships and contain information about the type of relationship it is and some other data. The user selects a particular location, then goes to a screen that lets them enter the other location and the data.

The command carries the locationid, relatedlocationid, relationship type, etc. The command handler loads the first location aggregate and then needs to pass details of the second? There are some business rules, such as checking that the type is a valid type (head offices cannot be linked to other head offices, dealerships can only be linked to their dealerships or head offices, etc.), checking that this relationship doesn't exist already, checking that the addresses of the locations are valid (they can't be in different countries), and so on. In order to do these checks data is required about both aggregates, but the only things on the command are the ids. As far as I know these checks should be done in the aggregate?

So if the rules pass, what data should the event that's raised in the aggregate contain? We have several projections that are interested in this event in order to update read tables. They all want different parts of the data, though. We have one report that needs both of the related locations' addresses, one that needs their ids and a display name to populate a drop-down in the UI, and one that needs info about the locations to show a list of all the relationships. Should all the data that all of these projections need be on the event? Should the aggregate emit several events? Should the projections look the data up in the read model? What if I add another projection that needs more of this data -- will I need to change the event?

The last thing is that this event is part of a process manager. When a relationship is added we need to set up some payment information and do some work in other aggregates. Which other commands we send from the process manager depends on the data. If it's a relationship between two dealerships then different commands are sent than if it was a dealership and a head office. What happens if the data required to make this decision isn't available at the time the event is raised?

Again, thanks for the reply, and any help is appreciated. Maybe I just don't get the pattern, but I definitely seem to be missing something.

Danil Suits

Feb 1, 2017, 12:09:13 AM
to DDD/CQRS
Locations are a representation of a physical location in the real world

So something to notice here -- the real world is the book of record; you've got a version of a document that describes the real world, and you are trying to enforce business rules about the consistency of this document with others. How could your domain model, running in isolation from pretty much anything, possibly judge which document contains an error when they conflict? Preventing a user from entering correct information because someone else previously entered incorrect information seems to be an inversion of priorities.

(The same idea expressed another way: I don't believe that the domain model is the right place to be trying to validate data entry).

Treating a single document as an aggregate makes some sense to me; you can reasonably detect inconsistencies within a single document.  With multiple documents, I think you want to be thinking about responding to inconsistencies detected on the read side, rather than trying to enforce consistency before the writes.

But setting that aside, I believe it's reasonable to take a "thin" command message and enrich it in the command handler before passing it along to the model (or, equivalently, having the model call back out to the command handler to ask for more information). The events emitted by the aggregate describe the change to its own state -- that is to say, the state that needs to be saved so that the next version of the aggregate can make its next decision consistent with this one. The rest of the enriched data can be discarded.
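
A sketch of what that might look like for the Location example -- the names are invented, and the checks are just the ones described earlier (same country, no duplicate relationship):

using System;
using System.Collections.Generic;

// The enriched details about the other location, copied from the read model
// by the command handler as described above.
public record RelatedLocationDetails(Guid Id, string Country, string LocationType);

// The event records only the change to this location's own state; the
// related location's address and other enrichment data are not copied in.
public record RelationshipAddedToLocationEvent(
    Guid LocationId, Guid RelatedLocationId, string RelationshipType);

public class Location
{
    private readonly Guid _id;
    private readonly string _country;
    private readonly HashSet<Guid> _relatedLocationIds = new HashSet<Guid>();

    public Location(Guid id, string country)
    {
        _id = id;
        _country = country;
    }

    public RelationshipAddedToLocationEvent AddRelationship(
        RelatedLocationDetails related, string relationshipType)
    {
        // The enriched data is consulted for the checks...
        if (related.Country != _country)
            throw new InvalidOperationException("Related locations must be in the same country");
        if (_relatedLocationIds.Contains(related.Id))
            throw new InvalidOperationException("This relationship already exists");

        // ...and then discarded: only what this aggregate needs to remember
        // for its next decision ends up in the event.
        _relatedLocationIds.Add(related.Id);
        return new RelationshipAddedToLocationEvent(_id, related.Id, relationshipType);
    }
}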

Stay wary of the fact that the consistency you are enforcing here is an illusion -- while you are making your change here, another change could be running at the same time that invalidates your local assumptions.
 
