Remodelling aggregates, changing aggregate boundaries - in an event sourced system? And general questions

urbanhusky

Jun 23, 2016, 6:18:54 AM
to DDD/CQRS
I'm trying to get a better understanding of how DDD, CQRS and ES can (or should) be applied, and I have a few somewhat interconnected questions.
I've listed the questions at the end of this post, but I'd also appreciate any pointers or information you can give me.
My background here is that I'm responsible for designing a new architecture (a strangler application for a data-centric big ball of mud; domain: ERP), and I think DDD/CQRS/ES would be a good fit for our use cases and would solve many of the issues we've seen.

General questions:
First of all, when does DDD, especially thinking in terms of aggregates etc., make no sense? I have found aggregates, bounded contexts etc. to be invaluable concepts so far.
I could see DDD being less useful if the developers had no contact with the business experts, but aggregates, bounded contexts etc. still make sense to me even when you're given no information about how the business works.

The same question can be applied to CQRS and, especially, event sourcing - at least in my current project.
My motivation is to event source everything - partly to get an audit trail, partly to help us debug issues, to be more flexible on the read side, and for performance (a big point).
We currently spend far too much time trying to figure out what the user did, or what happened to cause a specific state...

I'll try to give a brief overview of what my proposed architecture looks like, to give you some context for my next set of questions - and hopefully to get a few pointers if I've done something glaringly wrong.

General:
Events have an absolute order across all streams (Event Store's $all stream) and within each stream (a stream could also be called a topic, if you look at it as a publish/subscribe system).

Write side:
A command loads the current aggregate state (rehydration) from the aggregate's event stream (such as "Order.{guid}") and performs the action that corresponds to the command.
This is usually just calling the appropriate method, setting a property, etc. If this would violate an invariant, an exception is thrown.

Invoking methods or setting properties on the aggregate raises events. The aggregate uses these events to update its state - all invariants are enforced in the method or property setter, never in the event handler.
After the method has been invoked (or the property set), the raised events are propagated - assuming no exceptions were thrown; otherwise the current aggregate instance is discarded (and garbage collected at some point).
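
To make the write side concrete, here is a minimal sketch of the pattern I have in mind (Aggregate, Order, OrderPlaced and everything else here are hypothetical names, not tied to any framework):

    using System;
    using System.Collections.Generic;

    // Minimal event-sourced aggregate, as described above.
    public abstract class Aggregate
    {
        private readonly List<object> _uncommitted = new List<object>();

        // Rehydration: replay the stream's history through Apply.
        public void LoadFrom(IEnumerable<object> history)
        {
            foreach (var e in history) Apply(e);
        }

        // Command methods call Raise after their invariant checks pass.
        protected void Raise(object e)
        {
            Apply(e);             // state changes only via events
            _uncommitted.Add(e);  // appended to the stream when we save
        }

        protected abstract void Apply(object e);

        public IEnumerable<object> UncommittedEvents
        {
            get { return _uncommitted; }
        }
    }

    public class OrderPlaced
    {
        public OrderPlaced(Guid orderId) { OrderId = orderId; }
        public Guid OrderId { get; private set; }
    }

    public class Order : Aggregate
    {
        private bool _placed;

        public void Place(Guid orderId)
        {
            // Invariants are enforced here, in the command method...
            if (_placed) throw new InvalidOperationException("Order was already placed.");
            Raise(new OrderPlaced(orderId));
        }

        protected override void Apply(object e)
        {
            // ...never here, in the event handler.
            if (e is OrderPlaced) _placed = true;
        }
    }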

Propagating the events appends them to the corresponding stream, and subscribers may be notified about the new events.
Other bounded contexts might subscribe to our stream and transform and publish integration events into their own context (effectively acting as an ACL?).

Read side:
Projections subscribe to the entire event stream and update their read models. These read models can live in SQL, in a document database, in memory, etc. The projection's current position is stored in the same data store, so that the projection can resume if it has to be suspended (crash, restart, etc.).
Queries are run against these read models.
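
To sketch what I mean, assuming an EF context that holds both the read model tables and the projection's checkpoint row (all the names here are hypothetical):

    using System;
    using System.Data.Entity;
    using System.Linq;

    // Hypothetical EF context: the read model rows and the projection's
    // checkpoint live in the same context, so both commit in one transaction.
    public class OrderListContext : DbContext
    {
        public DbSet<OrderRow> Orders { get; set; }
        public DbSet<CheckpointRow> Checkpoints { get; set; }
    }

    public class OrderRow { public Guid Id { get; set; } }

    public class CheckpointRow
    {
        public int Id { get; set; }
        public long Position { get; set; }
    }

    public class OrderListProjection
    {
        // Called for every event from the $all subscription, with its position.
        public void Handle(long position, object e)
        {
            using (var db = new OrderListContext())
            {
                var placed = e as OrderPlaced;
                if (placed != null)
                    db.Orders.Add(new OrderRow { Id = placed.OrderId });

                // The checkpoint is updated in the same SaveChanges as the
                // read model, so a crash can never leave them out of sync
                // (assuming a single seeded checkpoint row).
                db.Checkpoints.Single().Position = position;
                db.SaveChanges();
            }
        }
    }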

If we have to change a read model, we reset the corresponding projection and its data store - for example, if there are pending Entity Framework code-first migrations, we do a complete downgrade of that context followed by a full update.
This only affects the store of that single projection (e.g. a single context in EF).
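
Concretely, the reset is just two DbMigrator calls against that context's scaffolded migrations configuration (a sketch, assuming EF6):

    using System.Data.Entity.Migrations;

    public static class ProjectionStoreReset
    {
        public static void Reset()
        {
            // Migrations.Configuration is the scaffolded migrations
            // configuration of this projection's context.
            var migrator = new DbMigrator(new Migrations.Configuration());
            migrator.Update("0"); // full downgrade: "0" targets the empty database
            migrator.Update();    // full upgrade back to the latest migration
            // ...then rewind the projection's checkpoint and let it replay.
        }
    }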

More detailed questions:

At some point we might remodel our aggregates: merge them, split them, or make other changes to our domain model.
This causes issues on the write side, because the new aggregates no longer correspond to one event stream each; instead, the events needed to reconstitute an aggregate are spread over several streams. We might not even know which streams we have to read to get the state - and we only have ordering within streams.
Looking purely at the write side, an argument could be made for publishing migrated events: transformations of the existing events into the new event types that correspond to the new aggregates.
Such an operation would be append-only, because event sourcing. For pure aggregate reconstitution this would work fine.
Granted, we would have multiple events that mean roughly the same thing (or a sub-/superset of it), but we would still keep the original events in case we need some information from them later.
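
A sketch of the transformation I have in mind (ItemCatalogueDefined, OrderableItemDefined and the stream naming are all hypothetical):

    using System;
    using System.Collections.Generic;

    // An event migrated into its new shape, routed to the stream of the
    // new aggregate it now belongs to.
    public class Routed
    {
        public Routed(string stream, object e) { Stream = stream; Event = e; }
        public string Stream { get; private set; }
        public object Event { get; private set; }
    }

    public static class CatalogueMigration
    {
        // Append-only: the old events are only read, never modified; the
        // migrated events are appended alongside the originals.
        public static IEnumerable<Routed> Migrate(object oldEvent)
        {
            var defined = oldEvent as ItemCatalogueDefined;
            if (defined == null) yield break; // other event types: handled elsewhere

            foreach (var item in defined.Items)
                yield return new Routed(
                    "OrderableItem." + item.Id,
                    new OrderableItemDefined(item.Id, item.Name));
        }
    }

    public class ItemCatalogueDefined
    {
        public List<CatalogueItem> Items { get; set; }
    }

    public class CatalogueItem
    {
        public Guid Id { get; set; }
        public string Name { get; set; }
    }

    public class OrderableItemDefined
    {
        public OrderableItemDefined(Guid id, string name) { Id = id; Name = name; }
        public Guid Id { get; private set; }
        public string Name { get; private set; }
    }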

On the read side, however, the projections run on the "all events" stream, in the order in which the events were published.
Assume we have the sequence {ItemCatalogue.1.Defined, Order.1.Placed, OrderableItem.1.Defined}, with ItemCatalogue being remodelled and OrderableItem being one of the new aggregates (ignore whether this example makes sense in terms of the chosen names, etc.).
At some point we'll have to reset some of our projections.

The projection now has to be able to handle both ItemCatalogueDefined and OrderableItemDefined events (i.e. to interpret every domain model representation we have ever had).
But what if we processed only the new versions of the events? Then our event order would break, because items would be defined after they had been ordered...
I'm ignoring potential issues with temporal queries for now, because this is enough of a headache already - but they come down to the same problem of interpreting the event order.

Let me elaborate. In my projection I might need additional data that is not available on the event - for example, more information about the customer, so that we can denormalise the order with all relevant fields.
For this I also project the related data, as if it were a more normalised read model - but instead of joining, I read the current related data when I process an event.
Generally, my approach would be to have as much information as we need on the event - not just the aggregate ids - and to build a lookup for any data the event does not carry.
The lookup is maintained by the same projection while it consumes the event stream. If I need more information about an item on an order, the ItemCatalogueDefined event that preceded the order has already updated the lookup, so the lookup has the same state as when the order was placed.
I do not want to query other read models, because their projections might be at a completely different position in the event stream.
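
A sketch of what I mean - the lookup lives inside the projection and is fed by the same stream (an OrderPlaced carrying an ItemId is my own simplification for the example):

    using System;
    using System.Collections.Generic;

    public class OrderDocumentProjection
    {
        // The lookup is fed by earlier events in the same stream; by the
        // time an OrderPlaced arrives, it holds the item exactly as it was
        // when the order was placed.
        private readonly Dictionary<Guid, string> _itemNames = new Dictionary<Guid, string>();
        private readonly List<OrderDocument> _orders = new List<OrderDocument>();

        public void Handle(object e)
        {
            var item = e as OrderableItemDefined;
            if (item != null)
                _itemNames[item.Id] = item.Name;

            var placed = e as OrderPlaced;
            if (placed != null)
                _orders.Add(new OrderDocument
                {
                    OrderId = placed.OrderId,
                    // Never query another read model; its projection might
                    // be at a completely different stream position.
                    ItemName = _itemNames[placed.ItemId]
                });
        }
    }

    public class OrderDocument
    {
        public Guid OrderId { get; set; }
        public string ItemName { get; set; }
    }

    // For this example only: OrderPlaced also carries the ItemId.
    public class OrderPlaced
    {
        public Guid OrderId { get; set; }
        public Guid ItemId { get; set; }
    }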

I'm not sure how good this approach is, though.

This causes even more issues if we change projections for documents that should not change: invoices, delivery notes, etc. Suddenly an old delivery note looks completely different from how it looked when it was generated...
What strategies are available for dealing with such issues?

The other contexts might also need to be updated to be able to deal with the new events, which means the relationship is more conformist than anti-corruption layer? (On the other hand, an ACL is itself a kind of conformist?)

So, to summarise:
  • When should I not use DDD, or aggregates? I see a benefit even if there are no invariants (yet), especially when doing event sourcing.
  • When should I not use event sourcing?
  • Are there any big issues in the outlined architecture?
  • In an event-sourced system, how can I remodel the aggregate boundaries?
  • How should I deal with missing data in projections? Do you see issues with the approach of keeping a lookup per projection?
  • How should documents be handled that should not really change, yet are generated through projections (delivery notes, invoices, etc.)?
    • Create a new projection for each new document version (and track the version in the corresponding event)? This would complicate reading, though - we'd have to know which version of a document was used when searching for it...
  • Should I publish translated events into my own context when another context publishes them? (i.e. how do I solve bounded-context integration in event sourcing?)

Danil Suits

Jun 24, 2016, 2:06:23 AM
to DDD/CQRS
Disclaimer: I'm still a novice....
 
> First of all, when does DDD, especially thinking in terms of aggregates etc., make no sense?

> The same question can be applied to CQRS

> Propagating the events appends them to the corresponding stream, and subscribers may be notified about the new events.

Persisting events and publishing events are different things. That's not obvious in what you wrote.
 
> Should I publish translated events into my own context when another context publishes them?

Taking some other context's message and publishing a different representation of it sounds nuts. If my book of record changes in response to a message from another context, then I will publish messages that describe how my book of record changed. The metadata in the events describing my changes may reference the messages published by other contexts.
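
Something like this, say (a sketch; the metadata shape and all the names are my own invention):

    using System;

    // My context reacts to a foreign message by recording its *own* fact;
    // the pointer back to the trigger lives in the metadata, not in the
    // event type.
    public class EventMetadata
    {
        public Guid CausationMessageId { get; set; } // the foreign message
        public string SourceContext { get; set; }    // e.g. "Ordering"
    }

    public interface IEventWriter
    {
        void Append(string stream, object e, EventMetadata metadata);
    }

    public class OrderPlacedMessage
    {
        public Guid MessageId { get; set; }
        public Guid ItemId { get; set; }
        public int Quantity { get; set; }
    }

    public class ItemDemandRegistered
    {
        public ItemDemandRegistered(Guid itemId, int quantity)
        {
            ItemId = itemId;
            Quantity = quantity;
        }

        public Guid ItemId { get; private set; }
        public int Quantity { get; private set; }
    }

    public class WarehousePolicy
    {
        private readonly IEventWriter _writer;

        public WarehousePolicy(IEventWriter writer) { _writer = writer; }

        public void When(OrderPlacedMessage foreign)
        {
            // The book of record changes: record that in *our* language.
            var fact = new ItemDemandRegistered(foreign.ItemId, foreign.Quantity);
            _writer.Append("ItemDemand." + foreign.ItemId, fact, new EventMetadata
            {
                CausationMessageId = foreign.MessageId,
                SourceContext = "Ordering"
            });
        }
    }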


> This causes issues on the write side, because the new aggregates no longer correspond to one event stream each; instead, the events needed to reconstitute an aggregate are spread over several streams. We might not even know which streams we have to read to get the state - and we only have ordering within streams.

"The streams are an illusion, Exile." There are reasons to write events into different streams, but correctness isn't one of them.

That said, as far as I know, nobody ever said the "read model" couldn't be another event store. Get a process going that reads V1.$all and figures out how to write each event/sequence into V2.store, with the V2 projections subscribing to that store - and when you're caught up, the last one out turns off the lights and you cut over? Not sure what happens with the event metadata.
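
In pseudo-C#, roughly (ReadAllForward/Append stand in for whatever store client you use, and the transform is your event migration):

    using System;
    using System.Collections.Generic;

    public class Recorded
    {
        public object Event { get; set; }
    }

    public class Migrated
    {
        public string Stream { get; set; }
        public object Event { get; set; }
    }

    public interface IEventStore
    {
        IEnumerable<Recorded> ReadAllForward(); // V1.$all in global order
        void Append(string stream, object e);
    }

    public static class StorePump
    {
        // The V1 store is only read, never modified; cut over to V2 once
        // the pump has caught up.
        public static void Pump(
            IEventStore v1,
            IEventStore v2,
            Func<object, IEnumerable<Migrated>> transform)
        {
            foreach (var recorded in v1.ReadAllForward())
                foreach (var m in transform(recorded.Event))
                    v2.Append(m.Stream, m.Event);
        }
    }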


urbanhusky

Jun 24, 2016, 3:09:40 AM
to DDD/CQRS
>> Propagating the events appends them to the corresponding stream, and subscribers may be notified about the new events.

> Persisting events and publishing events are different things. That's not obvious in what you wrote.

These aren't different concepts in Event Store (geteventstore.com). Event Store is both an event store and a pub/sub - which works out great for us, because we can't deploy RabbitMQ etc. - and it's atomic, iirc.
 
 
>> Should I publish translated events into my own context when another context publishes them?

> Taking some other context's message and publishing a different representation of it sounds nuts. If my book of record changes in response to a message from another context, then I will publish messages that describe how my book of record changed. The metadata in the events describing my changes may reference the messages published by other contexts.

(Note: I mean translation as in "integration", not as in i18n/l10n.)
I would argue that I have to translate them, because the terms and processes in the other bounded context have different meanings - i.e. the reason we have BCs in the first place is to use the proper language in each context.
An order in the order context could translate to simple demand for items in the stock/warehouse context. You don't even have the same aggregates in the different contexts, so I don't see how this could be done without translation/integration at the context's border.
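
As a sketch of the translation I mean (all names hypothetical):

    using System;

    // ACL-style translator on the border of the warehouse context: it
    // listens to the ordering context's integration events and restates
    // them in the warehouse's own language. An "order" over there is just
    // "demand" over here.
    public class OrderingTranslator
    {
        public object Translate(object foreign)
        {
            var placed = foreign as OrderPlacedIntegrationEvent;
            if (placed != null)
                return new ItemDemandRegistered(placed.ItemId, placed.Quantity);

            return null; // messages this context doesn't care about are ignored
        }
    }

    public class OrderPlacedIntegrationEvent
    {
        public Guid ItemId { get; set; }
        public int Quantity { get; set; }
    }

    public class ItemDemandRegistered
    {
        public ItemDemandRegistered(Guid itemId, int quantity)
        {
            ItemId = itemId;
            Quantity = quantity;
        }

        public Guid ItemId { get; private set; }
        public int Quantity { get; private set; }
    }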
 

>> This causes issues on the write side, because the new aggregates no longer correspond to one event stream each; instead, the events needed to reconstitute an aggregate are spread over several streams. We might not even know which streams we have to read to get the state - and we only have ordering within streams.

> "The streams are an illusion, Exile." There are reasons to write events into different streams, but correctness isn't one of them.

> That said, as far as I know, nobody ever said the "read model" couldn't be another event store. Get a process going that reads V1.$all and figures out how to write each event/sequence into V2.store, with the V2 projections subscribing to that store - and when you're caught up, the last one out turns off the lights and you cut over? Not sure what happens with the event metadata.

It seems a bit expensive/extensive to build up a whole new event store just to reorder the events. It would also let us manipulate events almost freely, which will end up being abused.