Are unique command and event IDs necessary?

Michael Ainsworth

unread,

Jul 14, 2016, 7:20:03 AM

to DDD/CQRS

I've implemented an event store in a relational database. Each event has a unique UUID identifying it (the primary key), and there is a unique constraint on the aggregate_type, aggregate_uuid and aggregate_version columns. This latter constraint is for optimistic concurrency: a violation indicates that two processes were trying to save the same aggregate at the same time.
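A minimal sketch of the schema described above, using Python's built-in sqlite3 module as a stand-in for the relational database. The table name `events`, the `payload` column and the `append` helper are assumptions for illustration. Two writers that both loaded the aggregate at version 0 try to store version 1, and the unique constraint rejects the second.

```python
import sqlite3
import uuid

# Hypothetical schema mirroring the columns described above; the table name
# "events" and the payload column are assumptions.
SCHEMA = """
CREATE TABLE events (
    event_uuid        TEXT PRIMARY KEY,
    aggregate_type    TEXT NOT NULL,
    aggregate_uuid    TEXT NOT NULL,
    aggregate_version INTEGER NOT NULL,
    payload           TEXT NOT NULL,
    UNIQUE (aggregate_type, aggregate_uuid, aggregate_version)
)
"""


def append(conn, aggregate_type, aggregate_uuid, version, payload):
    """Append one event at `version`; return False on a concurrency conflict."""
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            conn.execute(
                "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
                (str(uuid.uuid4()), aggregate_type, aggregate_uuid, version, payload),
            )
        return True
    except sqlite3.IntegrityError:
        # Another writer already stored an event for this aggregate at this version.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
order_id = str(uuid.uuid4())

# Both processes loaded the aggregate at version 0 and try to write version 1.
first = append(conn, "Order", order_id, 1, '{"type": "OrderPlaced"}')
second = append(conn, "Order", order_id, 1, '{"type": "OrderCancelled"}')
print(first, second)  # True False
```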

However, apart from the use of correlation and causation IDs, it is redundant to have two unique identifiers. In a simple scenario, what are the drawbacks of removing the event UUID?

Greg Young

unread,

Jul 14, 2016, 7:24:10 AM

to ddd...@googlegroups.com

Why is aggregate type part of your key?

Usually the UUID "message id" on commands and events is used for idempotency.
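A minimal in-memory sketch of message-ID idempotency, assuming the sender assigns a `message_id` once and reuses it on every retry of the same request. `CancelOrder` and `CommandHandler` are hypothetical names. In a real store, the set of processed IDs would need to be written atomically with the resulting events.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CancelOrder:
    # Hypothetical command; message_id is assigned once by the sender and
    # reused on every retry of the same logical request.
    message_id: uuid.UUID
    order_id: str


@dataclass
class CommandHandler:
    processed: set = field(default_factory=set)  # message IDs already handled
    emitted: list = field(default_factory=list)  # stand-in for the event store

    def handle(self, cmd: CancelOrder) -> bool:
        """Apply the command once; return False if it was a duplicate."""
        if cmd.message_id in self.processed:
            return False
        self.emitted.append(("OrderCancelled", cmd.order_id))
        self.processed.add(cmd.message_id)
        return True


handler = CommandHandler()
cmd = CancelOrder(message_id=uuid.uuid4(), order_id="ABC")
handler.handle(cmd)
handler.handle(cmd)  # network retry: same message_id, so nothing new is emitted
print(len(handler.emitted))  # 1
```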

> --
> You received this message because you are subscribed to the Google Groups "DDD/CQRS" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to dddcqrs+u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
Studying for the Turing test

Danil Suits

unread,

Jul 14, 2016, 10:43:36 AM

to DDD/CQRS

> However, apart from the use of correlation and causation IDs, it is redundant to have two unique identifiers

Is it? It looks to me as though they identify different things: the event has an identifier, and the position in the stream has an identifier. If you elide that distinction because the two happen to be interchangeable in your current design, are you sure you aren't overfitting?

If we started from a single event-sourced model (one history, one stream of events) and later decided to decompose that model into aggregates with a separate stream for each, would we expect the events to change?

If process managers are keeping track of which events they have processed, and you change the streams, how happy are you with a position-in-stream identifier?

What about compensating events?

My feeling is that events are real, in the sense that they are part of the domain, but streams are not real; streams are an artifact of persistence. Using the IDs of the latter to identify the former strikes me as a case of false equivalence.
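A minimal sketch of the decomposition scenario above, with hypothetical event and stream names: the same three events are stored first in one stream, then split into one stream per aggregate. Each event keeps its own ID, but its (stream, position) identifier changes.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: uuid.UUID  # identity of the domain event itself
    subject: str         # the (hypothetical) thing the event is about
    name: str


events = [
    Event(uuid.uuid4(), "order-1", "OrderPlaced"),
    Event(uuid.uuid4(), "order-2", "OrderPlaced"),
    Event(uuid.uuid4(), "order-1", "OrderShipped"),
]

# Original design: one history, one stream. Position = index in that stream.
single_stream = {("all", pos): e for pos, e in enumerate(events)}

# Refactored design: one stream per aggregate, so positions are renumbered.
per_aggregate = defaultdict(list)
for e in events:
    per_aggregate[e.subject].append(e)
split_streams = {
    (subject, pos): e
    for subject, stream in per_aggregate.items()
    for pos, e in enumerate(stream)
}

old_positions = {e.event_id: key for key, e in single_stream.items()}
new_positions = {e.event_id: key for key, e in split_streams.items()}

# Every event keeps its ID, but its (stream, position) identifier changed.
shipped = events[2]
print(old_positions[shipped.event_id], new_positions[shipped.event_id])
# ('all', 2) ('order-1', 1)
```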

Michael Ainsworth

unread,

Jul 14, 2016, 6:50:17 PM

to DDD/CQRS

The aggregate type is so that I can have two aggregates of different types with the same UUID. E.g., if there is an Order aggregate and an OrderProcessManager aggregate, they'll have the same UUID. A textual aggregate type allows for correlation between related aggregates.

Regarding idempotency, the version number caters for that. Commands also become idempotent when a version number is used.
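A minimal in-memory sketch, with hypothetical names, of the expected-version check described above: each command carries the version it was issued against, so a retried command is not appended a second time. Note that in this sketch the retry is rejected with the same `ConcurrencyError` that a genuinely concurrent writer would get.

```python
from dataclasses import dataclass, field


class ConcurrencyError(Exception):
    pass


@dataclass
class Stream:
    events: list = field(default_factory=list)

    @property
    def version(self) -> int:
        return len(self.events)

    def append(self, expected_version: int, event: str) -> None:
        # The command carries the version the client last saw; a mismatch
        # means something was appended in between (possibly this same command).
        if expected_version != self.version:
            raise ConcurrencyError(f"expected {expected_version}, at {self.version}")
        self.events.append(event)


stream = Stream()
stream.append(0, "OrderCancelled")
try:
    stream.append(0, "OrderCancelled")  # the same command retried
except ConcurrencyError as exc:
    print("rejected:", exc)  # rejected: expected 0, at 1
print(stream.events)  # ['OrderCancelled']
```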

Michael Ainsworth

unread,

Jul 14, 2016, 6:55:01 PM

to DDD/CQRS

Wouldn't splitting an aggregate in two require some kind of transformation/migration process anyway?

Greg Young

unread,

Jul 15, 2016, 3:59:14 AM

to ddd...@googlegroups.com

"The aggregate type is so that I can have two aggregates of different
types with the same UUID. E.g., if there is an Order aggregate and an
OrderProcessManager aggregate, they'll have the same UUID. A textual
aggregate type allows for correlation between related aggregates."

What makes this sound like a good idea? Why even use uuids at this point?

Michael Ainsworth

unread,

Jul 15, 2016, 8:38:26 AM

to DDD/CQRS

I'm coming from an SQL background, so I'll try to explain my thought process.

There are two issues here. Firstly, aggregate type identification.

A table "users" has a unique integer identifier (serial/autoincrement). A table "orders" also has a unique integer identifier (again, serial/autoincrement). A user and an order can have the same integer identifier, but they are located in different tables because they have different data structures and invariants on those data structures, so there is no collision.

When moving to event sourcing, all the data is placed in a central event log so that each aggregate's history can be replayed. If the application can't tell which type of object (a "user" or an "order") a set of events belongs to, how can the events be applied to it? E.g., say I have a "user" aggregate with an ID (not a UUID, for brevity) of "ABC" and a single event, "user registered". If a client issues a "cancel order ABC" command, the command handler will ask the order repository to replay all events for aggregate "ABC". But "ABC" is not an order; it is a user.
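A minimal in-memory sketch of the scenario above, with hypothetical class names: the store is keyed by (aggregate type, ID), so a "cancel order ABC" command sent with a user's ID finds no Order stream, whereas a lookup by ID alone would return the user's history.

```python
from collections import defaultdict

# Hypothetical in-memory event store keyed by (aggregate_type, aggregate_id),
# mirroring the composite key described above.
store = defaultdict(list)
store[("User", "ABC")].append("UserRegistered")


class AggregateNotFound(LookupError):
    pass


class Order:
    def __init__(self) -> None:
        self.cancelled = False

    def apply(self, event: str) -> None:
        if event == "OrderCancelled":
            self.cancelled = True


class OrderRepository:
    def load(self, order_id: str) -> Order:
        # .get() avoids creating an empty stream as a side effect of the lookup.
        events = store.get(("Order", order_id), [])
        if not events:
            raise AggregateNotFound(f"no Order with id {order_id!r}")
        order = Order()
        for event in events:
            order.apply(event)
        return order


# "cancel order ABC" sent with a user's ID: the typed lookup finds nothing.
try:
    OrderRepository().load("ABC")
except AggregateNotFound as exc:
    print(exc)  # no Order with id 'ABC'

# Looking up by ID alone would have returned the user's history instead.
by_id_only = [e for (_, agg_id), evs in store.items() if agg_id == "ABC" for e in evs]
```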

The second issue is correlation between aggregates of different types. E.g., say that the cancellation of an order does three things: 1) the "order" aggregate is cancelled (duh); 2) a "sorry you didn't order, how can we improve this?" email survey is sent using a process manager; 3) a "lead" aggregate is created (also a process manager), allowing staff to follow up with the customer.

The cancellation of the order triggers the creation of two other aggregates (both process managers). If aggregate UUIDs must be unique even across aggregate type boundaries, we can randomly generate the UUID, but that does not provide a correlation between the aggregates' primary unique identifiers (their UUIDs). In this scenario, we'd have to load related aggregates (e.g., the "lead" from the cancelled "order") by using the lead's "order UUID" property. Alternatively, we could generate the lead UUID deterministically. This correlation allows an easy one-way translation from the (originally random) order UUID to the (deterministically generated, based on the order) lead UUID. A third option is to allow two aggregates to have the same UUID provided that they are not of the same type, allowing a more direct correlation.
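The deterministic option can be sketched with a name-based UUID (version 5, SHA-1), which Python's standard `uuid` module provides as `uuid.uuid5(namespace, name)`. Using the order UUID as the namespace and a per-type name such as `"Lead"` is an assumption for illustration. The mapping is one-way: the lead UUID can be computed from the order UUID, but not the reverse.

```python
import uuid


def derived_id(order_id: uuid.UUID, aggregate_type: str) -> uuid.UUID:
    """Derive a stable, type-specific UUID from the order's UUID (UUIDv5).

    The per-type names passed in below are hypothetical conventions.
    """
    return uuid.uuid5(order_id, aggregate_type)


order_id = uuid.uuid4()  # random, as in the original scenario
lead_id = derived_id(order_id, "Lead")
survey_id = derived_id(order_id, "CancellationSurvey")

# Deterministic: the same order always maps to the same lead.
assert lead_id == derived_id(order_id, "Lead")
# Distinct per type, so the IDs stay unique across aggregate types.
assert len({order_id, lead_id, survey_id}) == 3
```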

There are a number of scenarios where the same ID is used to identify aggregates of different types. E.g., a customer might be identified as "SPR16-40789". This customer identifier has their registration sale period ("Spring 2016" or "SPR16") embedded in it. Their invoices might be identified as "SPR16-40789-12", again with the customer number embedded in the invoice number. Such "correlated primary identifiers" can get very ugly (formatting issues, changing conventions used by the business, etc.), but their best quality is that you can look at one object and quickly identify related objects. E.g., you can look at an invoice and know which customer it was for and the sale period in which the customer first registered.
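A minimal sketch of reading related identifiers out of a correlated ID like the ones above. The regular expression assumes a hypothetical format of a three-letter, two-digit sale-period code, a numeric customer number, and an optional invoice sequence; a real business convention would likely differ.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for IDs such as "SPR16-40789" and "SPR16-40789-12":
# a sale-period code, a customer number, and an optional invoice sequence.
ID_PATTERN = re.compile(r"^(?P<period>[A-Z]{3}\d{2})-(?P<customer>\d+)(?:-(?P<seq>\d+))?$")


class ParsedId(NamedTuple):
    period: str
    customer_id: str
    invoice_seq: Optional[int]


def parse(identifier: str) -> ParsedId:
    m = ID_PATTERN.match(identifier)
    if m is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    seq = m.group("seq")
    return ParsedId(
        period=m.group("period"),
        # The customer ID is the invoice ID minus its trailing sequence number.
        customer_id=f'{m.group("period")}-{m.group("customer")}',
        invoice_seq=int(seq) if seq is not None else None,
    )


invoice = parse("SPR16-40789-12")
print(invoice.customer_id, invoice.period, invoice.invoice_seq)  # SPR16-40789 SPR16 12
```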

Just because UUIDs have "universal" in the name, that doesn't mean that we have to treat them universally, like some kind of global variable.

I hope the above explains my reasoning. I look forward to your reply.

Greg Young

unread,

Jul 15, 2016, 8:40:26 AM

to ddd...@googlegroups.com

"When moving to event sourcing, all the data is placed in a central
event log to replay each aggregate's history. If the application can't
tell which type of object (a "user" or an "order") a set of events
belongs to, then how can the events be applied to them? E.g., say I
have a "user" aggregate with an ID (not UUID for brevity) of "ABC" a
single event "user registered". If a client issues a "cancel order
ABC" command, the command handler will ask the order repository to
replay all events for aggregate "ABC". Only "ABC" is not an order, but
a user."

Why is the UI issuing a command to cancel the order when ABC is a user?

On Fri, Jul 15, 2016 at 3:38 PM, Michael Ainsworth

Michael Ainsworth

unread,

Jul 15, 2016, 8:41:20 AM

to DDD/CQRS

Just to be clear, the scenarios I was talking about are primarily in legacy systems.

Michael Ainsworth

unread,

Jul 15, 2016, 8:44:25 AM

to DDD/CQRS

Because a script kiddie is trying to see how the system works. Because a third-party application developer has a bug. Because a bot is evaluating JavaScript in your web application and is randomly submitting forms.

I think the golden rule "never trust user input" still applies with CQRS/ES.

Greg Young

unread,

Jul 15, 2016, 8:47:50 AM

to ddd...@googlegroups.com

OK, so what would happen when your order tries to load from the events?

I really don't get your use case or worries here.

Something you are not considering is that there may be more than one
type of aggregate that can load from the same stream. A perfect
example of this:
http://codebetter.com/gregyoung/2010/03/09/state-pattern-misuse/
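A minimal sketch of the idea that more than one aggregate type can load from the same stream, with hypothetical class and event names. Here the class is chosen from the history rather than from a stored type column. It does not attempt to reproduce the example in the linked post.

```python
from dataclasses import dataclass
from typing import Iterable, List, Union

# One (hypothetical) stream for a single order.
stream = ["OrderPlaced", "OrderShipped"]


@dataclass
class PendingOrder:
    """Behaviour for an order that can still be cancelled."""
    placed: bool = False

    @classmethod
    def from_events(cls, events: Iterable[str]) -> "PendingOrder":
        order = cls()
        for e in events:
            if e == "OrderPlaced":
                order.placed = True
        return order

    def cancel(self) -> str:
        return "OrderCancelled"


@dataclass
class ShippedOrder:
    """Behaviour for an order that has shipped and can only be returned."""
    shipped: bool = False

    @classmethod
    def from_events(cls, events: Iterable[str]) -> "ShippedOrder":
        order = cls()
        for e in events:
            if e == "OrderShipped":
                order.shipped = True
        return order

    def request_return(self) -> str:
        return "ReturnRequested"


def load_order(events: List[str]) -> Union[PendingOrder, ShippedOrder]:
    # Pick the aggregate class from the history, not from a stored type column.
    if "OrderShipped" in events:
        return ShippedOrder.from_events(events)
    return PendingOrder.from_events(events)


order = load_order(stream)
print(type(order).__name__)  # ShippedOrder
```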

On Fri, Jul 15, 2016 at 3:44 PM, Michael Ainsworth

Danil Suits

unread,

Jul 15, 2016, 9:35:41 AM

to DDD/CQRS

> Wouldn't splitting an aggregate in two require some kind of transformation/migration process anyway?

Sure. But don't we want the transformation to have finite scope?

Consider - my process is running in a separate bounded context from yours. I'm subscribed to your events. To prevent duplicate processing on my side, I need to track which of your events I have already seen. Since, in general, there may be more than one event with the same business data in it, I need a unique identifier to be able to tell which is which. To ensure the reliability of my process, my tracking of your events is made durable by writing it to my database.

So far in the story, everything is happy and normal; your microservice in your bounded context has your data durably written in your database, my microservice in my bounded context has my data durably written in my database.

Then your requirements change, and you need to refactor an aggregate. In doing so, the identifier for your events changes. And as a consequence, I need to migrate *my* database?

That doesn't seem right.

Put another way, I'm subscribing to the domain events that you publish -- that's part of your public interface. The fact that your aggregates are event sourced, rather than simply dumped into some O/RM, is one of your implementation details that I shouldn't need to know about.
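A minimal sketch of the subscriber side of this story, using sqlite3 as a stand-in for "my database". The table names and the `on_order_placed` handler are hypothetical. The subscriber records each publisher event ID in the same transaction as its own side effect, so a redelivered event is skipped. Because the key is the event's own ID rather than a (stream, position) pair, refactoring the publisher's streams would not invalidate this table.

```python
import sqlite3
import uuid

# Hypothetical subscriber in another bounded context. It records the IDs of
# events it has processed in its own database, in the same transaction as
# its own side effects.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE invoices (order_id TEXT)")


def on_order_placed(event_id: uuid.UUID, order_id: str) -> bool:
    """Handle an OrderPlaced event at most once; return False for duplicates."""
    with conn:  # one transaction for the tracking row and the side effect
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (str(event_id),),
        )
        if cur.rowcount == 0:  # already seen: skip the side effect
            return False
        conn.execute("INSERT INTO invoices (order_id) VALUES (?)", (order_id,))
        return True


evt = uuid.uuid4()
on_order_placed(evt, "order-1")
on_order_placed(evt, "order-1")  # redelivery of the same event
count = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
print(count)  # 1
```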

Michael Ainsworth

unread,

Jul 15, 2016, 5:02:51 PM

to DDD/CQRS

Thank you. I'll read the article.

Michael Ainsworth

unread,

Jul 15, 2016, 11:21:23 PM

to DDD/CQRS

The argument regarding coupling between bounded contexts is persuasive. Thank you for taking the time to write an illustrative response.

I guess my perspective was influenced by the way I was designing my bounded contexts: instead of the "secondary" bounded context observing the events from the "primary", the primary would send commands to the secondary, essentially inverting the dependency.

Michael Ainsworth

unread,

Jul 15, 2016, 11:26:03 PM

to DDD/CQRS

After more reading, I think the object-oriented perspective is influencing me here. From a purely functional perspective, an "aggregate type" doesn't make much sense, but from an object-oriented perspective, the "aggregate type" indicates the category/class of the object.

Reply all

Reply to author

Forward