EventStore partitioning - per aggregate type?


Werner Clausen

Dec 22, 2011, 8:34:45 AM
to ddd...@googlegroups.com
Hi,
 
I remember reading somewhere that an EventStore should be transactional per aggregate type (I can't find the reference). This suggests you would have one table, "Order_Events", that only holds order events. A file-based store would probably be even more partitioned, with one file per aggregate. And popular implementations like JOlivier's "EventStore" seem to either keep all events in one single table, or split into one aggregate type per database.
 
There are 2 reasons for partitioning as I see it. The first is technical: the more partitioned, the less locking, so to speak. The other is that you want to select from the EventStore per aggregate type (e.g. "give me all orders"). But would you ever be interested in selecting events per aggregate type? I'm not even sure that is a valid reason.
 
Please give me your opinion on this matter. Perhaps you are using JOlivier's EventStore and could share how you have partitioned your solution?
 
--
Werner
 

Greg Young

Dec 22, 2011, 8:38:13 AM
to ddd...@googlegroups.com
You can't find this written anywhere because it shouldn't be that way.
The aggregate ID is the partition point. You should have one table for all
event types.
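This one-table layout can be sketched in a few lines (a minimal illustration using SQLite; table, column and function names are my own, not from any particular EventStore implementation). The (aggregate_id, version) primary key is the partition point: appends and reads never need an aggregate type.

```python
import sqlite3

# A minimal sketch, assuming a relational store: ONE table for ALL event
# types, keyed by (aggregate_id, version). Names are illustrative, not
# from any particular EventStore implementation.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        aggregate_id TEXT    NOT NULL,
        version      INTEGER NOT NULL,
        event_type   TEXT    NOT NULL,
        payload      TEXT    NOT NULL,
        PRIMARY KEY (aggregate_id, version)
    )
""")

def append(aggregate_id, expected_version, event_type, payload):
    # The primary key doubles as an optimistic-concurrency check: a
    # concurrent writer appending the same version hits a constraint error.
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        (aggregate_id, expected_version + 1, event_type, payload),
    )

def load(aggregate_id):
    # Rebuilding an aggregate selects by ID only; no aggregate type needed.
    return conn.execute(
        "SELECT event_type, payload FROM events"
        " WHERE aggregate_id = ? ORDER BY version",
        (aggregate_id,),
    ).fetchall()

# Events of different aggregate types live side by side in one table.
append("order-1", 0, "OrderPlaced", '{"total": 100}')
append("order-1", 1, "OrderShipped", "{}")
append("customer-7", 0, "CustomerRegistered", "{}")
```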

--
Le doute n'est pas une condition agréable, mais la certitude est absurde.

Nils Kilden-Pedersen

Dec 22, 2011, 8:53:46 AM
to ddd...@googlegroups.com
On Thu, Dec 22, 2011 at 7:38 AM, Greg Young <gregor...@gmail.com> wrote:
You can't find this written anywhere because it shouldn't be that way.
The aggregate ID is the partition point. You should have one table for all
event types.

Are you saying it's unnecessary or wrong?

I currently have my events stored per aggregate type (one MongoDB collection per type) because it maps well to the repositories. While I know that it's not strictly necessary, it seems to work better and gives me one less thing to select on (the aggregate type). Is there any reason I shouldn't be doing that?

Greg Young

Dec 22, 2011, 8:55:41 AM
to ddd...@googlegroups.com
why would you need to select on aggregate type to get the events for a
given aggregate?

--

Rinat Abdullin

Dec 22, 2011, 8:57:34 AM
to ddd...@googlegroups.com
Unnecessary, as long as your aggregate IDs are globally unique across all aggregate types.

Rinat

Werner Clausen

Dec 22, 2011, 8:58:12 AM
to ddd...@googlegroups.com
Thanks Greg, that explains JOlivier's implementation better.
It is also easier to grasp as a novice: just one table for everything.
 
As a side note: except for aggregate snapshotting, would (or should) I ever come across the requirement to select per aggregate type from an EventStore (i.e. "give me all orders")? Not from a repository point of view, I know. But from IT/OPS (replaying events, etc.)?
 
--
Werner

Nils Kilden-Pedersen

Dec 22, 2011, 9:08:04 AM
to ddd...@googlegroups.com
On Thu, Dec 22, 2011 at 7:55 AM, Greg Young <gregor...@gmail.com> wrote:
why would you need to select on aggregate type to get the events for a
given aggregate?

In cases where the event store acts as a queue, i.e. for replay. Probably for efficiency reasons, I've chosen to allow event listeners to segregate per AR type, to avoid replaying unwanted events.

Nils Kilden-Pedersen

Dec 22, 2011, 9:08:47 AM
to ddd...@googlegroups.com
On Thu, Dec 22, 2011 at 7:57 AM, Rinat Abdullin <rinat.a...@gmail.com> wrote:
Unnecessary, as long as your aggregate IDs are globally unique between all aggregate types.

Not all replay is ID-specific.

Greg Young

Dec 22, 2011, 9:09:34 AM
to ddd...@googlegroups.com
event handlers subscribe to AR type? :)

--

Greg Young

Dec 22, 2011, 9:17:05 AM
to ddd...@googlegroups.com
In other words, I have a handler Handles&lt;SomethingHappened&gt;; you want to
replay this for only one type of aggregate?

Dan Normington

Dec 22, 2011, 9:21:06 AM
to ddd...@googlegroups.com
Not sure if this is considered wrong, but I've created a different store per BC. The reason I do this is to eliminate processing of events during aggregate hydration that don't apply. Effectively I have aggregate IDs that span BCs, and, hypothetically, I don't want to load 10 events for hydration when I really only care about 2 or 3 of them.

Greg Young

Dec 22, 2011, 9:22:19 AM
to ddd...@googlegroups.com
There are pros and cons to this. It is a perfectly acceptable thing to do.

Nils Kilden-Pedersen

Dec 22, 2011, 9:35:38 AM
to ddd...@googlegroups.com
(If it's not clear already, I should preface that this is my first CQRS project and it's meant as a learning experience for me, so please consider it as such.)

I'm still working through this project, but I have partitioned my events by aggregate type, to more easily distinguish the origination of the event. 

Furthermore, it seems to me, without having a particular use case at hand, that in an event-centric architecture there will be lots of event handlers that are only interested in a subset of the events. Given that assumption, it seems natural to split the events per aggregate type, if nothing else to alleviate network traffic.

So while I understand that this may be strictly unnecessary, I hope I'm not shooting myself in the foot by this approach.

Rinat Abdullin

Dec 22, 2011, 9:36:50 AM
to ddd...@googlegroups.com
Please differentiate between event stores that are used to "power" the write side of AR+ES and the stores that are used for other purposes.

AR+ES stores can be (and generally are) partitioned by aggregate IDs. They are used only for rebuilding these aggregates from their history.

For rebuilding views (replaying events towards projections) I tend to use completely different event stores, with different partitioning schemas. For instance, consider this scenario:

system A: account management app with a web UI. It will have 

* stream-1 .... stream-N stores - used to persist AR+ES entities 1...N (of different aggregate types)
* stream-domain-log - used to persist a copy of all events from stream-1... stream-N PLUS all commands that went through. It is used for audit, ad-hoc reporting and replays for projections

system X: global dashboard for the entire company (aggregating multiple systems A, B, C etc):

* stream-from-A -  stream of events that have been delivered from system A
* stream-from-B - stream of events from B

In short, not all types of replays have to happen against the same store with the same partitioning schema (that would just make things more complicated at larger scale)
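The two-tier layout in system A can be sketched as a toy in-memory model (function and variable names are mine): each AR+ES entity gets its own stream, used only to rebuild that aggregate, while a copy of every event also lands in a single domain log used for audit, ad-hoc reporting and projection replays. (The real domain log additionally records commands, omitted here for brevity.)

```python
from collections import defaultdict

# Illustrative sketch: per-aggregate write-side streams plus one shared
# domain log; real stores would be append-only files or blobs.
aggregate_streams = defaultdict(list)  # stream-1 ... stream-N
domain_log = []                        # stream-domain-log

def publish(aggregate_id, event):
    aggregate_streams[aggregate_id].append(event)  # write-side store
    domain_log.append((aggregate_id, event))       # audit / replay copy

publish("acct-1", "AccountOpened")
publish("acct-2", "AccountOpened")
publish("acct-1", "FundsDeposited")
```

Rebuilding one aggregate reads only its own stream; rebuilding a projection replays the domain log end to end.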

Best,
Rinat

Rinat Abdullin

Dec 22, 2011, 9:38:21 AM
to ddd...@googlegroups.com
 it seems natural to split the events per aggregate type, if nothing else, to alleviate network traffic.

I suggest delaying this optimization until you actually hit performance problems here.

Best,
Rinat

Nils Kilden-Pedersen

Dec 22, 2011, 9:44:35 AM
to ddd...@googlegroups.com
On Thu, Dec 22, 2011 at 8:36 AM, Rinat Abdullin <rinat.a...@gmail.com> wrote:
Please differentiate between event stores that are used to "power" the write side of AR+ES and the stores that are used for other purposes.

AR+ES stores can be (and generally are) partitioned by aggregate IDs. They are used only for rebuilding these aggregates from their history.

For rebuilding views (replaying events towards projections) I tend to use completely different event stores, with different partitioning schemas. For instance, consider this scenario:

system A: account management app with a web UI. It will have 

* stream-1 .... stream-N stores - used to persist AR+ES entities 1...N (of different aggregate types)
* stream-domain-log - used to persist a copy of all events from stream-1... stream-N PLUS all commands that went through. It is used for audit, ad-hoc reporting and replays for projections

system X: global dashboard for the entire company (aggregating multiple systems A, B, C etc):

* stream-from-A -  stream of events that have been delivered from system A
* stream-from-B - stream of events from B

In short, not all types of replays have to happen against the same store with the same partitioning schema (that would just make things more complicated at larger scale)

This is an interesting perspective, not one I've seen before. It seems a bit of an overkill for a normal isolated project, but I can see how it might be needed in a more mature organization.

Rinat Abdullin

Dec 22, 2011, 9:48:33 AM
to ddd...@googlegroups.com
Actually, I'm using this approach even for small projects that are supposed to be isolated from the rest of the world.

Store "specialization" really pays off in the simplicity of building blocks and the overall solution (plus the maintenance story).

Although I've seen developers shrug at the notion of duplicating data between different streams ("as if CQRS duplication was not enough").

Best,
Rinat

Nils Kilden-Pedersen

Dec 22, 2011, 9:49:24 AM
to ddd...@googlegroups.com
On Thu, Dec 22, 2011 at 8:38 AM, Rinat Abdullin <rinat.a...@gmail.com> wrote:
 it seems natural to split the events per aggregate type, if nothing else, to alleviate network traffic.

I suggest delaying this optimization until you actually hit performance problems here.

That's normally prudent advice, but I'm not sure it fits in my case. While I did write "if nothing else", the real reason I did it was to partition the events by aggregate type, so it came naturally to use separate event stores. I'm prepared to change that if someone can give me a good reason why this will be problematic.

Nils Kilden-Pedersen

Dec 22, 2011, 9:55:04 AM
to ddd...@googlegroups.com
On Thu, Dec 22, 2011 at 8:48 AM, Rinat Abdullin <rinat.a...@gmail.com> wrote:
Actually I'm using this approach even for small projects that are supposed to be isolated from the rest of the world.

Store "specialization" really pays off in the simplicity of building blocks and the overall solution (plus the maintenance story).

Although I've seen developers shrug at the notion of duplicating data between different streams ("as if CQRS duplication was not enough").

Yeah, those were my thoughts as well :-)

Seriously though, while I understand that there's potential value in storing more information (commands), I fail to see why you need separate event stores for audit/reporting/projections. Doesn't that just mean that your AR rebuild events are not rich enough?

Rinat Abdullin

Dec 22, 2011, 10:09:07 AM
to ddd...@googlegroups.com
Reasons:

1. My stores are dead simple and massively scalable (append-only files/blobs, one file per stream). I don't want to bother writing logic to enumerate all these files, read the events, and somehow order them by time. This becomes even more important when you are dealing with cloud storage (where latency can be the killing factor). Besides, this way there is no need to bother with any database (even a NoSQL one).

2. In the distributed world, not all commands/events belong to AR+ES entities. Some come from stateless services (number crunching, email sending, 3rd-party systems integration, etc.). Imagine a case where you have 10-100 servers consuming **stateless** workload in parallel from 3 queues. Their messages would normally not be available for audit (because they do not belong to any single AR+ES entity) unless they are aggregated and appended to the domain log.

3. AR+ES event streams store events in logical envelopes (one envelope per transaction, along with contextual logs and causality information). Domain logs not only store events individually (with additional out-of-band data), they also keep all sorts of commands and system information (i.e. empty envelopes with configuration changes in headers). From a practical standpoint it is much easier when these logical models are isolated in separate stores.

Best,
Rinat

Werner Clausen

Dec 22, 2011, 11:06:08 AM
to ddd...@googlegroups.com
@Rinat,


"AR+ES stores can be (and generally are) partitioned by aggregate IDs"

How do you define "partitioned" here?

Werner

Rinat Abdullin

Dec 22, 2011, 11:19:34 AM
to ddd...@googlegroups.com
As in "decoupled", "isolated", belonging to different consistency boundaries. 

Data from different partitions can never be part of the same transaction, and you can't directly access data from different partitions in one operation. Here we are assuming that it would not matter to us whether different aggregates are stored on the same box or in data centers on opposite sides of the Earth.

Pat Helland can describe partitioning and distributed systems much better than I: http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf

NB: his "entities" map to "ARs".

Best,
Rinat

Werner Clausen

Dec 22, 2011, 11:58:43 AM
to ddd...@googlegroups.com
@Rinat,

Yes, that is what I thought. And I understand this, especially when you are dealing with a file-based ES. Greg says the same: "aggregateid is the partition point. You should have one table for all event types". You are saying the same here, right?

But how does that translate to an EventStore in an SQL server? I'm confused as to how many tables I would have here... please give an example.

Werner

Rinat Abdullin

Dec 22, 2011, 12:31:32 PM
to ddd...@googlegroups.com
Storing events in a SQL server just adds an additional layer of complexity and brings the constraints inherent to relational databases.

If you really have to store events in SQL, then the simplest approach is to store all events in the same table and make sure your reads are more or less efficient by having an appropriate index (by aggregate ID, ordered by version ASC). Putting AR events into more than one table does not seem to bring any real benefit.

In short: aggregate event streams are always stored and accessed by their ID. And:
  • When stored in files - one file per aggregate ID
  • When stored in SQL - it's easier to have all event streams go to the same table, with an index optimized for reads by aggregate ID.
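The file-based bullet can be sketched as a toy (JSON lines in a temp directory stand in for a real serialization format; the layout and function names are assumptions for illustration): one append-only file per aggregate ID.

```python
import json
import os
import tempfile

# Illustrative file-per-aggregate store: each aggregate's history is one
# append-only file named after its ID.
store_dir = tempfile.mkdtemp()

def append(aggregate_id, event):
    # Appending is the only write operation the store ever performs.
    with open(os.path.join(store_dir, aggregate_id + ".events"), "a") as f:
        f.write(json.dumps(event) + "\n")

def load(aggregate_id):
    # A missing file simply means a brand-new aggregate with no history.
    path = os.path.join(store_dir, aggregate_id + ".events")
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [json.loads(line) for line in f]

append("order-1", {"type": "OrderPlaced"})
append("order-1", {"type": "OrderShipped"})
```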
Best,
Rinat

andho

Dec 22, 2011, 12:56:35 PM
to ddd...@googlegroups.com
How safe is it to use files for the most important data (the single source of truth) of your domain? Can it be used to create a crash-only design?

Rinat Abdullin

Dec 22, 2011, 1:10:19 PM
to ddd...@googlegroups.com
Event sourcing with files is more reliable than SQL server because:
  • a SQL server stores its data in files as well (hence by default it is LESS reliable than the file system)
  • there are numerous reliability and redundancy patterns that allow you to create systems that can easily survive repeated crashes.
For example, you can consider an event written to the event store (committed) only after it has been written to 2 slave stores in different availability zones. This adds a little latency, but it creates slaves available for immediate failover (and generally requires just a few lines of code). Compare this to SQL Server Replication or the Microsoft Sync Framework...

Alternatively, since the files are just append-only structures, you can easily replicate them to dozens of datacenters around the globe with about a second of delay (the delay depends on your hardware capabilities).

The same goes for data corruption, self-healing and distribution strategies.

Best,
Rinat 

Werner Clausen

Dec 22, 2011, 1:29:59 PM
to ddd...@googlegroups.com
Understood. Thanks for your patience :)

Werner


Tom Janssens

Dec 22, 2011, 2:14:30 PM
to ddd...@googlegroups.com
+1

Simplicity is key here. Since storage is cheap and I usually do not have a lot of events, I duplicate them a lot to get simpler/faster code.

My current approach is:
- envelope the event
- extract the relevant data from the event (all possible AR references/IDs) into the headers of the envelope
- write the enveloped event to a global event file using protobuf.Net
- write the event to files partitioned by, and named from, each header key and value, using protobuf.Net

The result looks like this:

01/12/2011  17:40       321.352.624 GLOBAL.msgs
01/12/2011  17:40        77.779.616 HEADER_TaskId_Tasks.1.msgs
01/12/2011  17:40        86.990.360 HEADER_TaskId_Tasks.2.msgs
01/12/2011  17:40        81.873.280 HEADER_TaskId_Tasks.3.msgs
01/12/2011  17:40        74.709.368 HEADER_TaskId_Tasks.4.msgs
               5 file(s)      642.705.248 bytes

(example gist here: https://gist.github.com/1418094)

Deserializing all events for a single AR could not be simpler:
- determine the file name as "HEADER_" + [AR type] + "Id" + "_" + [AR id] + ".msgs"
- open the stream
- deserialize using protobuf and apply the events to the AR.

This is IMO the simplest thing that could possibly work. It adds some overhead on the storage side, but it allows you to attach events to new ARs as long as the relevant ID is in there; just move the handler from one AR type to another AR type, and you are good to go.
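Stripped of protobuf.Net and real files, the scheme above might look roughly like this (an in-memory sketch using lists of JSON strings; all names are illustrative):

```python
import json
from collections import defaultdict

# Illustrative sketch of the envelope-and-partition scheme described above.
global_stream = []              # plays the role of GLOBAL.msgs
partitions = defaultdict(list)  # e.g. the HEADER_TaskId_Tasks.1 file

def publish(event, headers):
    # Steps 1-2: envelope the event, with AR references extracted into headers.
    record = json.dumps({"headers": headers, "event": event})
    # Step 3: append to the global event stream.
    global_stream.append(record)
    # Step 4: append a copy to one partition per header key/value pair.
    for key, value in headers.items():
        partitions["HEADER_%s_%s" % (key, value)].append(record)

publish({"what": "TaskCreated"}, {"TaskId": "Tasks.1"})
publish({"what": "TaskRenamed"}, {"TaskId": "Tasks.1"})
publish({"what": "TaskCreated"}, {"TaskId": "Tasks.2"})

# Rehydrating a single AR is just reading its partition front to back:
events = [json.loads(r)["event"] for r in partitions["HEADER_TaskId_Tasks.1"]]
```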

Greg Young

Dec 22, 2011, 4:10:37 PM
to ddd...@googlegroups.com

There is a long document and discussion about this on dddcqrs.com

Werner Clausen

Dec 23, 2011, 5:35:38 AM
to ddd...@googlegroups.com
 
"There is a long document and discussion about this on dddcqrs.com"
 
In the document that describes how to build an EventStore, there is a table specifically to hold the AggregateType. Wouldn't it be better to denormalize the Events table and include the AggregateType in it? The overhead of an INSERT (possibly into two tables) and a JOIN when reading just doesn't fit, IMO. In fact, this dilemma (overhead of inserts+joins OR a denormalized event table) is one of the reasons I found "1 table per AggregateType" valid to begin with...
 
How do you guys see that?
 
Werner

Greg Young

Dec 23, 2011, 5:45:35 AM
to ddd...@googlegroups.com

What join? On reading?

I think the document specifies that the type is only used for admin purposes. I went back and forth on whether or not to include it, to avoid exactly this kind of confusion.

Werner Clausen

Dec 23, 2011, 6:35:52 AM
to ddd...@googlegroups.com
Well, the article/document talks about "for debugging", which I had missed, it seems. Thanks for the clarification.

 
 
 
 
 

Nuno Lopes

Dec 23, 2011, 7:27:27 AM
to ddd...@googlegroups.com
+1 

Nils Kilden-Pedersen

Dec 26, 2011, 1:36:21 PM
to ddd...@googlegroups.com
On Thu, Dec 22, 2011 at 7:53 AM, Nils Kilden-Pedersen <nil...@gmail.com> wrote:
I currently have my events stored per aggregate type (one MongoDB collection per type) because it maps well to the repositories.

I've decided to back out of this and use a single event store. The comments made in this thread made me rethink my reasons, and while I still think there's value in limiting the number of messages on replay, I now think that need is less common than I originally thought. Furthermore, the separation was originally also driven by the use of sequential IDs, which have since been replaced by GUIDs.
 
