Event log vs event streams


sherifsa...@gmail.com

Oct 31, 2014, 10:49:55 PM
to ddd...@googlegroups.com
In Rinat Abdullin's article about Event Sourcing - Projections, he writes:

Quite often you would want to change your projections or add completely new ones to the system. Obviously we would want to go back in time and make everything look like these changes were there since the beginning of time.
 
This is where replaying events comes into play.

In order to be able to do that, we should set up our system to record all passing events into a separate event log for the domain. This event log is completely separate from aggregate event streams (should these be used in the system). Its sole purpose is to simplify event replays for projections.

Is this always true? Should I always keep two copies of every event in the event store?

Isn't the event stored in the aggregate's stream enough to construct a projection by replaying the stream the same way it would be replayed to construct the aggregate itself?

Rinat Abdullin

Nov 1, 2014, 12:26:48 AM
to ddd...@googlegroups.com
I wrote that when I didn't really know what I was doing :)

If your event storage supports efficient replay of all events across aggregates, then you don't need a separate global event stream. Some stores like geteventstore.com and most databases even persist their changes as a global event stream by default.


--
Rinat Abdullin | Writer at Abdullin.com

sherifsa...@gmail.com

Nov 1, 2014, 11:23:36 AM
to ddd...@googlegroups.com
Hi Rinat, 

Sorry, I'm still trying to learn how an event store should be implemented, and my only sources are blog posts scattered around the web.

To follow up on my question: if I have an event store backed by a MySQL database, what should the tables in that database be?

Let's say we have three aggregates: Customer, Supplier, and Order.

Are the events from the three aggregates stored in a single table, or does each aggregate hold its events in a separate table?

Greg Young

Nov 1, 2014, 12:04:19 PM
to ddd...@googlegroups.com
Same table, with the event data stored as a blob.
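
A minimal sketch of what such a table might look like in MySQL. The table and column names here are hypothetical, and the auto-increment global_sequence column is an extra assumption: it gives the store-wide ordering Rinat mentioned, so a full replay for a projection becomes a single ordered scan.

create table events (
    global_sequence   bigint       not null auto_increment,  -- store-wide order, used for replays
    aggregate_id      varchar(64)  not null,                 -- stream identifier
    sequence_number   bigint       not null,                 -- position within that aggregate's stream
    event_type        varchar(255) not null,                 -- event class name
    payload           blob         not null,                 -- serialized event data
    metadata          blob,
    occurred_at       timestamp    not null,
    primary key (global_sequence),
    unique key uq_stream_version (aggregate_id, sequence_number)  -- one writer wins per stream position
);

Customer, Supplier, and Order events would all land in this one table; if you need the aggregate type at all, it is just another column.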


--
Studying for the Turing test

sherifsa...@gmail.com

Nov 1, 2014, 12:32:19 PM
to ddd...@googlegroups.com
So a single table will hold all events for all aggregates in all bounded contexts?! That sounds a little bit scary :)

I believe the next logical step will be to shard that table, right? Some of our aggregates produce millions of events.

If each aggregate type were stored in its own table, I would have sharded by aggregate id only.
However, since all aggregates are stored in the same table, do I have to shard by aggregate type in addition to aggregate id?

Arjen Smits

Nov 1, 2014, 2:30:34 PM
to ddd...@googlegroups.com
In ES the event log (if you will) is a single append-only stream of events in time. You don't want each aggregate type stored in its own table, because replaying events to build new projections would become difficult: you want to replay your events in the same order as they were recorded.
Besides, an event store is generally oblivious to the types of aggregates being stored, because you're not storing aggregates, you're storing events. And it's the sum of those events that makes up an aggregate.
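
As a rough sketch (assuming an events table with a store-wide, monotonically increasing global_sequence column, as in the hypothetical schema sketched earlier in the thread), rebuilding a projection is just an ordered scan from that projection's last checkpoint:

select global_sequence, aggregate_id, event_type, payload
from events
where global_sequence > 0          -- in practice: greater than the projection's checkpoint
order by global_sequence;

Each projection keeps its own checkpoint (the last global_sequence it has applied), so a brand-new projection simply starts at zero and reads everything in recorded order.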

How many millions of events are you expecting in the short term? A relatively simple SQL Server instance can handle 100-200 million (and more) records/events without breaking a sweat. Your only constraints are memory and disk space.

However, it would be interesting to hear how people are managing _large_ event stores.


Greg Young

Nov 2, 2014, 4:23:45 AM
to ddd...@googlegroups.com
Conceptually you have a log per aggregate instance; this allows sharding. Instead of a full linearization, you do linearization per aggregate instance.
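
A hedged sketch of what per-aggregate linearization can look like on a SQL event table (table and column names assumed from the sketches earlier in the thread): the writer reads the stream, decides the next sequence number, and relies on the unique key on (aggregate_id, sequence_number) to reject a concurrent writer.

-- Rebuild the aggregate and learn its current version.
select sequence_number, event_type, payload
from events
where aggregate_id = 'order-42'
order by sequence_number;

-- Append at currentVersion + 1. If another writer appended first, the
-- duplicate-key error on (aggregate_id, sequence_number) surfaces as a
-- concurrency conflict that the caller can retry or reject.
insert into events (aggregate_id, sequence_number, event_type, payload, occurred_at)
values ('order-42', 8, 'OrderShipped', '{"shippedAt": "2014-11-02"}', now());

Nothing here needs a total order across streams, which is exactly what makes sharding by the stream (aggregate) identifier possible.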

Cheers,

Greg 

Ben Kloosterman

Nov 2, 2014, 5:31:28 AM
to ddd...@googlegroups.com
Remember SQL is relational, and you're not really using that, which is why SQL is a poor fit; you can do much better with a sequential file, NoSQL, or a custom store (especially if you load all or most of your write domain into memory). Those millions of events should be a sequential disk read, and if that is the case there is no reason to shard until you reach massive scale. LMAX is a good example: they are event sourcing and almost CQRS, they store event data in files, and it does up to 6M transactions per second on a standard i7 PC, mainly by ensuring sequential writes and no contention in the domain.

Think very carefully about SQL with GUIDs as the primary key, and even more so with Entity Framework. By default you get a GUID for your clustered index, which is very bad if you're dealing with lots of small aggregates. Make sure your fill factor is set correctly. Unlike standard SQL, the read-to-write ratio in CQRS is normally much lower for the write domain and higher for the read domain. This has massive implications.

Ideally what you want is a table or file per aggregate (and a global index of the events), but that is not practical. I would only do table-per-type if it is completely automatic (including db schema changes and rollout). A Type/Guid clustered key will give you something close to that representation anyway.
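
On SQL Server, for instance, a hedged sketch of that kind of clustered key (rather than a random GUID clustered index), with the fill factor set explicitly; the table and column names are made up for illustration:

create table events (
    aggregate_type   varchar(255)     not null,
    aggregate_id     uniqueidentifier not null,
    sequence_number  bigint           not null,
    payload          varbinary(max)   not null,
    occurred_at      datetime2        not null,
    constraint pk_events primary key clustered
        (aggregate_type, aggregate_id, sequence_number)
        with (fillfactor = 90)  -- leave headroom so appends cause fewer page splits
);

Appends for a given aggregate then cluster together on disk, which gets you close to the table-per-aggregate ideal without the management cost.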

"If each aggregate type was stored in its separate table then I would have sharded by the aggregate id only. "

I would not do this; the costs of managing one table per type and your read model would push me to a more DDD-style system. If you're at the scale where you need sharding, then I would NOT use SQL for the write model; use files, event stores, or NoSQL instead. However, the write domain can normally handle a lot more than standard SQL before this is needed.


Ben 


Greg Young

Nov 2, 2014, 5:33:38 AM
to ddd...@googlegroups.com
Just as a slight correction: the Disruptor was processing 6M messages/second in memory, not disk-based. Without super-specialized hardware you will have a hard time getting an i7 to do 6M durable tps ;)

sherifsa...@gmail.com

Nov 2, 2014, 5:53:29 AM
to ddd...@googlegroups.com
I thought a concrete example might be good for the discussion. Here is the event log table used by the Axon framework to store events:

create table DomainEventEntry (
    aggregateIdentifier varchar2(255) not null,
    sequenceNumber number(19,0) not null,
    type varchar2(255) not null,  --aggregate class name
    eventIdentifier varchar2(255) not null,
    metaData blob,   
    payload blob not null, -- details
    payloadRevision varchar2(255),
    payloadType varchar2(255) not null, --event class name
    timeStamp varchar2(255) not null
);

alter table DomainEventEntry
    add constraint PK_DomainEventEntry primary key (aggregateIdentifier, sequenceNumber, type);

How would you approach sharding this table? I know I'm insisting on the sharding approach because it is a concern for management and our architects, so I need a concrete approach if I'm to convince them to adopt Event Sourcing.

For example, one of our systems is used to track prices for different book ISBNs for different book sellers. The data is so massive that the team decided to use a database for each book seller. Telling them to forget about the 150 database servers and use a single database with a single table is a long shot.

What would be your approach?

Greg Young

Nov 2, 2014, 5:56:01 AM
to ddd...@googlegroups.com
You have a column there: aggregateIdentifier.

That is your partition point: you partition by aggregate (though personally I would call it a stream identifier, not an aggregate identifier).

You can then introduce a function streamid-->vnode however you want.
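
For illustration (a hedged MySQL sketch; 16 vnodes is an arbitrary choice), the stream identifier can be hashed to a vnode, and each vnode assigned to a physical shard:

-- All events of one stream hash to the same vnode, so per-stream order
-- is preserved on whichever shard that vnode is assigned to.
set @stream_id = 'customer-42';
select mod(crc32(@stream_id), 16) as vnode;

Remapping vnode -> shard later lets you move whole vnodes around without rehashing every stream.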

Cheers,

Greg

Ben Kloosterman

Nov 2, 2014, 7:33:08 AM
to ddd...@googlegroups.com
I do believe they streamed all events to a file as they occurred, which they read at startup. Yes, almost certainly they used a high-end SAN.

If you use small binary event messages or protocol buffers rather than heavy XML, I think you would get pretty impressive results. At 100 bytes per event, 6M events/second would require a write speed of 600 MB/second, which is beyond most cheap SSD RAIDs but quite possible on an i7; a single high-end SSD can do 400 MB/s or so, so I'd say 300 MB/s is possible. So, as it was a few years ago, LMAX almost certainly hooked the i7 up to a SAN (still technically an i7, but...). Also, ITCH and order data can often be very small, 20-24 byte events, so events based on it and packed in a similar manner allow a big reduction for a good percentage of the messages. If 50% of messages are 30 bytes, the average is around 65 bytes, and 65 bytes * 6M/s = 390 MB/s, which is achievable on mid-range hardware. If you have 1 KB XML messages you need 6 GB/second of write throughput - good luck getting that without breaking things into types; sharding/clustering doesn't help much because everything is the latest row. When you get huge transaction counts it's nearly always one or two things that are hot. This reminds me, I had meant to ask people's opinions on binary vs JSON vs protocol buffers vs no-namespace XML vs XML, which I will raise in another thread.

Note you could stream different types to multiple files, but to replay you would have to merge them, or more likely do a complicated multi-file read; this may be what they did, because their replays took so long that they ran two live nodes, as the startup cost was too high.

The biggest issue is that the persist stream is fully asynchronous, so getting errors back to the source is a pain.

Regards, 

Ben

Greg Young

Nov 2, 2014, 7:38:43 AM
to ddd...@googlegroups.com
The only 6M messages/second claim I saw from them was on the Disruptor and had nothing to do with disk.

To be fair, they likely would not accept the latency disk involves anyway.

Ben Kloosterman

Nov 2, 2014, 7:45:46 AM
to ddd...@googlegroups.com
"How would you approach sharding this table? I know I'm insisting of the sharding approach because it is a concern for the management and our architects, so I need a concrete approach if I'm to convince them to adapt Event Sourcing.

For example, one of our systems is used to track prices for different book ISBNs for different book sellers. The data is so massive that the team decided to use a database for each book seller. Telling them to forget about the 150 database servers and use a single database with a single table is a long shot."

I think you should try it in a prototype with real data; the DB management savings could justify the whole project.
The real need for sharding arises because in standard SQL you have reads and writes mixed in an unholy mess by different developers, creating contention which kills performance (it's the contention, not the load/amount, that does this). I see SQL die time and time again, even under low load, due to contention.

With CQRS your write model really does only the following:
Write immutable data to the end, e.g. append an event to a type/stream.
Read all events for a type, which is likely to be heavily cached.

And in some cases the "read all events" disappears due to heavy caching or in-memory write models. Remember your write model/events only has events that affect logic; most string fields disappear. For your domain, with the huge data volumes, from the tidbits I have seen it's likely that the write model is likely to be stressed.

Ben
  


sherifsa...@gmail.com

Nov 2, 2014, 8:02:51 AM
to ddd...@googlegroups.com
For your domain, with the huge data volumes, from the tidbits I have seen it's likely that the write model is likely to be stressed.

Did you mean "...it's unlikely to be stressed"?!

Greg Young

Nov 2, 2014, 8:05:43 AM
to ddd...@googlegroups.com
btw:

"For example, one of our systems is used to track prices for different
book ISBNs for different book sellers. The data is so massive that the
team decided to use a database for each book seller. Telling them to
forget about the 150 database servers and use a single database with a
single table is a long shot."

How many read/write requests per second are you looking at that you need 150 databases?
>>>>>>>> send an email to dddcqrs+u...@googlegroups.com.
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Rinat Abdullin | Writer at Abdullin.com
>>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "DDD/CQRS" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>>> an email to dddcqrs+u...@googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Studying for the Turing test
>>>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "DDD/CQRS" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to dddcqrs+u...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>
>>
>>
>> --
>> Studying for the Turing test
>>
> --
> You received this message because you are subscribed to the Google Groups
> "DDD/CQRS" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to dddcqrs+u...@googlegroups.com.

Kirill Wedens

Nov 2, 2014, 8:25:26 AM
to ddd...@googlegroups.com
But how do you replay multi-aggregate projections with linearization per AR (aggregate root)?

Greg Young

Nov 2, 2014, 8:29:39 AM
to ddd...@googlegroups.com
There are different ways of doing that. One assumption that goes out the window is that you have global ordering. The question therefore is: how do you get a deterministic ordering?
>>>>>>>> send an email to dddcqrs+u...@googlegroups.com.
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Rinat Abdullin | Writer at Abdullin.com
>>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "DDD/CQRS" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>>> an email to dddcqrs+u...@googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Studying for the Turing test
>>>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "DDD/CQRS" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to dddcqrs+u...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>
>>
>>
>> --
>> Studying for the Turing test
>>
> --
> You received this message because you are subscribed to the Google Groups
> "DDD/CQRS" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to dddcqrs+u...@googlegroups.com.

Kirill Wedens

Nov 2, 2014, 8:46:21 AM
to ddd...@googlegroups.com
What about ordering events by timestamp? It would require building some index, though (a possible bottleneck, but better than a global incremental version).

Greg Young

Nov 2, 2014, 8:52:12 AM
to ddd...@googlegroups.com
There are two cases you have to watch for. The first is a real-time subscription that is pushing events: two servers may have different latencies, etc. The second case is replays (which should be easily made deterministic). There are lots of ways of providing ordering. The main question is: do you need ordering beyond causal relationships?
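
As one hedged option against the DomainEventEntry table quoted earlier (and assuming ISO-8601 timestamps, so that string order matches time order): a replay can sort on the timestamp with the stream id and per-stream sequence number as tie-breakers. The order across streams is somewhat arbitrary, but it is deterministic, and each aggregate's own events stay in causal order as long as its timestamps never decrease.

select aggregateIdentifier, sequenceNumber, payloadType, payload
from DomainEventEntry
order by timeStamp, aggregateIdentifier, sequenceNumber;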

Ben Kloosterman

Nov 2, 2014, 8:56:28 AM
to ddd...@googlegroups.com
Yes, oops.

Ben Kloosterman

Nov 2, 2014, 9:07:40 AM
to ddd...@googlegroups.com
Yes, I checked and you are correct.

They do talk about their files at LMAX, but their goal was 100K sustained at < 1 ms.

Still, since it's just asynchronously streaming events to disk, it's hard to call it a "transaction" - certainly not ACID. It's basically just running off memory, and the stream is in the background for backup purposes; i.e. the event gets added to the disruptor, and another thread that handles the disk just does a raw write, likely streamed in bulk with thousands of other events. As with eventual consistency/sagas, you need another mechanism to handle errors and tie them back to the command - i.e. until the disk stream notifies of success or failure.

I do think with such a system you could get millions of events per second to disk provided they are small enough, but handling failure is difficult.

Regards, 

Ben 

Greg Young

Nov 2, 2014, 9:20:20 AM
to ddd...@googlegroups.com
"I do think with such a system you could get millions of events per
second to the disk provided they are small enough .. but handling
failure is difficult."

Sure, we could do it easily with one-byte events :)

sherifsa...@gmail.com

Nov 2, 2014, 9:21:53 AM
to ddd...@googlegroups.com
Currently we get about 20 million SQS messages every hour to process on each server. Since we only keep the current state for our prices, the smallest database has 200 million records. If I start to track the history of price changes, my event log will grow gigantically.

Won't that affect the loading and saving of aggregates? I really want to go the Event Sourcing way, but I don't know how I can sell it to management without tackling:

  1. The size of the event log table.
  2. The loading and saving of aggregates in acceptable time. 
I'm a sharding noob btw, so I might be tackling the wrong problem here.

Ben Kloosterman

Nov 2, 2014, 9:33:49 AM
to ddd...@googlegroups.com
I was thinking of something useful, like 20-30 byte ITCH events.

Ben

Greg Young

Nov 2, 2014, 9:36:36 AM
to ddd...@googlegroups.com
Is there value in anything outside of the current state?

Ben Kloosterman

Nov 2, 2014, 9:42:40 AM
to ddd...@googlegroups.com
On Mon, Nov 3, 2014 at 1:21 AM, <sherifsa...@gmail.com> wrote:
Currently we get about 20 million SQS messages every hour to process on each server. Since we only keep the current state for our prices, the smallest database has 200 million records. If I start to track the history of price changes, my event log will grow gigantically.

What do these messages DO? Do they cause queries, or do they change state? There is a massive difference here: if 99% are queries for information that gets returned and 1% are price changes, then your write domain is doing 200,000 per hour, which is about 55 per second, i.e. trivial. If they are 99% writes then you have a different problem; CQRS may not be the answer and you certainly would not want event sourcing.

If it's typical (which no app is), the breakdown is 10:1, and 550 messages per second on the write domain is fine and would not justify sharding or multiple DBs for the write domain. We need to know this before going further.

Regards, 

ben

sherifsa...@gmail.com

Nov 2, 2014, 10:47:17 AM
to ddd...@googlegroups.com
Currently the application itself doesn't care about the old prices, but there has been a need to generate some reports that display trends in the price changes.

sherifsa...@gmail.com

Nov 2, 2014, 10:51:17 AM
to ddd...@googlegroups.com
These are write events. We update the prices for each ISBN according to the incoming feed. As I said in the previous post, there is currently no apparent value in keeping the price changes, but there has been a growing need to generate reports based on trends in price changes.

Greg Young

Nov 2, 2014, 10:55:07 AM
to ddd...@googlegroups.com
This doesn't sound event sourced; it sounds like you only need the latest price. You might want to log events up to some number, but it sounds like you really only care about the latest value from a price perspective. (I can't even see what the aggregate would actually be, beyond some lookup data and a current price.)

Robert Friberg

Nov 3, 2014, 4:37:35 PM
to ddd...@googlegroups.com
ACID transactions are fully possible using batching and pipelining. Async or not doesn't matter, as long as you sync at some point before sending an ack to the client. This adds latency (a longer pipeline) without really affecting throughput, which ultimately is I/O bound.
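
Roughly the same idea expressed against a SQL event table (a hedged sketch with made-up names, not the actual journal writer): collect a batch of events, write and commit them in one transaction, and only then ack every command in the batch.

-- One commit (and therefore one disk sync) covers the whole batch;
-- acks for all commands in the batch are sent only after the commit succeeds.
start transaction;
insert into events (aggregate_id, sequence_number, event_type, payload, occurred_at)
values
    ('order-1', 5, 'OrderPaid',     '{"amount": 20.00}',  now()),
    ('order-7', 2, 'OrderShipped',  '{"carrier": "UPS"}', now()),
    ('cust-3',  9, 'CustomerMoved', '{"city": "Cairo"}',  now());
commit;

Latency per command goes up by at most one batch interval, but throughput is governed by the single sync per batch, which is why the sawtooth latency graph below looks the way it does.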

In my naive experiments with akka.net and ES3 I'm seeing near 50k ACID tps, and a bit lower using the TPL Dataflow library. Best throughput is when the EventData byte array size is about 32k.
The latency graph looks like this: |\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\
Average latency is 0.02 ms, but the worst case (the first transaction of each batch) is 16 ms, at 300 batches/second with 160 events/batch and an event byte array size of 200 bytes.

Test environment is my laptop with an SSD, running approximately this code:

but using this IJournalWriter implementation:

Rob the Slob

Shawn Hinsey

Nov 3, 2014, 8:12:47 PM
to ddd...@googlegroups.com
Have you looked at StreamInsight? I think the sort of price trend analysis you're talking about is one of its focuses.
