Eventsourced Data Sizing for NoSQL DB

Prakhyat Mallikarjun

Aug 15, 2014, 2:36:01 AM
to events...@googlegroups.com
Hi Team,

I am working on a solution involving event sourcing and DDD/CQRS. The app is configured with the Cassandra journal plugin to persist the events. Snapshots will also be stored in Cassandra. The application is designed to have sharded single writers. These single writers will eventually write state to an in-memory data grid. The state of the application is always maintained in the in-memory data grid, to make reads faster.

The app has the layers below:

Front End
    |
    |
Processing Layer
    |
    |
Persistence Layer
    |
    |
In memory Datagrid Layer
    |
    |
Cassandra Durable DB


Front End --> Takes command requests from the web
Processing Layer --> Processes the commands and can also source (persist) the commands
Persistence Layer --> A sharded single-writer PersistentActor persists each event into Cassandra first, then eventually updates the domain state in the in-memory data grid.
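The write path described for the Persistence Layer (event to the journal first, state to the data grid second) can be sketched roughly as follows. This is a minimal illustration, not Akka or Cassandra code: `SingleWriter`, `journal` and `data_grid` are hypothetical names, and a plain list and dict stand in for the Cassandra journal and the in-memory data grid.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SingleWriter:
    """Hypothetical single writer for one shard: journal first, state second."""

    # Stand-in for the Cassandra journal (append-only list of events).
    journal: List[Dict[str, Any]] = field(default_factory=list)
    # Stand-in for the in-memory data grid (entity id -> current state).
    data_grid: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def handle(self, event: Dict[str, Any]) -> None:
        # 1. Persist the event before touching any queryable state.
        self.journal.append(event)
        # 2. Only then fold the event into the state used for reads.
        entity = self.data_grid.setdefault(event["id"], {})
        entity.update(event["data"])


writer = SingleWriter()
writer.handle({"id": "bank-1", "data": {"name": "First Bank"}})
writer.handle({"id": "bank-1", "data": {"city": "Pune"}})
```

Because every event is appended before the state changes, the data grid contents can in principle be rebuilt from the journal if the grid is lost.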

I accept that disks are very cheap. An eventsourced/CQRS/DDD design requires storing commands (if required), events, snapshots and domain state (read data, write data, etc.). Don't you think we will end up storing lots and lots of objects?

Tuning the data grid and Cassandra for durability means choosing replication, distribution, multiple copies, etc. That adds further overhead for storing the data and maintaining multiple copies.

If the application is huge and highly OLTP, with millions of transactions, data will grow in no time. Millions of transactions mean millions of events, and these need to be saved. Storing them will take up a major share of disk space, and the space will fill up fast.

eventsourced/CQRS/DDD will lead to a mammoth amount of data being saved. Data sizing will end up requiring a lot of disk space (including the data and multiple copies for durability). Huge data means big, big clusters.

What are your thoughts? Correct me if I am wrong.

-Prakhyat M M

Greg Young

Aug 15, 2014, 4:29:25 PM
to events...@googlegroups.com
"If the application is huge and highly OLTP, with millions of transactions, data will grow in no time. Millions of transactions mean millions of events, and these need to be saved. Storing them will take up a major share of disk space, and the space will fill up fast.

eventsourced/CQRS/DDD will lead to a mammoth amount of data being saved. Data sizing will end up requiring a lot of disk space (including the data and multiple copies for durability). Huge data means big, big clusters."

Millions of events * say 500 bytes/event = ?

How many events/second are you looking at? If it's under a few thousand, stop thinking about it.
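The multiplication above can be carried through as a back-of-the-envelope estimate. In this sketch only the 500 bytes/event figure comes from the thread; the sustained rate of 1,000 events/second, the one-year retention window and the replication factor of 3 are assumptions chosen purely for illustration, and `journal_bytes` is a hypothetical helper.

```python
def journal_bytes(events_per_second: int, bytes_per_event: int,
                  days: int, replication_factor: int) -> int:
    """Raw journal size in bytes, ignoring compression, indexes and snapshots."""
    seconds = days * 24 * 60 * 60
    return events_per_second * bytes_per_event * seconds * replication_factor


# "Millions of events * 500 bytes/event": one million events, single copy.
one_million_events = 1_000_000 * 500

# Assumed workload: 1,000 events/s sustained for a year, 3 copies.
one_year_replicated = journal_bytes(1_000, 500, 365, 3)

print(f"1M events:       {one_million_events / 1e9:.1f} GB")
print(f"1 year, 3 copies: {one_year_replicated / 1e12:.1f} TB")
```

Plugging in the real event rate, average event size and replication factor gives a first estimate that can be compared with the size of the state already being stored.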

"eventsourced/cqrs/DDD will lead to mammoth of data being saved."

Where you are relative to Moore's law, in terms of how fast you acquire data, is more important: http://en.wikipedia.org/wiki/Mark_Kryder

"store domain state"

Why are you storing your domain state?


--
You received this message because you are subscribed to the Google Groups "Eventsourced User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to eventsourced...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Studying for the Turing test

Prakhyat

Aug 16, 2014, 1:53:27 AM
to events...@googlegroups.com, events...@googlegroups.com
Hi Greg,

Thanks.

How do we create state?
The write side of the event-sourced application receives the events from the client. These events are handled and converted into domain state.

Consider a domain object "Bank". We have designed 3 events: fromcreated, fromedited and fromdeleted. The fromcreated event creates the first domain instance, i.e. a "bank" object with some id, and the fromedited event edits the domain object "bank" for the given id.
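As a rough sketch of how those three events could be turned into state, the fold below replays a list of events for one bank. The event names are taken from the post; the dictionary layout (`type`, `id`, `data`) and the helpers `apply_event` and `replay` are assumptions made for the example, not part of any library.

```python
from typing import Any, Dict, Iterable, Optional

Bank = Dict[str, Any]
Event = Dict[str, Any]


def apply_event(state: Optional[Bank], event: Event) -> Optional[Bank]:
    """Return the new state of one bank after applying a single event."""
    kind = event["type"]
    if kind == "fromcreated":
        # First event for this id: build the initial domain instance.
        return {"id": event["id"], **event["data"]}
    if kind == "fromedited":
        if state is None:
            raise ValueError("cannot edit a bank that does not exist")
        # Later fields overwrite earlier ones.
        return {**state, **event["data"]}
    if kind == "fromdeleted":
        return None
    raise ValueError(f"unknown event type: {kind}")


def replay(events: Iterable[Event]) -> Optional[Bank]:
    """Rebuild the current state by applying the events in order."""
    state: Optional[Bank] = None
    for event in events:
        state = apply_event(state, event)
    return state


history = [
    {"type": "fromcreated", "id": "bank-1", "data": {"name": "First Bank"}},
    {"type": "fromedited", "id": "bank-1", "data": {"city": "Pune"}},
]
current = replay(history)
```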

Why do we always store state?
We always maintain state to make queries faster. The query side always has the state ready for gets, reporting and searches.

We are a highly OLTP application. On the read side we don't want to recreate state by querying events and rebuilding state every time. Our reporting queries will involve searching data across a huge number of interconnected domain objects.

As I understand it, reconstructing state from a large set of events will take time. Complexity will also increase if a query involves a huge number of domain objects, or business-specific reporting queries over date ranges, so we felt maintaining state is the right choice.
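One way to picture "maintaining state" for queries is a read model that is updated once per event, so a search never has to replay the journal. The sketch below is an illustration under assumptions: `BanksByCity`, the `city` field and the event dictionary layout are hypothetical and only the event names come from the post.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Set


class BanksByCity:
    """Hypothetical read model: bank ids grouped by city, updated per event."""

    def __init__(self) -> None:
        self.city_of: Dict[str, str] = {}
        self.by_city: Dict[str, Set[str]] = defaultdict(set)

    def on_event(self, event: Dict[str, Any]) -> None:
        bank_id = event["id"]
        old_city: Optional[str] = self.city_of.get(bank_id)
        if event["type"] == "fromdeleted":
            new_city = None
        else:
            # Keep the previous city when the event does not change it.
            new_city = event.get("data", {}).get("city", old_city)
        if new_city == old_city:
            return
        if old_city is not None:
            self.by_city[old_city].discard(bank_id)
        if new_city is None:
            self.city_of.pop(bank_id, None)
        else:
            self.city_of[bank_id] = new_city
            self.by_city[new_city].add(bank_id)

    def banks_in(self, city: str) -> List[str]:
        # Answered from the maintained index; no event replay involved.
        return sorted(self.by_city.get(city, ()))


view = BanksByCity()
for e in [
    {"type": "fromcreated", "id": "bank-1", "data": {"city": "Pune"}},
    {"type": "fromcreated", "id": "bank-2", "data": {"city": "Pune"}},
    {"type": "fromedited", "id": "bank-1", "data": {"city": "Mumbai"}},
    {"type": "fromdeleted", "id": "bank-2"},
]:
    view.on_event(e)
```

The read model costs extra storage on top of the events, which is the trade-off being discussed in this thread.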

-prakhyat m m


Greg Young

Aug 16, 2014, 7:45:09 AM
to events...@googlegroups.com
But you mentioned read models separately from domain state, as if they were two different things. For just events plus a read model, aren't the storage requirements roughly the same as for data plus an audit table?

Prakhyat

Aug 16, 2014, 8:13:34 AM
to events...@googlegroups.com, events...@googlegroups.com
Greg,

You are perfectly correct in your observations.

We are in the early stages of adopting event sourcing and CQRS.

Currently the read and write models are the same, maintained in the in-memory data grid. As a later enhancement we are planning to replicate state to other data sources for the read model and querying. We are still in the discussion stage.

The objective is to maintain ready state for business-specific querying/searches/complex reporting.

We are expecting 5 to 10 terabytes of data/state. Just imagine, from a querying/searching/complex reporting perspective, if we depended on events to recreate state. It would be complex.

But still, one read/write source plus the additional events is a huge amount of data.

-prakhyat m m