Cassandra + Kafka for event sourcing


Hsen Monzer

Feb 21, 2016, 4:36:48 AM
to DDD/CQRS
I've been reading for a while about event sourcing and CQRS and was trying to find posts/readings about using Cassandra as my event store and Kafka as the queue to publish events. Most event sourcing posts refer to EventStore and don't give a clear discussion of the Cassandra + Kafka combination.

Does anyone know good reading material on that stack? Or can anyone describe the advantages/disadvantages of that combination? It seems Cassandra is considered for the read side of CQRS, but I can't find examples/data models of using it to persist event-sourced aggregates.

Greg Young

Feb 21, 2016, 5:42:25 AM
to ddd...@googlegroups.com
There is an event store library built on top of Cassandra that I have
seen before (don't remember which one).

> --
> You received this message because you are subscribed to the Google Groups "DDD/CQRS" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to dddcqrs+u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



--
Studying for the Turing test

Jo Geraerts

Feb 22, 2016, 5:09:07 AM
to DDD/CQRS
I think the Axon Framework has one.

On Sunday, February 21, 2016 11:42:25 UTC+1, Greg Young wrote:

Hsen Monzer

Feb 22, 2016, 8:32:33 AM
to DDD/CQRS
Do we really need a library?
Can't it be a simple implementation: save each aggregate as a row (each column an event) and use Kafka as the messaging system?
When events are added to Cassandra, they get published through Kafka.
I'm assuming that, given the high availability of both Kafka and Cassandra, it should be safe to treat saving the events and publishing them as atomic.
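A minimal sketch of the data model described above, assuming one Cassandra partition per aggregate with events ordered by a per-aggregate sequence number. An in-memory dict stands in for the table, and all names (`InMemoryEventStore`, `append`, `expected_version`) are hypothetical. The expected-version check is one common way to get the consistency mentioned further down the thread; in Cassandra itself it would presumably need a lightweight transaction (`INSERT ... IF NOT EXISTS` on the sequence number) rather than a read-then-write. Note that this only covers storing events; it does nothing to make a Kafka publish atomic with the write.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class ConcurrencyError(Exception):
    """Raised when another writer appended to the stream first."""


@dataclass(frozen=True)
class Event:
    aggregate_id: str   # partition key (one partition per aggregate)
    sequence_nr: int    # position within the aggregate's stream (clustering column)
    event_type: str
    payload: dict


@dataclass
class InMemoryEventStore:
    # Hypothetical stand-in for a Cassandra table: aggregate_id -> ordered events.
    streams: Dict[str, List[Event]] = field(default_factory=dict)

    def append(self, aggregate_id: str, expected_version: int,
               new_events: List[tuple]) -> List[Event]:
        stream = self.streams.setdefault(aggregate_id, [])
        # Optimistic concurrency: the writer must have seen the latest version.
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"{aggregate_id}: expected version {expected_version}, "
                f"actual {len(stream)}")
        appended = []
        for offset, (event_type, payload) in enumerate(new_events, start=1):
            event = Event(aggregate_id, expected_version + offset, event_type, payload)
            stream.append(event)
            appended.append(event)
        return appended

    def load(self, aggregate_id: str) -> List[Event]:
        """Read one aggregate's events in sequence order."""
        return list(self.streams.get(aggregate_id, []))
```

A second writer that loaded the aggregate at an older version gets a `ConcurrencyError` instead of silently interleaving its events.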

Greg Young

Feb 22, 2016, 8:35:10 AM
to ddd...@googlegroups.com
There are some more details...

For example: providing consistency, handling replays, etc. Using
Cassandra and Kafka together sounds like a spectacularly bad idea (how will you
replay a projection?)

Hsen Monzer

Feb 22, 2016, 9:42:54 AM
to DDD/CQRS
Indeed, but maybe I got the data model in Cassandra wrong.
I implemented (I think I implemented) a small CQRS + ES service using MySQL with just an events table, and RabbitMQ as the messaging system.
Would that make sense? And if using Cassandra to store the events is a terrible idea, how did Akka and Axon implement event store plugins for it?

Greg Young

Feb 22, 2016, 9:44:04 AM
to ddd...@googlegroups.com
I didn't say anything at all about Cass being a bad idea; I said using
both is a very bad idea. You will have the same issues with MySQL and
RabbitMQ. How do you replay a projection or add a new one?

Hsen Monzer

Feb 22, 2016, 9:55:59 AM
to DDD/CQRS
I always thought of replaying projections as writing a process that re-runs over the events in the store and populates the new read model.
I saw some posts where people are trying to replace EventStore with Kafka, and I thought they have the same problem as well.
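A sketch of the replay described above, assuming the store can return every event with a global, store-assigned position. The tuple shape and the `OrderTotalsProjection` read model are hypothetical. Offline, replaying is just feeding history, oldest first, into a fresh read model; the harder part is doing it while new events keep arriving, which is the point of the next reply.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (global_position, aggregate_id, event_type, payload); hypothetical shape
StoredEvent = Tuple[int, str, str, dict]


class OrderTotalsProjection:
    """Hypothetical read model: total amount ordered per customer."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = defaultdict(int)

    def handle(self, event: StoredEvent) -> None:
        _position, _aggregate_id, event_type, payload = event
        if event_type == "OrderPlaced":
            self.totals[payload["customer"]] += payload["total"]
        # Other event types are ignored by this read model.


def rebuild(events: Iterable[StoredEvent]) -> OrderTotalsProjection:
    """Replay all stored events, oldest first, into a fresh read model."""
    projection = OrderTotalsProjection()
    for event in sorted(events, key=lambda e: e[0]):
        projection.handle(event)
    return projection
```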

Greg Young

Feb 22, 2016, 10:02:32 AM
to ddd...@googlegroups.com
OK, now do that with the system running plus your "bus" (RabbitMQ/Kafka)
and you are suddenly writing some tricky code (you need to synchronize
between them, especially since you don't have assured ordering!).
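One way to do that synchronization, sketched under the assumption that every event carries a global position assigned by the store (which a separate Cassandra write plus Kafka publish does not give you): replay from the store, then accept live messages only if they continue the sequence, skip duplicates, and go back to the store on a gap. `CatchUpSubscriber`, `read_from` and `on_live` are hypothetical names.

```python
from typing import Callable, List, Tuple

Event = Tuple[int, str]  # (global_position, event_type); hypothetical shape


class CatchUpSubscriber:
    """Sketch: replay history from the store, then follow live messages
    without gaps or duplicates, using the store's global position."""

    def __init__(self, read_from: Callable[[int], List[Event]],
                 apply: Callable[[Event], None]) -> None:
        self.read_from = read_from  # returns stored events with position > n, in order
        self.apply = apply
        self.last_position = 0

    def catch_up(self) -> None:
        for event in self.read_from(self.last_position):
            self._apply(event)

    def on_live(self, event: Event) -> None:
        position = event[0]
        if position <= self.last_position:
            return  # duplicate, or already applied during catch-up
        if position > self.last_position + 1:
            # Gap: the bus skipped or reordered something; fill from the store.
            self.catch_up()
            if position <= self.last_position:
                return
        self._apply(event)

    def _apply(self, event: Event) -> None:
        self.apply(event)
        self.last_position = event[0]
```

The store stays the source of truth; the bus is only a hint that something new exists.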


The Cass ones work pretty well. Generally, though, they are poll-based
(for clients). A secondary issue you will run into (there is a roughly
100-post thread on the akka list about this) is that you essentially want
multiple indexes. When loading an "aggregate" you want to read the
events for that one item. When writing a projection (say to neo4j) you
tend to want all events of a given type regardless of which
"aggregate" they came from.
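A toy illustration of the two access paths described above: one append-only log with an index per aggregate (for loading) and an index per event type (for projections). Everything here is hypothetical and in memory; with Cassandra these would likely be separate tables, and keeping them consistent and ordered relative to each other is where the real work lies.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    position: int       # global, store-assigned order
    aggregate_id: str
    event_type: str


class IndexedEventLog:
    """Sketch: one append-only log plus two read indexes over it."""

    def __init__(self) -> None:
        self.log: List[Event] = []
        self.by_aggregate: Dict[str, List[int]] = defaultdict(list)
        self.by_type: Dict[str, List[int]] = defaultdict(list)

    def append(self, aggregate_id: str, event_type: str) -> Event:
        event = Event(len(self.log) + 1, aggregate_id, event_type)
        self.log.append(event)
        # Both indexes store global positions, so each preserves log order.
        self.by_aggregate[aggregate_id].append(event.position)
        self.by_type[event_type].append(event.position)
        return event

    def stream(self, aggregate_id: str) -> List[Event]:
        """What loading an aggregate needs."""
        return [self.log[p - 1] for p in self.by_aggregate.get(aggregate_id, [])]

    def of_type(self, event_type: str) -> List[Event]:
        """What a projection interested in one event type needs."""
        return [self.log[p - 1] for p in self.by_type.get(event_type, [])]
```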

Hsen Monzer

Feb 22, 2016, 10:15:10 AM
to DDD/CQRS
OK, first of all, thank you for being patient!!
What about using Cassandra to store events per aggregate, and Kafka's log structure for writing new projections (assuming we retain the logs in Kafka indefinitely)?

Hsen Monzer

Feb 22, 2016, 1:52:05 PM
to DDD/CQRS
I've been thinking about this over and over and would appreciate some feedback:
1- Compared to CQRS without event sourcing, it should be OK to have 2pc while persisting the event to Cassandra and publishing it over Kafka. Handling failures in this 2pc should be easy, considering it's the same event being dropped in 2 places; at worst it should resolve to duplicate events, which is OK.
2- Projections can be handled using Apache Storm or any other framework.
3- Can't we maintain the secondary index (or any other form of ordered events across streams), eventually consistent, and use it for building new read models from scratch?
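Point 1 ("duplicate events, which is OK") only holds if consumers are idempotent. A minimal sketch, assuming each message carries its aggregate id and per-aggregate sequence number, and that messages for one aggregate arrive in order (for example, all on one partition keyed by aggregate id). Names are hypothetical; in a real read model the `last_seen` value would have to be stored atomically with the update it guards. This handles duplicates only, not lost messages.

```python
from typing import Dict, List, Tuple

# (aggregate_id, sequence_nr, event_type); hypothetical message shape
Message = Tuple[str, int, str]


class IdempotentConsumer:
    """Sketch: apply each (aggregate_id, sequence_nr) at most once,
    assuming per-aggregate messages arrive in sequence order."""

    def __init__(self) -> None:
        self.last_seen: Dict[str, int] = {}
        self.applied: List[Message] = []

    def handle(self, message: Message) -> bool:
        aggregate_id, sequence_nr, _event_type = message
        if sequence_nr <= self.last_seen.get(aggregate_id, 0):
            return False  # duplicate delivery; already applied
        self.applied.append(message)  # stand-in for updating a read model
        self.last_seen[aggregate_id] = sequence_nr
        return True
```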

Greg Young

Feb 22, 2016, 4:40:04 PM
to ddd...@googlegroups.com
I am sure you can get any of these things "working", but many will be a
lot of work (like adding distributed transactions to Kafka).

Have you ever seen the code of a projection? Using Storm is kind of
like using a bulldozer to eat your breakfast cereal.

Sure, you could do #3, provided you understand that you just lost any
ordering assurances on your projections, which will increase their
complexity.


It sounds to me like you are more focused on tools than the problem
you are solving.

Hsen Monzer

Feb 23, 2016, 2:30:27 AM
to DDD/CQRS
Can you elaborate on adding distributed transactions to Kafka? Why would it be a problem if we're accepting eventual consistency?
Side note: I'm not suggesting using Kafka with a topic per aggregate, but rather a topic per aggregate type (or even a topic per service if we adopt microservices).
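With a topic per aggregate type, ordering depends on partitioning: Kafka orders records only within a partition, and by default records with the same key go to the same partition. The sketch below simulates that in plain Python, assuming the aggregate id is used as the key; `zlib.crc32` is a stand-in for Kafka's murmur2-based default partitioner, and the function names are hypothetical. Per-aggregate order survives, but a consumer reading several partitions sees no single global order, which matters for projections that span aggregates.

```python
import zlib
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (key = aggregate_id, event_type); hypothetical shape


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash; Kafka's default partitioner uses murmur2, not crc32.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(records: List[Record], num_partitions: int) -> Dict[int, List[Record]]:
    partitions: Dict[int, List[Record]] = {p: [] for p in range(num_partitions)}
    for record in records:
        partitions[partition_for(record[0], num_partitions)].append(record)
    return partitions


def consume_partition_by_partition(partitions: Dict[int, List[Record]]) -> List[Record]:
    # One legal consumption order: drain each partition in turn.
    # Only within-partition order is guaranteed, so this is as valid as any other.
    return [r for p in sorted(partitions) for r in partitions[p]]
```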

Greg Young

Feb 23, 2016, 5:10:21 AM
to ddd...@googlegroups.com
"> 1- compared to CQRS without event sourcing, it should be ok to
have 2pc while persisting the event to cassandra and publishing it
over kafka. Handling failures in this 2pc should be easy considering
it's the same event being dropped in 2 places; at max it should
resolve to duplicate events which is ok"

Kafka doesn't support 2pc (two-phase commit). I think you meant
something other than what you wrote.

Ben Kloosterman

Feb 23, 2016, 5:32:26 AM
to ddd...@googlegroups.com
And two-phase commit quickly becomes a liability / falls apart pretty quickly in a truly distributed environment.

Ben

Hsen Monzer

Feb 23, 2016, 5:52:14 AM
to DDD/CQRS
What I meant by 2pc is that the Repository class can implement the 2pc logic by saving the event to the DB and then publishing it to Kafka. If saving to the DB fails, nothing is published; and we add a retry mechanism for publishing to Kafka in case that fails...
The reason I raised CQRS as a comparison point: in a CQRS system that doesn't implement event sourcing, publishing domain events after saving the state is also a 2pc, but with much trickier failure handling.

Which makes me think: if we implement CQRS and persist aggregates as a series of events, is it no longer event sourcing because we lost the ability to implement projections?

Greg Young

Feb 23, 2016, 5:58:21 AM
to ddd...@googlegroups.com
This is not "2pc" in any way, shape or form.

What happens when your app is shut down between the time it
saves to the db and the time it publishes to the queue?
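A sketch of the failure being pointed at, assuming the DB write and the publish are two independent operations. The lists stand in for Cassandra/MySQL and Kafka/RabbitMQ, and `NaiveRepository` is hypothetical. An in-process retry for the publish cannot help in this case, because the process (and any retry state it held in memory) is gone.

```python
from typing import List


class SimulatedCrash(Exception):
    """Stands in for the process being killed."""


class NaiveRepository:
    """Sketch of 'save to DB, then publish' with no shared transaction."""

    def __init__(self, crash_before_publish: bool = False) -> None:
        self.db: List[str] = []    # stand-in for Cassandra/MySQL
        self.bus: List[str] = []   # stand-in for Kafka/RabbitMQ
        self.crash_before_publish = crash_before_publish

    def save(self, event: str) -> None:
        self.db.append(event)      # step 1 commits on its own
        if self.crash_before_publish:
            raise SimulatedCrash("process stopped between the two writes")
        self.bus.append(event)     # step 2 may never happen
```

After the simulated crash the event is in the database but was never published, and nothing in the process remembers that it still needs to be.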

Hsen Monzer

Feb 23, 2016, 6:20:53 AM
to DDD/CQRS
I was hoping you could help with that :)
How is it done in CQRS/ES? Saving the state of the object and then publishing to a bus?

Greg Young

Feb 23, 2016, 6:22:01 AM
to ddd...@googlegroups.com
In ES your db is a queue; any "event store" can be used as both.


Hsen Monzer

Feb 23, 2016, 6:27:38 AM
to DDD/CQRS
What about CQRS without ES?

Hsen Monzer

Feb 23, 2016, 9:26:33 AM
to DDD/CQRS
What I'm missing, and asking for help clarifying, is how some companies implement CQRS (using Cassandra for both read and write) and publish domain events (since they're implementing a microservice architecture) in an atomic way.
I just don't like having my central DB used as an event store and being polled by subscribers: messaging systems are built and optimized for pub/sub, so why not benefit from them?

I'm trying to see how Amazon, Facebook, LinkedIn, Nike... implement CQRS without EventStore, seemingly using Cassandra, Kafka, ..., and still succeed in running at such a large scale.


Greg Young

Feb 23, 2016, 9:30:30 AM
to ddd...@googlegroups.com
What is your scale? How many events / second?

You make a bunch of trade-offs at different levels of scale, likely
involving increased complexity.

Hsen Monzer

Feb 23, 2016, 9:45:22 AM
to DDD/CQRS
Regardless of my scale (my project has yet to start, so currently I can get by with pen and paper :) ),
my question is: how can they persist to the DB and publish events at such a large scale, and do it in an "atomic" way, without risking losing data, and without 2pc and all the other problems we're mentioning here?

Greg Young

Feb 23, 2016, 10:01:04 AM
to ddd...@googlegroups.com
Write to db then async publish by reading from said "db"
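A minimal sketch of that pattern, assuming the store exposes the events after a given position. `PollingPublisher`, `read_after` and `publish` are hypothetical names. Because the checkpoint moves only after a successful publish, a failure leads to re-publishing rather than loss, so consumers still need to tolerate duplicates. In a real system the checkpoint would be persisted, not held in memory.

```python
from typing import Callable, List, Tuple

StoredEvent = Tuple[int, str]  # (position, event_type); hypothetical shape


class PollingPublisher:
    """Sketch: write to the db, then publish asynchronously by reading the db.
    The checkpoint advances only after a successful publish, so a crash can
    cause a re-publish (duplicate) but never a lost event."""

    def __init__(self, read_after: Callable[[int], List[StoredEvent]],
                 publish: Callable[[StoredEvent], None]) -> None:
        self.read_after = read_after
        self.publish = publish
        self.checkpoint = 0  # would be stored durably in a real system

    def poll_once(self) -> int:
        published = 0
        for event in self.read_after(self.checkpoint):
            self.publish(event)         # may raise; checkpoint not yet moved
            self.checkpoint = event[0]
            published += 1
        return published
```

Run `poll_once` on a timer or in a loop; the writer only ever appends to the store and never talks to the bus directly.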

Hsen Monzer

Feb 24, 2016, 3:09:00 AM
to DDD/CQRS
Any reading material to suggest on such an implementation and pattern?
I appreciate the replies :) Always learning!

Ben Kloosterman

Feb 24, 2016, 6:42:13 AM
to ddd...@googlegroups.com
I spent a fair bit of time trying this, and my recommendation is: don't, unless you have experience in both. Go for a pure DDD system, not CQRS, if you're not using ES. This is mainly for soft reasons / dev behavior: if you go non-ES with CQRS, you will often get a bastardized CRUD system with the penalties of both, especially if you have little experience with CQRS.

You're far better off grabbing Simple CQRS, using it for some domains and CRUD for others; focus on the business problems and see where it leads.

Ben


Hsen Monzer

Feb 24, 2016, 7:13:44 AM
to DDD/CQRS
Actually, I like ES and would probably implement it. The point is just to check alternative implementations of the event store. What I'm thinking of is starting with a simple MySQL events table and having Kafka connect to it and publish new events after every poll. I was asking for tutorials/readings about implementing CDC (assuming that polling the events table and publishing the events counts as CDC).

Greg Young

Feb 24, 2016, 8:50:22 AM
to ddd...@googlegroups.com
There are bunches of libraries for this already; see
http://blog.langer.eu/2014/09/02/event-store-for-java.html

though using Kafka as a transport as you describe runs into problems I
discuss with a bus here: https://www.youtube.com/watch?v=GbM1ghLeweU



