Kafka journal


Richard Rodseth

unread,
Jul 12, 2014, 9:35:48 AM7/12/14
to akka...@googlegroups.com
I saw a tweet from Martin Krasser that he was working on an Akka Persistence journal plug-in for Kafka. This puzzled me a bit since Kafka messages are "durable" rather than "persistent" - they are stored for a configurable time.

Could anyone comment on typical usage? Counting on your persistent actor being recovered before the Kafka topic's messages expire seems odd.

While the Akka/Kafka combination seems great, I always pictured it would just involve ordinary actors playing the role of Kafka producers and consumers.

Thomas Lockney

unread,
Jul 12, 2014, 11:52:38 PM7/12/14
to akka...@googlegroups.com
Hi Richard,

I saw that post of Martin's and even replied, as I'm very interested in this approach. One thing to keep in mind is that, conceptually speaking, the difference between durable and persistent is not that large. There is simply an assumption that something durable will eventually be disposed of, while something persistent will live forever. But how often is that latter case really true?

Further, given that this approach to persistence is built on the event sourcing model, and the idea is to be able to replay a sequence of events into the mailbox of an actor, this is really just a (heavily) enhanced version of a durable mailbox. I've read Jay Kreps and others talk about setting retention times on the order of weeks or months for Kafka topics, so I don't see a huge discrepancy here. You can see evidence of a similar view in this excellent post: http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html

In the end, what we're talking about in both the persistent and durable cases are just long-term event logs -- the question is simply what you consider long-term and which arbitrary time span you attach to each term. Personally, I'm excited about this possibility, as it would allow reuse of an efficient storage mechanism for both event-sourced data and streamed event processing.



--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.



Martin Krasser

unread,
Jul 13, 2014, 3:51:12 AM7/13/14
to akka...@googlegroups.com
Hi Richard,

when using the Kafka journal with default/typical retention times, your application is responsible for storing snapshots at intervals that are significantly shorter than the retention time (for example, with a retention time of 7 days, you may want to take snapshots of your persistent actors every 3 days or so). Alternatively, configure Kafka to keep messages "forever" (i.e. set the retention time to the maximum value) if needed. I won't go into Kafka partitioning details here, but it is possible to implement the journal driver in a way that a single persistent actor's data is both partitioned *and* kept in order. However, with the initial implementation, all data for a single persistent actor must fit on a single Kafka node (different persistent actors are, of course, distributed over a Kafka cluster). Hence, deleting old data after a few weeks and taking snapshots at regular intervals is the way to go (which is good enough for many applications, I think).
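For reference, the retention Martin mentions is ordinary broker-side Kafka configuration; a sketch with example values (these are illustrative, not the plugin's defaults):

```properties
# server.properties (Kafka broker) -- example values only.
# With a 7-day retention, snapshot persistent actors well within that window
# (e.g. every 3 days) so recovery never depends on already-deleted events.
log.retention.hours=168
# Disable size-based retention so only the time limit applies:
log.retention.bytes=-1
```

Topic-level overrides (e.g. `retention.ms`) can give the journal's topics a different retention than the rest of the cluster.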

The real value of the Kafka journal IMO comes with the many external integrations it supports. For example, you can use it as an input source for Spark Streaming and do (scalable) stream processing of events generated by persistent actors, i.e. you can easily create Akka -> Kafka -> Spark Streaming pipelines. This is an alternative to Akka's PersistentView and even allows processing of events generated by several/all persistent actors with a single consumer, such as a single Spark DStream (something that is currently a limitation of PersistentViews).

I just see this as a starting point for what akka-persistence may require from all journal implementations in later releases: providing a persistent event stream, generated by several persistent actors, in a scalable way. This stream could then be consumed with akka-streams or Spark Streaming, using a generic connector rather than a journal-backend-specific one, for example.

Initially I just wanted to implement the Kafka integration as an interceptor for journal commands, so that events are stored in Kafka in addition to another journal backend. This may be OK for some projects; others may find the operational complexity too high when they have to administer a Kafka/ZooKeeper cluster in addition to, say, a Cassandra or MongoDB cluster.

Hope that clarifies things a bit.

Cheers,
Martin

Richard Rodseth

unread,
Jul 13, 2014, 10:03:20 AM7/13/14
to akka...@googlegroups.com
Thanks for the detailed reply. I might have been forgetting that Akka persistence can be used for more than persisting DDD aggregates. I had also forgotten that the event store and snapshot store can be different.

Martin Krasser

unread,
Jul 13, 2014, 10:19:55 AM7/13/14
to akka...@googlegroups.com

On 13.07.14 16:03, Richard Rodseth wrote:
Thanks for the detailed reply. I might have been forgetting that Akka persistence can be used for more than persisting DDD aggregates. I had also forgotten that the event store and snapshot store can be different.

You can even use Kafka to implement a snapshot store. You just need to enable log compaction which will always keep the last snapshot (entry) for each persistent actor (key). I also plan to implement a snapshot store backed by Kafka but I'm not sure at the moment how well Kafka supports large log entries.
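The compacted-topic idea above can be sketched as topic-level configuration (the topic name and keying scheme here are hypothetical; an actual plugin may organize this differently):

```properties
# Per-topic settings for a hypothetical "snapshots" topic. With compaction,
# Kafka retains at least the latest entry per key, so keying each snapshot
# by persistenceId keeps every actor's most recent snapshot indefinitely.
cleanup.policy=compact
```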

Martin Krasser

unread,
Jul 14, 2014, 10:36:27 AM7/14/14
to akka...@googlegroups.com, kras...@googlemail.com
There's now a first release of the Kafka journal. Details at https://github.com/krasserm/akka-persistence-kafka
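Wiring the plugin in is essentially a one-line configuration change; this sketch is based on my reading of the README, so check the linked repository for the exact artifact coordinates and plugin id:

```
# application.conf -- plugin id per the project README
akka.persistence.journal.plugin = "kafka-journal"
```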

Jonas Bonér

unread,
Jul 14, 2014, 10:47:13 AM7/14/14
to Akka User List, Martin Krasser
Great work Martin. 
Jonas Bonér
Phone: +46 733 777 123
Home: jonasboner.com
Twitter: @jboner

Martin Krasser

unread,
Jul 14, 2014, 10:51:19 AM7/14/14
to akka...@googlegroups.com

On 14.07.14 16:46, Jonas Bonér wrote:
Great work Martin.

Thanks!

Heiko Seeberger

unread,
Jul 14, 2014, 11:04:34 AM7/14/14
to akka...@googlegroups.com, kras...@googlemail.com
Fantastic!

Great work, Martin. Keep it coming!

Heiko

--

Heiko Seeberger
Twitter: @hseeberger
Web: heikoseeberger.de




Martin Krasser

unread,
Jul 14, 2014, 11:08:08 AM7/14/14
to akka...@googlegroups.com
Thanks Heiko, really hope to get some user feedback ...

Ashley Aitken

unread,
Jul 15, 2014, 11:25:33 PM7/15/14
to akka...@googlegroups.com, kras...@googlemail.com

I think this is a fantastic development (if I understand it correctly).

From my reading and basic understanding, I had concerns about needing two event infrastructures: 1) to implement more complex view models in CQRS, because currently Views can't follow more than one PersistentActor, and 2) to integrate events with non-Akka-based systems.

Please correct me if I am wrong, but having one powerful event infrastructure (like Kafka) as the "event store" shared across applications would enable (2), and possibly (1), now and in the future, perhaps with akka-streams, particularly as Kafka provides publish-subscribe functionality.

Event stores and streams seem so central to many contemporary systems. 

Martin Krasser

unread,
Jul 16, 2014, 2:09:06 AM7/16/14
to akka...@googlegroups.com

On 16.07.14 05:25, Ashley Aitken wrote:

I think this is a fantastic development (if I understand it correctly).

From my reading and basic understanding I had concerns about the need for two event infrastructures to 1) implement more complex view models in CQRS because currently Views can't follow more than one PersistentActor, and 2) integration of events with non-Akka based systems.

Please correct me if I am wrong but having one powerful event infrastructure (like Kafka) as the "event store" to use across applications will enable (2) and possibly (1) for now

At the moment it is not possible for a PersistentView to consume from a user-defined topic (a user-defined event stream); this requires an extension to Akka Persistence (both SPI and API). I think it makes sense to go in that direction and to require that all journal backend stores support it, so that user-defined event streams can be consumed both Akka-internally and by external consumers that connect directly to the backend store.


and in the future perhaps with akka-streams. 

There is already a PersistentView -> akka-streams integration. Once PersistentViews can consume user-defined event streams, you can automatically expose those streams as Akka streams.

ahjohannessen

unread,
Jul 16, 2014, 9:31:23 AM7/16/14
to akka...@googlegroups.com, kras...@googlemail.com
On Wednesday, July 16, 2014 7:09:06 AM UTC+1, Martin Krasser wrote:
I think it makes sense to go into that direction and to require that all journal backend stores support that, so that user-defined event streams can be consumed both Akka-internally and by external consumers (that directly connect to the backend store) as well.

+1

Roland Kuhn

unread,
Jul 16, 2014, 10:51:34 AM7/16/14
to akka...@googlegroups.com
Hi Martin,

I agree, streams should be a first-class citizen in Akka Persistence. I'm AFK right now; could you ensure that this is adequately represented in the issue tracker?

Thanks,

Roland
Regards,

Dr. Roland Kuhn
Akka Tech Lead
Typesafe – Reactive apps on the JVM
twitter: @rolandkuhn

Martin Krasser

unread,
Jul 16, 2014, 3:17:36 PM7/16/14
to akka...@googlegroups.com

Hi Roland,

I'm AFK as well until Sun, will do next week.

Cheers,
Martin

Christopher Hunt

unread,
Jan 27, 2018, 9:36:25 PM1/27/18
to Akka User List
Replying to an old thread. I’m interested to learn of any advancements in thinking on this topic over the past 3+ years.

If I have Kafka and want CQRS, how much do I need akka-persistence? I'm starting to wonder whether I need it at all, instead treating CQRS as the architectural pattern that it is and implementing it as per: https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/

Thanks.

Cheers
C