Best practices using Akka Persistence with long-running projects?


Martin Simons

Jul 17, 2014, 12:37:51 PM
to akka...@googlegroups.com
This is essentially a follow-up to this thread I started over a year ago: https://groups.google.com/d/msg/akka-user/tvV-j07190k/sr_t-Uy9SqkJ

Since then, plenty of things have changed: the project mentioned there has progressed quite a bit (but is still far from release), Akka Cluster is production-ready, and Eventsourced has become Akka Persistence.

Back in 2013, based in part on the outcome of the original thread, I decided not to use Eventsourced and instead treat every message to one of my "entity actors" as a database transaction (using PostgreSQL), keeping no state in memory but reloading it (or at least the required bits) whenever a new message comes in. It turns out this does not scale at all, neither in terms of availability (RDBMSs are just not very good at this) nor in terms of performance (too many transactions, expensive reads and writes). The only advantage I still see is the relative statelessness, which made distributing the actors in a cluster a tad easier (and which is really overrated when the backend isn't distributable). More than a year later, I could kick myself for this decision.

Consequently I'm thinking about rewriting the system using Akka Persistence with Cassandra as backend and using Cluster Sharding instead of my own, router-based approach for clustering. But to this day I am worried about some implications of this kind of system:
  1. One of my longer-running game projects has persistent game worlds that have been open for more than 6 years, and it's a complex simulation as well. The amount of stored events for this timespan would probably run into the terabytes. I know you can delete events, but is this recommended practice?
  2. Are there any best practices concerning the problem of changing event structures? Persistent actors would need to be able to handle the case of a stored event's version being different from the current implementation. Not only is serialization/deserialization problematic, but also changing logic within the actor. Since I assume these are inherent problems of event sourcing in general, I'm pretty sure there must be proven ways of dealing with them.
  3. Last but not least: most examples I found using Akka Persistence use very small numbers of actors, mostly for simple demonstration purposes. In my case, we're talking about many thousands of actors. I don't know the internals of Akka Persistence well enough to know whether this might cause problems. Just imagine thousands of actors doing stuff at the same time, so thousands of events would have to be persisted at the same time as well. Is the bottleneck just my disk's IOP rate, or do you see anything within Persistence that might cause problems? I know persisted events have sequence numbers, which sounds like a bottleneck to me... or are these actor-specific?
I'm almost convinced already. In fact, aside from the things mentioned above and given the problems I struggle with right now, this model sounds extremely elegant to me. Please tell me that I don't have to worry about my questions and/or that there are proven solutions for them :-)

Konrad Malawski

Jul 21, 2014, 7:28:27 AM
to Akka User List
Hi Martin,
Glad to see you back and still hakking on it :-)

I do think your use-case fits persistence, but let's take a shot at your questions first:

  1. One of my longer-running game projects has persistent game worlds that have been open for more than 6 years, and it's a complex simulation as well. The amount of stored events for this timespan would probably run into the terabytes. I know you can delete events, but is this recommended practice?
Deleting is always a trade-off... On one hand, if you have that many events, I'm sure you could use them to fine-tune and tweak the game's balance (as Riot Games does -- http://www.slideshare.net/Hadoop_Summit/big-data-at-riot-games ).
Perhaps if a decent serialiser is used it wouldn't blow up too quickly?

On the other hand, if you really feel you need to delete, you of course can. It depends on the kinds of events / domain.
In your case, perhaps performing an export to some cheaper storage before doing the `deleteUntilSeqNr=N` would be viable?
This can easily be done as: replay to N (performing a backup of these events while doing so), snapshot, delete until N, done.

It depends on your needs of course; maybe deleting these events is simply not an option (hey, disk is cheap!), or maybe it's fine - you'll have to decide based on your domain and possible future uses of that data.
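
To make that cycle concrete, here's a minimal sketch against Akka 2.3's PersistentActor API (the WorldEvent/WorldState types, the Compact trigger and the archiveEvent backup hook are made up for this example, not from any plugin):

```scala
import akka.persistence.{PersistentActor, SaveSnapshotSuccess, SnapshotOffer}

// Hypothetical event and state types, just for this sketch.
case class WorldEvent(data: String)
case class WorldState(events: List[String] = Nil) {
  def updated(e: WorldEvent): WorldState = copy(events = e.data :: events)
}
case object Compact

class GameWorld extends PersistentActor {
  override def persistenceId: String = "game-world-1"

  var state = WorldState()

  override def receiveRecover: Receive = {
    case e: WorldEvent                   => state = state.updated(e)
    case SnapshotOffer(_, s: WorldState) => state = s
  }

  override def receiveCommand: Receive = {
    case Compact =>
      saveSnapshot(state)

    case cmd: String =>
      persist(WorldEvent(cmd)) { e =>
        state = state.updated(e)
        archiveEvent(e) // copy to cheap storage before it's ever deleted
      }

    case SaveSnapshotSuccess(meta) =>
      // Only delete the journal tail once the snapshot is confirmed,
      // so recovery can never lose data.
      deleteMessages(toSequenceNr = meta.sequenceNr)
  }

  // Placeholder for the "export to cheaper storage" step.
  def archiveEvent(e: WorldEvent): Unit = ()
}
```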

  2. Are there any best practices concerning the problem of changing event structures? Persistent actors would need to be able to handle the case of a stored event's version being different from the current implementation. Not only is serialization/deserialization problematic, but also changing logic within the actor. Since I assume these are inherent problems of event sourcing in general, I'm pretty sure there must be proven ways of dealing with them.
For starters, I'd recommend hooking in a custom serialiser: http://doc.akka.io/docs/akka/2.3.4/scala/persistence.html#custom-serialization

In general I've seen two methods used.
One is versioning events, keeping all implementations around, and dispatching based on event version.
I believe this bloats the model, because you keep your legacy event model around too (that is, you keep the code that can work on it).

The other method, which I personally like more, is promoting events.
Say an event comes in at version 2, while you're running your app on version 5 events. You have "promotions" ready in your system and can promote a v2 event to v3, then to v4, then to v5; after promoting it to your current runtime's version, you can give it to your actors. What are the wins? You don't keep the old implementations around, and you can hook the promotions into the custom serialiser in Akka: read in old events, promote them, return them to be used.
Not all data can be promoted like this of course, so you might end up with defaults in many places. Anyhow, this again all depends on how you model your domain.
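
A bare-bones sketch of such a promotion chain in plain Scala (the NameChanged* events and their fields are invented purely for illustration):

```scala
// Hypothetical versioned events: v1 only had a full name, v2 split it,
// v3 added a timestamp.
case class NameChangedV1(name: String)
case class NameChangedV2(firstName: String, lastName: String)
case class NameChangedV3(firstName: String, lastName: String, timestamp: Long)

object Promotions {
  // Each promotion lifts an event exactly one version, filling in
  // defaults where the older version carried no data.
  def v1ToV2(e: NameChangedV1): NameChangedV2 = {
    val parts = e.name.split(" ", 2)
    NameChangedV2(parts(0), if (parts.length > 1) parts(1) else "")
  }

  def v2ToV3(e: NameChangedV2): NameChangedV3 =
    NameChangedV3(e.firstName, e.lastName, timestamp = 0L) // default: unknown

  // Chain the promotions so that whatever version was stored,
  // the actor only ever sees the current one.
  def promote(event: Any): NameChangedV3 = event match {
    case e: NameChangedV1 => promote(v1ToV2(e))
    case e: NameChangedV2 => promote(v2ToV3(e))
    case e: NameChangedV3 => e
  }
}
```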

  3. Last but not least: most examples I found using Akka Persistence use very small numbers of actors, mostly for simple demonstration purposes. In my case, we're talking about many thousands of actors. I don't know the internals of Akka Persistence well enough to know whether this might cause problems. Just imagine thousands of actors doing stuff at the same time, so thousands of events would have to be persisted at the same time as well. Is the bottleneck just my disk's IOP rate, or do you see anything within Persistence that might cause problems? I know persisted events have sequence numbers, which sounds like a bottleneck to me... or are these actor-specific?
I'm glad you ask! :-)

Journals used in such clusters would usually be something like Cassandra (Martin Krasser's impl) or HBase (my impl), and these can scale writes very, very well (long story short: you know up-front which server a write will go to, so you only hit one region server and not some "master" etc.).

If you're into Cassandra and HBase you know that they completely hate sequential ids as row keys, because they kill the load spreading
(see also the HBase docs, section 6.3.1 on rowkey design, or better yet an analysis of the so-called hotspotting problem). So in theory sequenceNr-indexed values sound terrible for these datastores - which is true if the row key is designed naively.
Thank goodness both the Cassandra and HBase plugins (I'm not familiar enough with the other implementations to judge them) have their row keys optimised for akka-persistence access patterns and can leverage the entire HBase/Cassandra cluster for taking writes (awesome scalability).
The basic concept of optimising the row keys is described on this blog (same link as above), or for a more visual guide you can check out my lightning talk about akka-persistence-hbase rowkey design (skip to slide 41 for the design part).
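
The general shape of the trick, sketched here from first principles rather than copied from either plugin:

```scala
// Sketch only, not the actual plugin code: prefix the row key with a
// bounded partition derived from the sequence number, so consecutive
// writes for one persistenceId land in different key ranges (and thus
// on different nodes) instead of hammering a single server.
object RowKeys {
  val partitionCount = 16 // assumed configuration value

  def rowKey(persistenceId: String, sequenceNr: Long): String =
    f"${sequenceNr % partitionCount}%03d-$persistenceId-$sequenceNr%05d"
}

// RowKeys.rowKey("world-1", 41L) == "009-world-1-00041"
// RowKeys.rowKey("world-1", 42L) == "010-world-1-00042"
```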

In general - yes, it will handle tens of thousands of actors if your journal can keep up, and these datastores really should be able to.
But of course, we assume you scale your datastore accordingly. If you slam millions of persistent actors against one small node on EC2, well, that's not going to keep up :-)


I hope this helps, and feel free to ask follow up questions if you have any!





--
Cheers,
Konrad 'ktoso' Malawski
hAkker @ Typesafe


Martin Simons

Jul 23, 2014, 11:37:20 AM
to akka...@googlegroups.com
First of all, thank you very much for the elaborate reply. I guess I'll get hacking away at some prototypes soon and do some performance testing close to our use-case. Looking forward to it, too :-)

One question remains:


On Monday, July 21, 2014 at 1:28:27 PM UTC+2, Konrad Malawski wrote:

The other method, which I personally like more, is promoting events.
Say an event comes in at version 2, while you're running your app on version 5 events. You have "promotions" ready in your system and can promote a v2 event to v3, then to v4, then to v5; after promoting it to your current runtime's version, you can give it to your actors. What are the wins? You don't keep the old implementations around, and you can hook the promotions into the custom serialiser in Akka: read in old events, promote them, return them to be used.

So what you suggest is that the serializer first deserializes a stored event "forgivingly", without throwing an exception (for example dropping removed fields and initializing new fields with 0/null/default values), and then I compare the version and apply additional conversion logic? Or do you mean something more complex, like storing a "dehydrated" version of the event and re-hydrating it every time an event is loaded from storage, promoting it to the latest version in the process?
 

Konrad 'ktoso' Malawski

Jul 24, 2014, 5:51:57 AM
to akka...@googlegroups.com, Martin Simons
I had a reply prepped yesterday but lost the draft (argh offline editing!).
Back to the topic though:

First of all, thank you very much for the elaborate reply. I guess I'll get hacking away at some prototypes soon and do some performance testing close to our use-case. Looking forward to it, too :-)

Great to hear, let us know about your findings please :-)


So what you suggest is that the serializer first deserializes a stored event "forgivingly", without throwing an exception (for example dropping removed fields and initializing new fields with 0/null/default values), and then I compare the version and apply additional conversion logic? Or do you mean something more complex, like storing a "dehydrated" version of the event and re-hydrating it every time an event is loaded from storage, promoting it to the latest version in the process?

This very much depends on what serialiser you use, because each one inherently supports a different kind of schema evolution.

I've been mostly using protocol buffers in my career. There, what you do is mostly use optional fields, so these can be null - and you have to be prepared for that anyway.
You can also reserve some space for extensions; using that, you're able to apply this trick: http://www.indelible.org/ink/protobuf-polymorphism/
Because of how proto works, renames are safe, for example - the binary format doesn't use the field name, only the ID, etc.
This should give you a feel for how "bound" your evolution strategies will become to the serialisation format.
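
To make the "forgiving" part concrete, here's roughly what reading an evolved message into the current model looks like (PlayerEventProto is a stand-in for a protoc-generated class; all types and fields here are invented for this sketch):

```scala
// The current case class on the Scala side.
case class PlayerEvent(playerId: Long, score: Int, region: String)

// Stand-in for a protoc-generated Java class; hasX/getX are the
// accessors protoc generates for optional fields.
trait PlayerEventProto {
  def getPlayerId: Long
  def hasScore: Boolean
  def getScore: Int
  def hasRegion: Boolean
  def getRegion: String
}

object ProtoMapping {
  // Old events that never wrote these optional fields get defaults
  // instead of blowing up deserialization.
  def fromProto(p: PlayerEventProto): PlayerEvent =
    PlayerEvent(
      playerId = p.getPlayerId,
      score    = if (p.hasScore) p.getScore else 0,           // field added later
      region   = if (p.hasRegion) p.getRegion else "unknown"  // field added later
    )
}
```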

Again though, dig deeper into the details of each serialisation format before you decide to go for it (we don’t really have a “use that one” opinion on this).

Other serialisation formats, such as Simple Binary Encoding (designed for low-latency environments - check the benchmarks), support this by keeping version numbers in the message header.
In essence: version number => some number of extension fields; you cannot delete fields in this one.

One other format I'd like to point to here is Cap'n Proto (same author as the original protobufs, AFAIK) and their description of this topic, as it lists quite nicely what is safe and what is not:


Summing up... these strategies allow you to change your events yet stay compatible with the same type - to still be able to deserialise.
If, however, you at some point make some huge change and need to get events B while you have events A serialised, or you need to add additional data, you can add another layer that promotes A events to B events and then emits them to the actor. This gets more complicated, but the point is: you're not doomed even if you have to make bigger changes (try to avoid this, ofc ;-)).
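
And to tie that promotion layer back to Akka: it can live in the custom serialiser itself. A sketch, reusing the hypothetical NameChangedV1/V2/V3 chain from my earlier mail (the one-byte version header and the encode/decode stubs are illustrative assumptions, not any real plugin's format):

```scala
import akka.serialization.Serializer

class PromotingEventSerializer extends Serializer {
  override def identifier: Int = 462789 // any stable, unique id
  override def includeManifest: Boolean = false

  override def toBinary(obj: AnyRef): Array[Byte] = obj match {
    // Always write the current version.
    case e: NameChangedV3 => Array(3.toByte) ++ encodeV3(e)
    case other => throw new IllegalArgumentException(s"Cannot serialize $other")
  }

  override def fromBinary(bytes: Array[Byte], manifest: Option[Class[_]]): AnyRef = {
    val payload = bytes.drop(1)
    val decoded: Any = bytes(0).toInt match {
      case 1 => decodeV1(payload)
      case 2 => decodeV2(payload)
      case 3 => decodeV3(payload)
      case v => throw new IllegalArgumentException(s"Unknown event version $v")
    }
    // Whatever version was stored, hand the actor the current one.
    Promotions.promote(decoded)
  }

  // Stubs standing in for the real (e.g. protobuf-based) wire format.
  private def encodeV3(e: NameChangedV3): Array[Byte] = ???
  private def decodeV1(b: Array[Byte]): NameChangedV1 = ???
  private def decodeV2(b: Array[Byte]): NameChangedV2 = ???
  private def decodeV3(b: Array[Byte]): NameChangedV3 = ???
}
```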

I know I’ve just grown your to-read list by quite a bit, but I hope this helps! :-)

-- 
Konrad 'ktoso' Malawski
hAkker @ typesafe

Martin Simons

Jul 25, 2014, 12:42:11 PM
to akka...@googlegroups.com, mar...@lunikon.net


On Thursday, July 24, 2014 at 11:51:57 AM UTC+2, Konrad Malawski wrote:

I know I’ve just grown your to-read list by quite a bit, but I hope this helps! :-)


Glad you did ;-)

I at least skimmed all the articles you posted and decided that Protobuf is probably the most common thing to use. I went on to hack together a small example project, trying to keep the amount of boilerplate code per class as low as possible, since it will be required for each and every event and state class that might be added in the future. No matter how one does it, translating Protobuf Java objects to Scala case classes will always look more or less messy, I'm afraid.

The result of my humble attempts can be found here: https://github.com/lunikon/akka-persistence-serialization The code is still a mess, obviously, but do you see a general problem with this approach? (The approach being that one defines an "adapter" for every case class and registers it with a static registry, which is then used by the Akka Serializer implementation.)
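
In outline, the pattern looks roughly like this (names invented for the sketch; they don't match the repo one-to-one):

```scala
// One adapter per case class, bridging it to its protobuf representation.
trait ProtoAdapter[T <: AnyRef] {
  def clazz: Class[T]
  def toBytes(value: T): Array[Byte]   // delegates to the generated proto builder
  def fromBytes(bytes: Array[Byte]): T // parses proto, rebuilds the case class
}

// A static registry mapping classes to their adapters.
object AdapterRegistry {
  private var adapters = Map.empty[Class[_], ProtoAdapter[_ <: AnyRef]]

  def register(adapter: ProtoAdapter[_ <: AnyRef]): Unit =
    adapters += (adapter.clazz -> adapter)

  def forClass(clazz: Class[_]): ProtoAdapter[AnyRef] =
    adapters(clazz).asInstanceOf[ProtoAdapter[AnyRef]]
}

// The Akka Serializer then just dispatches to the registered adapter.
class AdapterBasedSerializer extends akka.serialization.Serializer {
  override def identifier: Int = 885312
  override def includeManifest: Boolean = true

  override def toBinary(obj: AnyRef): Array[Byte] =
    AdapterRegistry.forClass(obj.getClass).toBytes(obj)

  override def fromBinary(bytes: Array[Byte], manifest: Option[Class[_]]): AnyRef =
    AdapterRegistry.forClass(manifest.get).fromBytes(bytes)
}
```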

 

Konrad Malawski

Jul 28, 2014, 5:03:54 PM
to Akka User List, Martin Simons
Hello Martin,
yeah, that looks like what I had in mind.

And yeah, translating to/from proto always involves a bit of verbosity somewhere; here's how we try to minimise it (serialiser impl):

Sadly, the serialisers that are slick and nice to work with usually don't have versioning/evolution built in, from what I see available...
Kryo is fast and great to use, but with no versioning or anything built in it's a no-go for this use case (IMO at least).





--
Cheers,
Konrad 'ktoso' Malawski
hAkker @ Typesafe


Sander Mak

Jul 30, 2014, 9:00:07 AM
to akka...@googlegroups.com, mar...@lunikon.net
Hi Martin,

Have you also looked at Apache Avro? I haven't used it in conjunction with Akka (Persistence) yet, but I have used it in other projects with success. It does have a strong notion of schema evolution. Check out http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html for an in-depth treatment.

Cheers,
Sander