Akka persistence in production - recommended journals?

Dobrin Ivanov

Nov 29, 2017, 11:14:01 AM

to Akka User List

Hi,

I'm investigating whether to use Akka Persistence in production with Java.

I've been looking at https://doc.akka.io/docs/akka/snapshot/persistence.html

I also tried this example: https://github.com/akka/akka-samples/tree/2.5/akka-sample-persistence-java

The article refers to the native LevelDB (https://github.com/fusesource/leveldbjni), while the example refers to a Java port of LevelDB (https://github.com/dain/leveldb). I'm not sure why.

Simply switching the example to the native one does not work: java.lang.NoClassDefFoundError: org/iq80/leveldb/impl/Iq80DBFactory

Q1: If I do not need a cluster/failover/replication (or I may want to handle it myself, for example), can I use LevelDB in production? And if yes, I guess it should be the native one?

Q2: The article then recommends replicated journals. Does anybody use akka-persistence-jdbc or okumin/akka-persistence-sql-async in production, for example?
Are there any recommendations?
I guess they can be used from Java too and not only from Scala, but please correct me if I'm wrong.

Thanks!
(inexperienced akka user)

Konrad “ktoso” Malawski

Nov 29, 2017, 11:22:39 AM

to akka...@googlegroups.com, Dobrin Ivanov

LevelDB is not, in any case, intended for production systems.

Production-ready journals would be:

- the Cassandra one, maintained by the Akka team; it is the most mature and has the most features

- the JDBC one, community maintained, but it seems to work well

- we’ve heard of people using the Mongo one, but I can’t say if that’s a good idea; likely not?

Happy hakking

--

Cheers,

Konrad 'ktoso' Malawski

Akka @ Lightbend

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at https://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

Daniel Stoner

Nov 30, 2017, 5:00:19 AM

to Akka User List

I would second Konrad's suggestion of going with the Cassandra version. I have had great experiences with that, and with the AWS DynamoDB community version (albeit it lacked a lot of features, and we ended up porting our own version and bug fixing/extending it. Surprisingly, this ended up being a lot easier than I was expecting; we began by rethinking the tests for the library first to verify our understanding of its behaviour).

The real question is: have you tried using Akka Persistence in your test environment? Or even locally on your own PC? (From my memory of doing it, it's easy to spin up Cassandra on your PC.)

If you're considering going towards Event Sourcing, then be aware that it can be tricky. Long term in production (having used it for 3 years), understanding event adaptation and migration becomes mandatory and integral. You will really wish you had known about it on day one, when you designed your persisted events.
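To make "event adaptation and migration" a little more concrete, here is a minimal sketch in plain Java of upgrading an old stored event to the current schema when it is read back. Everything in it is an assumption for illustration: the class name `EventUpcaster`, the version numbers, and the example change (a `name` field split into `firstName`/`lastName`) are hypothetical, and this is not Akka's own event adapter API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not Akka API: upgrade stored events to the
// current schema while they are being replayed.
final class EventUpcaster {
    static final int CURRENT_VERSION = 2;

    // Assumed schema change between v1 and v2: the single field "name"
    // was split into "firstName" and "lastName".
    static Map<String, String> upcast(int storedVersion, Map<String, String> payload) {
        Map<String, String> result = new HashMap<>(payload);
        if (storedVersion < 2) {
            String name = result.remove("name");
            String[] parts = (name == null ? "" : name).trim().split("\\s+", 2);
            result.put("firstName", parts[0]);
            result.put("lastName", parts.length > 1 ? parts[1] : "");
        }
        return result; // events already at CURRENT_VERSION pass through unchanged
    }

    public static void main(String[] args) {
        Map<String, String> v1 = new HashMap<>();
        v1.put("name", "Ada Lovelace");
        System.out.println(upcast(1, v1));
    }
}
```

The point of the sketch is only that old events stay in the journal forever, so every schema change needs a step like this somewhere on the read path.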

If you run a cluster topology, then transient errors in your cluster, which may even go unnoticed, can mess with your stored events and cause a critical failure at the point you try to restart the application, several days after that network blip occurred!

Will you choose Protobuf for your persisted entities and then have problems interrogating your production data if your system fails to start up? Or will you write your own serializer or use something like JSON, so that you can debug this one day?

There are many topics to understand before you make the call on which of these journals to use in production. I suggest you try running a cluster somewhere with persistence implemented and a continuous integration pipeline that deploys on every commit without downtime. See if you can keep it alive and have the necessary practices in place to remedy issues with it. Load test it heavily with tools such as Gatling or Apache JMeter; nothing causes nodes to drop out of the cluster quite like huge stop-the-world garbage collections when you run at scale!

I don't mean at all to scare you off. Writing applications which scale well horizontally and utilize advanced patterns such as Event Sourcing (and receive all the benefits that come with it!) is hugely rewarding and gives your software a definite edge over other architectures. But do try to make sure that you understand the cost of having them and the extra architectural, maintenance and support implications they may have on your product.

Perhaps have a read of https://martinfowler.com/eaaDev/EventSourcing.html if you're thinking of heading in that direction, then try to understand how Akka designed great solutions for each of the problems that this architecture poses.

Dobrin Ivanov

Nov 30, 2017, 8:22:35 AM

to Akka User List

Thanks to Konrad for pointing out the recommended journals! (I guess I would now like to try Java + Akka Persistence + PostgreSQL.)
And thanks to Daniel too for all the suggestions and sharing experience!

This may not be related to this topic but since you are sharing some other things I would share some too:

I have experience with DDD in multiple projects in production, and I once tried CQRS in a small project too. On the other hand, I tend to use DomainEvents a lot (EDA) but mostly persist/update Aggregates as snapshots instead of rebuilding them from events. It seems easier to me, and so far nobody in my projects was interested in keeping any history, in the form of events or otherwise.

I also have experience with Cassandra/DDD at scale using LWTs/CAS. I know they say this can be 3 times slower and do not recommend it, but we did it and it works. However, you need experienced Cassandra administrators and you need to monitor it carefully, because if it gets even a little overloaded, especially if you use LWTs, very bad things can happen, like downtime for long periods. Many people have had bad experiences with it in the past, like big clusters slowly and steadily going down without them being able to do anything to stop it.
And you know that many companies tend to manually shard/build clusters on top of an RDBMS, like Facebook/YouTube/Pinterest/...

And now it seems that Akka is the best tool if you want both of these (DDD+CQRS), right?

I looked at Akka in the past, and if I remember correctly the only way to get "actor per Aggregate per Akka cluster" was to use cluster singletons, which is not a solution. Now that I'm reading the docs, it seems that cluster sharding has improved and this is supported, right?

https://doc.akka.io/docs/akka/snapshot/cluster-sharding.html
>It could for example be actors representing Aggregate Roots in Domain-Driven Design terminology. Here we call these actors “entities”. These actors typically have persistent (durable) state, but this feature is not limited to actors with persistent state.
>
>Cluster sharding is typically used when you have many stateful actors that together consume more resources (e.g. memory) than fit on one machine. If you only have a few stateful actors it might be easier to run them on a Cluster Singleton node.

By supported I mean that Aggregates can now be spread across the cluster (and not sitting on the oldest node, as when using cluster singletons), and also that you cannot get the same Aggregate/actor on more than one cluster node at a time, even during split brain or any other kind of failure, right?

Daniel Stoner

Nov 30, 2017, 8:52:57 AM

to akka...@googlegroups.com

Hi Dobrin,

Cluster Sharding can indeed do most of the things you suggest, and we find it a much neater fit than Cluster Singletons. Cluster Singletons should be for situations where you want a single 'DatabaseAccessActor' for a whole application. Cluster Sharding is better when you need lots of 'ProductActors' while ensuring you only have 1 instance of ProductActor per given Product. You do this via the 'shardId', and on a single cluster you can guarantee only a single Actor exists for a given shardId.
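As a rough illustration of how "one actor per Product" can be derived from an identifier, here is a minimal plain-Java sketch. Note that Akka's sharding documentation distinguishes an entity id (one per actor, e.g. one per Product) from a shard id (a group of entities that is placed on a node together); the sketch shows one common way of deriving the second from the first. The class name `ShardMapping` and the shard count are assumptions for illustration, not part of Akka.

```java
// Hypothetical illustration, not Akka API: derive a stable shard id
// from an entity id so that every node computes the same answer.
final class ShardMapping {

    // floorMod keeps the result in [0, numberOfShards) even when
    // hashCode() is negative.
    static String shardIdFor(String entityId, int numberOfShards) {
        return String.valueOf(Math.floorMod(entityId.hashCode(), numberOfShards));
    }

    public static void main(String[] args) {
        int numberOfShards = 100; // assumed value, chosen only for the example
        System.out.println(shardIdFor("product-42", numberOfShards));
    }
}
```

Because the mapping is a pure function of the entity id, any node can work out which shard a message belongs to without asking anyone else; deciding which node currently hosts that shard is a separate step.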

In terms of whether your 'shards' will move around the cluster sensibly, I couldn't comment factually. I've not personally seen a shard move around (for instance, get redistributed when more nodes are started in your cluster) except if an actor fails and is restarted somewhere else. It is something we will have to investigate one day. I am pretty sure I saw a very old research paper about getting Akka to do it, but I have no idea if this became reality. If all else fails, just have your actors deliberately stop themselves at some deterministic, arbitrary point (say, once every 10k events) and redistribution will happen naturally and at predictable moments anyway :)

I would however draw attention to your mention of:
"even during split brain"

You have to be aware that your perception of 'I have 1 cluster and it has a split brain' differs massively from the perspective of a node on one half of that brain, which is 'I'm in a seemingly small cluster'.

It is absolutely possible, during a split brain that is poorly handled, that both sides of the split believe themselves to be fully functioning clusters and your singletons get duplicated. This is what I was referring to when I mentioned finding out your persistence got ruined and you can no longer deploy to new nodes. (Akka persists cluster sharding information in an event sourcing style, constantly saying 'ShardX lives on nodeY' and the like. On startup of a node this information is read so as to understand whether another node already owns a shardId, and to build the routing table for 'should a request come in for ShardX, I need to route to nodeY'.)
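A simplified sketch of why this matters: if allocation records are replayed on startup to rebuild a routing table, two conflicting records for the same shard (as two sides of a split might both write) can make the replay fail. The class and event names below (`ShardAllocationReplay`, `ShardAllocated`) are hypothetical, the sketch assumes there are no deallocation or rebalancing records, and it is not the actual Akka implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Akka internals: rebuild a routing table
// by replaying "shard X lives on node Y" records from a journal.
final class ShardAllocationReplay {

    static final class ShardAllocated {
        final String shardId;
        final String node;

        ShardAllocated(String shardId, String node) {
            this.shardId = shardId;
            this.node = node;
        }
    }

    // Returns shardId -> node. Fails fast if the journal claims that the
    // same shard lives on two different nodes.
    static Map<String, String> replay(List<ShardAllocated> journal) {
        Map<String, String> routing = new HashMap<>();
        for (ShardAllocated event : journal) {
            String previous = routing.putIfAbsent(event.shardId, event.node);
            if (previous != null && !previous.equals(event.node)) {
                throw new IllegalStateException("Shard " + event.shardId
                        + " recorded on both " + previous + " and " + event.node);
            }
        }
        return routing;
    }
}
```

With a journal in which 'ShardX' is recorded on both 'nodeY' and 'nodeZ', `replay` throws instead of returning a table, which mirrors the "fails at the next restart, days after the network blip" experience described above.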

There are lots of really easy ways to fix it, like setting your min number of clustered instances in application.conf == your cluster size, but this only works if you have statically sized clusters.

Having implemented dynamically sized clusters in AWS, we went through countless strategies for split brain awareness.

What we ended up settling on, and what had robust characteristics, was:
'If the cluster splits, and you're not with the oldest node by uptime [which everyone can at least agree on], then you should shut yourself down'
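The rule quoted above can be sketched as a small decision function in plain Java. This is only an illustration under stated assumptions: the names `KeepOldestSketch` and `Member` are hypothetical, the list of expected members is assumed to come from some external source of truth (as the AWS information did in our case), and the set of reachable addresses is assumed to include the deciding node itself.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the "stay with the oldest node" rule.
final class KeepOldestSketch {

    static final class Member {
        final String address;
        final long upSinceMillis; // smaller value = has been up longer

        Member(String address, long upSinceMillis) {
            this.address = address;
            this.upSinceMillis = upSinceMillis;
        }
    }

    // expectedMembers: every node that should be in the cluster.
    // reachableAddresses: addresses this node can still see, itself included.
    // Returns true when this node is cut off from the oldest member and
    // should therefore shut itself down.
    static boolean shouldDownSelf(List<Member> expectedMembers, Set<String> reachableAddresses) {
        Member oldest = expectedMembers.stream()
                .min(Comparator.comparingLong((Member m) -> m.upSinceMillis))
                .orElseThrow(() -> new IllegalArgumentException("no members"));
        return !reachableAddresses.contains(oldest.address);
    }
}
```

Every node evaluates the same function with its own view of reachability, so only the side that still contains the oldest member keeps running.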

The implementation was far, far from trivial, however, and we were fortunate that AWS was able to inform us about which nodes should exist in the cluster even if nodes cannot communicate with each other (albeit with a ~20 second lag on this information, which is why we had to rely on 'oldest node' and not 'newest node').

Lightbend provides a number of split brain awareness/handling solutions for paying subscribers, which my team use nowadays. I haven't heard any complaints, but not having been close to the team when that changed, I couldn't tell you which of the Lightbend solutions they used.

--

Daniel Stoner | Senior Software Engineer BSSCS | Ocado Technology

daniel...@ocado.com | Ext 7969 | www.ocadotechnology.com

Buildings One & Two, Trident Place, Mosquito Way, Hatfield, Hertfordshire, AL10 9UL

Dobrin Ivanov

Dec 5, 2017, 2:43:06 PM

to Akka User List

Hi Daniel,

Thanks for sharing this information, lots of interesting stuff!
Seems that I need to dig more .. as usual :)

I just saw this presentation, https://www.youtube.com/watch?v=qfchx7y6c3c, and it seems that there is a paid "Static Quorum" split brain strategy that looks very similar to your suggestion: "min number of clustered instances in application.conf == your cluster size - but this only works if you have statically sized clusters."
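For readers unfamiliar with the idea, a static quorum rule can be sketched in a few lines of plain Java. This is a sketch of the general idea as I understand it, assuming the rule is "a side of the partition survives only if it still sees at least a fixed quorum of members"; the names `StaticQuorumSketch` and `Decision` are hypothetical and this is not the commercial implementation.

```java
// Hypothetical illustration of a static quorum decision.
final class StaticQuorumSketch {

    enum Decision { DOWN_UNREACHABLE, DOWN_OWN_SIDE }

    // reachableMembers: members this side can still see, itself included.
    // quorumSize: fixed in configuration. It must be more than half of the
    // maximum cluster size, otherwise both sides could reach quorum.
    static Decision decide(int reachableMembers, int quorumSize) {
        return reachableMembers >= quorumSize
                ? Decision.DOWN_UNREACHABLE
                : Decision.DOWN_OWN_SIDE;
    }
}
```

For a 5-node cluster with a quorum size of 3, a 3/2 split leaves the side of 3 running while the side of 2 shuts down. The fixed quorum size is exactly why the approach only suits statically sized clusters.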

Daniel Stoner

Dec 11, 2017, 5:03:21 AM

to akka...@googlegroups.com

FYI, there seem to be community (free) split brain resolvers too!

Check out this blog post I spotted in the Lagom community (Lagom runs Akka behind the scenes, which is what this post in particular focuses on):
https://medium.com/stashaway-engineering/running-a-lagom-microservice-on-akka-cluster-with-split-brain-resolver-2a1c301659bd

It covers usage of:
https://github.com/mbilski/akka-reasonable-downing

On the face of it, this seems a relatively sensible and simple approach if you have a statically sized cluster.
