Akka persistence: can you use it as the main storage system for your application's data?


José González Gómez

Jan 31, 2017, 10:17:53 AM
to Akka User List
Hi!

I've been reading about Akka Persistence, and it seems to be the way to go to persist data in a reactive application, using event sourcing and immutable data models. I have no experience doing this, so I'd love to hear about your experience. Anyway, after reading the docs, I have the following doubts (please correct any false assumptions I may have made):

First of all, Akka persistence stores data using a journal. Data in that journal (both events and snapshots) are stored after being serialized. This seems to pose several problems:
  • You can't have access to data as you may have in a SQL or NoSQL database, so it seems to be hard to diagnose corrupt data or relationships among that data. Am I missing anything here?
  • Storing data in a journal seems great for storing immutable data and recovering events for event sourced data, but... how do you manage to do queries on that data? I mean, if you store orders using Akka persistence, how do you get orders for example between two dates? The only thing I can think of here is to use PersistenceQueries in order to keep a traditional database in sync with the journal, but that kind of defeats the purpose of Akka persistence, doesn't it?
Regarding how to keep actors in memory: you may have several requests that access the same entity (persistent actor), so I guess these have a different life cycle than request-specific actors, which can be destroyed after serving the request. The point is that you seem to end up with the whole database in memory if you don't provide a mechanism to shut down actors that haven't been used recently. But then you may be shutting down actors that will be used again shortly, with a negative impact on performance due to the actor reloading its state. I guess this is somewhat alleviated by the use of snapshots. Again, am I missing anything here?

Thanks!!
José

Alan Burlison

Jan 31, 2017, 12:53:50 PM
to akka...@googlegroups.com
On 30/01/17 18:00, José González Gómez wrote:

> - You can't have access to data as you may have in a SQL or NoSQL
> database, so it seems to be hard to diagnose corrupt data or relationships
> among that data. Am I missing anything here?

Not that I know of, which is why I'm hacking together a journal
implementation that stores in text files, using spray-json to
marshal/unmarshal the data. It won't be distributed, it may lose data
if the JVM crashes, and it won't be particularly fast, but it will allow
you to manipulate the journals using tools like jq. My use case is for
debugging and for a simulation application, where the journal files will
be post-processed outside of the simulation.
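The serialization side of such a journal could look roughly like the sketch below: one JSON object per line, so the files can be grepped or fed to jq directly. This is only an illustration of the idea, not Alan's actual code; the `OrderPlaced` event, the file name, and the helper names are all made up.

```scala
object JsonJournalSketch {
  import java.nio.file.{Files, Paths, StandardOpenOption}
  import spray.json._
  import DefaultJsonProtocol._

  // A hypothetical event type; any case class with a spray-json format works.
  final case class OrderPlaced(orderId: String, amount: BigDecimal)
  implicit val orderPlacedFormat: RootJsonFormat[OrderPlaced] = jsonFormat2(OrderPlaced)

  // Append one event per line; the resulting file is directly queryable with jq.
  def appendEvent(event: OrderPlaced): Unit = {
    val line = event.toJson.compactPrint + "\n"
    Files.write(Paths.get("journal-orders.jsonl"), line.getBytes("UTF-8"),
      StandardOpenOption.CREATE, StandardOpenOption.APPEND)
  }

  // Recover the journal by parsing one JSON object per line.
  def readEvents(): Seq[OrderPlaced] =
    scala.io.Source.fromFile("journal-orders.jsonl").getLines()
      .map(_.parseJson.convertTo[OrderPlaced]).toSeq
}
```

As the caveats above say, nothing here is durable or concurrent-safe; the point is only that a human-readable journal is cheap to build for debugging.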

--
Alan Burlison
--

Justin du coeur

Jan 31, 2017, 12:57:25 PM
to akka...@googlegroups.com
On Mon, Jan 30, 2017 at 1:00 PM, José González Gómez <jose.g...@openinput.com> wrote:
> First of all, Akka persistence stores data using a journal. Data in that journal (both events and snapshots) are stored after being serialized. This seems to pose several problems:
>   • You can't have access to data as you may have in a SQL or NoSQL database, so it seems to be hard to diagnose corrupt data or relationships among that data. Am I missing anything here?
Not much, but keep in mind that the journal is the *high* level view of things.  Underneath, that journal is usually *implemented* using a SQL or NoSQL database.  (Most often Cassandra, but there are a bunch of options.)  So it is *possible* to introspect on the data using ordinary DB tools, although I wouldn't necessarily recommend it: the schema is peculiar to the Persistence implementation, and you'd have to deserialize in order to make sense of it.
>   • Storing data in a journal seems great for storing immutable data and recovering events for event sourced data, but... how do you manage to do queries on that data? I mean, if you store orders using Akka persistence, how do you get orders for example between two dates? The only thing I can think of here is to use PersistenceQueries in order to keep a traditional database in sync with the journal, but that kind of defeats the purpose of Akka persistence, doesn't it?
That comes down to how you structure your application.  Keep in mind that a typical Akka program just plain doesn't *think* like a SQL-centric one, and you're generally working without the usual sorts of ad-hoc queries you do in SQL.  So you need to think about the sorts of queries you're likely to want, and structure the Actors to support them.  (This really isn't a persistence thing, it's an Actors thing.)  In the case you're describing, the answer might be to, e.g., denormalize things a bit, and maintain an Actor (or even just a Cassandra log) indexing the orders by date.  There's no one-size-fits-all answer.
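A minimal sketch of that denormalization idea: a plain actor that consumes order events and keeps an index from date to order ids, so "orders between two dates" becomes a sorted-map range lookup. All names here (`OrderIndex`, `OrderPlaced`, `GetOrdersBetween`) are illustrative, not part of any Akka API.

```scala
import java.time.LocalDate
import scala.collection.immutable.SortedMap
import akka.actor.Actor

// Hypothetical messages for the index actor.
final case class OrderPlaced(orderId: String, date: LocalDate)
final case class GetOrdersBetween(from: LocalDate, to: LocalDate)

class OrderIndex extends Actor {
  // LocalDate needs an explicit Ordering for the sorted map.
  private implicit val dateOrdering: Ordering[LocalDate] = Ordering.by(_.toEpochDay)

  private var byDate = SortedMap.empty[LocalDate, Set[String]]

  def receive: Receive = {
    case OrderPlaced(id, date) =>
      // Index each order under its date as events arrive.
      byDate = byDate.updated(date, byDate.getOrElse(date, Set.empty) + id)
    case GetOrdersBetween(from, to) =>
      // range is inclusive of `from` and exclusive of its upper bound,
      // so bump `to` by one day to make the query inclusive on both ends.
      sender() ! byDate.range(from, to.plusDays(1)).values.flatten.toList
  }
}
```

The same shape works whether this index lives in an actor, a Cassandra table, or a SQL read model; the decision is where you want the query load to land.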

It's also possible to do *some* cross-cutting queries using Tags, but that's a limited mechanism; personally, I haven't done much with it.
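For reference, the Tags mechanism mentioned above is wired up with a write-side event adapter that wraps events in `Tagged`; the adapter itself has to be registered against the journal plugin in your configuration. The `OrderPlaced` event and the `"order"` tag are illustrative.

```scala
import akka.persistence.journal.{Tagged, WriteEventAdapter}

// Hypothetical domain event.
final case class OrderPlaced(orderId: String)

// Registered in config under the journal's `event-adapters` /
// `event-adapter-bindings` sections; see the Akka Persistence docs.
class OrderTaggingAdapter extends WriteEventAdapter {
  override def manifest(event: Any): String = ""

  override def toJournal(event: Any): Any = event match {
    case e: OrderPlaced => Tagged(e, Set("order")) // tag order events for querying
    case other          => other                   // pass everything else through
  }
}
```

Tagged events can then be consumed on the read side with an `eventsByTag` Persistence Query, subject to the limitations Justin mentions.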

> Regarding how to keep actors in memory: you may have several requests that access the same entity (persistent actor), so I guess these have a different life cycle than request-specific actors, which can be destroyed after serving the request. The point is that you seem to end up with the whole database in memory if you don't provide a mechanism to shut down actors that haven't been used recently. But then you may be shutting down actors that will be used again shortly, with a negative impact on performance due to the actor reloading its state. I guess this is somewhat alleviated by the use of snapshots. Again, am I missing anything here?

Usually, the situation you're describing is handled with a combination of:
  • Cluster Sharding, so that you can just send messages to Actors, and wake them up automatically if they're passivated, and
  • Passivation -- having each Actor on a timeout or some such, so that it goes away when you're not actively using it.
This particular combination is *very* common, and in practice works well if you tune your timeouts to match your circumstances.
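The passivation half of that combination can be sketched as below, assuming Cluster Sharding is already set up elsewhere: the entity sets a receive timeout and, when it fires, asks its parent shard to stop it cleanly. The entity name and the two-minute timeout are arbitrary choices for illustration.

```scala
import scala.concurrent.duration._
import akka.actor.{PoisonPill, ReceiveTimeout}
import akka.cluster.sharding.ShardRegion.Passivate
import akka.persistence.PersistentActor

class OrderEntity extends PersistentActor {
  override def persistenceId: String = "order-" + self.path.name

  // Tune this timeout to your access patterns, as Justin says.
  context.setReceiveTimeout(2.minutes)

  override def receiveCommand: Receive = {
    case ReceiveTimeout =>
      // Ask the parent shard region to stop us; it buffers any messages
      // that arrive during shutdown and re-creates the entity on demand.
      context.parent ! Passivate(stopMessage = PoisonPill)
    case _ =>
      // handle domain commands and persist events here
  }

  override def receiveRecover: Receive = {
    case _ => // rebuild in-memory state from replayed events
  }
}
```

Sending a message to the passivated entity's id later makes the shard region start a fresh incarnation, which recovers its state from the journal.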

lutzh

Feb 1, 2017, 5:31:13 PM
to Akka User List


On Tuesday, 31 January 2017 16:17:53 UTC+1, José González Gómez wrote:
> Hi!

> I've been reading about Akka Persistence, and it seems to be the way to go to persist data in a reactive application, using event sourcing and immutable data models. I have no experience doing this, so I'd love to hear about your experience. Anyway, after reading the docs, I have the following doubts (please correct any false assumptions I may have made):

> First of all, Akka persistence stores data using a journal. Data in that journal (both events and snapshots) are stored after being serialized. This seems to pose several problems:
>   • You can't have access to data as you may have in a SQL or NoSQL database, so it seems to be hard to diagnose corrupt data or relationships among that data. Am I missing anything here?
No, I don't think you are.
>   • Storing data in a journal seems great for storing immutable data and recovering events for event sourced data, but... how do you manage to do queries on that data? I mean, if you store orders using Akka persistence, how do you get orders for example between two dates? The only thing I can think of here is to use PersistenceQueries in order to keep a traditional database in sync with the journal, but that kind of defeats the purpose of Akka persistence, doesn't it?

No, I don't think it defeats the purpose; I think the way you describe is how it's supposed to be done. Have you had a look at the CQRS pattern (Command Query Responsibility Segregation, https://martinfowler.com/bliki/CQRS.html)? The basic idea is that you write to the event log for durable storage, to be able to recreate entities, but for querying you have a whole other data store, the read side. You can use the same technology as for the write side (e.g. you could use a NoSQL DB for both), but you could also have, as you say, a traditional database. If you google some combinations of Akka, CQRS, event sourcing, you should find a couple of presentations about it. You could also check the "Lagom Persistence" documentation - http://www.lagomframework.com/documentation/1.2.x/java/PersistentEntity.html -, Lagom Persistence is an Event Sourcing/CQRS layer/DSL on top of Akka Persistence. Even if you end up not using it, it should give you some inspiration.
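A hedged sketch of that read side, using Persistence Query to stream tagged events and project them into a relational read model. It assumes the LevelDB read journal (the simplest to configure; Cassandra's works similarly, though the offset type differs between Akka versions), an `"orders"` tag set on the write side, and a made-up `insertOrder` projection function.

```scala
import akka.actor.ActorSystem
import akka.persistence.query.PersistenceQuery
import akka.persistence.query.journal.leveldb.scaladsl.LeveldbReadJournal
import akka.stream.ActorMaterializer

object ReadSideProjection {
  implicit val system: ActorSystem = ActorSystem("orders")
  implicit val mat: ActorMaterializer = ActorMaterializer()

  // Placeholder for the real projection, e.g. a JDBC INSERT into the read model.
  def insertOrder(event: Any): Unit =
    println(s"would INSERT into read model: $event")

  def main(args: Array[String]): Unit = {
    val readJournal = PersistenceQuery(system)
      .readJournalFor[LeveldbReadJournal](LeveldbReadJournal.Identifier)

    // Live stream of all events tagged "orders", starting from the beginning;
    // each one is applied to the query-side store.
    readJournal
      .eventsByTag("orders", offset = 0L)
      .runForeach(envelope => insertOrder(envelope.event))
  }
}
```

In a production projection you would also persist the last processed offset, so the read side can resume where it left off after a restart.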

> Regarding how to keep actors in memory: you may have several requests that access the same entity (persistent actor), so I guess these have a different life cycle than request-specific actors, which can be destroyed after serving the request. The point is that you seem to end up with the whole database in memory if you don't provide a mechanism to shut down actors that haven't been used recently. But then you may be shutting down actors that will be used again shortly, with a negative impact on performance due to the actor reloading its state. I guess this is somewhat alleviated by the use of snapshots. Again, am I missing anything here?

I've only seen Akka Persistence in combination with Akka Cluster Sharding, and then the cluster sharding takes care of the lifecycle for you. It will also passivate actors that haven't been used for a while, and activate them again if you send them a message. Activating a persistent actor indeed means replaying the events, but possibly only since the last snapshot. So yes, fundamentally, the architectural idea is to have the whole database in memory (https://www.martinfowler.com/bliki/MemoryImage.html) and to recreate entities by replaying all events. But thanks to some optimizations (activation/passivation, snapshots) you only have a "working set" in memory, and only replay events since the last snapshot.
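The snapshot optimization mentioned above looks roughly like this sketch: every event is persisted, but a snapshot is saved every 100 events, so recovery starts from the latest snapshot and replays only the tail. The domain (`Cart`, `ItemAdded`) and the snapshot interval are illustrative.

```scala
import akka.persistence.{PersistentActor, SnapshotOffer}

// Hypothetical event and state types.
final case class ItemAdded(item: String)
final case class Cart(items: List[String]) {
  def updated(e: ItemAdded): Cart = copy(items = e.item :: items)
}

class CartActor extends PersistentActor {
  override def persistenceId: String = "cart-" + self.path.name

  private var state = Cart(Nil)

  override def receiveCommand: Receive = {
    case e: ItemAdded =>
      persist(e) { ev =>
        state = state.updated(ev)
        // Bound the replay length: snapshot the full state every 100 events.
        if (lastSequenceNr % 100 == 0) saveSnapshot(state)
      }
  }

  override def receiveRecover: Receive = {
    case SnapshotOffer(_, snapshot: Cart) => state = snapshot        // resume from snapshot
    case e: ItemAdded                     => state = state.updated(e) // replay the tail
  }
}
```

Note that snapshots are purely an optimization here: deleting them costs nothing but a longer replay, since the events remain the source of truth.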

 

Thanks!!
José