Re: [akka-user] Discrete Event Simulation using Akka (cluster, persistence)


Roland Kuhn

Mar 7, 2013, 7:33:43 AM
to akka...@googlegroups.com
Hi Martin,

On 6 Mar 2013, at 15:54, Martin Simons wrote:

Dear hAkkers,

Last year I used Akka (Java API) in a typical "scale up" scenario to fix a performance bottleneck in a business simulation game I'm working on. We haven't had any issues with this core part of the game since. Working with Akka has been a pleasure so far, especially after I got the hang of "thinking in actors and messages".

Thanks for the kind words!

Currently I'm in the prototyping phase for our next game and - naturally - Akka is at the top of the technology list. The project will be a business simulation again, allowing thousands of players to produce, trade and transport goods in a single persistent game world. My current approach to the architecture of the game's core (that is, the actual simulation) is to model every entity as an actor. Since the data model of the game is rather complex, this would free me from most if not all of the locking issues I'd have with a classic action-based, one-session-per-request kind of setup. Entities just send their messages out to other entities and their internal state remains consistent at all times.

Yes, that sounds like a great fit for actors.

But this brings me straight to question #1: Every entity has state and every entity has an identity. This makes their actors singletons…lots of them. In a cluster or any distributed setup, every entity would have to be represented by exactly one actor which I assume to be a tough problem to manage. So I'm wondering whether there are any best practices for this case I should know of. I figure Akka is well-suited for the described kind of discrete event simulation, but how well does it work in a distributed setup? It might be worth mentioning that in my particular case, clustering mostly serves availability purposes. Performance isn't that much of an issue although it would be a nice side-effect to gain scale-out capability "for free".

This is where the cluster membership service and consistent hashing come into play: you can scale out your set of actors across multiple nodes and deterministically place (and thereby find) them according to a hashing scheme. The actors will be talking to each other transparently; that is what location transparency in Akka means. When a node dies you will have to spawn the actors it contained on a new node (or distribute them across the remaining ones), which can be quite automatic using event sourcing (see below). Since your entities are persisted I would think that running multiple copies should not be necessary unless you are worried about the latency in case of a node crash; if you run multiple copies, make sure that only one of them generates the changes and the others only apply them (I’d have to dig into eventsourced to see how that is supported already).
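To make the deterministic-placement idea concrete, here is a minimal, library-free consistent-hash ring in plain Java. It is a sketch only: the node names, the number of virtual points, and the hash function are illustrative assumptions, not how Akka's ConsistentHashingRouter is implemented internally.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring: entity IDs map deterministically to nodes,
// and removing a node only remaps the entities that lived on it.
public class Ring {
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String node) {
        // A few virtual points per node smooth out the distribution.
        for (int i = 0; i < 3; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    public void removeNode(String node) {
        ring.values().removeIf(n -> n.equals(node));
    }

    public String nodeFor(String entityId) {
        // First ring position at or after the entity's hash, wrapping around.
        SortedMap<Integer, String> tail = ring.tailMap(hash(entityId));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private static int hash(String s) {
        return s.hashCode() & 0x7fffffff; // non-negative
    }

    public static void main(String[] args) {
        Ring ring = new Ring();
        ring.addNode("nodeA");
        ring.addNode("nodeB");
        ring.addNode("nodeC");
        String before = ring.nodeFor("entity-42");
        ring.removeNode("nodeB");
        String after = ring.nodeFor("entity-42");
        // Placement only changes if entity-42's node was the one removed.
        System.out.println(before.equals("nodeB") ? !after.equals("nodeB") : after.equals(before));
    }
}
```

The key property for this use case is that a node failure remaps only the entities that were placed on the failed node; every surviving entity keeps its location, so no global reshuffle is needed.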

Question #2 refers to the entity state itself: I intend to keep the state of each entity "cached" as long as the entity is "awake" as an actor. That is, some managing actor creates them once some other entity sends them a message and they get discarded again when they've been idle for a certain period of time. Whenever the internal state of an actor has changed (a message has been handled) it persists its updated state to a database immediately. This way, should the actor system shut down, all that's needed to rebuild it are the entity IDs and each entity actor reloads its state on wake-up.

Exactly.
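The wake-on-message / persist-on-change / discard-when-idle lifecycle described above can be sketched without any actor library. Everything here (`EntityManager`, `deliver`, the in-memory "database" map) is a hypothetical illustration, not an Akka or eventsourced API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an entity manager: entities are created on first message,
// persist after every change, and are evicted after an idle timeout.
public class EntityManager {
    static class Entity {
        final String id;
        int state;        // stand-in for real entity state
        long lastTouched;
        Entity(String id, int initialState) { this.id = id; this.state = initialState; }
    }

    private final Map<String, Entity> awake = new HashMap<>();
    private final Map<String, Integer> database = new HashMap<>(); // stand-in for the DB
    private final long idleMillis;

    EntityManager(long idleMillis) { this.idleMillis = idleMillis; }

    // Deliver a message: wake the entity if needed, update state, persist immediately.
    void deliver(String id, int delta, long now) {
        Entity e = awake.computeIfAbsent(id,
                k -> new Entity(k, database.getOrDefault(k, 0))); // reload on wake-up
        e.state += delta;
        e.lastTouched = now;
        database.put(id, e.state); // persist after every handled message
    }

    // Periodic sweep: discard entities idle longer than the timeout.
    void evictIdle(long now) {
        awake.values().removeIf(e -> now - e.lastTouched > idleMillis);
    }

    public static void main(String[] args) {
        EntityManager m = new EntityManager(1000);
        m.deliver("factory-1", 5, 0);
        m.evictIdle(2000);               // factory-1 is idle, so it is discarded
        m.deliver("factory-1", 3, 2500); // woken up again, state reloaded from the "DB"
        System.out.println(m.database.get("factory-1"));
    }
}
```

Because state is persisted on every change, eviction is always safe: the next message simply triggers a reload, exactly as in the shutdown scenario described above.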

I'm wondering whether there are ready-made and proven ways to deal with this kind of "actor persistence". I've taken a brief look at eventsourced, but as I understand it, each and every event would have to be journaled, which I doubt would be sustainable in my case. We're talking about thousands if not millions of actors.

Whether you persist the “latest” state or only snapshots (from time to time) and deltas is a trade-off: having storage for mutable entities means that concurrent writes are limited by how fast the storage engine can switch between entities, whereas storing events means that the storage engine can be optimized for append-only behavior, which allows higher performance. Storing the modifications instead of applying them to the storage also allows you to rewind to any point in history and restart from there, which can be handy in error recovery scenarios. You pay for that by having to buy more disk space, but that is extremely cheap (compared to the rest of the infrastructure costs) unless your players produce terabytes per day by their actions. Do you have an estimate of how frequently entities change? Multiply that by a year and by the average change size (which I'd guess is rather small) and you have the storage requirement per user.
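As a worked illustration of that back-of-the-envelope estimate (all numbers here are assumed; only the arithmetic is the point):

```java
// Back-of-the-envelope storage estimate for an event journal.
public class StorageEstimate {
    public static void main(String[] args) {
        long eventsPerEntityPerDay = 100;  // assumed change frequency
        long bytesPerEvent = 200;          // assumed average delta size
        long entitiesPerUser = 50;         // assumed entities a player touches
        long bytesPerUserPerYear =
                eventsPerEntityPerDay * bytesPerEvent * entitiesPerUser * 365;
        // About 365 MB per user per year under these assumptions.
        System.out.println(bytesPerUserPerYear);
    }
}
```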

I would be grateful for your input. I'm still pretty much in the brainstorming phase, so if you see any fundamental issues with my current approach, please let me know.

Your sketch is heading in exactly the right direction as far as I can see.

Regards,

Roland


Thanks a lot,
Martin

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://akka.io/faq/
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 



Dr. Roland Kuhn
Akka Tech Lead
Typesafe – Empowering professional developers to build amazing apps.
twitter: @rolandkuhn

Roland Kuhn

Mar 8, 2013, 5:06:43 PM
to akka...@googlegroups.com
Hi Martin,

On 8 Mar 2013, at 09:59, Martin Simons wrote:

Hi Roland,

thanks for your elaborate reply.



This is where the cluster membership service and consistent hashing come into play: you can scale out your set of actors across multiple nodes and deterministically place (and thereby find) them according to a hashing scheme. The actors will be talking to each other transparently, that is what location transparency in Akka means. When a node dies you will have to spawn the actors it contained on a new node (or distribute across the remaining ones), which can be quite automatic using event sourcing (see below). 


I'm not quite sure whether you are referring to a feature that Akka already (at least partly) provides or whether I'd have to build a DHT or something similar myself. The only part of Akka that "automatically" distributes actors among nodes right now is cluster-aware routers, right? Should I have a look at these for inspiration or are you referring to something else?

I’m referring to http://doc.akka.io/docs/akka/2.1.1/cluster/cluster-usage-java.html#cluster-aware-routers and http://doc.akka.io/api/akka/2.1.1/#akka.routing.ConsistentHashingRouter; a concrete proposal would be to use the router to talk to one coordinator on each cluster node which manages the life-cycle of the entity actors beneath it. The standard ConsistentHashingRouter does not signal hand-off, however, so depending on your precise requirements you will have to have the coordinators talk to each other so as not to start an entity while it is still alive on a different node (e.g. after a node ring change).
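The hand-off concern can be illustrated with a toy protocol (entirely hypothetical, not an Akka API): before starting an entity after a ring change, the new owner's coordinator asks the previous owner to confirm the entity has stopped, so two live copies never coexist:

```java
import java.util.HashSet;
import java.util.Set;

// Toy hand-off protocol: a coordinator only starts an entity once the
// previous owner has confirmed it no longer runs there.
public class Coordinator {
    final String node;
    final Set<String> running = new HashSet<>();

    Coordinator(String node) { this.node = node; }

    // Old owner: stop the entity and acknowledge the hand-off.
    boolean releaseEntity(String id) {
        running.remove(id);
        return true; // acknowledgment
    }

    // New owner: start only after the old owner's ack, avoiding two live copies.
    boolean startAfterHandoff(String id, Coordinator previousOwner) {
        if (previousOwner != null && !previousOwner.releaseEntity(id)) {
            return false; // no confirmation; do not start a duplicate
        }
        return running.add(id);
    }

    public static void main(String[] args) {
        Coordinator a = new Coordinator("nodeA");
        Coordinator b = new Coordinator("nodeB");
        a.running.add("entity-42");  // old placement before the ring change
        boolean started = b.startAfterHandoff("entity-42", a);
        System.out.println(started && !a.running.contains("entity-42"));
    }
}
```

A real implementation would of course have to handle the old owner being unreachable (the very failure case that triggered the ring change), which is where the persisted entity state or event log comes in.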

 
Whether you persist the “latest” state or only snapshots (from time to time) and deltas is a trade-off: having a storage for mutable entities means that concurrent writes are limited to how fast the storage engine can switch between entities, whereas storing events means that the storage engine will be optimized for append-only behavior which allows higher performance. Storing the modifications instead of applying them to the storage also allows you to rewind to any time in history and restart from there, which can be handy in error recovery scenarios. You pay for that by having to buy more disk space, but that is extremely cheap (compared to the rest of the infrastructure costs) unless your players produce terabytes per day by their actions. Do you have an estimate of how frequently entities change? Multiply that by a year and with the average change size (which should be rather small I guess) then you have the storage requirements per user.


I've read a bit more about eventsourced and my main concern is now that changing actor implementations might be a problem in the long term. It's a game and not a nuclear power plant, so behavior and implementation are comparably volatile. But it still makes a lot of sense as a short-term safety net for rebooting the game after a restart or failure. So I might try to come up with some kind of hybrid solution. As a side note, eventsourced's journaling itself does not support clustering at this point in time, making it a SPOF.

That seems true, although Martin can probably say more on that.
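One common way to soften the long-term concern about changing implementations, a general event-sourcing technique rather than anything eventsourced prescribed at the time, is to version events and "upcast" old versions during replay. A minimal sketch, with made-up event types and versions:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: replay versioned events, upgrading old versions before applying them.
public class Replay {
    static class Event {
        final int version;
        final String type;
        final int amount;
        Event(int version, String type, int amount) {
            this.version = version; this.type = type; this.amount = amount;
        }
    }

    // Upcast: rewrite an old event into its current form. In this made-up
    // example, v1 events recorded amounts in a legacy unit that v2 scales by 10.
    static Event upcast(Event e) {
        return e.version == 1 ? new Event(2, e.type, e.amount * 10) : e;
    }

    // Rebuild state by folding the (upcast) journal into the current model.
    static int applyAll(List<Event> journal) {
        int state = 0;
        for (Event e : journal) {
            Event cur = upcast(e);
            if (cur.type.equals("produced")) state += cur.amount;
        }
        return state;
    }

    public static void main(String[] args) {
        List<Event> journal = Arrays.asList(
                new Event(1, "produced", 3),   // old-format event
                new Event(2, "produced", 7));  // current-format event
        System.out.println(applyAll(journal));
    }
}
```

The journal stays append-only and immutable; only the replay-time translation layer grows as the implementation evolves.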

Regards,

Roland


Kind regards,
Martin


Martin Krasser

Mar 10, 2013, 3:56:26 AM
to akka...@googlegroups.com

On 08.03.13 23:06, Roland Kuhn wrote:
[...]

I've read a bit more about eventsourced and my main concern is now that changing actor implementations might be a problem in the long term. It's a game and not a nuclear power plant, so behavior and implementation are comparably volatile. But it still makes a lot of sense as a short-term safety net for rebooting the game after a restart or failure. So I might try to come up with some kind of hybrid solution. As a side note, eventsourced's journaling itself does not support clustering at this point in time, making it a SPOF.

That seems true, although Martin can probably say more on that.

Currently yes, but the next release will come with HA event logs based on HBase and DynamoDB. This is work in progress. With them you'll also be able to scale write throughput by adding nodes.

Jonas Bonér

Mar 14, 2013, 6:22:26 AM
to akka...@googlegroups.com
On Sun, Mar 10, 2013 at 8:56 AM, Martin Krasser <kras...@googlemail.com> wrote:

[...]
Currently yes but the next release will come with HA event logs based on HBase and DynamoDB. This is work in progress. With them you'll also be able to scale write throughput by adding nodes.





--
Jonas Bonér
Phone: +46 733 777 123
Home: jonasboner.com
Twitter: @jboner