Whether to go for a 1:1 approach or a 1:n approach (or a partitioned
m:n approach where m << n) really depends on the concrete use
case and non-functional requirements. Your example might be a good
candidate for a 1:1 approach (see also further comments inline) but
there are also examples for which a 1:n or m:n approach is a better
choice. Here are some general influencing factors:
- length of event history required to recover state: bank accounts
need the full event history to be recovered, but order management is
an example where this is often not the case. Orders (trade orders in
finance, lab orders during medical treatments, ...) usually have a
limited validity, so you can recover active orders from a limited
event history (the last 10 days, for example), which should make
migrations after code changes rather painless. BTW, having only a
single persistent actor (or a few) that maintains state is comparable
to the role of the "Business Logic Processor" in the LMAX
architecture, which originated in the high-frequency trading domain.
- latency requirements: creating a new persistent actor has some
overhead, not only in memory but also at startup, as its creation
requires a roundtrip to the backend store. Re-activation of
passivated actors that have been designed around a 1:1 approach may
also conflict with low latency requirements. Good compromises can
often be found by following an m:n approach in this case (see the
sketch after this list).
- write throughput: high write throughput can only be achieved by
batching writes, and batching is currently implemented on a
per-persistent-actor basis. Throughput therefore scales better with a
small(er) number of actors. A large number of actors will create more
but smaller batches, reducing throughput. This is, however, more a
limitation of the current akka-persistence implementation. Maybe a
switch to batching at the journal level would be a good idea, so that
a single write batch can contain events from several actors.
- ...
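To make the m:n idea a bit more concrete, here is a minimal sketch.
The PlaceOrder/OrderPlaced messages, the OrderPartition actor and the
select helper are made up for illustration; the point is only that
many orders are routed (by hashing the order id) onto a small, fixed
number of persistent actors, each of which owns one journal stream
that can be batched and snapshotted as a whole:

import akka.actor.ActorRef
import akka.persistence.{PersistentActor, SnapshotOffer}

// Hypothetical command/event types, for illustration only
final case class PlaceOrder(orderId: String, amount: BigDecimal)
final case class OrderPlaced(orderId: String, amount: BigDecimal)

// One of m partitions; each partition owns the state of many (n/m) orders.
class OrderPartition(partitionId: Int) extends PersistentActor {
  override def persistenceId: String = s"orders-partition-$partitionId"

  // In-memory state of all orders routed to this partition
  private var activeOrders = Map.empty[String, BigDecimal]

  override def receiveCommand: Receive = {
    case PlaceOrder(id, amount) =>
      persist(OrderPlaced(id, amount)) { evt =>
        activeOrders += evt.orderId -> evt.amount
        sender() ! s"order ${evt.orderId} accepted"
      }
  }

  override def receiveRecover: Receive = {
    case OrderPlaced(id, amount) =>
      activeOrders += id -> amount
    case SnapshotOffer(_, snap: Map[String, BigDecimal] @unchecked) =>
      activeOrders = snap
  }
}

object OrderPartitions {
  // Simple hash-based routing from order id to one of the m partitions
  def select(orderId: String, partitions: Vector[ActorRef]): ActorRef =
    partitions((orderId.hashCode & Int.MaxValue) % partitions.size)
}

A caller would create, say, 16 OrderPartition actors up front and
route every command via select, so latency-critical requests never
hit a cold actor and the journal sees a small number of larger write
batches instead of many tiny ones.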
Even if you need to replay a long event history (for example after a
code change), you can always do that in the background on a separate
node until the new version of the persistent actor has caught up, and
switch the application over to it when done. You could even have both
versions running at the same time, for A/B testing for example. With
a replay rate of 100k events/sec, a billion events take about 10,000
seconds to replay, i.e. less than three hours.
Further comments inline ...
On 26.08.14 20:34, Greg Young wrote:
OK for bank accounts there is some amount of state needed to verify a
transaction. Let's propose that for now it's the branch you opened
your account at, your current balance, your address and a risk
classification, as well as a customer profitability/loyalty score
(these are all reasonable things to track in terms of deciding if a
transaction should be accepted or not).
When validating commands, you only need to keep the part of the
application state within persistent actors for which you have strict
consistency requirements. In the context of bank accounts, this is
certainly the case for the balance, but not necessarily for customer
profitability, loyalty score and the like. These metrics can be
calculated in the background, so eventual read consistency should be
sufficient for them. Consequently, this state can be maintained
elsewhere (as part of a separate read model) and requested by
persistent actors during transaction validation. If you need further
metrics in the future, new read models can be added and included in
the validation workflow initiated by a persistent actor (a rough
sketch follows below).
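Here is a rough sketch of what such a validation workflow could look
like. The message types, the riskReadModel actor and the 0.8
threshold are made up for illustration; the only point is that just
the balance lives inside the persistent actor, while the risk score
is fetched from an eventually consistent read model before the event
is persisted:

import akka.actor.ActorRef
import akka.persistence.PersistentActor

// Hypothetical messages, for illustration only
final case class Withdraw(amount: BigDecimal)
final case class Withdrawn(amount: BigDecimal)
final case class RiskRequest(accountId: String)
final case class RiskReply(score: Double)

class Account(accountId: String, riskReadModel: ActorRef) extends PersistentActor {
  override def persistenceId: String = s"account-$accountId"

  // Only the balance requires strict consistency, so only it is kept here
  private var balance: BigDecimal = 0

  override def receiveCommand: Receive = {
    case Withdraw(amount) =>
      // Ask the eventually consistent read model for the current risk score
      riskReadModel ! RiskRequest(accountId)
      context.become(validating(amount, sender()), discardOld = false)
  }

  // Waits for the read model's reply before accepting or rejecting the
  // command. A real implementation would stash commands arriving in between.
  private def validating(amount: BigDecimal, replyTo: ActorRef): Receive = {
    case RiskReply(score) if amount <= balance && score < 0.8 =>
      persist(Withdrawn(amount)) { evt =>
        balance -= evt.amount
        replyTo ! "accepted"
      }
      context.unbecome()
    case RiskReply(_) =>
      replyTo ! "rejected"
      context.unbecome()
  }

  override def receiveRecover: Receive = {
    case Withdrawn(amount) => balance -= amount
  }
}

Adding a new metric to the decision then only means adding another
read model and another request/reply round during validation, without
growing the state that the persistent actor has to snapshot and
migrate.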
I could keep millions of these inside of a single actor. A few
problems come up though:
1) Replaying this actor from events is very painful (millions,
possibly hundreds of millions, of events and they must be processed
serially). Solution -> snapshots?
2) Snapshots have all the same versioning issues people are used to
with keeping state around. What happens when the state I am keeping
changes? Say now I also need to keep avg+stddev of transaction
amount, or we found a bug in how we were maintaining the loyalty
score (back to #1). This will invalidate my snapshot.
See above: there's no need to keep all of that inside the persistent
actor for strict read consistency. Allowing eventual consistency
during command validation where possible not only makes the
validation process more flexible (new read models can simply be
included when required) but also reduces snapshot migration efforts
(by simplifying the state structure inside persistent actors).
Furthermore, ensuring strict consistency for persistent actor state
requires using persist() instead of persistAsync(), which reduces
throughput by at least a factor of 10. That may again conflict with
write throughput requirements (a small sketch follows below).
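A minimal sketch of the difference (the Counted event and the string
commands are made up; the exact throughput factor of course depends
on the journal backend):

import akka.persistence.PersistentActor

// Hypothetical event type, for illustration only
final case class Counted(n: Int)

class ThroughputExample extends PersistentActor {
  override def persistenceId: String = "throughput-example"

  private var total = 0

  override def receiveCommand: Receive = {
    case "strict" =>
      // persist(): commands arriving while the write is in flight are
      // stashed, so validation of the next command always sees fully
      // up-to-date state, at the cost of throughput.
      persist(Counted(1)) { evt => total += evt.n }

    case "fast" =>
      // persistAsync(): commands keep being processed while writes are in
      // flight, which lets the journal batch many events per write and
      // gives much higher throughput, but the in-memory state used for
      // validation may lag behind already-accepted commands.
      persistAsync(Counted(1)) { evt => total += evt.n }
  }

  override def receiveRecover: Receive = {
    case Counted(n) => total += n
  }
}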
To conclude, I think there are use cases where a 1:1 approach makes
sense, but it shouldn't be a general recommendation IMO. Finding the
best compromise really depends on the specific functional and
non-functional requirements.