Improving Akka Persistence wrt CQRS/ES/DDD


ahjohannessen

unread,
Jul 25, 2014, 11:05:40 AM7/25/14
to akka...@googlegroups.com
I think that Akka Persistence is lacking in some areas that are essential for using CQRS
with DDD aggregates effectively (without a lot of pain):

 1) The ability to read persistent actor messages by type of actor. This can be as simple
    as a key prefix that represents a "category/tag/topic",
    e.g. <topic>-<persistenceId>. Essentially an embedded secondary index.

 2) Since there is no global sequence number for events across all persistent actors, in
    the way there is for the messages of a single persistent actor, there should be
    a way to track the logical positions of messages. GetEventStore uses a similar
    approach.

 1: Gives us the ability to read across all events from a persistent actor family, e.g. customers.

 2: Gives us the ability, by way of 1, to do a catch-up subscription of all events from persistent
    actors of the same type, say customers, and to keep track of the position. This is very useful
    for maintaining a read model and for integration with other systems.
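To make 1) and 2) concrete, here is a toy sketch in plain Scala. All names are invented for illustration (this is not the Akka Persistence API): a journal keyed by <topic>-<persistenceId>, where a prefix scan acts as the embedded secondary index, and a position derived per topic stands in for the logical offset:

```scala
// Hypothetical sketch, not an existing API: a journal whose keys embed a topic
// prefix, so reading by prefix gives all events of one persistent actor family.
object TopicPrefixSketch {
  // journal entries: (key, event) where key = s"$topic-$persistenceId"
  val journal: List[(String, String)] = List(
    "customer-42" -> "CustomerCreated",
    "order-7"     -> "OrderPlaced",
    "customer-43" -> "CustomerCreated",
    "customer-42" -> "CustomerRenamed"
  )

  // 1) read across all events of one family, e.g. customers
  def eventsByTopic(topic: String): List[String] =
    journal.collect { case (key, event) if key.startsWith(topic + "-") => event }

  // 2) a logical position per topic, so a consumer can resume from an offset
  def eventsWithOffset(topic: String): List[(Long, String)] =
    eventsByTopic(topic).zipWithIndex.map { case (e, i) => (i.toLong, e) }
}
```

A catch-up subscription would then simply remember the last offset it consumed per topic.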

I have tried many approaches; to mention a few:

 a) One persistent actor for a "topic".
 b) Join streams using AtLeastOnceDelivery in individual persistent actors to
    a "topic" aggregator.
 c) Using a persistent actor as a journal for a "topic" and replaying that
    into views, filtering events produced by "topic" siblings.

All of the above introduce unnecessary complications and work-around gymnastics, such as
dealing with state size and archival (a), worrying about out-of-order delivery (b), and using
background workers that move views forward by way of snapshotting (c). All of this
because AP does not provide a way to achieve 1) and 2).

From my point of view it is evident that AP lacks some fundamental and essential
capabilities needed for a decent CQRS/DDD/ES setup.

Konrad 'ktoso' Malawski

unread,
Jul 25, 2014, 12:20:02 PM7/25/14
to akka...@googlegroups.com, ahjohannessen
Hello there,
Ah, much better to be able to communicate in full sentences, without the 140-character limit! ;-)

So, now that it’s spelled out as full sentences, I’ll gladly dig into your points:

1) 
This has already been proposed and accepted in https://github.com/akka/akka/issues/15004,
including your +1, so I guess you’re aware that it’s in our backlog.

The module is experimental and published “early” exactly in order to gather feedback on,
and implement, these features before stabilising the APIs.

So it’s coming, we simply have not yet gotten to implementing it - it’s holiday season, which isn’t helping development speed :-)

2) 
For the benefit of the discussion, example in EventStore: http://geteventstore.com/blog/20130707/catch-up-subscriptions-with-the-event-store/

One thing to keep in mind here is that some journals would have no problem implementing this, such as Kafka or EventStore, because “subscribe to something” is a built-in mechanism in them… See Martin’s Kafka journal and how one can subscribe to an event stream there: https://github.com/krasserm/akka-persistence-kafka#journal-topics On the other hand, implementing this in other journals (Cassandra, HBase, …) would be pretty painful / inefficient.

We were playing around with some ideas to expose optional DB-specific journal functionality. This would be a good candidate for it.

This request seems to depend on these things, by the way:
* changes in the journal plugins (we need some changes there anyway: https://github.com/krasserm/akka-persistence-kafka#journal-topics ),
* views over “tags” (as this would essentially be a view over “all”),
* and lastly reactive-streams (views exposed as streams of events).


Thanks for your feedback, and keep in mind that no one said that this module is “done”.
It’s still experimental, and this kind of feature request is exactly what we need, and will have to provide, to make it stable and usable in all kinds of apps.

Lastly, would you mind creating a ticket for the 2) feature?
Thanks in advance, have a nice weekend :-)

-- 

Konrad 'ktoso' Malawski
hAkker @ typesafe

Martin Krasser

unread,
Jul 26, 2014, 4:42:53 AM7/26/14
to akka...@googlegroups.com

On 25.07.14 18:19, Konrad 'ktoso' Malawski wrote:
> 2)
> For the benefit of the discussion, example in EventStore: http://geteventstore.com/blog/20130707/catch-up-subscriptions-with-the-event-store/
>
> One thing to keep in mind here is that some Journals would have no problem implementing this, such as Kafka or EventStore - because it’s a built in mechanism to “subscribe to something” in them… See Martin’s Kafka journal and how one can subscribe to a event stream there: https://github.com/krasserm/akka-persistence-kafka#journal-topics On the other hand implementing this in other Journals would be pretty painful / inefficient (cassandra, hbase, …).

A general approach for supporting this in other journals would require:

- a materialized view (on journal entries) in the backend store (compare to user-defined topics in the Kafka journal). Updating this view (or several) can be done either by the plugin itself or by backend-store triggers, for example. In the case of the Kafka journal, the plugin updates the materialized view (= user-defined topic). Alternatively, user-defined views don't necessarily need to be materialized if a backend store can generate them on the fly in a very efficient way.

- a consumer that keeps track of what has already been consumed from the materialized view. Kafka consumers, for example, do this automatically (the consumer position can optionally be written to ZooKeeper for fault tolerance). PersistentViews (in Akka Persistence) also track the consumer position on the client side; storing that position is currently only possible via snapshotting. Extending PersistentViews to track the position in a more efficient way (instead of snapshotting) could make sense.

IMO, materialized views, together with pull-based consumers that can write checkpoints, shouldn't be a big deal for any backend store to support. What Akka Persistence should offer in the future is an abstraction over how custom (materialized) views can be defined, and over pull-based consumers. Furthermore, an a.p.PersistentView could then abstract over processor journals and custom views (which would finally cover issue #15004).
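A minimal sketch of those two pieces, in plain Scala with invented names (a real implementation would store the checkpoint durably and poll the actual backend store):

```scala
// Hypothetical sketch: a pull-based consumer over a materialized view that
// records a checkpoint after each batch, so a restart resumes where it left
// off instead of replaying the view from the beginning.
object CheckpointedConsumer {
  final case class Entry(offset: Long, event: String)

  var checkpoint: Long = -1L                        // stands in for a durably stored position
  val processed = scala.collection.mutable.Buffer[String]()

  // poll the materialized view for entries newer than the checkpoint
  def poll(view: Seq[Entry], batchSize: Int): Unit = {
    val batch = view.filter(_.offset > checkpoint).take(batchSize)
    batch.foreach(e => processed += e.event)             // update the read model
    batch.lastOption.foreach(e => checkpoint = e.offset) // write the checkpoint
  }
}
```

Calling `poll` repeatedly consumes the view incrementally; crashing between polls loses at most the current (unacknowledged) batch.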

Thoughts?


--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

-- 
Martin Krasser

blog:    http://krasserm.blogspot.com
code:    http://github.com/krasserm
twitter: http://twitter.com/mrt1nz

Greg Young

unread,
Jul 26, 2014, 12:03:13 PM7/26/14
to akka...@googlegroups.com, ahjoha...@gmail.com
I think what's being missed here is that Event Store doesn't only support this on a single stream/topic.

It's very useful when building read models, for example, to be able to read events from many streams (or all streams) joined in a way that gives global ordering (if on a single replication group) or deterministic (historical) ordering if on multiple replication groups.

As an example, imagine that when writing you have 500,000 streams (one per inventory item).
When building a projection you are interested in 5 event types:

InventoryItemCreated
InventoryItemAudited
InventoryItemRenamed
InventoryCheckedIn
InventoryCheckedOut

regardless of the initial stream they were written to. This is very common in these types of systems.
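As a rough illustration of that scenario (plain Scala, invented names, not an Event Store API): a projection reads the deterministically ordered join of all streams and keeps only the event types it cares about:

```scala
// Toy model: events from many per-item streams carry a global position; the
// projection sorts by it (deterministic order) and filters by event type,
// regardless of which stream each event was originally written to.
object ProjectionSketch {
  final case class Ev(globalPos: Long, stream: String, eventType: String)

  val interesting = Set(
    "InventoryItemCreated", "InventoryItemAudited", "InventoryItemRenamed",
    "InventoryCheckedIn", "InventoryCheckedOut")

  def project(all: Seq[Ev]): Seq[Ev] =
    all.sortBy(_.globalPos).filter(e => interesting(e.eventType))
}
```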

Cheers,

Greg

Ashley Aitken

unread,
Jul 27, 2014, 12:28:55 PM7/27/14
to akka...@googlegroups.com

Thank you @ahjohannessen for your post. I have had this feeling as well, but as I am a novice here I've been trying to learn by asking questions (rather ineptly, I might add), in contrast to your intelligent post.

I would like, however, to express some further (probably confused) thoughts. I understand these APIs are experimental, and I am not criticising the amazing work of the Akka team, just trying to help and to learn more.

Some rambling thoughts:

A. I don't know how Akka currently stores its messages for each (Persistent) Actor, but it seems to me there may be a possibility to store these messages and persistent events in a combined message/event store (like Kafka), either temporarily or permanently.

B. There needs to be some way for events (or messages) to activate a passivated Persistent View (or Persistent Actor), with these events (or messages) being filtered from one or many Persistent (or regular) Actors (e.g. a set or type).

I know the second part of B. has been discussed and an issue raised for it, but here I am suggesting it may be broadened to include messages, enabling broadcast messages of a sort; it seems a bit odd to me that Actor messaging is only one-to-one.

C. There needs to be some way for other applications to easily interface with this message/event store, even if they are not Akka or Scala or even JVM applications. As suggested elsewhere, this is something the store itself can perhaps enable via APIs.

I think the article about logs by the LinkedIn architect strongly supports the idea that logs need to be at the centre of enterprise applications/systems. Logs can store persistent events (not just for persistence of Actors) and store message (i.e. data) streams as well.

As Greg's post seems to be alluding to, I don't see how we can effectively set up Akka Streams for all the possible processing needed for CQRS Views. Persistent Views need to be subscription-based and need a powerful ability to filter what they receive and from which Persistent Actor(s).

In summary, I imagine a (distributed) message store that is used for sending messages to one or more Actors using prescribe or subscribe, where some of these messages may be persistent events (used to restore a passivated Persistent Actor or update a Persistent View).

[I suggest that prescribe is where the actor indicates which actor or actors its messages are to be sent to, whereas subscribe is as in traditional pub-sub, where other actors indicate that they wish to receive messages from one or more actors.]

The producer of the messages/events can decide how long the messages/events are kept for. So, for example, persistence events can be kept forever, whereas regular messages may be kept for a shorter period.

Again, please remember I know very little about what I am trying to talk sensibly about so I apologise if none of this makes sense :-)

Cheers,
Ashley.


Suggested Generalised Actor / Messaging Model (for a laugh)

All messages (some of which may contain events) are persisted in a store (that may be distributed) for a customisable duration. There is some sort of partial ordering of these messages.

An actor may prescribe to which other actor(s) it is sending a message (or messages). The destination actor(s) will process the messages in their virtual mailbox one after the other (and be activated if a message arrives whilst they are passivated).

An actor may subscribe to receive all or a subset of the messages from one or more other actors. The receiving actor will process the messages one after the other (and be activated if messages arrive whilst it is passivated).

Some mechanism may exist to allow the pace of production of messages by an actor(s) to be controlled by the consumer(s) of those messages. Actors may replay any set of messages that they have been sent or subscribed to (as far back as the messages are stored).



ahjohannessen

unread,
Jul 28, 2014, 10:08:12 AM7/28/14
to akka...@googlegroups.com, ahjoha...@gmail.com

> Lastly, would you mind creating a ticket for the 2) feature?
> Thanks in advance, have a nice weekend :-)

Sure Konrad; however, I think I'll wait a little bit, because getting input from the likes of Greg and Martin helps formulate the issue more clearly.
It would be awesome to get the opinions of people like Vaughn, Roland, Patrik, Viktor and so on as well; please share your thoughts, guys :).
I believe this "missing link" is very important for Akka Persistence to be *generally* useful for CQRS/DDD/ES.

Martin, your suggestion makes a lot of sense in my book; clever and creative as usual.

ahjohannessen

unread,
Jul 28, 2014, 10:31:30 AM7/28/14
to akka...@googlegroups.com, ahjoha...@gmail.com
Greg, exactly. I think that having such capabilities would make Akka Persistence even more awesome and useful.
Thanks for chiming in, your opinion on this is very much appreciated.

Ashley, thanks for the kind words. Glad to learn that my worries are not completely mental, and not just my own nitpicking :)

Konrad 'ktoso' Malawski

unread,
Jul 28, 2014, 4:49:00 PM7/28/14
to akka...@googlegroups.com, ahjohannessen, ahjoha...@gmail.com
Hi everyone,
thanks for your feedback and ideas.

So the stream / view on multiple persistenceIds (or “tags”, which would solve Greg’s example case) is coming; we just have not yet had the time to work on it.
One thing that ties into this is reactive streams: we would like to expose these event streams as Akka Streams,
especially since they provide things like merge / filter / tee, which I believe would help a lot with these kinds of event streams :-)

From the streams point of view, abstracting over whether it’s polling or DB-side initiated events, the APIs won’t have to change.
I do agree with / like Martin’s suggestion that in “normal DBs” (no events when someone does an insert) we should be able to implement this with some housekeeping done by the plugins.

One question about EventStore: in the case of reading from multiple replication groups, is the ordering based simply on non-descending write-timestamp order?
The timestamp is obviously skewed a bit (multiple servers/clocks do writes), but in the apps you work with, would this be OK as the source of ordering in the case of the “all events” stream?


PS: Most of the team is on holiday this week, it’s reasonable to expect they’ll chime in some time next week.

Konrad Malawski

unread,
Jul 28, 2014, 5:40:41 PM7/28/14
to akka...@googlegroups.com, ahjoha...@gmail.com
Rephrasing my ordering question, actually (it started out as something else and ended up weirdly worded):
I was wondering whether the guarantees should be "time in system" or happens-before as defined by the sequence numbers of concrete ids (A-1 before A-2, and B-1 before B-2, but I don't care about the relation between A and B).
Curious about your real-world use cases, in other words.
Caring less about ordering makes way for faster replays, of course; that's what I'm after here (perhaps thinking too far ahead, though).

-- k

Martin Krasser

unread,
Jul 29, 2014, 2:25:50 AM7/29/14
to akka...@googlegroups.com

On 28.07.14 23:40, Konrad Malawski wrote:
> Rephrasing my ordering question actually (it started out as something else and ended up as weirdly worded):
> I was thinking if the guarantees should be "time in system" or happens before as known by sequence numbers in concreten ids's (A-1 before A-2, but B-1 before B-2, but I don't care about A and B relation).

- A total order per persistenceId based on sequence numbers (= partial ordering in the "all events" stream) is a must-have IMO.
- Ordering based on timestamps should be an application-level concern (= timestamps in application-defined events, and (re-)ordering done by the application).
- Mid/long-term goal: causal ordering (allows moving from eventual consistency to causal consistency). See also "Don't Settle For Eventual Consistency".


Konrad 'ktoso' Malawski

unread,
Jul 29, 2014, 4:47:07 AM7/29/14
to akka...@googlegroups.com, Martin Krasser

> - a total order per persistenceId based on sequence numbers (= partial ordering in the "all events" stream) is a must have IMO.
> - ordering based on timestamps should be an application level concern (= timestamps in application-defined events and (re-)ordering done by application)

Agreed, seqNrs are at our core and we’ll stick to them (partial ordering will be in for sure; I was wondering whether more was needed by app implementors). Bringing in timestamps “in some cases” would be inconsistent with the rest of persistence anyway, so pushing it into user land sounds good.

Actually, since we want to expose this as reactive streams, the timestamp ordering could be expressed as `merge(streams, userLandProvidedOrdering)` (we don’t have this yet)… A tempting idea; looking forward to trying out different things there.
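As a toy illustration of that `merge(streams, userLandProvidedOrdering)` idea (plain Scala, invented names; a real version would be an incremental k-way merge over akka-streams, not flatten-and-sort):

```scala
// Sketch: merge per-persistenceId streams (each already in seqNr order) into
// one stream using a user-supplied Ordering, e.g. by an application-level
// timestamp carried inside the events.
object MergeSketch {
  final case class Ev(persistenceId: String, seqNr: Long, timestamp: Long)

  def merge(streams: Seq[Seq[Ev]])(implicit ord: Ordering[Ev]): Seq[Ev] =
    streams.flatten.sorted(ord)

  // user-land ordering: by application-defined timestamp
  val byTimestamp: Ordering[Ev] = Ordering.by(_.timestamp)
}
```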


> - mid/long-term goal: causal ordering (allows moving from eventual consistency to causal consistency). See also Don't Settle For Eventual Consistency.
Thanks! I have not read that one yet; it looks very interesting.
Will catch up with it today :-)

— k

Patrik Nordwall

unread,
Aug 5, 2014, 9:30:43 AM8/5/14
to akka...@googlegroups.com
Hi all,

I fully agree that it is a valid feature.

Would the following sketch work?
- Keep PersistentView poll-based, but let it query for events with a specified "tag" instead of a single persistenceId. This requires a new query in the journal API.
- Add an optional tags parameter to persist and persistAsync, i.e. when storing an event it can be marked with zero or more tags. This requires adding a tags field to PersistentRepr.
- Streams on top of PersistentView.

The new view query would have to include a Map of persistenceId -> seqNr. The journal actor is supposed to be local, so the size of this Map should not be a problem.

Is it overkill to tag individual events? Should the tags be per PersistentActor instance instead?

As a second step (or immediately?), we could let the journal push events to the view. Most journal implementations will have to implement this with polling anyway, but then we make it possible for a journal to take advantage of native push from the store.
The view would have to request a number of events from the journal to handle backpressure (similar to reactive streams).
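A rough sketch of what such a tag query could look like, in plain Scala with hypothetical names (not a proposed API signature): the view hands the journal its Map of persistenceId -> highest seqNr seen, and gets back the tagged events it has not consumed yet:

```scala
// Sketch: query a journal for events carrying a tag, newer than what the view
// has already seen per persistenceId. Order is only guaranteed per id.
object TagQuerySketch {
  final case class Repr(persistenceId: String, seqNr: Long, tags: Set[String], event: String)

  def eventsByTag(journal: Seq[Repr], tag: String,
                  seen: Map[String, Long]): Seq[Repr] =
    journal
      .filter(r => r.tags(tag) && r.seqNr > seen.getOrElse(r.persistenceId, 0L))
      .sortBy(r => (r.persistenceId, r.seqNr)) // per-id seqNr order only
}
```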

Regards,
Patrik






--

Patrik Nordwall
Typesafe Reactive apps on the JVM
Twitter: @patriknw

Ashley Aitken

unread,
Aug 5, 2014, 10:59:24 AM8/5/14
to akka...@googlegroups.com

My use-case is for denormalised PersistentViews (possibly stored in a Document Store).

Will it be possible for a persistent view to follow all persistent actors of a specific type? For example, a CustomerViewManager following all events for all Customers, so that when an event comes in for a Customer it can rehydrate (if needed) that CustomerView and update it?

Will it be possible for a persistent view to follow a set of persistent actors? For example, a denormalised CustomerView following events for the set of songs the customer has purchased, so that if the details of one of them change, the denormalised view can be updated?

I can't follow the logic of tagging specific events rather than using types of events, or how using streams can scale for any of these sorts of requirements. I'm thinking of using Kafka for the journal and having Actors that subscribe to topics to do some of the above (is that plausible?).

Finally, I am all for anything that makes views more push-based (even if it is implemented through polling in some journals), because I believe this is central to a system being reactive. Standard disclaimers apply (re my lack of knowledge and understanding ;-).

Cheers,
Ashley.

PS I enjoyed the paper on causal consistency.

ahjohannessen

unread,
Aug 5, 2014, 11:18:36 AM8/5/14
to akka...@googlegroups.com
Hi Patrik,

I think it is enough to use the same "tag" for all events of a particular type of persistent actor, instead of allowing different tags per event. What is important is that it is possible to track the logical position (offset) of all events with the same "tag".

Patrik Nordwall

unread,
Aug 5, 2014, 12:57:10 PM8/5/14
to akka...@googlegroups.com



> On Aug 5, 2014, at 17:18, ahjohannessen <ahjoha...@gmail.com> wrote:
>
> Hi Patrik,
>
> I think it is enough to use same "tag" for all events of a particular type of persistent actor instead of allowing different tags per event.
Ok, noted
> What is important is that it is possible to track the logical position (offset) for all events with the same "tag".
I'm not sure I understand this requirement. Do you ask for full ordering among all events with the same tag? That is not possible for scalability reasons.

Timestamps, logical clocks, or suchlike have to be added to your own event data.

The events will be in order per persistenceId (and there is a sequence number per persistenceId).

/Patrik

ahjohannessen

unread,
Aug 5, 2014, 1:33:55 PM8/5/14
to akka...@googlegroups.com
"I'm not sure I understand this requirement. Do you ask for full ordering among all events with the same tag? That is not possible for scalability reasons."

So, you do not think it is possible for a journal to maintain a logical position per topic / tag?

"Time stamps, logical clocks, or such, have to be added to your own event data."

That requirement is exactly what makes Akka Persistence painful, almost useless, in a DDD / ES / CQRS setup, because the read side / catch-up subscriptions currently require maintaining a seqNr per persistenceId.

"The events will be in order per persistenceId (and there is a sequence number per persistenceId)."

That I am aware of, and it is not enough in a DDD/ES/CQRS setup.

I see no value in tags without the ability to track an offset per topic / tag.

Patrik Nordwall

unread,
Aug 5, 2014, 3:11:34 PM8/5/14
to akka...@googlegroups.com


> On Aug 5, 2014, at 19:33, ahjohannessen <ahjoha...@gmail.com> wrote:
>
> "I'm not sure I understand this requirement. Do you ask for full ordering among all events with the same tag? That is not possible for scalability reasons."
>
> So, you do not think it is possible for a journal to maintain a logical position per topic / tag ?

How do you define the order? Is it based on time stamps in the persistent actors? Is it based on some feature in the backend store?

/Patrik
>
> "Time stamps, logical clocks, or such, have to be added to your own event data."
>
> That requirement is exactly what makes akka persistence painful, almost useless, in a DDD / ES / CQRS setup, because read side / catchup-subscriptions currently require to maintain snr per persistentId.
>
> "The events will be in order per persistenceId (and there is a sequence number per persistenceId)."
>
> That I am aware of and that is not enough in a DDD/ES/CQRS setup.
>
> I see no value in tags without ability to track offset pr topic / tag.
>

ahjohannessen

unread,
Aug 5, 2014, 3:25:13 PM8/5/14
to akka...@googlegroups.com
"How do you define the order? Is it based on time stamps in the persistent actors? Is it based on some feature in the backend store?"

I do not think timestamp precision is important for this; I would imagine a logical position / offset as EventStore/Kafka have. I imagine those are based on integers / longs.

I think it depends on the journal. Something simple like LevelDB would need help from the journal, whereas something like Kafka / EventStore would probably have something built in that one could adapt, making it easier to implement.

Martin Krasser

unread,
Aug 6, 2014, 4:08:14 AM8/6/14
to akka...@googlegroups.com

On 05.08.14 21:25, ahjohannessen wrote:
> "How do you define the order? Is it based on time stamps in the persistent actors? Is it based on some feature in the backend store?"
>
> I do not think time stamp precision is important for this, I would imagine a logical position / offset as EventStore/Kafka do.

Kafka maintains an offset for each partition separately, and a partition is bound to a single node (disregarding replication). For example, if a Kafka topic is configured to have 2 partitions, each partition starts with offset=0, and if you consume from that topic you only obtain a partially ordered stream, because Kafka doesn't define any ordering across partitions (see the Kafka docs for details). This situation is comparable to other distributed datastores. For example, Cassandra only maintains an ordering for entries with the same partition key (i.e. for entries that reside on the same node).

In general, if you want to maintain an ordering of entries, you either have to use

- a single writer in the whole cluster (which is the case for persistent actors), or
- entries (generated by multiple producers) kept on a single node, so that the server is able to maintain a local counter (which is what Kafka does with offsets, for each partition separately).

Both limit scalability (as already mentioned by Patrik), in terms of both write throughput and data volume. It may well be that some applications are fine with these limitations and benefit from a total ordering of entries per "tag", but this should not be the default in akka-persistence. IMO, it would make sense if akka-persistence allowed applications to configure an optional ordering per "tag", so that users can decide to sacrifice scalability if total ordering is needed for a given tag (and it is up to journal implementations how to implement that ordering).

As already mentioned in a previous post, causal ordering could be a later extension to akka-persistence that goes beyond the limits of a single writer or co-located storage *and* allows for better scalability. I wish I had more time for hacking on a prototype that tracks causalities :)
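A toy model of the per-partition offsets described above (illustrative plain Scala, not the Kafka client API): each partition counts its own offsets, so offsets are only comparable within one partition:

```scala
// Sketch: assign entries to partitions by key hash; each partition maintains
// its own offset counter, so the result is only partially ordered (order holds
// within a partition, not across partitions).
object PartitionSketch {
  def assignOffsets(entries: Seq[(String, String)], partitions: Int): Seq[(Int, Long, String)] = {
    val counters = scala.collection.mutable.Map[Int, Long]().withDefaultValue(0L)
    entries.map { case (key, event) =>
      val p = math.abs(key.hashCode) % partitions
      val off = counters(p)
      counters(p) = off + 1
      (p, off, event) // the offset is meaningful only within partition p
    }
  }
}
```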


ahjohannessen

unread,
Aug 7, 2014, 2:01:26 PM8/7/14
to akka...@googlegroups.com
Interesting idea, especially since not all of us think in terms of scalability.
I agree with your opinion wrt optional ordering per tag. It would be nice to
find a middle ground that makes sense wrt scalability *and* the need to
be able to use an offset per group of persistent actors of the same type.
 
> As already mentioned in a previous post, causal ordering could be a
> later extension to akka-persistence that goes beyond the limits of a
> single writer or co-located storage *and* allows for better scalability.
> I wish I had more time for hacking on a prototype that tracks causalities :)

That seems like a great solution, but something tells me that it is
not going to happen soon.

So, my conclusion is that I will go with using a single persistent actor as a journal,
thus getting a "tag" + seqNr, and use persistent views for what I would otherwise
use a normal persistent actor for, e.g. as in this sketch:
and use snapshotting to reduce the recovery time of persistent views.

Patrik, on second thought, perhaps tags per event would not be that silly; at
least it gives more flexibility. However, I suppose it all depends on what you guys
want to use a "tag" for.

Vaughn Vernon

unread,
Aug 7, 2014, 2:34:15 PM8/7/14
to akka...@googlegroups.com
Can someone (Martin?) please post some rough performance and scalability numbers per backing storage type? I see these DDD/ES/CQRS discussions lead to consumer-developer limitations based on performance and scalability, but I have not seen any actual numbers. So please post numbers in events per second, as I would prefer not to have to hunt down such numbers in old posts.

I keep saying this, though it seems without much success: akka-persistence, even at its slowest, probably still performs 5x-10x better than most relational stores, and at its best perhaps 500x better. I often poll my IDDD Workshop students for realistic transactions-per-second numbers, and most of the time it is at or below 100 tps. Some of the higher ones are around 1,000 tps. A few students identify with 10,000 tps. As far as I know, akka-persistence can regularly perform in the 30,000 tps range, and with some stores tops out at, what, 500,000 tps? The point I am trying to make is: even if you put a singleton sequence in front of your slowest possible store, let's assume that it could cost 30%. That would still leave performance at around 20,000 tps on your slowest store, which is 20x faster than many, many enterprise applications need. (There are faster ways of producing incremental sequences than using a singleton atomic long.)

I vote that you need to have a single sequence across all events in an event store. This is going to cover probably 99% of all actor persistence needs and it is going to make using akka-persistence way easier.

A suggestion: rather than looking so carefully at akka-persistence for performance and scalability increases, I think a lot could be gained by looking at false sharing analysis and padding solutions.

Vaughn

ahjohannessen

unread,
Aug 7, 2014, 2:57:10 PM
to akka...@googlegroups.com
On Thursday, August 7, 2014 7:34:15 PM UTC+1, Vaughn Vernon wrote:

I vote that you need to have a single sequence across all events in an event store. This is going to cover probably 99% of all actor persistence needs and it is going to make using akka-persistence way easier.

If that were made optional and combined with a tag facility, then those who think it hurts scalability could opt out, and others could opt in and pay the extra penalty.
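As a sketch of what such an opt-in might look like in configuration (purely hypothetical HOCON; none of these keys exist in akka-persistence, this just illustrates the idea):

```hocon
# Hypothetical settings only -- not real akka-persistence configuration.
akka.persistence.journal {
  # Opt in to a store-wide sequence, accepting the write-path penalty.
  global-sequence = on

  tagging {
    enabled = on
    # e.g. derive a tag from the persistenceId prefix: "customer-42" -> "customer"
    tag-from-persistence-id-prefix = on
  }
}
```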

Patrik Nordwall

unread,
Aug 7, 2014, 3:29:33 PM
to akka...@googlegroups.com
Ok, I think it's a good idea to leave it to the journal plugins to implement the full ordering as well as is possible with the specific data store. We will only require exact order of events per persistenceId.
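What "exact order per persistenceId only" means can be stated as a small checkable property (illustrative Scala, not Akka types): any replay the journal produces must keep each persistenceId's events in seqNr order, while the interleaving across persistenceIds is left store-dependent.

```scala
// The minimum ordering contract: within one persistenceId, events appear
// in seqNr order; across persistenceIds, any interleaving is allowed.
case class Evt(persistenceId: String, seqNr: Long, payload: String)

def perIdOrderPreserved(replay: Seq[Evt]): Boolean =
  replay.groupBy(_.persistenceId).values.forall { evts =>
    val seqNrs = evts.map(_.seqNr)
    seqNrs == seqNrs.sorted
  }
```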

Any other feedback on the requirements or proposed solution of the improved PersistentView?

/Patrik

Vaughn Vernon

unread,
Aug 7, 2014, 6:21:14 PM
to akka...@googlegroups.com
I am sure you have already thought of this, Patrik, but if you leave full ordering to the store implementation, it could still have unnecessary limitations if the implementor chooses to support a sequence only per persistenceId. One very big limitation: if the store doesn't support a single sequence, you still can't play catch-up over the entire store when you depend on interleaved events across types. You can only re-play all events properly using a global sequence. Well, you could also do so using causal consistency, but (a) that's kinda difficult, and (b) it's not supported at this time.

Vaughn

Gary Malouf

unread,
Aug 7, 2014, 8:53:14 PM
to akka...@googlegroups.com
I don't see it mentioned on this particular thread, but I feel creating reliable sagas across processors (aggregates) is a real challenge right now as well.  Having a clearly documented way to do this is critical IMO to creating more complex and reliable CQRS-based apps.

Patrik Nordwall

unread,
Aug 8, 2014, 4:45:30 AM
to akka...@googlegroups.com
On Fri, Aug 8, 2014 at 12:21 AM, Vaughn Vernon <vve...@shiftmethod.com> wrote:
I am sure you have already thought of this, Patrik, but if you leave full ordering to the store implementation, it could still have unnecessary limitations if the implementor chooses to support sequence only for persistenceId.

As a user you would have to pick a journal that supports your needs in this regard.

/Patrik
 

Ashley Aitken

unread,
Aug 8, 2014, 6:00:13 AM
to akka...@googlegroups.com

On Friday, 8 August 2014 08:53:14 UTC+8, Gary Malouf wrote:
I don't see it mentioned on this particular thread, but I feel creating reliable sagas across processors (aggregates) is a real challenge right now as well.  Having a clearly documented way to do this is critical IMO to creating more complex and reliable CQRS-based apps.

+1  I have mentioned this before in another thread.  IMHO, CQRS depends heavily on an event queue for the read model, sagas, and integration with other systems, which subscribe to the events etc.  A journal just for processor persistence is not enough, again in my opinion.

This is why I am excited about Kafka.  It seems it can be both a journal (with some restrictions, or configured to retain events forever) and an event queue that other processes and systems can subscribe to.

Ashley Aitken

unread,
Aug 8, 2014, 6:10:42 AM
to akka...@googlegroups.com


On Friday, 8 August 2014 16:45:30 UTC+8, Patrik Nordwall wrote:

On Fri, Aug 8, 2014 at 12:21 AM, Vaughn Vernon <vve...@shiftmethod.com> wrote:
I am sure you have already thought of this, Patrik, but if you leave full ordering to the store implementation, it could still have unnecessary limitations if the implementor chooses to support sequence only for persistenceId.

As a user you would have to pick a journal that supports your needs in this regard.

I agree with you both.  With Vaughn I agree that we need a global sequence (although I understand this is very impractical in distributed systems), and with Patrik that it should be up to the store implementation (with the possibility of store configuration determining this).  It would be up to the store (and the developer's choice in configuring that store) to determine how close to causal or total ordering the sequence will be.

So, for example, with general use of Kafka the store could provide events from each partition of a topic (if I understand correctly how Kafka works) in a round-robin fashion, which wouldn't be properly sequenced but may be manageable for some requirements.  If a developer wanted stricter "global" sequencing, they could configure the store to have a single partition, with the scaling implications that would have.
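That partitioning trade-off can be sketched without Kafka itself (illustrative Scala, not the Kafka client API; names are made up): events hash to partitions by persistenceId, so a round-robin reader preserves each id's order but not the global write order, while a single partition preserves everything.

```scala
// Model of partitioned-topic ordering: per-partition order survives,
// global write order does not -- unless there is exactly one partition.
case class Msg(persistenceId: String, body: String)

def partitionBy(msgs: Seq[Msg], numPartitions: Int): Vector[Vector[Msg]] = {
  val parts = Vector.fill(numPartitions)(Vector.newBuilder[Msg])
  msgs.foreach(m => parts(math.abs(m.persistenceId.hashCode) % numPartitions) += m)
  parts.map(_.result())
}

// A round-robin consumer draining the partitions one element at a time.
def roundRobin(parts: Vector[Vector[Msg]]): Vector[Msg] = {
  val maxLen = if (parts.isEmpty) 0 else parts.map(_.length).max
  (0 until maxLen).flatMap(i => parts.flatMap(_.lift(i))).toVector
}
```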


Gary Malouf

unread,
Aug 8, 2014, 7:15:28 AM
to akka...@googlegroups.com
One of the arguments for the CQRS/Event Sourcing combo has been that it allows you to optimize reads and writes independently for high throughput.  Many people, however (including us), want the command/query separation plus the sequence of events just for the design benefits.  Sagas are one of the critical pieces of this, but there need to be guarantees that if one event comes out of one aggregate/processor and 3 other aggregates/processors are listening for it, they will get it barring a catastrophe.

Unless one simply polls all of the processor persistent views manually today, this guarantee just is not there out of the box.



Prakhyat Mallikarjun

unread,
Aug 10, 2014, 9:20:21 AM
to akka...@googlegroups.com
How will the performance of event sourcing be impacted in a high-volume OLTP application that generates millions of events?

There is a big overhead in storing these events, and also the state of the system, in durable DBs for reads or other purposes. Also, don't you think event-sourcing apps will consume huge amounts of disk space, and consume it faster in a high-volume OLTP application?

Prakhyat Mallikarjun

unread,
Aug 10, 2014, 10:29:02 AM
to akka...@googlegroups.com
Hi Gary/Akka team,

I have a requirement in my app that changes to one aggregate root affect many more aggregate roots, and all have to be kept in sync. I keep seeing sagas referred to in these discussions. Will sagas really help to resolve this situation? Can I find any articles in this regard?

Are there any other design approaches?

Gary Malouf

unread,
Aug 10, 2014, 11:36:13 AM
to akka...@googlegroups.com
Hi Prakhyat,

We are building a CQRS/DDD-oriented configuration system based on akka persistence and are running into the same modeling issues.  A few characteristics of our specific case:

1) We do not expect a high volume of commands to be submitted (they are generated via a task-based user interface that will have on the order of 30-50 users).

2) We have a number of cases where the output events of one aggregate must eventually trigger a change on another aggregate.  This use case is what I am referring to as 'sagas'.  There are two concerns that need to be addressed: a guarantee that the messages will eventually get delivered in the event of system error/failure, and the ability of the receiving aggregates to order/handle them.

3) We use the cassandra connector for akka persistence with a 'quorum' consistency level for writing and reading.


Since we are not dealing with high throughputs, a less performant but safer solution to the concerns in (2) is possible for us without introducing another system to an already complicated infrastructure.  We can have the aggregates that may receive events from others reliably query the views of the aggregates they depend on (reading from Cassandra) directly, to ensure messages are not missed and arrive in order.

In our view, putting the weight on the consumer to deal with out-of-order messaging was painful.  I've read the blogs arguing for being able to deal with this, but it just felt like something the framework should handle for you in the end.

The reliable, in-order messaging concern also extends to 'stream consumers' in general.  For this, we are looking at building a service that reads from all views (ordering across processors/aggregates by timestamp), assigns a 'global' sequence number to each event, and persists this in a stream.  We can then have our consumers read from this stream with confidence that events will arrive in order and not be missing.  That service could run as a singleton in an Akka cluster for reliability; performance is not a concern at our expected traffic.
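The core of that sequencer service can be sketched in plain Scala (illustrative only; `ViewEvent` and `Sequenced` are made-up names, and the real service would read from the persistent views and run as a cluster singleton): merge the per-aggregate streams by timestamp, tie-break deterministically, and stamp each event with a global sequence number.

```scala
// Sketch of the timestamp-ordering sequencer described above.
case class ViewEvent(persistenceId: String, timestamp: Long, payload: String)
case class Sequenced(globalSeqNr: Long, event: ViewEvent)

def sequence(streams: Seq[Seq[ViewEvent]]): Vector[Sequenced] =
  streams.flatten
    .sortBy(e => (e.timestamp, e.persistenceId)) // tie-break on id for determinism
    .zipWithIndex
    .map { case (e, i) => Sequenced(i + 1L, e) }
    .toVector
```

Note the obvious caveat: timestamps from different nodes are only as reliable as the clocks that produced them, which is part of why a true global sequence in the journal would be preferable.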

Both of these cases highlight the need for reliable messaging integration to avoid the hoops we will be jumping through.



ahjohannessen

unread,
Aug 10, 2014, 11:37:23 AM
to akka...@googlegroups.com
Prakhyat, please stay on topic or start a new thread.

Thanks.

Prakhyat

unread,
Aug 10, 2014, 12:20:25 PM