Grouping events, like git?


David Leangen

Dec 20, 2019, 4:14:39 PM
to ddd...@googlegroups.com

Hello,

Just wondering if anybody has ever done anything like this.

One thing I really like about how git works is that it persists changes as commits. Each commit contains several changes.

In the past, I have used git as my "event store". At first it was just as a poor man’s solution to get going with something “close enough” to event sourcing. However, the approach really grew on me.

I would make a number of changes (via “commands”) to the system. When I was pleased with my work, I would commit the changes.

What I particularly like is that a commit is akin to a “session”. Within a session, there can be several commands. The system determines what the commands are, but I, as the user, control what a session contains.

From the model’s perspective, just like how files are agnostic to commits (they only care about deltas), an event-sourced model is agnostic to sessions. It only “cares” about the events in the sessions and does not care about session boundaries. However, from the perspective of managing work, the user has better control of the work by being able to manage sessions, just as one can manage commits in git.

As the user, I can roll back my sessions, or even play out alternate scenarios (git branch) and decide whether or not to integrate the scenarios into the model (merge).

It is really practical, and I want to try to be able to maintain these practices even as I move away from my poor man’s solution.

I already considered having events like “session started / session ended”, but I don’t think that will cut it. Those are just simple entries in the event stream. I am more interested in something like how git works, whereby the events are structurally part of a session, and the session is a first class citizen of the event stream.

A potential hack could be simply to append a session ID to each event in the session, but I’m not sure how effective that would be.
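
A minimal sketch of that session-ID hack, in Python. The names `Event`, `events_by_session`, and `without_session` are hypothetical, and a plain list stands in for the event stream. It suggests that tagging is enough to group events and to replay the stream without one session, but the session stays a naming convention rather than a structural part of the stream.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    session_id: str  # hypothetical tag appended to every event
    name: str


def events_by_session(stream):
    """Group a flat event stream by session ID, keeping order within each session."""
    groups = defaultdict(list)
    for event in stream:
        groups[event.session_id].append(event)
    return dict(groups)


def without_session(stream, session_id):
    """'Roll back' a session by replaying the stream without its events."""
    return [e for e in stream if e.session_id != session_id]


stream = [
    Event("s1", "ItemAdded"),
    Event("s1", "DiscountApplied"),
    Event("s2", "MemberAdded"),
]
sessions = events_by_session(stream)
remaining = without_session(stream, "s1")
```

Nothing here stops an event from being written without a session ID, or two sessions from interleaving, which is probably where the hack starts to fall short of git's model.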


Has anybody ever tried something like this? I am very interested in learning about how you implemented your solution.


Thanks!
=David


Lee Hambley

Dec 21, 2019, 3:36:55 AM
to ddd...@googlegroups.com
Hi David,

It's certainly not ready to use, but it's not very far away either (I have a branch with some missing pieces implemented). This is my WIP data store for evented architecture.


The README is pretty comprehensive, I think the model is what you are describing.

I lost motivation a bit when I changed jobs to a shop that uses Avro exclusively and picked up a bit of a negative attitude towards JSON. Go also makes iterating over mixed-type collections tricky (but the answer isn't "generics"), and getting the right "resolution" of aggregates becomes pretty important to making a system like this usable.

Happy to say more - just on mobile right now so keeping things brief.

- Lee

--
You received this message because you are subscribed to the Google Groups "DDD/CQRS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dddcqrs+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dddcqrs/859AF45D-7467-4F3D-9258-5A3CF5078FEA%40gmail.com.

Danil Suits

Dec 21, 2019, 10:18:16 AM
to DDD/CQRS
> One thing I really like about how git works is that it persists changes as commits. In each commit is several changes.

My understanding is that this isn't quite right. A commit doesn't have changes; it has a reference to the root tree of a snapshot, and references to "parent" commits, if any. Which is to say, commits hold snapshots, not events. A demonstration: find a blob object in your .git/objects folder, and then zlib inflate it to observe the contents. I believe you will find that the blobs are complete copies of your files at some point in their lifetime (note: things get a bit weirder once the objects have been packed, since packfiles can store objects as deltas against one another).
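
The demonstration can also be sketched in Python without touching a real repository. This assumes git's default SHA-1 object format, where a loose blob is the bytes `blob <size>\0<content>` deflated with zlib; the helper names `make_loose_blob` and `read_loose_blob` are made up for the example.

```python
import hashlib
import zlib


def make_loose_blob(content: bytes):
    """Build a blob the way a loose object is stored: header + content, zlib-deflated."""
    store = b"blob " + str(len(content)).encode() + b"\0" + content
    object_id = hashlib.sha1(store).hexdigest()
    return object_id, zlib.compress(store)


def read_loose_blob(compressed: bytes) -> bytes:
    """Inflate a loose blob and strip the header, returning the complete file content."""
    store = zlib.decompress(compressed)
    header, _, content = store.partition(b"\0")
    if not header.startswith(b"blob "):
        raise ValueError("not a blob object")
    return content


# Two versions of the same file become two independent, complete blobs.
v1 = b"hello world\n"
v2 = b"hello world\nsecond line\n"
id1, obj1 = make_loose_blob(v1)
id2, obj2 = make_loose_blob(v2)
```

Passing the raw bytes of a file under `.git/objects/` to `read_loose_blob` should behave the same way for an unpacked blob: the second version comes back as the whole file, not as a diff against the first.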

Having immutable blobs and trees in an object store is great - reasoning about things is simple, and sharing information about "the world" is really easy.

What I haven't found yet is a natural alignment of the semantics - how do we apply the semantics of blobs and trees to the things we are interested in, such that things become easier/more efficient.  Are blobs events? are they streams? what does it mean when we have blobs and trees as siblings in the hierarchy?  Does every committer have authority to modify any tree?  What does it mean to change the name of a tree entry?

Danil

Lee Hambley

Dec 21, 2019, 4:21:57 PM
to ddd...@googlegroups.com
In case my README isn't super clear, let me say a few words about my model. It's heavily modelled on Git, but also draws on my ~7 years of building Event Sourced systems in various guises.

Every interaction with the system goes through at least two APIs:

- startSession() (can and will be anonymous, session is an aggregate in its own right)
- apply()

What follows is awful shot-from-the-hip pseudo code, because the Go code in the repos is quite verbose.

A session is considered ended when you stop sending to it. Because a session is a first-class concept you ALWAYS need to have one, but you can operate on it:

01 session = startSession()                                                // session/0xffff
02 session.apply(session.Path, Authenticate({some params here}))           // you write and register an "Authenticate" command targeting the "Session" aggregate
03 result = session.apply(shoppingBasketPath, AddItem({some item params})) // generate a random one, or pass empty string, you'll get one assigned

At point 1 a session is created. All entities in my system have URN-like path names, such as `session/0xffff`.
At points 2 and 3 a "checkpoint" is created (analogous to a commit).
Each checkpoint has headers akin to a Git commit. Starting a session but not using it causes no side effect.
Each `apply()` call takes an aggregate path to target (the aggregate root) and a payload of params.

A successful call to `apply()` returns a new collection of events. The "repository" model here takes those events, checksums them, and puts them into CAS (content-addressable storage). Identical events such as "accept ToS" or "viewed own profile" checksum the same, and thus take up no duplicate storage. Events exist in the object database without timestamps etc.
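
A rough Python sketch of the content-addressable part, not taken from the Go code. The `ObjectStore` class, the canonical-JSON encoding, and the use of SHA-1 are all assumptions made for illustration.

```python
import hashlib
import json


class ObjectStore:
    """Hypothetical content-addressable store: the key is a hash of the event body."""

    def __init__(self):
        self._objects = {}

    def put(self, event: dict) -> str:
        # Canonical JSON, so that equal events always serialize to the same bytes.
        body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
        key = hashlib.sha1(body).hexdigest()
        self._objects.setdefault(key, body)  # storing the same content twice is a no-op
        return key

    def get(self, key: str) -> dict:
        return json.loads(self._objects[key])

    def __len__(self):
        return len(self._objects)


store = ObjectStore()
a = store.put({"type": "acceptedToS", "version": 3})
b = store.put({"version": 3, "type": "acceptedToS"})  # same content, different key order
c = store.put({"type": "viewedOwnProfile"})
```

Deduplication only works because the event bodies carry no timestamps or session data; anything that varies per occurrence has to live outside the stored event.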

The "collection" of events is turned into an "affix" by the repository, a file containing lines like:

session/0xffff authenticate 9b3b63adc3fb2ffc382ab1f66eb85ff3eb92c6c0
session/0xffff activateUserProfile db0759782ad1ab4f2715530defc99fd595173f04

In theory an affix can contain events for multiple aggregates: in case you have some command that creates a user, and a forum, and a forum membership, or whatever, they become events in one affix.

In the object model you then get a Checkpoint which points to an Affix which points to one or more events. All the timing data, and data about the current session, is stored in the Checkpoint headers. Event storage is efficient because identical events are never stored twice. The affix is flexible enough to allow
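
The Checkpoint, Affix, and event chain could be sketched in Python roughly as follows. This is an illustration under assumptions, not the real implementation: the `commit` function, the `parent` link, the header names, and the choice of SHA-1 over JSON are all hypothetical. Only the affix line format (`path eventName hash`) follows the example above.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


def put(objects: dict, body: bytes) -> str:
    """Store bytes under their hash; identical content is stored once."""
    key = hashlib.sha1(body).hexdigest()
    objects.setdefault(key, body)
    return key


@dataclass
class Checkpoint:
    affix: str                                   # hash of the affix object
    parent: Optional["Checkpoint"]               # assumed: previous checkpoint, like a commit parent
    headers: dict = field(default_factory=dict)  # timing and session data live here


def commit(objects, parent, session_path, events, headers):
    """events: list of (aggregate_path, event_name, payload) tuples."""
    lines = []
    for path, name, payload in events:
        body = json.dumps(payload, sort_keys=True).encode()
        lines.append(f"{path} {name} {put(objects, body)}")
    affix_key = put(objects, "\n".join(lines).encode())
    return Checkpoint(affix_key, parent, {"session": session_path, **headers})


objects = {}
first = commit(objects, None, "session/0xffff",
               [("session/0xffff", "authenticate", {"user": "alice"}),
                ("user/alice", "viewedOwnProfile", {})],
               {"time": "2019-12-21T16:21:57Z"})
second = commit(objects, first, "session/0xffff",
                [("user/alice", "viewedOwnProfile", {})],
                {"time": "2019-12-21T16:25:00Z"})
```

After both commits the store holds four objects: two event bodies and two affixes. The repeated "viewed own profile" event is stored once and referenced from both affixes, while the differing timestamps sit only in the checkpoint headers.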

For getting data out of the system:

In the synchronous "write" side of the model where you cannot afford eventual consistency, entering a call to `Apply(<aggregate path>, ...)` will get a lock on the object DB for that name, and return you a "rehydrated" object which will be the "context" of applying your `Authenticate` or `AddItem` command. The repository pattern entrypoint keeps indexes of which affixes have events for an aggregate, and what order those commits go in (transparent to the end user), so the lookups are quite quick. (This is part of what I was working on when I lost momentum.)

Consumers who are subscribed to something like `session/*` downstream must handle two kinds of "callbacks": one when discovering a new matching aggregate, and one when receiving a new batch of events from an affix for an object they are already tracking. So when you write something to, say, project user friendship lists into Redis so you can do a fast "are these people friends" lookup in your profile rendering projection, you can subscribe to `user/*:{Add|Remove}Friend`, and the repository will feed you multiple "threads" (channels, in Go) of events. You would quickly ramp up to N concurrent workers, where N is the number of `user/*` in the database, and you'll also get `+1` for every *new* user that comes into existence. (I can't remember if filtering by event name pattern works on that part of the code, but it should have been pretty trivial to add.)
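
A single-threaded Python sketch of the two-callback consumer contract. Everything here is hypothetical: the `Subscription` class, splitting the `user/*:{Add|Remove}Friend` pattern into a path glob plus a set of event names, and the `reset` method that mimics a cursor reset. The real design would hand each aggregate its own channel and worker.

```python
from fnmatch import fnmatchcase


class Subscription:
    def __init__(self, path_glob, event_names, on_new_aggregate, on_events):
        self.path_glob = path_glob
        self.event_names = set(event_names)
        self.on_new_aggregate = on_new_aggregate  # callback 1: first sighting of an aggregate
        self.on_events = on_events                # callback 2: a batch of events for it
        self._seen = set()

    def deliver(self, affix_lines):
        """affix_lines: 'path eventName hash' strings taken from one affix."""
        batches = {}
        for line in affix_lines:
            path, name, digest = line.split()
            if fnmatchcase(path, self.path_glob) and name in self.event_names:
                batches.setdefault(path, []).append((name, digest))
        for path, batch in batches.items():
            if path not in self._seen:
                self._seen.add(path)
                self.on_new_aggregate(path)
            self.on_events(path, batch)

    def reset(self):
        """Cursor reset: forget known aggregates so a replay starts from zero."""
        self._seen.clear()


discovered, received = [], []
sub = Subscription("user/*", {"AddFriend", "RemoveFriend"},
                   discovered.append,
                   lambda path, batch: received.append((path, [n for n, _ in batch])))
sub.deliver(["user/1 AddFriend aaa", "user/1 Renamed bbb", "session/9 AddFriend ccc"])
sub.deliver(["user/1 RemoveFriend ddd", "user/2 AddFriend eee"])
```

Because the consumer already has to cope with "new aggregate" at any time, calling `reset()` and replaying the affixes puts it back into the same code path it used on first start.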

An interesting thing in the design (not implemented yet) is that I plan for the server to track the cursors, so you can know the latency of all the consumers. If they begin to fall behind, that's something the repository server can register as "back pressure", and maybe alert on, or at least keep metrics on, so clients can stay simple.

There's some interesting behaviour modelled (but incomplete) about "restarting" a consumer from the server, safely. Because consumers have to deal with "this is a new event for a new thing" and "this is a new event for an existing thing" anyway, in the rebase case, or just a simple "cursor reset", the consumer client would go back into the "this is a new event for a new thing" mode, so if you deployed a broken projection, you could simply redeploy, go to a web interface and hit restart, and it'd start from 0.

Concurrency controls for consumers aren't built yet, and unfortunately all of this only works within one process in Go (extensive test suite) -- but I did write a WIP app in it, and it basically worked.

I desperately want to finish it, and allow the clients to be written in something other than Go, because dealing with generics in Go is really, really freaking painful.




Johan 't Hart

Dec 23, 2019, 7:44:26 PM
to DDD/CQRS
Hi David,

I think the analogy between ES and git is that a git commit is equivalent to a single event. Both are the transaction boundary. Both describe a single complete unit of change in the data.
I think that if you feel the need to group different events together in one session (or transaction, if you will), that is a smell that you have modelled your events/aggregates wrongly.
The art is to design your events such that every single event stands for one complete transaction.
Or you may think of using the saga pattern when you think the transaction really needs to cross aggregates.
Not saying that is an easy job. But I think what will help you is what you learned when you were working with git. Maybe you can start with basing the name of the events on what you would put in the commit text.

Greetings,
Johan

Johan 't Hart

Dec 23, 2019, 7:55:30 PM
to DDD/CQRS
Just realized that I didn't cover your wish to try out different scenarios and be able to undo them again.

In ES it usually is like, just as in the task world, what is done is done. If you want to undo something then make a new event that does just that, if that is possible.
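
As a small Python illustration of "undo by a new event": the event names and the `total` fold below are made up for the example. The earlier event stays in the log, and a compensating event reverses its effect.

```python
def total(events):
    """Fold a list of (name, amount) events into an order total."""
    amount = 0
    for name, value in events:
        if name == "ItemAdded":
            amount += value
        elif name == "DiscountApplied":
            amount -= value
        elif name == "DiscountRevoked":  # compensating event for DiscountApplied
            amount += value
    return amount


log = [("ItemAdded", 100), ("DiscountApplied", 10)]
before_undo = total(log)

# Undo is an append, not a delete: history keeps both the discount and its reversal.
log.append(("DiscountRevoked", 10))
after_undo = total(log)
```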

The thing with git is that you can also view it as being a single projection that is updated even before the commit. But in ES the projections are usually updated after the commit. So in that world it is hard to undo something that is committed, or try things out before you committed it. The latter is probably possible though...
Anyway, why leave git in the first place if it works for you?

david....@gmail.com

Oct 6, 2022, 8:20:00 PM
to DDD/CQRS
Time just kinda got away from me. I must have slipped into a pandemic wormhole... but I am still interested in this topic. Thanks for the replies.

(BTW, I don't see much activity on this list. Have people moved away to something newer and shinier? The stats show 4620 members! I have only counted 4 posts so far this year, not including this one.)

--> Lee Hambley... wow!

> It's certainly not ready to use, but it's also not very far away (I have a branch with some missing pieces implemented), but this is my WIP data store for evented architecture.
> The README is pretty comprehensive, I think the model is what you are describing.

Impressive! This looks really interesting, but the last commit was 4 years ago. Are you still keeping up with this project? I will contact you directly offline, if you don't mind.

--> Johan 't Hart, thanks for the thoughts.

> I think the analogue of ES with git is that, a git commit would be equivalent to a single event. Both are the transaction boundary. Both describe a single complete unit of change in the data.

I am thinking of the commit as something a little different. It is related to transactions, but it is more like a grouping of transactions.

An analogy would be when I make a purchase. I could decide to purchase items A, B, and C as 3 separate purchases, as two purchases (one of AB+C, AC+B, BC+A), or as a single purchase. In other words, any purchase I make has a number of permutations derived from the number of items I want to buy. In this analogy, I would call the acquisition of an individual item a “transaction”, and each purchase a “session”. The way that each item is acquired is a well-known business transaction, and so is “understood” by the system. How I decide to group the transactions is up to me, the “user” of the system.

(In anticipation: I am not trying to model the well-trodden webstore domain, so please do not interpret my analogy that way, as it is not helpful. I am using the analogy to express my idea, not to debate the proper way to model a webstore. Thanks!)

If you understand the idea of what I am trying to express by this analogy, then it is clear that when doing work, there are numerous possible permutations of how to group individual business transactions. In your approach, you are in effect suggesting that every possible permutation be modeled as a transaction. This would potentially mean a huge number of possible transactions. I do not think that this would be practical.
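
To put rough numbers on "huge": the groupings in the A, B, C analogy are what combinatorics calls set partitions (rather than permutations in the strict sense), and they are counted by the Bell numbers. A quick Python sketch, with hypothetical helper names `partitions` and `bell`:

```python
def partitions(items):
    """Yield every way to split items into non-empty groups (set partitions)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing group in turn...
        for i, group in enumerate(smaller):
            yield smaller[:i] + [[first] + group] + smaller[i + 1:]
        # ...or give it a group of its own.
        yield [[first]] + smaller


def bell(n):
    """Count the set partitions of n items using the Bell triangle."""
    row = [1]
    for _ in range(n):
        nxt = [row[-1]]
        for value in row:
            nxt.append(nxt[-1] + value)
        row = nxt
    return row[0]


groupings = list(partitions(["A", "B", "C"]))
```

Three items give the 5 groupings listed in the analogy (three separate, AB+C, AC+B, BC+A, and all together). Ten items already give 115975, which is the growth that makes modeling each grouping as its own transaction impractical.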

Where to draw the boundaries of a transaction is part of the ongoing debates relating to domain design. We don’t want to try to model reality, we just want to make the model “good enough” so that it is practical: not too costly to implement, and as useful as it ought to be. I would not want to even try to model all of these permutations.

Let the user of the system decide what their intent is, based on a composition of agreed, modeled, and implemented transactions. Group this intent into a session. The system will not know the difference.

> I think, if you feel the need to group different events together in one session or transaction if you will, then that is a smell that you modelled your events/aggregates wrongly.

For the reasons I mentioned above, I disagree. It is not always practical or desirable to try to model every permutation in the universe, just because it is ideally correct. I don’t think that trying to make a domain “pure” is the right objective. The objective should be “communication and mutual understanding” in a way that is practical (and thus achievable).

Modeling a given permutation because a user *may* work that way, or needs to perform that exact permutation exactly once, that sounds more like a design smell to me. :-)

Transactions are the common denominator. Sessions are decided purely by the user.

> The art is to design your events such that every single event stands for one complete transaction.

I agree! That is exactly the definition of a "transaction". I am talking about something a little different, though, that I am calling a "session". (Perhaps there is a better word?)


> Or you may think of using the saga pattern when you think the transaction really needs to cross aggregates.

I believe that a saga is only necessary to coordinate a long-running transaction. That is not quite what I am trying to express here.


> Not saying that is an easy job. But I think what will help you is what you learned when you were working with git. Maybe you can start with basing the name of the events on what you would put in the commit text.

You mean like “Updated Dealings with Customer A, including adding an item to the offer, adding a new member to the purchasing team, and added a discount”? Yuck. My session grouped together several transactions that were very specific to my dealings with this particular customer. I think it is safe to assume that each grouping is custom and unique. There may indeed be patterns, but in that case the patterns could be 

Perhaps there's an idea... ad-hoc transaction? I.e., a transaction that is not modeled by the business, but is performed by the user. Not sure this goes in the right direction, though...
You may be right. Incorporating git into the system could be an interesting approach.


Now, let's see if this list is still alive. 😀

Rickard Öberg

Oct 6, 2022, 9:52:49 PM
to ddd...@googlegroups.com
Still here!

So, I’m currently working on a new Java library, Xorcery, which includes (among many many other things) a DDD/CQRS/ES part. The model I am using for commands and events is Command->List of events, so what gets committed for each command is a list of events. If the transactional boundary is the equivalent of a repository (i.e. “large”) then that would correspond, I think, to what you are asking for.
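
In Python terms, the Command->List of events shape might look like the sketch below. This is not Xorcery's API; `AddMember`, `handle`, and `EventLog` are hypothetical names used only to show one command producing several events that are committed together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddMember:  # hypothetical command
    team: str
    member: str


def handle(command, state):
    """Turn one command into zero or more events, without mutating state."""
    if command.member in state.get(command.team, ()):
        return []  # nothing to do, so nothing to commit
    events = []
    if command.team not in state:
        events.append(("TeamCreated", command.team))
    events.append(("MemberAdded", command.team, command.member))
    return events


class EventLog:
    def __init__(self):
        self.commits = []  # each entry is the full event list of one command

    def append(self, events):
        if events:  # all of a command's events land together, or none do
            self.commits.append(tuple(events))

    def stream(self):
        """Flatten the commits into the plain event stream a projection would read."""
        return [event for commit in self.commits for event in commit]


log = EventLog()
log.append(handle(AddMember("purchasing", "dana"), state={}))
log.append(handle(AddMember("purchasing", "dana"), state={"purchasing": {"dana"}}))
```

The commit boundary is kept in `commits`, while `stream()` shows that a reader which does not care about boundaries can ignore them.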


You’ll want to look at the domain events, eventstore, and neo4j projections, to get a sense of what’s going on related to your question.

This is also WIP, but we’re using it for a customer project, so actively worked on.

/Rickard

Sent from my iPad


David Leangen

Oct 9, 2022, 2:13:44 AM
to ddd...@googlegroups.com
Hey Rickard!

Been a long time. Thanks for the message.

The framework looks interesting, though with the sparse documentation I had a bit of trouble wrapping my head around it. The examples look really declarative and minimalistic, which is pretty neat.

How do you pronounce “xorcery” anyway?


Cheers,
=David

Rickard Öberg

Oct 9, 2022, 4:11:01 AM
to ddd...@googlegroups.com
Hey David!

So, this is still in “make it work” territory (i.e. first phase), so things are changed around constantly. I’m updating it all for the JPMS system now, so expect things to move around quite a bit. Not suitable for anyone else to use, other than as possibly inspiration for your own experiments. Hence the lack of docs, tests, etc. It’s a preview, at best.

Pronounced as “sorcery”, but with a light “ks” in the beginning.

Hope you’re well!

cheers, Rickard

Sent from my iPad


Alexey Raga

Oct 10, 2022, 6:12:06 AM
to DDD/CQRS
I _believe_ something like that was done by Jet when they used Azure CosmosDB as an event store. If I understood correctly, their Event Store is a series of CosmosDB documents, every "write" for them is strictly one CosmosDB document, and each document may contain 1 or more events.

Maybe this echoes your Git analogy: a "commit" is a document that contains 1+ "changes" as events?

I do not know, however, if they were doing it for perf/optimisation reasons or for "logical" reasons like your "session/transaction" example. I _think_ that it was mostly optimisation, though. But it seems that it can be done for whatever reason.

P.S. I haven't really looked at their code, but it is open sourced, so...