Dynamic Schema with CQRS/DDD/Event Store possible?


Patrick Heeney

Mar 26, 2015, 10:54:13 PM
to ddd...@googlegroups.com
I am working on building my first project using as much CQRS/DDD/Event Sourcing as I can. I am building an API-based backend that stores data similar to Firebase (schemaless, key/value I believe). I am struggling with how to implement the domain event aspect, since the schema is dynamic and created via the API. Here is what I am thinking so far:

Event sourcing sounds perfect since I would have a full history of the changes and evolution of not just the schema but the data as well. I am thinking of using Postgres or another relational DB for the event store, for atomic transactions. The issue is that I don't know the context of what the data is, so I can't generate a meaningful DomainEvent. I could do something like {Entity}ChangedEvent and store a JSON patch of the change set. Then when creating a projection I could analyze the change set and handle it like any other event. So a product example might be:

{fields: {productId: integer, name: text, quantity: integer}}
POST /data/product {data: {productId: 1234-1234-1234, name: Test, quantity: 1}}
POST /data/product/1234-1234-1234 {data: {quantity: 2}}
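The generic {Entity}ChangedEvent idea could look roughly like this. A minimal sketch, assuming a dict-based diff; the names (diff, entity_changed_event) are illustrative, not from any framework:

```python
# Sketch of a generic "entity changed" event carrying a change set.

def diff(old: dict, new: dict) -> dict:
    """Return the fields that changed, with old and new values."""
    changes = {}
    for key in set(old) | set(new):
        if old.get(key) != new.get(key):
            changes[key] = {"old": old.get(key), "new": new.get(key)}
    return changes

def entity_changed_event(entity: str, entity_id: str, old: dict, new: dict) -> dict:
    """Wrap the change set in a generic {Entity}ChangedEvent."""
    return {
        "type": f"{entity.capitalize()}ChangedEvent",
        "entity_id": entity_id,
        "changes": diff(old, new),
    }

event = entity_changed_event(
    "product", "1234-1234-1234",
    {"name": "Test", "quantity": 1},
    {"name": "Test", "quantity": 2},
)
# event["changes"] == {"quantity": {"old": 1, "new": 2}}
```

A projection would then inspect `changes` instead of relying on a named event type.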

I could also generate an event like {Entity}ChangedQuantityField, one for each of the changed fields. When replaying these events I could still get the business knowledge about how many times items were removed, had their quantity changed, etc., and all the other benefits of event sourcing. It just seems like it will be more difficult to get the meaning from the data until we know the context.
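Generating one fine-grained event per changed field could be sketched like this (again with hypothetical names, and assuming a change set shaped like the diff example above):

```python
def field_events(entity: str, entity_id: str, changes: dict) -> list:
    """Split a change set into one event per changed field.
    The {Entity}Changed{Field}Field naming is illustrative."""
    return [
        {
            "type": f"{entity.capitalize()}Changed{field.capitalize()}Field",
            "entity_id": entity_id,
            "old": change["old"],
            "new": change["new"],
        }
        for field, change in changes.items()
    ]

events = field_events("product", "1234-1234-1234",
                      {"quantity": {"old": 1, "new": 2}})
# events[0]["type"] == "ProductChangedQuantityField"
```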

The other issue is that the admin interface is primarily CRUD based, since we don't have any context. We know the schema, so we can perform basic validation and even some basic business rules, but the users would be editing fields on a screen and submitting them. Without a task-based UI, and with a dynamic schema, I fear the only events I can have are CRUD-based ones, and I am not sure how that would impact the event sourcing. It seems like I would have to replay through more events because they would all be concentrated in a few event types per entity.

I don't know how much DDD I could really implement this way, since the schema is dynamic and the business rules are somewhat dynamically generated as well. So there would not be any $entity->setQuantity(2), as it would be API based. They may not have a quantity field, or they may have one for each warehouse.

To implement more of the CQRS side, the API would dispatch the commands and the handlers would take care of the event sourcing aspect. I am then thinking I would have something that listens to the event store and generates the models into a NoSQL database like MongoDB. So the product example above would pull in any related fields and store it all in a single document along with other relevant view data. This would make it quick and trivial to generate the read side, as long as I can keep them in sync.
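The listener that folds events into read-model documents could be sketched like this. An in-memory dict stands in for MongoDB here, and the event shape is the change-set format from the earlier product example; everything is illustrative:

```python
class Projector:
    """Folds change events into read-model documents.
    A plain dict stands in for a MongoDB collection."""

    def __init__(self):
        self.documents = {}

    def handle(self, event: dict) -> None:
        # Upsert the document and apply each field's new value.
        doc = self.documents.setdefault(event["entity_id"], {})
        for field, change in event["changes"].items():
            doc[field] = change["new"]

projector = Projector()
projector.handle({"entity_id": "1234-1234-1234",
                  "changes": {"quantity": {"old": 1, "new": 2}}})
# projector.documents["1234-1234-1234"] == {"quantity": 2}
```

In a real setup, `handle` would be subscribed to the event store and would issue an upsert against the document database instead.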

I have yet to figure out the best way to handle the eventual consistency from the admin GUI, since users of traditional CRUD-based apps are used to seeing changes right away. Maybe something like Gmail where you fire and forget, maybe require the user to refresh, or set up a timer to fetch any new values in the list view. I have seen some suggestions to fake it, which could be done easily enough with a ReactJS admin where you just need to concat the new item to the store and update the state.

Does anyone have any suggestions for improvements, concerns, things to look out for?

Patrick Heeney

Mar 27, 2015, 3:21:41 PM
to ddd...@googlegroups.com
I keep looking at this thinking I am maybe over-engineering it. I am trying to see if this can all be simplified, since I want a history of changes and to be able to analyze that history to determine the why.

I originally thought I would just store the data in a NoSQL database and add validFrom/validTo dates for every change. This way I could easily pull up the latest version, but also have the history of that object. The full version of the JSON object would be stored for each version. This is essentially building an audit log, but for that specific entity. Intent can still be gathered by analyzing the diff/change set between versions.
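The validFrom/validTo versioning could be sketched like this. A minimal in-memory version, assuming each save stores a full snapshot and closes the previous version's validity window:

```python
from datetime import datetime, timezone

def save_version(versions: list, data: dict) -> None:
    """Append a full snapshot; close the previous version's window."""
    now = datetime.now(timezone.utc)
    if versions:
        versions[-1]["validTo"] = now
    versions.append({"data": dict(data), "validFrom": now, "validTo": None})

def current(versions: list) -> dict:
    """The latest version is the one with validTo still open."""
    return versions[-1]["data"]

history = []
save_version(history, {"quantity": 1})
save_version(history, {"quantity": 2})
# current(history) == {"quantity": 2}; history[0] now has a validTo set
```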

However, in other areas of the API we have the ability to "checkout" some of these entities. These would fall more in line with DDD, being able to call Entity.setAddress(), Entity.setQuantity(1), Entity.setPaymentMethod(), Entity.Pay(), which could trigger the appropriate domain events, and we can no longer just store a "version" of the entity; we need the events.
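The checkout-style part could be modeled as an aggregate whose methods enforce rules and record domain events. A sketch under assumed names (Order, QuantitySet, Paid are all illustrative):

```python
class Order:
    """Aggregate whose methods record domain events as they run."""

    def __init__(self):
        self.events = []
        self.paid = False
        self.quantity = 0

    def set_quantity(self, qty: int) -> None:
        if self.paid:
            raise ValueError("cannot change a paid order")
        self.quantity = qty
        self.events.append({"type": "QuantitySet", "quantity": qty})

    def pay(self) -> None:
        self.paid = True
        self.events.append({"type": "Paid"})

order = Order()
order.set_quantity(2)
order.pay()
# order.events now holds a QuantitySet event followed by a Paid event
```

After handling a command, the recorded events would be appended to the event store rather than persisting the entity's state directly.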

I keep coming back to needing both state and events, so it seems my choices are: a synchronous transaction that writes both, pushing the events to a secondary store, or building the state from the events, i.e. event sourcing.
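The third option, building state from events, is essentially a left fold over the event stream. A sketch, assuming the change-set event shape from the product example earlier:

```python
from functools import reduce

def apply(state: dict, event: dict) -> dict:
    """Apply one change event to the current state, returning new state."""
    new_state = dict(state)
    for field, change in event["changes"].items():
        new_state[field] = change["new"]
    return new_state

def replay(events: list) -> dict:
    """Rebuild current state by folding over the full event history."""
    return reduce(apply, events, {})

history = [
    {"changes": {"quantity": {"old": None, "new": 1}}},
    {"changes": {"quantity": {"old": 1, "new": 2}}},
]
# replay(history) == {"quantity": 2}
```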

It seems clear that event sourcing is the best fit in this case. My aversion to the idea just stems from not having a clear picture of how all the pieces fit, and a lack of real-world examples covering all of them. I still don't know if document-based data like CMS pages is an ideal candidate for event sourcing, but if the rest of the system needs it, I would rather focus on applying it to everything.

Thorsten Krüger

Mar 30, 2015, 2:04:02 AM
to ddd...@googlegroups.com
Interesting, we're currently doing something quite like that. It's basically a key-value store that attaches values to fields organized in an open schema, using namespaces to organize access. What made me wonder whether Event Sourcing and DDD fit was the size of the data: some hundred million instances of essentially only two to three aggregates, which are updated often and accumulate history fast. Plus, it is a backend service, not designed to have a UI attached to it.

Maybe you can still pick up something useful from a short report on what it feels like, and we can ping-pong some ideas on the cruddiness of things.

If I look at it now, the CQRS part of it still makes me happy. The flexibility we get from the decoupling has already helped on several occasions with changing requirements, as people thought of downstream use cases for the store. I still struggle here and there; for example, the possibilities and different patterns of handling data with Cassandra allow you to implement CQRS differently than what my reading made me think is "normal". Still, knowing that I can whip up new read-model projections when new use cases pop up makes me sleep better. At least as soon as I find a decent way to replay the events from a select * in Cassandra while stuff is still getting inserted...

Being CRUD oriented might be the biggest hindrance to also adding DDD to the mix. We started with more fine-grained events, like AttributeAdded (declarative, open schema) and AttributeValueSet. However, with all the folks around us being used to CRUD and data-driven applications, this is now transforming into more Create- and Update-like events, with the finer-grained changes embedded. This seems to fit the mental model of our users and data scientists nicely, so I guess it's a good thing.
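Embedding the fine-grained changes inside a coarse Update event could look roughly like this; the event names here are illustrative, not our actual schema:

```python
def update_event(entity_id: str, fine_grained: list) -> dict:
    """Coarse Update event carrying finer-grained attribute changes inside."""
    return {"type": "Updated", "entity_id": entity_id, "changes": fine_grained}

evt = update_event("abc-123", [
    {"type": "AttributeAdded", "name": "color"},
    {"type": "AttributeValueSet", "name": "color", "value": "red"},
])
# evt["type"] == "Updated", with two embedded attribute-level changes
```

Consumers that only care about CRUD see one Updated event; those that want detail can still unpack `changes`.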

Just having an automated audit log and history built in was an eye opener for some, and you can really see them delving into the new options there. Sometimes, the biggest benefit for others comes in parts where you didn't expect it.

So in the end, CQRS seems very useful, giving scalability and flexibility of implementation. The DDD part of it is mainly a source of events, and currently has almost no logic in it. Still, it is soothing to know we have a place where we can add logic, and we can rip it out if the complexity never arrives. My gut feeling is that if I had something like "checkout" coming, which smells like a more involved process than updating a couple of fields, I would either think of putting it in the domain, or in a downstream service if you have that option. Unfortunately, I am just beginning to get experience here.

ES, and the possibilities it gives you, I wouldn't want to miss. For me, being event-driven is one of the best approaches to linking services together. More interesting is the tension between a do-it-yourself implementation and just piggybacking on something like Spark, if it's only about CRUD feeding into read models.

I wouldn't worry too much about having to sell the eventual consistency in the UI. Often, people are more used to it in their daily work than they themselves realize, and if you have a way of detecting concurrency issues, reporting them a second or two later rather than right away is fine, too. Having a conversation about this with some of the potential users should give you a good impression of how much trouble they will have accepting it.

Sorry for sidestepping your more direct questions ;-)