Would good CQRS middleware need a pub-sub system to and from the datastore?


Gordon Hutchison

Feb 19, 2019, 2:24:07 AM2/19/19
to Eclipse MicroProfile

My motive in asking this question is to help answer the more abstract question:

   Do we need a MicroProfile Reactive Datastore spec?

The term 'Reactive' came from the 'Reactive Extensions' for .NET, where it was fundamentally a pub-sub interface; back-pressure and Java generics (allowing functional stream composition) were layered on top.

CQRS

In a CQRS system we might have streams of events that represent commands arriving
from a remote system on a Kafka topic, or state-change requests from a CRUD application's backend-for-frontend, for example.

These will change the local state stored in the datastore.

We can also serve query requests, often with pre-canned and maintained 'views'
that might fit well with individual queries. (cf https://martinfowler.com/bliki/ReportingDatabase.html )

In designing how the services of such a system hang together, one needs to be able to answer:

   When commands change the state, which views (reporting databases) need to get updated?

   When commands change the state, which Kafka events/topics need to get posted?

Is knowing these and actioning them the responsibility of each application 'command' microservice?

The obvious answer is no: that would be a crazy loss of encapsulation in a microservice architecture.

So that would currently leave us with having to write a "backend-for-backend" above the middleware
but below any microservice - a sort of cross-service data monolith.

Doesn't it make sense to push on from tailable cursors and database triggers to allow the database to be the source of events - both towards CQRS query-supporting views and to generate application (Kafka) events?
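As a rough illustration of that idea (the schema and names here are invented, and SQLite stands in for a real datastore), a database trigger can materialize every row change into an event table that a consumer then 'tails' by sequence number:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
    -- Event table the trigger writes to; consumers tail it by sequence number.
    CREATE TABLE change_events (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        account_id INTEGER,
        old_balance INTEGER,
        new_balance INTEGER
    );
    -- Database trigger: every state change also produces an event row,
    -- making the datastore itself the source of events.
    CREATE TRIGGER account_updated AFTER UPDATE ON account
    BEGIN
        INSERT INTO change_events (account_id, old_balance, new_balance)
        VALUES (NEW.id, OLD.balance, NEW.balance);
    END;
""")

conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")
conn.execute("UPDATE account SET balance = 150 WHERE id = 1")
conn.commit()

# A consumer tails the event table from its last-seen sequence number.
events = conn.execute(
    "SELECT seq, account_id, old_balance, new_balance FROM change_events"
).fetchall()
print(events)  # [(1, 1, 100, 150)]
```

The event here is purely a state diff - which, as discussed below, is exactly its weakness.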

In order to help with CQRS, we either need to build an ecosystem of datastores (via a shared SPI)
that can be the source of such subscribable events, or explore the more complex task of trying to write
event-based 'backend-for-backend' CQRS hubs without the support of datastore callbacks/events.

I believe that doing the latter is not best for customers, as it would have to get 'in between' applications and datastores for all updates. Whereas if we push the ability to generate events into the datastore, this allows for
heterogeneous systems and thus more flexible evolution and competition between MicroProfile/WebFlux/Helidon/Jakarta, which will lead to better systems for users in the end.

The best way I know to try to cultivate a datastore SPI that does what CQRS ideally needs
(subscribable events/triggers/callbacks etc.) is to create a MicroProfile Reactive Datastore
specification and to use it as a means to work with the R2DBC, ADBA, Mongo etc. SPIs.
There are already forward-thinking people doing good work in this area, for example here,
but this is not yet possible in a cross-vendor form (currently Pivotal-Mongo), which restricts
user choice.

Pushing subscribable events into R2DBC (or ADBA etc.), then having MicroProfile (and others)
build streams on top of that, and CQRS frameworks (named aggregates etc.)
on top of that, seems like a sensible plan to me.

Gordon

PS: I realise that I could have appended this onto the thread here
but I felt that a CQRS topic might be a clearer place to show the real customer
value.

James Roper

Feb 19, 2019, 4:45:11 AM2/19/19
to MicroProfile
On Tue, 19 Feb 2019 at 18:24, Gordon Hutchison <gordon.h...@gmail.com> wrote:

My motive in asking this question is to help answer the more abstract question:

   Do we need a MicroProfile Reactive Datastore spec?

The term 'Reactive' came from the 'Reactive Extensions' for .NET, where it was fundamentally a pub-sub interface; back-pressure and Java generics (allowing functional stream composition) were layered on top.

CQRS

In a CQRS system we might have streams of events that represent commands arriving
from a remote system on a Kafka topic, or state-change requests from a CRUD application's backend-for-frontend, for example.

These will change the local state stored in the datastore.

We can also serve query requests, often with pre-canned and maintained 'views'
that might fit well with individual queries. (cf https://martinfowler.com/bliki/ReportingDatabase.html )

In designing how the services of such a system hang together, one needs to be able to answer:

   When commands change the state, which views (reporting databases) need to get updated?

   When commands change the state, which Kafka events/topics need to get posted?

Is knowing these and actioning them the responsibility of each application 'command' microservice? 

The obvious answer is no: that would be a crazy loss of encapsulation in a microservice architecture.

For the first - which views need to be updated - no, the "command" microservice shouldn't know. But for the second: if you consider the interface that a microservice offers to be the union of its REST API and the events it publishes, then yes, the command microservice does know which Kafka events/topics need to get published, since that's part of the interface that it offers and is in control of.
 
So that would currently leave us with having to write a "backend-for-backend" above the middleware
but below any microservice - a sort of cross-service data monolith.

Doesn't it make sense to push on from tailable cursors and database triggers to allow the database to be the source of events - both towards CQRS query-supporting views and to generate application (Kafka) events?

It can make sense, and in fact tailable cursors and triggers are probably the most robust way of materializing events from updates to a CRUD data store (though they are not needed for an event-sourced data store). That said, tailable cursors and triggers have a major drawback: you lose the intent of the event. For example, consider a state machine where a transition from one state to another could be triggered by multiple different events. How, by inspecting the state change (given to you by a database trigger or tailable cursor), do you know which event triggered it, and therefore which event to publish to consuming services? You don't.

Now in many cases there's only one event that can cause a particular transition, but even then, you still have to write code that manually reverse engineers events from state changes, and this can be an error-prone and tedious process. So, at Lightbend, we don't recommend this approach; we recommend explicitly appending events to an event table, as part of the same transaction as the CRUD update, as this is far simpler to implement and far easier to get right. At the end of the day you're going to have to create these events anyway, so you may as well do it at the point in the code where you know exactly what the event is - because you're handling the command - rather than at the opaque end. The downside to this approach, of course, is that you need to be disciplined about never doing an update to your CRUD schema without also appending a corresponding event.
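That recommendation - append the event in the same transaction as the CRUD update - can be sketched roughly as follows. This is a minimal illustration with SQLite, not Lightbend's actual implementation; the table, command, and event names are all invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE events "
             "(seq INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, account_id INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()

def withdraw(conn, account_id, amount):
    """Handle the command: update state AND append the event in one transaction.
    At the command-handling site we still know the intent ('MoneyWithdrawn'),
    so there is no reverse engineering of events from state diffs."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                     (amount, account_id))
        conn.execute("INSERT INTO events (name, account_id) VALUES (?, ?)",
                     ("MoneyWithdrawn", account_id))
        conn.commit()      # state change and event become visible together...
    except Exception:
        conn.rollback()    # ...or neither does
        raise

withdraw(conn, 1, 30)
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(balance)  # 70
```

A separate process can then read the `events` table and publish to Kafka, knowing every row it sees was committed alongside its state change.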
 
In order to help with CQRS, we either need to build an ecosystem of datastores (via a shared SPI)
that can be the source of such subscribable events, or explore the more complex task of trying to write
event-based 'backend-for-backend' CQRS hubs without the support of datastore callbacks/events.

I believe that doing the latter is not best for customers, as it would have to get 'in between' applications and datastores for all updates. Whereas if we push the ability to generate events into the datastore, this allows for
heterogeneous systems and thus more flexible evolution and competition between MicroProfile/WebFlux/Helidon/Jakarta, which will lead to better systems for users in the end.

The best way I know to try to cultivate a datastore SPI that does what CQRS ideally needs
(subscribable events/triggers/callbacks etc.) is to create a MicroProfile Reactive Datastore
specification and to use it as a means to work with the R2DBC, ADBA, Mongo etc. SPIs.
There are already forward-thinking people doing good work in this area, for example here,
but this is not yet possible in a cross-vendor form (currently Pivotal-Mongo), which restricts
user choice.

Pushing subscribable events into R2DBC (or ADBA etc.), then having MicroProfile (and others)
build streams on top of that, and CQRS frameworks (named aggregates etc.)
on top of that, seems like a sensible plan to me.

Gordon

PS: I realise that I could have appended this onto the thread here
but I felt that a CQRS topic might be a clearer place to show the real customer
value.



--
James Roper
Architect, Office of the CTO, Lightbend, Inc.
@jroper

Gordon Hutchison

Feb 20, 2019, 10:13:59 AM2/20/19
to Eclipse MicroProfile
Hi James,

On reading your note, here is what came up for me:

If I assume that one of the features of a CQRS system is that it can support 'Queries'
whose performance is improved by maintaining Reporting Databases
that back those queries and are tailored to them,

and as you say:

"For the first, which views need to be updated, no, the "command" microservice shouldn't know"

Then, for a traditional relational DB, it is 'normal' to define a view using SQL, and as views are merely
functions over what is stored rather than additional copies, having multiple views of the same
data does not introduce any new issues. However, it also does not give us the performance
boost we hope for. What we are thinking about is doing the filtering/joining/arranging up front,
in an eager manner, so that when the query comes in it runs quickly.
To get that we would need some sort of
maintain-view-as-a-cache-table option in the datastore.

We might also want to make these read-only views distributed/shared (via Events),
and maintained and scaled 'locally' to the query microservices.

If that maintain-view-as-a-cache-table option did not exist, then we are down to
user code maintaining that Reporting Database cache table for the Query.
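That user code is essentially an event-driven projection. A minimal sketch, where the event shapes and the 'cache table' (here just a dict standing in for a real table) are hypothetical:

```python
# Event-driven maintenance of a 'Reporting Database' cache:
# the query-side structure is updated eagerly as events arrive,
# so the eventual query is a cheap lookup.
order_totals = {}  # cache table: customer -> running order total

def project(event):
    """Apply one event to the query-side view."""
    if event["type"] == "OrderPlaced":
        order_totals[event["customer"]] = (
            order_totals.get(event["customer"], 0) + event["amount"])
    elif event["type"] == "OrderCancelled":
        order_totals[event["customer"]] -= event["amount"]

for e in [{"type": "OrderPlaced", "customer": "acme", "amount": 40},
          {"type": "OrderPlaced", "customer": "acme", "amount": 10},
          {"type": "OrderCancelled", "customer": "acme", "amount": 10}]:
    project(e)

print(order_totals)  # {'acme': 40}
```

Note that `project` encodes the view definition in code - which is exactly the point made further down: for event-sourced views, this projection IS the operational view definition.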

When does that code get run?
...when 'poked' by an Event.

There is no point in the event just saying 'something has changed', so the Event has to be
particular to the row/tuple that has changed.

There are choices for what domain the Event is expressed in, i.e. how 'raw' it is. It could be at the level of:

1. The Command service's triggering API point of view [this is the command + parameters that occurred]
2. The Command internally generating a data model event [so not necessarily Command-specific, but still in the user's app data model domain of entities rather than raw tables]
3. The datastore table changes [so particular raw 'normalised' values have changed]
4. Something close to the tuple that is present in the view that is consuming the event

Notes:

Level 1. If the Command was event-driven, level 1 is easily available, but consuming it would mean duplicating the
early logic of the command, so it seems like a bad option.

Level 2. This sounds like me rephrasing what you were saying - in that it could be considered
part of the Command service's interface that it emits these events, based on the Command
logic's outcome/path.

Level 3. How would this work if a command updated multiple elements (columns or rows)?
It would only be sensible to generate these in 'lumps' - perhaps demarcated by the end of
command event processing or some micro-txn. Views would have to subscribe to these.

Level 4. This is similar to level 3, but the system does the subscription based on the submitted
view definitions. This is the Event equivalent of a ReportingDatabase, as there could be more than one
view-based synthetic event, each 'sent' to remote event-sourced view databases.

On pondering this, it appears to me that only levels 2 and 4 can be argued for with any
rigour - but which is best?

Thinking about what you said, mostly level 2 - but in this context I think it is CQRS itself that actually
helps solve the problem:

Perhaps there are two types of event:

those that are used to feed further sub-commands and active services
   Command-destined Events
and those that are used to feed passive state distribution for query consumption
   Query-destined Events (i.e. feeding event-sourced ReportingDatabases)

For the Events which trigger actions and further commands, what you said about
losing "the intent of the event ...you still have to write code that manually reverse engineers events from state changes, and this can be an error-prone and tedious process"
makes sense - so it seems we want to keep the flow of such events in terms
of the application domain data model's verbs
(which may be close to the command service's triggering interface for simple commands).

But for query-supporting view maintenance - if this is all we had, we would need subscribing
logic to catch these (normalised/vanilla) data-model verb events and translate
them to the particular view - but this code would add little value. It might have to
catch multiple events, and it has to track the logic of any view definition (for event-sourced
views it IS the operational view definition, expressed in code). The generation of these types of Events,
and the processing of them, could perhaps be automated by the middleware, and this would
be CQRS value add.

So to net this out :-)

I am basically persuaded by what you said, but
the discussion has generated, for me, something I have been looking out for - a
potential concrete MicroProfile CQRS feature:

Maintenance of ReportingDatabases (query-supporting views):
a specific class of Event that allows for their distributed (event-sourced)
maintenance, with the generation and consumption
of those events automated as value add
via the MicroProfile Reactive Messaging (Kafka etc.)
spec. Triggering generation of these at the correct point,
without it being done by user code, is an interesting problem -
but one that would indeed be easier to solve if I could have a
tailable cursor/trigger as a stream source.

I will go and have a read on what other systems (Axon etc) do in this
space.

Thanks for your thoughtful and persuasive reply, James.
In fact I read here/here after typing most of the above, and I
think that I have arrived, eventually, at a similar conclusion.

I really like the idea of the generation of Events
occurring via a txn-isolated write to the database
(in an Event log table), and thus being subject to
commit/rollback in the same txn as the data update,
before being cascaded.
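That commit-then-cascade step can be sketched as a relay that polls the committed event log past a checkpoint. This is a minimal single-process illustration with invented names; a real relay would persist its offset and publish to a broker such as Kafka:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_log "
             "(seq INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.commit()

published = []   # stands in for a Kafka topic
last_seen = 0    # the relay's checkpoint in the log

def relay(conn):
    """Cascade only events that are durably committed; events written in a
    rolled-back transaction never appear in the log, so they never cascade."""
    global last_seen
    for seq, name in conn.execute(
            "SELECT seq, name FROM event_log WHERE seq > ? ORDER BY seq",
            (last_seen,)):
        published.append(name)
        last_seen = seq

conn.execute("INSERT INTO event_log (name) VALUES ('AccountOpened')")
conn.commit()
relay(conn)

conn.execute("INSERT INTO event_log (name) VALUES ('NeverHappened')")
conn.rollback()   # the data update failed, so its event vanishes with it
relay(conn)

print(published)  # ['AccountOpened']
```

This matches the distinction below: events that must only go out on commit are naturally served by this relay, while events wanted inside the local txn would have to be sent by the command handler itself.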

Some generated events (sub-commands) we want to send out
inside any local txn, and some generated events we only want to
send out if the local work commits/ends unexceptionally (cf. this).

I wonder if these two types of Events correlate with Command-feeding-Events
and Query-feeding-Events?


Gordon.

Gordon Hutchison

Feb 22, 2019, 6:33:29 PM2/22/19
to Eclipse MicroProfile
My question at the end is covered by 'command event separation' here: https://eventstore.org/docs/event-sourcing-basics/ ...Greg Young.

I have a lot to catch up with.

Gordon