Would good CQRS middleware need a pub-sub system to and from the datastore?


Gordon Hutchison

Feb 19, 2019, 2:24:07 AM2/19/19
to Eclipse MicroProfile

My motive in asking this question is to help answer the more abstract question:

   Do we need a MicroProfile Reactive Datastore spec?

The term 'Reactive' came from the 'Reactive Extensions' for .NET, where it was fundamentally a pub-sub interface; back-pressure and Java generics (allowing functional stream composition) were layered on top.

CQRS

In a CQRS system we might have streams of events that represent commands arriving
from a remote system on a Kafka topic, or state-change requests from a CRUD application's backend-for-frontend, for example.

These will change the local state stored in the datastore.

We can also serve query requests, often with pre-canned and maintained 'views'
that might fit well with individual queries. (cf https://martinfowler.com/bliki/ReportingDatabase.html )

In designing how the services of such a system hang together, one needs to be able to answer:

   When commands change the state, which views (reporting databases) need to get updated?

   When commands change the state, which Kafka events/topics need to get posted?

Is knowing these and actioning them the responsibility of each application 'command' microservice?

The obvious answer is no: that would be a crazy loss of encapsulation in a microservice architecture.

So that would currently leave us with having to write a "backend-for-backend" above the middleware
but below any microservice - a sort of cross-service data monolith.

Doesn't it make sense to push on from tailable cursors and database triggers to allow the database to be the source of events - both towards CQRS query-supporting views and to generate application (Kafka) events?
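As a rough illustration of that idea (the schema and names here are invented, and SQLite stands in for a real datastore), a database trigger can materialize every row change into an event table that a consumer then 'tails' by sequence number:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
    -- Event table the trigger writes to; consumers tail it by sequence number.
    CREATE TABLE change_events (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        account_id INTEGER,
        old_balance INTEGER,
        new_balance INTEGER
    );
    -- Database trigger: every state change also produces an event row,
    -- making the datastore itself the source of events.
    CREATE TRIGGER account_updated AFTER UPDATE ON account
    BEGIN
        INSERT INTO change_events (account_id, old_balance, new_balance)
        VALUES (NEW.id, OLD.balance, NEW.balance);
    END;
""")

conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")
conn.execute("UPDATE account SET balance = 150 WHERE id = 1")
conn.commit()

# A consumer tails the event table from its last-seen sequence number.
events = conn.execute(
    "SELECT seq, account_id, old_balance, new_balance FROM change_events"
).fetchall()
print(events)  # [(1, 1, 100, 150)]
```

The event here is purely a state diff - which, as discussed below, is exactly its weakness.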

In order to help with CQRS, we either need to build an ecosystem of datastores (via a shared SPI)
that can be the source of such subscribable events, or explore the more complex task of trying to write
event-based 'backend-for-backend' CQRS hubs without the support of datastore callbacks/events.

I believe that doing the latter is not best for customers, as it would have to get 'in between' applications and datastores for all updates. Whereas if we push the ability to generate events into the datastore, this allows for
heterogeneous systems and thus more flexible evolution and competition between MicroProfile/WebFlux/Helidon/Jakarta, which will lead to better systems for users in the end.

The best way I know to try to cultivate a datastore SPI that does what CQRS ideally needs
(subscribable events/triggers/callbacks etc.) is to create a MicroProfile Reactive Datastore
specification and to use it as a means to work with the R2DBC, ADBA, Mongo etc. SPIs.
There are already forward-thinking people doing good work in this area, for example here,
but this is not yet possible in a cross-vendor form (currently Pivotal-Mongo), which restricts
user choice.

Pushing subscribable events into R2DBC (or ADBA etc.), then having MicroProfile (and others)
build streams on top of that, and CQRS frameworks (named aggregates etc.)
on top of that, seems like a sensible plan to me.

Gordon

PS: I realise that I could have appended this onto the thread here
but I felt that a CQRS topic might be a clearer place to show the real customer
value.

James Roper

Feb 19, 2019, 4:45:11 AM2/19/19
to MicroProfile
On Tue, 19 Feb 2019 at 18:24, Gordon Hutchison <gordon.h...@gmail.com> wrote:

My motive in asking this question is to help answer the more abstract question:

   Do we need a MicroProfile Reactive Datastore spec?

The term 'Reactive' came from the 'Reactive Extensions' for .NET, where it was fundamentally a pub-sub interface; back-pressure and Java generics (allowing functional stream composition) were layered on top.

CQRS

In a CQRS system we might have streams of events that represent commands arriving
from a remote system on a Kafka topic, or state-change requests from a CRUD application's backend-for-frontend, for example.

These will change the local state stored in the datastore.

We can also serve query requests, often with pre-canned and maintained 'views'
that might fit well with individual queries. (cf https://martinfowler.com/bliki/ReportingDatabase.html )

In designing how the services of such a system hang together, one needs to be able to answer:

   When commands change the state, which views (reporting databases) need to get updated?

   When commands change the state, which Kafka events/topics need to get posted?

Is knowing these and actioning them the responsibility of each application 'command' microservice? 

The obvious answer is no: that would be a crazy loss of encapsulation in a microservice architecture.

For the first - which views need to be updated - no, the "command" microservice shouldn't know. But for the second: if you consider the interface that a microservice offers to be the union of its REST API and the events it publishes, then yes, the command microservice does know which Kafka events/topics need to get published, since that's part of the interface that it offers and is in control of.
 
So that would currently leave us with having to write a "backend-for-backend" above the middleware
but below any microservice - a sort of cross-service data monolith.

Doesn't it make sense to push on from tailable cursors and database triggers to allow the database to be the source of events - both towards CQRS query-supporting views and to generate application (Kafka) events?

It can make sense, and in fact tailable cursors and triggers are probably the most robust way of materializing events from updates to a CRUD data store (though they are not needed for an event-sourced data store). That said, tailable cursors and triggers have a major drawback: you lose the intent of the event. For example, consider a state machine where a transition from one state to another could be triggered by multiple different events. How, by inspecting the state change (given to you by a database trigger or tailable cursor), do you know which event triggered it, and therefore which event to publish to consuming services? You don't.

Now in many cases there's only one event that can cause a particular transition, but even then, you still have to write code that manually reverse engineers events from state changes, and this can be an error-prone and tedious process. So, at Lightbend, we don't recommend this approach; we recommend explicitly appending events to an event table, as part of the same transaction as the CRUD update, as this is far simpler to implement and far easier to get right. At the end of the day you're going to have to create these events anyway, so you may as well do it at the point in the code where you know exactly what the event is - because you're handling the command - rather than at the opaque end. The downside to this approach, of course, is that you need to be disciplined about never doing an update to your CRUD schema without also appending a corresponding event.
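That recommendation - append the event in the same transaction as the CRUD update - can be sketched roughly as follows. This is a minimal illustration with SQLite, not Lightbend's actual implementation; the table, command, and event names are all invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE events "
             "(seq INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, account_id INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()

def withdraw(conn, account_id, amount):
    """Handle the command: update state AND append the event in one transaction.
    At the command-handling site we still know the intent ('MoneyWithdrawn'),
    so there is no reverse engineering of events from state diffs."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                     (amount, account_id))
        conn.execute("INSERT INTO events (name, account_id) VALUES (?, ?)",
                     ("MoneyWithdrawn", account_id))
        conn.commit()      # state change and event become visible together...
    except Exception:
        conn.rollback()    # ...or neither does
        raise

withdraw(conn, 1, 30)
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(balance)  # 70
```

A separate process can then read the `events` table and publish to Kafka, knowing every row it sees was committed alongside its state change.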
 
In order to help with CQRS, we either need to build an ecosystem of datastores (via a shared SPI)
that can be the source of such subscribable events, or explore the more complex task of trying to write
event-based 'backend-for-backend' CQRS hubs without the support of datastore callbacks/events.

I believe that doing the latter is not best for customers, as it would have to get 'in between' applications and datastores for all updates. Whereas if we push the ability to generate events into the datastore, this allows for
heterogeneous systems and thus more flexible evolution and competition between MicroProfile/WebFlux/Helidon/Jakarta, which will lead to better systems for users in the end.

The best way I know to try to cultivate a datastore SPI that does what CQRS ideally needs
(subscribable events/triggers/callbacks etc.) is to create a MicroProfile Reactive Datastore
specification and to use it as a means to work with the R2DBC, ADBA, Mongo etc. SPIs.
There are already forward-thinking people doing good work in this area, for example here,
but this is not yet possible in a cross-vendor form (currently Pivotal-Mongo), which restricts
user choice.

Pushing subscribable events into R2DBC (or ADBA etc.), then having MicroProfile (and others)
build streams on top of that, and CQRS frameworks (named aggregates etc.)
on top of that, seems like a sensible plan to me.

Gordon

PS: I realise that I could have appended this onto the thread here
but I felt that a CQRS topic might be a clearer place to show the real customer
value.



--
James Roper
Architect, Office of the CTO, Lightbend, Inc.
@jroper

Gordon Hutchison

Feb 20, 2019, 10:13:59 AM2/20/19
to Eclipse MicroProfile
Hi James,

On reading your note, here is what came up for me:

If I assume that one of the features of a CQRS system is that it can support 'Queries'
whose performance is improved by maintaining Reporting Databases
that back those queries and are tailored to them,

and as you say:

"For the first, which views need to be updated, no, the "command" microservice shouldn't know"

Then, for a traditional relational DB, it is 'normal' to define a view using SQL, and as views are merely
functions over what is stored rather than additional copies, having multiple views of the same
data does not introduce any new issues. However, it also does not give us the performance
boost we hope for. What we are thinking about is doing the filtering/joining/arranging up front,
in an eager manner, so that when the query comes in it runs quickly.
To get that we would need some sort of
maintain-view-as-a-cache-table option in the datastore.

We might also want to make these read-only views distributed/shared (via Events),
and maintained and scaled 'locally' to the query microservices.

If that maintain-view-as-a-cache-table option did not exist, then we are down to
user code maintaining that Reporting Database cache table for the Query.
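That user code is essentially an event-driven projection. A minimal sketch, where the event shapes and the 'cache table' (here just a dict standing in for a real table) are hypothetical:

```python
# Event-driven maintenance of a 'Reporting Database' cache:
# the query-side structure is updated eagerly as events arrive,
# so the eventual query is a cheap lookup.
order_totals = {}  # cache table: customer -> running order total

def project(event):
    """Apply one event to the query-side view."""
    if event["type"] == "OrderPlaced":
        order_totals[event["customer"]] = (
            order_totals.get(event["customer"], 0) + event["amount"])
    elif event["type"] == "OrderCancelled":
        order_totals[event["customer"]] -= event["amount"]

for e in [{"type": "OrderPlaced", "customer": "acme", "amount": 40},
          {"type": "OrderPlaced", "customer": "acme", "amount": 10},
          {"type": "OrderCancelled", "customer": "acme", "amount": 10}]:
    project(e)

print(order_totals)  # {'acme': 40}
```

Note that `project` encodes the view definition in code - which is exactly the point made further down: for event-sourced views, this projection IS the operational view definition.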

When does that code get run?
...when 'poked' by an Event.

There is no point in the event just saying 'something has changed', so the Event has to be
particular to the row/tuple that has changed.

There are choices for what domain the Event is expressed in, i.e. how 'raw' it is. It could be at the level of:

1. The Command service's triggering API point of view [this is the command + parameters that occurred]
2. The Command internally generating a data model event [so not necessarily Command-specific, but still in the user's app data model domain of entities rather than raw tables]
3. The datastore table changes [so particular raw 'normalised' values have changed]
4. Something close to the tuple that is present in the view that is consuming the event

Notes:

Level 1. If the Command was event-driven, level 1 is easily available, but consuming it would mean duplicating the
early logic of the command, so it seems like a bad option.

Level 2. This sounds like me rephrasing what you were saying - in that it could be considered
part of the Command service's interface that it emits these events, based on the Command
logic's outcome/path.

Level 3. How would this work if a command updated multiple elements (columns or rows)?
It would only be sensible to generate these in 'lumps' - perhaps demarcated by the end of
command event processing or some micro-txn. Views would have to subscribe to these.

Level 4. This is similar to level 3, but the system does the subscription based on the submitted
view definitions. This is the Event equivalent of a ReportingDatabase, as there could be more than one
view-based synthetic event, each 'sent' to remote event-sourced view databases.

On pondering this, it appears to me that only levels 2 and 4 can be argued for with any
rigour - but which is best?

Thinking about what you said, mostly level 2 - but in this context I think it is CQRS itself that actually
helps solve the problem:

Perhaps there are two types of event:

those that are used to feed further sub-commands and active services
   Command-destined Events
and those that are used to feed passive state distribution for query consumption
   Query-destined Events (i.e. feeding event-sourced ReportingDatabases)

For the Events which trigger actions and further commands, what you said about
losing "the intent of the event ...you still have to write code that manually reverse engineers events from state changes, and this can be an error-prone and tedious process"
makes sense - so it seems we want to keep the flow of such events in terms
of the application domain data model's verbs
(which may be close to the command service's triggering interface for simple commands).

But for query-supporting view maintenance - if this is all we had, we would need subscribing
logic to catch these (normalised/vanilla) data-model verb events and translate
them to the particular view - but this code would add little value. It might have to
catch multiple events, and it has to track the logic of any view definition (for event-sourced
views it IS the operational view definition, expressed in code). The generation of these types of Events,
and the processing of them, could perhaps be automated by the middleware, and this would
be CQRS value add.

So to net this out :-)

I am basically persuaded by what you said, but
the discussion has generated, for me, something I have been looking out for - a
potential concrete MicroProfile CQRS feature:

Maintenance of ReportingDatabases (query-supporting views):
a specific class of Event that allows for their distributed (event-sourced)
maintenance, with the generation and consumption
of those events automated as value add
via the MicroProfile Reactive Messaging (Kafka etc.)
spec. Triggering generation of these at the correct point,
without it being done by user code, is an interesting problem -
but one that would indeed be easier to solve if I could have a
tailable cursor/trigger as a stream source.

I will go and have a read on what other systems (Axon etc) do in this
space.

Thanks for your thoughtful and persuasive reply, James.
In fact I read here/here after typing most of the above, and I
think that I have arrived, eventually, at a similar conclusion.

I really like the idea of the generation of Events
occurring via a txn-isolated write to the database
(in an Event log table), and thus being subject to
commit/rollback in the same txn as the data update,
before being cascaded.
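That commit-then-cascade step can be sketched as a relay that polls the committed event log past a checkpoint. This is a minimal single-process illustration with invented names; a real relay would persist its offset and publish to a broker such as Kafka:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_log "
             "(seq INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.commit()

published = []   # stands in for a Kafka topic
last_seen = 0    # the relay's checkpoint in the log

def relay(conn):
    """Cascade only events that are durably committed; events written in a
    rolled-back transaction never appear in the log, so they never cascade."""
    global last_seen
    for seq, name in conn.execute(
            "SELECT seq, name FROM event_log WHERE seq > ? ORDER BY seq",
            (last_seen,)):
        published.append(name)
        last_seen = seq

conn.execute("INSERT INTO event_log (name) VALUES ('AccountOpened')")
conn.commit()
relay(conn)

conn.execute("INSERT INTO event_log (name) VALUES ('NeverHappened')")
conn.rollback()   # the data update failed, so its event vanishes with it
relay(conn)

print(published)  # ['AccountOpened']
```

This matches the distinction below: events that must only go out on commit are naturally served by this relay, while events wanted inside the local txn would have to be sent by the command handler itself.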

Some generated events (sub-commands) we want to send out
inside any local txn, and some generated events we only want to
send out if the local work commits/ends unexceptionally (cf. this).

I wonder if these two types of Events correlate with Command-feeding-Events
and Query-feeding-Events?


Gordon.

Gordon Hutchison

Feb 22, 2019, 6:33:29 PM2/22/19
to Eclipse MicroProfile
My question at the end is covered by 'command event separation' here: https://eventstore.org/docs/event-sourcing-basics/ ...Greg Young.

I have a lot to catch up with.

Gordon