Finders - what are they for?


Marcin Kuthan

Sep 2, 2011, 3:51:01 PM
to ddd-cqr...@googlegroups.com
I have never used CQRS in practice, so excuse my question.

It looks like finders are responsible for:

1. retrieving data (domain model)
2. converting it (to DTOs)
3. providing the data to the client

A little bit too much for a single class, IMHO. Let me review the existing patterns:

1. repository for retrieving data (Evans)
2. assembler for converting to and from DTOs (Fowler)
3. application service for digesting input/output (Evans)

Why are the finders implemented as a single class? There is a well-defined pattern for each of the finder's duties.

client 
     ->        app_service
                 ->              repository
                 <- domain object
                 ->              assembler
                 <- dto
       <- dto

The repository seems to be optional if we do not use the domain model as an intermediate data format. Finders seem to be application services, and the assemblers are missing.
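The classic flow sketched above might look like this (a minimal illustration; all class names here are mine, invented only to show the app_service -> repository -> assembler round trip):

```python
from dataclasses import dataclass

@dataclass
class Customer:                 # domain object
    id: int
    first_name: str
    last_name: str

@dataclass
class CustomerDto:              # flat view handed to the client
    id: int
    display_name: str

class CustomerRepository:       # Evans-style repository: retrieves domain objects
    def __init__(self, store):
        self._store = store

    def find(self, customer_id):
        return self._store[customer_id]

class CustomerAssembler:        # Fowler-style assembler: domain object -> DTO
    @staticmethod
    def to_dto(customer):
        return CustomerDto(customer.id,
                           f"{customer.first_name} {customer.last_name}")

class CustomerAppService:       # application service: digests input/output
    def __init__(self, repository):
        self._repository = repository

    def get_customer(self, customer_id):
        customer = self._repository.find(customer_id)   # <- domain object
        return CustomerAssembler.to_dto(customer)       # <- dto

store = {1: Customer(1, "Jan", "Kowalski")}
service = CustomerAppService(CustomerRepository(store))
dto = service.get_customer(1)
```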

Sławomir Sobótka

Sep 2, 2011, 4:20:34 PM
to ddd-cqr...@googlegroups.com
In CqRS you have two stacks of layers:
- a command stack for performing domain operations,
- a read stack for reading data for presentation.

Now, in non-CRUD applications there is a significant mismatch. We try to optimize the model for:
- being supple: third normal form at the DB level, but that sucks while reading
in case of cross-table presentation (joins); the upper level (object
model) is good for business operations but sucks as a presentation
model (unnecessary data, irrelevant for the screens);
- reading fast for reporting-like views: first normal form rocks, no joins.

These are just a few very, very basic motivators for CqRS.


So finders are responsible for the read stack. In general the read stack
returns DTOs because:
- they fit the screens' needs,
- they encapsulate the domain's "shape" - extremely important when remote apps
join our playground.

In general finders should be thin. They just read data and send it to
the caller. A trivial responsibility.
In our sample we tried to present many forms of separation between both stacks.

The strongest form relies on two separate DB models, optimized for different
purposes. These do not even have to be relational engines!
The command stack may rely on event sourcing and the read stack on some
ultra-fast NoSQL store.
Or you can have two models where the read model is flattened to first normal
form by materialized views, or updated by domain/application events.
Or you can just query with plain SQL and pack the results into DTOs. That is
also illustrated, and that's what the mentioned finders do.
WARNING: when reading many rows you should not load entities and repack
them into DTOs. That's pure evil; besides... the db-nazis will kill you for it
sooner or later :)
I know this is presented in many so-called enterprise patterns, but
it's hard to imagine something more silly :P

But when reading a single DTO... some lazy developer did exactly what you
ask about: loaded the entity and repacked it into a DTO.
This is a very, very basic form of "separation" - mental rather than
technical CqRS.


So just to summarize:
The read stack is thin and simple - one layer, with no need for repacking,
assembling or layering. Just read data, and do it as fast as possible.
A good playground for apprentices.
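A thin finder of this kind might be sketched as follows: one plain SQL query, rows repacked straight into flat DTO-like structures, no entities in between. The schema is made up for the sketch (the sample project itself is Java; Python is used here only to illustrate the idea):

```python
import sqlite3

# Seed a flat, read-optimized table (made-up schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER, name TEXT, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "Hammer", 9.99), (2, "Nails", 2.5)])

class ProductFinder:
    """Thin finder: one SQL query, rows become flat DTO-like dicts."""
    def __init__(self, conn):
        self._conn = conn

    def find_all(self):
        rows = self._conn.execute(
            "SELECT id, name, price FROM product ORDER BY id")
        # no entities, no assembler - one row becomes one DTO-like dict
        return [{"id": r[0], "name": r[1], "price": r[2]} for r in rows]

dtos = ProductFinder(conn).find_all()
```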

Sławomir Sobótka

Sep 2, 2011, 4:23:05 PM
to ddd-cqr...@googlegroups.com
btw:
I think it should be explained more straightforwardly in the wiki.

Marcin Kuthan

Sep 5, 2011, 10:59:42 AM
to ddd-cqr...@googlegroups.com
I really like your CQRS explanation - please incorporate it into the wiki :-)

I've just figured out that I have used CQRS several times in my projects. An external search index plays the role of a de-normalized model of the application, optimized for searching. But a finder implementation that retrieves data from a search index is not as easy as "select * from foo" :-(

For educational purposes I would prepare de-normalized DB tables/views for the finders, get rid of DTOs and return a pure collection (perhaps a map, or a list of maps) to the UI. Any additional logic in the finder implementation is misleading. But if data retrieval is more complex, simple finders are not enough and the mentioned patterns are welcome.

One more concern about de-normalization with DB views / materialized views / NoSQL: the build should still be portable to some lightweight equivalent (for the local developer environment - H2, MySQL, etc.).

Sławomir Sobótka

Sep 5, 2011, 3:26:32 PM
to ddd-cqr...@googlegroups.com
> I really like your CQRS explanation, incorporate the definition to the wiki
> :-)

The general idea is quite simple: two models (whatever "model" means :)
designed and optimized for two different purposes (an uber-model
optimized for both is rather impossible).
But to excite you, I can say that the interesting stuff begins when you
introduce Event Sourcing as a persistence technique :P

> I've just figure out that I have been used CQRS several times in my
> projects. External search index plays as de-normalized model of the
> application, optimized for searching. But finders implementation for
> retrieving data from search index is not as easy as "select * from foo" :-(

It is your job to make it as easy and fast as possible :P

As I understand it, your second model was just an index, but the actual
business data was fetched from the common model...?
I assume it was designed this way for the sake of consistency.
btw: a graph-oriented DB can be used to index data when dealing with
graph problems (e.g. recommendations of products by friends, or people
who bought the same things I did...)

In CqRS we can assume that "consistency is eventual" for some/all
data. So for the sake of scalability we can accept some staleness.
Therefore we *can* update the read model using events sent through queues.
Actually, all data is stale the moment it leaves a transaction, so you may
ask the client: how stale is still good enough? :)


> For educational purposes I would prepare de-normalized db tables/views for
> finders, get rid of DTOs and return pure collection (perhaps map, or list of
> map) to the UI. Any additional logic in the finders implementation is
> misleading. But if data retrieving is more complex, simple finders are not
> enough and mentioned patterns are welcome.

There is one example presenting "spying" on a user who removes products
from the Cart. This is done by listening to *application* events
(because we assumed that the Cart is an application concept, not a domain
concept). But the same could be done with Domain Events. The event's data
is stored in a denormalized table just to report on it fast - just a
sample case, but the idea is important.
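The "spying" idea can be sketched like this (event and listener names are mine, not taken from the sample; a real implementation would write to a database table rather than an in-memory list):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ProductRemovedFromCart:       # the event the cart publishes
    cart_id: str
    product_name: str
    when: datetime

class RemovedProductsReport:
    """Denormalized read model: one flat row per removal, ready to display."""
    def __init__(self):
        self.rows = []

    def handle(self, event):        # event listener updating the report table
        self.rows.append({"cart": event.cart_id,
                          "product": event.product_name,
                          "when": event.when.isoformat()})

report = RemovedProductsReport()
report.handle(ProductRemovedFromCart("cart-1", "Hammer",
                                     datetime(2011, 9, 5, 12, 0)))
```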

Marcin Kuthan

Sep 6, 2011, 4:27:08 AM
to ddd-cqr...@googlegroups.com
On Monday, September 5, 2011 at 9:26:32 PM UTC+2, Sławek Sobótka wrote:
>> I really like your CQRS explanation, incorporate the definition to the wiki
>> :-)
>
> The general idea is quite simple: two models (whatever "model" means :)
> designed and optimized for two different purposes (an uber-model
> optimized for both is rather impossible).
> But to excite you, I can say that the interesting stuff begins when you
> introduce Event Sourcing as a persistence technique :P

I'm looking forward to it, then :-)
 

>> I've just figured out that I have used CQRS several times in my
>> projects. An external search index plays the role of a de-normalized model
>> of the application, optimized for searching. But a finder implementation that
>> retrieves data from a search index is not as easy as "select * from foo" :-(
>
> It is your job to make it as easy and fast as possible :P
>
> As I understand it, your second model was just an index, but the actual
> business data was fetched from the common model...?
> I assume it was designed this way for the sake of consistency.
> btw: a graph-oriented DB can be used to index data when dealing with
> graph problems (e.g. recommendations of products by friends, or people
> who bought the same things I did...)


There was an external index for full-text searching. All the data needed to display the results was stored in the index metadata, so the results were displayed based only on the metadata, without any additional database lookup. The index was built asynchronously, so the search results were not 100% consistent with the database model.
 

> In CqRS we can assume that "consistency is eventual" for some/all
> data. So for the sake of scalability we can accept some staleness.
> Therefore we *can* update the read model using events sent through queues.
> Actually, all data is stale the moment it leaves a transaction, so you may
> ask the client: how stale is still good enough? :)

Sure, 100% consistency is not feasible even in traditional web applications. When the list of customers is displayed, the user never knows whether a customer still exists by the time they want to edit it (many thanks to Udi, he really helped me understand consistency issues).


>> For educational purposes I would prepare de-normalized DB tables/views for
>> the finders, get rid of DTOs and return a pure collection (perhaps a map, or a
>> list of maps) to the UI. Any additional logic in the finder implementation is
>> misleading. But if data retrieval is more complex, simple finders are not
>> enough and the mentioned patterns are welcome.
>
> There is one example presenting "spying" on a user who removes products
> from the Cart. This is done by listening to *application* events
> (because we assumed that the Cart is an application concept, not a domain
> concept). But the same could be done with Domain Events. The event's data
> is stored in a denormalized table just to report on it fast - just a
> sample case, but the idea is important.

Is it an example of the mentioned event sourcing?
 

Sławomir Sobótka

Sep 6, 2011, 4:55:34 AM
to ddd-cqr...@googlegroups.com
>> There is one example presenting "spying" on a user who removes products
>> from the Cart. This is done by listening to *application* events
>> (because we assumed that the Cart is an application concept, not a domain
>> concept). But the same could be done with Domain Events. The event's data
>> is stored in a denormalized table just to report on it fast - just a
>> sample case, but the idea is important.
>
> Is it an example of the mentioned event sourcing?

No, ES is more about the "command stack".

In short:
In general, ES is about shifting the persistence model. In the classic approach
we store relational (structural) data that describes the current state.
In ES we store a behavioral model - the events that occurred on an Aggregate. So
there is no current state, just a series of behaviors.
Additionally, you can project this series of events onto any denormalized
read model you want.

A short sample scenario.

Saving data:
1. The Aggregate is not stored in relational tables (e.g. via an ORM).
2. The Aggregate just publishes events.
3. There are at least two listeners interested in a particular event:
a) The Aggregate itself. Domain methods perform calculations and
fire events; they do not change the Aggregate's state. The Aggregate listens
to the event (the one it has just published), and when it catches the event
it looks at the data fields in the event and changes its own inner state.
b) A listener that updates the read model (this is the case described previously
for the Leaven example). You can have many models (projections)
per Aggregate class - it depends on what you are interested in and how fast
you want to read.
4. The repository stores, in some persistent event store, the serialized
(e.g. JSON) events that occurred on the given aggregate (one domain method
may fire many domain events).

Loading an Aggregate:
1. The repository loads all events associated with the concrete Aggregate (by ID).
2. The Aggregate is fed all the events to "replay" all the actions in order
to arrive at its current state (the same method that is described in
the listener at step 3a of Saving data).

The read model (Finders) does not know anything about any events or
aggregates - nothing. This model is designed to read fast.
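The save/load scenario above can be sketched like this (a minimal illustration; all names are invented, and a real event store would persist serialized events rather than keep them in a dict):

```python
class MoneyDeposited:
    """A domain event carrying the data needed to change aggregate state."""
    def __init__(self, amount):
        self.amount = amount

class Account:
    def __init__(self):
        self.balance = 0
        self.pending_events = []     # events fired since the last save

    def deposit(self, amount):       # domain method: calculate, then fire an event
        event = MoneyDeposited(amount)
        self.apply(event)            # listener 3a: the aggregate updates itself
        self.pending_events.append(event)

    def apply(self, event):          # no business logic, just a state change
        if isinstance(event, MoneyDeposited):
            self.balance += event.amount

class AccountRepository:
    def __init__(self):
        self._event_store = {}       # aggregate id -> list of stored events

    def save(self, account_id, account):
        self._event_store.setdefault(account_id, []).extend(account.pending_events)
        account.pending_events = []

    def load(self, account_id):      # replay all events to rebuild current state
        account = Account()
        for event in self._event_store.get(account_id, []):
            account.apply(event)
        return account

repo = AccountRepository()
acc = Account()
acc.deposit(100)
acc.deposit(50)
repo.save("acc-1", acc)
loaded = repo.load("acc-1")
```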

Marcin Kuthan

Sep 6, 2011, 9:37:27 AM
to ddd-cqr...@googlegroups.com
Now it's clear, but new doubts came to my mind...

How to efficiently load aggregates from events? The full event history might be quite long... Some sort of save point could be introduced, but it would lead to ORM-like storage.

And tell me, how to easily check what the current state of the system is (from the domain logic perspective)? Replay the events, or look into one of the read models? Much worse than looking into a regular database.

How to implement unique constraint checking? Iterate over all events and check them one by one?

Marcin Derylo

Sep 6, 2011, 1:40:30 PM
to ddd-cqr...@googlegroups.com


On Tuesday, September 6, 2011 at 3:37:27 PM UTC+2, Marcin Kuthan wrote:
> Now it's clear, but new doubts came to my mind...
>
> How to efficiently load aggregates from events? The full event history might be quite long...

You sure? Maybe that means your aggregate is just too big and should be split? And loading an aggregate from a list of events is blazing fast - applying events to an aggregate is just a matter of copying all/some of the event properties into the aggregate's internal state, maybe doing some simple operations like adding values, etc. You simply don't do any business logic during loading. No logic at all. Just dead-simple operations on the internal state of your aggregate.

If you really need your domain to be blazing fast, you can keep it in memory - all of it - can't you? Check out what the guys developing LMAX did.
 
> Some sort of save point could be introduced, but it would lead to ORM-like storage.

If you really have to, you can create snapshots (Greg suggests doing it asynchronously, as a background process). Use the Memento design pattern for that. After loading the aggregate from a snapshot, apply only the events that happened after the snapshot was taken. Nothing close to an ORM's level of complexity.
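The snapshot idea can be sketched as follows (illustrative only: the background snapshotting process is collapsed into straight-line code, and the "aggregate" is a trivial counter):

```python
class Counter:
    """A toy event-sourced aggregate: state is rebuilt by applying events."""
    def __init__(self, value=0):
        self.value = value

    def apply(self, increment):      # replay step: no business logic, dead simple
        self.value += increment

    def memento(self):               # Memento: externalize the current state
        return self.value

events = [1] * 1000                  # a long event history for one aggregate

# Background snapshotting: replay the first 900 events once, store the memento.
snapshot_version = 900
snapshotter = Counter()
for e in events[:snapshot_version]:
    snapshotter.apply(e)
saved = (snapshot_version, snapshotter.memento())

# Loading later: restore from the memento, replay only the 100 events after it.
version, state = saved
counter = Counter(state)
for e in events[version:]:
    counter.apply(e)
```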
 

> And tell me, how to easily check what the current state of the system is (from the domain logic perspective)? Replay the events, or look into one of the read models? Much worse than looking into a regular database.
 
Your event log is your source of truth. Your domain rules operate on the state of your aggregates, loaded from events. Clients read stuff from the read models and assemble commands based on what they find there.

Marcin Derylo

Sep 6, 2011, 1:43:34 PM
to ddd-cqr...@googlegroups.com


On Tuesday, September 6, 2011 at 3:37:27 PM UTC+2, Marcin Kuthan wrote:

> How to implement unique constraint checking? Iterate over all events and check them one by one?


Nope, for this one you will have to free your mind, Neo ;) Remember eventual consistency.

No matter where you ask this question, you will end up being told to check http://codebetter.com/gregyoung/2010/08/12/eventual-consistency-and-set-validation/ . I think it is a good read before discussing this matter in detail.
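One approach discussed in that area can be sketched as follows (my own minimal sketch, not taken from the post: a best-effort uniqueness check against a read-model index, accepting that under concurrency a duplicate could slip through and would have to be compensated for later):

```python
class UsernameIndex:
    """Read-model index of taken usernames, updated from registration events."""
    def __init__(self):
        self._taken = set()

    def apply_user_registered(self, username):
        self._taken.add(username)

    def is_taken(self, username):
        return username in self._taken

def handle_register(index, username):
    """Best-effort pre-check; eventually consistent, not a hard guarantee."""
    if index.is_taken(username):
        return False    # reject up front when the index already knows
    # Command accepted. If a concurrent duplicate slipped past the check,
    # a later event handler would detect it and trigger a compensation.
    index.apply_user_registered(username)
    return True

idx = UsernameIndex()
ok1 = handle_register(idx, "marcin")
ok2 = handle_register(idx, "marcin")
```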

Marcin Kuthan

Sep 6, 2011, 3:47:44 PM
to ddd-cqr...@googlegroups.com
Thanks for the great reference (LMAX)! In Martin Fowler's article I found (brief) answers to my questions/doubts, and I will continue reading tomorrow morning, for sure :-)

The LMAX team decided to use event sourcing due to the very specific requirements of financial systems - low latency and high throughput. Let's get back to the average architect's reality - web application(s) deployed on an intranet with a small or medium user base. Do you think event sourcing is still applicable? My assumption was that Leaven is not an example of an LMAX-like system, but rather of an average enterprise application, and I'm looking for use cases where event sourcing could help.

Marcin Kuthan

Sep 6, 2011, 4:02:10 PM
to ddd-cqr...@googlegroups.com
Thanks again for the reference - I had to change my plans for this evening from cleaning to reading ;-)

Conclusion after reading: it should be a business decision - do we need 100% consistency or not - and we should design how to mitigate potential inconsistencies (if they really matter from the business perspective). But if the application is built on top of an RDBMS, a unique constraint still seems to be the easiest and cheapest way to achieve "absolute consistency" (perhaps it won't break the system, but it might help in most scenarios).
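The RDBMS route can be sketched like this (using SQLite in place of a production database; the table and column names are mine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint does the set-wide check for us.
conn.execute("CREATE TABLE users (username TEXT UNIQUE)")

def register(conn, username):
    try:
        with conn:                  # transaction scope; rolls back on error
            conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
        return True
    except sqlite3.IntegrityError:  # duplicate: the database rejected it
        return False

first = register(conn, "marcin")    # accepted
second = register(conn, "marcin")   # rejected by the UNIQUE constraint
```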

Marcin Derylo

Sep 6, 2011, 4:24:36 PM
to ddd-cqr...@googlegroups.com
My guess is that if event sourcing had become popular a few years ago and become the "default" solution for your average enterprise web app, like ORMs (unfortunately) are now, the question would be: are those extremely complex beasts called ORMs a suitable solution for our app?
I'm not really convinced that something like Hibernate is a less complex solution than event sourcing. It might make starting development faster, but after some time you have to optimize the queries for some use cases, optimize them a bit more for other use cases (by that time you might have already started passing back and forth a list of relationship names to fetch eagerly in the current use case), and add some hacks to work around known ORM limitations. I've been using Hibernate/JPA for the last 5 years, and recently I've really grown sick of it, after having done a couple of optimize-loading-time-of-screen-A tasks and seeing ugly hacks in the domain to solve some Hibernate issues (using a one-to-many relationship that never has more than one entity on the child side, instead of a simple one-to-one, because the cascading operation sometimes fails? yuck!).

Having explained what my gut keeps telling me - back to your question.
(DISCLAIMER: so far I haven't used Event Sourcing in a real-world app, just worked with some small samples, so what I say is based on my limited knowledge gathered from a training with Greg Young, blog posts, discussion groups, presentations and my own experiments.)

  • Pretty much every system I've been working on had a requirement for an audit log (not really at the beginning - rather a couple of months later, when you already have an established model that doesn't easily support it). You get that for free, as a side effect, with event sourcing.
  • Enterprise apps often have requirements for some statistics, pie charts, etc. More often than not you don't really know upfront what kind of statistics you will need. Not a problem if you get those requirements before your first deployment at the client's site. But what to do if we already have some data in the database? Having an event log that carries not only the data itself, but also the user's intent behind the actions, helps a lot when you have to implement such features. You can populate read models for such statistics from the beginning of time by just replaying to them all the events you ever stored. This possibility to create any structural model of your data is not to be underestimated.
  • You can implement your model without thinking about persistence. No more ugly hacks to make the ORM work fine with our entities! Probably cleaner code. Of course, you can't just take your current ORM-based model and refactor it into event sourcing - I suppose it would be sub-optimal compared to the design you could come up with if you designed your domain using events from the beginning.
I could probably continue with other benefits, but I got distracted and lost my focus.

I suppose the fear factor of such a "new" thing (not really new, but people just don't realize it's been there for ages) is big enough that most people won't even try suggesting such a solution. We tend to stick to what we are comfortable with. (Funny, I just got this link in my twitter stream: http://www.news.cornell.edu/stories/Aug11/ILRCreativityBias.html )

Marcin Derylo

Sep 6, 2011, 4:39:48 PM
to ddd-cqr...@googlegroups.com
That's a trade-off you might consider - sacrificing out-of-the-box consistency (in the ACID sense) for things like:
- better separation of concerns (asynchronously synchronized BCs, a cleaner domain model)
- easy-to-implement, discardable read models
- organizational benefits, like being able to double the size of the team without being hurt by it (think communication overhead, for example)
- an out-of-the-box audit log

Here, having a choice is actually a good thing. You are free to choose whatever way best fits your needs. You just have to keep your eyes wide open to see such opportunities.

Sławomir Sobótka

Sep 6, 2011, 4:59:37 PM
to ddd-cqr...@googlegroups.com
Marcin has mostly answered your concerns. I'll just add some stuff to
Marcin's reply to give you some details.

> Now it's clear, but new doubts came to my mind...
> How to efficiently load aggregates from events? The full event history might
> be quite long... Some sort of save point could be introduced, but it would
> lead to ORM-like storage.

Snapshots, like Marcin said.
Greg, who uses ES for algorithmic trading systems, achieves (as far as I
can remember - Marcin/Rafał, correct me if I'm wrong) 200-300 thousand
transactions per second.
But I think that domain differs in every aspect from ERP
systems... In ERP I expect hundreds of aggregates and dozens of events
per aggregate. I don't know the algorithmic trading domain, but I can
imagine dozens of aggregates and hundreds of thousands of events per
aggregate.

> And tell me, how to easily check what the current state of the system is
> (from the domain logic perspective)? Replay the events, or look into one of
> the read models? Much worse than looking into a regular database.

Let me add to Marcin's reply that you can think up another read model
(another projection that turns out to be needed) and simply generate it
from the events, because events model behavior - what was done.
And most importantly, your new read model contains a projection of the
data you have gathered from the beginning of the system's lifetime (not just
from the point in time when you release the new version to production).

> How to implement unique constraint checking? Iterate over all events and
> check them one by one?

A very good question, sir :)
So you ask whether we need to implement on our own all the mechanics that
are already provided by every RDBMS out of the box - and they work well.
Indeed, a very good question :)

This is one of the main reasons why I personally cannot imagine an ERP-class
system based on ES. But this may just be a problem of my imagination and
lack of experience.

I really hope I have just started a flame war :P Unleash hell!
