[DDD/CQRS] Checking if a unique value is already in use


Jeff Doolittle

Sep 2, 2010, 1:17:26 AM
to DDD/CQRS
Let's say you have a User domain object and you want to constrain
uniqueness for Users' Email Addresses. When another user attempts to
create an account for an Email Address which is already in use, is
there a correct CQRS approach to such a scenario?

Command - CreateNewUserCommand
Event - NewUserCreatedEvent
Report - UserReport

1) check the UserReport in the read repository from the client before
submitting the command and notify that the email address is already in
use
2) check the UserReport in the read repository from the
CreateNewUserCommand handler and throw an exception if the email
address is already in use (is it even appropriate to be checking the
read model from a command handler?)
3) combination of 1 & 2
4) ???

Chris Martin

Sep 2, 2010, 1:20:27 AM
to ddd...@googlegroups.com
Wouldn't your command handler find an existing user with the email address and prohibit the operation from continuing? I don't see an event being published here. 

Please correct me if I'm wrong.

Cheers,
Chris

Udi Dahan

Sep 2, 2010, 1:59:39 AM
to ddd...@googlegroups.com
You wouldn't check the "read repository" from the command side.
Instead, just set up a unique constraint in the DB on that field on the
command side.
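
For illustration, a minimal sketch of what that could look like (T-SQL; the table and column names are illustrative, not from this thread):

-- Hypothetical command-side table.
CREATE TABLE users (
    id    UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
    email NVARCHAR(254)    NOT NULL
);

-- The database enforces uniqueness: a duplicate INSERT fails atomically,
-- with no race between check and write.
CREATE UNIQUE INDEX ux_users_email ON users (email);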

Cheers,

-- Udi Dahan

Jeff Doolittle

Sep 2, 2010, 2:00:21 AM
to DDD/CQRS
How would the command handler find the existing user? I'm using Event
Sourcing for storing the domain objects, so the only way to retrieve
them is by ID, which is a GUID. I don't have a way to retrieve a
domain object by email address. I don't think I like the idea of
storing the domain objects with a natural key, but I'm not sure that's
exactly what you had in mind anyway.

The event I mentioned would only be published on success, not on
failure to create a new user due to an email address already being in
use.

I did think of another option that I'll call #4

4) In the NewUserCreatedEvent handler, if the email address is already
in use in the reporting model, call a domain service which sets off the
following: a) submit a compensating command that "undoes" the recently
created duplicate user in the domain, and b) send a notification to the
system administrator that a duplicate user was created, but that the
second instance has been reversed
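
For illustration, the duplicate check in that event handler could be a simple query against the reporting model (hypothetical table and column names):

-- Does another user already own this email address?
SELECT COUNT(*)
FROM user_report
WHERE email = @email
  AND user_id <> @new_user_id;
-- If the count is greater than zero, dispatch the compensating command
-- and notify the administrator.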

Perhaps 1, 2, and 4 together is the right approach, but I'm still not
sure about referencing the reporting repository from a command handler
(as in #2).

Curious what Chris and others think.

--Jeff

Jeff Doolittle

Sep 2, 2010, 2:02:15 AM
to DDD/CQRS
@Udi - how does the use of Event Sourcing affect your previous
response?

--Jeff

Stefan Holmberg

Sep 2, 2010, 2:29:00 AM
to ddd...@googlegroups.com
There was a long thread on this matter, covering event sourcing when
there is no column to put the constraint on:

http://groups.google.com/group/dddcqrs/browse_thread/thread/69496d381d9afb76/f55eb0c2723cc17d

Greg comes to the rescue somewhere in the middle, asking: "What are the
business implications of a duplicate? What are the chances of it
happening?" If neither is too big, simply let it go through and handle
it asynchronously with some sort of compensating command or whatever.
He states that consistency is overrated.

In the thread Adam also gives a solution for the special case where
consistency IS indeed needed, introducing a SHA1 lookup of email
addresses from the command handler / domain; I can't remember the
details.

--
Stefan Holmberg

Systementor AB
Blåklockans väg 8
132 45  Saltsjö-Boo
Sweden
Cellphone : +46 709 221 694
Web: http://www.systementor.se

stacy

Sep 2, 2010, 2:41:28 AM
to DDD/CQRS
Jeff,

According to Udi's recent video that appeared today
(http://www.infoq.com/presentations/Command-Query-Responsibility-Segregation),
he said to check such a constraint against the read repo BEFORE you
submit the "CreateNewUserCommand" to the command side.

If the email address is already taken, that command would never get
submitted. The idea is that you want to submit commands that have a
high probability of success.

Ironically, the read repo is actually used with commands too - by
helping you decide IF you should even submit commands to the command
side. So that would be your option 1 in this case. The use of Event
Sourcing is obviously irrelevant here.

Hope this helps,

Stacy

Jeff Doolittle

Sep 2, 2010, 2:41:38 AM
to DDD/CQRS
Hi Stefan,

I've read through that thread and I recall Greg's comment to that
effect. I think it's perfectly legitimate - I understand the
implications of eventual consistency and I'm not trying to ensure
immediate consistency. I'm just trying to determine if there is a
recommended approach to such an issue.

I'd appreciate specific commentary on the options I've proposed and
any appropriate/inappropriate combination(s) of them:

1) check the read model in the client before submitting the command
(seems like a no-brainer - why wouldn't you at least do this?)
2) check the read model in the command handler and throw if it already
exists (questionable - should I really be checking the read model from
a command handler? Keep in mind Event Sourcing: I can't query the
domain objects by specific properties)
3) 1 & 2
4) catch the duplicate on the CreateNewUserEventHandler and then
perform a compensating command to rollback the duplicate, then send
some sort of notification regarding the issue
5) 1 & 4
6) 1 & 2 & 4

I'm thinking option 5 makes the most sense, but I'm curious what
others think about also including option 2 which requires a command
handler checking the read model.

--Jeff

Jeff Doolittle

Sep 2, 2010, 2:44:15 AM
to DDD/CQRS
@stacy - just read your post right after submitting my last one. I
agree, and it's my option 1. I'm curious what you think about the
other options?

--Jeff

Udi Dahan

Sep 2, 2010, 2:44:17 AM
to ddd...@googlegroups.com
I consider event-based persistence (a.k.a. "event sourcing") to be a pattern,
like CQRS, in which case it isn't necessarily used for everything - every
pattern has a context.

I'd recommend not reinventing the unique-constraint wheel.

Kind regards,

-- Udi Dahan



Jeff Doolittle

Sep 2, 2010, 2:46:01 AM
to DDD/CQRS
@Udi - that is certainly a consideration; however, I'm not sure I want
to introduce two different storage mechanisms for domain objects, with
some going into an event store and others into relational tables.

Stefan Holmberg

Sep 2, 2010, 2:58:13 AM
to ddd...@googlegroups.com
Personally I strive to do no. 5 in these cases. I.e., first check the
read model to see whether it's even worth sending the command. Then, in
the command handler, I try not to care too much about duplicates, but
handle them in a "CreateNewUserEventHandler".

As for no. 2, it won't do you much good. It's another hint (like
no. 1), but since the read model is (preferably) asynchronously
updated, it might not be correct yet.

So some sort of #4 is always needed - or introduce a SHA1 table on
email like Adam suggests in the long thread I pointed to.
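
A minimal sketch of what such a lookup table might look like (my reading, not necessarily Adam's exact scheme; T-SQL, names illustrative):

CREATE TABLE email_reservations (
    email_sha1   CHAR(40)         NOT NULL PRIMARY KEY, -- hex SHA1 of the address
    aggregate_id UNIQUEIDENTIFIER NOT NULL
);
-- Inserting the same digest twice violates the primary key, so the
-- duplicate is rejected synchronously, before the event is stored.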


Jeff Doolittle

Sep 2, 2010, 3:08:17 AM
to DDD/CQRS
@Stefan,

Sounds about right. I'm intrigued by the SHA1 approach. Based on
Udi's comments, I may still consider enforcing immediate consistency
in this one scenario. However, Greg's comments regarding business
impact are also worth considering - really, how bad is it if the
duplicate is created? How hard is it to recover? And what is the cost
of preventing the duplicate in the first place?

I think I've got enough information to make an educated decision. I'm
going to go with option 5 for now and, if it ever becomes a major
issue/concern and is worth the cost, migrate to some sort of immediate
consistency for enforcing the unique constraint in this particular
instance. At this point I don't think it's worth the cost, but it's
nice to know the option is there if I ever need it.

--Jeff

stacy

Sep 2, 2010, 3:16:13 AM
to DDD/CQRS
@Jeff

Once on the command side, I would not use the read repo for anything.
It's just an artifact when using event sourcing, and it could be out of
date as well. The latest truth is in the event store.

You persist the "read-repo email insert" and the "event-store
NewUserCreatedEvent" together in one transaction, so everything rolls
back if the email address constraint in the db is violated. So I don't
think the other options are appropriate, even though you might get
them to work.
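
A sketch of that single transaction, assuming both tables live in the same SQL Server database (T-SQL; names illustrative):

SET XACT_ABORT ON;
BEGIN TRANSACTION;

INSERT INTO events (aggregate_id, version, event_type, payload)
VALUES (@id, 0, 'NewUserCreatedEvent', @payload);

INSERT INTO user_emails (email, aggregate_id) -- unique index on email
VALUES (@email, @id);

COMMIT; -- a duplicate email aborts the transaction, event included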

Nuno Lopes

Sep 2, 2010, 3:20:01 AM
to ddd...@googlegroups.com
Well, how would you implement number 4? Iterate over all stored events looking for a match? And while you are iterating, what if another event that looks the same pops in?

Nuno

Sent from my iPhone

Stefan Holmberg

Sep 2, 2010, 4:12:54 AM
to ddd...@googlegroups.com
@nuno.
Not sure exactly what you mean by "what if another event looking the
same pops in", so this might not be an answer, but in short (at least
how I have done it): the duplicate-check handler uses a table of its
own, storing, say, the aggregate id along with the email. So the
duplicate-check handler does not parse/read the event store but simply
queries its very own table. If the email already exists, it sends a
command to the domain to handle it.
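
Something like this, for illustration (T-SQL; the names are made up):

CREATE TABLE user_emails (
    email        NVARCHAR(254)    NOT NULL PRIMARY KEY,
    aggregate_id UNIQUEIDENTIFIER NOT NULL
);

-- On each NewUserCreatedEvent:
SELECT aggregate_id FROM user_emails WHERE email = @email;
-- No row: insert the (email, aggregate_id) pair. A row with a different
-- aggregate_id: send a compensating command to the domain.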

As for introducing the extra table, if that worries you: the whole
"solution" rests on the fact that the constraint check is outside of
the domain. The domain instead handles the duplicate cases.

My event handlers are separate processes chasing the event log based on
SLA, and so far I have had no reason to make them multithreaded, if
that's what you mean by "what if another event looking the same pops
in" - so I am simply not sure about those types of scenarios.


Udi Dahan

Sep 2, 2010, 4:32:25 AM
to ddd...@googlegroups.com
He's referring to the problem with read-committed isolation: if two
transactions are running in parallel, both will still be able to
succeed, resulting in a duplicate record. For table-level uniqueness
you need serializable isolation, which results in a full-table lock.
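
For illustration, the classic check-then-insert race (hypothetical names):

-- Under READ COMMITTED, two sessions can both pass this check before
-- either row becomes visible to the other:
IF NOT EXISTS (SELECT 1 FROM users WHERE email = @email)
    INSERT INTO users (id, email) VALUES (@id, @email);
-- A unique index closes the race without a full-table lock: the second
-- INSERT simply fails.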

Nuno Lopes

Sep 2, 2010, 6:05:19 AM
to ddd...@googlegroups.com
Hi,

> Storing, say, the aggregate id along with the email. So the duplicate-check
> handler does not parse/read the event store but simply queries its very own
> table.

Where is that table described when using pure Event Stores as defined by Greg? How consistent are the tables you advise in relation to the Event Store?

Are we implementing it ourselves or relying on some third-party product's data facilities?

If we are implementing it ourselves, how smart is our implementation?

Are we doing a full table scan and locking the entire table, or are we being smart about it? If the former, we are serializing every single write to the event store buckets (key, value).

Or are we organizing indexes in a B-tree and locking and freeing the necessary subtrees as we scan?

What about inserts into those indexes? Are we locking the whole thing again, thereby serializing writes in our event store too?

Or are we being smart again and locking only what is needed when we rebalance the B-tree?

Then we have availability. What happens if the system containing the table goes down - does everything go down? Do we use a secondary system for backup? Are we distributing our table and using some form of quorum technique?

Listen to what Udi said:

"I'd recommend not reinventing the unique-constraint wheel."

I think what Udi is advising is to reuse the facilities provided by current products on the market for this. There is a lot of knowledge in them. Databases providing these facilities are not BullshitDatabase systems, unless you know very little about them.

Basically, look for a real database providing "unique constraints", unless you are planning to build a product out of your solution.

They don't necessarily need to be commercial. For instance, you can reuse MySQL's unique-constraint facilities in conjunction with Cassandra, but then you need 2PC across both systems as far as I understand, which can bring a penalty. You can shard your index tables, etc.

Or you can just use a commercial RDBMS (MSSQL, Oracle, etc.), storing the AR in a "blob", keeping the fields that require unique constraints outside it, and defining an index over them.
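
A sketch of that layout (T-SQL; names illustrative):

CREATE TABLE aggregates (
    id    UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
    email NVARCHAR(254)    NOT NULL UNIQUE, -- constrained field kept outside the blob
    body  VARBINARY(MAX)   NOT NULL         -- serialized aggregate state
);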

If you have a FREE product that does all this in a fully distributed system, I would for sure like to know.

This is pretty much how I understand Udi's observation.

Finally, implementing unique constraints has nothing to do with consistency but with keeping your system sound (http://en.wikipedia.org/wiki/Argument#Soundness). Consistency is another beast (http://en.wikipedia.org/wiki/Consistency).

Consistency may be overrated, but soundness is an imperative. Lack of soundness basically means that your assets are flowing out through tiny holes in your wallet and no one knows why - or worse, you don't even notice that it is happening.

I'm assuming that we are discussing very large datasets, otherwise why bother. On another note, unique constraints can be used not only to establish uniqueness across all Aggregates of a kind (say, the email of a Customer) but also within an Aggregate (say, the same product cannot be inserted twice into an order - not a very good example, but it illustrates the usage).

Hope it helps.

Nuno

Stefan Holmberg

Sep 2, 2010, 6:56:34 AM
to ddd...@googlegroups.com
@nuno, you got me really confused...

Of course my "special table" is in SQL Server or whatever - a real
database. I apologize if I somehow gave you the impression I have
implemented my own database???

"Using pure Event Stores"? The handler reads events from the event
store and inserts into the special table, which has constraints. So I
am using the database for constraint handling.

As for "Or you can just use a commercial RDBMS (MSSQL, Oracle, etc.),
storing the AR in a 'blob' and put fields requiring unique constraints
outside and define an index over them" - you do have a valid point, as
that would catch the duplicate already in the command handler.

Because when I say I am indeed reinventing the unique-constraint
wheel, that is exactly what I do right now: handling it later and
"rolling back" with a compensating command. But basically it's good
enough for me atm.


Nuno Lopes

Sep 2, 2010, 11:16:13 AM
to ddd...@googlegroups.com
Hi Stefan.

I thought you were implementing your own unique-constraint mechanism. "Table" can mean many things :)

> As for "Or you can just use a commercial RDBMS (MSSQL, Oracle, etc.),
> storing the AR in a 'blob' and put fields requiring unique constraints
> outside and define an index over them" - you do have a valid point, as
> that would catch the duplicate already in the command handler.


Yes. Much simpler.

> But basically it's good enough for me atm.


Ok. In cases where you don't have high concurrency requirements on writes over the same Aggregate, it might just work.

In the scope of persistence, my impression is that one solution creates a problem whose solution requires the first solution - one feeds the other. The event store requires an external index table to maintain unique constraints; to avoid 2PC between the two we sometimes need to compensate, which undermines the case for an Event Store in the first place. To me it is silly.

But if both the Event Store and the index table are modeled in the same datastore supporting transactions, you don't need to compensate at all.

All in all, if we decided on the event store for some reason other than persistence, and an Aggregate instance does not change often, I see its benefits, as you may not need to roll back that much.

Cheers,

Nuno

Stefan Holmberg

Sep 2, 2010, 12:24:49 PM
to ddd...@googlegroups.com
@nuno,

Ok, we're understanding each other :) Actually, I did code some
low-level database storage, along with a simple SQL parser, back in
1999 or so, but it was not that fun, so I stay away from that nowadays :)

As I said, I fully understand your points, and the simplicity of it
should make it the obvious solution.

So why am I not doing it your way? In practice I do use a full-blown
RDBMS (SQL Server) for my event storage, but my goal is to get rid of
it. I therefore try to avoid introducing transactions/UoW into my
command handlers. When I notice I need a UoW, I know I probably haven't
modelled my AR right, or it is a special case like the one we're
talking about. And I work my way around it if possible.

Moreover, for maximum throughput (while it might not really be needed
in my modest systems yet, but still): is checking for duplicates THAT
important for the business? Will duplicates happen often? Because we
will get better throughput by not doing the checking in the command
handler - just allow it and move on to the next one.

Who's right and who's wrong?? It's a personal opinion, I guess, and
what I think is the correct way now might change later on when I get
more experience.


Nuno Lopes

Sep 2, 2010, 12:48:23 PM
to ddd...@googlegroups.com
Hi,

> Ok, we're understanding each other :)

Good. I guess the difference between the one approach and the other is:

1) We assume that a unique constraint will be violated, so we first check for it before writing (in the same transaction - one transaction in total).
2) We assume that a unique constraint will not be violated, so we let the write happen and then check and roll back if necessary (three transactions).

Since we have made checks in the UI (leaking business rules) and in the handler (leaking business rules), the probability of a violation happening in the Repository is greatly reduced.

The second option increases performance but makes the contract more complex; it is only simplified if clients and server work together.

My fear (probably emotional) is that if we "blindly" build solutions based on these premises, I leave my system open to very, very nasty attacks: an attack leading the system to perform a huge number of compensating actions would bring it to a crawl - not to mention filling the pipes with lots of messages.

Either solution has little to do with UoW. UoW is relevant when we execute transactions across multiple entities - in DDD terms, ARs.

Cheers,

Nuno

PS: It is not about being right or wrong, but about looking at several options and ending up with the best option for the job.

stefan

Sep 2, 2010, 3:13:22 PM
to DDD/CQRS
In your case about attacks, for a public system(?), I guess there is
no doubt: it could be a risk, and business would probably(??) decide
the check needs to be done ASAP (your no. 1) if you present the risk to
them. Since it's also our job as developers to help clients by
explaining risks, I will carry your thinking about attacks with me, as
I have a public web project in my pipeline.

My point about UoW was meant more generally: UoW/transactions/
unnecessary database lookups/inserts involving multiple tables. In
general, that's how I spot things in my code which might put me in
trouble in the future. Being a tech nerd as well, I also see CQRS as
a way to get shorter/fewer transactions involving fewer tables => great
perf and no dependency on expensive database vendors. Although I do
admit I sometimes/fairly often mix terms up :)

Your no. 1 and no. 2 sum it all up nicely as to what would be ultimate
from the technical view :)

Chris Nicola

Sep 2, 2010, 4:17:35 PM
to ddd...@googlegroups.com
I would agree with Udi on the issue of "do you need event sourcing for adding a user?". Does adding a user require DDD? Do we need to have a history around this event? Can this event happen with consistency, or do we need higher availability or partitioning around it?

Arguably, in any system the task of creating a new user should happen far, far less frequently than the task of doing something with that user. For situations where it is both simpler and more important to have C than A or P, why not? CQRS gives us options; one of them should be the option of whether or not to use event sourcing.

Chris

Suirtimed

Sep 3, 2010, 9:49:56 AM
to DDD/CQRS
How often do you run into a situation where more than one 'person'
uses a single e-mail address?

What is the harm of publishing a NewUserCreatedEvent with a duplicate
e-mail address?

You may need idempotency - checking that an e-mail address doesn't
already exist before you do the "insert" of a new record into your
report - so that you don't end up with multiple rows for the same
e-mail address, but I haven't come up with any situations where a
duplicate registration is that destructive to the business. Just
remember that your report is likely to be eventually consistent. If you
can't tolerate a duplicate, then you may need to evaluate all of the
NewUserCreatedEvents from your event store before publishing the
message.
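
For illustration, an idempotent projection might look like this (hypothetical names) - replaying the same event does not create a second row:

IF NOT EXISTS (SELECT 1 FROM user_report WHERE user_id = @user_id)
    INSERT INTO user_report (user_id, email) VALUES (@user_id, @email);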

Sebastian

Sep 3, 2010, 11:24:05 AM
to DDD/CQRS
Greg posted about this problem a couple of days ago on his blog.

The key is "what is the impact of having a business failure".

Cheers.

Nuno Lopes

Sep 3, 2010, 7:34:40 PM
to ddd...@googlegroups.com
A failure integrated into the business process is a business exception. One that is not integrated is a technical failure, not a business one. That is how things work in real life.

When the exception becomes the rule, you start having a quality problem.

All in all, there should be a business case for dropping any business constraint during system design. In principle it is not good. Talk money, or deal with it.

Nuno

siggimoo

Sep 12, 2010, 12:38:51 AM
to DDD/CQRS
The impact of a duplicate constraint violation probably varies
depending on who you ask. Presently, I'm trying to understand the
similar problem of sequential integer ID assignment for new objects.
The accountants I've spoken to get very nervous when you talk about
issuing duplicate IDs.

Carl Hörberg

Sep 13, 2010, 7:52:05 AM
to ddd...@googlegroups.com
Hi/Lo algorithm? 

siggimoo

Sep 13, 2010, 3:24:37 PM
to DDD/CQRS
Carl,

That is one approach, but I think even a simple sequence can be
achieved. Adding a new customer, for instance, could start with a
read-side query:

SELECT MAX(id) + 1 FROM customers;

With an index on the id column that shouldn't be more than a quick
index-scan. For extra performance, however, one could have a table of
max-ids by type:

SELECT customer + 1 FROM max_ids;

Once the client has a new ID, a command can be sent:

EstablishNewCustomerAccount(id, name, address, ...)

That would result in the creation of a new aggregate:

new Customer(id, version=0, name, address, ...)

It would also result in an event:

NewCustomerAccountEstablished(id, name, address, ...)

And finally, that would result in an update to the read-model:

INSERT INTO customers (...);
UPDATE max_ids SET customer = ?;

That should handle most cases I would think. However, in the unlikely
event that two clients simultaneously get the same new ID from the
read-model, only the first of their EstablishNewCustomerAccount
commands would succeed. The second should fail with a consistency
violation as there already exists a version-0 Customer aggregate with
that ID.
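
That consistency violation falls out naturally if the event store has a uniqueness constraint on (aggregate_id, version) - a sketch, assuming a relational event store (names illustrative):

ALTER TABLE events
    ADD CONSTRAINT ux_events_stream UNIQUE (aggregate_id, version);
-- Two competing EstablishNewCustomerAccount commands both try to append
-- version 0 for the same id; only the first insert succeeds.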

- Milo

Scott Reynolds

Sep 13, 2010, 3:29:10 PM
to ddd...@googlegroups.com

Do what this guy says... hi/lo.

Milo Hyson

Sep 13, 2010, 3:59:28 PM
to DDD/CQRS
Scott,

Could you explain the reasoning for choosing this approach?

Thanks.

- Milo

Scott Reynolds

Sep 13, 2010, 4:19:57 PM
to ddd...@googlegroups.com

I use this to generate ids for domain objects (when the business requires a friendly id). I do it this way because it takes fewer db hits to generate a sequential number. It's not ideal to hit the database every time, and the previous code will fail: two reads of the same table at the same time will produce duplicates. Hi/lo is a good trade-off for performance.

Personally I use GUID ids and wouldn't use friendly ids unless I had to... which is almost never.
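
A minimal hi/lo sketch (T-SQL; names illustrative): the database hands out "hi" blocks, and clients mint ids locally as hi * block_size + lo, so one round trip covers a whole block of ids.

CREATE TABLE hilo (next_hi BIGINT NOT NULL);

UPDATE hilo
SET next_hi = next_hi + 1
OUTPUT DELETED.next_hi; -- returns the reserved block number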

Finn Neuik

Sep 13, 2010, 4:32:32 PM
to ddd...@googlegroups.com
I agree 100%; plus, this isn't the same issue at all as the 'unique constraint on e-mail' at the start of this discussion. This is about generating a unique key, rather than ensuring that a unique key entered from outside the system really is unique.

If you can, you want to generate your keys independently of the database; otherwise you won't gain anything from generating them client-side - which is why GUIDs are a better choice. Where this isn't possible (GUIDs aren't friendly numbers to give to a customer over the phone), hi/lo is a good compromise and simpler than what you propose (and that's before you hit the headache of dealing with collisions - which would definitely happen).

Scott Reynolds

Sep 13, 2010, 4:37:03 PM
to ddd...@googlegroups.com

Actually, GUIDs are still a good choice and should still be used for your id. The long value is just another field.

Finn Neuik

Sep 13, 2010, 4:41:19 PM
to ddd...@googlegroups.com
True, I automatically use GUIDs for all IDs to keep the infrastructure consistent - but in my head I still think of the other field as an ID! Old habits die hard I guess...

Milo Hyson

Sep 13, 2010, 4:47:48 PM
to DDD/CQRS
To avoid confusion, I'll start a separate thread.

Thanks.

- Milo

Youssef Sharief

Dec 16, 2017, 5:22:17 AM
to DDD/CQRS
Why don't you check that the AggregateCreatedEvent is not already there while replaying your events (if there is a snapshot), or check that the AggregateCreatedEvent can only ever be the first event (in case there is no snapshot)?