Preventing duplicates. Can be done in aggregate root but inefficient. Use domain service instead?


dav...@hotmail.co.uk

Nov 15, 2016, 3:09:16 AM
to DDD/CQRS
Let's say we have a bounded context called ETraining (Electronic Training). It allows students to sign up for a training programme and download the weekly programme PDF, which contains training instructions.

A course has a running time, which is a datetime range: the time it starts running and the time it stops running.
A course has multiple programmes.
A programme consists of a course and a subject.

I have an invariant: a course can have only one programme per subject. If I try to add subject A to the same course twice, that is not allowed.
With the course as the AR (aggregate root) holding a collection of programmes, I can enforce this invariant, but not very efficiently. If the number of programmes goes into the thousands, then every time I load the course AR to add a new programme it has to load thousands of programmes and iterate through them all. It works, but I'm not happy with it. I do not want to use lazy loading; I think needing it is a sign that the aggregate is too big.
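A minimal Python sketch of the approach just described, with hypothetical `Course` and `Programme` classes and subjects modelled as plain strings (assumptions for illustration, not the poster's code). The invariant check is a linear scan, which is why the whole collection has to be loaded first:

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


class DuplicateProgrammeError(Exception):
    """Raised when a course already has a programme for the given subject."""


@dataclass(frozen=True)
class Programme:
    programme_id: UUID
    subject: str  # hypothetical: subject modelled as a plain string


@dataclass
class Course:
    course_id: UUID
    # The whole collection must be loaded for the invariant check to work.
    programmes: list[Programme] = field(default_factory=list)

    def add_programme(self, subject: str) -> Programme:
        # O(n) scan over every programme already on the course.
        if any(p.subject == subject for p in self.programmes):
            raise DuplicateProgrammeError(subject)
        programme = Programme(uuid4(), subject)
        self.programmes.append(programme)
        return programme
```

Each `add_programme` call costs O(n) in the number of existing programmes, on top of the cost of loading all of them from storage.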

Later on, students will need to enroll in a programme. A student can be enrolled in a programme only once. To enforce this invariant I'd have to keep a list of all registered students in each programme and iterate through them all. The number of students could be in the tens of thousands, so this is extremely inefficient.

I feel like the course and programmes are merely acting as containers for other entities. Yes, they hold lists of those entities in order to enforce the invariants, but they do so in an extremely inefficient way.

Would domain services be the correct choice in this scenario? In a domain service I could use a repository to check whether a duplicate is being added. This would be much faster than manually iterating over the contained entities, since the database would use indexes and is simply better designed for this sort of work.
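One way the domain-service idea might look, sketched in Python with SQLite standing in for the real database. `ProgrammeRepository`, `ProgrammeService` and the table layout are hypothetical. The existence check uses a composite index on `(course_id, subject)` instead of iterating in memory; making that index `UNIQUE` is an extra assumption that also catches two concurrent writers who both pass the check:

```python
import sqlite3
import uuid


class DuplicateProgrammeError(Exception):
    pass


class ProgrammeRepository:
    """Hypothetical repository; SQLite stands in for the real database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS programme ("
            " programme_id TEXT PRIMARY KEY,"
            " course_id TEXT NOT NULL,"
            " subject TEXT NOT NULL)"
        )
        # Composite unique index: the existence query can use it, and it
        # rejects a duplicate row even if two writers race past the check.
        self.conn.execute(
            "CREATE UNIQUE INDEX IF NOT EXISTS ix_course_subject"
            " ON programme (course_id, subject)"
        )

    def exists(self, course_id: str, subject: str) -> bool:
        row = self.conn.execute(
            "SELECT 1 FROM programme WHERE course_id = ? AND subject = ? LIMIT 1",
            (course_id, subject),
        ).fetchone()
        return row is not None

    def add(self, programme_id: str, course_id: str, subject: str) -> None:
        self.conn.execute(
            "INSERT INTO programme VALUES (?, ?, ?)",
            (programme_id, course_id, subject),
        )


class ProgrammeService:
    """Domain service enforcing 'one programme per subject per course'."""

    def __init__(self, repo: ProgrammeRepository) -> None:
        self.repo = repo

    def add_programme(self, course_id: str, subject: str) -> str:
        if self.repo.exists(course_id, subject):
            raise DuplicateProgrammeError((course_id, subject))
        programme_id = str(uuid.uuid4())
        try:
            self.repo.add(programme_id, course_id, subject)
        except sqlite3.IntegrityError as exc:
            # A concurrent writer inserted the same pair first.
            raise DuplicateProgrammeError((course_id, subject)) from exc
        return programme_id
```

The course no longer needs to load its programmes at all; the cost of the check is an index lookup rather than a scan.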

Any thoughts?

Michael Yeaney

Nov 15, 2016, 8:59:30 AM
to ddd...@googlegroups.com
Any reason not to use the student as the root? In that model, the student holds a list of the courses/programmes they have signed up for (a much smaller list). This seems to scale much better, and it avoids the problem of thousands of students (or more).

--
You received this message because you are subscribed to the Google Groups "DDD/CQRS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dddcqrs+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/dddcqrs.
For more options, visit https://groups.google.com/d/optout.

David

Nov 15, 2016, 9:23:41 AM
to DDD/CQRS
The student is its own aggregate. It has an id, a name, and an email. I don't think the courses or programmes should be tied to it.

Michael Yeaney

Nov 15, 2016, 12:21:29 PM
to ddd...@googlegroups.com
Apologies, I didn't mean to imply that the Student holds the entire course/programme entity. Rather, it holds identifiers that point to specific ones (i.e., List<guid>). This still keeps separate transactional boundaries, but gives a much more efficient way to track what a student is signed up for.

Note that you would still have a list of registered students in each course/programme to support other use cases (e.g., a class roster).
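A rough Python sketch of this suggestion, assuming a hypothetical `Student` class with the id, name and email David describes. It uses a set of programme ids where the post says `List<guid>` (an assumption chosen for O(1) membership checks); the invariant check only touches this one student's enrolments:

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


class AlreadyEnrolledError(Exception):
    pass


@dataclass
class Student:
    student_id: UUID
    name: str
    email: str
    # Only identifiers of Programme aggregates, never the Programme objects.
    programme_ids: set[UUID] = field(default_factory=set)

    def enrol(self, programme_id: UUID) -> None:
        # Membership check over this student's own (small) set of enrolments.
        if programme_id in self.programme_ids:
            raise AlreadyEnrolledError(programme_id)
        self.programme_ids.add(programme_id)
```

The programme side can still keep its roster as a separate read model, updated from enrolment events, without being responsible for the uniqueness rule.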


Kasey Speakman

Nov 16, 2016, 10:22:24 AM
to DDD/CQRS
I am working in a similar area.

For me, the proper AR is the Registration, that is, the combination of student and course. A registration has a well-defined and relatively short-lived scope. Most of the interesting business events have to do with registrations (RegistrationCanceled, TrainingStarted, TrainingCompleted, TestingStarted, TestingPassed, TestingFailed, TestingTimedOut, etc.). The student and course are just configuration data (Entities?). Granted, this configuration data must also be versioned so you can answer as-of queries for historical reasons.
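A minimal Python sketch of a `Registration` aggregate along these lines. Only the event names come from the post; the `Status` values, the allowed transitions and the `Event` type are assumptions for illustration. Student and course are referenced by id only:

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from uuid import UUID, uuid4


class Status(Enum):
    REGISTERED = auto()
    CANCELED = auto()
    TRAINING = auto()
    TRAINING_COMPLETED = auto()


@dataclass(frozen=True)
class Event:
    name: str  # e.g. "RegistrationCanceled", "TrainingStarted"
    registration_id: UUID


@dataclass
class Registration:
    registration_id: UUID
    student_id: UUID  # reference to student configuration data
    course_id: UUID   # reference to (a version of) the course
    status: Status = Status.REGISTERED
    pending_events: list[Event] = field(default_factory=list)

    def _record(self, name: str, new_status: Status) -> None:
        # Apply the state change and remember the event to be persisted.
        self.status = new_status
        self.pending_events.append(Event(name, self.registration_id))

    def cancel(self) -> None:
        if self.status is Status.CANCELED:
            return  # already canceled; idempotent, no new event
        if self.status is Status.TRAINING_COMPLETED:
            raise ValueError("cannot cancel a completed registration")
        self._record("RegistrationCanceled", Status.CANCELED)

    def start_training(self) -> None:
        if self.status is not Status.REGISTERED:
            raise ValueError(f"cannot start training from {self.status.name}")
        self._record("TrainingStarted", Status.TRAINING)

    def complete_training(self) -> None:
        if self.status is not Status.TRAINING:
            raise ValueError(f"cannot complete training from {self.status.name}")
        self._record("TrainingCompleted", Status.TRAINING_COMPLETED)
```

Each registration is small and loaded on its own, so the aggregate never has to hold a course's full list of students.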

As far as enforcing uniqueness goes, this post is commonly mentioned.

Michael Ainsworth

Nov 21, 2016, 8:30:46 PM
to DDD/CQRS
David,

I'm (still) learning a lot about DDD, CQRS and ES, so take this with a grain of salt.

"A course has multiple programmes. A programme consists of a course and a subject."

A course "has" a programme, and a programme "consists of" a course? This is a rather ambiguous definition of the relationship. Generally speaking, relationships should be unidirectional.

I'm not sure what the difference is between "programme" and "subject". Can you give a concrete example? E.g., "a subject is a focused area of learning, while a programme provides an overview of the material to be learned with a scheduled ordering".

Kasey Speakman sounds like he's spent some time analysing this problem space.

With regard to uniqueness and set validation, I would say you have three categories of options:

Category 1. Prevent it from occurring on the client by using the read model before issuing the command. This works, but if you have multiple clients (native mobile vs HTML5 web application), then you need to duplicate this logic.

Category 2. Prevent it on the server as well. This typically uses a domain service. If you're doing strict eventual consistency (for distributed reasons), then there's a small window (milliseconds, probably) in which the uniqueness domain service is not up to date with the aggregate data.

Category 3. Detect it after the fact and compensate. I'm still learning, but I'm finding that this (coupled with category 1) is often a good solution.
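To make category 3 concrete, here is a hypothetical detect-and-compensate check in Python. It assumes `UsernameClaimed` events carrying a global sequence number can be read back after the fact; the earliest claim keeps the name, and every later claimant is returned so that a compensating action (for example, asking them to choose another name) can be issued. The event type and the "earliest wins" policy are assumptions:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsernameClaimed:
    """Hypothetical event read back from the event store or read model."""
    sequence: int  # global position; lower means earlier
    user_id: str
    username: str


def find_users_to_compensate(events: list[UsernameClaimed]) -> list[str]:
    """Return the user_ids whose claim lost to an earlier claim of the same name.

    The earliest claim (lowest sequence) keeps the username; every later
    claimant is returned so a compensating action can be issued.
    """
    by_name: dict[str, list[UsernameClaimed]] = defaultdict(list)
    for event in events:
        by_name[event.username].append(event)
    losers: list[str] = []
    for claims in by_name.values():
        claims.sort(key=lambda e: e.sequence)
        losers.extend(e.user_id for e in claims[1:])
    return losers
```

Run periodically (or from a projection), this accepts the small window of inconsistency and repairs it rather than blocking every command.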

Note that I've used the word "category" above, because there are multiple ways to do it in each category. For example, to solve the age-old "unique username" scenario using category 2, you could have:

A. A "domain service" that is a wrapper around a simple SQL table.

B. A big fat aggregate containing a list of all usernames and the UUIDs to which they map.

C. A "lookup" aggregate, e.g., "UsernameList". The UUID of this aggregate is a deterministic UUID based on the username, and the aggregate contains an array of the UUIDs of the users. That is, if person A registers with the username "michael", the system generates a deterministic UUID from this username to identify the UsernameList "e1ac86e6-dd0b-43e3-b9f8-122b81d26289". This UsernameList contains a single UUID pointing back to the user registered by person A. If person B also registers with the username "michael", the same UsernameList is loaded (because its UUID is deterministically generated from the username "michael") and person B's UUID is added to the UsernameList.

That's three options in this one category, with option C effectively being a hash table of linked lists accessible to the domain. It can be distributed (unlike option A, since you can't easily distribute a single SQL table), and it scales with a smaller probability of contention than option B.
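Option C might be sketched like this, assuming Python's `uuid.uuid5` as the deterministic UUID function and a dict as a stand-in for the aggregate store. The namespace value, `UsernameList.claim` and `InMemoryStore` are hypothetical; the UUID produced for "michael" will not match the example UUID in the post:

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical namespace for this application's lookup aggregates; any
# fixed UUID works as long as every node uses the same one.
USERNAME_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "urn:example:username-list")


def username_list_id(username: str) -> uuid.UUID:
    # uuid5 is a SHA-1 name-based UUID, so the same username always
    # maps to the same aggregate id, on every node.
    return uuid.uuid5(USERNAME_NAMESPACE, username)


@dataclass
class UsernameList:
    list_id: uuid.UUID
    user_ids: list[uuid.UUID] = field(default_factory=list)

    def claim(self, user_id: uuid.UUID) -> bool:
        """Record the claim; return True only if this user was first."""
        if user_id not in self.user_ids:
            self.user_ids.append(user_id)
        return self.user_ids[0] == user_id


class InMemoryStore:
    """Stand-in for an aggregate repository keyed by aggregate id."""

    def __init__(self) -> None:
        self._items: dict[uuid.UUID, UsernameList] = {}

    def load_or_create(self, list_id: uuid.UUID) -> UsernameList:
        return self._items.setdefault(list_id, UsernameList(list_id))
```

Two registrations for "michael" load the same `UsernameList`, so contention is limited to people claiming the same name, and a list with more than one entry signals a duplicate to reject or compensate.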