But Uncle Bob, entities have ids in databases, but references/pointers at runtime.

Gareth Wedley

Sep 10, 2014, 9:41:12 AM
to clean-code...@googlegroups.com
Hi Uncle Bob,

I've just been watching the Java Use Case Study Episode 2 part 2. One of the reasons I downloaded it was that I was hoping to have my ideas validated about how to manage an entity's identity between the persistence layer and the application. It was really interesting to see that you both also struggled with this.

I wanted to share my ideas on the off chance that, by some miracle (and a lot of reading and self-learning), I may have a potential solution. If it turns out to be a load of rubbish, it's back to the drawing board.

From Eric Evans's DDD we know that entities are objects that have an identity and a lifecycle. Most of them get moved between an instantiated object and a data structure in a database many times over their lifetime, but I believe that databases and objects manage the identity of entities in different ways.

Let's start with the easy side: the database. As you pointed out in the video, when a user object is created in a database, a unique id is added to it. This id remains until the user is deleted, and it is how a database handles identity. It's also very tempting to use this id in our objects, but I feel this is a persistence detail that shouldn't leak into our application abstraction.

I don't believe the database id has any place on a user object. Instead, we need to look at the difference between things being equal, and things being the same.

As you pointed out, if you create two new user objects they will be equal. But they won't be the same object: they have completely separate identities even at this stage. In my eyes, the identity of an object is determined by the bits of memory it occupies at runtime.

$user1 = new user();
$user2 = new user();

I'm only really familiar with PHP. In PHP we have two types of 'equality'.

$user1 == $user2

The above returns true because both objects are equal. That is to say, they are instances of the same class and their properties have the same values.

$user1 === $user2

This returns false because the objects are not the same. That is, the variables $user1 and $user2 refer to different objects in memory, whether or not those objects hold the same values.

Maybe your isSame(user) method should have done something like:

return $this === $user;

You then avoid having to litter your objects with database implementation details. When they are eventually persisted over the gateway, a database will then give them unique ids which it uses to manage the identity.
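
For readers following the Java case study, the same distinction can be sketched in Java, where `==` on object references plays the role of PHP's `===` and an overridden `equals` plays the role of `==`. This is only an illustration: the `User` class, its single `name` field and the `IdentityDemo` class are assumptions, not code from the episode.

```java
import java.util.Objects;

// Hypothetical User with value-based equals, mirroring PHP's == on objects.
final class User {
    private final String name;

    User(String name) {
        this.name = name;
    }

    // Value equality: true when the fields match (PHP: $user1 == $user2).
    @Override
    public boolean equals(Object other) {
        return other instanceof User && Objects.equals(name, ((User) other).name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name);
    }

    // Reference identity: true only for the very same instance (PHP: $this === $user).
    boolean isSame(User other) {
        return this == other;
    }
}

final class IdentityDemo {
    public static void main(String[] args) {
        User user1 = new User("gareth");
        User user2 = new User("gareth");
        System.out.println(user1.equals(user2)); // compares field values
        System.out.println(user1.isSame(user2)); // compares references
        System.out.println(user1.isSame(user1));
    }
}
```

With this version of isSame, two separately constructed users are never "the same", no matter what their fields contain.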

One thing I guess we do need to be careful of is how to save a modified object back to the database when the object doesn't carry the id. As you mention in some of the previous videos, entities are abstract. The full implementation of those objects is below the line (in the persistence layer). There we can have ids on the object, as this layer is allowed to know about the database details. The implementation of the gateway can then use the ids to ensure it saves the object back to the correct database row.

Welcome your thoughts :-)



Sebastian Gozin

Sep 10, 2014, 12:33:34 PM
to clean-code...@googlegroups.com
In Java, $user1 === $user2 is written as user1 == user2. Calling user1.equals(user2) gives the same result unless the equals method is overridden on the User class.
In the case study they did not override it, so there the equals method still compares references.

However, identity equality is what they check with the isSame(user) method.
They knew the instances were not the same, but they wanted to treat them as the same anyway when the identifier was the same.
I believe the main reason for this is that it is not practical to preserve object references across serialisation and deserialisation (persistence), hence the use of the id as a reference.
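
A small Java sketch of why an id survives where a reference cannot. Everything here is assumed for illustration: the `String` id, the `RoundTrip` helper and its semicolon-separated format stand in for real persistence and are not taken from the case study.

```java
import java.util.Objects;

// Hypothetical entity: the id field and its String type are assumptions.
final class User {
    private final String id;
    private final String name;

    User(String id, String name) {
        this.id = id;
        this.name = name;
    }

    String getId() { return id; }
    String getName() { return name; }

    // Identity check based on the identifier rather than the memory reference.
    boolean isSame(User other) {
        return other != null && id != null && Objects.equals(id, other.id);
    }
}

final class RoundTrip {
    // Stand-in for persistence: flatten the object to a string...
    static String serialize(User user) {
        return user.getId() + ";" + user.getName();
    }

    // ...and rebuild a brand-new object from it.
    static User deserialize(String row) {
        String[] parts = row.split(";", 2);
        return new User(parts[0], parts[1]);
    }
}
```

After `RoundTrip.deserialize(RoundTrip.serialize(user))` the result is a different instance, so `==` reports false, while `isSame` still recognises it through the id.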

I feel like I did not properly answer your thought but I am also a bit confused about where you want to go exactly so perhaps some extra discussion would help ^^

Łukasz Duda

Sep 10, 2014, 1:28:21 PM
to clean-code...@googlegroups.com
Hi Gareth,
I know I'm not Uncle Bob ;-P but
I think it's not a simple problem, at least not one we can resolve with language syntax. Eric Evans suggests that value objects should always have well-defined equality, but that is not the case for entities. We usually want to know whether two entities refer to the same object. Sometimes even an ID is not enough to answer this question.

Greetings
Lucas

Gareth Wedley

Sep 10, 2014, 1:52:58 PM
to clean-code...@googlegroups.com
In Java, is there any way to check if two variables hold a reference to the same object? If yes, that's what should be done.

There are no 'objects' in a typical database so we cannot do this. Instead we flatten the object and represent identity with an id field.

Regarding the serialisation and deserialisation of the object, that is still being handled, but it happens below the line, i.e. we have a User class which is extended by PersistentUser. PersistentUser is allowed to store the database id, as it's in the persistence layer.

I think the entities should be retrieved through the gateway method findUser(username), which returns a reference to a PersistentUser object (the application will only see it as a User object, as it knows nothing about PersistentUser). Subsequent calls to findUser(username) check whether a user object with a matching username has already been loaded. If it has, the method returns the same object. This ensures that anyone who calls findUser(username) with the same username holds a reference to the same object, thus preserving the identity during runtime.
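
The caching gateway described above could be sketched in Java roughly as follows. `User`, `PersistentUser` and `findUser` come from the post; `UserGateway`, `CachingUserGateway`, the `long` database id and the fake `loadFromDatabase` method are assumptions made so the example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// What the application sees: no database id anywhere.
class User {
    private final String username;

    User(String username) { this.username = username; }

    String getUsername() { return username; }
}

// Lives below the line; the long databaseId is an assumed column type.
class PersistentUser extends User {
    private final long databaseId;

    PersistentUser(long databaseId, String username) {
        super(username);
        this.databaseId = databaseId;
    }

    long getDatabaseId() { return databaseId; }
}

interface UserGateway {
    User findUser(String username);
}

// Hypothetical gateway that caches loaded users so repeated lookups share one instance.
class CachingUserGateway implements UserGateway {
    private final Map<String, PersistentUser> loaded = new HashMap<>();
    private long nextRowId = 1; // stands in for the ids a real database would assign

    @Override
    public User findUser(String username) {
        // Load only on the first request; later calls return the cached object.
        return loaded.computeIfAbsent(username, this::loadFromDatabase);
    }

    private PersistentUser loadFromDatabase(String username) {
        return new PersistentUser(nextRowId++, username); // placeholder for a real query
    }
}
```

Two calls to `findUser("gareth")` on the same gateway return the same reference, so `==` works as the identity check for as long as that gateway instance lives. The guarantee ends with the process, which is the limitation raised later in the thread.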

Carlos Buenosvinos

Sep 10, 2014, 2:50:52 PM
to clean-code...@googlegroups.com
Take a look into chapter 5 of "Implementing Domain-Driven Design" from Vaughn Vernon, Entities.


Michel Henrich

Sep 10, 2014, 6:14:51 PM
to clean-code...@googlegroups.com
The identity is also required for reasons that have nothing to do with the database.
For instance, how do you know which User to modify when a simple UpdateUser use case is triggered? The user has to have an identity. 
Sometimes you can say that the identity is the "username", or the e-mail. For other cases, such as a PurchaseOrder, you may simply have an auto-number ID, since it doesn't make any sense to ask someone to give a "name" to the order.
In the end you have to expose this identity to your consumer (UI, REST API, whatever), so that it can be used when performing any sort of modification or query on this specific entity. You can't do that with a memory reference because it is transient.

So, I agree with you that an entity's identity should not be coupled to the database that is used underneath the application, but I disagree that it can be abstracted away. The application must choose how to identify an entity and stick to it, independently of the persistence mechanism.
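
To make the UpdateUser point concrete, here is a minimal Java sketch under stated assumptions: the `String` id, the `InMemoryUserGateway` and the `UpdateUserEmail` use case are hypothetical names, chosen only to show that a request coming from outside the process can carry an id but never a reference.

```java
import java.util.HashMap;
import java.util.Map;

final class User {
    private final String id; // application-level identity, not a database detail
    private String email;

    User(String id, String email) {
        this.id = id;
        this.email = email;
    }

    String getId() { return id; }
    String getEmail() { return email; }
    void setEmail(String email) { this.email = email; }
}

// In-memory stand-in for whatever persistence mechanism sits behind the gateway.
final class InMemoryUserGateway {
    private final Map<String, User> usersById = new HashMap<>();

    void save(User user) { usersById.put(user.getId(), user); }

    User findById(String id) { return usersById.get(id); }
}

// The request arrives from a UI or REST API, so it can only name the user by id.
final class UpdateUserEmail {
    private final InMemoryUserGateway gateway;

    UpdateUserEmail(InMemoryUserGateway gateway) {
        this.gateway = gateway;
    }

    // Returns false when no user with that id exists.
    boolean execute(String userId, String newEmail) {
        User user = gateway.findById(userId);
        if (user == null) {
            return false;
        }
        user.setEmail(newEmail);
        gateway.save(user);
        return true;
    }
}
```

Nothing in the use case depends on how the gateway stores users, which is the sense in which the identity belongs to the application rather than to the database.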

Kind regards,
Michel

Gareth Wedley

Sep 10, 2014, 6:29:13 PM
to clean-code...@googlegroups.com
Makes a lot of sense. Thanks Michel.

Not too sure what all the fuss was about in the episode then! Why not use username as the identity? Maybe we'll see this develop in later episodes.

Michel Henrich

Sep 10, 2014, 7:09:01 PM
to clean-code...@googlegroups.com
There's kind of an unspoken rule nowadays that every entity should have a unique ID independent of any attributes it may have. This allows for greater flexibility, for instance if you want to allow your users to change their username or e-mail (like Facebook does). It also simplifies the way the application deals with entity identification by having a "pattern" to follow (like always having a UUID). If I remember correctly, the fuss was only because of the usage of UUID, which UB is clearly against, which makes me wonder if they will change to something else later.

Gareth Wedley

Sep 11, 2014, 2:51:01 AM
to clean-code...@googlegroups.com
Thanks Michel. Just re-watched the episode and can see now that it is more down to the use of UUID. Amazing how things look from a different view point.

Glad I've been steered in the right direction. I'm sure my harebrained idea would have been disastrous at some point! Thanks for taking the time to fully explain your thoughts in your answer.


Jakob Holderbaum

Sep 11, 2014, 2:52:25 AM
to clean-code...@googlegroups.com
I found this answer to be the most appealing. :)

I dealt with this issue in a lot of applications and there are several
facts that are just difficult to discuss "away":

* There will be a lookup from different use cases based on a unique
identifier. This identifier should probably not be a domain-related
thing, because then you couple your domain attributes directly to your
persistence mechanisms. And since object instances are highly transient
(and kind of an implementation detail), they are just not suitable to
determine identity.

* Identifiers are the only commonality between the inherently different
worlds of "serialized data rows" and "object instances in memory".

So I strive for a semantic decoupling of the generation of the UID from
the actual persistence. Vernon suggests exposing a service in your
domain that can give you a new UID for a new entity of a specific kind
(e.g.: `userAuthority.nextUid()`). If this service is given as an
interface, suddenly a lot of possibilities arise. A simple approach
would be using a UUID or an incrementing integer as a first naive
implementation. A more advanced approach could use the physical
persistence layer to implement a persisted counter that is used to
generate the UIDs.

Anyway the point is that this approach moves the generation of unique
identity away from the actual persistence layer towards the core of
your domain, where it is way more flexible.

Since I started using this approach, a lot of problems with (otherwise
artificially complex feeling) identity management have just vanished.
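
A rough Java sketch of such an identity service, under stated assumptions: only the `nextUid()` call and the `userAuthority` name appear in the post, so the interface name `UserAuthority`, both implementation classes and the `USER-` prefix are hypothetical.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

// Domain-side interface; the name mirrors the userAuthority.nextUid() call in the post.
interface UserAuthority {
    String nextUid();
}

// Naive implementation backed by random (version 4) UUIDs.
final class UuidUserAuthority implements UserAuthority {
    @Override
    public String nextUid() {
        return UUID.randomUUID().toString();
    }
}

// Naive in-memory counter; a production version would persist the counter value.
final class CountingUserAuthority implements UserAuthority {
    private final AtomicLong counter = new AtomicLong();

    @Override
    public String nextUid() {
        return "USER-" + counter.incrementAndGet();
    }
}
```

Because callers depend only on the interface, swapping the UUID version for a counter stored in the persistence layer would not touch any use case or entity code.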

WDYT?

Cheers
Jakob


--
Jakob Holderbaum, M.Sc.
Systems Engineer

0176 637 297 71
http://jakob.io
h...@jakob.io
@hldrbm

Jakob Holderbaum

Sep 11, 2014, 3:12:05 AM
to clean-code...@googlegroups.com
By the way, I forgot to mention that I am also not a big fan of the
usage of UUID, even in small scopes. The mere possibility of a
collision gives me a bad feeling.

Persisting a counter is a good alternative. If, for specific reasons,
you want or have to avoid any call to a physical persistence layer for
the generation of UIDs, you can use something like this (borrowed from
Impl. DDD by Vernon):

USER-20140911-73f34c5a

The last part is the first segment of a UUID. If you seed it with the
current timestamp and nanoseconds, you won't have a collision even in
systems with a lot of load. You could think of adding even more specific
details to the seed to actually eliminate the risk of collision.

By scoping it into the day of creation, there is a meaning for the UID
and the collision scope is at an absolute minimum.
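
An identifier of the shape shown above could be produced in Java along these lines. This is a sketch with assumptions: the `DateScopedUid` class and its `next` method are made-up names, and it takes the first segment of a random UUID from `UUID.randomUUID()` rather than seeding with timestamp and nanoseconds as described.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.UUID;

final class DateScopedUid {
    // Produces ids shaped like USER-20140911-73f34c5a:
    // prefix, creation day, first segment of a UUID.
    static String next(String prefix, LocalDate day) {
        String datePart = day.format(DateTimeFormatter.BASIC_ISO_DATE); // yyyyMMdd
        String firstSegment = UUID.randomUUID().toString().substring(0, 8);
        return prefix + "-" + datePart + "-" + firstSegment;
    }
}
```

A call such as `DateScopedUid.next("USER", LocalDate.now())` yields a new id for today. Note that the first UUID segment is only 8 hex digits (32 bits), so within a single day this variant relies on far fewer random bits than a full UUID does.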

My additional 2ct.

WDYT further?

Cheers
Jakob

Mark Badolato

Sep 11, 2014, 1:36:22 PM
to clean-code...@googlegroups.com
On Thu, Sep 11, 2014 at 12:11 AM, Jakob Holderbaum <mail...@jakob.io> wrote:
> By the way, I forgot to mention that I am also not a big fan of the usage of UUID even in small scopes. Just the feeling that there is the probability of collision gives me a bad feeling.

I get where you're coming from with this, but I really don't think it's at all a concern. 

A few months back, we decided to use UUIDs. Before we made the switch, I wanted to run some collision experiments. I created a simple script to do UUID generation and ran it simultaneously on multiple boxes. Some boxes/environments were identical and some were a bit different: several identical FreeBSD 10 instances (some bare metal, some virtualized, all configured identically), a FreeBSD 9.2 instance, an OS X instance, a Windows instance, an Ubuntu instance, and a few more. Some I had set to the same time, some to offset times, some to UTC, etc. Each one generated a billion UUIDs.

When I was done I had tens of billions of UUIDs generated. There was not one duplicate in the entire lot. So, granted, there are factors that *could* cause a collision, but I don't think it's a concern (for us at least), so we went ahead and used UUIDs.
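
This result fits the usual birthday-bound estimate. As a rough sketch, assuming random (version 4) UUIDs with 122 random bits and a sound random source (the post does not say which UUID version was used), the probability of at least one collision among n identifiers, with n = 10^10 picked as an illustrative count, is approximately:

```latex
p(n) \approx 1 - e^{-n^{2} / 2^{123}} \approx \frac{n^{2}}{2^{123}},
\qquad
p(10^{10}) \approx \frac{10^{20}}{1.06 \times 10^{37}} \approx 9.4 \times 10^{-18}
```

So an experiment of this size would not be expected to show a duplicate. The estimate says nothing about collisions caused by a broken generator or a badly seeded random source.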

mark

--
Mark Badolato 

Twitter: @MarkBadolato

Martin Lee

Sep 11, 2014, 6:03:16 PM
to clean-code...@googlegroups.com, mail...@jakob.io
There seem to be several variants of how UUIDs are generated (http://en.wikipedia.org/wiki/Universally_unique_identifier). Historically there was a version which looked a lot like this (version 1), based on the MAC address and a timestamp with 100-nanosecond resolution, but this scheme drew several criticisms. You might also want to consider hashing the result so that it has no usable meta-information in it (depending on the use cases and security concerns).

I also thought of this other alternative to the identity leakage:
a derivative of the "user" class can contain the extra fields required for its persistence, in a way that is transparent to any code which depends on the "user" objects,
for instance the identity field from the database and a reference to the relevant gateway (or a means to retrieve the same).

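#include <string>
class SQLGateway; // forward declaration; defined in the persistence layer
class User { public: std::string name; std::string dob; };
class SQLUser : public User { private: std::string uuid; SQLGateway* gateway; }; // string types are assumed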

Another thought: is it actually a problem for an identity field to be "common knowledge"? It is a form of abstraction; the most abstract form of an identity is one which may be usable by any client/server implementation combination, just like the string-based abstraction sometimes used for decoupling callers from implementations in factory-things (see CleanCoders episode 26).

Martin

Jakob Holderbaum

Sep 12, 2014, 2:39:23 AM
to clean-code...@googlegroups.com
Well yeah, you basically did experiments which support the math.
Collisions are highly unlikely. But a probability, however small, is
never equal to impossibility.

So even the smallest probability will, at some point in time, lead to
collisions in a system somewhere.

In systems with central persistence layers and whatnot, the effort of
generating identifiers is so low that it should be done.

In highly distributed peer-to-peer systems it's a different story. There
is definitely the need for some means to retrieve unique identifiers.

I think this discussion will never cease, just because of the different
(both valid) perspectives on the idea of probability.

I wouldn't want to rely on UUID generation, but I can understand if
people think about it in less absolute terms.

Cheers
Jakob