API IDs and surrogate keys

1,542 views
Skip to first unread message

Ryan Hiebert

unread,
Aug 21, 2013, 6:40:57 PM8/21/13
to api-...@googlegroups.com
It seems to be a common pattern to use urls like

/wombles/{id}

If I use truly surrogate keys in my database, I really don't want to
expose any database primary keys, instead creating some other column
for that identification, set with a unique constraint. Otherwise, that
new ID will end up becoming "business logic", and I'll have destroyed
the value of my surrogate keys.

Do you think that using the {id} in a URL as in the above example
would constitute "business logic"? If so, what field name in the
database would be good to use for keys, since "ID" is very commonly
exposed with business meaning.

Thanks,

Ryan

Jack Repenning

unread,
Aug 21, 2013, 7:01:01 PM8/21/13
to api-...@googlegroups.com
On Aug 21, 2013, at 3:40 PM, Ryan Hiebert <ry...@ryanhiebert.com> wrote:

> If I use truly surrogate keys in my database, I really don't want to
> expose any database primary keys, instead creating some other column
> for that identification, set with a unique constraint. Otherwise, that
> new ID will end up becoming "business logic", and I'll have destroyed
> the value of my surrogate keys.

There is no obligatory, or even assumed, requirement that the {id} in the resource (URL) is identically the primary key in the database. Pick whatever mapping seems right between resource and data (controller, model, and storage).

I'm not clear why your DB primary keys could "become business logic." I've worked in contexts where certain operations (like updates) could change the DB auto-generated "row number", is it something like that?

It is, however, expected that "/resource/{id}" is a reliable, persistent identifier of the same object forever, which was not the case for the auto-generated DB row ID in the context I mentioned, so we had to create another column for that. We also had to mark the created column as the primary key, both for the DB and the model, since internal logic was just as dependent on a persistent object id. Is your "surrogate" something of this sort?
signature.asc

Kevin Swiber

unread,
Aug 21, 2013, 7:39:59 PM8/21/13
to api-...@googlegroups.com
Hey Ryan,

Many database technologies, especially of the horizontal scale variety, prefer to use a natural key over auto-generated GUIDs and auto-incremented numbers.  For database purposes, this natural key acts as a great shard key for large data sets; it can be used to partition data across nodes in a cluster.  The same benefits hold true for large resource sets behind URLs.

All that just to make the point that there are many benefits to using a natural key over an auto-generated key, whether for resources in an API or rows in a database.  I would lean that way, though it often requires more thinking to get right.  For some business domains, this is not trivial.  At a previous company, we used a key generating service that pulled in related data to form a meaningful and unique natural key.  Evaluate the size of your resource set and its rate of growth.  It could be possible that an auto-generated key works just fine if the resource set is small and isn't expected to greatly expand over time.

WIth respect to your other concern: I don't see a resource locator as a piece of business logic.  Business logic involves domain-specific rules surrounding business processes.  A URL is an address.  For HTTP, a URL acts as a way for servers to accept requests from clients.  For REST, a URL is a way for clients to speak to a resource so the resource can return a representation of itself to the client.  The natural key becomes a technical detail, not necessarily a business detail.


Cheers,
 
--
Kevin Swiber
Projects: https://github.com/kevinswiber
Twitter: @kevinswiber

Ryan Hiebert

unread,
Aug 22, 2013, 1:43:06 AM8/22/13
to api-...@googlegroups.com

On Aug 21, 2013, at 4:01 PM, Jack Repenning <repenni...@gmail.com> wrote:
>
> I'm not clear why your DB primary keys could "become business logic." I've worked in contexts where certain operations (like updates) could change the DB auto-generated "row number", is it something like that?
>
> It is, however, expected that "/resource/{id}" is a reliable, persistent identifier of the same object forever, which was not the case for the auto-generated DB row ID in the context I mentioned, so we had to create another column for that. We also had to mark the created column as the primary key, both for the DB and the model, since internal logic was just as dependent on a persistent object id. Is your "surrogate" something of this sort?

Yes, that's exactly the problem I'm thinking about. Thanks for your example. From your comments, I conclude that it is probably better, if you are using surrogate keys, to not use them in URLs, in favor of another unique identifier for the purpose.

Ryan Hiebert

unread,
Aug 22, 2013, 1:49:19 AM8/22/13
to api-...@googlegroups.com

On Aug 21, 2013, at 4:39 PM, Kevin Swiber <ksw...@gmail.com> wrote:
>
> Many database technologies, especially of the horizontal scale variety, prefer to use a natural key over auto-generated GUIDs and auto-incremented numbers. For database purposes, this natural key acts as a great shard key for large data sets; it can be used to partition data across nodes in a cluster. The same benefits hold true for large resource sets behind URLs.
>
> All that just to make the point that there are many benefits to using a natural key over an auto-generated key, whether for resources in an API or rows in a database. I would lean that way, though it often requires more thinking to get right. For some business domains, this is not trivial. At a previous company, we used a key generating service that pulled in related data to form a meaningful and unique natural key. Evaluate the size of your resource set and its rate of growth. It could be possible that an auto-generated key works just fine if the resource set is small and isn't expected to greatly expand over time.
>
> WIth respect to your other concern: I don't see a resource locator as a piece of business logic. Business logic involves domain-specific rules surrounding business processes. A URL is an address. For HTTP, a URL acts as a way for servers to accept requests from clients. For REST, a URL is a way for clients to speak to a resource so the resource can return a representation of itself to the client. The natural key becomes a technical detail, not necessarily a business detail.

Thanks. I can see if you are using natural keys as your primary keys that there would be no need to surrogate things. I can see benefits to using natural keys, and I've not decided that I don't want to ever use them. However, mostly because I'm working with an ORM that doesn't do composite keys yet, I've opted to try using truly surrogate keys exclusively, and try to take it to the logical maximum.

Doing something in between would give me the downsides of both methods, without the benefits, but doing to an extreme will allow me to see more clearly the benefits and pitfalls of using completely surrogate keys.

Ryan

Toralf Richter

unread,
Aug 26, 2013, 11:22:28 AM8/26/13
to api-...@googlegroups.com
Hi Ryan,
to your earlier question, what  to call the "id" - we call the unique immutable identifier for an entity in our public API a UID. But naming is always dispitable. We generate those based on a certain codec which is signified by the first char/digit of the UID. Depending on codec used to compose (and if it comes in as a parameter decipher) the UID can include partition / shard keys, etc. ...

B/R, Toralf

Reply all
Reply to author
Forward
0 new messages