URIs must have IDs somewhere!


Dave Schinkel

Aug 25, 2015, 9:12:26 PM
to API Craft
Has anyone had a crazy debate where your DBA or other UX programmers think that, to call your REST API, they should never have to reference a resource by ID?

Seems ludicrous to me, even having such a conversation.

Kin Lane

Aug 25, 2015, 10:01:17 PM
to api-...@googlegroups.com
I'm guessing the thinking is around exposing the ids publicly vs obfuscating them. 

Meaning they don't want their incremental database ids exposed.

Seems like they are just shutting down convo vs exploring options for referencing resources via API.

--
You received this message because you are subscribed to the Google Groups "API Craft" group.
To unsubscribe from this group and stop receiving emails from it, send an email to api-craft+...@googlegroups.com.
Visit this group at http://groups.google.com/group/api-craft.
For more options, visit https://groups.google.com/d/optout.

Dave Schinkel

Aug 26, 2015, 10:36:05 AM
to API Craft, k...@apievangelist.com
You hit the nail on the head.

Kin Lane

Aug 26, 2015, 11:19:05 AM
to api-...@googlegroups.com, k...@apievangelist.com
Just tell them that is a pretty weak excuse and you are going to need them to be a little more creative.

I just encrypt mine with a secret key on the way out, ensuring no special characters are in there.

Then I run the reverse on the way in.

Nobody will see your sequential ids.

I'm sure there are other better ways. Push them for their own solution.
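
A minimal sketch of that encrypt-on-the-way-out, reverse-on-the-way-in round trip in Python. The thread does not say which cipher is used, so this assumes a small 4-round Feistel permutation keyed with HMAC-SHA256 from the standard library. `SECRET_KEY`, `encode_id`, and `decode_id` are hypothetical names, and the construction is for illustration only, not vetted cryptography.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key, kept server-side
ROUNDS = 4
HALF_BITS = 32
MASK = (1 << HALF_BITS) - 1


def _round(key: bytes, round_no: int, half: int) -> int:
    """Keyed round function: 32 bits derived from HMAC-SHA256."""
    msg = bytes([round_no]) + half.to_bytes(4, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def encode_id(n: int, key: bytes = SECRET_KEY) -> str:
    """Map a 64-bit database ID to a 16-character hex token (no special characters)."""
    if not 0 <= n < (1 << 64):
        raise ValueError("id must fit in 64 bits")
    left, right = n >> HALF_BITS, n & MASK
    for r in range(ROUNDS):
        left, right = right, left ^ _round(key, r, right)
    return f"{(left << HALF_BITS) | right:016x}"


def decode_id(token: str, key: bytes = SECRET_KEY) -> int:
    """Reverse encode_id by running the Feistel rounds backwards."""
    if len(token) != 16:
        raise ValueError("token must be 16 hex characters")
    value = int(token, 16)
    left, right = value >> HALF_BITS, value & MASK
    for r in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, r, left), left
    return (left << HALF_BITS) | right
```

The API would call `encode_id` when building URLs and `decode_id` when parsing them, so the sequential database key never leaves the server.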

Kin Lane


Jack Repenning

Aug 26, 2015, 1:00:02 PM
to api-...@googlegroups.com

On Aug 25, 2015, at 6:12 PM, Dave Schinkel <dsch...@gmail.com> wrote:

> Has anyone had a crazy debate where your DBA or other UX programmers think that, to call your REST API, they should never have to reference a resource by ID?

There's language rather like this in our security guidelines. Drilling down on it a bit, the concern in our case is that it should not be possible to guess a valid id based on knowing some other one. For example, if I arrange to create an account at siteX, learning that I'm user #42, I might validly suspect there are also users #41 and #40, and soon #43 and #44. It's entirely likely that I can make some evil use of this, to concentrate some attack on those four ids instead of the whole universe.

We deal with this by generating a random id for each addressable object, at the time it's created, and using that rather than the sequential integers. They're big enough to ensure a sparsely populated universe, and no computable way to transform one such id into other, likely-valid ones. But they're also small enough to make viable database keys and efficient, performant operation.
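
A rough sketch of that kind of random ID in Python, using the standard `secrets` module. The 12-byte (96-bit) size and the names `ID_BYTES`, `new_public_id`, and `guess_probability` are assumptions for illustration; the thread does not say what size or generator is actually used.

```python
import secrets

ID_BYTES = 12  # assumed size: 96 random bits per ID


def new_public_id() -> str:
    """Return a URL-safe random identifier (16 characters for 12 bytes)."""
    return secrets.token_urlsafe(ID_BYTES)


def guess_probability(live_objects: int, bits: int = ID_BYTES * 8) -> float:
    """Chance that a single random guess lands on an existing ID."""
    return live_objects / 2 ** bits
```

Even with a billion live objects, `guess_probability(10**9)` is on the order of 1e-20, which is the "sparsely populated universe" property: knowing one valid ID tells you nothing useful about any other.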

-- 
Jack Repenning
Repenni...@gmail.com


Stefan Matheis

Aug 26, 2015, 1:57:47 PM
to api-...@googlegroups.com
I've used http://hashids.org for some projects in the past, might be worth a look.

Of course that doesn't solve the problem, that everyone (who knows that it's a hashid) is still able to decipher your underlying integer :)

-Stefan

Cooper Marcus

Aug 26, 2015, 3:40:41 PM
to api-...@googlegroups.com
Doesn't authentication and authorization take care of the "people can guess the next ID" problem? 

Who cares if people can know that you have a resource 123, and that you probably thus have a resource 124, if those resources are available via an authenticated/authorized API? 

I'm sure I'm missing something here ; )

Jack Repenning

Aug 26, 2015, 3:51:53 PM
to api-...@googlegroups.com

On Aug 26, 2015, at 12:40 PM, Cooper Marcus <coo...@newrelic.com> wrote:

> Doesn't authentication and authorization take care of the "people can guess the next ID" problem?
>
> Who cares if people can know that you have a resource 123, and that you probably thus have a resource 124, if those resources are available via an authenticated/authorized API?

There are at least two reasons to care:

  • Cross-site scripting (and similar hacks), where the malefactor tricks your browser into executing code or visiting URLs that depend on your existing log-in for access to do nasty things.
  • Layered defense tactics, where you suppose the first layer of defense (auth) might have some bug or back door that allows the malefactor to get past.

In protection against these cases, we make it hard for him to guess how to "spell" his evil command.

-- 
Jack Repenning
Repenni...@gmail.com


Cooper Marcus

Aug 26, 2015, 5:46:02 PM
to api-...@googlegroups.com
Great explanation, thanks Jack - I learned something here - two things actually! Much appreciated.

Carl Zetie

Aug 27, 2015, 11:27:32 AM
to API Craft

Good info, Jack. Thanks.

I would only add: when the OP goes back to the DBAs to discuss this, they *may* have to take the time to explain why generating random IDs is OK (and how the system deals with the occasional collision). If they are "old school" DBAs, their default mental model may be from the traditional transaction processing / relational database world where it was critical to be able to generate transaction IDs as fast as possible, use them as unique keys, and commit them to the database as fast as possible. (Back in the 1980s / 1990s, one of the selling points of Oracle was that it had built in "sequence generators" for this very purpose.)
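
One way to handle the occasional collision, sketched in Python with an in-memory SQLite table: let a uniqueness constraint reject the duplicate and retry with a fresh ID. The table name `objects`, the helper `insert_with_random_id`, and the retry limit are all hypothetical, chosen only to illustrate the idea.

```python
import secrets
import sqlite3


def insert_with_random_id(conn, name, make_id=lambda: secrets.token_urlsafe(12),
                          max_attempts=5):
    """Insert a row under a random ID, retrying if the ID is already taken."""
    for _ in range(max_attempts):
        public_id = make_id()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO objects (public_id, name) VALUES (?, ?)",
                    (public_id, name),
                )
            return public_id
        except sqlite3.IntegrityError:
            continue  # collision: the primary key rejected it, so try another ID
    raise RuntimeError("could not allocate a unique id")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (public_id TEXT PRIMARY KEY, name TEXT NOT NULL)")
```

With IDs of 96 bits or more the retry branch should almost never run, but the constraint means correctness does not depend on luck.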

Carl Zetie

Tom Christie

Aug 28, 2015, 7:58:49 AM
to API Craft
UUID Primary Keys for your database tables. Always.

Kijana Woodard

Aug 28, 2015, 11:12:43 AM
to api-...@googlegroups.com
Many DBAs will take exception with that statement.
Besides, your persistence tech need not dictate identifiers.


Dave Schinkel

Aug 31, 2015, 8:37:28 PM
to API Craft
Thanks Jack.

Also, here's another issue. What if your client doesn't know your IDs, period? We had the debate of having the client get a map of IDs, but that seems inefficient. We don't have IDs in our website URLs, so the web team sometimes won't have an ID to pass to the REST API. How is that dealt with?

Dave Schinkel

Aug 31, 2015, 8:38:50 PM
to API Craft
I don't want UUIDs in my REST URIs. They're long, verbose, and have a heavy footprint over the wire.
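
For a sense of scale on the length concern, here is a small Python sketch comparing the canonical 36-character UUID string with a base64url encoding of the same 16 bytes. The helper names `compact` and `expand` are made up for this example; nothing in the thread prescribes this encoding.

```python
import base64
import uuid


def compact(u: uuid.UUID) -> str:
    """Encode the UUID's 16 raw bytes as 22 URL-safe characters."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def expand(token: str) -> uuid.UUID:
    """Reverse compact(): restore the base64 padding and rebuild the UUID."""
    padded = token + "=" * (-len(token) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


u = uuid.uuid4()
print(len(str(u)), len(compact(u)))  # canonical form vs. compact form
```

That takes a UUID from 36 characters down to 22 in a URI, still longer than a small integer but carrying the same 128 bits.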

Dave Schinkel

Aug 31, 2015, 8:39:32 PM
to API Craft
Then how come so many APIs, and every article you see out there, use IDs? Even when we were doing RPC, we used IDs in those services.

Dave Schinkel

Aug 31, 2015, 8:40:23 PM
to API Craft
Sorry Carl, not sure what you are saying. Can you say it in layman's terms? :)

Dave Schinkel

Aug 31, 2015, 8:42:44 PM
to API Craft
I mean, look at most any API out there: they usually use IDs, and so have I at every company I've worked for in the past.

Dave Schinkel

Aug 31, 2015, 8:45:28 PM
to API Craft
Look at this section:

Multiple ID Read Requests


Looks to me like Facebook allows clients to query and get the IDs of a page so they can start using them in their code and probably in subsequent requests to the API.

Jack Repenning

Aug 31, 2015, 8:58:35 PM
to api-...@googlegroups.com

On Aug 31, 2015, at 5:37 PM, Dave Schinkel <dsch...@gmail.com> wrote:

> Also, here's another issue. What if your client doesn't know your IDs, period? We had the debate of having the client get a map of IDs, but that seems inefficient. We don't have IDs in our website URLs, so the web team sometimes won't have an ID to pass to the REST API. How is that dealt with?

It is expected / encouraged / required-in-order-to-hold-your-head-high-in-REST-land that clients don't know your object IDs!

There are (at least) two ways to deal with this (many APIs use both, in various combinations):

  1. clients are given a way to list accessible objects with minimal foreknowledge. The result of such a query is a list of whatever-stuff-you-have, including URIs (possibly URI templates) to more specific queries or individual objects.
  2. responses to calls like "show me the dog named Barfy" (for a pet-shop API) include Barfy's description (resource fields) and also a so-called "links section" with URIs for stuff like "adopt Barfy".

Either way, there's a certain amount of start-up cost, both in terms of a sequence of calls, and in terms of knowing how to get started. The notion is analogous to the basic web browsing experience: visit www.coolsite.com, see some links, click one, see more info including more links to other cool stuff.
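
To make the two approaches concrete, here is a hedged Python sketch of what the server side might build for the pet-shop example. The host `api.example.com`, the `links` key, the `/adoption` path, and the field names are all invented for illustration; real APIs use a variety of link formats.

```python
import json

BASE_URL = "https://api.example.com"  # hypothetical host


def dog_listing(dogs):
    """Approach 1: a list of what exists, each entry carrying its own URI."""
    return {
        "dogs": [
            {"name": d["name"], "href": f"{BASE_URL}/dogs/{d['public_id']}"}
            for d in dogs
        ],
        "links": {"self": f"{BASE_URL}/dogs"},
    }


def dog_representation(dog):
    """Approach 2: the resource fields plus a links section for next actions."""
    dog_url = f"{BASE_URL}/dogs/{dog['public_id']}"
    return {
        "name": dog["name"],
        "breed": dog["breed"],
        "links": {"self": dog_url, "adopt": f"{dog_url}/adoption"},
    }


barfy = {"public_id": "kMlYf", "name": "Barfy", "breed": "mutt"}
print(json.dumps(dog_representation(barfy), indent=2))
```

The client follows `links["adopt"]` as given and never has to assemble a URI from an ID it somehow knows in advance.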

-- 
Jack Repenning
Repenni...@gmail.com


Kijana Woodard

Aug 31, 2015, 9:01:18 PM
to api-...@googlegroups.com
Having identifiers is one thing. Having identifiers dictated by a particular instance of a persistence engine is another.

Most people do the latter because "it's easy"....until it isn't.

For example, Twitter used to use Snowflake[1] to generate identifiers. That's different from taking the next sequential ID from MySQL.

But if clients aren't constructing urls anyway....



Dave Schinkel

Aug 31, 2015, 9:05:43 PM
to API Craft
So what you essentially just described are 2 things:

1) Clients can call an endpoint to get ID information. Let's say they want to get IDs for all the countries, cities, and states in the world. The API exposes an endpoint to provide that information. Sure, we've thought about this. But that means the client has to cache that data, manage when it goes stale, and so on.
2) HATEOAS. That doesn't always solve it, though. Sure, you try to provide as much hypermedia as you can, but the client still won't always have all the IDs it needs. So then you're back to #1. Then I get resistance from the web team, because they don't wanna manage IDs on their side if they get them from my service.

Dave Schinkel

Aug 31, 2015, 9:06:53 PM
to API Craft
So can you explain to me how generating random IDs solves the problem of the client not having any in the first place? Random or not, I still don't see how this is solved.

Dave Schinkel

Aug 31, 2015, 9:07:22 PM
to API Craft
Right, let's take the web out of the picture... there are other APIs that could call mine.


On Monday, August 31, 2015 at 8:01:18 PM UTC-5, Kijana Woodard wrote:

Dave Schinkel

Aug 31, 2015, 9:09:26 PM
to API Craft
And can you explain these lines?

> Having identifiers dictated by a particular instance of a persistence engine is another.
> Most people do the latter because "it's easy"....until it isn't.

On Monday, August 31, 2015 at 8:01:18 PM UTC-5, Kijana Woodard wrote:

Jack Repenning

Aug 31, 2015, 9:10:56 PM
to api-...@googlegroups.com
On Aug 31, 2015, at 5:42 PM, Dave Schinkel <dsch...@gmail.com> wrote:

> I mean, look at most any API out there: they usually use IDs, and so have I at every company I've worked for in the past.

I'd hate to start a statistics war over which pattern is more common (simple IDs, or various means of obscuring them). But I do think, in a general sort of way, that the more sensitive the information at a site, the less comprehensible their URIs.


-- 
Jack Repenning
Repenni...@gmail.com


Dave Schinkel

Aug 31, 2015, 9:12:35 PM
to API Craft
What about adding a column that increments in each entity that wouldn't have anything to do with the PKs?

Dave Schinkel

Aug 31, 2015, 9:22:55 PM
to API Craft
I'm trying to understand the other side of the coin: using only natural keys. I just don't see how that solves the problem, and it probably has its own issues. I want to know how clients become aware of identifiers when they don't have them. Are they calling the API to get identifiers for a list of resources, looking up the ID of something they already have (say, by a property on the resource), and then making a request to the REST API URI with the ID they just obtained?

Let's say I have John Doe and, for whatever reason, the web team's code doesn't have an ID for this guy. But they wanna PUT to update something for that John Doe. How would they obtain the ID so they could make the PUT to the REST API for /persons/12 (12 is John Doe, for example)?

Dave Schinkel

Aug 31, 2015, 9:23:49 PM
to API Craft
Look at amazon's 

http://www.amazon.com/gp/product/0802849814?ref_=s9_qpp_gw_p14_d99_i5&redirect=true&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=desktop-1&pf_rd_r=192TMMGGZKY24P4EB4AE&pf_rd_t=36701&pf_rd_p=2079475242&pf_rd_i=desktop&pldnSite=1

Jack Repenning

Aug 31, 2015, 9:24:13 PM
to api-...@googlegroups.com

On Aug 31, 2015, at 6:06 PM, Dave Schinkel <dsch...@gmail.com> wrote:

> So can you explain to me how generating random IDs solves the problem of the client not having any in the first place? Random or not, I still don't see how this is solved.

Random IDs aren't about solving the initial-knowledge problem; they're a security measure. The "initial knowledge" problem really doesn't change much whether you use "id=13" or "id=kMlYf".

There's a strong argument (made earlier in this thread) that actual database IDs should not appear in URIs. Several, actually: in addition to security, there are maintainability issues (say we switch DB engines, dumping and reloading the tables along the way, and suddenly the table IDs are in alphabetical order instead of when-created order).

> What about adding a column that increments in each entity that wouldn't have anything to do with the PKs?

That's getting better. It solves the "decouple from persistence implementation" problem. It doesn't do much for the security (predictability) problem.


-- 
Jack Repenning
Repenni...@gmail.com


Dave Schinkel

Aug 31, 2015, 9:33:43 PM
to API Craft
Thanks Jack.

I guess now back to my other question of how a client is to get an identifier that it doesn't know so that it can act upon that resource.

Dave Schinkel

Aug 31, 2015, 9:35:02 PM
to API Craft
Also yes Etsy is using integer identifiers...

Dave Schinkel

Aug 31, 2015, 11:56:01 PM
to API Craft
Hmm, GUIDs might be the best route after all, even though I feel they're overly verbose and add more to your footprint over the wire.

Basically after researching a bit more, I see these options for unique identifiers:

  • PK integers
  • Surrogate PKs – a hash of several fields; you still get an ID back
  • Natural keys
  • GUIDs

Jørn Wildt

Sep 1, 2015, 2:24:05 AM
to api-...@googlegroups.com
> Let's say I have John Doe and, for whatever reason, the web team's code doesn't have an ID for this guy. But they wanna PUT to update something for that John Doe. How would they obtain the ID so they could make the PUT to the REST API for /persons/12 (12 is John Doe, for example)?

Ask yourself, how would you do it on a website? Have you ever entered an ID or URL directly on Amazon, your local library or at the pizza delivery website? Probably not. You always start with a search/query: 

- You want the status for order number 1234? Well, submit that order number to the "search by order number" resource and get the URL/ID back, then GET that resource. 

- Are you looking for John Doe in the customer database? Well, submit his name to the "search by name" resource and get a list of URLs/IDs back - then let the user select the John Doe that matches his expectations (by address/phone number/social security number or whatever else) and GET that customer resource, modify it and PUT it back.

No one is expecting a client to know URLs/IDs ahead of time - it is something you query for with whatever "natural" information you have at hand; order numbers, social security numbers, book title, pizza menu list and so on.
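
The search-then-act flow can be sketched in Python against a tiny in-memory stand-in for the API. Everything here is hypothetical: `PERSONS`, `search_persons`, `get`, and `put` merely mimic what `GET /persons?name=...`, `GET`, and `PUT` calls would do against a real service.

```python
# In-memory stand-in for the server's data, keyed by resource URL.
PERSONS = {
    "/persons/12": {"name": "John Doe", "phone": "555-0100"},
    "/persons/47": {"name": "John Doe", "phone": "555-0199"},
    "/persons/63": {"name": "Jane Roe", "phone": "555-0142"},
}


def search_persons(name):
    """Stand-in for a "search by name" resource: returns matches with their URLs."""
    return [{"href": url, **p} for url, p in PERSONS.items() if p["name"] == name]


def get(url):
    """Stand-in for GET: return a copy of the resource."""
    return dict(PERSONS[url])


def put(url, body):
    """Stand-in for PUT: replace the resource at this URL."""
    PERSONS[url] = dict(body)


def update_phone(name, current_phone, new_phone):
    """Find the person by natural information, then GET, modify, and PUT."""
    matches = [m for m in search_persons(name) if m["phone"] == current_phone]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one match, found {len(matches)}")
    url = matches[0]["href"]  # the URL comes from the search result, not from the client
    person = get(url)
    person["phone"] = new_phone
    put(url, person)
    return url
```

Calling `update_phone("John Doe", "555-0100", "555-0111")` resolves to `/persons/12` without the caller ever knowing that 12 beforehand.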

/Jørn


Tom Christie

Sep 2, 2015, 5:01:28 AM
to API Craft
>> UUID Primary Keys for your database tables. Always.
> Many DBAs will take exception with that statement.
> Besides, your persistence tech need not dictate identifiers.

Allow me to qualify that with the less bullish version :)

Using UUID primary keys is a smarter default - it implies less information if exposed directly, and it's easier to dump/load and shift data around if you don't need to be concerned about a single global counter. (Eg. hey, I now need to shard this single database instance into two.)

For most cases just using UUID primary keys will hit the sweet spot.
It's no more effort in terms of implementation, and for most use cases it'll be acceptable to naively expose in URLs.

For something a little more finessed, Kin Lane's "provide an opaque 1:1 mapping" between the raw identifier and the external one makes sense. That'll ensure you're fully decoupling the PK implementation detail from the externally visible URL.

Kijana Woodard

Sep 2, 2015, 10:02:40 AM
to api-...@googlegroups.com
This next comment is pretty far OT.

Having used UUID primary keys for several years, I've come to abhor them.
While "it works", it makes development, unit testing, operations a royal pain. Once "everything is a guid", you can't effectively visually scan the data.

My preference is for semantic string identifiers in the persistence engine [tend towards document db / event stores]. I may or may not expose those identifiers via the API. Lately I've found significant value in decoupling the db/backend identifiers from the api/client identifiers even when I'm writing both pieces.

In general for this thread, I agree with Jørn. You have to do some kind of search to find a url or identifier for the resource. You may as well present the url, then the client programmers don't care about the identifiers.


mca

Sep 2, 2015, 10:08:55 AM
to api-...@googlegroups.com
<snip>
 Lately I've found significant value in decoupling the db/backend identifiers from the api/client identifiers even when I'm writing both pieces.
</snip>
+100

Al Holden

Sep 2, 2015, 10:23:55 PM
to API Craft
Take your own checking account number, add 1 to it, then go in to the bank and ask for the balance on deposit to that account.

Whether or not the account numbers were random or sequential didn't matter much. The bank will determine that you do not have permission to access that account, and deny your request. That's the bank's job.

It makes little difference if the data resource is referenced by PK, UUID, unique string or other number. Your code should determine if the caller has permission to access THAT record for the requested purpose. That's the API's job.
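
As a minimal Python sketch of that per-record check (the `ACCOUNTS` table, the `Forbidden` exception, and `get_balance` are hypothetical names for this example):

```python
ACCOUNTS = {
    "1001": {"owner": "alice", "balance": 250},
    "1002": {"owner": "bob", "balance": 975},
}


class Forbidden(Exception):
    """Raised when the caller may not access the requested record."""


def get_balance(caller, account_id):
    """Return the balance only if the caller owns this specific account."""
    account = ACCOUNTS.get(account_id)
    # Same error for "does not exist" and "not yours", so the response
    # does not confirm which identifiers are valid.
    if account is None or account["owner"] != caller:
        raise Forbidden("no access to this account")
    return account["balance"]
```

Here alice can read account 1001, but adding 1 to her account number gets her the same refusal as asking for an account that doesn't exist.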

Am I thinking about this too simply? That's a sincere question, because I miss the point frequently ;-]

Al Holden




Tom Christie

Sep 3, 2015, 4:44:14 AM
to API Craft
> Am I thinking about this too simply?

Sequential identifiers in URLs leak information.
E.g., how many users/orders/customers etc. does this service have?
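
To illustrate the leak with made-up numbers: an outsider who signs up twice, a week apart, can subtract the two IDs to estimate growth. A tiny Python sketch, with a hypothetical function name and invented figures:

```python
def estimate_daily_signups(first_id, second_id, days_between):
    """Estimate new accounts per day from two sequential IDs observed over time."""
    return (second_id - first_id) / days_between


# Invented example: user #41200 on day 0, user #41900 seven days later.
print(estimate_daily_signups(41200, 41900, 7))
```

The second ID alone also puts a floor on the total number of accounts ever created.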

Sometimes that may be an issue and you'll want opaque URLs, sometimes it won't and you'll prefer more readable URLs. (perhaps sequential, perhaps by unique slug, or whatever)

Either way around it'll typically make sense to expose relationships as hyperlinks rather than raw IDs, but there's still a valid design question as to exactly how they should be presented.

Alan Holden

Sep 3, 2015, 11:28:25 AM
to api-...@googlegroups.com
Ah OK. I think I get it: This is more a discussion about "strategy" (the implicit revelation of business secrets via clear primary keys) and less about "security" (that one could obtain a sensitive data record merely by knowing the record's PK).

Returning to my banking analogy: My checking account number (when expressed as a PK from the accounts database) would reveal the minimum number of accounts the bank has ever had. And bank management may not like that.

Al Holden




Carl Zetie

Sep 3, 2015, 11:43:15 AM
to API Craft

To give a real-world example: when you open a bank account, many banks will ask you what number you want your check numbering to start from. A very low check number reveals that the account is new, which can make retailers reluctant to accept those checks.

--CZ

Alan Holden

Sep 3, 2015, 4:39:51 PM
to api-...@googlegroups.com
Understood. 
Somehow I got the impression that the folks bugging the OP were concerned that others could call a method on the API and retrieve protected data, JUST by knowing or guessing the identifier. 
I see now that this decision is more about business strategy than actual data access rights.
Al

Kijana Woodard

Sep 3, 2015, 4:45:55 PM
to api-...@googlegroups.com
Yeah. Obscurity is not sufficient for security, but sometimes it's necessary.
