My First API


Mark Knapik

Jul 4, 2016, 1:34:14 PM
to API Craft
Hi, folks!

I'm new to the group, and currently designing my first API. I'd like to talk about it. I'm hoping to go through my decisions, maybe help out some folks who are hitting some of the same potholes I've found, and hopefully, find some help myself for some of the decisions I've yet to make. Would that sort of thing be welcome here?

-Mark

Jørn Wildt

Jul 5, 2016, 6:24:08 AM
to api-...@googlegroups.com
> Would that sort of thing be welcome here?

I don't know who should be the one to answer that question, so you probably won't get any answer :-) Just go ahead and share what you want.

/Jørn

--
You received this message because you are subscribed to the Google Groups "API Craft" group.
To unsubscribe from this group and stop receiving emails from it, send an email to api-craft+...@googlegroups.com.
Visit this group at https://groups.google.com/group/api-craft.
For more options, visit https://groups.google.com/d/optout.

Mark Knapik

Jul 5, 2016, 9:28:58 AM
to API Craft
So, my first API is a relatively simple one. It handles only 4 major tasks, but we expect to add more as time goes on.

The 4 major tasks we have are:
  1. Authenticate a user.
  2. Get a list of the user's purchased products.
  3. Get info about a specific product.
  4. Prepare a product for download.
I think I've been able to tackle the first 3 pretty easily, but the 4th one is still giving me headaches. I'll get into that a little later though.

To start off, I just googled "REST API design", and started reading. I was a little disappointed that there weren't really any strong standards for REST, but I did find some good guidelines.

I'll start going down the list about what design choices I've made and why.

The Back End
This decision was partially made for me. We're using PHP/MySQL with Silex to generate routes and handle the back end of things. There is some existing code from an earlier attempt at an API that I'm building off of, just so I don't have to reinvent the wheel. The old stuff has just been hacked together, and we need it to continue to run until we are able to replace it. Replacing the old API will be the first place where our new API expands after we finish the initial roll-out.
I could have abandoned Silex entirely but given that the other developers on the team are already familiar with it and that I am comfortable with it, we decided to hang onto it. PHP/MySQL was going to be a requirement, no matter what.

Versioning
My boss was strongly opposed to versioning. His view was that any changes we make should simply be backwards compatible. And I completely agree with him! Unfortunately, I've got just enough experience to know that it isn't likely to work out that way. Someone, somewhere is going to want a breaking change. I fought hard to include versioning and eventually won. I probably won through stubbornness more than through the strength of my argument, but I did manage to point out that adding versioning to the URL was going to take us 30 minutes of actual work, and if we never use a new version, then we have lost very little. If we do need it, it's going to pay off in spades. It also makes it very easy to distinguish between our old API and the new one.
My hope and dream is that I wasted my time. We will only ever change to v2 if a breaking change is introduced. Non-breaking changes will be hidden behind minor version numbers, which won't be displayed in the URL.
For our purposes, a change is not breaking if it doesn't impact what a current developer is doing. Examples of non-breaking changes would be adding a new command, adding a new option to an existing command as long as the default matches existing behavior, or adding new optional fields to the result set returned by a command. Examples of breaking changes would be removing an existing command, changing the required options of an existing command, changing the data type of a field, or removing a field from an existing result set.
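A rough sketch of that rule as it applies to result sets, written in Python purely for illustration (the API itself is PHP/Silex). It assumes each result set can be described as a mapping from field name to a type name; the function name and the sample fields are made up for the example, and commands and options are not covered.

```python
def breaking_changes(old_fields, new_fields):
    """Return the reasons a new result set would break existing clients.

    Both arguments map a field name to a type name. Added fields are
    ignored because existing clients never look at them.
    """
    problems = []
    for name, old_type in old_fields.items():
        if name not in new_fields:
            problems.append(f"field removed: {name}")
        elif new_fields[name] != old_type:
            problems.append(
                f"type changed: {name} ({old_type} -> {new_fields[name]})"
            )
    return problems


# Hypothetical descriptions of a customer object.
v1_customer = {"customers_id": "int", "name": "string"}
added_field = {"customers_id": "int", "name": "string", "email": "string"}
changed_type = {"customers_id": "string", "name": "string"}
removed_field = {"customers_id": "int"}

for candidate in (added_field, changed_type, removed_field):
    print(breaking_changes(v1_customer, candidate) or "non-breaking")
```

An empty list means the change can ship under the current major version; anything else would call for v2.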
I settled on putting the major version number in the URI. Something like /api/v1/customers/(customer_id) might be a valid route. I picked this because of its ease of use and understanding. Any developer building a client for the API will quickly understand what the v1 is for.

Authentication/Security
The decision about how to properly handle authentication was a challenge for us at first. OAuth2 ended up winning, since it seems to be the de facto default for RESTful APIs. However, OAuth2 seems to be primarily geared toward one web server talking to another web server. We had a problem with this because the first client we expect to consume our API is a desktop application. I'm not aware of a way for the app to receive a redirect after an Application Key has been generated.
We worked around this by allowing a user on our website to generate the key under their account page, and just having them copy/paste the key into the application. With the App Key, the application can request an access token. The application includes the application key or access token in the Authorization header, which is secured by SSL/TLS. At any time, a user can disable an Application Key and all access tokens associated with it from their account page on the main website.
Tying an access token to a specific user id is how we achieve Major Task #1 from above.

No access tokens or application keys should ever be included in the URI itself, which is not secured by SSL, and which is commonly stored in logs.
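A minimal sketch of keeping the token out of the URI, in Python for illustration only. The host name, the helper name, and the "Bearer" scheme are assumptions (Bearer is the usual OAuth2 scheme, but the post does not say which one is used). Nothing is sent over the network here; the request is only built.

```python
import urllib.request

BASE_URL = "https://www.example.com/api/v1"  # hypothetical host


def build_request(path, access_token):
    """Build a request that carries the token in a header, never in the URL."""
    request = urllib.request.Request(BASE_URL + path)
    # Assumed scheme: "Bearer" is common for OAuth2 access tokens.
    request.add_header("Authorization", f"Bearer {access_token}")
    return request


req = build_request("/customers/42/products", "asdf1234")
print(req.full_url)
print(req.get_header("Authorization"))
```

Because the token lives only in the header, it stays out of URLs that end up in server logs and browser history.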

In the future we want a set of permissions tied to a specific application key, but currently any valid key allows access to any API command.

URI Design
The base path for our API is /api/(version). It gets prepended to any resource URIs.

Naming resources is hard. I was really happy to find this particular article http://apigee.com/about/blog/technology/restful-api-design-nouns-are-good-verbs-are-bad
I've tried to take it to heart over the entire design process. All of our URIs at this point follow the design of /(plural noun)/(identifier)/(related plural noun) as the most complex format. As far as I can tell, there is never any reason to go deeper than that. Resources that return a single object look like /(plural noun)/(identifier). A resource that returns a list of objects looks like /(plural noun).
As an example, we have /customers to return a list of customers, /customers/(customer_id) to return details about a specific customer, and /customers/(customer_id)/products to return a list of products purchased by a specific customer.
/customers/(customer_id)/products and /products/(products_id) are how we achieve Major Tasks #2 and #3 from above.
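The URI shapes above, together with the /api/(version) base path, can be sketched as a small route table. This is Python for illustration, not the Silex routing we actually use; the handler names are hypothetical and the sketch assumes numeric identifiers.

```python
import re

PREFIX = r"^/api/v(?P<version>\d+)"

# (pattern, hypothetical handler name), one entry per URI shape.
ROUTES = [
    (re.compile(PREFIX + r"/customers$"), "list_customers"),
    (re.compile(PREFIX + r"/customers/(?P<customer_id>\d+)$"), "get_customer"),
    (
        re.compile(PREFIX + r"/customers/(?P<customer_id>\d+)/products$"),
        "list_customer_products",
    ),
    (re.compile(PREFIX + r"/products/(?P<products_id>\d+)$"), "get_product"),
]


def match_route(path):
    """Return (handler name, captured parameters), or None if nothing matches."""
    for pattern, handler in ROUTES:
        found = pattern.match(path)
        if found:
            return handler, found.groupdict()
    return None


print(match_route("/api/v1/customers/42/products"))
print(match_route("/api/v1/customers/42/products/7"))  # one level too deep
```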

Results
We only return JSON formatted results at this time. XML was a consideration, but we figure if we need it, we can always add it in later.
Our results are wrapped in an envelope. I didn't really want to use an envelope but it came with our existing API and it didn't seem to be a significant drawback so I didn't fight it too hard.
All of our resources that return a list of results (e.g. /customers) include a required field that is an array of objects. The array can be empty if no objects match the filtering parameters used, but the array must be there.
Every object, whether in a list or returned individually, must include an identifying field (e.g. customers_id for a customer object). Almost any other field is optional.
A call to any resource can include the fields parameter in the query to indicate which optional fields should be included for an object. Most of our commands have a default list of optional fields that is used when this parameter is not included.
All list-type results also support the optional parameters per_page and page_number, to allow for pagination. By default, per_page is 20 and page_number is 1.
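Here is a small sketch of how fields, per_page and page_number might combine for a list result, in Python for illustration. The envelope key "results", the function name, and the sample data are assumptions; only the parameter names, the defaults, and the always-present identifying field come from the description above.

```python
def build_list_response(rows, id_field, fields=None, per_page=20, page_number=1):
    """Paginate rows and trim each object to the requested optional fields."""
    start = (page_number - 1) * per_page
    page = rows[start:start + per_page]
    if fields is not None:
        # The identifying field is always kept, whatever the caller asks for.
        keep = {id_field, *fields}
        page = [{k: v for k, v in row.items() if k in keep} for row in page]
    # "results" is a made-up envelope key; the array is present even when empty.
    return {"results": page, "per_page": per_page, "page_number": page_number}


# 45 hypothetical customers.
customers = [
    {"customers_id": i, "name": f"Customer {i}", "email": f"c{i}@example.com"}
    for i in range(1, 46)
]

last_page = build_list_response(
    customers, "customers_id", fields=["name"], page_number=3
)
print(len(last_page["results"]), last_page["results"][0])
```

Asking for a page past the end returns an empty array rather than an error, which matches the "array must be there" rule.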

We are using snake_case instead of camelCase, even though camelCase is the more common convention in JSON. The primary reason for this choice is our existing code base, where we use snake_case most often. Also, there is a study that says it is 20% easier to read.

HATEOAS
Don't hate me for this (hehe), but we aren't doing it. As a concept, HATEOAS doesn't make sense to me. It seems like a nice thing that could be useful some day, but it is just a lot of effort for absolutely no gain right now. Maybe in v2 or v3, after I've become more comfortable with it.

Caching
We want to use caching primarily with regard to preparing a product for download. This is our most intensive process, so we want to avoid it unless necessary. When the user makes this request, we ask for an ETag, which is just an MD5 hash of the downloaded file. If they don't have one, then they obviously need the file. If they have one, and it matches the current version of the file, we can respond by telling them they are up to date and don't need a fresh download. If their tag is out of date, then we initiate the process to prepare the file for download. This is involved in Major Task #4 from above.
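The check described above can be sketched in a few lines. This is Python for illustration; the function names are hypothetical, and the file contents are passed in as bytes rather than read from disk to keep the example self-contained.

```python
import hashlib


def file_etag(data: bytes) -> str:
    """The tag is just the MD5 hex digest of the file contents."""
    return hashlib.md5(data).hexdigest()


def needs_download(client_etag, current_data: bytes) -> bool:
    """True when the client has no tag, or its tag is out of date."""
    return client_etag != file_etag(current_data)


current = b"watermarked file, revision 2"
print(needs_download(None, current))                # no tag yet
print(needs_download(file_etag(current), current))  # up to date
print(needs_download(file_etag(b"revision 1"), current))  # stale
```

Only when needs_download returns True would the expensive preparation process be started.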

We aren't currently planning on adding caching to other commands, though there is probably a good case for using it with the /customers/(id)/products command or other commands that return a large list. One thing that doesn't quite click for me yet is that I already need to query our datastore for the product list and then generate the hash. That seems like the much more CPU-intensive part of the process, so if we're already generating the response in order to get the hash, why not just send the response? Caching might be a little more efficient, but I don't know if it is worthwhile at this point.

Outstanding Issues
I'm not 100% certain that I'm using OAuth2 properly with a stand-alone application. Maybe there is a different standard that should be applied? Maybe there is some way for an application to handle a redirect URL that I'm not aware of? 

gzip isn't currently enabled, but we're planning on adding it. Just haven't had the bandwidth to apply it yet. I'm hoping it is a relatively simple process.

I'm not happy with the naming of the URIs for Task #4, but I'm not sure exactly what a better option would be.
First, I've got POST /files/(id)/downloadables
This returns the ID for the downloadables object that is created.
Second, I've got GET /downloadables/(id)
This returns the progress of the file preparation process, and the URL where the completed file can be downloaded from, once the progress reaches 100%.
The process copies the source file to a location where the user can download it. Often, the copy has a watermark applied to it during this process.
Probably the biggest problem I have with this is the use of the word 'downloadables', because it makes me think of some sort of interface instead of an object type.
This is where I'd most appreciate advice and suggestions for a better way to handle this.

Johan Groenen

Jul 5, 2016, 11:13:14 AM
to API Craft
GET /files/(file_id)

>

{
  "file" : {
    "id": (file_id),
    "downloadables": [
      {
        "id": (downloadable_id),
        "url": (downloadable_url)
      }
    ]
  }
}

GET /files/(file_id)/downloadables

>

{
  "downloadables": [
    {
      "id": (downloadable_id),
      "url": (downloadable_url)
    }
  ]
}

POST /files/(file_id)/downloadables < { } # maybe strange to post an empty object

Alternatively

POST /downloadables < { "file_id": (file_id) }

>

{
  "downloadable": {
    "id": (downloadable_id),
    "file_id": (file_id),
    "url": (downloadable_url)
  }
}

GET /downloadables?file_id=(file_id)

>

{
  "downloadables": [
    {
      "id": (downloadable_id),
      "file_id": (file_id),
      "url": (downloadable_url)
    }
  ]
}

On Tuesday, July 5, 2016 at 15:28:58 UTC+2, Mark Knapik wrote:

Jørn Wildt

Jul 5, 2016, 11:34:28 AM
to api-...@googlegroups.com

> I'm not aware of a way for the app to receive a redirect after an Application Key has been generated.

I do not know your technology stack, but on Windows you can hook into various events on an embedded browser (IE). One of them signals "page loaded", after which you can grab the URL.

> No access tokens or application keys should ever be included in the URI itself, which is not secured by SSL, and which is commonly stored in logs.

Your conclusion is right, but the URI *is* secured by SSL (but the IP-number/host is not) as the URL is part of the request data (see for instance http://stackoverflow.com/questions/499591/are-https-urls-encrypted).

Hypermedia / HATEOAS is still an open ended discussion. You can read my few cents in favor of hypermedia here: http://soabits.blogspot.dk/2013/12/selling-benefits-of-hypermedia.html.

Besides that it seems like you have got your solution pretty well covered :-)

/Jørn




Mark Knapik

Jul 5, 2016, 4:20:17 PM
to API Craft
Thanks for the version links. It was good to see that my selected option was one of the two main contenders. Given that the other one is HATEOAS dependent, it seems I made the right choice.

I know an embedded browser can act as the client, but I didn't think it could act as a server, which is where the redirect comes into play, if I am understanding the flow of OAuth2 correctly. I suppose a client could set up some sort of HTTP listening service, but then we have to deal with firewalls, and that is a lot more burden than I'm willing to inflict at this point.

That stackoverflow article on SSL is a big part of what helped to keep any sort of authentication out of the URL in my API's design. Good to see others are recommending it.
I was trying to convey that the URI is not protected completely just by using SSL. Yes, it is encrypted during transit, but there are other avenues of attack to get to it.

I read your article, but I think the HATEOAS is staying where it is for now, in the 'nice to have' column, but not in the 'need' column. I think we will end up adding it at some point, but not for our initial launch. We're on a deadline and adding it to the mix won't help right now.

Mark Knapik

Jul 5, 2016, 4:30:19 PM
to API Craft
Thanks for the feedback. This is pretty close to where I was heading originally.
Last night, I found this nice page talking about asynchronous operations over a REST API: http://restcookbook.com/Resources/asynchroneous-operations/

I think we'll be switching to something like this:

# curl example to create the file preparation task
curl -X POST -H "Content-Type: application/json" -d '{
  "files_id": 456789
}' '/file_tasks/'

# Example response
HTTP/1.1 202 Accepted
Location: /file_tasks/12345

# curl example to get the status of the file preparation task
curl -X GET '/file_tasks/12345'

# Example result of an uncompleted task
HTTP/1.1 200 OK
Content-Type: application/json

{
  "progress": "60%"
}

# Example result of a completed task
HTTP/1.1 303 See Other
Location: http://www.example.com/download_file.php?token=asdf1234
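From the client's side, the three responses above boil down to a small decision function. This is a Python sketch for illustration only: the function name and the action labels are hypothetical, and the status code, headers and parsed JSON body are passed in directly so that nothing depends on a live server.

```python
def next_step(status, headers, body):
    """Decide what a polling client should do with one response."""
    if status == 202:
        # Task accepted: start polling the task resource.
        return ("poll", headers["Location"])
    if status == 200:
        # Task still running: report progress and poll again later.
        return ("wait", body["progress"])
    if status == 303:
        # Task finished: the file is ready at the given URL.
        return ("download", headers["Location"])
    raise ValueError(f"unexpected status {status}")


print(next_step(202, {"Location": "/file_tasks/12345"}, None))
print(next_step(200, {}, {"progress": "60%"}))
print(next_step(
    303,
    {"Location": "http://www.example.com/download_file.php?token=asdf1234"},
    None,
))
```

One thing to watch for: many HTTP clients follow a 303 automatically, so the client may need that behavior turned off if it wants to see the Location header itself.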


If you've got any suggestions for a better name than 'file_tasks', please let me know.

Jørn Wildt

Jul 5, 2016, 4:34:51 PM
to api-...@googlegroups.com
I think you are missing some part of OAuth2: in order to log in, the client (browser) is redirected to the auth server. After login the browser is then redirected back again, this time with an authorization code in the URL. The code is then read by the original server and exchanged for an access token by server-to-server communication. No need for the client to listen for HTTP connections or act as a server.
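For a desktop client with an embedded browser, the browser half of that flow might look roughly like the sketch below (Python, for illustration). The two URLs and both function names are hypothetical. The query parameters response_type, client_id, redirect_uri and state are the standard ones for the OAuth2 authorization code grant. The exchange of the code for an access token is left out.

```python
from urllib.parse import parse_qs, urlencode, urlparse

AUTHORIZE_URL = "https://www.example.com/oauth/authorize"  # hypothetical
REDIRECT_URI = "https://www.example.com/oauth/done"        # hypothetical


def authorization_url(client_id, state):
    """URL the embedded browser opens so the user can log in."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": REDIRECT_URI,
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}"


def code_from_redirect(loaded_url, expected_state):
    """Called on each "page loaded" event; returns the code once it appears."""
    if not loaded_url.startswith(REDIRECT_URI):
        return None  # still on the login pages
    params = parse_qs(urlparse(loaded_url).query)
    if params.get("state") != [expected_state]:
        raise ValueError("state mismatch")
    return params.get("code", [None])[0]


print(authorization_url("my-desktop-app", "xyz"))
print(code_from_redirect(REDIRECT_URI + "?code=abc123&state=xyz", "xyz"))
```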

/Jørn



Mateusz Loskot

Jul 5, 2016, 5:49:42 PM
to api-...@googlegroups.com
On 5 July 2016 at 22:20, Mark Knapik <mark....@gmail.com> wrote:
> Thanks for the version links. It was good to see that my selected option was
> one of the two main contenders.

BTW, API version does not have to 'pollute' base URL.
It can be specified in media type, e.g. application/json;application&v=1
Watch this fragment of Les Hazlewood's Beautiful REST & JSON APIs
for more details: https://youtu.be/mZ8_QgJ5mbs?t=26m13s
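As a rough sketch, a server could pull the version out of such a media type like this (Python, for illustration). The function name and the default are hypothetical, and treating the part after the semicolon as a query string is only an assumption about how the application&v=1 syntax is meant to be read.

```python
from urllib.parse import parse_qs


def version_from_media_type(media_type, default=1):
    """Extract the v parameter from e.g. application/json;application&v=1."""
    _base, _, params = media_type.partition(";")
    # parse_qs drops the bare "application" flag and keeps v=<number>.
    values = parse_qs(params.strip()).get("v")
    return int(values[0]) if values else default


print(version_from_media_type("application/json;application&v=2"))
print(version_from_media_type("application/json"))  # falls back to the default
```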

I'd also recommend reading what Mike Stowe has written about versioning
in https://www.mulesoft.com/lp/ebook/api/restbook,
or watching his talks on YouTube.

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net