So, my first API is a relatively simple one. There are only 4 major tasks it handles, but we expect to add more as time goes on.
- Authenticate a user.
- Get a list of the user's purchased products.
- Get info about a specific product.
- Prepare a product for download.
I think I've been able to tackle the first 3 pretty easily, but the 4th one is still giving me headaches. I'll get into that a little later though.
To start off, I just googled "REST API design", and started reading. I was a little disappointed that there weren't really any strong standards for REST, but I did find some good guidelines.
I'll start going down the list about what design choices I've made and why.
The Back End
This decision was partially made for me. We're using PHP/MySQL with Silex to generate routes and handle the back end of things. There is some existing code from an earlier attempt at an API that I'm building off of, just so I don't have to reinvent the wheel. The old stuff has just been hacked together, and we need it to continue to run until we are able to replace it. Replacing the old api will be the first place where our new API expands after we finish the initial roll-out.
I could have abandoned Silex entirely but given that the other developers on the team are already familiar with it and that I am comfortable with it, we decided to hang onto it. PHP/MySQL was going to be a requirement, no matter what.
Versioning
My boss was strongly opposed to versioning. His thinking was that any changes we make should just be backwards compatible. And I completely agree with him! Unfortunately, I've got just enough experience to know that it isn't likely to work out that way. Someone, somewhere is going to want a breaking change. I fought hard to include versioning and eventually won. I probably won more through stubbornness than through the strength of my argument, but I did manage to point out that adding versioning to the URL was going to take us 30 minutes of actual work, and if we never use a new version, we have lost very little. If we do need it, it's going to pay off in spades. It also makes it very easy to distinguish between our old API and the new one.
My hope and dream is that I wasted my time. We will only ever change to v2 if a breaking change is introduced. Non-breaking changes will be hidden behind minor version numbers, which won't be displayed in the URL.
For our purposes, a change is not breaking if it doesn't impact what a current developer is doing. Examples of non-breaking changes would be adding a new command, adding a new option to an existing command as long as the default matches existing behavior, or adding new optional fields to the result set returned by a command. Examples of breaking changes would be removing an existing command, changing the required options of an existing command, changing the data type of a field, or removing a field from an existing result set.
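As a rough illustration of the field-level rules above (in Python, not the PHP/Silex back end described in this post), a result set can be modeled as a map of field names to type names and compared across releases. The function name and the string type labels are assumptions made up for this sketch; it only covers removed fields and changed types, not commands or options.

```python
def breaking_changes(old_fields, new_fields):
    """Compare {field_name: type_name} maps and list the breaking differences.

    Added fields are treated as non-breaking, so they are never reported.
    """
    problems = []
    for name, old_type in old_fields.items():
        if name not in new_fields:
            # Removing a field from an existing result set is breaking.
            problems.append(f"removed field: {name}")
        elif new_fields[name] != old_type:
            # Changing the data type of a field is breaking.
            problems.append(
                f"changed type of {name}: {old_type} -> {new_fields[name]}"
            )
    return problems


v1_customer = {"customers_id": "int", "name": "string"}
with_email = {"customers_id": "int", "name": "string", "email": "string"}
print(breaking_changes(v1_customer, with_email))  # an empty list: additive only
```

An empty list means the change could ship under the same major version; anything else would be a candidate for v2.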
I settled on putting the major version number in the URI. Something like /api/v1/customers/(customer_id) might be a valid route. I picked this because of its ease of use and understanding. Any developer building a client for the API will quickly understand what the v1 is for.
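A minimal sketch of how the major version could be peeled off the front of a path before routing. This is Python for illustration only; the helper name and the set of supported versions are assumptions, and only the /api/v1 prefix comes from the post.

```python
import re

# Matches /api/v<major> optionally followed by a resource path.
VERSION_PREFIX = re.compile(r"^/api/v(\d+)(/.*)?$")

# Assumption for this sketch: only v1 exists so far.
SUPPORTED_MAJOR_VERSIONS = {1}


def split_version(path):
    """Return (major_version, resource_path), or None for unsupported paths."""
    match = VERSION_PREFIX.match(path)
    if match is None:
        return None
    major = int(match.group(1))
    if major not in SUPPORTED_MAJOR_VERSIONS:
        return None
    # A bare /api/v1 is treated as the root of that version.
    return major, match.group(2) or "/"


print(split_version("/api/v1/customers/42"))
print(split_version("/api/v2/customers/42"))
```

Minor versions never appear in the path, so nothing here needs to change for a non-breaking release.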
Authentication/Security
The decision about how to properly handle authentication was a challenge for us at first. OAuth2 ended up winning, since it seems to be the de facto default for RESTful APIs. However, OAuth2 seems to be primarily geared toward one web server talking to another web server. We had a problem with this because the first client we expect to consume our API is a desktop application. I'm not aware of a way for the app to receive a redirect after an Application Key has been generated.
We worked around this by allowing a user on our website to generate the key under their account page, and just having them copy/paste the key into the application. With the App Key, the application can request an access token. It includes the application key or access token in the Authorization header, which is protected in transit by SSL/TLS. At any time, a user can disable an Application Key and all access tokens associated with it from their account page on the main website.
Tying an access token to a specific user id is how we achieve Major Task #1 from above.
No access tokens or application keys should ever be included in the URI itself. TLS does encrypt the path and query string in transit, but full URIs are commonly written to server logs.
In the future we want a set of permissions tied to a specific application key, but currently any valid key will allow access to any API command.
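The key-to-token flow described in this section can be sketched roughly as follows. This is Python with in-memory dictionaries standing in for database tables; every name here is hypothetical, and the `Bearer` scheme in the standard HTTP `Authorization` header is an assumption, since the post does not say which scheme is used.

```python
import secrets

# Hypothetical stand-ins for database tables.
APP_KEYS = {"app-key-123": {"user_id": 7, "enabled": True}}
ACCESS_TOKENS = {}  # access token -> application key it was issued for


def issue_access_token(app_key):
    """Exchange an enabled application key for a fresh access token."""
    record = APP_KEYS.get(app_key)
    if record is None or not record["enabled"]:
        return None
    token = secrets.token_urlsafe(32)
    ACCESS_TOKENS[token] = app_key
    return token


def user_id_for_request(headers):
    """Resolve the user from an 'Authorization: Bearer <token>' header."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer":
        return None
    app_key = ACCESS_TOKENS.get(token)
    # A token stops working as soon as its application key is disabled.
    if app_key is None or not APP_KEYS[app_key]["enabled"]:
        return None
    return APP_KEYS[app_key]["user_id"]


def disable_app_key(app_key):
    """Disable a key from the account page; its tokens stop resolving."""
    APP_KEYS[app_key]["enabled"] = False
```

Because tokens are checked against their parent key on every request, disabling a key needs no separate pass over the token table.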
URI Design
The base path for our api is /api/(version). It gets prepended to any resource URIs.
I've tried to stick to one consistent pattern over the entire design process. All of our URIs at this point follow the design of /(plural noun)/(identifier)/(related plural noun) as the most complex format. As far as I can tell, there is never any reason to go deeper than that. Resources that return a single object look like /(plural noun)/(identifier). A resource that returns a list of objects looks like /(plural noun).
As an example, we have /customers to return a list of customers, /customers/(customer_id) to return details about a specific customer, and /customers/(customer_id)/products to return a list of products purchased by a specific customer.
/customers/(customer_id)/products and /products/(products_id) are how we achieve Major Tasks #2 and #3 from above.
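The URI shapes above map naturally onto a small route table. The following Python sketch is only an illustration of the pattern, not the Silex route definitions; the handler names are hypothetical, and numeric identifiers are an assumption.

```python
import re

# One entry per URI shape; paths are relative to the /api/(version) base.
ROUTES = [
    (re.compile(r"^/customers$"), "list_customers"),
    (re.compile(r"^/customers/(\d+)$"), "get_customer"),
    (re.compile(r"^/customers/(\d+)/products$"), "list_customer_products"),
    (re.compile(r"^/products/(\d+)$"), "get_product"),
]


def resolve(path):
    """Return (handler_name, [ids]) for a known path, or None otherwise."""
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler, [int(value) for value in match.groups()]
    return None


print(resolve("/customers/42/products"))
```

Anything deeper than /(plural noun)/(identifier)/(related plural noun) simply fails to match, which keeps the depth limit explicit.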
Results
We only return JSON formatted results at this time. XML was a consideration, but we figure if we need it, we can always add it in later.
Our results are wrapped in an envelope. I didn't really want to use an envelope but it came with our existing API and it didn't seem to be a significant drawback so I didn't fight it too hard.
All of our resources that return a list of results (e.g. /customers) include a required field that is an array of objects. The array can be empty, if no objects match any filtering parameters used, but the array must be there.
Every object in a list, and every individual object being returned, must include an identifying field (e.g. customers_id for a customer object). Almost any other field is optional.
A call to any resource can include the fields parameter in the query to indicate which optional fields should be included for an object. Most of our commands have a default list of optional fields, which is used when this parameter is omitted.
All list-type results also support the optional parameters per_page and page_number, to allow for pagination. By default, per_page is 20 and page_number is 1.
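Putting the list rules together, a response builder might look roughly like this. It is a Python sketch under stated assumptions: the envelope key `results` and both function names are invented for illustration, and `fields` is taken as an already-parsed list rather than a raw query string. Only `fields`, `per_page`, `page_number`, and their defaults come from the post.

```python
def select_fields(obj, id_field, requested, default_fields):
    """Keep the identifying field plus the requested (or default) fields."""
    wanted = requested if requested is not None else default_fields
    result = {id_field: obj[id_field]}  # the identifier is always present
    for name in wanted:
        if name in obj:
            result[name] = obj[name]
    return result


def list_response(objects, id_field, default_fields,
                  fields=None, per_page=20, page_number=1):
    """Build an envelope whose array is always present, even when empty."""
    start = (page_number - 1) * per_page
    page = objects[start:start + per_page]
    return {
        "results": [
            select_fields(o, id_field, fields, default_fields) for o in page
        ],
        "per_page": per_page,
        "page_number": page_number,
    }


customers = [
    {"customers_id": i, "name": f"C{i}", "email": f"c{i}@example.com"}
    for i in range(1, 46)
]
first_page = list_response(customers, "customers_id", ["name"])
print(len(first_page["results"]))
```

Requesting a page past the end yields an empty array rather than an error, which matches the rule that the array must always be there.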
We are using snake_case instead of camelCase, even though camelCase is the more common convention in JSON APIs (JSON itself doesn't mandate either). The primary reason for this choice is our existing code base, where we use snake_case most often. Also, there is a study that says it is 20% easier to read.
HATEOAS
Don't hate me for this (hehe), but we aren't doing it. As a concept, HATEOAS doesn't make sense to me. It seems like a nice thing that could be useful some day, but it is just a lot of effort for absolutely no gain right now. Maybe in v2 or 3, after I've become more comfortable with it.
Caching
We want to use caching primarily with regards to preparing a product for download. This is our most expensive process, so we want to avoid it unless necessary. When the user makes this request, we ask for an ETag, which is just an MD5 hash of the downloaded file. If they don't have one, then they obviously need the file. If they have one, and it matches the current version of the file, we can respond by telling them they are up to date and don't need a fresh download. If their tag is out of date, then we initiate the process to prepare the file for download. This is involved in Major Task #4 from above.
We aren't currently planning on adding caching to other commands, though there is probably a good case to use it with the /customers/(id)/products command or other commands that return a large list. One thing that doesn't quite click yet for me is that I already need to query our datastore for the product list, and then generate the hash. That seems like a much more CPU intensive process, so if we're already generating the response in order to get the hash, why not just send the response? It might be a little more efficient, but I don't know if it is worthwhile at this point.
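The three cases above reduce to a small decision function. This Python sketch uses the standard `hashlib.md5`; the function names and the "prepare" / "up_to_date" labels are made up for illustration. In plain HTTP terms the same idea is usually expressed with an `If-None-Match` request header and a `304 Not Modified` response, though the post does not say whether that is how it is wired up here.

```python
import hashlib


def file_etag(data: bytes) -> str:
    """ETag as described in the post: the MD5 hash of the file contents."""
    return hashlib.md5(data).hexdigest()


def download_decision(current_file: bytes, client_etag=None):
    """Return (action, current_etag) for a download request.

    action is "up_to_date" only when the client's tag matches the current
    file; a missing or stale tag both lead to "prepare".
    """
    current = file_etag(current_file)
    if client_etag is not None and client_etag == current:
        return "up_to_date", current
    return "prepare", current


source = b"product file contents, version 2"
action, tag = download_decision(source, client_etag=None)
print(action)
```

Hashing the whole file on every request would itself be costly for large products, so in practice the hash would likely be computed once and stored alongside the file.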
Outstanding Issues
I'm not 100% certain that I'm using OAuth2 properly with a stand-alone application. Maybe there is a different standard that should be applied? Maybe there is some way for an application to handle a redirect URL that I'm not aware of?
gzip isn't currently enabled, but we're planning on adding it. Just haven't had the bandwidth to apply it yet. I'm hoping it is a relatively simple process.
I'm not happy with the naming of the URIs for Task #4, but I'm not sure exactly what a better option would be.
First, I've got POST /files/(id)/downloadables
This returns the ID for the downloadables object that is created.
Second, I've got GET /downloadables/(id)
This returns the progress of the file preparation process, and the URL where the completed file can be downloaded from, once the progress reaches 100%.
The process copies the source file to a location where the user can download it. Often, the copy has a watermark applied to it during this process.
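To make the two-step flow easier to discuss, here is a tiny in-memory model of it in Python. Everything beyond the two routes is an assumption for illustration: the function names, the field names in the response, the placeholder URL, and the `advance` helper that stands in for the background copy and watermark worker.

```python
import itertools

_next_id = itertools.count(1)
DOWNLOADABLES = {}  # downloadable id -> job state


def create_downloadable(file_id):
    """Model of POST /files/(id)/downloadables: start a job, return its ID."""
    downloadable_id = next(_next_id)
    DOWNLOADABLES[downloadable_id] = {
        "file_id": file_id,
        "progress": 0,
        "url": None,
    }
    return downloadable_id


def advance(downloadable_id, percent):
    """Stand-in for the worker reporting progress on the copy."""
    job = DOWNLOADABLES[downloadable_id]
    job["progress"] = min(100, job["progress"] + percent)
    if job["progress"] == 100:
        # Placeholder URL; the real location is not given in the post.
        job["url"] = f"https://example.com/downloads/{downloadable_id}"


def get_downloadable(downloadable_id):
    """Model of GET /downloadables/(id): progress, plus URL once complete."""
    job = DOWNLOADABLES[downloadable_id]
    return {
        "downloadables_id": downloadable_id,
        "progress": job["progress"],
        "url": job["url"],
    }
```

A client would call the POST once, then poll the GET until progress reaches 100 and the URL is populated.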
Probably the biggest problem I have with this is the use of the word 'downloadables' because it makes me think of some sort of interface instead of an object type.
This is where I'd most appreciate advice and suggestions for a better way to handle this.