File "uploads" as a resource property - REST best practice


Miko Väli

Dec 19, 2012, 7:51:37 AM
to api-...@googlegroups.com
Hi all,

I'm in the final stages of my API development and one thing I can't really wrap my head around is handling file uploads.

My use-case is the following:
I have a resource which may have n properties that are "files" (picture for a product, PDF contract for a customer, etc).

GET response to the resource is something like this:

.../myapi/products/:ID
{"name": "Name ", "otherParameter": "value", "picture": {"webPath": "http://...", "thumbNailPath": "http://...", "title": "sometitle", ...}}

Basically, the picture property is a value object containing information about the file.

My question is how to handle POST/PUT/PATCH requests to this resource.

Is it against some API design principles that the sent value-object is completely different from the returned one, i.e. {..., "picture": {"content": "complete file content", "name": "pic.jpeg", "mimetype": "image/jpeg"}}?

Or is it better practice to use a separate property for uploading a file, e.g. {..., "pictureUpload": {...}}? That doesn't make much sense to me, because I want to set a picture property, not some kind of virtual property called pictureUpload.

I know how Dropbox handles this, but they have a separate endpoint for uploading a file (the file itself is the resource), and that seems overkill for my use case.

How do you guys handle this kind of thing? Are there any examples of such situations in other APIs?

Any input from you guys is appreciated.

Miko

Mike Kelly

Dec 19, 2012, 7:53:22 AM
to api-...@googlegroups.com
Why is that overkill? Creating a new resource and the code to handle
an upload should not be a big deal. What development stack are you
working with, out of interest?

Cheers,
M

Miko Väli

Dec 19, 2012, 8:08:50 AM
to api-...@googlegroups.com
It is not a big deal, but from my perspective, these files are a requirement of the resource in question (my validation process expects these files to be present before persisting the resource).

On the server side I'm using PHP (Symfony2 + FOSRestBundle).

Miko



Jørn Wildt

Dec 19, 2012, 8:13:33 AM
to api-...@googlegroups.com
I have the exact same kind of implementation (a Case File with associated documents). Here I model each document as a resource with a title field and a link to the binary content:

{
  Title: "My doc",
  ContentLink: "http://url-to-binary-content-of-file"
}

The title can be updated through the document resource (POST with JSON or form-urlencoded data). The binary content can be PUT separately to the separate document content resource.

Seems very natural to me.

/Jørn

Miko Väli

Dec 19, 2012, 8:41:00 AM
to api-...@googlegroups.com
Thanks for the quick responses, guys!

Jørn, as I understand it, you first POST a new resource (let's call it Product, for example) with all the required parameters (name, description, picture's title) and, after this is persisted, you make a separate PUT request to the (returned?) picture resource URI and actually set the contents of the picture?

This is all fine and looks natural, but in my case I COULD end up with lots of invalid Products (the picture is mandatory for a product).

Though I have to say, this is a very cool way to update the picture, IF the Product with the picture is already there.

Peter Monks

Dec 20, 2012, 2:13:34 PM
to api-...@googlegroups.com
G'day Miko,

We have much the same requirement, given that our APIs are heavily "blob" oriented (we're a document management vendor, so binaries are a large part of what we deal with).

The scheme we've come up with is similar to what Jørn described, but with some additional elements.  For example a user profile object with a binary "avatarImage" property might look like this:

{
    "id": "pmo...@alfresco.com",
    "firstName": "Peter",
    "lastName": "Monks",
    ... other profile elements here ...
    "avatarImage": {
        "set": true,
        "url": ".../path/to/resource",
        "sizeInBytes": 14573337778,
        "mimeType": "image/png"
    }
}

The "set" property indicates whether the binary currently has a value or not - this defines whether the URL for the binary property can be retrieved (HTTP GET).  You can, of course, GET the binary URL even if it's not set, but doing so will result in a 404.  This same URL is also used when a client wishes to set or update the value of the binary property (using HTTP PUT) or unset the value of the binary property (using HTTP DELETE).

We include the size and MIME type of the binary in the parent object, so that clients can see that information without having to download the binary first - these elements are calculated by the server and are R/O (as is the entire "avatarImage" sub-object).  We've discussed adding some other binary value metadata but I'm not convinced anything beyond these two makes sense (filename, in particular, is something I think would be a mistake to include - this is a binary property, not an independent resource / entity).
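
A minimal sketch of the binary-property scheme described above, in Python. The class name, the method names and the 201/204 status codes are assumptions made for illustration, not the actual implementation; the MIME type is simply taken from the caller here rather than detected by the server.

```python
import json


class BinaryProperty:
    """In-memory stand-in for one binary property of a parent entity (hypothetical)."""

    def __init__(self, url):
        self.url = url
        self._data = None
        self._mime_type = None

    def get(self):
        # GET on an unset binary yields 404, as described above.
        if self._data is None:
            return 404, None
        return 200, self._data

    def put(self, data, mime_type):
        # PUT sets or replaces the value. Returning 201 on first set and
        # 204 on update is an assumption, not something the post specifies.
        created = self._data is None
        self._data = bytes(data)
        self._mime_type = mime_type
        return 201 if created else 204

    def delete(self):
        # DELETE unsets the value; deleting an unset value is treated as 404.
        if self._data is None:
            return 404
        self._data = None
        self._mime_type = None
        return 204

    def describe(self):
        # Read-only sub-object embedded in the parent entity. The size is
        # derived from the stored bytes, never accepted from the client.
        info = {"set": self._data is not None, "url": self.url}
        if self._data is not None:
            info["sizeInBytes"] = len(self._data)
            info["mimeType"] = self._mime_type
        return info


avatar = BinaryProperty(".../path/to/resource")
print(json.dumps(avatar.describe()))
avatar.put(b"\x89PNG placeholder bytes", "image/png")
print(json.dumps(avatar.describe()))
```

The point of the sketch is that the parent entity only ever embeds the output of `describe()`, while all reads and writes of the bytes go through the property's own URL.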

We've had requests to support transcluded binaries, mostly to avoid the API equivalent of the "N+1 SELECTs" problem (we support multi-valued binary properties).  I'm not convinced this is a good idea (message sizes go way up, it doesn't play nice with caching, etc.), but we've spec'ed it out anyway, to make sure we can add it in later on if the need becomes compelling.  That will look something like this, if/when we implement it:

{
    "id": "pmo...@alfresco.com",
    "firstName": "Peter",
    "lastName": "Monks",
    ... other profile elements here ...
    "avatarImage": {
        "set": true,
        "sizeInBytes": 438,
        "mimeType": "image/jpeg",
        "data": 
"/9j/4AAQSkZJRgABAQEAlgCWAAD/4QCARXhpZgAATU0AKgAAAAgABQESAAMAAAABAAEAAAEaAAUAAAABAAAASgEbAAUAAAABAAAAUgEoAAMAAAABAAIAAIdpAAQAAAABAAAAWgAAAAAAAACWAAAAAQAAAJYAAAABAAKgAgAEAAAAAQAAAECgAwAEAAAAAQAAAEAAAAAA/9sAQwAgFhgcGBQgHBocJCIgJjBQNDAsLDBiRko6UHRmenhyZnBugJC4nICIropucKDaoq6+xM7Qznya4vLgyPC4ys7G/9sAQwEiJCQwKjBeNDRexoRwhMbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbG/8AAEQgAQABAAwEiAAIRAQMRAf/EAB8AAAEFAQEBAQEBAAAAAAAAAAABAgMEBQYHCAkKC//EALUQAAIBAwMCBAMFBQQEAAABfQECAwAEEQUSITFBBhNRYQcicRQygZGhCCNCscEVUtHwJDNicoIJChYXGBkaJSYnKCkqNDU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6g4SFhoeIiYqSk5SVlpeYmZqio6Slpqeoqaqys7S1tre4ubrCw8TFxsfIycrS09TV1tfY2drh4uPk5ebn6Onq8fLz9PX29/j5+v/EAB8BAAMBAQEBAQEBAQEAAAAAAAABAgMEBQYHCAkKC//EALURAAIBAgQEAwQHBQQEAAECdwABAgMRBAUhMQYSQVEHYXETIjKBCBRCkaGxwQkjM1LwFWJy0QoWJDThJfEXGBkaJicoKSo1Njc4OTpDREVGR0hJSlNUVVZXWFlaY2RlZmdoaWpzdHV2d3h5eoKDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uLj5OXm5+jp6vLz9PX29/j5+v/aAAwDAQACEQMRAD8A6CiioL2YwWkki9VHFAFO/wBVEDmOFQ7jqT0FZo1K8LFvNPPbAwKqFjkseSaBIFII/WgDRi1a5VhvKsPQiti1uo7pMocEdVPUVzIIPOKsWdx5NwjjhQcH6UAdLRRRQAVS1cE6dLj2/nV2qmoMGgaActIMfQetAIybKxSWLe7HntVp9HgeP5WIamWjC1iQS559BmrkV5BJwjHPoRiouXYqJo4UfNJ+GKr6haiBV2d+tacl/BGdrEk+wqtc4vEG07QG6mi4WNSEkwxk9So/lT6jgbdCvGCOMVJVkCMdq5rCnnEepkSNw3ykntWtd3CW8e58/hXMXLtc3DyEdTQNaG8ixvlSFK9vSkW3gSTdGq7s84qhYPugVGJG04zVhkkDcg8nhlNZmi1LUkMBfLqu7PGRSv5aDkqqDqe1Vwrk42sxPJZzVbUpAsJiU5LnFAPQs2OoK0rgn5S3H0rVVgwBHINcjbtwRnkV0mnSb7VQzKzAc4NWZE8yeZEy4B471zNw+JD8u3Hart5rcoyIECf7TcmsWW5kmcs7bmPUmmBqadh4pl/2siriXMQHlXHykdDWHaXRt5Nw5B6j1rTeWGePJwwNS0WmXRPAoxE29j6HNZerMA0YH3hkn8aGu4rVdsKgsfyFZ8kzSEljkk5J9aEgbJohnDCuh0cL9nZl6k4NcxG5HArd0nU4UjEEoERHRux+vpVEH//Z"
}

The data is BASE64 encoded.  It would be vastly preferable if JSON had a sensible set of data type literals, including binary literals (see my earlier post on this topic [1]), but I digress.
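
For illustration, a small Python sketch of producing and consuming such an inline binary property. The helper names `transclude` and `extract` are hypothetical, and treating "sizeInBytes" as the size of the decoded bytes (rather than of the BASE64 text) is an assumption.

```python
import base64


def transclude(data, mime_type):
    """Build the inline form of a binary property from raw bytes."""
    return {
        "set": True,
        # Assumed to be the size of the raw bytes, not of the encoded text.
        "sizeInBytes": len(data),
        "mimeType": mime_type,
        "data": base64.b64encode(data).decode("ascii"),
    }


def extract(prop):
    """Recover the raw bytes from an inline binary property."""
    return base64.b64decode(prop["data"])


raw = bytes(range(256))
inline = transclude(raw, "application/octet-stream")
assert extract(inline) == raw

# BASE64 emits 4 characters for every 3 input bytes, so the message
# grows by roughly a third compared with the raw binary.
print(len(raw), len(inline["data"]))  # prints: 256 344
```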

We also support multi-valued binary properties, which we represent as follows:

{
    "id": "pmo...@alfresco.com",
    "firstName": "Peter",
    "lastName": "Monks",
    ... other profile elements here ...
    "avatarImages": {
        "url": ".../path/to/collection",
        "list": {
            "pagination" : {
                "count" : 2,
                "hasMoreItems" : false,
                "totalItems" : 2,
                "skipCount" : 0,
                "maxItems" : 100
            },
            "entries" : [
                {
                    "entry" : {
                        "url": ".../path/to/collection/identifier1",
                        "sizeInBytes": 14573337778,
                        "mimeType": "image/png"
                    }
                },
                {
                    "entry" : {
                        "url": ".../path/to/collection/identifier2",
                        "sizeInBytes": 7639289,
                        "mimeType": "image/jpeg"
                    }
                }
            ]
        }
    }
}


In this case the individual binary value URLs can be manipulated in much the same way as a singleton binary property (GET, PUT, DELETE to read/update/remove individual binary values).  In addition, the collection URL provides the means to add a new binary value to the list (via HTTP POST) and, potentially (this varies on an entity-type by entity-type basis) the ability to truncate (remove) all of the binary values (via HTTP DELETE).
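
A rough Python sketch of how a server might assemble the paginated list for a multi-valued binary property. The function name `paginate` is hypothetical, and both the meaning of "hasMoreItems" and the wrapping of each "entry" in its own object (so that the array is valid JSON) are assumptions.

```python
def paginate(url, values, skip_count=0, max_items=100):
    """Build the collection sub-object for a multi-valued binary property."""
    page = values[skip_count:skip_count + max_items]
    return {
        "url": url,
        "list": {
            "pagination": {
                "count": len(page),
                # Assumed meaning: items exist beyond the end of this page.
                "hasMoreItems": skip_count + len(page) < len(values),
                "totalItems": len(values),
                "skipCount": skip_count,
                "maxItems": max_items,
            },
            "entries": [{"entry": value} for value in page],
        },
    }


images = [
    {"url": ".../path/to/collection/identifier1", "sizeInBytes": 14573337778, "mimeType": "image/png"},
    {"url": ".../path/to/collection/identifier2", "sizeInBytes": 7639289, "mimeType": "image/jpeg"},
]
result = paginate(".../path/to/collection", images)
print(result["list"]["pagination"])
```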

Regarding atomicity of write operations against entities that happen to have binary properties, I completely share your concerns.  Our application is transactional across both "structured" and "unstructured" content (i.e. metadata & binaries), and I'd certainly like to expose that capability to REST API client apps.

Although I have no immediate plans to implement such a solution, my current thinking is that this is actually a broader requirement that allows client apps to submit arbitrary "units of work" to the server, to be processed in as atomic a fashion as practical (obviously there would be caveats / governors around such a capability, since it raises lots of potential security / QoS / DoS issues).

These units of work may be a write to a single entity including binary properties (your use case), or writes to multiple otherwise independent entities (each of which may themselves have binary properties).  In short, we need a way for a client to express a series of API calls (including those that manipulate binary properties) but submit those calls just once to the server.

I haven't thought this through very far yet, but if we're to stick to HTTP it would appear our only real option here would be something based on multipart/form-data POSTs.  How we tie together dependent API calls (e.g. "create a new object, get the binary property value URL, post a binary value to that URL") is a big unknown for me at this point - I can see this starting to head towards requiring some kind of mini scripting language, which itself is a huge can of worms.

Another alternative we've discussed is to have "transaction" as a first class entity type, whereby a client would create a "transaction", POST a bunch of stuff to it (the individual API calls) and then "complete" the transaction (at which point the server goes off and does whatever "real" work needs to be done).  Having implemented this kind of solution in the dim distant past I'm reluctant to go down this road though, since (a) it doesn't solve the N+1 latency problem (it still requires multiple HTTP calls) and (b) it requires a heuristic garbage collector (to handle the case where a client goes away before a transaction is completed).
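
To make the "transaction as a first class entity" idea concrete, here is a bare-bones Python sketch with a time-based garbage collector. The `TransactionStore` class, its method names and the idle timeout are all hypothetical, and the sketch only queues and replays operations; it does not provide real atomicity or rollback.

```python
import time
import uuid


class TransactionStore:
    """Queues operations per transaction and discards abandoned transactions."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._open = {}  # transaction id -> (last activity time, queued operations)

    def create(self):
        # Corresponds to the client creating a "transaction" resource.
        tx_id = str(uuid.uuid4())
        self._open[tx_id] = (self._clock(), [])
        return tx_id

    def add(self, tx_id, operation):
        # Corresponds to the client POSTing one API call to the transaction.
        _, operations = self._open[tx_id]
        operations.append(operation)
        self._open[tx_id] = (self._clock(), operations)

    def complete(self, tx_id, apply):
        # The "real" work happens only here, in submission order.
        _, operations = self._open.pop(tx_id)
        return [apply(operation) for operation in operations]

    def collect_garbage(self):
        # Heuristic: a transaction idle for longer than the TTL is assumed
        # to belong to a client that has gone away.
        now = self._clock()
        stale = [tx for tx, (seen, _) in self._open.items() if now - seen > self._ttl]
        for tx in stale:
            del self._open[tx]
        return len(stale)
```

Each `create`, `add` and `complete` would still be its own HTTP call, which is why this design does not help with the N+1 latency problem.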

To be blunt, HTTP seems pretty sucky when you want to support multiple resources in a single request and/or response (for any reason - JSON+binary or the more general "arbitrary unit of work" case).

mca

Dec 20, 2012, 2:20:29 PM
to api-...@googlegroups.com
<snip>
To be blunt, HTTP seems pretty sucky when you want to support multiple resources in a single request and/or response (for any reason - JSON+binary or the more general "arbitrary unit of work" case).
</snip>

there are lots of multi-part message[1] variants to choose from as well as little-used "application/http"[2] which supports a very rich model for sending multiple *requests* in a single message.

Cheers.

Peter Monks

Dec 20, 2012, 2:34:53 PM
to api-...@googlegroups.com
As indicated in the body of the message, if we're to stick to HTTP (i.e. not invent our own multipart scheme), multipart looks like our best option.  However I was not aware of application/http - thanks for the pointer.

Cheers,
Peter

 

mca

Dec 20, 2012, 2:39:00 PM
to api-...@googlegroups.com
np.

BTW - I'll be in SF first week of Feb. Let's be sure to meet up again.

Jørn Wildt

Dec 19, 2012, 8:52:07 AM
to api-...@googlegroups.com
The initial POST is encoded as multipart/form-data which is exactly the same as any web-form with file uploads would do it - and thus it can contain both binary data plus various meta data like title, author and so on.
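
As a sketch of what such a request body looks like on the wire, the following Python function assembles a multipart/form-data payload by hand. The function name `build_multipart` and its parameters are hypothetical; it assumes field names and the filename are plain ASCII without quotes, and a real client would normally let its HTTP library do this.

```python
import uuid


def build_multipart(fields, file_field, filename, content,
                    content_type="application/octet-stream"):
    """Return (Content-Type header value, body bytes) for a form upload."""
    boundary = str(uuid.uuid4())
    lines = []
    # One part per metadata field (title, author and so on).
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    # The file part carries its own Content-Type and the raw bytes.
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        content,
        f"--{boundary}--".encode(),  # closing boundary
        b"",
    ]
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)


header, body = build_multipart({"Title": "My doc"}, "File", "dummy.dat", b"\x00\x01\x02")
print(header)
print(len(body))
```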

In addition to this I have Post-Once-Exactly semantics, which means the initial POST can be repeated safely in reaction to network errors (you can google that).

The whole shebang goes like this:

1) Empty POST to get POE-URL
POST http://jw-pc261/f2-restservices-4.2-rest/dossiers/198597/document-create HTTP/1.1
User-Agent: Ramone/1.0
Content-Type: application/x-www-form-urlencoded
Authorization: ...
Host: jw-pc261
Content-Length: 0

HTTP/1.1 201 Created
Cache-Control: private
Location: http://jw-pc261/f2-restservices-4.2-rest/dossiers/198597/document-create/7de7840c-13f6-4c0c-9720-89aa003e9c99
Server: Microsoft-IIS/7.5
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Wed, 19 Dec 2012 13:45:30 GMT
Content-Length: 0

2) POST actual payload (in this example without a separate field for "Title")
POST http://jw-pc261/f2-restservices-4.2-rest/dossiers/198597/document-create/7de7840c-13f6-4c0c-9720-89aa003e9c99 HTTP/1.1
User-Agent: Ramone/1.0
Accept: application/vnd.cbrain.casefile+xml
Content-Type: multipart/form-data; boundary=c00818a9-6f08-48a7-833e-2f63a0a69eea
Authorization: ...
Host: jw-pc261
Content-Length: 158
Expect: 100-continue

--c00818a9-6f08-48a7-833e-2f63a0a69eea
Content-Disposition: form-data; name="File"; filename="dummy.dat"
Content-Type: application/octet-stream

... BINARY DATA ...
--c00818a9-6f08-48a7-833e-2f63a0a69eea--


HTTP/1.1 201 Created
Cache-Control: private
Content-Length: 806
Content-Type: application/vnd.cbrain.casefile+xml
Location: http://jw-pc261/f2-restservices-4.2-rest/documents/198599
Server: Microsoft-IIS/7.5
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Wed, 19 Dec 2012 13:45:30 GMT

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="http://jw-pc261/f2-restservices-4.2-rest/xml2html.xsl"?>
<Document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://cbrain.com/casefile/schema/">
  <Id>198599</Id>
  <Title>dummy.dat</Title>
  <Link href="http://jw-pc261/f2-restservices-4.2-rest/documents/198599" rel="self" type="application/vnd.cbrain.casefile+xml" title="dummy.dat" />
  <Link href="http://jw-pc261/f2-restservices-4.2-rest/matters/198597" rel="up" type="application/vnd.cbrain.casefile+xml" title="Parent matter" />
  <Link href="http://jw-pc261/f2-restservices-4.2-rest/documents/versions/73047/content" rel="http://cbrain.com/casefile/rel/content" title="dummy.dat" />
</Document>
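
The two-step exchange above can be sketched server-side roughly as follows, in Python. The `PoeEndpoint` class and its method names are hypothetical, tokens stand in for the one-time URLs, and replaying the original result on a repeated POST is an assumption about the behaviour; other designs reject the repeat instead.

```python
import uuid


class PoeEndpoint:
    """Hands out one-time tokens and performs each creation at most once."""

    def __init__(self, create):
        self._create = create  # callable that performs the real creation
        self._results = {}     # token -> None (unused) or the created Location

    def issue(self):
        # Step 1: the empty POST returns a fresh one-time URL (here, a token).
        token = str(uuid.uuid4())
        self._results[token] = None
        return 201, token

    def submit(self, token, payload):
        # Step 2: the first POST creates the resource. A repeat, for example
        # a retry after a network error, returns the same Location without
        # creating anything again.
        if token not in self._results:
            return 404, None
        if self._results[token] is None:
            self._results[token] = self._create(payload)
        return 201, self._results[token]


created = []


def create_document(payload):
    created.append(payload)
    return f"/documents/{len(created)}"


endpoint = PoeEndpoint(create_document)
_, token = endpoint.issue()
print(endpoint.submit(token, b"... BINARY DATA ..."))
print(endpoint.submit(token, b"... BINARY DATA ..."))  # retry, no second document
print(len(created))
```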