Posting a File/Object and Its Associated Metadata to a RESTful Web Service

Antony Pulicken

Jan 7, 2014, 8:36:05 PM
to api-...@googlegroups.com
Hi,

We are developing REST-based web services for storing and retrieving large files/objects. There is a lot of additional metadata that we need to persist along with each file. We need to support both XML and JSON. The metadata will be dynamic, and we are planning to use a key/value-based approach. The service will be implemented in Java.

My question: how can we post the file and its metadata in a single POST request?

Regards,
Antony.

Kijana Woodard

Jan 7, 2014, 10:01:10 PM
to api-...@googlegroups.com

http://stackoverflow.com/a/4083908

--
You received this message because you are subscribed to the Google Groups "API Craft" group.
To unsubscribe from this group and stop receiving emails from it, send an email to api-craft+...@googlegroups.com.
Visit this group at http://groups.google.com/group/api-craft.
For more options, visit https://groups.google.com/groups/opt_out.

Peter Hamilton

Jan 7, 2014, 10:23:03 PM
to api-...@googlegroups.com

Following up on the Stack Overflow comment:

Two requests (one for the file, the other for the metadata) is what we've done with video files and their metadata. We create the metadata resource first and then push the video through an optional transcoding pipeline. We've swapped out our storage backend and transcoding pipeline a few times over the years, and it has been nice to contain those changes to just the one request.
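A minimal Java sketch of that two-request flow, using the JDK's java.net.http client (Java 11+). The base URL and the /videos and /videos/{id}/content paths are hypothetical, and the requests are only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Sketch of the "two requests" design. BASE and both paths are hypothetical.
class TwoStepUpload {
    static final String BASE = "https://api.example.com";

    // Step 1: create the metadata resource. The server would answer
    // 201 Created with a Location header identifying the new resource.
    static HttpRequest createMetadata(String metadataJson) {
        return HttpRequest.newBuilder(URI.create(BASE + "/videos"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(metadataJson, StandardCharsets.UTF_8))
                .build();
    }

    // Step 2: upload the raw file to a sub-resource of that metadata resource.
    // Only this request changes if the storage backend or pipeline is swapped.
    static HttpRequest uploadContent(String videoId, byte[] file, String mediaType) {
        return HttpRequest.newBuilder(URI.create(BASE + "/videos/" + videoId + "/content"))
                .header("Content-Type", mediaType)
                .PUT(HttpRequest.BodyPublishers.ofByteArray(file))
                .build();
    }
}
```

The id for the second request would come from the Location header of the first response.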

What advantages does a single request offer? I guess there's some degree of atomicity, but usually you can get those benefits from other designs as well.

Antony Pulicken

Jan 8, 2014, 12:07:28 AM
to api-...@googlegroups.com
Thanks Peter & Kijana.

Atomicity is one reason we want it in one request. We also thought it would be simpler for consumers, since they would only have to make one request. Of course, we could abstract that away in our client SDK.

I saw that Amazon and some other cloud services use custom HTTP headers to achieve this. Any thoughts on that?
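For comparison, a sketch of the header-based style: the file goes in the request body and each metadata key/value pair becomes a prefixed request header. Amazon S3 uses the x-amz-meta- prefix for user-defined metadata; the X-Meta- prefix and the target URL used here are hypothetical. This only suits small, flat, ASCII-friendly metadata, because servers and proxies limit header sizes.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;

// Sketch: file bytes in the body, metadata as prefixed headers.
// The "X-Meta-" prefix is a hypothetical convention.
class HeaderMetadataUpload {
    static final String PREFIX = "X-Meta-";

    static HttpRequest build(URI target, byte[] file, String mediaType, Map<String, String> metadata) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(target)
                .header("Content-Type", mediaType)
                .PUT(HttpRequest.BodyPublishers.ofByteArray(file));
        // One header per key; keys must be valid header-name tokens and
        // values should stay short plain text (nested data does not fit).
        metadata.forEach((key, value) -> builder.header(PREFIX + key, value));
        return builder.build();
    }
}
```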

I also saw a comment suggesting multipart/form-data for posting both the JSON and the file (or multiple files) in a single request. What do you think of that?

Regards,
Antony.

Peter Hamilton

Jan 8, 2014, 1:39:57 PM
to api-...@googlegroups.com
I believe things would get a little ugly with multipart/form-data. Either you break the form-data convention and include raw JSON as a part (probably requiring some custom parsing on the server side), or you URL-encode the JSON payload and include it under a form-data variable. The third option is to send the metadata directly as plain form-data fields.

It all comes down to whether you are solving problems or creating problems. The bandwidth cost of an extra request is negligible compared to transferring large files, so really you are optimizing for developer happiness and maintainability.

Jørn Wildt

Jan 10, 2014, 7:01:16 AM
to api-...@googlegroups.com
I have been toying a bit with this issue, and while I fully support the "separate resources" idea, I still think it would be interesting to find some kind of standard way of posting JSON + files in one request.

As I see it, we can either use multipart/form-data with both files and JSON, or use application/json with the files encoded in base64 and stored as strings. But I really don't think files-as-JSON-strings is a good idea for files of any realistic size (base64 alone inflates the data by about a third). So multipart it is...

Here is my approach to a multipart/json encoding:

1) Fact: multipart/form-data can contain an arbitrary number of parts (files or data), each of which can have its own content type.

2) Put the JSON-encoded metadata in one part with content type application/json. Give it an arbitrary name.

3) Add as many file parts as required to the main multipart/form-data payload, named however the client wants. Allow as many files as the client wants, or use a fixed number of files defined by the API; just make sure server and client agree.

4) Whenever the JSON data needs to refer to a file, it simply uses the name of that file's part in the multipart/form-data payload, perhaps prefixed with a hash sign (#) or similar.

Example: let us assume we want to post a job application with an attached CV (PDF), a short introduction (PDF), a photo of the applicant (JPEG) and an endorsement from someone (PDF):

Primary multipart/form-data payload:

  meta [application/json]: ... JSON meta data ...
  intro [application/pdf]: ... binary PDF data ...
  cv [application/pdf]: ... binary PDF data ...
  other-1 [image/jpeg]: ... binary JPEG data ...
  other-2 [application/pdf]: ... binary PDF data ...

Metadata JSON:

{
  "applicant":
  {
    "name": "Jason Harrison",
    "born": "1965-10-24",
    ... and more ...
  },
  "introduction":
  {
    "attachment": "#intro"
  },
  "cv":
  {
    "attachment": "#cv"
  },
  "additional-files":
  [
    {
      "title": "Photo of me",
      "attachment": "#other-1"
    },
    {
      "title": "Endorsement from Tina Mülowitch",
      "attachment": "#other-2"
    }
  ]
}
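A minimal Java sketch of how a client could assemble such a payload by hand, assuming Java 11+ and no third-party library. The class name is hypothetical; in practice a multipart-aware client library (for example Jersey's multipart module) would normally do this.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Sketch: one application/json part for the metadata plus one part per
// file, each part carrying its own Content-Type.
class MultipartBuilder {
    final String boundary = "----boundary-" + UUID.randomUUID();
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    // filename may be null for non-file parts such as the JSON metadata.
    MultipartBuilder addPart(String name, String filename, String contentType, byte[] content) {
        StringBuilder head = new StringBuilder("--").append(boundary).append("\r\n")
                .append("Content-Disposition: form-data; name=\"").append(name).append('"');
        if (filename != null) head.append("; filename=\"").append(filename).append('"');
        head.append("\r\nContent-Type: ").append(contentType).append("\r\n\r\n");
        out.writeBytes(head.toString().getBytes(StandardCharsets.UTF_8));
        out.writeBytes(content);
        out.writeBytes("\r\n".getBytes(StandardCharsets.US_ASCII));
        return this;
    }

    // Value for the request's Content-Type header.
    String contentType() {
        return "multipart/form-data; boundary=" + boundary;
    }

    // Returns the parts written so far followed by the closing delimiter.
    byte[] build() {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        body.writeBytes(out.toByteArray());
        body.writeBytes(("--" + boundary + "--\r\n").getBytes(StandardCharsets.US_ASCII));
        return body.toByteArray();
    }
}
```

For the job application, the parts would be meta (application/json), intro, cv, other-1 and other-2, with contentType() sent as the request's Content-Type header.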

The server can then look for the first (only) part of type "application/json" and use it as the metadata, then take the attachment names from the JSON and use them to look up the corresponding multipart/form-data parts.
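A simplified server-side sketch of that lookup, assuming the whole body fits in memory. It is not a complete multipart parser (no streaming, no header folding), and it finds "#name" references with a regular expression only to stay dependency-free; a real service would parse the JSON with a library such as Jackson and rely on its framework's multipart support.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified sketch: split on the boundary, pick the first application/json
// part, and check that every "#name" attachment points at an existing part.
class MultipartResolver {
    static final class Part {
        final String name;
        final String contentType;
        final byte[] content;

        Part(String name, String contentType, byte[] content) {
            this.name = name;
            this.contentType = contentType;
            this.content = content;
        }
    }

    private static final Pattern NAME = Pattern.compile("(?:^|[;\\s])name=\"([^\"]*)\"");
    private static final Pattern REF = Pattern.compile("\"attachment\"\\s*:\\s*\"#([^\"]+)\"");

    static Map<String, Part> parse(byte[] body, String boundary) {
        // ISO-8859-1 maps each byte to exactly one char, so binary content survives.
        String text = new String(body, StandardCharsets.ISO_8859_1);
        Map<String, Part> parts = new LinkedHashMap<>();
        for (String chunk : text.split(Pattern.quote("--" + boundary))) {
            int headerEnd = chunk.indexOf("\r\n\r\n");
            // Skips the preamble and the closing "--" after the last boundary.
            if (!chunk.startsWith("\r\n") || headerEnd < 2) continue;
            int contentEnd = chunk.endsWith("\r\n") ? chunk.length() - 2 : chunk.length();
            String name = null;
            String type = "text/plain"; // RFC 7578 default when a part has no Content-Type
            for (String line : chunk.substring(2, headerEnd).split("\r\n")) {
                String lower = line.toLowerCase(Locale.ROOT);
                if (lower.startsWith("content-disposition:")) {
                    Matcher m = NAME.matcher(line);
                    if (m.find()) name = m.group(1);
                } else if (lower.startsWith("content-type:")) {
                    type = line.substring("content-type:".length()).trim();
                }
            }
            if (name != null) {
                String content = chunk.substring(headerEnd + 4, Math.max(headerEnd + 4, contentEnd));
                parts.put(name, new Part(name, type, content.getBytes(StandardCharsets.ISO_8859_1)));
            }
        }
        return parts;
    }

    // Returns the referenced part names in order; fails on a dangling reference.
    static List<String> resolveAttachments(Map<String, Part> parts) {
        Part meta = parts.values().stream()
                .filter(p -> p.contentType.startsWith("application/json"))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("no application/json part"));
        Matcher m = REF.matcher(new String(meta.content, StandardCharsets.UTF_8));
        List<String> refs = new ArrayList<>();
        while (m.find()) {
            String ref = m.group(1);
            if (!parts.containsKey(ref)) throw new IllegalArgumentException("missing part: " + ref);
            refs.add(ref);
        }
        return refs;
    }
}
```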

I think this is basically what is done with an e-mail containing attachments. You could probably just post a classic MIME-encoded e-mail message and use standard libraries to extract the information :-)

/Jørn

Antony Pulicken

Jan 14, 2014, 8:39:14 AM
to api-...@googlegroups.com
Thanks a lot Jørn !

I wrote a small POC based on the approach you described, using Jersey, and it seems to work fine. I'm still trying to figure out what kind of encoding it does internally and what the drawbacks of this approach are. This doesn't seem to be very popular. Any thoughts on why this is not recommended in many places?

Regards,
Antony.

Jørn Wildt

Jan 15, 2014, 3:26:40 PM
to api-...@googlegroups.com
> This doesn't seem to be very popular. Any thoughts on why this is not recommended in many places?

Maybe because most people handle it with multiple resources, as suggested earlier? Beyond that, I don't know.

BTW: you could just as well use ZIP, TAR or any other format for combining multiple files into one instead of multipart/form-data. The approach would be the same.
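A sketch of the ZIP variant using java.util.zip: the metadata goes into an entry named meta.json (a hypothetical convention that client and server would need to agree on), and each attachment becomes an entry whose name the JSON can reference, e.g. "#cv" for an entry named cv.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Sketch: JSON metadata + named attachments packed into one ZIP archive.
// The "meta.json" entry name is a hypothetical convention.
class ZipPackage {
    static byte[] pack(String metadataJson, Map<String, byte[]> attachments) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("meta.json"));
            zip.write(metadataJson.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            for (Map.Entry<String, byte[]> e : attachments.entrySet()) {
                // The JSON refers to this entry by name, e.g. "#cv" -> entry "cv".
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the ZipOutputStream above wrote the central directory.
        return buffer.toByteArray();
    }
}
```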

Antony Pulicken

Jan 16, 2014, 8:48:15 AM
to api-...@googlegroups.com
Thanks, Jørn!

What about the GET request?
  • I need to get the file and the metadata back in one response. Do we need to follow a similar approach to get a combined response?
  • I also have to send the search criteria (based on ID, metadata attributes, etc.) as a parameter, and I understand that we cannot use GET if we want to send a JSON document as the parameter. Do you think we should go for POST in that case, or should we use only GET and send everything as part of the URL or in headers?

Any recommendations or references?

Regards,
Antony.

Jørn Wildt

Jan 16, 2014, 9:04:51 AM
to api-...@googlegroups.com
> I need to get the file and the metadata back in one response.
> Do we need to follow a similar approach to get a combined response?

Yes. ZIP or TAR the JSON file and the attached files into one archive file and return that.
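On the client side, an archive returned by such a GET could be unpacked with java.util.zip, as in this sketch. It assumes the response body has already been read into a byte array (served, say, as application/zip) and keeps every entry in memory, which is only sensible for modest sizes; large files would be streamed to disk instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Sketch: turn a ZIP response body into entry-name -> bytes, preserving order,
// so the client can read the JSON entry and then the files it refers to.
class ZipResponseReader {
    static Map<String, byte[]> unpack(byte[] archive) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                // readAllBytes stops at the end of the current entry.
                if (!e.isDirectory()) entries.put(e.getName(), zip.readAllBytes());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return entries;
    }
}
```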


> I also have to send the search criteria (based on ID, metadata attributes, etc.) as a parameter
> and I understand that we cannot use GET if we want to send a JSON document as the parameter.

Strictly speaking, you *can* encode JSON in URLs. Just remember to URL-encode the JSON string. Then do:

  GET /query-files?q={ ... some URL-encoded JSON string }

But that gets quite ugly. Usually people use URL parameters instead of JSON properties:

  GET /query-files?id=1234&meta-x=aaa&meta-y=bbb
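Both GET styles can be built with java.net.URLEncoder, as in this sketch (the Charset overload needs Java 10+). The host name is hypothetical; /query-files and the id/meta-x/meta-y parameters come from the examples in this message. URLEncoder applies form encoding, so a space becomes "+", which most servers accept in query strings.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Sketch of the two query styles; the host in BASE is hypothetical.
class QueryUrls {
    static final String BASE = "https://api.example.com/query-files";

    // Style 1: the whole JSON query URL-encoded into a single "q" parameter.
    static URI jsonQuery(String json) {
        return URI.create(BASE + "?q=" + URLEncoder.encode(json, StandardCharsets.UTF_8));
    }

    // Style 2: one plain parameter per criterion, in the map's iteration order.
    static URI paramQuery(Map<String, String> criteria) {
        StringJoiner query = new StringJoiner("&", "?", "");
        query.setEmptyValue(""); // no criteria -> no "?"
        criteria.forEach((k, v) -> query.add(URLEncoder.encode(k, StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return URI.create(BASE + query);
    }
}
```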

If you have really complex queries, then POST the JSON to a query resource, create a temporary representation of it on the server, and point the client to it with a Location header:

  POST /query-files
  Content-Type: application/json

  { ... JSON query ... }


  [Response]
  201 Created
  Location: ... URL of temporary resource

Now the client can GET the temporary resource as many times as it wants, and it can cache the result.
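A minimal sketch of this query-resource pattern using the JDK's built-in com.sun.net.httpserver package (module jdk.httpserver). The in-memory map, the UUID ids and the five-minute Cache-Control value are assumptions for illustration, and instead of actually running the query, the GET handler just echoes the stored query back as a placeholder.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: POST /query-files stores the JSON query and answers 201 Created
// with a Location header; GET on that Location returns a cacheable result.
class SavedQueryServer {
    // In-memory store; a real service would persist and eventually expire entries.
    static final Map<String, String> QUERIES = new ConcurrentHashMap<>();

    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/query-files", SavedQueryServer::handle);
        server.start();
        return server;
    }

    static void handle(HttpExchange ex) throws IOException {
        String path = ex.getRequestURI().getPath();
        String method = ex.getRequestMethod();
        if (method.equals("POST") && path.equals("/query-files")) {
            String json = new String(ex.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            String id = UUID.randomUUID().toString();
            QUERIES.put(id, json);
            ex.getResponseHeaders().set("Location", "/query-files/" + id);
            ex.sendResponseHeaders(201, -1); // -1: no response body
            ex.close();
            return;
        }
        String json = method.equals("GET") && path.startsWith("/query-files/")
                ? QUERIES.get(path.substring("/query-files/".length()))
                : null;
        if (json == null) {
            ex.sendResponseHeaders(404, -1);
            ex.close();
            return;
        }
        // Placeholder: a real service would run the query and return matching metadata.
        byte[] body = ("{\"query\":" + json + "}").getBytes(StandardCharsets.UTF_8);
        ex.getResponseHeaders().set("Content-Type", "application/json");
        ex.getResponseHeaders().set("Cache-Control", "max-age=300"); // lets clients reuse the result
        ex.sendResponseHeaders(200, body.length);
        try (OutputStream out = ex.getResponseBody()) {
            out.write(body);
        }
    }
}
```

Returning 201 with a Location (rather than the results directly) is what makes the result addressable, so repeated GETs and HTTP caches can work on a stable URL.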

Take a look at one or more of these books: "RESTful Web Services" - http://amzn.com/0596529260, "RESTful Web APIs" - http://amzn.com/B00F5BS966, "REST in Practice: Hypermedia and Systems Architecture" - http://amzn.com/B0046RERXY or "RESTful Web Services Cookbook" - http://amzn.com/B0043D2ESQ.

/Jørn

Antony Pulicken

Jan 17, 2014, 2:00:42 AM
to api-...@googlegroups.com
> Yes. Zip/tar the JSON file + attached files into one archive file and return that.

Why a ZIP file? Can't we take a multipart approach for the response as well?

Jørn Wildt

Jan 17, 2014, 2:18:24 AM
to api-...@googlegroups.com
> Why a ZIP file? Can't we take a multipart approach for the response as well?

Oh, you can, sure. It's just that clients seldom consume multipart data (at least, I haven't seen it done). So go ahead and use multipart; there's nothing wrong with that. But expect some friction from client developers.

/Jørn

Antony Pulicken

Jan 19, 2014, 10:52:27 PM
to api-...@googlegroups.com
Thanks, Jørn!

Looking forward to more comments on the best approach for retrieving files/objects using a RESTful API.