Notes on implementing Restful Bag Server spec


Littman, Justin

Jun 3, 2011, 6:40:45 AM
to digital-...@googlegroups.com
All-

I've been experimenting with implementing the Restful Bag Server spec (https://github.com/acdha/restful-bag-server) using the Play Framework (http://www.playframework.org/). Here are a few notes:
1. The semantics of "changes" is ambiguous. I'm not convinced it can be defined in an implementation-independent fashion such that a client can make use of the feed without knowing what the server means by changes. (Which may be OK, but should be explicit.)
2. Should hrefs in links be absolute or relative?
3. Are implementers obligated to implement every feature (and if not, how should this be indicated to a client)? I'd argue no -- for example, I may not wish (or be allowed by our security policy) to allow content submissions.
4. There is an error in the contents link -- it should be /bags/<BAG_ID>/contents/<PATH> instead of /bags/<BAG_ID>/contents/<BAG_ID>. I'm assuming contents includes both tags and payload, but this should be explicit.
5. I would suggest using exactly the same terminology used in the BagIt Spec, e.g., "filename" instead of "path".
6. I don't understand what is meant by "metadata" well enough to implement it.
7. Assuming a server is reading the manifests from disk at runtime to produce "manifest," having to merge all manifests for a bag with a large number of files or manifests is potentially time-consuming.

--Justin

Chris Adams

Jun 3, 2011, 8:00:10 AM
to digital-...@googlegroups.com
On Fri, Jun 3, 2011 at 6:40 AM, Littman, Justin <jl...@loc.gov> wrote:
> 1.  The semantics of "changes" is ambiguous.  I'm not convinced it can be defined in an implementation-independent fashion such that a client can make use of the feed without knowing what the server means by changes.  (Which may be OK, but should be explicit.)

What's the opinion on defining "changes" more precisely versus simply
leaving the meaning as roughly "If this bag is of interest to you,
retrieve its metadata and compare with what you have to decide whether
you want to refetch"?
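
A rough client-side sketch of the looser reading, assuming the thing being compared is a BagIt-style manifest ("<checksum> <filename>" per line). The function names are hypothetical and nothing here comes from the spec itself:

```python
def parse_manifest(text):
    """Parse BagIt-style manifest lines into {filename: checksum}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first run of whitespace only: filenames may contain spaces.
        checksum, filename = line.split(None, 1)
        entries[filename] = checksum.lower()
    return entries


def files_to_refetch(local_manifest, remote_manifest):
    """Return sorted filenames that were added, removed, or changed remotely."""
    local = parse_manifest(local_manifest)
    remote = parse_manifest(remote_manifest)
    return sorted(
        name
        for name in local.keys() | remote.keys()
        if local.get(name) != remote.get(name)
    )
```

With this approach the feed only has to say "something about this bag changed"; the client works out what, if anything, it cares about.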

> 2.  Should hrefs in links be absolute or relative?

My preference is for absolute but I think relative might be
unavoidable if for no reason other than substantially reducing
transfer sizes in e.g. manifest responses.

> 3.  Are implementers obligated to implement every feature (and if not, how should this be indicated to a client)?  I'd argue no -- for example, I may not wish to (or be allowed by our security policy) to allow content submissions.

Definitely not: my idea was that you could simply use the HTTP
statuses in many cases - e.g. a read-only server could return "405
Method Not Allowed" when the client attempts to POST to /bags.
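
A minimal, framework-independent sketch of that behaviour. The function name and the read_only flag are hypothetical; the one firm detail is that HTTP requires a 405 response to carry an Allow header listing the supported methods:

```python
READ_METHODS = ("GET", "HEAD", "OPTIONS")


def handle_bags_request(method, read_only=True):
    """Return (status, headers) for a request to /bags.

    Hypothetical dispatcher: a read-only server rejects POST with 405
    and advertises the methods it does support in the Allow header.
    """
    allowed = READ_METHODS if read_only else READ_METHODS + ("POST",)
    if method.upper() not in allowed:
        return 405, {"Allow": ", ".join(allowed)}
    return 200, {}
```

A client that gets the 405 can read the Allow header to learn that submissions are not supported, rather than treating it as a transient failure.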

> 4.  There is an error in the contents link -- should be /bags/<BAG_ID>/contents/<PATH> instead of /bags/<BAG_ID>/contents/<BAG_ID>.  I'm assuming contents includes tags and payload, but should be explicit.

I'll correct this, probably before merging in the versioned proposal.

> 5.  I would suggest using exactly the same terminology used in the BagIt Spec, e.g., "filename" instead of "path".

Good point.

> 6.  I don't understand what is meant by "metadata" enough to implement.

The idea was a simple, sidecar extension mechanism allowing clients to
store arbitrary data in a directory which is otherwise unprocessed by
the server. For example, if I had a bag containing journal articles
and additional metadata was being constructed by curatorial staff
after receipt, it would make sense to store an XML file with that
metadata so it can be durably stored along with the actual data but is
clearly not part of the actual delivered content and may not be useful
to anyone without the same application.

After reading over it again, the description obviously needs a bit of
improvement as it's not as clear as it could be.

> 7.  Assuming a server is reading the manifests from disk at runtime to produce "manifest," having to merge all manifests for a bag with a large number of files or manifests is potentially time-consuming.

My first impression was that servers could heavily cache this since
the server will know when bag contents are altered and could generate
a cached representation at the time of change. Do you think this is
worth reconsidering?
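
To make the caching idea concrete, here is a small sketch. It assumes the "manifest" resource is a merge of the bag's manifest-<algorithm>.txt files and that the server calls a hook whenever it alters a bag; the class and method names are made up for illustration:

```python
import os


def merge_manifests(bag_dir):
    """Merge every manifest-<algorithm>.txt into {filename: {algorithm: checksum}}."""
    merged = {}
    for name in sorted(os.listdir(bag_dir)):
        # Payload manifests only; tagmanifest-*.txt does not match this prefix.
        if not (name.startswith("manifest-") and name.endswith(".txt")):
            continue
        algorithm = name[len("manifest-"):-len(".txt")]
        with open(os.path.join(bag_dir, name), encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                checksum, filename = line.split(None, 1)
                merged.setdefault(filename, {})[algorithm] = checksum
    return merged


class ManifestCache:
    """Keeps one merged manifest per bag, rebuilt only when the bag changes."""

    def __init__(self):
        self._merged = {}

    def on_bag_changed(self, bag_dir):
        # Hypothetical hook: the server calls this after altering a bag.
        self._merged[bag_dir] = merge_manifests(bag_dir)

    def get(self, bag_dir):
        # Build lazily on first request, then serve from memory.
        if bag_dir not in self._merged:
            self.on_bag_changed(bag_dir)
        return self._merged[bag_dir]
```

The expensive merge then happens once per change instead of once per request. The obvious weakness is that a bag modified on disk behind the server's back would be served stale until the hook runs.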

Chris

Brian Vargas

Jun 3, 2011, 10:08:06 AM
to digital-...@googlegroups.com

>> 2. Should hrefs in links be absolute or relative?
>
> My preference is for absolute but I think relative might be
> unavoidable if for no reason other than substantially reducing
> transfer sizes in e.g. manifest responses.

Content-Encoding: gzip neatly solves that problem.
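
A quick way to check this for yourself: the sketch below builds a listing of absolute hrefs that share a long common prefix and measures how much gzip shrinks it. The URL, the JSON shape, and the entry count are invented for the experiment, not taken from the spec:

```python
import gzip
import json


def gzip_ratio(entry_count=1000):
    """Return compressed size / original size for a listing of absolute hrefs."""
    # Hypothetical base URL; every entry repeats it in full.
    base = "https://bags.example.org/bags/example-bag/contents/"
    entries = [
        {"href": "%sdata/file-%05d.tif" % (base, i)} for i in range(entry_count)
    ]
    body = json.dumps(entries).encode("utf-8")
    compressed = gzip.compress(body)
    return len(compressed) / len(body)


if __name__ == "__main__":
    print("compressed/original: %.3f" % gzip_ratio())
```

Because the repeated prefix is exactly the kind of redundancy DEFLATE removes, the absolute form costs little extra on the wire once compression is on.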

>> 3. Are implementers obligated to implement every feature (and if not, how should this be indicated to a client)? I'd argue no -- for example, I may not wish to (or be allowed by our security policy) to allow content submissions.
>
> Definitely not: my idea was that you could simply use the HTTP
> statuses in many cases - e.g. a read-only server could return "405
> Method Not Allowed" when the client attempts to POST to /bags.

You might consider requiring - or at least recommending - support for OPTIONS.

Brian

Chris Adams

Jun 3, 2011, 10:59:29 AM
to digital-...@googlegroups.com
On Fri, Jun 3, 2011 at 10:08 AM, Brian Vargas <br...@ardvaark.net> wrote:
>>> 2.  Should hrefs in links be absolute or relative?
>>
>> My preference is for absolute but I think relative might be
>> unavoidable if for no reason other than substantially reducing
>> transfer sizes in e.g. manifest responses.
>
> Content-Encoding: gzip neatly solves that problem.

Good point, and it's certainly easy enough to generate absolute URLs
using the request Host header if you have some sort of cluster.
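
Roughly like this, as a sketch. The function name is hypothetical, and it assumes the server can trust the Host header and knows which scheme the client used:

```python
from urllib.parse import quote


def absolute_href(host_header, path, scheme="http"):
    """Build an absolute URL from the request's Host header and a resource path.

    The Host value (including any port) is echoed back as-is, so each node
    in a cluster emits links that match the name the client actually used.
    """
    # quote() percent-encodes spaces and other unsafe characters but keeps "/".
    return "%s://%s%s" % (scheme, host_header, quote(path))
```

For example, absolute_href("node3.example.org:9000", "/bags/b1/manifest") yields a link pointing back at the same host and port the request came in on.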

Does anyone object to simply requiring absolute URLs? It makes life a
lot easier for client authors and debugging.

>>> 3.  Are implementers obligated to implement every feature (and if not, how should this be indicated to a client)?  I'd argue no -- for example, I may not wish to (or be allowed by our security policy) to allow content submissions.
>>
>> Definitely not: my idea was that you could simply use the HTTP
>> statuses in many cases - e.g. a read-only server could return "405
>> Method Not Allowed" when the client attempts to POST to /bags.
>
> You might consider requiring - or at least recommending - support for OPTIONS.

I think that falls into the general HTTP citizenship section but is
definitely worth mentioning specifically for this purpose. A client
should reasonably expect to handle errors but it's a nice way to
detect which resources are modifiable.
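
On the client side that check could be as small as the sketch below, which interprets the Allow header from an OPTIONS response. The helper names are hypothetical, and treating POST as "accepts submissions" is an assumption based on the /bags example earlier in the thread:

```python
def allowed_methods(allow_header):
    """Parse an Allow header value such as "GET, HEAD, OPTIONS" into a set."""
    return {m.strip().upper() for m in allow_header.split(",") if m.strip()}


def accepts_submissions(allow_header):
    """True if the resource advertises POST, i.e. new content may be submitted."""
    return "POST" in allowed_methods(allow_header)
```

A client would still need to handle a 405 on the actual request, since the Allow list is only advisory.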

Chris
