REST-ORE


Ben O'Steen

Aug 15, 2008, 6:42:22 AM
to lightweight-repositories
See http://crigshow.blogspot.com/2008/08/rest-ore-low-level-storage-api.html
for a video where I was trying to spark interest and debate about the
idea of low-level web storage and what it would need to compete with or
replace local or 'remote' mounted filesystems.

To put this in context for this group, I envisage a store of this ilk
being under the collection of services that would make up a
lightweight repository.

------------------------

Essentially, the aims for this system are as follows (with the current
filesystem [FS] view of the implementation in brackets under each)

1) The system should be able to stream binary and other formats of
data, storing them or offering them for download as required. (Use of
the HTTP verbs PUT and GET is very strongly recommended.)
(FS View: This is the bread and butter of mounted filesystems, put
and get)

2) The system should be able to perform indexed look-ups of the items
it holds and report all the items that can be viewed.
(FS View: "What files are in this folder?", "Status of this file?",
etc)

3) The system should be able to generically bind together items into
compound items and give these groupings some semantic meaning
(FS View: Generally less developed - Put file in folder, create
symlink of file to other folders. ReiserFS4 and the fictional WinFS
http://en.wikipedia.org/wiki/WinFS metadata views of a file system are
more along the lines I am anticipating.)

4) These grouping items should exist as items in their own right with
URIs.
(FS View: Folders exist as 'items' in the FS)

5) The system should be able to be polled and to push out event
notifications when:
i) An item changes (FS View: inotify http://en.wikipedia.org/wiki/Inotify)
ii) The metadata (representation) of an item changes (FS View:
inotify)
iii) An event is caught that fulfils certain criteria, based on
the representation of the item. (FS View: likely to be a 2 stage
system, changed file -> check for criteria)

6) All information should be stored as serialised files, preferably in
a re-usable format, such as XML, YAML or JSON. (FS View: All metadata
is stored on disc, and indexes can rapidly be rebuilt/initialised on
mounting a filesystem.)

The choices above are based on a widespread pattern for storage: a
filesystem, but with a few tweaks.


The idea is that a lightweight store of this kind, which has no
services to satisfy the common demands for OAI-PMH, full-text search,
a pretty UI, etc., can sit below a 'normal' repository. Fedora could
rebuild itself from one with little change to how it operates, for
example.

So, what's REST-ORE then, specifically? Well, it stemmed from ideas in
the US Crigshow tour, a crystallisation of a few thoughts into
something a bit more concrete. I like to see it as just an
implementation of the rules above, and there is plenty more to explore
in this space.

REST-ORE has three parts: a dropbox API to put/get/replace files, a
binding/metadata API, and a message polling and push registry API.

I am focussing on a real issue with my systems - getting academics to
hand over their files in a way that *is useful to them* - so the
dropbox I am going to describe is very user-orientated. Each
authorised user has a dropbox, accessible at:

http://host/box/{user-id} or simply /box/{user-id}

HTTP verbs are also crucial. A GET on this URL will get an Atom feed
of the items in the box, implementing the pagination and archiving
extension to Atom [1] (This is not intended to be the primary means of
digging through the items, this is just the barest metal to allow for
rebuilding the other parts of the system if they get corrupted.)
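
As a rough sketch of what a GET on /box/{user-id} might return, the Python below builds one page of an Atom feed for a box. The function name, the `?page=N` query parameter and the `previous`/`next` link relations (borrowed from RFC 5005 paged feeds, which may or may not be what [1] refers to) are assumptions for illustration, not part of any existing REST-ORE code.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_box_feed(base_url, user_id, items, page=1, page_size=2):
    """Return one page of an Atom feed listing the items in a user's box.

    `items` is a list of (name, updated) tuples, where `updated` is an
    ISO 8601 timestamp string. All names here are hypothetical.
    """
    ET.register_namespace("", ATOM_NS)
    box_url = f"{base_url}/box/{user_id}"

    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = f"Dropbox for {user_id}"
    ET.SubElement(feed, f"{{{ATOM_NS}}}id").text = box_url

    start = (page - 1) * page_size
    page_items = items[start:start + page_size]

    # Paging links so a client can walk the whole box page by page.
    if start > 0:
        ET.SubElement(feed, f"{{{ATOM_NS}}}link",
                      rel="previous", href=f"{box_url}?page={page - 1}")
    if start + page_size < len(items):
        ET.SubElement(feed, f"{{{ATOM_NS}}}link",
                      rel="next", href=f"{box_url}?page={page + 1}")

    # One entry per stored item, pointing at the item's own URL.
    for name, updated in page_items:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = name
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = f"{box_url}/{name}"
        ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=f"{box_url}/{name}")

    return ET.tostring(feed, encoding="unicode")
```

A client rebuilding an index would follow the `next` links until none is left, which is all the "barest metal" listing needs to support.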

Now, there are some verbs outside HTTP 1.1 that are useful here;
consider WebDAV (MOVE, for example). In fact, a WebDAV API for the
dropbox might be a really nice thing, due to the support in current
software. But WebDAV isn't a small or perfect standard, so I will be
implementing the dropbox with plain HTTP verbs only at first and
adding support for the WebDAV verbs later.

So, we can put and get files, just like any other web storage. Where
does the value-add kick in? Well, for one, I think that the store
should declare its policy on storage - how long it will retain
resources, how many mirrors of your resources it creates, whether the
store is high-speed (spinning platter) or low-speed (tape), or whether
it uses content-specific stores internally (media server for video,
flat files for docs, etc). Maybe a GET on http://host/policy would
return a policy document listing these facts?
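
To make the policy idea concrete, here is one possible shape for such a document, serialised as JSON. Every field name below is made up for illustration; the post does not define a policy format.

```python
import json


def policy_document(retention_days, mirrors, medium, content_specific_stores):
    """Serialise a store's declared storage policy as JSON.

    All field names are hypothetical; they simply mirror the facts
    listed above (retention, mirrors, speed, internal stores).
    """
    policy = {
        "retention_days": retention_days,          # how long resources are kept
        "mirrors": mirrors,                        # number of copies made
        "medium": medium,                          # e.g. "disk" or "tape"
        "content_specific_stores": content_specific_stores,
    }
    return json.dumps(policy, indent=2, sort_keys=True)


# What a GET on http://host/policy might return under these assumptions.
example_policy = policy_document(
    retention_days=3650,
    mirrors=2,
    medium="disk",
    content_specific_stores={"video": "media server", "docs": "flat files"},
)
```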

The core of the value-add, though, is three services that I think are
crucial: the search API, the metadata CRUD API (including binding
resources into groups), and a messaging API for monitoring resources
held by the dropbox.

The search API should be very flexible, and should try most tricks in
the book to wring metadata from the files that are uploaded. On a
technical note, I am strongly considering using CouchDB and JSON for
serialising the metadata.

The metadata C(R)UD API and binding API are closely related. IMO when you
bind together a group of resources (externally or internally held),
the resulting list should exist in its own right, with its own URI.
It's just another item, in other words. I am quite liking the idea of
simply storing lists of items, in Atom format for ordered lists and
ORE for unordered lists. One thing that I'd quite like (as a
programmer) is that if I POST more parameters to a list, the API
appends the information rather than replacing what is already there.
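
The append-on-POST behaviour could look something like the sketch below: an in-memory list resource with its own URI, where PUT replaces the members and POST adds to them. The class and method names are invented for illustration and say nothing about how the list would actually be serialised (Atom or ORE).

```python
class ListResource:
    """A grouping of item URIs that is itself addressable by a URI."""

    def __init__(self, uri, members=None):
        self.uri = uri
        self.members = list(members or [])

    def put(self, members):
        """PUT semantics: replace the whole list with the new members."""
        self.members = list(members)
        return self.members

    def post(self, members):
        """POST semantics: append new members, keeping existing ones."""
        self.members.extend(members)
        return self.members


# Hypothetical URIs; members may be held internally or externally.
thesis = ListResource("http://host/lists/thesis",
                      ["http://host/box/ben/ch1.pdf"])
thesis.post(["http://host/box/ben/ch2.pdf", "http://example.org/data.csv"])
```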

The messaging API is one that is very frequently missing. The reason
I brought up Twitter as a possible store earlier is because it has the
starting point for a good messaging service - it is possible to poll
for past events (twitter.com/{user-id} or API equivalent) or to
register to receive the events *as they happen* by following someone
and selecting that you'd like to receive the events over IM or txt.
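
A minimal in-memory sketch of those two modes, polling for past events and registering to be pushed new ones, might look like this. The class, its methods and the optional `criteria` filter (the "2 stage" check from aim 5) are assumptions; a real version would sit behind Atom feeds and XMPP rather than Python callbacks.

```python
from collections import defaultdict


class EventRegistry:
    """Keeps an event log per item URL and notifies registered listeners."""

    def __init__(self):
        self._log = defaultdict(list)          # item URL -> past events
        self._subscribers = defaultdict(list)  # item URL -> [(criteria, callback)]

    def subscribe(self, item_url, callback, criteria=None):
        """Register for push: `callback(event)` runs for each matching event."""
        self._subscribers[item_url].append((criteria, callback))

    def publish(self, item_url, event):
        """Record an event, then push it to listeners whose criteria match."""
        self._log[item_url].append(event)
        for criteria, callback in self._subscribers[item_url]:
            # Stage 1: something changed. Stage 2: does it meet the criteria?
            if criteria is None or criteria(event):
                callback(event)

    def poll(self, item_url, since=0):
        """Poll: return the events recorded for an item from index `since`."""
        return list(self._log.get(item_url, []))[since:]


registry = EventRegistry()
pushed = []
url = "http://host/box/ben/thesis.pdf"

# Only push metadata changes; content changes stay available via polling.
registry.subscribe(url, pushed.append,
                   criteria=lambda e: e["type"] == "metadata-changed")
registry.publish(url, {"type": "content-changed"})
registry.publish(url, {"type": "metadata-changed"})
```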

In summary then, these bottom-rung services, dropbox, search, metadata
and messaging, are the base services I would need to build some very
interesting and 'mashable' services, much more so than what I can do
with S3 or Flickr.

Being the type of person I am, I am going to explore these facets
using PoC coding. First attempts use a normal FS under the dropbox
API, which allows GET, PUT, and DELETE. The metadata search and update
API is going to be CouchDB, and the messaging API will be Atom feeds
for polling (GET /events/{URL of item}) or XMPP for push events
(POST /events/{URL of item}?username=a...@gmail.com&format=json).
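
For the first part of that, a normal FS under the dropbox API, the storage layer might be as small as the sketch below. It leaves out HTTP, authorisation and streaming entirely, and the class and method names are hypothetical rather than taken from the actual PoC.

```python
import tempfile
from pathlib import Path


class FileDropbox:
    """Maps /box/{user-id}/{name} onto files under a root directory."""

    def __init__(self, root):
        self.root = Path(root).resolve()

    def _path(self, user_id, name):
        box = (self.root / user_id).resolve()
        target = (box / name).resolve()
        # Refuse names such as "../other-user/file" that escape the box.
        if box not in target.parents:
            raise ValueError(f"invalid item name: {name!r}")
        return target

    def put(self, user_id, name, data):
        """PUT: create or replace an item with the given bytes."""
        path = self._path(user_id, name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, user_id, name):
        """GET: return the item's bytes (FileNotFoundError if absent)."""
        return self._path(user_id, name).read_bytes()

    def delete(self, user_id, name):
        """DELETE: remove the item (FileNotFoundError if absent)."""
        self._path(user_id, name).unlink()


# Usage against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    box = FileDropbox(tmp)
    box.put("ben", "notes.txt", b"hello")
    stored = box.get("ben", "notes.txt")
    box.delete("ben", "notes.txt")
```

The traversal check only guards the item name; a real service would also need to validate the user id and decide what replacing an existing item means for its metadata and event log.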