OAI-ORE as transfer syntax

Jerome

Aug 20, 2009, 4:46:02 PM
to OAI-ORE
Howdy,

Next in my list of OAI-ORE practicalities questions:

As part of our work on the Preserving Virtual Worlds project, I'm
working on how to transfer a packaged up game between the Univ. of
Illinois and Stanford. For various reasons, Stanford would like to
receive all data/metadata for the package using the BagIt
specification. I've got a bunch of metadata in OAI-ORE that
identifies the various digital assets I want to go in the package
delivered to Stanford (the game itself, representation information for
the game, context information for the game, provenance information for
all of the above) as well as the relationships between the assets (not
only OAI-ORE relationships, but FRBR and OAIS relationships as well,
e.g., this asset is semantic representation information for that
asset). Being nicely formed OAI-ORE, all references to assets are
protocol-based URIs.

The problem comes when I want to put this all in BagIt (this being the
digital assets and the OAI-ORE files), tar and gzip the whole caboodle
and ship it to them. I don't want the OAI-ORE referencing the copies
of the assets at my site. In fact, for a couple of reasons (the most
salient being I have to dark archive some of this material), I can't
make it available on the public web server. I want the OAI-ORE
document to reference the copies of the assets in the BagIt package
using file:/// URIs. But that's not a protocol-based URI, is it? And
so, not well-formed OAI-ORE.

My solution space for this at the moment seems to be: 1. ignore the
OAI-ORE requirement for protocol-based URIs and use file:/// URIs to
reference digital assets in the BagIt directory hierarchy; 2. go to a
certain amount of time and trouble instituting a one-time-use
authentication mechanism that ensures that only a designated archivist
at Stanford can get at the restricted assets, and use BagIt fetch.txt
to reference them; or 3. Base64-encode the digital assets, and treat
them as literals in the OAI-ORE RDF expressions. Can't say I'm
thrilled about any of those options, but #1 probably has the most
appeal to someone who A. doesn't want to engage in additional
transformations of the underlying assets (i.e., Base64) and B. is
congenitally lazy.

My questions: 1. Am I missing some obvious fourth option in the
solution space; and 2. Was there any official discussion/
recommendation of how to use OAI-ORE with something like a tarball of
files to ship content between repository sites?

Benjamin O'Steen

Aug 21, 2009, 5:05:53 AM
to oai...@googlegroups.com

I would recommend storing the resources in a pairtree fashion:

http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html

which replaces a number of hacks I added when I first considered this:

http://oxfordrepo.blogspot.com/2009/02/pushing-bagit-manifest-concept-little.html

Ben

Erik Hetzner

Aug 21, 2009, 8:15:25 PM
to oai...@googlegroups.com, Jerome
At Thu, 20 Aug 2009 13:46:02 -0700 (PDT),

I am not an OAI-ORE expert (or even a particularly well-informed
amateur), but I do have some knowledge of BagIt and Pairtree
(mentioned later in the thread).

I think there are two problems here. Laying the groundwork, you are
transferring an object between site A and site B. The first problem is
that because you are using OAI-ORE, either site A or site B needs to
lay claim to the URIs that will describe the object.

The second problem is getting the data from site A to site B.

I don’t think that you can get around the first problem. One site
needs to take responsibility for managing the URIs. This doesn’t
necessarily involve making them dereferenceable (at least not
immediately).

The second is not necessarily related to the first. If you choose to
use http://sitea/object as the URI, that does not mean that site B
needs to use HTTP to transfer that object from site A.

What site A and site B do need to do is agree on a way of mapping
http://sitea/object to some bytestream (representation).

I think what Ben was suggesting is that you can use Pairtree to
provide a mapping between an HTTP URI and a path on a filesystem. For
example, http://sitea/object would map to the pairpath:

ht/tp/+=/=s/it/ea/=o/bj/ec/t

You could then “dereference” the URI http://sitea/object by generating
this pairpath from it and looking inside your bag to see if it
contains that path. If it does, you have the “dereferenced”
representation of that URI.
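
To make the mapping concrete, here is a minimal Python sketch of the
Pairtree identifier-to-path conversion, plus a lookup inside an
unpacked bag. The function names are made up for illustration, and
putting pairtree_root directly under the bag's data/ directory is an
assumption about how the two sites agree to lay things out, not
something BagIt or Pairtree requires.

```python
import os

# Characters Pairtree hex-encodes as ^hh before the single-character swaps.
_HEX_ENCODE = set('"*+,<=>?\\^|')
# Single-character substitutions applied after hex-encoding.
_SWAP = {"/": "=", ":": "+", ".": ","}


def clean_id(identifier):
    """Apply Pairtree character cleaning to an identifier string."""
    out = []
    for byte in identifier.encode("utf-8"):
        ch = chr(byte)
        if byte < 0x21 or byte > 0x7E or ch in _HEX_ENCODE:
            out.append("^%02x" % byte)   # non-visible or reserved character
        else:
            out.append(_SWAP.get(ch, ch))
    return "".join(out)


def uri_to_pairpath(uri):
    """Split the cleaned identifier into two-character path segments."""
    cleaned = clean_id(uri)
    return "/".join(cleaned[i:i + 2] for i in range(0, len(cleaned), 2))


def find_in_bag(bag_dir, uri):
    """Return the directory for `uri` inside the bag, or None if absent.

    Assumes (hypothetically) that the pairtree lives at
    <bag_dir>/data/pairtree_root.
    """
    segments = uri_to_pairpath(uri).split("/")
    candidate = os.path.join(bag_dir, "data", "pairtree_root", *segments)
    return candidate if os.path.isdir(candidate) else None


print(uri_to_pairpath("http://sitea/object"))
# ht/tp/+=/=s/it/ea/=o/bj/ec/t
```

With this, the OAI-ORE documents can keep their http:// URIs
unchanged, and the receiving site resolves each one against the bag
rather than the network.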

In summary, just because it starts with http:// doesn’t mean you have
to use HTTP to get it.

best,
Erik Hetzner

Benjamin O'Steen

Aug 25, 2009, 5:00:54 AM
to oai...@googlegroups.com

Just to point out one of the devices of pairtree that may be of use for
handling objects with URI names:

(copy&pasted from
http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html)


current_directory/
|   pairtree_version0_1        [which version of pairtree]
|    ( This directory conforms to Pairtree Version 0.1. Updated spec: )
|    ( http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html )
|
|   pairtree_prefix
|    ( http://n2t.info/ark:/13030/xt2 )
|
\--- pairtree_root/
     |--- aa/
     |    |--- cd/
     |    |    |--- foo/
     |    |    |    |   README.txt
     |    |    |    |   thumbnail.gif
     |    |    ...
     |    |--- ab/ ...
     |    |--- af/ ...
     |    |--- ag/ ...
     |    ...
     |--- ab/ ...
     ...
     \--- zz/ ...
          |   ...

The "pairtree_prefix" contains a string that should be prepended to
every identifier inferred from the pairtree rooted at "pairtree_root".
This may be used to reduce path lengths when every identifier in a given
pairtree shares the same initial substring. In the example above, the
pairpath "/aa/cd/" would thus correspond to the identifier
"http://n2t.info/ark:/13030/xt2aacd".

-----

Personally, I am quite fond of this mechanism, both for interchange and
for on-disc storage - migration of self-contained objects (book page
scan collections for example) is made easier, as you might only need to
change the prefix file.
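
As a rough illustration of the prefix mechanism, the sketch below
turns a pairpath back into an identifier and prepends whatever string
was read from the pairtree_prefix file. The function name is
hypothetical, and the second prefix (http://siteb.example/) is
invented purely to show the effect of editing the prefix file.

```python
# Reverse of the Pairtree single-character swaps.
_UNSWAP = {"=": "/", "+": ":", ",": "."}


def pairpath_to_id(ppath, prefix=""):
    """Rebuild an identifier from a pairpath and an optional prefix."""
    cleaned = ppath.replace("/", "")      # drop the directory separators
    out = bytearray()
    i = 0
    while i < len(cleaned):
        ch = cleaned[i]
        if ch == "^":                     # ^hh is one hex-encoded byte
            out.append(int(cleaned[i + 1:i + 3], 16))
            i += 3
        else:
            out.extend(_UNSWAP.get(ch, ch).encode("utf-8"))
            i += 1
    return prefix + out.decode("utf-8")


# Prefix as given in the spec example.
print(pairpath_to_id("/aa/cd/", "http://n2t.info/ark:/13030/xt2"))
# http://n2t.info/ark:/13030/xt2aacd

# Same tree on disk, different (made-up) prefix after migration.
print(pairpath_to_id("/aa/cd/", "http://siteb.example/"))
# http://siteb.example/aacd
```

The directory tree itself is untouched in both cases; only the string
in the prefix file differs.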

Ben
