I am not an OAI-ORE expert (or even a particularly well-informed
amateur), but I do have some knowledge of BagIt and Pairpath
(mentioned later in the thread).
I think there are two problems here. To lay the groundwork: you are
transferring an object from site A to site B. The first problem is
that, because you are using OAI-ORE, either site A or site B needs to
lay claim to the URIs that will describe the object.
The second problem is getting the data from site A to site B.
I don’t think that you can get around the first problem. One site
needs to take responsibility for managing the URIs. This doesn’t
necessarily involve making them dereferenceable (at least not
immediately).
The second problem is not necessarily related to the first. If you
choose to use http://sitea/object, that does not mean that site B needs
to use HTTP to transfer that object from site A.
What site A and site B do need to do is agree on a way of mapping
http://sitea/object to some bytestream (representation).
I think what Ben was suggesting is that you can use pairpath to
provide a mapping between an HTTP URI and a path on a filesystem. For
example, http://sitea/object would map to:
ht/tp/+=/=s/it/ea/=o/bj/ec/t
You could then “dereference” the URI http://sitea/object by generating
this pairpath from it and looking inside your bag to see if it
contained that path. If it does, you have the “dereferenced”
representation of that URI.
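The lookup described above can be sketched in Python. This is only a
rough illustration, not anyone's actual implementation: the function
names (clean, to_pairpath, dereference) and the bag_root argument are
hypothetical, and the character rules are my reading of the Pairtree
spec's cleaning step (hex-encode a few problematic characters as ^hh,
then swap "/" to "=", ":" to "+" and "." to ","). It also assumes the
pairpath is rooted directly at whatever directory you pass in.

```python
from pathlib import Path
from typing import Optional

# Characters hex-encoded as ^hh before the single-character swaps
# (assumption: this set follows the Pairtree spec's cleaning step).
_HEX_ENCODE = set('"*+,<=>?\\^|')
_SWAP = {"/": "=", ":": "+", ".": ","}


def clean(identifier: str) -> str:
    """Make an identifier safe to use as a run of filename characters."""
    out = []
    for byte in identifier.encode("utf-8"):
        ch = chr(byte)
        if byte < 0x21 or byte > 0x7E or ch in _HEX_ENCODE:
            out.append(f"^{byte:02x}")      # not visible ASCII, or reserved
        else:
            out.append(_SWAP.get(ch, ch))   # "/" -> "=", ":" -> "+", "." -> ","
    return "".join(out)


def to_pairpath(identifier: str) -> str:
    """Split the cleaned identifier into two-character path segments."""
    cleaned = clean(identifier)
    return "/".join(cleaned[i:i + 2] for i in range(0, len(cleaned), 2))


def dereference(bag_root: Path, identifier: str) -> Optional[Path]:
    """Return the path for this URI inside the bag, or None if absent.

    bag_root is a hypothetical directory under which the pairpaths live.
    """
    candidate = bag_root.joinpath(*to_pairpath(identifier).split("/"))
    return candidate if candidate.exists() else None


print(to_pairpath("http://sitea/object"))
```

No HTTP request is made anywhere in this sketch; the URI is used purely
as a key into the filesystem.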
In summary, just because it starts with http:// doesn’t mean you have
to use HTTP to get it.
best,
Erik Hetzner
(copy&pasted from
http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html)
current_directory/
|   pairtree_version0_1        [which version of pairtree]
|    ( This directory conforms to Pairtree Version 0.1. Updated spec: )
|    ( http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html )
|
|   pairtree_prefix
|    ( http://n2t.info/ark:/13030/xt2 )
|
\--- pairtree_root/
     |--- aa/
     |    |--- cd/
     |    |    |--- foo/
     |    |    |    |   README.txt
     |    |    |    |   thumbnail.gif
     |    |    ...
     |    |--- ab/ ...
     |    |--- af/ ...
     |    |--- ag/ ...
     |    ...
     |--- ab/ ...
     ...
     \--- zz/ ...
          |   ...
The "pairtree_prefix" contains a string that should be prepended to
every identifier inferred from the pairtree rooted at "pairtree_root".
This may be used to reduce path lengths when every identifier in a given
pairtree shares the same initial substring. In the example above, the
pairpath "/aa/cd/" would thus correspond to the identifier
"http://n2t.info/ark:/13030/xt2aacd".
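
To make the prefix rule concrete, here is a rough Python sketch of the
reverse mapping, from a pairpath back to an identifier. The function
name pairpath_to_id is hypothetical, and the decoding rules ("^hh" back
to a byte, "=" to "/", "+" to ":", "," to ".") are an assumption based
on reading the cleaning step of the spec in reverse. The caller is
assumed to pass only the pairpath itself, not the object directory
("foo/" in the tree above) that follows it.

```python
# Inverse of the single-character swaps in the cleaning step (assumed).
_UNSWAP = {"=": "/", "+": ":", ",": "."}


def pairpath_to_id(ppath: str, prefix: str = "") -> str:
    """Rebuild an identifier from a pairpath, prepending the prefix."""
    cleaned = ppath.replace("/", "")       # drop the path separators
    out = bytearray()
    i = 0
    while i < len(cleaned):
        ch = cleaned[i]
        if ch == "^":
            # "^hh" stands for one hex-encoded byte
            out.append(int(cleaned[i + 1:i + 3], 16))
            i += 3
        else:
            out.extend(_UNSWAP.get(ch, ch).encode("utf-8"))
            i += 1
    return prefix + out.decode("utf-8")


# The prefix would normally be read from the "pairtree_prefix" file.
print(pairpath_to_id("/aa/cd/", prefix="http://n2t.info/ark:/13030/xt2"))
```

Since the prefix is applied only at this last step, changing it leaves
the directory layout under "pairtree_root" untouched.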
-----
Personally, I am quite fond of this mechanism, both for interchange and
for on-disc storage: migrating self-contained objects (collections of
book page scans, for example) becomes easier, as you might only need to
change the prefix file.
Ben