A few beginner questions: extensibility and harvesting documents

Alma

Jan 14, 2010, 2:10:39 PM

to OAI-ORE

Hi, I have a few questions about OAI-ORE. I’m trying to read the spec
and finding it fairly confusing, being new to some of the background
information here as well, so I appreciate any help that is offered to
me in understanding this better. I work in bioimaging informatics
(digital images from microscopes, CAT scans, that sort of thing), and
we are interested in ORE for two problems we think it might address
for us. I’m hoping folks here could answer some of my questions about
it.

First, we certainly have an aggregate object problem: images often
come in series, with multiple versions, layers of analysis, multiple
attached annotations/metadata, and related publications. My laboratory
has been considering creating and implementing an ontology to keep
track of these types of relationships semantically, instead of just
having everything sitting in a shared directory or dumb links. It
appears this is at least part of what OAI-ORE’s resource maps are
intended to do, correct? However, the relationships currently possible to
describe seem quite limited, and definitely designed with library
applications in mind. Is the standard designed to be extensible? Would
we be able to create our own relations to describe relationships
between aggregated resources? It seems like this is the case when
using Atom, but I wasn’t sure about otherwise.

The second problem is object transport/harvesting. Am I correct in
understanding that ORE could be used like OAI-PMH for harvesting, but
to transport multiple kinds of files (say, bioimaging ones), instead
of just XML metadata? We actually stumbled across ORE because we were
looking at PMH for a metadata harvesting project, but if we could
transport the image data with the metadata, that would serve our
purposes much better.

Again, I apologize if these answers are contained in the specs, but
this is an unfamiliar area for me, and I’m trying to get some fast
answers on the critical questions before a deadline. Thank you for
your help!

Robert Sanderson

Jan 14, 2010, 3:07:21 PM

to oai...@googlegroups.com

Hi Alma,

On Thu, Jan 14, 2010 at 12:10 PM, Alma <brianan...@sbcglobal.net> wrote:

> First, we certainly have an aggregate object problem: images often
> come in series, with multiple versions, layers of analysis, multiple
> attached annotations/metadata, and related publications.

Sounds like an ideal scenario to use ORE in! :)

> My laboratory
> has been considering creating and implementing an ontology to keep
> track of these types of relationships semantically, instead of just
> having everything sitting in a shared directory or dumb links. It
> appears this is at least part of what OAI-ORE’s resource maps are
> intended to do, correct?

Yep.

> However, the relationships currently possible to
> describe seem quite limited, and definitely designed with library
> applications in mind. Is the standard designed to be extensible? Would
> we be able to create our own relations to describe relationships
> between aggregated resources?

Yes, absolutely. You can use any relationships or ontologies that are
appropriate for your use case. When designing the ontology you might
want to keep in mind the basic ones -- for example dcterms:hasFormat
to link between different formats for the images, or
dcterms:hasVersion to link between different versions, but you can
always extend them if they're not suitable.
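
To make that concrete, here is a minimal sketch in Python (standard
library only) of a Resource Map that mixes a Dublin Core relation with
a lab-defined one. The example.org URIs and the "analysisOf" relation
are made up for illustration; they are assumptions, not part of ORE or
of any existing ontology. A complete Resource Map would also carry
metadata about itself (creator, modification time), which is left out
here.

```python
# Minimal sketch: an ORE Resource Map written out as N-Triples.
# All example.org URIs and the LAB vocabulary are hypothetical.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"
LAB = "http://example.org/bioimaging/terms/"  # hypothetical custom ontology


def build_resource_map(rem_uri, aggregation_uri, resources, relations):
    """Return (subject, predicate, object) URI triples for one Resource Map."""
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    # One ore:aggregates triple per aggregated resource.
    triples += [(aggregation_uri, ORE + "aggregates", r) for r in resources]
    # Extra relationships between the aggregated resources, standard or custom.
    triples += list(relations)
    return triples


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


base = "http://example.org/series42/"
raw, png, seg = base + "scan.tif", base + "scan.png", base + "segmentation.tif"
rem = build_resource_map(
    base + "rem",
    base + "aggregation",
    [raw, png, seg],
    [
        (raw, DCTERMS + "hasFormat", png),   # standard relation
        (seg, LAB + "analysisOf", raw),      # lab-defined relation
    ],
)
print(to_ntriples(rem))
```

The custom relation sits alongside the Dublin Core one with no special
treatment, which is the sense in which the model is extensible.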

> The second problem is object transport/harvesting. Am I correct in
> understanding that ORE could be used like OAI-PMH for harvesting, but
> to transport multiple kinds of files (say, bioimaging ones), instead
> of just XML metadata?

I'm not sure that it's entirely appropriate for medical images, but
ORE is just a description mechanism for collections of web resources.
So you would just give the URI for your image, and then a client would
retrieve it in the same way as any other web resource. For your use
case you would obviously need secure transmission (https, with good
authentication and authorization) but it seems quite possible.

I hope that helps!

Rob

Alma

Jan 15, 2010, 12:56:32 PM

to OAI-ORE

Thanks, Rob, that does help. I have a couple more questions.

Let me be more specific about my harvesting scenario. We're working in
basic science. These are non-clinical applications for medical and
scientific imaging technologies. We have multiple labs whose annotated
imaging work we would like to harvest automatically into a central
repository (which people could also get things back out of), metadata
and data together. We don't want to retrieve things on the fly; we
want them harvested at regular intervals. I imagine this as a problem
somewhat akin to getting faculty publications into the Institutional
Repository. Wasn't someone in Texas doing this somehow with OAI-ORE/
PMH into DSpace on this board? I tried to look at that PPP, but there
was not enough detail on how the actual data harvesting could be
conducted. Can anyone point me towards something along those lines?
Could you use something like Atom to do this? Again, forgive my total
ignorance here, I'm way outside my area of expertise.

> So you would just give the URI for your image...

I understand in ORE that you use these as the identifiers for
everything, but a URI doesn't have to resolve to a retrievable
address, does it? Also, I have a vague feeling there might be
potential problems with defining a URI for everything. Can you tell me
what some objections to that might be?

One more question: How does this data model cope with an aggregate
resource that grows over time? Is that considered a version, the same
thing with a new timestamp in the last-updated property, or an
entirely new resource every time something is added?

Robert Sanderson

Jan 18, 2010, 11:14:04 AM

to oai...@googlegroups.com

On Fri, Jan 15, 2010 at 10:56 AM, Alma <brianan...@sbcglobal.net> wrote:

> We don't want to retrieve things on the fly, we
> want them harvested at regular intervals. I imagine this as a problem
> somewhat akin to getting faculty publications into the Institutional
> Repository. Wasn't someone in Texas doing this somehow with OAI-ORE/
> PMH into DSpace on this board? I tried to look at that PPP, but there
> was not enough detail on how the actual data harvesting could be
> conducted. Can anyone point me towards something along those lines?
> Could you use something like Atom to do this? Again, forgive my total
> ignorance here, I'm way outside my area of expertise.

Yes, the project you refer to is this one:
http://www.mail-archive.com/dspace...@lists.sourceforge.net/msg00182.html

The issue to consider is that ORE is a description format, not a
wrapper format like (say) METS. ORE only discusses things by
reference, so to harvest the full images (in your case) you would
still need to dereference those pointers (via HTTP, I expect). All
you would get in the ORE description is the metadata.
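
As a rough sketch of what "dereference those pointers" could look like
on the harvesting side, the Python below (standard library only) pulls
the ore:aggregates targets out of a Resource Map serialized as
N-Triples and fetches each one. The function names are hypothetical,
the parser only handles URI-to-URI statements, and a real harvester
would add authentication, retries, and scheduling.

```python
import re
from urllib.request import urlopen

ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"

# Matches N-Triples statements whose subject, predicate, and object are all URIs.
TRIPLE = re.compile(r"<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.")


def aggregated_uris(ntriples_text):
    """List the URIs that the Resource Map says are aggregated."""
    return [o for _s, p, o in TRIPLE.findall(ntriples_text) if p == ORE_AGGREGATES]


def fetch_over_http(uri):
    """Dereference one URI and return the response body as bytes."""
    with urlopen(uri) as response:
        return response.read()


def harvest(ntriples_text, fetch=fetch_over_http):
    """Fetch every aggregated resource; returns {uri: content bytes}."""
    return {uri: fetch(uri) for uri in aggregated_uris(ntriples_text)}
```

The Resource Map itself only supplies the list of URIs; the image
bytes arrive through the separate HTTP requests made by `fetch`.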

If it's important that all of the data as well as metadata be included
in the response, then ORE is not the way to go. I don't know all of
your requirements and current architecture, but it seems reasonable
that each image would have its own URI... in the simplest case it must
have an identifier or filename already, which can be mapped into URI
space.
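
One possible way to map existing filenames into URI space, sketched in
Python with the standard library. The base URI is a placeholder, and
the scheme (percent-encoding each path segment under a fixed prefix) is
only an assumption about what might suit your setup.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def filename_to_uri(relative_path, base="https://images.example.org/lab-a/"):
    """Map a stored file path to a URI under a (hypothetical) base."""
    # Drop a leading "/" so absolute paths map the same way as relative ones.
    parts = [p for p in PurePosixPath(relative_path).parts if p != "/"]
    # Percent-encode each segment so spaces, "#", and "?" cannot alter the URI.
    return base + "/".join(quote(p, safe="") for p in parts)


print(filename_to_uri("series 01/scan#3.tif"))
```

Whatever scheme is used, the mapping should stay stable once
published, since other Resource Maps may point at those URIs.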

>> So you would just give the URI for your image...
>
> I understand in ORE that you use these as the identifiers for
> everything, but a URI doesn't have to resolve to a retrievable
> address, does it? Also, I have a vague feeling there might be
> potential problems with defining a URI for everything. Can you tell me
> what some objections to that might be?

It doesn't have to resolve, but it's very strongly recommended.
The potential problems revolve around keeping the URIs alive. URIs are
free (or very cheap) to obtain, but more expensive to maintain, which
is why they get compared to kittens :)

> One more question: How does this data model cope with an aggregate
> resource that grows over time? Is that considered a version, the same
> thing with a new timestamp in the last-updated property, or an
> entirely new resource every time something is added?

ORE doesn't mandate any one approach to versioning. You can either
update the existing Aggregation or create a new version of it; it's up
to you.

Rob
