ORE and harvesting/discovery

harry

Jun 23, 2008, 2:16:45 AM
to OAI-ORE
Hi all,

I only have limited knowledge of ORE and am still trying to
understand it. I have questions about harvesting/discovery
in ORE. As I understand it, we can use RSS, Sitemaps, or OAI-PMH.

Are there any examples on how to make batches of Resource Maps
available for harvesting using RSS, Sitemaps, and OAI-PMH? Can someone
provide more information?

Also, can someone let me know which method you prefer for harvesting
(RSS, OAI-PMH, or Sitemaps), and why?

Currently we have a harvester for ADT (Australia Digital Thesis) which
uses OAI-PMH to harvest resource metadata (DC) each night, which it
stores and indexes in case someone wants to search for a thesis.
What we want to achieve is to eliminate the database that stores the
harvested information, along with the data filtering we do to get
correct information (URL, multiple unwanted DC elements, etc.).
If we cannot eliminate the database, we hope that ORE can help us to
harvest the updated resources only.
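
On harvesting only the updated resources: OAI-PMH already supports selective harvesting by datestamp through the `from` and `until` arguments, independently of ORE. The sketch below shows one way a nightly harvester might build such a request in Python. The base URL and the function name are placeholders invented for illustration, and it assumes the repository supports seconds granularity (a repository that only supports day granularity would need `YYYY-MM-DD` instead, which its Identify response reports).

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def incremental_request_url(base_url, last_harvest, metadata_prefix="oai_dc"):
    """Build a ListRecords URL asking only for records changed since last_harvest."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        # OAI-PMH datestamps are expressed in UTC; seconds granularity is assumed here.
        "from": last_harvest.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint and last successful harvest time.
url = incremental_request_url(
    "http://repository.example.org/oai2",
    datetime(2008, 6, 22, 0, 0, 0, tzinfo=timezone.utc),
)
print(url)
```

The harvester would store the time of each successful run and pass it as `last_harvest` on the next one, so only changed records come back.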


Thanks,


Harry


Harry R. Sidhunata
University Library
The University of New South Wales
UNSW Sydney NSW 2052

MichaelNelson

Jun 23, 2008, 10:00:24 AM
to OAI-ORE
Hello Harry,

Have you read the Discovery Document?

http://www.openarchives.org/ore/0.9/discovery

It covers examples of how to expose ReMs via OAI-PMH, Sitemaps, RSS,
Atom, etc.
I'm not sure there is a "preferred" method yet because this is all
pretty new. But all are valid methods that we think people will want
to use.

Please let us know how this turns out for you.

thanks,

Michael

Sebastian Kruk

Jun 23, 2008, 10:36:54 AM
to oai...@googlegroups.com
Hello,

I think we might also consider extending the discovery document with
guidelines for using semantic web indexing services such as
http://sindice.com/.

S.


-- sebasti...@gmail.com
-- GG: 335067,
-- Jabber: sebasti...@gmail.com
-- Skype: sebastiankruk
-- WWW: http://www.sebastiankruk.com/

Sean Gillies

Jun 23, 2008, 10:44:55 AM
to oai...@googlegroups.com
Which method is best also depends on the discovering agent. Google
recently dropped its support for OAI-PMH, but its continuing support
for Atom feeds as sitemaps, plus its new webmaster API using Atom for
creation and update of sitemaps indicates that you'll be very
discoverable (via Google, at least) using Atom.

Cheers,
Sean

Mark Diggory

Jun 23, 2008, 12:31:18 PM
to oai...@googlegroups.com
I agree 100% with that suggestion.

Sebastian Kruk

Jun 23, 2008, 5:01:55 PM
to oai...@googlegroups.com
OK - I am on my way back to Galway now - I will try to pull together a
section or two regarding solutions like Sindice for OAI-ORE discovery
by the end of the week.

S.

-- sebasti...@gmail.com
-- GG: 335067,
-- Jabber: sebasti...@gmail.com
-- Skype: sebastiankruk
-- WWW: http://www.sebastiankruk.com/

harry

Jun 23, 2008, 8:06:41 PM
to OAI-ORE
Hi all,

Thanks for the responses. I will read further and try to understand
the doc provided by Michael.


Harry

hvd...@gmail.com

Jun 24, 2008, 12:27:47 PM
to OAI-ORE
hi all,

I recently received a private email asking a very similar question.
The mail was from someone in the Netherlands working on a European
repository federation project in which they want to experiment with
ORE. I thought I'd share the essence of the response I sent:

Which discovery approach(es) you choose really depends on who your
target consumer is:

(*) if you know that a federating system (such as the DRIVER central
system) has the tools to harvest via OAI-PMH (which it does), then it
is probably good to expose your ReMs via an OAI-PMH repository. Doing
so leverages existing infrastructure at both the repositories and the
federating system.

(*) if you want to give web crawlers a fair chance at discovering your
ReMs, then both the HTML LINK and the HTTP LINK HEADER are good
choices.

(*) if you want the ReMs to be recognized as Atom feeds (that is, when
you choose the Atom serialization) so that users can subscribe to
them, then the HTML LINK is a good approach.
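
To illustrate the HTML LINK option from the consumer side, here is a minimal Python sketch of how a crawler might pick Resource Map links out of a page. It assumes the link relation is `resourcemap`, as the ORE discovery document describes it; the class name, sample page, and URLs are made up for the example.

```python
from html.parser import HTMLParser


class ResourceMapLinkFinder(HTMLParser):
    """Collect (href, type) pairs from <link> elements whose rel includes 'resourcemap'."""

    def __init__(self):
        super().__init__()
        self.resource_maps = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may carry several space-separated values; compare case-insensitively.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "resourcemap" in rel_values and attributes.get("href"):
            self.resource_maps.append((attributes["href"], attributes.get("type")))


# Hypothetical splash page advertising an Atom-serialized Resource Map.
sample_page = """
<html><head>
  <title>Example splash page</title>
  <link rel="stylesheet" href="/style.css">
  <link rel="resourcemap" type="application/atom+xml"
        href="http://repository.example.org/rem/123.atom">
</head><body></body></html>
"""

finder = ResourceMapLinkFinder()
finder.feed(sample_page)
print(finder.resource_maps)
```

The `type` attribute is kept alongside the URL so a consumer can tell an Atom serialization from another one before fetching it.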

cheers

herbert van de sompel

Dave Tarrant

Jun 24, 2008, 12:36:50 PM
to oai...@googlegroups.com
I've also been asked similar questions when trying to harvest
data from an established repository, as was demonstrated at Open
Repositories 2008 by the developer challenge winners. I'm already
considering the methods which Herbert mentions more closely.

The second method is already in place. I'm currently building an
Atom-serialized version so that point 3 can be achieved, and from this
I hope to build a schema which can enable the first of the points, as
this is required by OAI-PMH as I understand it.

Currently this is all with an EPrints-based repository. I'll let
people know the results if they are interested.

Cheers

David Tarrant
University of Southampton

Benjamin O'Steen

Jun 24, 2008, 12:48:53 PM
to oai...@googlegroups.com
Dave, quick question: can the ResMap plugin be exposed through the
OAI-PMH plugin in EPrints?

e.g. $EPRINTS/oai2?verb=ListRecords&metadataPrefix=ResMap

(or perhaps a less server-intensive ResMapUrl choice?)

I am finishing off a generic (python) OAI-PMH harvester that can use
metadata plugins to process the records. The Foresite[1] library is
going to be the majority of the ORE plugin for this.

The short-term goal is to synchronise (via copy-by-value initially) the
published items in an EPrints 3.1 repository as items in a Fedora
repository, with the aim that the EPrints-to-Fedora linking be a
many-to-one relationship.

Having the resmaps or just the urls to them via OAI-PMH would be very
handy for incremental updates, and it might be a quick addition to the
OAI-PMH plugin perhaps?
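
As a rough illustration of the incremental-update idea, the Python sketch below pulls identifiers, datestamps, and deletion status out of an OAI-PMH ListIdentifiers or ListRecords response. It relies only on the standard OAI-PMH 2.0 header structure; the sample response, the identifiers, and the function name are invented for the example, and it says nothing about how EPrints would actually expose Resource Maps.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def changed_records(response_xml):
    """Return ([(identifier, datestamp, deleted)], resumption_token) from a response."""
    root = ET.fromstring(response_xml)
    records = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        records.append((
            header.findtext("oai:identifier", namespaces=OAI_NS),
            header.findtext("oai:datestamp", namespaces=OAI_NS),
            header.get("status") == "deleted",
        ))
    # An empty or missing resumptionToken means the list is complete.
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    return records, (token or None)


# Hypothetical response body, trimmed to the parts the function reads.
sample_response = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2008-06-24T12:00:00Z</responseDate>
  <request verb="ListIdentifiers">http://repository.example.org/oai2</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:repository.example.org:101</identifier>
      <datestamp>2008-06-23</datestamp>
    </header>
    <header status="deleted">
      <identifier>oai:repository.example.org:102</identifier>
      <datestamp>2008-06-24</datestamp>
    </header>
    <resumptionToken>page-2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>
"""

records, token = changed_records(sample_response)
print(records, token)
```

A harvester could use the returned token to request the next page, and use the datestamps to decide which Resource Maps to re-fetch.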

[1] - Foresite - http://code.google.com/p/foresite-toolkit/ (from
http://foresite.cheshire3.org/)

Dave Tarrant

Jun 24, 2008, 1:11:22 PM
to oai...@googlegroups.com
That is exactly what I'm doing at the moment. It will go in a
development repo.

Dave T
