OAI-PMH as a possible means of duplicating state/metadata between shared-name servers

Douglas Burke

Apr 29, 2009, 9:20:14 PM

to shared...@googlegroups.com

Thanks for the interesting meeting today. I briefly chatted with Alan
about whether OAI-PMH would be a good fit for sharing/duplicating the
metadata between the shared-name servers. Here are a few pointers about
this. I don't have a stake in OAI-PMH, so I would not be offended if the
suggestion is shot down in flames!

OAI-PMH, the Open Archives Initiative Protocol for Metadata Harvesting
[1], is a way of sharing metadata between archives. There's a tutorial
about it at [2], which I've only just come across, so I can't vouch for
its usefulness. There's a discussion of some ways to use OAI-PMH at
[3], pointing out that it need not be tied to its original objective (of
sharing data between library communities).

We use it in Astronomy to allow data to flow between our "registries",
which are collections of metadata about named items of interest to us
(organisations, people, services, data sets). This data can
then be used either as a replica of the original server or within an
application, but what Astronomers do with the data is a bit beyond the
scope of this discussion :-) If you want to know more, ask me or try [4],
[5], and [6].

The transmission format is XML. We have a set of schemas to define our
domain-specific data, but every OAI-PMH server must also offer a base
format, unqualified Dublin Core (oai_dc). Given the discussions today, I
doubt it would be hard to come up with an XML serialization of most of
the data, apart from possibly the databank templates.
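As a rough illustration of what such a serialization might look like, here is a minimal Python sketch (standard library only) that writes a record as an oai_dc document. The record fields and the `to_oai_dc` helper are hypothetical; a real server would also add an `xsi:schemaLocation` attribute pointing at the oai_dc schema.

```python
import xml.etree.ElementTree as ET

# Namespaces of the unqualified Dublin Core format that OAI-PMH servers offer.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def to_oai_dc(record: dict) -> str:
    """Serialize a (hypothetical) shared-name record as an oai_dc XML string.

    Keys are assumed to be Dublin Core element names such as "title",
    "creator" or "identifier"; a list value yields repeated elements.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, value in record.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = str(v)
    return ET.tostring(root, encoding="unicode")
```

Keys must be Dublin Core element names (title, creator, identifier, date, ...); a list becomes repeated elements, which Dublin Core permits.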

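The protocol lets clients request only the records added or changed
since their last harvest, can report records that have since been
deleted, splits large responses into chunks (via resumption tokens), and
is accessed through HTTP GET or POST requests using a small set of six
verbs (Identify, ListMetadataFormats, ListSets, ListIdentifiers,
ListRecords, GetRecord).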
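To make the harvesting model concrete, here is a minimal sketch of an incremental harvester in Python (standard library only). It assumes a hypothetical server base URL and an injected `fetch` function; `harvest` and `list_records_url` are illustrative names, not part of any library. It follows resumption tokens until the server returns an empty one, and flags headers marked `status="deleted"`.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str, *, from_date: Optional[str] = None,
                     token: Optional[str] = None) -> str:
    """Build a ListRecords request URL (HTTP GET form)."""
    if token is not None:
        # resumptionToken is an exclusive argument: it replaces all others.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date is not None:
            # Only records whose datestamp is on or after this date.
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def harvest(base_url: str, fetch: Callable[[str], str],
            from_date: Optional[str] = None) -> Iterator[Tuple[str, bool]]:
    """Yield (identifier, deleted) pairs, following resumption tokens.

    `fetch` takes a URL and returns the response body; it is injected so
    the loop can be exercised without network access.
    """
    token = None
    while True:
        url = list_records_url(base_url, from_date=from_date, token=token)
        root = ET.fromstring(fetch(url))
        list_records = root.find(f"{OAI_NS}ListRecords")
        if list_records is None:
            return  # e.g. an <error code="noRecordsMatch"> response
        for header in list_records.iter(f"{OAI_NS}header"):
            identifier = header.findtext(f"{OAI_NS}identifier")
            yield identifier, header.get("status") == "deleted"
        token_el = list_records.find(f"{OAI_NS}resumptionToken")
        # An empty (or absent) resumptionToken marks the final chunk.
        token = token_el.text if token_el is not None else None
        if not token:
            return
```

In practice `fetch` could be `lambda u: urllib.request.urlopen(u).read().decode("utf-8")`, and `from_date` would be the datestamp of the previous harvest.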

One advantage I see to an approach like this is that it ties the data
replication to an open format rather than to a specific database
technology (MySQL), and it lets external users mirror the data in a
standard manner for uses you may not have considered yet, or have thought
about but haven't had the time to implement.

The flip side of this argument is that a bunch of XML statements is
not necessarily more useful or interoperable than a bunch of SQL.

I only have a "client's eye" view of the protocol; if it's useful, I can put
interested parties in touch with Astronomers who have real-world
experience of running our registries and implementing the protocol.

[1]: http://www.openarchives.org/OAI/openarchivesprotocol.html
[2]: http://www.oaforum.org/tutorial/
[3]: http://www.dlib.org/dlib/july03/young/07young.html

[4]: http://www.ivoa.net/Documents/latest/RegistryInterface.html
[5]: http://www.ivoa.net/Documents/latest/RM.html
[6]: http://www.ivoa.net/Documents/latest/VOResource.html

Thanks again for the meeting,
Doug

-------------------------------------------------------------------
Doug Burke | http://hea-www.harvard.edu/~dburke/
Harvard-Smithsonian | Email: dbu...@cfa.harvard.edu
Center for Astrophysics | Phone: (617) 496 7853
60 Garden Street MS-2 | Fax: (617) 495 7356
Cambridge, MA 02138 | Office: B-440
-------------------------------------------------------------------

Douglas Burke

Apr 30, 2009, 8:27:21 AM

to shared...@googlegroups.com

Douglas Burke wrote:
>
> Thanks for the interesting meeting today. I briefly chatted with Alan
> about whether OAI-PMH would be a good fit for sharing/duplicating the
> metadata between the shared-name servers. Here's a few pointers about
> this. I don't have a stake in OAI-PMH, so would not be offended if the
> suggestion is shot down in flames!
>

An alternative, push-style approach (rather than the pull style of
OAI-PMH) would be AtomPub, but I have no experience with it.
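For contrast, a push in AtomPub (RFC 5023) amounts to the source server POSTing an Atom entry (RFC 4287) to a collection URI on each replica. The sketch below only builds the entry and the request object; the collection URI, the entry contents and both helper names are hypothetical, and nothing is sent over the network.

```python
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"


def atom_entry(entry_id: str, title: str, author: str, summary: str,
               updated: datetime) -> bytes:
    """Build a minimal Atom entry document (id, title, updated, author)."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    # Atom requires RFC 3339 timestamps; a timezone-aware isoformat() fits.
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated.isoformat()
    author_el = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
    ET.SubElement(author_el, f"{{{ATOM_NS}}}name").text = author
    ET.SubElement(entry, f"{{{ATOM_NS}}}summary").text = summary
    # Serialize with Atom as the default (unprefixed) namespace.
    return ET.tostring(entry, encoding="utf-8", default_namespace=ATOM_NS)


def push_request(collection_uri: str, body: bytes) -> urllib.request.Request:
    """Prepare (but do not send) an AtomPub POST to a collection."""
    return urllib.request.Request(
        collection_uri, data=body, method="POST",
        headers={"Content-Type": "application/atom+xml;type=entry"})
```

Sending it would be `urllib.request.urlopen(req)`; a server that accepts the entry should reply `201 Created` with a `Location` header for the new member.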

One problem with both of these syndication mechanisms is that neither
guarantees atomic updates: you would have to create some synchronization
scheme so that, once all the clients have downloaded the updates, they all
switch to the latest working version at the same time (i.e., an atomic
update across all the clients).
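One way such a scheme could work is a simplified form of two-phase commit: each client stages the downloaded update without using it, and only when every client reports success is the new version switched on everywhere. The in-process sketch below is hypothetical throughout (`Replica`, `publish`, the version numbers) and ignores the hard parts of a real deployment, such as network failures, timeouts, and a crash between the two phases.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    """A hypothetical client that stages harvested updates before using them."""
    name: str
    online: bool = True
    active_version: int = 0
    live: dict = field(default_factory=dict)
    staged: Dict[int, dict] = field(default_factory=dict)

    def stage(self, version: int, data: dict) -> bool:
        # Phase 1: keep the downloaded update aside; do not make it live yet.
        if not self.online:
            return False
        self.staged[version] = data
        return True

    def activate(self, version: int) -> None:
        # Phase 2: swap the staged data in as the live copy.
        self.live = self.staged.pop(version)
        self.active_version = version


def publish(replicas: List[Replica], version: int, data: dict) -> bool:
    """Activate `version` on every replica only if all of them staged it."""
    results = [r.stage(version, data) for r in replicas]  # every replica tries
    if not all(results):
        return False  # at least one failed: nobody switches
    for r in replicas:
        r.activate(version)
    return True
```

If any replica fails to stage, all of them keep serving the previous version, which is the "all or nothing" property described above.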