DSpace + OAI-ORE + (AtomPub?)


pkeane

May 30, 2009, 12:43:14 PM
to OAI-ORE
Hi All-

I attended a session at the Texas Conference on Digital Libraries this
past week. Included in the presentation was a project by the Texas
Digital Library to build a DSpace OAI-PMH harvester that would ingest
OAI-ORE documents [1]. Seems like a v. neat project -- I believe TDL
plans to put this into production very soon. The harvester can be
configured to save either links to aggregated resources included in
the ReM (in which case the DSpace end user will be given a link to the
aggregated resource at its original URL) OR to actually ingest that
aggregated resource, creating a new record for it in DSpace. This
project was built using the Atom serialization format of OAI-ORE.

I would very much like to see this functionality extended just a bit
to allow for AtomPub-based posting of ReMs into a DSpace
installation. All that would be required would be a servlet that
would accept a "POST" with MIME type
"application/atom+xml;type=entry", which could then pass along the
Atom Entry ReM to the same code that knows how to ingest a ReM as
part of the harvesting process (with, of course, whatever validation
needs to occur). In
addition, such functionality could be exposed and made discoverable by
an Atom Service document that listed the correct end-point for the
AtomPub OAI-ORE interface, and even a link@rel=service in the main
collection web page for easy machine discoverability.
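
A minimal sketch of what the client side of such a POST might look like, in Python. The function names, the example URLs, and the choice of fields are assumptions made for illustration; the entry built here carries only an id, title, timestamp, and the ore:aggregates links, and leaves out the other elements a complete Atom ReM would need.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"


def build_rem_entry(entry_id, title, updated, resource_uris):
    """Return a pared-down Atom entry (bytes) listing aggregated resources.

    Hypothetical helper: a real ReM would carry more metadata than this.
    """
    ET.register_namespace("", ATOM)
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    for uri in resource_uris:
        # One atom:link per aggregated resource.
        ET.SubElement(entry, f"{{{ATOM}}}link", rel=ORE_AGGREGATES, href=uri)
    return ET.tostring(entry, encoding="utf-8")


def build_post_request(endpoint_url, entry_bytes):
    """Prepare (but do not send) the AtomPub POST for the entry."""
    return urllib.request.Request(
        endpoint_url,
        data=entry_bytes,
        method="POST",
        headers={"Content-Type": "application/atom+xml;type=entry"},
    )
```

Passing the prepared request to urllib.request.urlopen would send it; the endpoint URL would come from wherever the repository advertises it.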

I've built a number of such AtomPub interfaces in PHP and Python, and
the hardest part is parsing the Atom entry and creating the
appropriate mappings to local metadata (all of which is already done
in the TDL project). Other than that, it's quite easy. Does anyone
have any thoughts on the usefulness of such a thing and/or the
difficulty of creating the code for DSpace? (My Java chops are too
rusty to have a good sense of that.)
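
To give a rough idea of the parsing-and-mapping step, here is a small Python sketch. The local field names ("identifier", "title", "creators", "files") are made up for illustration and are not DSpace's or the TDL project's actual mapping.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"


def entry_to_record(entry_xml):
    """Map an Atom entry ReM onto a flat dict of hypothetical local fields."""
    entry = ET.fromstring(entry_xml)
    return {
        "identifier": entry.findtext(f"{ATOM}id"),
        "title": entry.findtext(f"{ATOM}title"),
        "creators": [
            author.findtext(f"{ATOM}name")
            for author in entry.findall(f"{ATOM}author")
        ],
        # Only links marked as ore:aggregates count as aggregated resources.
        "files": [
            link.get("href")
            for link in entry.findall(f"{ATOM}link")
            if link.get("rel") == ORE_AGGREGATES
        ],
    }
```

A real mapping would also have to decide what to do with categories, rights, and any embedded RDF, which is where most of the effort tends to go.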

I realize that the SWORD protocol has some of the same functionality
and is built on AtomPub. But SWORD (as far as I know) knows nothing
of the Atom serialization of OAI-ORE. Of course I am sure that the
SWORD code has most of the spare parts that would be needed to get
this AtomPub OAI-ORE interface working.

I'd love to see such a thing happen, since my department has many tens
of thousands of papers we'd like to put in our library's DSpace
installation. Posting Atom-based ORE resource maps would be a clean
and simple solution.

--Peter Keane
The University of Texas at Austin


[1]
http://www.tdl.org/about-tdl/events/texas-conference-on-digital-libraries-2009/keynote-abstracts/#repository-interoperability-in-the-texas-digital-library-through-the-use-of-oai-ore

Robert Sanderson

Jun 3, 2009, 8:09:30 AM
to oai...@googlegroups.com

Do they also expose their content via ORE?  I can't find it anywhere if they do. :(

--Rob

Peter Keane

Jun 3, 2009, 8:15:33 AM
to oai...@googlegroups.com
Hi Rob-

I'm pretty sure that the TDL project included a piece to create
resource maps and publish them in OAI-PMH (I'm not 100% certain on
that, but I do know the harvester assumes resource maps are placed in
OAI-PMH). I'll also note that they have used the Atom serialization
for the resource maps (w/ the assumption they can move to RDF/XML at
some point).

--peter

Simeon Warner

Jun 3, 2009, 9:13:47 AM
to OAI-ORE, AMa...@lib-gw.tamu.edu
This work does involve mapping DSpace collections/objects to OAI-ORE
for exchange of the data. It was one of my favorite presentations at
OR09.
The abstract is here:

https://or09.library.gatech.edu/general83.php

but it seems that slides from OR09 presentations aren't up yet. I've
cc'd Alexey, who can perhaps point to the slides.

Cheers,
Simeon



Mark Diggory

Jun 3, 2009, 9:52:44 AM
to oai...@googlegroups.com, Alexey Maslov, Aaron Zeckoski
Peter,

There is work in that project to include crosswalks for ORE Atom ReMs;
this should be reusable for ingesting manifests.

https://source.tdl.org/svn/dspace/branches/dspace-1_5_0-with-harvesting/dspace-api/src/main/java/org/dspace/content/crosswalk/OREIngestionCrosswalk.java

I think Alexey can comment on whether there is a Packager exposed for
SWORD/LNI that can take the next step of exposing that capability to
external agents. I took Alexey's presentation to mean they avoided
the SWORD/LNI ingest issue by having the target repository run the
agent internally. I actually think that's an important point, because
the big question is who's in charge of deciding what gets put into the
repository other than its maintainers. In their case it's the
maintainer, not a third party as in the SWORD case.

I think it would be good to direct this to Alexey as well: he's the
developer of the harvester and can comment on it further, so I've
CC'd him.

----

For me, your question challenges me to think about how this will
relate in DSpace 2.0 where our data model becomes more flexible and we
stop thinking about content in DSpace as "Items" in "Collections". In
fact, I've left behind my original work to map ORE to DSpace
Items/Collections/Communities explicitly because DSpace 2.0 drops the
entire rigid model in favor of a simplified entity-relationship
modeling approach where "type" is just a property and "containership"
is just a relation. In this case, any URI (URI-R, URI-A, URI-AR,
URI-P) expressed in the ReM becomes an Entity in DSpace and its
properties that are NonLiteral references become "relations".
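
As a rough illustration of that mapping (not the DSpace 2.0 API; the Entity class, its fields, and the triple format here are all invented for the sketch), every URI becomes an entity, literal values become properties, and non-literal references become relations:

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for an entity in a flexible data model."""
    uri: str
    properties: dict = field(default_factory=dict)  # predicate -> literal values
    relations: list = field(default_factory=list)   # (predicate, target URI)


def load_triples(triples):
    """Build entities from (subject, predicate, object, is_literal) tuples."""
    entities = {}
    for subject, predicate, obj, is_literal in triples:
        entity = entities.setdefault(subject, Entity(subject))
        if is_literal:
            entity.properties.setdefault(predicate, []).append(obj)
        else:
            # A non-literal object is a reference to another entity.
            entity.relations.append((predicate, obj))
            entities.setdefault(obj, Entity(obj))
    return entities
```

Under these assumptions an aggregation and each aggregated resource end up as peers in the same dictionary, linked only by an "aggregates" relation rather than by any built-in Item/Collection hierarchy.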

But... This still comes back to my original debate about "Content
Types" vs. "ORE". In DSpace 2.0, we are trying to define "profiles" for
Entities in the DCMI Description Set Profile sense of the term. Thus,
DCMI Application Profiles can be encoded in the DSpace 2.0 Metadata
Registry and used as templates to validate and build different
"Profiles" of Composite Digital Objects. My intention is that it's
the choice of the repository designer how rigid the influence of
these profiles will be on the content expressed in the repository.

So, how does this relate to what you're asking? Alexey's approach of
"making the repository the agent" still puts the job of creating that
mapping on the repository maintainer, an already overtaxed and
struggling group of archivists and librarians who want tools to make
their lives easier... not harder. However, in the DSpace community,
Aaron Zeckoski and a GSoC student are actually working on an interface
for interacting with the DSpace Entities via traditional REST
approaches (not APP-specific package submission, i.e. not constrained
by SWORD or APP or even Atom). I dare say the OAI-ORE community
should be considering how a simple protocol like REST applies to
OAI-ORE: how an agent might interact with the repository at an atomic
level to construct the composite digital object by "playing"
PUT/POST commands containing fragments of ORE ReMs or any other
simple REST fragments. This is much different from making the
repository responsible for providing such a mapping (forcing DSpace
core developers to supply ingest packager support over and over again
as the next greatest "standard" becomes popular). This vaccinates the
DSpace repository maintainers against the disease of YAMMOSES (Yet
Another Manifest Mapping Of Someone Else's Standard) rampant in our
community.
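
A toy Python sketch of what "playing" such commands could amount to, with an in-memory dictionary standing in for the repository. The operation format, the paths, and the rule that PUT sets properties while POST adds a relation are assumptions chosen for the example, not the interface being built for DSpace.

```python
def replay(operations):
    """Apply (method, uri, body) steps in order to an in-memory store."""
    store = {}
    for method, uri, body in operations:
        if method == "PUT":
            # PUT creates the entity or replaces its literal properties,
            # keeping any relations already recorded for it.
            entity = store.setdefault(uri, {"properties": {}, "relations": []})
            entity["properties"] = dict(body)
        elif method == "POST":
            # POST adds one relation from this entity to another URI.
            if uri not in store:
                raise KeyError(f"no entity at {uri}; PUT it first")
            store[uri]["relations"].append((body["predicate"], body["target"]))
        else:
            raise ValueError(f"unsupported method: {method}")
    return store
```

The composite object is never submitted as one package here; it emerges from the sequence of small requests, and the agent, not the repository, decides how its own format breaks down into those requests.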

cheers,
Mark

On Sat, May 30, 2009 at 9:43 AM, pkeane <pjk...@gmail.com> wrote:
>
> Hi All-
>
> I attended a session at the Texas Conference on Digital Libraries this
> past week.  Included in the presentation was a project by the Texas
> Digital Library to build a DSpace OAI-PMH harvester that would ingest
> OAI-ORE documents [1].  Seems like a v. neat project -- I believe TDL
> plans to put this into production very soon.  The harvester can be
> configured to save either links to aggregated resources included in
> the ReM (in which case the DSpace end user will be given a link to the
> aggregated resource at its original URL) OR to actually ingest that
> aggregated resource, creating a new record for it in DSpace.  This
> project was built using the Atom serialization format of OAI-ORE.
>
> I would very much like to see this functionality extended just a bit
> to allow for AtomPub-based posting of ReMs into a DSpace
> installation.  All that would be required would be a servlet that
> would accept a "POST" with MIME type
> "application/atom+xml;type=entry", which could then pass along the
> Atom Entry ReM to the same code that knows how to ingest a ReM as
> part of the harvesting process (with, of course, whatever validation
> needs to occur).  In
> addition, such functionality could be exposed and made discoverable by
> an Atom Service document that listed the correct end-point for the
> AtomPub OAI-ORE interface, and even a link@rel=service in the main
> collection web page for easy machine discoverability.

I think the ultimate intention with the the

Elliot Metsger

Jun 3, 2009, 9:53:37 AM
to oai...@googlegroups.com, AMa...@lib-gw.tamu.edu
They wrote a metadata crosswalk that creates the ReM and exposes it via OAI-PMH. It was an awesome presentation, so perhaps Alexey can post it somewhere? The DSpace wiki, perhaps?

Robert Sanderson

Jun 3, 2009, 10:20:45 AM
to oai...@googlegroups.com
Mark,

You're saying that we should consider what happens if you PUT/POST a Resource Map?  Isn't that up to the recipient of the operation to determine?  ORE is a data model, not a protocol with expected server behaviour, and especially not for create rather than retrieve.

--Rob

Peter Keane

Jun 3, 2009, 10:30:37 AM
to oai...@googlegroups.com
That's exactly the reason I've been so interested in the Atom
serialization of OAI-ORE -- since AtomPub IS a protocol, if you have
an atom:entry ReM you have a protocol on which to base the function of
PUT/POST operations. Of course it leaves the burden of a mapping (as
Mark said) on the repository owner, but that's been addressed in the
TDL PMH/ORE harvester.

Certainly there would need to be HTTP-based authentication for
posting, but the AtomPub functionality would need only an AtomPub
service doc describing the endpoint, and a simple AtomPub handler for
POST/PUT.
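
A bare-bones sketch of such a handler, written as a Python WSGI app rather than a Java servlet. Everything here is illustrative: the `ingest` callback is a hypothetical hook for the code that actually maps and stores the ReM, only POST is handled, and the authentication check merely looks for an Authorization header without verifying it.

```python
import xml.etree.ElementTree as ET

ENTRY_TYPE = "application/atom+xml;type=entry"


def make_app(ingest):
    """Build a WSGI app; `ingest(entry)` must return the new item's URL."""
    def app(environ, start_response):
        def reply(status, headers=(), body=b""):
            start_response(status, [("Content-Length", str(len(body))), *headers])
            return [body]

        if environ.get("REQUEST_METHOD") != "POST":
            return reply("405 Method Not Allowed", [("Allow", "POST")])
        # Placeholder check only: a real handler must verify the credentials.
        if "HTTP_AUTHORIZATION" not in environ:
            return reply("401 Unauthorized",
                         [("WWW-Authenticate", 'Basic realm="deposit"')])
        # Strict comparison; extra parameters such as charset are rejected.
        content_type = environ.get("CONTENT_TYPE", "").replace(" ", "").lower()
        if content_type != ENTRY_TYPE:
            return reply("415 Unsupported Media Type")
        length = int(environ.get("CONTENT_LENGTH") or 0)
        try:
            entry = ET.fromstring(environ["wsgi.input"].read(length))
        except ET.ParseError:
            return reply("400 Bad Request")
        # AtomPub answers a successful create with 201 and a Location header.
        return reply("201 Created", [("Location", ingest(entry))])
    return app
```

The app can be served for local experiments with wsgiref.simple_server.make_server from the standard library.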

I suspect our use case is not atypical: UT College of Liberal Arts is
providing faculty with an interface to upload papers into a "Faculty
sandbox" for departmental websites. We'd like a way to
programmatically place a copy in the library's IR (DSpace).
ORE/AtomPub seems natural.



Peter Keane

Jun 3, 2009, 5:37:27 PM
to oai...@googlegroups.com, AMa...@lib-gw.tamu.edu
On Wed, Jun 3, 2009 at 8:13 AM, Simeon Warner <arxiv...@gmail.com> wrote:
>
> This work does involve mapping DSpace collections/objects to OAI-ORE
> for exchange of the data. It was one of my favorite presentations at
> OR09.
> The abstract is here:
>
> https://or09.library.gatech.edu/general83.php
>
> but it seems that slides from OR09 presentations aren't up yet. I cc
> Alexey who can perhaps point to slides?

Simeon-

Alexey's presentation slides from TCDL were just put up at
http://www.tdl.org/about-tdl/events/texas-conference-on-digital-libraries-2009/tcdl-2009-presentations/
(second listing under "Texas Digital Library" heading).

--peter