> I'm rather new to ORE, and it's rather different to working with
> 'normal' Atom feeds - could there be simple code snippets to crib
> from?
>
You might find what you are looking for in the ATOM serialisation of the
specification:
http://www.openarchives.org/ore/0.9/atom-implementation
> In particular I'm trying to generate a resource map from user's
> collections of aggregated Atom entries for submission via SWORD, and
> I'm not exactly clear on how I go about it using foresite!
>
I'm not sure if I've precisely understood, but I think that I'm doing
something similar.
I'm posting ATOM feed documents (serialised resource maps) as MIME
attachments in SWORD, and have written a custom ingester to turn these
into DSpace items. Is this similar?
The basic process is to take the atom feed document and run it through
the foresite atom parser:
InputStream inputStream = new FileInputStream("/my/atom/feed.atom");
OREParser parser = OREParserFactory.getInstance("ATOM-1.0");
ResourceMap rem = parser.parse(inputStream);
The ATOM parser and the ATOM serialiser are not yet complete,
unfortunately. I have spent some time trying to make the serialiser
work for the 0.9 specification, but the parser is set up as per 0.3 so
there is some asymmetry in reading and writing (i.e. YMMV :) ). These
are now basically at the top of my stack of things to do next, so any
feedback would be welcome.
If the ATOM feeds that you are talking about are just "normal" ones, how
they get parsed may be a bit unpredictable. ORE specifies a
serialisation of a resource map using atom, but does not claim that any
atom feed could be considered a true resource map; I'd be interested to
hear how you get on.
Cheers,
Richard
--
=======================================================================
Richard Jones | Hewlett-Packard Limited
Research Engineer, HP Labs | registered office:
Bristol, UK | Cain Road, Bracknell,
| Berks, RG12 1HN.
| Registered No: 690597 England
eml: richard...@hp.com -------------------------------------
blg: http://chronicles-of-richard.blogspot.com/
========================================================================
On Wed, 2008-06-11 at 02:14 -0700, scottw wrote:
> I'm rather new to ORE, and it's rather different to working with
> 'normal' Atom feeds - could there be simple code snippets to crib
> from?
There are currently only a few examples in the wiki, but we're working on
extending the documentation for both implementations.
http://code.google.com/p/foresite-toolkit/w/list
If there's anything in particular that you think should be written up
first, please let us know and we'll do our best to prioritise that
aspect.
> In particular I'm trying to generate a resource map from user's
> collections of aggregated Atom entries for submission via SWORD, and
> I'm not exactly clear on how I go about it using foresite!
That sounds quite cool! Could you explain a bit more about it? Are the
atom entries just from any old atom feed, or other ORE aggregations?
The issue that I see off the bat is that aggregated resources need to
have their own resolvable URIs, and atom entry ids are often tags or
uuids.
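
To illustrate the point about identifiers, here is a rough sketch (plain JDK, not foresite code) of how one might pick a candidate URI for each atom entry: use atom:id when it is an http(s) URI, otherwise fall back to the first link href. The class name, method names and the fallback rule are my own assumptions for illustration, not anything the ORE spec or foresite prescribes.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: not part of foresite.
public class AtomEntryUris {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** True if the identifier is an absolute http(s) URI with a host. */
    public static boolean isResolvable(String id) {
        try {
            URI uri = new URI(id.trim());
            String scheme = uri.getScheme();
            return scheme != null
                    && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))
                    && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** One URI per entry: atom:id if resolvable, else the first resolvable link href. */
    public static List<String> candidateUris(String atomXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(atomXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse feed", e);
        }
        List<String> uris = new ArrayList<>();
        NodeList entries = doc.getElementsByTagNameNS(ATOM_NS, "entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            NodeList ids = entry.getElementsByTagNameNS(ATOM_NS, "id");
            if (ids.getLength() > 0 && isResolvable(ids.item(0).getTextContent())) {
                uris.add(ids.item(0).getTextContent().trim());
                continue;
            }
            // atom:id is a tag: or urn:uuid: style identifier, so try the links instead
            NodeList links = entry.getElementsByTagNameNS(ATOM_NS, "link");
            for (int j = 0; j < links.getLength(); j++) {
                String href = ((Element) links.item(j)).getAttribute("href");
                if (isResolvable(href)) {
                    uris.add(href.trim());
                    break;
                }
            }
        }
        return uris;
    }

    public static void main(String[] args) {
        String feed = "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
                + "<entry><id>tag:example.org,2008:1</id>"
                + "<link href=\"http://example.org/a\"/></entry>"
                + "<entry><id>http://example.org/b</id></entry>"
                + "</feed>";
        System.out.println(candidateUris(feed));
    }
}
```

Entries that have neither a resolvable id nor a resolvable link are simply skipped here; whether that is acceptable depends on what you want the aggregation to contain.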
Thanks!
Rob
> Success! I think!
>
Fantastic!
> OK, after a few changes to the Atom Parser and a few other places
> (just for sanity-checking whether source values are present), and
> inventing a completely spurious URI for the unpublished resource map,
> I import my Atom feed of aggregated "things" from various sources
> (some Atom feeds, some RSS feeds) and generate a Resource Map in RDF
> XML.
>
Did you have problems with the model itself for parsing the atom feed,
or was it all in the parser? I'll get onto fixing the parser up proper
in the next week or so, but I'd be grateful for any code feedback if you
have some.
> INPUT:
>
>
<snip/>
> OUTPUT:
>
> <rdf:RDF
>     xmlns:foaf="http://xmlns.com/foaf/0.1/"
>     xmlns:dcterms="http://purl.org/dc/terms/"
>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>     xmlns:owl="http://www.w3.org/2002/07/owl#"
>     xmlns:ore="http://www.openarchives.org/ore/terms/"
>     xmlns:dc="http://purl.org/dc/elements/1.1/">
>
>   <rdf:Description rdf:about="http://test.com">
>     <dc:title>Package for JORUM</dc:title>
>     <ore:aggregates rdf:resource="http://feeds.wired.com/~r/wired/index/~3/252594141/click.phdo"/>
>     <rdf:type rdf:resource="http://www.openarchives.org/ore/terms/Aggregation"/>
>     <ore:aggregates rdf:resource="http://www.flickr.com/photos/jhrphotos/2278016494/in/pool-52241664802@N01"/>
>     <ore:aggregates rdf:resource="http://feeds.wired.com/~r/wired/index/~3/252093275/click.phdo"/>
>     <ore:aggregates rdf:resource="http://feeds.wired.com/~r/wired/index/~3/252594142/click.phdo"/>
>     <ore:isDescribedBy rdf:resource="http://www.test.com"/>
>     <ore:aggregates rdf:resource="http://news.bbc.co.uk/go/rss/-/2/hi/americas/7253491.stm"/>
>     <ore:aggregates rdf:resource="http://www.flickr.com/photos/47914087@N00/2276506733/in/pool-52241664802@N01"/>
>     <ore:aggregates rdf:resource="http://feeds.wired.com/~r/wired/index/~3/252594139/click.phdo"/>
>     <ore:aggregates rdf:resource="http://feeds.wired.com/~r/wired/index/~3/252747611/click.phdo"/>
>     <ore:aggregates rdf:resource="http://feeds.wired.com/~r/wired/topheadlines/~3/235881988/click.phdo"/>
>   </rdf:Description>
>
>   <rdf:Description rdf:about="http://www.test.com">
>     <dc:creator rdf:nodeID="A0"/>
>     <rdf:type rdf:resource="http://www.openarchives.org/ore/terms/ResourceMap"/>
>     <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-06-11T16:17:20+0100</dcterms:modified>
>     <dcterms:created rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-06-11T16:17:21+0100</dcterms:created>
>     <dc:format rdf:datatype="http://www.w3.org/2001/XMLSchema#string">application/octet-stream</dc:format>
>     <ore:describes rdf:resource="http://test.com"/>
>   </rdf:Description>
>
>
So, with a bit of snipping I checked the ReM and Aggregation and things
are looking good! Did you notice any data loss in the transformation?
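
For what it's worth, the kind of check I mean can be sketched with nothing but the JDK's DOM parser: list the ore:aggregates targets, and confirm that ore:describes and ore:isDescribedBy point at each other. This is only a sketch; the class and method names are made up for illustration, and it assumes the flat rdf:Description / rdf:about style shown in the quoted output (a real RDF parser would cope with other serialisation styles) and a complete, well-formed document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: not part of foresite.
public class ResourceMapCheck {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String ORE_NS = "http://www.openarchives.org/ore/terms/";

    public static Document parse(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** rdf:resource values of every ore:<property> element, with stray whitespace removed. */
    public static List<String> resources(Document doc, String property) {
        List<String> values = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagNameNS(ORE_NS, property);
        for (int i = 0; i < nodes.getLength(); i++) {
            String value = ((Element) nodes.item(i)).getAttributeNS(RDF_NS, "resource");
            values.add(value.replaceAll("\\s+", ""));
        }
        return values;
    }

    /** True if the ReM describes an aggregation that points back to it via ore:isDescribedBy. */
    public static boolean linksAreConsistent(Document doc) {
        NodeList describes = doc.getElementsByTagNameNS(ORE_NS, "describes");
        if (describes.getLength() != 1) {
            return false;
        }
        Element describesEl = (Element) describes.item(0);
        String remUri = about(describesEl.getParentNode());
        String aggregationUri = describesEl.getAttributeNS(RDF_NS, "resource");
        if (remUri.isEmpty() || aggregationUri.isEmpty()) {
            return false;
        }
        NodeList back = doc.getElementsByTagNameNS(ORE_NS, "isDescribedBy");
        for (int i = 0; i < back.getLength(); i++) {
            Element el = (Element) back.item(i);
            if (about(el.getParentNode()).equals(aggregationUri)
                    && el.getAttributeNS(RDF_NS, "resource").equals(remUri)) {
                return true;
            }
        }
        return false;
    }

    /** rdf:about of the enclosing rdf:Description, or "" if there is none. */
    private static String about(Node node) {
        return node instanceof Element ? ((Element) node).getAttributeNS(RDF_NS, "about") : "";
    }

    public static void main(String[] args) {
        String rdf = "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\" xmlns:ore=\"" + ORE_NS + "\">"
                + "<rdf:Description rdf:about=\"http://test.com\">"
                + "<ore:aggregates rdf:resource=\"http://example.org/a\"/>"
                + "<ore:isDescribedBy rdf:resource=\"http://www.test.com\"/>"
                + "</rdf:Description>"
                + "<rdf:Description rdf:about=\"http://www.test.com\">"
                + "<ore:describes rdf:resource=\"http://test.com\"/>"
                + "</rdf:Description>"
                + "</rdf:RDF>";
        Document doc = parse(rdf);
        System.out.println(resources(doc, "aggregates"));
        System.out.println(linksAreConsistent(doc));
    }
}
```

Comparing the list of aggregated URIs against the entries in the input feed is then a quick way to spot anything dropped along the way.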
> If I use the Atom serializer, however, it's a bit bare as you might
> expect at this stage!
>
Yes, sorry about that. If you read the source it says things like:
// FIXME: this bit is boring, come back and do it later
:)
Basically I have been able to build the structure of the atom document,
but the process of building the extra bits (the additional atom terms
etc, and all the embedded rdf) is quite tortuous and requires spending a
bit more time staring at the spec. Promise that this is top of my
list. The feedback from someone like you who understands atom better
than I do will be very useful.
> Next step: deposit into repository with SWORD. Anyone got a handy ORE-
> enabled SWORD server for testing?
>
What do you want to put stuff in? I have pretty much complete code to
turn a DSpace SWORD server into an ORE-enabled one, which I plan to share
sometime next week. I'm happy to send you the changes in their current
state, but I don't have a server that you can just play with, sorry.
Cheers, I'll incorporate this today.
Richard
I've just checked in some changes to do much better at serialising and
parsing atom documents. While not yet complete, they only leave out
what I think are edge and corner cases, and where there is some
ambiguity in the specification as to how best to serialise additional
RDF into the documents.
To make it a bit easier to test with, I've also added a basic CLI, so
you can now throw documents at the library as follows:
    java org.dspace.foresite.cli.ForesiteCLI -t -i /path/to/file.xml \
        -o /path/to/output.xml -f ATOM-1.0 -r RDF/XML
This instructs it to transform (-t) the input file (-i) which is of
format ATOM-1.0 (-f) into an RDF/XML document (-r) which will be written
to the output path (-o). If you omit -o, then the output will be
written to stdout. There is some basic documentation on the wiki:
http://code.google.com/p/foresite-toolkit/wiki/JavaLibrary
and if you run the class without arguments it will output the usage
information.
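
If you want to drive the CLI from a script or a test harness, the argument list can be assembled as in the sketch below. It only uses the flags described above; the helper class and method names are hypothetical, and actually launching the command (for example by handing the list to java.lang.ProcessBuilder) assumes the foresite classes and their dependencies are on the classpath, which this sketch does not set up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the command line only, it does not run it.
public class ForesiteCliCommand {

    /**
     * Assemble a transform (-t) invocation. If output is null the -o flag is
     * left out, in which case the CLI writes to stdout.
     */
    public static List<String> transformCommand(String input, String output,
                                                String inputFormat, String outputFormat) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("org.dspace.foresite.cli.ForesiteCLI");
        command.add("-t");
        command.add("-i");
        command.add(input);
        if (output != null) {
            command.add("-o");
            command.add(output);
        }
        command.add("-f");
        command.add(inputFormat);
        command.add("-r");
        command.add(outputFormat);
        return command;
    }

    public static void main(String[] args) {
        List<String> command = transformCommand(
                "/path/to/file.xml", null, "ATOM-1.0", "RDF/XML");
        System.out.println(String.join(" ", command));
    }
}
```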
Cheers,
Richard