Embedding ORE resource maps ScHTML style


Peter Sefton

May 11, 2011, 2:25:16 AM
to scholar...@googlegroups.com, Sam Adams, David F Flanders, Theo Andrew
Hi all,

I want to ask about an idea that came up after the ScHTML hackfest
weekend, of embedding an ORE resource map in a Scholarly HTML document
as per the ORE spec: http://www.openarchives.org/ore/1.0/rdfa

The use case I want to consider first is a table of contents for a set
of pages, such as a journal issue, or a personal portfolio, or a JISC
project report, or the like, as a resource map. The immediate use case
is automatic harvest by a tool like Calibre which compiles ebooks.

Let's take the ToC for the Ontogenesis project, a sort of encyclopedia
to do with ontology development, as our 'aggregation'. It is made using
a WordPress tool called ktoc.


http://ontogenesis.knowledgeblog.org/table-of-contents

The 'resource map' looks like this: a list of resources with no added semantics.

<p><strong>Articles</strong></p>
<ul>
<li><a href="http://ontogenesis.knowledgeblog.org/49">Automatic
maintenance of multiple inheritance ontologies</a> by Mikel Egana
Aranguren</li>
<li><a href="http://ontogenesis.knowledgeblog.org/257">Characterising
Representation</a> by Sean Bechhofer and Robert Stevens</li>
<li><a href="http://ontogenesis.knowledgeblog.org/1001">Closing Down
the Open World: Covering Axioms and Closure Axioms</a> by Robert
Stevens</li>

</ul>

I thought I would put together an example here and see if people
think it makes sense, then put it to the ORE list, then to tool
developers. I want to use RDFa 1.1 as per the ScHTML approach so that
namespaces do not have to be declared and managed. This will probably
break strict ORE compliance, but it would be impossible for people to
support it as-is in heterogeneous toolsets.

The other ScHTML thing will be to define ONE microformat-style way to
do it so that it is easy to write tools without having to implement
RDFa 1.1.

Based on the example on the ORE site I have this proposed markup. One
of the tricky parts of this is that aggregations and resource maps
should have different URIs. This is quite abstract for a world where
people produce and use simple tools like ktoc, so I wonder if we can
make this happen by using #s (fragment identifiers). There might be more than one of these on
a page.
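One way to see why the #s could work: resolving fragment identifiers against the page URL gives the Resource Map and the Aggregation two distinct URIs, while both still point into the same HTML page. A minimal Python sketch (standard library only), assuming the Ontogenesis ToC URL above as the base and the ids ResourceMap1 and Aggregation1 used in the proposed markup; `fragment_uri` is a hypothetical helper name:

```python
from urllib.parse import urldefrag, urljoin

# Base page hosting the table of contents (URL taken from this thread).
PAGE = "http://ontogenesis.knowledgeblog.org/table-of-contents"


def fragment_uri(page: str, fragment_id: str) -> str:
    """Hypothetical helper: resolve '#<fragment_id>' against the page URL."""
    return urljoin(page, "#" + fragment_id)


rem_uri = fragment_uri(PAGE, "ResourceMap1")  # URI-R, the Resource Map
agg_uri = fragment_uri(PAGE, "Aggregation1")  # URI-A, the Aggregation

# Two different URIs...
assert rem_uri != agg_uri
# ...that both lead to the same HTML document once the fragment is dropped.
assert urldefrag(rem_uri).url == urldefrag(agg_uri).url == PAGE

print(rem_uri)  # http://ontogenesis.knowledgeblog.org/table-of-contents#ResourceMap1
print(agg_uri)  # http://ontogenesis.knowledgeblog.org/table-of-contents#Aggregation1
```

Because each id only has to be unique within the page, several resource map / aggregation pairs can live on one page under different ids.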

What do people think?

<div id="Aggregation1" class="aggregation"
rel="http://www.openarchives.org/ore/terms/isDescribedBy"
about="#ResourceMap1"></div>
<div id="ResourceMap1"
rel="http://www.openarchives.org/ore/terms/describes"
about="#Aggregation1" class="resource-map">
<!-- we don't have much metadata about the RM but we do have a title
and must have a creator and date -->
<div property="http://purl.org/dc/terms/title"><strong>Articles</strong></div>

<span rel="http://purl.org/dc/terms/creator"
resource="http://knowledgeblog.org/knowledgeblog-table-of-contents-plugin"><span
rel="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
resource="http://purl.org/dc/terms/Agent"></span></span>

<span property="http://purl.org/dc/terms/modified" content="2011-05-01"></span>

<ul>
<!-- Using rel and resource on the li element to wrap more information
about the resource; that way tool developers can look for this element,
then look within it for more metadata if they like -->
<li rel="http://www.openarchives.org/ore/terms/aggregates"
resource="http://ontogenesis.knowledgeblog.org/49"><a
href="http://ontogenesis.knowledgeblog.org/49">Automatic maintenance
of multiple inheritance ontologies</a> by <span
property="http://purl.org/dc/terms/creator" >Mikel Egana
Aranguren</span></li>
<li rel="http://www.openarchives.org/ore/terms/aggregates"
resource="http://ontogenesis.knowledgeblog.org/257"><a
href="http://ontogenesis.knowledgeblog.org/257">Characterising
Representation</a> by <span
property="http://purl.org/dc/terms/creator" >Sean Bechhofer</span> and
<span property="http://purl.org/dc/terms/creator" >Robert
Stevens</span></li>
<li rel="http://www.openarchives.org/ore/terms/aggregates"
resource="http://ontogenesis.knowledgeblog.org/1001"><a
href="http://ontogenesis.knowledgeblog.org/1001">Closing Down the Open
World: Covering Axioms and Closure Axioms</a> by <span
property="http://purl.org/dc/terms/creator" >Robert
Stevens</span></li>

</ul>
</div>
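For comparison with full RDFa processing, here is a rough sketch of the kind of "simple recipe" a tool developer might code against under this convention: match the full-URI rel/property values as plain strings on li and span elements, and read the resource attribute. It uses only Python's standard html.parser; the class, function and constant names are made up for illustration, and it deliberately does not implement RDFa (no CURIE expansion, no subject chaining).

```python
from html.parser import HTMLParser

# Full URIs are written out in the markup, so no prefix handling is needed.
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"
DC_CREATOR = "http://purl.org/dc/terms/creator"


def squash(text):
    """Collapse whitespace runs; the markup wraps lines in mid-title."""
    return " ".join(text.split())


class AggregatesHarvester(HTMLParser):
    """Hypothetical harvester for <li rel=ORE_AGGREGATES resource=...> items.

    Plain string matching on attribute values; this is NOT an RDFa parser.
    Assumes items are not nested and every item <li> has a closing tag.
    """

    def __init__(self):
        super().__init__()
        self.items = []       # one dict per aggregated resource
        self._item = None     # item currently being read, if any
        self._title = None    # text buffer while inside the item's <a>
        self._creator = None  # text buffer while inside a creator <span>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("rel") == ORE_AGGREGATES:
            self._item = {"uri": attrs.get("resource"), "title": "", "creators": []}
        elif self._item is None:
            return  # outside any aggregated item: ignore
        elif tag == "a" and not self._item["title"]:
            self._title = []
        elif tag == "span" and attrs.get("property") == DC_CREATOR:
            if "content" in attrs:  # value supplied in @content
                self._item["creators"].append(attrs["content"])
            else:                   # value is the element's text
                self._creator = []

    def handle_data(self, data):
        if self._title is not None:
            self._title.append(data)
        if self._creator is not None:
            self._creator.append(data)

    def handle_endtag(self, tag):
        if self._item is None:
            return
        if tag == "a" and self._title is not None:
            self._item["title"] = squash("".join(self._title))
            self._title = None
        elif tag == "span" and self._creator is not None:
            self._item["creators"].append(squash("".join(self._creator)))
            self._creator = None
        elif tag == "li":
            self.items.append(self._item)
            self._item = None


SAMPLE = """<ul>
<li rel="http://www.openarchives.org/ore/terms/aggregates"
resource="http://ontogenesis.knowledgeblog.org/257"><a
href="http://ontogenesis.knowledgeblog.org/257">Characterising
Representation</a> by <span
property="http://purl.org/dc/terms/creator" >Sean Bechhofer</span> and
<span property="http://purl.org/dc/terms/creator" >Robert
Stevens</span></li>
</ul>"""

harvester = AggregatesHarvester()
harvester.feed(SAMPLE)
harvester.close()
for item in harvester.items:
    # http://ontogenesis.knowledgeblog.org/257 | Characterising Representation | Sean Bechhofer, Robert Stevens
    print(item["uri"], "|", item["title"], "|", ", ".join(item["creators"]))
```

The point of the design is that a consumer only needs an ordinary HTML parser and a couple of string constants. The cost is that it silently misses anything written with prefixes or structured differently.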


--
http://ptsefton.com

Peter Sefton

May 16, 2011, 3:57:39 AM
to scholar...@googlegroups.com, Sam Adams, David F Flanders, Theo Andrew
Hi all,

I'd really appreciate some feedback about what I have proposed here.

I have refined this idea slightly and put it into a new Scholarly HTML
draft page: http://okfnpad.org/schtml-packaging

I have also implemented it to work with the KnowledgeBlogs TOC plugin,
which I have tentatively hacked to (a) add extra metadata to a KTOC
table of contents to make it (I hope) compatible with ORE encoded in
RDFa (v1.1) and (b) create output for each post which marks it as
Scholarly HTML. This raised an interesting issue - how to encode
metadata like title and author when they are located in different
parts of the HTML tree in the host WordPress site? This is important
so tool makers don't have to keep building screen scraping
applications to match particular WP themes or their equivalents in
other CMSs. I certainly didn't want to be building plugins that
require changes to the WordPress theme, so I have embedded redundant
metadata in a non-human-readable way by extending the ktoc plugin to
wrap Scholarly HTML tags around every post/page.

See the TOC: http://anthologize.ptsefton.com/toc/
See a post: http://anthologize.ptsefton.com/2011/05/12/test-post-1/


If anyone has time to help out it would be much appreciated. I would
like to get this page up on the ScHTML site soon - even with TODOs in
it, and to repost the ScHTML core page, which has had an update
courtesy of feedback from Les Carr: http://okfnpad.org/schtml-core

Peter

--
http://ptsefton.com

David F. Flanders

May 16, 2011, 4:02:12 AM
to scholarly-html
Busy week, but will feedback anon. Looking forward to other people's
thoughts. /dff

Peter Sefton

May 16, 2011, 10:22:21 PM
to se...@cam.ac.uk, scholar...@googlegroups.com, David F Flanders, Theo Andrew
Thanks Sam

Some questions and comments inline.

...

> First off can I suggest that (at least for now) we stick with RDFa 1. I know
> that the namespace declarations and prefixing can be a pain, but it means
> that we get tool support - I find the RDFa Developer Firefox Plugin
> (http://rdfadev.sourceforge.net/), which shows you all the RDF statements in
> a page, incredibly useful. So far as I know, there is nothing similar with
> support for RDFa 2 at the moment.
I understand the argument about tool support for now, but I really
think that ScHTML is not going to be workable with namespaces.
Tool support for developers (e.g. using jQuery) is just not there, and
people will end up ignoring namespace declarations and matching on
strings. We are then back to very loose microformat conventions, and
there are all sorts of issues with XML parsers etc. as well.
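To illustrate the worry about namespaces: with CURIEs, the string in a rel attribute only means something after the page's prefix declarations have been resolved, so two pages can write the same ORE predicate differently. A simplified Python sketch, assuming two hypothetical pages with made-up prefix names, and ignoring RDFa details such as safe CURIEs and default prefixes:

```python
# Full URI of the ORE 'aggregates' predicate.
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"

# Prefix maps as two hypothetical pages might declare them via xmlns:...
PAGE_A = {"ore": "http://www.openarchives.org/ore/terms/"}
PAGE_B = {"oai": "http://www.openarchives.org/ore/terms/"}


def expand_curie(value, prefixes):
    """Expand 'prefix:reference' using the page's prefix map.

    Values whose prefix is not declared (including full URIs such as
    'http://...') are returned unchanged. Simplified: no safe CURIEs,
    no default prefix, no RDFa 1.1 initial context.
    """
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return value


# Naive string matching sees two different predicates...
print("ore:aggregates" == "oai:aggregates")  # False
# ...but after expansion they are the same, and equal the full-URI form.
print(expand_curie("ore:aggregates", PAGE_A) == expand_curie("oai:aggregates", PAGE_B))  # True
print(expand_curie(ORE_AGGREGATES, PAGE_A) == ORE_AGGREGATES)  # True
```

A tool reading pages that write full URIs directly into the attributes, as RDFa 1.1 permits, could skip the expansion step and compare strings, which is the trade-off being argued here.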


>
> To be valid RDFa, a document has to have the correct DOCTYPE declaration,
> and be xhtml:
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
>                      "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml"
>
We can aim for HTML+RDFa I think:
http://dev.w3.org/html5/rdfa/#document-conformance

I think the idea here is to provide something that will work as RDFa /
ORE if someone pastes it into the right context - the idea of
specifying a ScHTML convention is to give people a simple recipe they
can code against without having to try to support the whole of RDFa
etc. I will try to set up a test tool that can parse RDFa 1.1 and see
what it tells me.


> I don't think the current output parses as intended right now... I think
> what you're aiming for is something like:
> <div about='#ResourceMap' typeof="ore:ResourceMap">

The typeof seems like the right idea, but the example on the OAI site
does not actually show a resource map saying explicitly that it is a
resource map. It has <div
about="http://arxiv.org/rem/xhtml/astro-ph/0601007"
class="ResourceMap">. It says:

"A Resource Map MUST include one triple with an ore:describes
predicate. The subject of this triple MUST be the URI-R of the
Resource Map. The object is the URI-A of the Aggregation described by
the Resource Map."
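The ORE rule quoted above is straightforward to check mechanically once a page has been parsed into triples. A minimal Python sketch over (subject, predicate, object) string tuples, reading "one triple" as exactly one; the function name and the example.org URIs are hypothetical:

```python
ORE_DESCRIBES = "http://www.openarchives.org/ore/terms/describes"


def aggregation_for(triples, rem_uri):
    """Return URI-A if the Resource Map rem_uri has exactly one ore:describes
    triple with itself as subject; otherwise raise ValueError."""
    objects = [o for (s, p, o) in triples if s == rem_uri and p == ORE_DESCRIBES]
    if len(objects) != 1:
        raise ValueError(
            f"expected one ore:describes triple for {rem_uri}, found {len(objects)}"
        )
    return objects[0]


# Hypothetical page-local URIs for a Resource Map and its Aggregation.
REM = "http://example.org/toc#ResourceMap1"
AGG = "http://example.org/toc#Aggregation1"

triples = [
    (REM, ORE_DESCRIBES, AGG),
    (AGG, "http://www.openarchives.org/ore/terms/aggregates", "http://example.org/post-1"),
]
print(aggregation_for(triples, REM))  # http://example.org/toc#Aggregation1

# A triple written the wrong way round (subject and object swapped) fails the check.
swapped = [(AGG, ORE_DESCRIBES, REM)]
try:
    aggregation_for(swapped, REM)
except ValueError as err:
    print("invalid:", err)
```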


>
>
> Sam

Phillip Lord

May 17, 2011, 4:45:31 AM
to scholar...@googlegroups.com, Sam Adams, David F Flanders, Theo Andrew
Peter Sefton <ptse...@gmail.com> writes:

--
Phillip Lord, Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: philli...@newcastle.ac.uk
School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower, skype: russet_apples
Newcastle University, msn: m...@russet.org.uk
NE1 7RU twitter: phillord

Phillip Lord

May 17, 2011, 4:59:06 AM
to scholar...@googlegroups.com, Sam Adams, David F Flanders, Theo Andrew
Peter Sefton <ptse...@gmail.com> writes:
> I'd really appreciate some feedback about what I have proposed here.
>
> I have refined this idea slightly and put it into a new Scholarly HTML
> draft page: http://okfnpad.org/schtml-packaging
>
> I have also implemented it to work with the KnowledgeBlogs TOC plugin,
> which I have tentatively hacked to (a) add extra metadata to a KTOC
> table of contents to make it (I hope) compatible with ORE encoded in
> RDFa (v1.1) and (b) create output for each post which marks it as
> Scholarly HTML. This raised an interesting issue - how to encode
> metadata like title and author when they are located in different
> parts of the HTML tree in the host WordPress site?

While I can see that this is an important concern in general, I am not
totally convinced that it should be a significant problem. KnowledgeBlog
for example goes slightly nuts on the metadata front, and publishes
title, author and other metadata in quite a few different formats,
including DC, meta tags and COinS.

It wasn't that we wanted to do this, but we needed it for support from
various tools. My concern, as it were, from the point of view of a
content producer rather than a consumer, is putting too much load at my
end.

So my question would be, are we achieving anything that cannot be
achieved with any of the techniques that already exist for embedding
metadata in posts? I would rather see scholarly HTML fit in with the
existing morass of standards than add to it.

Which is, of course, not to say that we will not add to our
knowledgeblog content more metadata, especially if someone else writes
the code for us:-)

Phil

Phillip Lord

May 17, 2011, 6:00:13 AM
to scholar...@googlegroups.com

Apologies all. I appear to have sent an email out with no appreciable
new content. I blame the tools.

Phil

Peter Sefton

May 17, 2011, 5:07:15 PM
to scholar...@googlegroups.com, Sam Adams, David F Flanders, Theo Andrew
On Tue, May 17, 2011 at 6:59 PM, Phillip Lord
<philli...@newcastle.ac.uk> wrote:

> While I can see that this is an important concern in general, I am not
> totally convinced that it should be a significant problem. KnowledgeBlog
> for example goes slightly nuts on the metadata front, and publishes
> title, author and other metadata in quite a few different formats,
> including DC, meta tags and COinS.
>
> It wasn't that we wanted to do this, but we needed it for support from
> various tools. My concern, as it were, from the point of view of a
> content producer rather than a consumer, is putting too much load at my
> end.

Well, this is more (in a slightly different part of the post) to make
it easier to process in the use-case I'm looking at. In the case of
ktoc, if you want people to be able to use it without your other
plugins, my approach will reduce problems down the track for people
processing Scholarly HTML fragments assembled from other resources.

> So my question would be, are we achieving anything that cannot be
> achieved with any of the techniques that already exist for embedding
> metadata in posts? I would rather see scholarly HTML fit in with the
> existing morass of standards than add to it.

The main thing I'm doing here is quite different to metadata about the
post; it's about listing other resources, so the technique I'm
talking about is adding metadata that you don't have elsewhere. Of
course you COULD put it in the head with the other metadata instead of,
or as well as, doing it inline, but that makes the plugin much more
complicated to write.

>
> Which is, of course, not to say that we will not add to our
> knowledgeblog content more metadata, especially if someone else writes
> the code for us:-)

I wrote some :)
>
> Phil
>
pt


--
http://ptsefton.com

Phillip Lord

May 18, 2011, 4:49:16 AM
to scholar...@googlegroups.com, Sam Adams, David F Flanders, Theo Andrew

Peter Sefton <ptse...@gmail.com> writes:
>> It wasn't that we wanted to do this, but we needed it for support from
>> various tools. My concern, as it were, from the point of view of a
>> content producer rather than a consumer, is putting too much load at my
>> end.
> Well, this is more (in a slightly different part of the post) to make
> it easier to process in the use-case I'm looking at. In the case of
> ktoc, if you want people to be able to use it without your other
> plugins, my approach will reduce problems down the track for people
> processing Scholarly HTML fragments assembled from other resources.

You are certainly right that it would make ktoc more useful in a
standalone sense, which is a good thing. And from the knowledgeblog point of
view, it's a perfectly good thing to add; I have no problem with that.

But my point remains, I think: if I were delivering content in general,
this would increase the complexity of providing metadata, because I am
likely to have to provide the individual post metadata anyway.

Would something like an XOXO list pointing to the resources which expose
their own metadata be enough?
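The XOXO alternative keeps the list itself plain and leaves each linked resource to expose its own metadata. A rough Python sketch of producing such a list, assuming XOXO's convention of class="xoxo" on the outer list element; the function name is hypothetical, and the links are the Ontogenesis articles from the start of the thread:

```python
from html import escape


def xoxo_list(links):
    """Render (url, title) pairs as a minimal XOXO outline.

    No RDF here: consumers follow each href and read the metadata that
    the linked page publishes about itself.
    """
    items = "\n".join(
        f'  <li><a href="{escape(url)}">{escape(title)}</a></li>'
        for url, title in links
    )
    return f'<ul class="xoxo">\n{items}\n</ul>'


print(xoxo_list([
    ("http://ontogenesis.knowledgeblog.org/49",
     "Automatic maintenance of multiple inheritance ontologies"),
    ("http://ontogenesis.knowledgeblog.org/257", "Characterising Representation"),
]))
```

The trade-off is the one being discussed: the list is trivial for the producer to emit, but a harvester needs one extra fetch per entry to learn titles and authors.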

>> So my question would be, are we achieving anything that cannot be
>> achieved with any of the techniques that already exist for embedding
>> metadata in posts? I would rather see scholarly HTML fit in with the
>> existing morass of standards than add to it.
> The main thing I'm doing here is quite different to metadata about the
> post, it's about listing other resources - so the technique I'm
> talking about is adding metatata that you don't have elsewhere. Of
> course you COULD put it in the head with the other metadata instead of
> or as well as doing it itinline but that makes the plugin much more
> complicated to write.

I can appreciate all of this. I am just thinking that for ScHTML there
should be as few standards as possible (but no fewer).

>> Which is, of course, not to say that we will not add to our
>> knowledgeblog content more metadata, especially if someone else writes
>> the code for us:-)
> I wrote some :)

And very welcome it is too.

Phil

Peter Sefton

May 23, 2011, 3:44:04 AM
to se...@cam.ac.uk, scholar...@googlegroups.com, David F Flanders, Theo Andrew
Sam,

I took your advice and replaced the explicit URIs with namespaces and
debugged using this service: http://rdf-in-html.appspot.com/

Taking this bit by bit.

I have wrapped the TOC in a div with an ID so it can be the
'aggregation'. The name used here in the ID is derived from the
WordPress category.
<div id="AggregationUncategorized">

Inside this is the resource map which says that it describes the aggregation.

<div rel="ore:describes" resource="#AggregationUncategorized"
id="ResourceMapUncategorized" about="#ResourceMapUncategorized">

This gives us correct RDF, I think:
<urn:invalid:posted#ResourceMapUncategorized>
<ore:describes>
<urn:invalid:posted#AggregationUncategorized> .

There is some metadata about the resource map, including a title:
<h1 property="dc:title" about="#ResourceMapUncategorized">The Hand
Painted Films of Margaret Tait - by Joss Winn</h1>

<urn:invalid:posted#ResourceMapUncategorized>
<http://purl.org/dc/elements/1.1/title>
"The Hand Painted Films of Margaret Tait - by Joss Winn"@en .

And then the aggregation triples, where the subject is the aggregation:
<ul>
<li rel='http://www.openarchives.org/ore/terms/aggregates'
resource='http://anthologize.ptsefton.com/on-the-idea-of-permanence/'>
<a href='http://anthologize.ptsefton.com/on-the-idea-of-permanence/'>1.
‘On The Idea of Permanence’</a> <span
property='http://purl.org/dc/terms/creator' content='Joss Winn'><!--
--></span>
</li>

Which gives us:
<urn:invalid:posted#AggregationUncategorized>
<ore:aggregates>
<http://anthologize.ptsefton.com/on-the-idea-of-permanence/> .
<http://anthologize.ptsefton.com/on-the-idea-of-permanence/>
<http://purl.org/dc/elements/1.1/creator>
"Joss Winn"@en .
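For a tool that does not want to depend on an RDF library, triples like these could also be written out directly as N-Triples with every URI in full. A minimal sketch, assuming (resource, creator) pairs have already been extracted from the list items; the ORE predicate is the one in the markup, the DC predicate is copied from the debugger output, the placeholder base urn:invalid:posted is kept as the service reported it, and the function names are hypothetical:

```python
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"


def nt_literal(text, lang=None):
    """Quote a string as an N-Triples literal, escaping backslash, quote, LF and CR."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def aggregation_ntriples(agg_uri, members, lang="en"):
    """One ore:aggregates triple per member, plus a dc:creator triple for it."""
    lines = []
    for resource, creator in members:
        lines.append(f"<{agg_uri}> <{ORE_AGGREGATES}> <{resource}> .")
        lines.append(f"<{resource}> <{DC_CREATOR}> {nt_literal(creator, lang)} .")
    return lines


for line in aggregation_ntriples(
    "urn:invalid:posted#AggregationUncategorized",
    [("http://anthologize.ptsefton.com/on-the-idea-of-permanence/", "Joss Winn")],
):
    print(line)
```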

So, this is significantly more complicated than what one might invent
as a simple convention for tables of contents, but I think there is
value in supporting ORE so that ORE tools can process the pages. BUT,
I still think we should recommend the use of full URIs as per RDFa 1.1 and
avoid CURIEs, as there is no tool support for this kind of namespace
handling in HTML tools such as jQuery.


On Mon, May 16, 2011 at 11:30 PM, <se...@cam.ac.uk> wrote:
>
> Hi,
>
> I'm afraid I don't have a lot of time to spend on this, I've gone through
> the table of contents page and have some quick comments...


>
> First off can I suggest that (at least for now) we stick with RDFa 1. I know
> that the namespace declarations and prefixing can be a pain, but it means
> that we get tool support - I find the RDFa Developer Firefox Plugin
> (http://rdfadev.sourceforge.net/), which shows you all the RDF statements in
> a page, incredibly useful. So far as I know, there is nothing similar with
> support for RDFa 2 at the moment.
>

> To be valid RDFa, a document has to have the correct DOCTYPE declaration,
> and be xhtml:
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
>                      "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml"

>        ...


>
> I don't think the current output parses as intended right now... I think
> what you're aiming for is something like:
>
> <div about='#ResourceMap' typeof="ore:ResourceMap">

>  <div property='dct:title' content="Contents">
>    <strong>Contents</strong>
>  </div>
>  <span property='dct:modified'
>        content='2000-07-01T00:00:00+00:00'></span>
>  <span rel='dct:creator'
> resource='http://knowledgeblog.org/knowledgeblog-table-of-contents-plugin'>
>    <span rel="rdf:type" resource="[dct:Agent]"></span>
>    <span property="foaf:name"
>          content="knowledgeblog-table-of-contents-plugin"></span>
>  </span>
>
>  <div rel='ore:describes'>
>
>    <ul about='#Aggregation' typeof="ore:Aggregation">
>      <li rel='ore:aggregates'
> resource='http://anthologize.ptsefton.com/2011/05/12/test-post-1/'>
>
>        <a
> href='http://anthologize.ptsefton.com/2011/05/12/test-post-1/'
>           property="dct:title">Test post 1</a> by
>        <span property='dct:creator'>admin</span>
>      </li>
>
>      <li rel='ore:aggregates'
> resource='http://anthologize.ptsefton.com/2011/05/12/test-post-2/'>
>
>
>        <a
> href='http://anthologize.ptsefton.com/2011/05/12/test-post-2/'
>           property="dct:title">Test post 2</a> by
>        <span  property='dct:creator'>admin</span>
>
>      </li>
>    </ul>
>
>  </div>
> </div>
>
> Sorry I haven't got time to write a fuller comment right now, but I hope
> that helps...
>
> There are some useful examples in the ORE documentation:
> http://www.openarchives.org/ore/1.0/rdfa.html
>
>
> Sam
