OR 08 and 0.3 specification fallout


Graham Triggs

Apr 8, 2008, 7:08:58 PM
to OAI-ORE
Well, judging by the number of people talking to me about ORE at OR
08, I seem to be getting something of a reputation. I don't know,
maybe it's like the whole blogging phenomenon - in fact, if I actually
made a blog post about this, I might even be considered the authority
and make the specs redundant ;)

OK, I have a big laundry list of things to get through - points that
often tie together, which means that there isn't an obvious place to
start. So all I can do is apologise in advance for the verbosity, and
dive in with:

-- Proxies --

Looking at the specification, I can see why they were introduced. You
need some way of identifying the context of the aggregation, and
that's something I hadn't quite noticed as missing from the 0.2 specs
- so good work in identifying the problem. But let's start with the
less controversial issues, and move on from there...

So, the non-controversial comment: based on the specifications as
written, I couldn't see anywhere that states that the URI of the Proxy
(URI-P - I was beginning to suspect that with all these URIs, some were
going to be taking the P ;) must be actionable, and must yield or lead
to a Resource Map when dereferenced.

Without such a guarantee, you cannot cite the URI-P.

With that simple observation out of the way, let's move on to something
that will put the cat amongst the pigeons. When I first saw 'Proxy' on
the agenda for Friday, I nearly cried.

But, as I stated, looking at the description in the specification, I
can see there was a problem to be solved. This still isn't the way to
solve it though.

In the specification, the section on proxies is headed "Aggregation-
Specific Identities for Aggregated Resources". And that is precisely
what we need - an identity for the assertions within an aggregation
for a particular resource. In Atom, we already provide an identity for
an entry - and the examples even call it the URI-P in the open
issues.

In RDF, we can just assert an identity with a given <rdf:Description>
- at most adding one predicate (although it could potentially re-use
ore:describes). That means we can remove the proxy predicates, and the
concept of proxies altogether - which brings me on to:

-- URIs --

A resource that we are aggregating is just a resource. Its URI is
just the URI of a resource. It doesn't become an aggregated resource
until you have the assertion(s) that it is part of an aggregation.

Once we assign a [separate] identity to those assertions, we can do
something rather nice. The aggregated resource is the assertion, about
a resource, that it is part of an aggregation. The identity of that
assertion is the identity of the aggregated resource. So the identity
of those assertions - what was described above as URI-P - could
actually be called URI-AR.

It gives us a URI-AR which is distinct from the URI of the resource
itself. It removes the issue of a URI-AR also possibly being a URI-A
if you are aggregating another aggregation. It removes any need to
talk about URI-P, and the fewer of these URI variations we have
floating around, the better.

URI-AR can be the identity of the aggregated resource, the context of
including a resource as part of a specific aggregation. It makes the
specification smaller, tighter and less confusing.

-- Being Resolved --

So, now I've covered the proliferation of URI types and predicates,
what about the issue of resolution? At OR 08, about 8 people initiated
a discussion with me on this topic, and all shared the same view -
that they liked the fragment identifier format, and were concerned
about the round trip costs of having to resolve URI-As to find out the
Resource Map url.

Of course, these discussions were all before the introduction of what
is currently referred to as URI-P (although I've described above why I
believe that should be removed as a term) - and whilst it's stated as
an open issue in the specifications, it quite clearly isn't. If you
cite one of those URIs, it has to yield or lead to a Resource Map for
the context of the aggregation in exactly the same way as the URI-A.
Otherwise, you have a meaningless dead end.

So any URI-A or URI-P that can't yield a Resource Map URL through a
fragment identifier has to be resolved to get the Resource Map URL -
an extra HTTP round trip per URI, which adds up to a significant
amount of work.

But actually, the problem is worse than that. What happens if you try
to walk an aggregation to learn its structure? Well, the
specification states that an aggregation as described in a resource
map can only be one level deep. So, for a structured 'item', you break
it into multiple aggregations, and build the structure by aggregating
the URI-A as resources in other aggregations.

What that means is that unless you can guarantee that the fragment
identifier is used for all URI-As, then to understand/walk the
structure of any aggregation you have to action *every* aggregated
resource (without a fragment identifier) to determine whether it is an
aggregation - and if it does lead to a resource map, do the same for
that one, and so on.

So, as if having to do a lot of extra round trips wasn't bad enough,
coping with the possibility that aggregations can themselves be
aggregated quickly runs into scalability issues.
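A minimal Python sketch of that walk, to make the round-trip count concrete. `fetch` and `hint` are hypothetical stand-ins: `fetch(uri)` represents one HTTP round trip and returns the aggregated URIs if `uri` leads to a Resource Map (else `None`), and `hint(uri)` represents a cheap, purely syntactic check such as the fragment identifier convention. The in-memory `WEB` dictionary replaces real HTTP.

```python
from collections import deque

def walk(root, fetch, hint=None):
    """Breadth-first walk of nested aggregations.

    fetch(uri) -> list of aggregated URIs if uri leads to a ReM, else None
                  (each call stands for one HTTP round trip).
    hint(uri)  -> False if the URI is known NOT to be an aggregation
                  without fetching it (hypothetical cheap check).
    Returns (set of aggregation URIs, number of fetches made).
    """
    seen, aggregations, fetches = set(), set(), 0
    queue = deque([root])
    while queue:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        if hint is not None and hint(uri) is False:
            continue  # the hint rules it out: no round trip needed
        fetches += 1
        members = fetch(uri)
        if members is None:
            continue  # a plain resource (possibly a large download!)
        aggregations.add(uri)
        queue.extend(members)
    return aggregations, fetches

# In-memory stand-in for the web: only aggregations have member lists.
WEB = {
    "http://host/a#aggregation": ["http://host/b#aggregation",
                                  "http://host/1.pdf", "http://host/2.pdf"],
    "http://host/b#aggregation": ["http://host/3.pdf"],
}

aggs, n = walk("http://host/a#aggregation", WEB.get)
print(sorted(aggs), n)  # both aggregations found, 5 fetches
aggs, n = walk("http://host/a#aggregation", WEB.get,
               hint=lambda u: u.endswith("#aggregation"))
print(sorted(aggs), n)  # same result, 2 fetches
```

Without a hint, every aggregated resource costs a round trip; with a reliable hint, only the aggregations are fetched. Whether such a hint can be relied upon is exactly the question raised here.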

To my mind, this clearly needs to be addressed. However, mandating
fragment identifiers [on http/https URLs] is problematic -
particularly when it has to be done for URI-Ps as well as URI-As, so
you can't mandate what the fragment name is. Which leads me to
conclude that we need to define an ore[/ores] protocol for describing
URI-As and URI-Ps that can easily be converted into actionable
[http/https] URLs that will deliver a resource map. Identifying and
parsing these URIs would then not rely on tricks or hacks, and we
could walk structures without having to hit every resource to find
out whether it is structural.
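The ore[/ores] scheme is only a proposal, so the following Python sketch is purely illustrative: it assumes `ore://` maps to `http://` and `ores://` to `https://` with the rest of the URI left unchanged. The point is that a client can recognise an aggregation from the URI alone, and only then convert it to a URL it can actually dereference.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical scheme mapping: 'ore' -> http, 'ores' -> https.
SCHEME_MAP = {"ore": "http", "ores": "https"}

def is_ore_uri(uri):
    """True if the URI uses one of the hypothetical ORE schemes.
    No network access needed: this is the whole point of the proposal."""
    return urlsplit(uri).scheme in SCHEME_MAP  # urlsplit lowercases scheme

def to_actionable(uri):
    """Rewrite ore://host/path to http://host/path (ores:// -> https://)."""
    parts = urlsplit(uri)
    scheme = SCHEME_MAP.get(parts.scheme)
    if scheme is None:
        raise ValueError(f"not an ORE URI: {uri}")
    return urlunsplit((scheme, parts.netloc, parts.path,
                       parts.query, parts.fragment))

print(is_ore_uri("http://host/1.pdf"))                  # False: plain resource
print(to_actionable("ore://host/thing/aggregation"))    # http://host/thing/aggregation
```

A walker would then fetch only the URIs for which `is_ore_uri` is true, instead of actioning every aggregated resource.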

-- Serializations --

Now, if - as, in my experience, most people are requesting - we have
URI-As/URI-Ps that yield URI-Rs without requiring an additional round
trip, then we have to deal with the situation that we may [need to]
offer multiple serializations of that resource map.

Which brings us back to the basic principles of the ORE specification
- leaning on the existing web infrastructure. Why can't we mandate
having a single URI-R for a given aggregation, that will deliver the
different serializations through content negotiation? Maintain a 1:1
mapping; if you have different URI-Rs, you have different URI-As and
therefore different aggregations - even if the assertions of those
aggregations are the same.

I can't see why that should be a problem - except that the
specification does need to state how that negotiation will take place,
i.e. which format is preferred when several can be accepted /
disseminated? There should be a guarantee that the 'best' format
offered is complete and accurate (otherwise it can't be made available
as an option), etc. But these statements about the serializations
would be useful anyway, content negotiation or not.
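A minimal sketch of the server-side choice the specification would need to pin down: given a client's Accept header, pick one of the offered serializations, using q-values first and a server preference order to break ties. The preference order and the three media types in `SUPPORTED` are assumptions for illustration; the matching follows the usual HTTP rule that the most specific matching media range sets the q-value.

```python
# Server-side preference order (an assumption for this sketch):
# Atom first, then RDF/XML, then N-Triples (served as text/plain in 2008).
SUPPORTED = ["application/atom+xml", "application/rdf+xml", "text/plain"]

def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((fields[0].lower(), q))
    return ranges

def specificity(media_range, mime):
    """2 = exact match, 1 = type/* match, 0 = */*, None = no match."""
    if media_range == mime:
        return 2
    if media_range.endswith("/*") and mime.split("/")[0] == media_range[:-2]:
        return 1
    if media_range == "*/*":
        return 0
    return None

def negotiate(accept, supported=SUPPORTED):
    """Return the supported type with the highest q (ties go to server
    order), or None if nothing is acceptable (a server would send 406)."""
    if not accept:
        return supported[0]
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for mime in supported:
        scored = ((specificity(r, mime), q) for r, q in ranges)
        matched = [(s, q) for s, q in scored if s is not None]
        if not matched:
            continue
        q = max(matched)[1]  # most specific matching range decides q
        if q > best_q:       # strict '>' keeps the earlier type on ties
            best, best_q = mime, q
    return best

print(negotiate("application/rdf+xml;q=0.5, application/atom+xml;q=0.9"))
```

Stating the preference order in the specification is what makes the `*/*` case (and ties) deterministic across implementations.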

-- Final words --

Yes, this has been gargantuan!! Well done making it this far. And note
I haven't even covered my previous points about inaccurate statements
made through indirect resource maps and semantically weak contexts on
forward links (which again, everyone that spoke to me about it at OR
08 had the same opinion that these simply need to be removed /
transformed into full strength semantic statements).

What was also quite clear from discussions at OR 08 is that when
talking to people involved in the ORE specification about these issues
(and I do genuinely mean more than one person here) it's often like we
are speaking in different languages. I don't know what other
discussions were had at OR 08, but given the number of people who
sought to engage me on this topic, we need to find some way to
come together on these issues.

And with that, it's gone midnight in the UK, and I'm going to bed.

G

Ben O'Steen

Apr 9, 2008, 7:09:12 AM
to OAI-ORE
Re: Resolutions and serialisations

In terms of costs of performing the resolutions, I think that the real
time taken is a little overstated. One of the largest costs will be
the DNS lookup of a host, which will be done once, and then locally
cached (unless some very poor software implementations are being
used). After that, the rate-limiting step is either the generation/
serialisation of the map or the network transfer time to auto-discover
a preferred resource map serialisation, and I would strongly lean
towards serialisation being the limiter, at least on my
implementation:

Request->net->Serialisation->net->receive: average 180ms for simple
maps, 600ms for complex ones (300+ aggregates)

Request->net->Content-negotiation->303 redirect->net->receive: average
20-45ms, independent of map complexity

I really don't think that the majority of resource map discovery will
be done in this way. The methods for listing resource maps are likely
to be implemented in a 'flavoured' manner - i.e. OAI-PMH ->
metadataPrefix=ore_atom, metadataPrefix=ore_rdfxml, etc

What might be a good compromise is to have the content negotiation
tactic available on the URI-A, but have some method (like the fragment
trick from 0.2) to 'guess' the resource map serialisations from the
URI-A.

For example, as in 0.2:

If you have http://host/thing/rem, then via the fragment trick you
can guess that the URI-A is http://host/thing/rem#aggregation, and
vice versa.

Likewise:
http://host/thing/aggregation - URI-A (can use content negotiation to
find URI-Rs)
http://host/thing/aggregation.xml - URI-R (atom)
http://host/thing/aggregation.rdf - URI-R (RDF/XML)
http://host/thing/aggregation.nt - URI-R (N-Triples format)
etc.

(I am biased, as I'd like to have .xml -> RDF/XML and .atom -> Atom,
which is how it is implemented on my system, but I think the mapping
above is more widespread/desired? If the general consensus is yes, I
can swap them around easily enough.)
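Both guesses are pure string operations, which is what makes them attractive: no request is needed to make them. A Python sketch, assuming the `#aggregation` fragment name from 0.2 and the extension table from the example layout above (both conventions, not requirements; the function names are hypothetical):

```python
from urllib.parse import urldefrag

# Extension layout from the example above (hypothetical; a server is free
# to use a different table, e.g. .atom for Atom and .xml for RDF/XML).
EXTENSIONS = {".xml": "Atom", ".rdf": "RDF/XML", ".nt": "N-Triples"}

def uri_a_from_rem(uri_r):
    """0.2-style fragment trick: URI-A is the URI-R plus '#aggregation'."""
    return uri_r + "#aggregation"

def rem_from_uri_a(uri_a):
    """Reverse of the trick; None means 'no hint, resolve over HTTP'."""
    base, fragment = urldefrag(uri_a)
    return base if fragment == "aggregation" else None

def guess_uri_rs(uri_a):
    """Guess the per-serialisation URI-Rs for an extension-less URI-A."""
    return {fmt: uri_a + ext for ext, fmt in EXTENSIONS.items()}

print(rem_from_uri_a("http://host/thing/rem#aggregation"))
print(guess_uri_rs("http://host/thing/aggregation")["RDF/XML"])
```

These are only guesses: nothing guarantees that the guessed URI actually serves a Resource Map until it is dereferenced.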

Mark Diggory

Apr 9, 2008, 9:57:44 AM
to oai...@googlegroups.com

On Apr 9, 2008, at 4:09 AM, Ben O'Steen wrote:
>
> Re: Resolutions and serialisations
>
> In terms of costs of performing the resolutions, I think that the real
> time taken is a little overstated. One of the largest costs will be
> the DNS lookup of a host, which will be done once, and then locally
> cached (unless some very poor software implementations are being
> used). Then the rate limiting processes are either the generation/
> serialisation of the map or the network transfer time to auto-discover
> a preferred resource map serialisation, and I would strongly lean
> towards the serialisation step being the limiter, at least on my
> implementation anyway:

I tend to agree: the cost of doing a redirect is low compared to
serializing and transporting what might be a rather large aggregation.

> I really don't think that the majority of resource map discover will
> be done in this way. The methods for listing resource maps are likely
> to be implemented in a 'flavoured' manner - i.e. OAI-PMH ->
> metadataPrefix=ore_atom, metadataPrefix=ore_rdfxml, etc

Though OAI-PMH is a tried and true technology, I hope that this is
not the predominant case and that we see more ubiquitous approaches
that are more general to the entire web become the norm (Atom, plain
old RDF, RDFa).

> What might be a good compromise is to have the content negotiation
> tactic available on the URI-A, but have some method (like the fragment
> trick from 0.2) to 'guess' the resource map serialisations from the
> URI-A.
>
> i.e. from 0.2
>
> If you have http://host/thing/rem, then via the fragment trick, you
> can guess that the URI-A is http://host/thing/rem#aggregation and
> vice-
> versa

I think this whole emphasis on the "benefit of fragment identifiers"
is a bit overrated; I see no tool/client in existence yet that takes
a fragment identifier and appropriately returns just that fragment.
With "#rem", it's rather inefficient to pull the aggregate content
(0..*) just to get at what is a small ResourceMap, and I think it's
not a good idea to design that into the spec as a mechanism. It
should be left up to the implementer to decide whether the ReM and AG
reside at different URIs, or at the same URI and are distinguished by
fragment ids. The spec should avoid locking implementations into
solutions that are inefficient when the implementers are providing
feedback that it's an issue. I totally agree: the flexibility and
granularity gained by being able to separate one's Resource Map from
one's Aggregation far outweighs the cost of round-tripping to get the
aggregation for a particular ResourceMap.

>
> Likewise:
> http://host/thing/aggregation - URI-A (can use content negotiation to
> find URI-Rs)
> http://host/thing/aggregation.xml - URI-R (atom)
> http://host/thing/aggregation.rdf - URI-R (RDF/XML)
> http://host/thing/aggregation.nt - URI-R (N-Triples format)
> etc.
>
> (I am biased as I'd like to have .xml -> RDF/XML and .atom -> atom as
> it is implemented on my system, but I think the mapping above is more
> widespread/desired? if the general consensus is yes, I can swap them
> around easily enough.)

I think it's bad to have file extensions identify "formats", even
though it's in the Cool URIs recommendations. But it appears to be a
convention. I, too, don't know what a *.rdf file would contain: N3?
RDF/XML? Nor do I know what a *.xml file would contain: Atom? RDF/XML?
RSS? XHTML? These URIs will more than likely also have content types
associated with them in HTML link elements (e.g. application/rdf+xml,
application/rdf+n3), and that would provide disambiguation.
Again, I don't think it's the spec's responsibility to impose a
convention in this area; instead it should rely on whatever happens to
be most popular ATM.

Cheers,
Mark

~~~~~~~~~~~~~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology

Simeon Warner

Apr 9, 2008, 10:27:24 AM
to oai...@googlegroups.com
On Wed, Apr 09, 2008 at 06:57:44AM -0700, Mark Diggory wrote:
> > Likewise:
> > http://host/thing/aggregation - URI-A (can use content negotiation to
> > find URI-Rs)
> > http://host/thing/aggregation.xml - URI-R (atom)
> > http://host/thing/aggregation.rdf - URI-R (RDF/XML)
> > http://host/thing/aggregation.nt - URI-R (N-Triples format)
> > etc.
> >
> > (I am biased as I'd like to have .xml -> RDF/XML and .atom -> atom as
> > it is implemented on my system, but I think the mapping above is more
> > widespread/desired? if the general consensus is yes, I can swap them
> > around easily enough.)
>
> I think its bad to have file extensions identify "formats" even
> though its in the cool urls recommendations.

I don't think file extensions identify formats in the Cool URI
recommendations, they simply distinguish them as separate resources
with distinct URIs. The content/mime type (somewhat) identifies
the format.

> But, it appears to be a
> convention. I too, don't know what a *.rdf file would contain? n3?
> rdf-xml? nor do I know what a *.xml file would contain, atom? xmlrdf?
> rss? xhtml? These URI will more than likely also have content types
> associated with them when in html link elements (i.e. application/rdf
> +xml, application/rdf+n3) and that would provide disambiguation.
> Again I don't think its the spec responsibility to impose a
> convention in this area, but instead rely on what happens to be most
> popular ATM.

If .atom is the more normal extension for Atom and .xml for RDF/XML,
then we should perhaps change to use them in examples. We'll have to
work out the appropriate Google popularity contest queries... Beyond
the de facto reliance upon extensions by the two most popular
computing platforms (Mac OS and Windows), it really shouldn't matter ;-)

Cheers,
Simeon

Ben O'Steen

Apr 9, 2008, 11:44:19 AM
to OAI-ORE

Just wanted to raise the point that the filename for a download need
not be the same as the last part of the URL.

e.g. The URL http://host/path/$item/aggregation.atom can result in a
download of "$item_resource_map.xml"

I am currently doing this for item downloads (not rmaps yet) by
setting a Content-Disposition header in the HTTP response; this is
automatically picked up by most browsers. (See http://www.ietf.org/rfc/rfc2183.txt)

e.g.

$ curl -I http://archive.sers.ox.ac.uk:5000/objects/ora:admin/datastreams/DC
HTTP/1.0 200 OK
Server: PasteWSGIServer/0.5 Python/2.5.1
Date: Wed, 09 Apr 2008 15:41:18 GMT
content-length: 278
content-disposition: attachment; filename="uuid_admin-DC.xml"
accept-ranges: bytes
last-modified: Wed, 09 Apr 2008 15:41:18 GMT
content-range: 0-277/278
etag: 1207755678.0-278
pragma: no-cache
cache-control: no-cache
content-type: application/xml
x-pingback: http://archive.sers.ox.ac.uk:5000/pingback


But as this has been mentioned, I think that I'll add it in :)

Ben O'Steen

Apr 9, 2008, 11:52:21 AM
to OAI-ORE
Right, content-dispositions have been added :)

http://archive.sers.ox.ac.uk:5000/objects/ora:admin/aggregation.nt
will result in you downloading "admin_resource_map.nt.txt"
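A minimal Python/WSGI sketch of serving a resource map with a Content-Disposition header (RFC 2183). The extension-to-type table and the `{item}_resource_map` filename pattern mirror the two examples above but are otherwise assumptions, and the MIME types in particular are illustrative only; `resource_map_headers` and `app` are hypothetical names, not the actual implementation.

```python
# Hypothetical table: URL extension -> (Content-Type, download filename suffix).
# Suffixes mirror the examples above (.atom -> .xml, .nt -> .nt.txt).
FORMATS = {
    ".atom": ("application/atom+xml", ".xml"),
    ".rdf": ("application/rdf+xml", ".rdf"),
    ".nt": ("text/plain", ".nt.txt"),
}

def resource_map_headers(item, ext, attachment=True):
    """Response headers for one resource map serialisation.

    attachment=True asks the browser to save the file (RFC 2183
    'attachment'); attachment=False sends 'inline', still suggesting
    a filename in case the user chooses to save it.
    """
    mime, suffix = FORMATS[ext]
    disposition = "attachment" if attachment else "inline"
    filename = f"{item}_resource_map{suffix}"
    return [
        ("Content-Type", mime),
        ("Content-Disposition", f'{disposition}; filename="{filename}"'),
    ]

def app(environ, start_response):
    """Minimal WSGI app: .../<item>/aggregation<ext> -> placeholder body."""
    path = environ.get("PATH_INFO", "")
    item, _, last = path.strip("/").rpartition("/")
    for ext in FORMATS:
        if item and last == "aggregation" + ext:
            start_response("200 OK",
                           resource_map_headers(item.split("/")[-1], ext))
            return [b""]  # a real server would serialise the map here
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

The `inline` option is one possible middle ground, since it suggests a filename without instructing the client to treat the response as a download.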

Mark Diggory

Apr 9, 2008, 3:14:59 PM
to oai...@googlegroups.com

On Apr 9, 2008, at 8:44 AM, Ben O'Steen wrote:
>
>
> Just wanted to raise the point that the filename for a download is not
> the same as the last part of the URL
>
> i.e. The URL http://host/path/$item/aggregation.atom can result in a
> download of "$item_resource_map.xml"

I'm not really seeing anything in the Cool URIs note about
Content-Disposition, so I feel this is outside its scope.

http://www.w3.org/TR/cooluris/

Nor in the LOD notes:

http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial

>
> I am currently doing this for item downloads (not rmaps yet) by
> setting a Content-Disposition header in the HTTP response, it is
> automatically picked up by most browsers. (See http://www.ietf.org/rfc/rfc2183.txt)

I've actually been more focused on just getting the content-type
correct on my machine, which allows Semantic Web browsers to
properly retrieve the content. A disposition header immediately forces
most browsers into treating the content as a download, when that may
not be the appropriate action if it's HTML, XML or even text.

Finally, what if, for instance, we are just talking about a static
set of files that someone wishes to place on an Apache server where
Content-Disposition isn't something they can control? That's probably
a significant portion of web publishing activity. It seems the
defaults should lean towards these being representable as static
resources that are relative to one another, to support all possible
use cases.

http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/#staticRDF

Likewise, reusing existing LOD/SW technology, one might just want to
put their stuff into a triplestore and use something like pubby as an
implementation to deliver those representations.

http://www4.wiwiss.fu-berlin.de/pubby/

While a recommendation of setting content-disposition might seem
innocent at first, a proliferation of the approach might create
impedances when sharing ORE as LOD. I tried opening your link in
Firefox with Tabulator installed and it circumvented tabulator and
opened a download dialog instead.

Mark Diggory

Apr 9, 2008, 3:55:00 PM
to oai...@googlegroups.com
I also see that

http://archive.sers.ox.ac.uk:5000/objects/ora:admin/relationships?format=xml

sets the response content type to application/rdf+xml

while

http://archive.sers.ox.ac.uk:5000/objects/ora:admin/relationships?format=n3

sets it to text/plain.

If that were instead text/n3, then it would engage Tabulator.

But this nailing down of mimetypes is still all new work it
appears... digging in the dirt:
http://www.w3.org/2008/01/rdf-media-types
http://thor.roe.ac.uk/quaestor/mime.html

-Mark

MichaelNelson

Apr 9, 2008, 5:31:01 PM
to OAI-ORE
Hi Mark,

>
> I think this whole emphasis on the "benefit of fragment identifiers"
> is a bit overrated, I see no tool/client in existence yet that takes
> a fragment identifier and appropriately returns just that fragment.
> With "#rem", its rather inefficient to pull aggregate content (0..*)
> not intended to be used to get at what is a small ResourceMap and
> think its not a good idea to design into the spec as mechanism. It
> should be left up to the implementer to decide if ReM and AG reside
> at different URI or at the same URI and identified with fragment
> id's. The spec should avoid locking implementations into solutions
> that are inefficient when the implementers are providing feedback
> that its an issue. I totally agree, the flexibility and granularity
> introduced of being able to seperate ones Resource Map from ones
> Aggregation far outweighs the cost of roundtriping to get the
> aggregation for a particular ResourceMap.

just a quick note to say that the approach of:

URI-A = foo.abc
URI-R = foo.abc#rem

is proposed to address only the scenario where you want 2 distinct
URIs and have only 1 serialization. it is proposed as an alternative
to:

URI-A = foo.abc#aggregation
URI-R = foo.abc

it comes from the combination of these 2 ideas:

1. always link/cite URI-A
2. make URI-A what someone would want to link/cite anyway

The #rem approach is nice b/c presumably people will never
accidentally link to URI-R when they meant to link to URI-A.
But in private discussions, Simeon has pointed out the #rem
approach doesn't have a nice migration path if someone
eventually adopts cool URIs:

URI-A = foo
URI-R = foo.abc, foo.xyz

In that case, "foo.abc" goes from URI-A to a URI-R,
and we can't just say:

foo.abc#aggregation owl:sameAs foo

to "fix" all instances of the old URI-A. (see section 3
of http://www.openarchives.org/ore/0.3/http).

I'm sympathetic to that part of the #aggregation approach,
but I also like how the #rem approach hides the
distinction from those that don't care (i.e., cite what is "normal"
to cite).

The #rem portion is not about actually specifying a
portion of an aggregation or rem, it is just a light-weight
trick to get 2 URIs from 1 "file".
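The two conventions, and the migration problem Simeon raised, can be sketched as plain string mappings in Python. The function names are hypothetical; `foo.abc` and `foo.xyz` are the placeholder names used above.

```python
def rem_convention(file_uri):
    """'#rem' approach: the file is the URI-A; the ReM is a fragment."""
    return {"URI-A": file_uri, "URI-R": file_uri + "#rem"}

def aggregation_convention(file_uri):
    """'#aggregation' approach: the file is the URI-R."""
    return {"URI-A": file_uri + "#aggregation", "URI-R": file_uri}

def cool_uri_layout(stem, extensions):
    """Cool-URI layout: extension-less URI-A, one URI-R per serialisation."""
    return {"URI-A": stem, "URI-R": [stem + ext for ext in extensions]}

after = cool_uri_layout("foo", [".abc", ".xyz"])

# Under '#rem', the old URI-A (foo.abc) becomes a URI-R after migrating,
# so existing citations of it now point at a Resource Map.
print(rem_convention("foo.abc")["URI-A"] in after["URI-R"])          # True

# Under '#aggregation', foo.abc was a URI-R before and still is one; only
# foo.abc#aggregation needs an owl:sameAs to the new URI-A.
print(aggregation_convention("foo.abc")["URI-R"] in after["URI-R"])  # True
```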

regards,

Michael

Graham Triggs

Apr 9, 2008, 6:07:41 PM
to OAI-ORE
On Apr 9, 12:09 pm, "Ben O'Steen" <bost...@gmail.com> wrote:
> Then the rate limiting processes are either the generation/
> serialisation of the map or the network transfer time to auto-discover
> a preferred resource map serialisation, and I would strongly lean
> towards the serialisation step being the limiter, at least on my
> implementation anyway:
>
> Request->net->Serialisation->net->receive ave time - 180ms for simple
> maps, 600ms for complex (300+ aggregates)
>
> Request->net->Content-negotiation->303 redirect->net->receive ave time
> - 20-45ms independant of map complexity

Yes, when you say "I have the URI of an aggregation, and I want to
retrieve its ReM", then these issues are largely negligible. But
that's just the one, most basic scenario.

If I have already retrieved your ReM with 300 aggregates, what happens
when I want to understand the structure of your aggregation? Any one
of the resources you aggregate may itself be an aggregation, and there
is nothing (apart from the fragment identifier, and that - for these
purposes - is a hack) in the specification that allows you to
determine if any of those resources are aggregations. So, you *have*
to action every one of those URIs, just to determine if it is an
aggregation or not.

And when you do, as the specification currently doesn't state that the
discovery of the ReM has to return an ore:isDescribedBy predicate, you
have to GET all ReMs (there may be multiple that are returned,
depending on whether it is being aggregated elsewhere!) and parse them
to determine if any of the ReMs are describing that resource as an
aggregation.

The final nail in the coffin is that currently the specs suggest that
you could return a link: header for discovering ReMs, but it doesn't
require it. So you can't rely on doing a HEAD on each of those URIs.
So, to determine if you have a multi-level aggregation, you can only
rely on doing a GET on every single aggregated URI - there you are,
trying to determine the structure of an aggregation, and all of a
sudden you could be downloading a 2GB media file...
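If a Link header were required, a client could use HEAD instead of GET and never download the resource body. A Python sketch of that check; the `rel="resourcemap"` value is an assumption for illustration, not something stated in the 0.3 specification, and the parser is deliberately simple (it does not handle commas inside quoted parameters).

```python
import re
import urllib.request

# One Link value: <target> followed by ;-separated parameters.
LINK_RE = re.compile(r'<([^>]*)>\s*((?:;[^,]*)?)')

def parse_link_header(value):
    """Parse a Link header into [(target, {param: value})]."""
    links = []
    for match in LINK_RE.finditer(value):
        target, params = match.group(1), match.group(2)
        attrs = {}
        for param in params.split(";"):
            name, sep, val = param.strip().partition("=")
            if sep:
                attrs[name.strip().lower()] = val.strip().strip('"')
        links.append((target, attrs))
    return links

def rem_links_via_head(url, rel="resourcemap"):
    """HEAD a URI and return Link targets with the given (assumed) rel."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        header = ", ".join(resp.headers.get_all("Link") or [])
    return [t for t, a in parse_link_header(header) if a.get("rel") == rel]
```

Because the header is only suggested rather than required, an empty result from `rem_links_via_head` cannot be taken to mean "not an aggregation", which is the gap being pointed out.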

Your 600ms example can suddenly become 3 minutes+ just to understand
the structure of an aggregation. Under ideal conditions. It doesn't
take into account the effect on servers that may have large resources
being included in an aggregation. It doesn't include any real world
scenarios like network latency between remote locations. Or any of
these requests being queued by the server.
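One way to read the 3-minute figure, as back-of-envelope arithmetic: 300 aggregated URIs fetched one after another at the 600ms quoted for a complex map is 300 × 600 ms = 180 s. Comparing against the 20-45ms quoted for a redirect-sized round trip shows what a cheap per-URI check would save. Both per-request costs are assumptions taken from the figures above; real requests vary widely.

```python
def sequential_cost_ms(requests, ms_per_request):
    """Total time if the requests are made strictly one after another."""
    return requests * ms_per_request

# Assumption: each of the 300 aggregated URIs costs as much as the
# 600 ms complex-map figure quoted above.
full = sequential_cost_ms(300, 600)
print(full / 1000 / 60, "minutes")  # 3.0 minutes

# Assumption: a cheap check per URI at the upper 45 ms redirect figure.
cheap = sequential_cost_ms(300, 45)
print(cheap / 1000, "seconds")      # 13.5 seconds
```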

> I really don't think that the majority of resource map discover will
> be done in this way. The methods for listing resource maps are likely
> to be implemented in a 'flavoured' manner - i.e.  OAI-PMH ->
> metadataPrefix=ore_atom, metadataPrefix=ore_rdfxml, etc

That's a very different scenario - discovering what ReMs /
aggregations have been published. That probably isn't ever going to be
an issue, because you are making a number of known assumptions when
you attempt to discover them.

What I'm concerning myself with is the issues of actually using a
ReM / aggregation when you have discovered it.

Graham Triggs

Apr 9, 2008, 6:19:50 PM
to OAI-ORE
On Apr 9, 2:57 pm, Mark Diggory <mdigg...@MIT.EDU> wrote:
> I think this whole emphasis on the "benefit of fragment identifiers"  
> is a bit overrated, I see no tool/client in existence yet that takes  
> a fragment identifier and appropriately returns just that fragment.

The benefit of fragment identifiers isn't that you can pull just a
fragment.

The benefit is that you can guess what that URI means, and you can
yield the URI of a ReM without having to do round trips.

If you look at the example I post elsewhere in this thread, that means
that in parsing just a single ReM / aggregation, you could be talking
about removing HUNDREDs of queries, potentially avoiding GETs on
arbitrarily large resources, and minimising your exposure to real
world networking delays.

Even if you are always pulling the full content of any ReMs, the above
is still a *gigantic* difference in real world practicality.

But the fragment identifiers are still a hack, particularly when you
try to infer everything you 'need' from them - hence why I'm leaning
towards the definition of a protocol that will allow precise
determination of this information, but built around the premise of
generating URLs in other protocols for actually retrieving artefacts
as necessary.

Graham Triggs

Apr 9, 2008, 6:40:13 PM
to OAI-ORE
On Apr 9, 10:31 pm, MichaelNelson <RhodeWarri...@gmail.com> wrote:
> The #rem approach is nice b/c presumably people will never
>  accidentally link to URI-R when they meant to link to URI-A.
> But in private discussions, Simeon has pointed out the #rem
> approach doesn't have a nice migration path if someone
> eventually adopts cool URIs:

The big question is why should we care if it has a migration path?

We are laying out a specification that needs to work practically, and
so we can put whatever restrictions we need to into / onto that. If
things work in a prescribed way, then that's that - they won't be able
to (and won't need to) be migrated to anything else.

I actually don't see any need to have a URI-R that is different from a
URI-A. All you need is a URI-A, which will GET to multiple ReM
serialisations through content negotiation. You can then never have
accidental links to the wrong thing, and the issues of migration are
irrelevant.

But what you do need is to be able to determine a URI-A from any other
resource that may be aggregated. Otherwise, you have the problems of
understanding a ReM that I've already laid out.

MichaelNelson

Apr 9, 2008, 7:00:15 PM
to OAI-ORE
Hi Graham,

Good to meet you @ OR08. Quick replies below.




>
> Of course, these discussions were all before the introduction of what
> is currently referred to as URI-P (although I've described above why I
> believe that should be removed as a term) - and whilst it's stated as
> an open issue in the specifications, it quite clearly isn't. If you
> cite one of those URIs, it has to yield or lead to a Resource Map for
> the context of the aggregation in exactly the same way as the URI-A.
> Otherwise, you have a meaningless dead end.
>

the nature of what is returned from URI-P (or /feed/entry/id in
Atom speak) is still TBD. It could be either a ReM or the AR itself.

I have an informal page regarding various approaches for URI-P
values:

http://www.cs.odu.edu/~mln/ore/2008-03-19/tuple.html

>
> But actually, the problem is worse than that. What happens if you try
> to walk an aggregation to learn it's structure? Well, the
> specification states that an aggregation as described in a resource
> map can only be one level deep. So, for a structured 'item', you break
> it into multiple aggregations, and build the structure by aggregating
> the URI-A as resources in other aggregations.
>
> What that means is that unless you can guarantee on the fragment
> identifier being used for all URI-As, then to understand/walk the
> structure of any aggregation, you have to action *every* aggregated
> resource (without a fragment identifier) to determine if it is an
> aggregation - and if it does lead to a resource map, do the same for
> that one, etc.
>

in Atom speak, we encourage (e.g., "SHOULD") the use of
/feed/entry/category to mark an AR as also being an A:

<category scheme="http://www.openarchives.org/ore/terms/"
term="http://www.openarchives.org/ore/terms/Aggregation"
label="Aggregation" />

similarly for RDF.
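A client-side sketch of using that category marker, in Python: scan an Atom feed for entries carrying the ORE Aggregation category, with no extra requests. It returns each entry's `atom:id` (the URI-P, in the terms used above); which element carries the aggregated resource's own URI is deliberately not assumed here. The sample feed is made up.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ORE_TERMS = "http://www.openarchives.org/ore/terms/"
ORE_AGGREGATION = ORE_TERMS + "Aggregation"

def entries_marked_as_aggregations(feed_xml):
    """Return atom:id of each entry whose category marks it as an Aggregation."""
    root = ET.fromstring(feed_xml)
    ids = []
    for entry in root.iter(ATOM + "entry"):
        for cat in entry.findall(ATOM + "category"):
            if cat.get("scheme") == ORE_TERMS and cat.get("term") == ORE_AGGREGATION:
                id_el = entry.find(ATOM + "id")
                if id_el is not None and id_el.text:
                    ids.append(id_el.text.strip())
                break
    return ids

FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://host/entry/1</id>
    <category scheme="http://www.openarchives.org/ore/terms/"
              term="http://www.openarchives.org/ore/terms/Aggregation"
              label="Aggregation" />
  </entry>
  <entry>
    <id>http://host/entry/2</id>
  </entry>
</feed>"""

print(entries_marked_as_aggregations(FEED))  # ['http://host/entry/1']
```

Since the marker is a SHOULD rather than a MUST, an entry without it is not proof that the resource is not an aggregation.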

> -- Serializations --
>
> Now, if - as in my experience most people are requesting - that we
> have URI-As/URI-Ps that yield to URI-Rs without requiring an
> additional round trip, then we have to deal with the situation that we
> may [need to] offer multiple serializations of that resource map.
>
> Which brings us back to the basic principles of the ORE specification
> - leaning on the existing web infrastructure. Why can't we mandate
> having a single URI-R for a given aggregation, that will deliver the
> different serializations through content negotiation? Maintain a 1:1
> mapping; if you have different URI-Rs, you have different URI-As and
> therefore different aggregations - even if the assertions of those
> aggregations are the same.
>

We had that in an earlier version (between 0.2 and 0.3) and pursued
it at quite some length, but there was a general feeling that the
resulting documents & diagrams actually came out more
complicated than they are now.

regards,

Michael

> G

Ben O'Steen

Apr 9, 2008, 8:58:46 PM
to OAI-ORE
N3 export has been given the response mimetype of text/rdf+n3 and
Tabulator now picks it up. e.g. http://archive.sers.ox.ac.uk:5000/objects/ora:admin/relationships?format=n3
or http://archive.sers.ox.ac.uk:5000/objects/ora:admin/aggregation.n3

On Apr 9, 8:55 pm, Mark Diggory <mdigg...@MIT.EDU> wrote:
> I also see that
>
> http://archive.sers.ox.ac.uk:5000/objects/ora:admin/relationships?
> format=xml
>
> sets the request content type = application/rdf+xml
>
> while
>
> http://archive.sers.ox.ac.uk:5000/objects/ora:admin/relationships?
> format=n3
>
> sets it to text/plain
>
> if that were instead text/n3 then it would engage Tabulator.
>
> But this nailing down of mimetypes is still all new work it
> appears... digging in the dirt:http://www.w3.org/2008/01/rdf-media-typeshttp://thor.roe.ac.uk/quaestor/mime.html
>
> -Mark
>
> On Apr 9, 2008, at 12:14 PM, Mark Diggory wrote:
>
>
>
> > On Apr 9, 2008, at 8:44 AM, Ben O'Steen wrote:
>
> >> Just wanted to raise the point that the filename for a download is
> >> not
> >> the same as the last part of the URL
>
> >> i.e. The URL http://host/path/$item/aggregation.atom can result in a
> >> download of "$item_resource_map.xml"
>
> > I'm not really seeing anything in the Cool URI's notes about Content
> > Disposition. So I feel this is outside it.
>
> > http://www.w3.org/TR/cooluris/
>
> > Neither in LOD notes:
>
> > http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial
>
> >> I am currently doing this for item downloads (not rmaps yet) by
> >> setting a Content-Disposition header in the HTTP response, it is
> >> automatically picked up by most browsers. (See http://www.ietf.org/

Ben O'Steen

Apr 9, 2008, 9:45:55 PM
to OAI-ORE


On Apr 9, 11:07 pm, Graham Triggs <grahamtri...@gmail.com> wrote:
> If I have already retrieved your ReM with 300 aggregates, what happens
> when I want to understand the strucutre of your aggregation?

From following the specs, I ended up with RDF that helped me
understand the nature of an aggregation: the "URI-AR rdf:type
ore:Aggregation" triple asserts that a given resource is an
Aggregation.
<rdf:Description rdf:about="http://archive.sers.ox.ac.uk:5000/objects/uuid%3A33de1615-0b12-434f-8836-7f7fa7b8b576/datastreams/MODS">
  <dc:format>text/xml</dc:format>
  <dc:title>MODS v3.2 Metadata</dc:title>
  <dcterms:modified>2008-03-19T15:41:54.586Z</dcterms:modified>
</rdf:Description>
<rdf:Description rdf:about="http://archive.sers.ox.ac.uk:5000/objects/uuid:006b236b-88f6-45fd-957c-bffb19a11370/aggregation">
  <dc:title>18. Chapitre XIV: De l'action des glaciers sur leur fond.</dc:title>
  <rdf:type rdf:resource="http://www.openarchives.org/ore/terms/Aggregation"/>
</rdf:Description>
> So, you *have*
> to action every one of those URIs, just to determine if it is an
> aggregation or not.

Maybe to prove that a given URI is truly an aggregation, but not to
determine that. (And using the Atom serialisation, an <entry>'s
<category> element provides a similar assertion)

> The final nail in the coffin is that currently the specs suggest that
> you could return a link: header for discovering ReMs, but it doesn't
> require it. So you can't rely on doing a HEAD on each of those URIs.

I agree - for many web frameworks, it should be a reasonably easy
thing to add. We'd need to tackle the subject of 'static' sites
though, those relying solely on a server such as Apache httpd to serve
pre-made resource maps.
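For a WSGI framework such as Pylons, advertising the Resource Map in a Link header might look like the sketch below. The path table is hypothetical, and `rel="resourcemap"` is an assumed relation name, not one fixed by the 0.3 documents. The handler answers HEAD with an empty body, so a client can discover the map without downloading the resource.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical mapping from an aggregated resource's path to its Resource Map.
REM_FOR_PATH = {
    "/objects/uuid:1234/aggregation": "/objects/uuid:1234/aggregation.atom",
}

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    rem = REM_FOR_PATH.get(path)
    if rem is not None:
        # Advertise the map so a HEAD request is enough to find it.
        # rel="resourcemap" is an assumed relation name.
        headers.append(("Link", f'<{rem}>; rel="resourcemap"'))
    start_response("200 OK", headers)
    if environ["REQUEST_METHOD"] == "HEAD":
        return [b""]
    return [b"resource body"]

def call(path, method="HEAD"):
    """Drive the app in-process and return (status, headers, body)."""
    environ = {"PATH_INFO": path, "REQUEST_METHOD": method}
    setup_testing_defaults(environ)  # fills in only the keys that are missing
    captured = {}
    def start_response(status, headers):
        captured["status"], captured["headers"] = status, dict(headers)
    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

For the static-site case, Apache httpd's mod_headers can add a fixed Link header per location in a similar way.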

> So, to determine if you have a multi-level aggregation, you can only
> rely on doing a GET on every single aggregated URI - there you are,
> trying to determine the structure of an aggregation, and all of a
> sudden you could be downloading a 2GB media file...

I'm not sure this is a realistic situation, due to the use of
dc:format, and rdf:type assertions on a per URI-AR basis. I am not
sure though where these properties lie in the MAY, SHOULD and MUST
scale of the spec, but I'd like to say that in using resource maps
practically, I have found these properties (and dcterms:conformsTo)
invaluable to be able to use a given resource map 'blind'.
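Here is one way a harvester might use those per-resource hints to avoid pulling a large media file. The set of "might be a map" formats is an assumption. The function returns three states on purpose: with no hint at all, the client still cannot know, which is the MAY/SHOULD/MUST question raised above.

```python
ORE_AGGREGATION = "http://www.openarchives.org/ore/terms/Aggregation"

# Formats that could plausibly hold a Resource Map (assumed list).
MAP_FORMATS = {"application/rdf+xml", "application/atom+xml",
               "text/rdf+n3", "text/n3"}

def should_dereference(types, dc_format):
    """Decide whether to GET an aggregated resource while walking structure.

    Returns True or False when the map gives a hint, and None when it gives
    none, in which case the caller must fall back to a HEAD or risk a GET.
    """
    if ORE_AGGREGATION in types:
        return True                     # asserted to be an Aggregation
    if dc_format is None:
        return None                     # no hint at all
    media_type = dc_format.split(";")[0].strip().lower()
    return media_type in MAP_FORMATS
```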
>
> Your 600ms example can suddenly become 3 minutes+ just to understand
> the structure of an aggregation. Under ideal conditions.

Let me put some figures here to aid discussion (I am not arguing with
your point as you will see!)

The largest, most complex 'item' I have is a scanned book. It is
broken down into a book obj -> chapter objs -> page objs, with each
object containing metadata and some additional RDF information, and the
page objects also containing the thumbnails, images, etc. All resource
maps are generated on the fly, in a high-level fashion, and could be
optimised and cached significantly.

The average times taken to generate an in-memory graph of the entire
book on an external machine (~400 rmaps, all levels and all
aggregations dereferenced) are as follows:

All resource maps found via content negotiation: 135s
All resource maps found via a 'hack' (frag id or add .xml to URIA):
100-110s (75-85% of the pure content-negotiation time)

Server - a VM on a 1.8GHz dual core host (11 other servers hosted), 1GB
RAM, running Ubuntu JeOS 7.10 and Pylons, a Python web MVC framework.
Harvester - laptop, 1.6GHz Centrino Duo, 1GB RAM on a home ADSL link;
harvester written in Python using high-level processing (parsing the
maps into a single in-memory RDF graph object and writing it to disk).

So for pure content-neg. 15-20% of the time is just a roundtrip to
find the URIR from the URIA, I'm estimating 20% to 30% being the
transfer time for the actual map, leaving the remaining 50% (~1min in
the example) due to serialisation.

So, from this, a harvester working on a server holding pre-fabricated
resource maps would really benefit from some kind of 'trick' or
convention for finding a particular URIR from the URIA.
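The two 'tricks' from the timings above can be expressed as an ordered list of URIs to try. Both are local conventions, not something the spec guarantees, and the host names below are hypothetical. With a fragment URI-A, the document itself is the map. Otherwise, a `.xml` suffix is guessed first, and content negotiation on the URI-A, which may cost a redirect round trip, is kept as the fallback.

```python
from urllib.parse import urldefrag

def candidate_rem_uris(uri_a):
    """Return URIs to try for the Resource Map of uri_a, cheapest guess first."""
    base, fragment = urldefrag(uri_a)
    if fragment:
        # Fragment convention (e.g. ...#aggregation): GET the bare document.
        return [base]
    # Suffix convention first, then content negotiation on the URI-A itself.
    return [uri_a + ".xml", uri_a]
```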

Looking at it from a triples harvested per second POV, 15,197 triples
were pulled in the 135s harvest (110s with trick). This works out to
be just 112 triples a second. (NB By talking about rates, we are
verging into problems with my implementation and not with the
standard.)
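Re-running the arithmetic on the figures reported above (15,197 triples, 135s with content negotiation, 100-110s with the trick). The code only divides the numbers already given. The trick comes out at about 74% and 81% of the content-negotiation time.

```python
# Figures as reported in this message.
TRIPLES = 15_197
CONNEG_S = 135
TRICK_S = (100, 110)

def ratio(seconds):
    """Trick time as a fraction of the content-negotiation time."""
    return seconds / CONNEG_S

print(f"{TRIPLES / CONNEG_S:.1f} triples/s with content negotiation")
for t in TRICK_S:
    print(f"{t} s: {ratio(t):.0%} of the conneg time, {TRIPLES / t:.1f} triples/s")
```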

I guess it comes down to whether there would be a compromise: keep the
simplicity of a single level rmap, have an extension for varying the
depth of a serialisation, or finally at the other extreme, include
everything in the top-level rmap. I'll need to do some testing to see
what the performance gain is from building the larger maps server-side,
before a single net transfer to the client.
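A rough sketch of the "varying depth" middle ground. It is a breadth-first walk of nested aggregations that stops at a chosen depth. The dictionary stands in for fetched maps, and its book/chapter/page names are hypothetical. A server could run the same walk before a single transfer to the client.

```python
from collections import deque

# Hypothetical stand-in for fetched maps: aggregation -> aggregated URIs.
MAPS = {
    "book": ["ch1", "ch2"],
    "ch1": ["p1", "p2"],
    "ch2": ["p3"],
    "p1": ["p1.jpg"],
    "p2": ["p2.jpg"],
    "p3": ["p3.jpg"],
}

def harvest(root, max_depth=None):
    """Return the aggregations whose maps are fetched, down to max_depth.

    Depth 0 is the root map only; None means no limit.
    """
    fetched = set()
    queue = deque([(root, 0)])
    while queue:
        uri, depth = queue.popleft()
        if uri in fetched or uri not in MAPS:
            continue  # already seen, or a leaf resource rather than a map
        fetched.add(uri)
        if max_depth is not None and depth >= max_depth:
            continue  # include this level but do not descend further
        for child in MAPS[uri]:
            queue.append((child, depth + 1))
    return fetched
```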

Mark Diggory

Apr 9, 2008, 10:15:01 PM
to oai...@googlegroups.com

On Apr 9, 2008, at 3:40 PM, Graham Triggs wrote:
>
> On Apr 9, 10:31 pm, MichaelNelson <RhodeWarri...@gmail.com> wrote:
>> The #rem approach is nice b/c presumably people will never
>> accidentally link to URI-R when they meant to link to URI-A.
>> But in private discussions, Simeon has pointed out the #rem
>> approach doesn't have a nice migration path if someone
>> eventually adopts cool URIs:
>
> I actually don't see any need to have a URI-R that is different from a
> URI-A. All you need is a URI-A, which will GET to multiple ReM
> serialisations through content negotiation. You can then never have
> accidental links to the wrong thing, and the issues of migration are
> irrelevant.

Starting to agree; see below. It seems the use for a different URI (the
fragment, that is) is just to separate the subjects in the instance.
Clearly you can't expect every service out there to provide a
content-negotiated RDF description of its resources to test whether
"ore:aggregates </bitstream/1721.1/34155/1/69018697.pdf>;" points to another
aggregation without risking pulling that resource in
its entirety.

> But what you do need is to be able to determine a URI-A from any other
> resource that may be aggregated. Otherwise, you have the problems of
> understanding a ReM that I've already laid out.


I do feel that is correct. Let me rephrase, and you can tell me if I
got it right: "It's the model that should tell you whether the target of
an ore:aggregates is another aggregation or an actual resource;
right now it's the URI, not the model, that determines this."
Some of us don't like this being in the URI because it's
restrictive, and we think URIs should rather be opaque to the spec.

If this is the case, having something like ore:hasAggregation in the
Aggregation would enable this in the model, where it can be enforced
irrespective of what is in the URI. ore:hasAggregation would be a
subproperty of dcterms:hasPart specifically referring to another
ore:Aggregation, whose inverse would be ore:isAggregatedBy, while
ore:aggregates would be a subproperty of dcterms:hasPart specifically
referring to non-aggregations, and may not actually have an inverse.

@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix ore: <http://www.openarchives.org/ore/terms/>.

</handle/1721.1/34155#WHATEVER>
    dcterms:created "2008-02-28";
    dcterms:modified "2008-03-10";
    ore:describes </handle/1721.1/34155>;
    a ore:ResourceMap.

</handle/1721.1/34155>
    # points at a resource
    ore:aggregates </bitstream/1721.1/34155/1/69018697.pdf>;
    ore:isAggregatedBy </handle/1721.1/7629>;
    # new property, points at another aggregation
    ore:hasAggregation </handle/XXX/YYY>;
    a ore:Aggregation.


This is much more useful in RDF (and enforceable and maintainable in
the actual ontology) than expecting a specific structure to the URI.
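To show what the extra predicate buys a client, here is a sketch that sorts an aggregation's members using only the triples in hand. The triples follow the Turtle above. Note that `ore:hasAggregation` is the property proposed in this message, not part of ORE 0.3.

```python
ORE = "http://www.openarchives.org/ore/terms/"

# (subject, predicate, object) triples from the example above.
# ore:hasAggregation is the *proposed* property, not an ORE 0.3 term.
EXAMPLE_TRIPLES = [
    ("/handle/1721.1/34155", ORE + "aggregates",
     "/bitstream/1721.1/34155/1/69018697.pdf"),
    ("/handle/1721.1/34155", ORE + "isAggregatedBy", "/handle/1721.1/7629"),
    ("/handle/1721.1/34155", ORE + "hasAggregation", "/handle/XXX/YYY"),
]

def classify_members(aggregation, triples):
    """Split members into (resources, aggregations) without dereferencing."""
    resources, aggregations = set(), set()
    for s, p, o in triples:
        if s != aggregation:
            continue
        if p == ORE + "aggregates":
            resources.add(o)
        elif p == ORE + "hasAggregation":
            aggregations.add(o)
    return resources, aggregations
```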

-Mark

Mark Diggory

Apr 9, 2008, 11:13:45 PM
to oai...@googlegroups.com
On Apr 9, 2008, at 4:00 PM, MichaelNelson wrote:

>> What that means is that unless you can guarantee on the fragment
>> identifier being used for all URI-As, then to understand/walk the
>> structure of any aggregation, you have to action *every* aggregated
>> resource (without a fragment identifier) to determine if it is an
>> aggregation - and if it does lead to a resource map, do the same for
>> that one, etc.
>>
>
> in Atom speak, we encourage (e.g., "SHOULD") the use of
> /feed/entry/category to mark an AR as also being an A:
>
> <category scheme="http://www.openarchives.org/ore/terms/"
> term="http://www.openarchives.org/ore/terms/Aggregation"
> label="Aggregation" />
>
> similarly for RDF.

But is there a predicate for this? Please see my previous email, the
way I would see to do this in RDF would be to define a new predicate
for Aggregations rather than expecting special structure in the URI.


>
>> -- Serializations --
>>
>> Now, if - as in my experience most people are requesting - that we
>> have URI-As/URI-Ps that yield to URI-Rs without requiring an
>> additional round trip, then we have to deal with the situation
>> that we
>> may [need to] offer multiple serializations of that resource map.
>>
>> Which brings us back to the basic principles of the ORE specification
>> - leaning on the existing web infrastructure. Why can't we mandate
>> having a single URI-R for a given aggregation, that will deliver the
>> different serializations through content negotiation? Maintain a 1:1
>> mapping; if you have different URI-Rs, you have different URI-As and
>> therefore different aggregations - even if the assertions of those
>> aggregations are the same.
>>
>
> We had that in an earlier version (between 0.2 and 0.3) and pursued
> it at quite some length, but there was a general feeling that the
> resulting documents & diagrams actually came out more
> complicated than they are now.

And I still think trying to mandate a mechanism (Cool URI, content
negotiation, fragment, whatever) will be too restrictive in certain
domains (especially in the SW/LOD) and be detrimental to ORE having
any traction there. If it can't be easily applied there, I'll
certainly be looking for something else, because my interest is in getting
DSpace functional in the domain of SW/LOD and not necessarily in getting
DSpace functional on ORE. ORE is currently just a "means to an end"
in my little SW/LOD project.

MichaelNelson

Apr 10, 2008, 12:16:30 AM
to OAI-ORE

>
> The big question is why should we care if it has a migration path?

Some folks care, some don't and some aren't sure. That's why
it is still an open issue.

>
> I actually don't see any need to have a URI-R that is different from a
> URI-A. All you need is a URI-A, which will GET to multiple ReM
> serialisations through content negotiation. You can then never have
> accidental links to the wrong thing, and the issues of migration are
> irrelevant.

Those ReM representations are rarely anonymous; they almost always
have distinct values in the Content-Location http response header.
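As an illustration of that point, one URI can negotiate between several serializations and still report a distinct Content-Location for each. The sketch below is simplified. It honours q-values and exact media types only, and wildcards fall back to the default. The `/item/rem.*` paths are hypothetical.

```python
# Hypothetical serializations of one Resource Map, each with its own URI.
SERIALIZATIONS = {
    "application/atom+xml": "/item/rem.atom",
    "application/rdf+xml": "/item/rem.rdf",
    "text/rdf+n3": "/item/rem.n3",
}
DEFAULT_TYPE = "application/atom+xml"

def negotiate(accept_header):
    """Return (Content-Type, Content-Location) for an Accept header."""
    best, best_q = DEFAULT_TYPE, 0.0
    for part in (accept_header or "").split(","):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0], 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media in SERIALIZATIONS and q > best_q:
            best, best_q = media, q
    return best, SERIALIZATIONS[best]
```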

But separate URIs for URI-A and URI-R are a data model issue
(cf. http://www.openarchives.org/ore/0.3/datamodel#Resource_Map).
There are other people more qualified than I to defend that point.

regards,

Michael

MichaelNelson

Apr 10, 2008, 12:21:40 AM
to OAI-ORE

>
> But is there a predicate for this? Please see my previous email, the
> way I would see to do this in RDF would be to define a new predicate
> for Aggregations rather than expecting special structure in the URI.
>
>

maybe I'm not understanding the question? but quoting from table 1
of:
http://www.openarchives.org/ore/0.3/atom#ORE_ATOM

The Aggregated Resource that itself is an Aggregation,
indicated by an rdf:type for the Aggregated Resource with a value
of ore:Aggregation and/or a ore:isDescribedBy relationship from
the Aggregated Resource to a Resource Map.


does this answer your question?

regards,

Michael

Mark Diggory

Apr 10, 2008, 2:47:34 AM
to oai...@googlegroups.com

On Apr 9, 2008, at 9:21 PM, MichaelNelson wrote:
>
>
>>
>> But is there a predicate for this? Please see my previous email, the
>> way I would see to do this in RDF would be to define a new predicate
>> for Aggregations rather than expecting special structure in the URI.
>>
>>
>
> maybe I'm not understanding the question? but quoting from table 1
> of:
> http://www.openarchives.org/ore/0.3/atom#ORE_ATOM
>
> The Aggregated Resource that itself is an Aggregation,
> indicated by an rdf:type for the Aggregated Resource with a value
> of ore:Aggregation and/or a ore:isDescribedBy relationship from
> the Aggregated Resource to a Resource Map.
>
>
> does this answer your question?

Close, but the answer misses the nuance of the question. The point
is that if the object of the following statement is described in
another RDF instance, then to figure out whether

</handle/1721.1/34155> ore:aggregates </bitstream/1721.1/34155/1/69018697>;

references a resource or another aggregation, it has to be traversed
to discover that there is an ore:Aggregation at the other end.

To get around this issue, it appears the 0.2 ORE syntax document
(http://www.openarchives.org/ore/0.2/rdfsyntax) suggests the use of
the fragment "#aggregation" to identify that the resource being
referenced in ore:aggregates is an aggregation:

<http://dlib.org/dlib/february06/smith/02smith/rem/>
ore:describes
<http://dlib.org/dlib/february06/smith/02smith/rem/#aggregation>

where

<http://dlib.org/dlib/february06/smith/02smith/rem/#aggregation>
ore:aggregates
<http://dlib.org/dlib/february06/smith/02smith.html>


The 0.3 rdfsyntax document continues on this track. However, the 0.3
data model document (http://www.openarchives.org/ore/0.3/datamodel)
uses URIs that are not fragments and instead attempts to represent them
as Cool URIs:

<http://dlib.org/dlib/february06/smith/02smith/aggregation/rem.xml>
ore:describes
<http://dlib.org/dlib/february06/smith/02smith/aggregation>

where

<http://dlib.org/dlib/february06/smith/02smith/aggregation>
ore:aggregates
<http://dlib.org/dlib/february06/smith/02smith.html>

Thus there is a discontinuity, and it's unclear what to expect in the
structure of the URI when the thing an aggregation aggregates is another
aggregation:

<http://dlib.org/dlib/february06/smith/02smith/rem/#aggregation>
ore:aggregates
<http://foo.com/thing#aggregation>

or

<http://dlib.org/dlib/february06/smith/02smith/aggregation>
ore:aggregates
<http://foo.com/thing/aggregation>
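To make the discontinuity concrete, here is a hypothetical URI-shape heuristic that tries to cover both conventions. It still misreads opaque URIs in both directions. The DSpace handle, which is an ore:Aggregation in the earlier Turtle example, is not detected, and a plain page that happens to end in `/aggregation` is. The host names are made up.

```python
from urllib.parse import urlsplit

def looks_like_aggregation(uri):
    """Guess from URI shape alone (hypothetical heuristic, not from the spec)."""
    parts = urlsplit(uri)
    return (parts.fragment == "aggregation"
            or parts.path.rstrip("/").endswith("/aggregation"))

print(looks_like_aggregation("http://foo.com/thing#aggregation"))          # True
print(looks_like_aggregation("http://foo.com/thing/aggregation"))          # True
print(looks_like_aggregation("http://dspace.example/handle/1721.1/34155")) # False, though it is one
print(looks_like_aggregation("http://example.org/docs/aggregation"))       # True, though it may not be
```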

This suggests it's more appropriate that the URIs (ReM, A, AR) are all
opaque (which I consider a good thing). Thus, in the model, there is
no way to tell whether what is aggregated at the other end is another
aggregation or a resource, specifically when that Aggregation or
resource is not described locally and resides outside the current
instance of RDF. If I get it right, Graham is suggesting that it
would be beneficial to have a mechanism, within the "Aggregation" itself,
to identify whether the object of an ore:aggregates is another
aggregation or a resource, so the client does not have to incur
the cost of downloading the resource when attempting to resolve the
URI to determine if it is an aggregation or not. I'm suggesting
separate predicates, <#aggregation> ore:hasAggregation <some-aggregation>
vs <#aggregation> ore:aggregates <some-resource>, as a solution that would
be enforceable in the model and would more easily inform the client of
the nature of the object of the statement without having to traverse and
retrieve that object physically.

Mark

Graham Triggs

Apr 10, 2008, 7:36:32 AM
to OAI-ORE
On Apr 10, 12:00 am, MichaelNelson <RhodeWarri...@gmail.com> wrote:
> Good to meet you @ OR08. Quick replies below.

you too.

> the nature of what is returned from URI-P (or /feed/entry/id in
> Atom speak) is still TBD. It could be either a ReM or the AR itself.
>
> I have an informal page regarding various approaches for URI-P
> values:
>
> http://www.cs.odu.edu/~mln/ore/2008-03-19/tuple.html

Except in certain content negotiation circumstances, the URI-P
absolutely cannot return the AR itself.

The point of the URI-P is to stand in for a combination of semantic
information - a specific context for the resource. If you cite the
URI-P elsewhere (which is what you are meant to do with it), then you
*must* be able to get back to that context. That's not optional, or an
open issue; it's a cold hard fact of using a URI-P.

Does URI-P lead to a ReM, or return a 'proxy document' from which you
can derive the semantic information? Maybe it can return the AR *if*
you don't content negotiate for the semantic information. But being
able to get to that semantic information in a prescribed manner from a
URI-P is an absolute given, otherwise the URI-P cannot do the job for
which it is intended.
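The "return the AR only if you don't negotiate for the semantic information" option might look like the sketch below. It uses a 303 to a description of the context for RDF/Atom clients and a plain redirect to the aggregated resource for everyone else. The proxy table, its URIs and the substring match on Accept are simplifying assumptions, not anything the spec prescribes.

```python
# Hypothetical proxy table: URI-P -> (URI-AR, URI of a description of the context).
PROXIES = {
    "/proxy/42": ("http://example.org/page1.jpg", "/proxy/42.rdf"),
}
SEMANTIC_TYPES = ("application/rdf+xml", "application/atom+xml", "text/rdf+n3")

def resolve_proxy(uri_p, accept=""):
    """Return (HTTP status, Location) for a request on a URI-P."""
    if uri_p not in PROXIES:
        return 404, None
    aggregated, description = PROXIES[uri_p]
    if any(t in accept for t in SEMANTIC_TYPES):
        return 303, description      # See Other: the context for the resource
    return 302, aggregated           # everyone else gets the resource itself
```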

> in Atom speak, we encourage (e.g., "SHOULD") the use of
> /feed/entry/category to mark an AR as also being an A:
>
> <category scheme="http://www.openarchives.org/ore/terms/"
> term="http://www.openarchives.org/ore/terms/Aggregation"
> label="Aggregation" />
>
> similarly for RDF.

That might work. But SHOULD is too weak. If clients can't rely on it
being used [correctly], then they have to ignore it and try hitting
everything anyway. Even if you said the presence of one such assertion
is enough to be reasonably confident that the ReM is using it
consistently and correctly, what do you do if it doesn't appear at
all? Can you trust that there really are no aggregations being
aggregated, or does it just mean that the serialization hasn't
included the assertion because it didn't need to?

But then it's also difficult to validate, and an easy-to-miss part of
the specification - so even if the spec mandated it, there is a
reasonable chance that it might be missed and people may generate
out-of-spec ReMs.

Oh the dilemmas ;)

> > Which brings us back to the basic principles of the ORE specification
> > - leaning on the existing web infrastructure. Why can't we mandate
> > having a single URI-R for a given aggregation, that will deliver the
> > different serializations through content negotiation? Maintain a 1:1
> > mapping; if you have different URI-Rs, you have different URI-As and
> > therefore different aggregations - even if the assertions of those
> > aggregations are the same.
>
> We had that in an earlier version (between 0.2 and 0.3) and pursued
> it at quite some length, but there was a general feeling that the
> resulting documents & diagrams actually came out more
> complicated than they are now.

Does that documentation still exist? It would be interesting to see
how it came out. I personally find it hard to believe that it would be
more complicated than the current situation of god knows how many
different URI- types, and confusion over what to link to.

I know it's a cheap shot, but how many slides were shown at the
presentation that were citing the wrong url? ;)

MichaelNelson

Apr 10, 2008, 10:59:32 AM
to OAI-ORE

> Close, but the answer misses the nuance of the question. The point
> is that if the object of the following statement is described in
> another rdf instance) then to figure out if
>
> </handle/1721.1/34155> ore:aggregates </bitstream/
> 1721.1/34155/1/69018697>;
>
> references a resource or another aggregation, it has to be traversed
> and discovered that there is and ore:Aggregation at the other end.
>

yes, if that is the only piece of RDF you have, you need to
deref to see what is on the other end.

> To get around this issue, it appears the 0.2 ore syntax document
> (http://www.openarchives.org/ore/0.2/rdfsyntax) suggests the use of
> the fragment "#aggregation" to identify that it is an aggreagtion is
> the resource being referenced in ore:aggregates.
>

you have to be careful here: #aggregation is just a convention
to cheaply get 2 URIs out of 1 file. Its presence doesn't
really mean that the thing on the other end is an aggregation.
This is just a limitation of the web -- there is no way to *really*
know what's on the other end w/o getting it. You can generally
trust people to be honest about mime types, follow conventions, etc.,
but it's not enforceable.

My favorite example is that every YouTube video page has this in its
head:

<link rel="alternate" type="application/rss+xml" title="YouTube - [RSS]" href="/rssls">

but if you deref http://www.youtube.com/rssls you always get an HTML
file ;-)
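Since the advertised type can't be enforced, a harvester can at least check it after the fact. The sketch compares a link's declared type with the Content-Type that actually came back, ignoring parameters such as charset. The header values are illustrative.

```python
from email.message import Message

def declared_type_matches(declared, content_type_header):
    """Compare an advertised media type with a response's Content-Type."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() returns a lower-cased "type/subtype" with parameters dropped.
    return msg.get_content_type() == declared.strip().lower()

print(declared_type_matches("application/rss+xml", "text/html; charset=utf-8"))  # False
```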

>
> The 0.3 rdfsyntax document continues on this track, However, the 0.3

note: the RDF document has not yet been updated for 0.3 -- it is just
the 0.2 document pushed forward.

> Thus there is a discontinuity and its unclear what to expect in the
> structure of the URI if an aggregation aggregates is another
> aggregation.
>
> <http://dlib.org/dlib/february06/smith/02smith/rem/#aggregation>
> ore:aggregates
> <http://foo.com/thing#aggregation>
>
> or
>
> <http://dlib.org/dlib/february06/smith/02smith/aggregation>
> ore:aggregates
> <http://foo.com/thing/aggregation>
>
> And which suggests its more appropriate that URI (ReM, A, RA) are all
> Opaque (which I consider a good thing). Thus, in the model, there is
> no way to tell if what is aggregated at the other end is another
> aggregation or a resource specifically when that Aggregation or
> resource is not described locally and resides outside the current
> instance of RDF.

but you just check the big triple store in the sky, right? ;-)
or use Atom? ;-)

>If I get it right, Graham is suggesting that it
> would be beneficial to have a mechanism by which to identify if the
> object of an ore:aggregates is a another aggregation or resource
> within the "Aggregation" itself so the client does not have to incur
> the cost of downloading the resource when attempting to resolve the
> uri to determine if it is an aggregate or not. I'm suggesting
> separate predicates for ,<#aggregation> ore:hasAggregation <some-
> aggregation> vs <#aggregation> ore:aggregates <some-resource> would

how about ore:aggregatesAggregation ?
mnemonic: it is much longer than ore:aggregates, meaning
there is still more to get ;-)

noted: the suggestion of a different predicate to indicate aggregating
an Aggregation as a stronger hint regarding the nature of what
is on the other end (esp. wrt the bare naked RDF example).

I don't know what the RDF experts will say. In Atom it doesn't
really matter since they'll both be link[@rel="alternate"] and
it will depend on the presence/absence of the appropriate
/feed/entry/category.
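On the Atom side, the check reduces to looking for that category on the entry. The sketch below matches the scheme and term quoted earlier in the thread. The sample entry is hypothetical. A False result is ambiguous under a SHOULD: it can mean "not an aggregation" or simply "not marked".

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ORE_SCHEME = "http://www.openarchives.org/ore/terms/"
ORE_AGGREGATION = ORE_SCHEME + "Aggregation"

def entry_is_aggregation(entry_xml):
    """True if an Atom entry carries the ORE Aggregation category."""
    entry = ET.fromstring(entry_xml)
    return any(
        cat.get("scheme") == ORE_SCHEME and cat.get("term") == ORE_AGGREGATION
        for cat in entry.findall(f"{ATOM}category")
    )

# Hypothetical entry using the category element quoted in this thread.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://example.org/proxy/42</id>
  <link rel="alternate" href="http://example.org/item/aggregation"/>
  <category scheme="http://www.openarchives.org/ore/terms/"
            term="http://www.openarchives.org/ore/terms/Aggregation"
            label="Aggregation"/>
</entry>"""
```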

regards,

Michael

MichaelNelson

Apr 10, 2008, 11:20:27 AM
to OAI-ORE

> Except in certain content negotiation circumstances, the URI-P
> absolutely cannot return the AR itself.
>
> The point of the URI-P is to stand in for a combination of semantic
> information - a specific context for the resource. If you cite the URI-
> P elsewhere (which is what you are meant to do with it), then you
> *must* be able to get back to that context. That's not optional, or an
> open issue, it's a cold hard fact of using a URI-P.
>
> Does URI-P lead to a ReM, or return a 'proxy document' from which you
> can derive the semantic information? Maybe it can return the AR *if*
> you don't content negotiate for the semantic information. But being
> able to get to that semantic information in a prescribed manner from a
> URI-P is an absolute given, otherwise the URI-P cannot do the job for
> which it is intended.

it should be noted that the whole URI-A/URI-R issue is repeated
at the level of the Proxy. In Atom speak, you have

URI-P = /feed/entry/id
URI-PD = /feed/entry/link[@rel="self"]

It is clear what /feed/entry/link[@rel="self"] should return, but
that does not necessarily constrain what URI-P returns.

Yes, the word "registry" should send shivers down your spine
(it does mine). But grant it now for the sake of argument. If we
had one of (taking from: http://www.cs.odu.edu/~mln/ore/2008-03-19/tuple.html;
other variations possible):

URI-P = http://purl.org/ore/URI-A/*/URI-AR-1
URI-P = http://purl.org/ore/12/URI-AR-1
URI-P = http://purl.org/ore/2lja89

Where they redirected to URI-AR-1 and provided URI-A in
an http link response header, then that would be rather useful.
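A minimal WSGI sketch of such a registry. It redirects an opaque proxy id to URI-AR-1 and names URI-A in a Link header. The registry contents are made up (only the `2lja89` id comes from the examples above), and `rel="aggregation"` is a placeholder relation name that the thread does not fix.

```python
# Hypothetical registry: opaque proxy id -> (URI-AR, URI-A).
REGISTRY = {
    "2lja89": ("http://example.org/page1.jpg", "http://example.org/item/aggregation"),
}

def registry_app(environ, start_response):
    key = environ.get("PATH_INFO", "").rsplit("/", 1)[-1]
    if key not in REGISTRY:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown proxy"]
    aggregated, aggregation = REGISTRY[key]
    start_response("302 Found", [
        ("Location", aggregated),
        # Placeholder relation name; the thread does not define one.
        ("Link", f'<{aggregation}>; rel="aggregation"'),
    ])
    return [b""]
```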

>
> > in Atom speak, we encourage (e.g., "SHOULD") the use of
> > /feed/entry/category to mark an AR as also being an A:
>
> > <category scheme="http://www.openarchives.org/ore/terms/"
> > term="http://www.openarchives.org/ore/terms/Aggregation"
> > label="Aggregation" />
>
> > similarly for RDF.
>
> That might work. But SHOULD is too weak. If clients can't rely on it
> being used [correctly], then they have to ignore it and try hitting
> everything anyway. Even if you said the presence of one such assertion
> is enough to be reasonably confident that the ReM is using it
> consistently and correctly, what do you do if it doesn't appear at
> all? Can you trust that there really are no aggregations being
> aggregated, or does it just mean that the serialization hasn't
> included the assertion because it didn't need to.
>
> But then it's also difficult to validate, and an easy to miss part of
> the specification - so even if the spec mandated it there is a
> reasonable chance that it might be missed and people may generate out
> of spec ReMs.
>
> Oh the dilemmas ;)

yes, that's the problem. It could be a MUST, but it would be
quite costly to validate (i.e., it would require the scenario
you're trying to avoid). Hopefully if one reaches the point
where they're aggregating aggregations, they'll understand
why they should be marked as such ;-)

>
> > > Which brings us back to the basic principles of the ORE specification
> > > - leaning on the existing web infrastructure. Why can't we mandate
> > > having a single URI-R for a given aggregation, that will deliver the
> > > different serializations through content negotiation? Maintain a 1:1
> > > mapping; if you have different URI-Rs, you have different URI-As and
> > > therefore different aggregations - even if the assertions of those
> > > aggregations are the same.
>
> > We had that in an earlier version (between 0.2 and 0.3) and pursued
> > it at quite some length, but there was a general feeling that the
> > resulting documents & diagrams actually came out more
> > complicated than they are now.
>
> Does that documentation still exist? It would be interesting to see
> how it came out. I personally find it hard to believe that it would be
> more complicated than the current situation of god knows how many
> different URI- types, and confusion over what to link to.

I'm not sure what persists -- I wasn't the note taker at those
meetings.

regards,

Michael

Graham Triggs

Apr 10, 2008, 11:40:34 AM
to OAI-ORE
On Apr 10, 3:59 pm, MichaelNelson <RhodeWarri...@gmail.com> wrote:
> I don't know what the RDF experts will say. In Atom it doesn't
> really matter since they'll both be link[@rel="alternate"] and
> it will depend on the presence/absence of the appropriate
> /feed/entry/category.

I don't care what the RDF experts will say ;)

Well, we have to have legal documents. But there are legal documents
and philosophically pure ones. Pragmatism and the practicality of
implementation have to be weighed against it.

For example, IF the main concerns for URI-R = URI-A were purist ones
based on the possibility to assert nonsense triples, is that more
important than the practical issues of people not knowing which URI
they should be linking to?

Graham Triggs

Apr 10, 2008, 11:55:13 AM
to OAI-ORE
On Apr 10, 4:20 pm, MichaelNelson <RhodeWarri...@gmail.com> wrote:
> yes, that's the problem. It could be a MUST, but it would be
> quite costly to validate (i.e., it would require the scenario
> you're trying to avoid). Hopefully if one reaches the point
> where they're aggregating aggregations, they'll understand
> why they should be marked as such ;-)

You don't always have to do a full validation, and you have a certain
amount of choice and forewarning as to when things are validated. (But
then it's also why a custom protocol definition isn't a bad idea -
it's easy to validate and enforce, and just as informative)

If the specification doesn't take a strong stance on most of these
issues, then it will effectively be unusable for clients in moderately
complicated cases.

Strong specifications allow clients to trust the information they are
given - which doesn't just help the clients, it lessens the impact on
the internet and servers too. Of course, it won't mean everything will
work 100% of the time due to people mis-implementing the specs - but
you'll be able to point the blame.

I'm sure most people will understand why aggregated aggregations
should be marked, but unless we beat everyone over the head with it,
those that have done it correctly are going to suffer from everyone
else's incompetence.

G

Mark Diggory

Apr 10, 2008, 3:12:00 PM
to oai...@googlegroups.com

On Apr 10, 2008, at 8:55 AM, Graham Triggs wrote:
>
> On Apr 10, 4:20 pm, MichaelNelson <RhodeWarri...@gmail.com> wrote:
>> yes, that's the problem. It could be a MUST, but it would be
>> quite costly to validate (i.e., it would require the scenario
>> you're trying to avoid). Hopefully if one reaches the point
>> where they're aggregating aggregations, they'll understand
>> why they should be marked as such ;-)
>
> You don't always have to do a full validation, and you have a certain
> amount of choice and forewarning as to when things are validated. (But
> then it's also why a custom protocol definition isn't a bad idea -
> it's easy to validate and enforce, and just as informative)

I'm concerned this is where we deviate in opinion. I'm not really
getting what a custom protocol would be, and why one would need it if
the aggregated aggregations were properly predicated "differently"
from the aggregated resources in RDF, just as they can be in
Atom with the usage of a category (which is really just a refinement
anyway). Why does one need a whole "protocol" when it can be captured
just fine in an RDF predicate? On top of this, it's very analogous to
what is happening in Atom, and the mapping is clear.

> If the specification doesn't take a strong stance on most of these
> issues, then it will effectively be unusable for clients in moderately
> complicated cases.

Really depends on the clients... RDF is a fairly liberal model when
you follow best practices.

> Strong specifications allow clients to trust the information they are
> given - which doesn't just help the clients, it lessens the impact on
> the internet and servers too. Of course, it won't mean everything will
> work 100% of the time due to people mis-implementing the specs - but
> you'll be able to point the blame.

I think you're preaching to the choir here...

> I'm sure most people will understand why aggregated aggregations
> should be marked, <!-- Rant removed by Rant Filter here -->

Sure.

-Mark

Graham Triggs

Apr 10, 2008, 7:57:26 PM
to OAI-ORE
On Apr 10, 8:12 pm, Mark Diggory <mdigg...@MIT.EDU> wrote:
> I concerned this is where we deviate in opinion. I'm not really
> getting what a custom protocol would be and why one would need it if
> the aggregated aggregations were properly predicated "differently"
> than the aggregated resources in RDF, just like they can be in in
> Atom with the usage of a category (which is really just a refinement
> anyways? Why does one need a whole "protocol" when it can be captured
> just fine in an RDF predicate? On top of this, its very analogous to
> what is happening in Atom and the mapping is clear.

OK, this could be because I've been reading too many documents that
describe protocols recently; what I was describing should really be
called a URI scheme. So, I'm suggesting a custom scheme along the lines of:

ore:aggregation;http://server.com/resource

(rather than http://server.com/resource#aggregation)

You can easily derive the URI-R (http://server.com/resource), and,
unlike #aggregation, the scheme states precisely and reliably that the
URI identifies an aggregation.
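The proposal implies a parsing step on the client. A minimal Python sketch of that step, where the `ore:aggregation;` prefix is only the hypothetical scheme floated in this message (nothing in the ORE specification defines it) and `split_aggregation_uri` is an illustrative name:

```python
# Hypothetical prefix from the proposal in this message; no ORE spec defines it.
AGGREGATION_PREFIX = "ore:aggregation;"


def split_aggregation_uri(uri: str) -> tuple:
    """Return (is_aggregation, uri_r) for an identifier.

    With the hypothetical scheme, the URI-R is everything after the prefix.
    Any other identifier is returned unchanged, and nothing is claimed
    about what it identifies.
    """
    if uri.startswith(AGGREGATION_PREFIX):
        return True, uri[len(AGGREGATION_PREFIX):]
    return False, uri


print(split_aggregation_uri("ore:aggregation;http://server.com/resource"))
# (True, 'http://server.com/resource')
print(split_aggregation_uri("http://server.com/resource#aggregation"))
# (False, 'http://server.com/resource#aggregation')
```

The second call shows the contrast being drawn: without a rule in the spec, a `#aggregation` fragment tells the client nothing, so the sketch makes no claim about it.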

And no, you don't *need* to encode this information into the URI when
you can capture it in the predicates - but you do need the
specification to state that you MUST use them.

Robert Sanderson

Apr 11, 2008, 5:19:45 PM
to oai...@googlegroups.com
Okay, to throw in my pennies...

___Proxies.

We need proxies. Proxies must have the semantics of "Resource R in Aggregation A". 

In RDF this means we need a new URI because it's a new resource. It is not Resource R. I hope that the 'follows' use case is convincing: R can appear in multiple aggregations, and in each it follows a different resource, therefore 'follows' must be referring to R in A, therefore we need a way to say (R in A) follows (R2 in A). Only URIs can appear in the subject and object slot of triples, therefore (R in A) needs a URI.
We also don't need new predicates due to this.
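The 'follows' argument can be checked with plain triples. In this Python sketch every URI is invented, `ex:follows` stands in for the 'follows' relationship, and the links from a proxy to its resource and aggregation use the names `ore:proxyFor` and `ore:proxyIn`, which are assumptions here rather than terms quoted from the 0.3 drafts:

```python
# (subject, predicate, object) triples; every URI here is made up.
# ex:follows stands in for the 'follows' relationship; ore:proxyFor and
# ore:proxyIn are assumed names linking a proxy (URI-P) to R and to A.
TRIPLES = {
    ("ex:P-R-in-A1", "ore:proxyFor", "ex:R"),
    ("ex:P-R-in-A1", "ore:proxyIn", "ex:A1"),
    ("ex:P-R2-in-A1", "ore:proxyFor", "ex:R2"),
    ("ex:P-R2-in-A1", "ore:proxyIn", "ex:A1"),
    ("ex:P-R-in-A1", "ex:follows", "ex:P-R2-in-A1"),
    ("ex:P-R-in-A2", "ore:proxyFor", "ex:R"),
    ("ex:P-R-in-A2", "ore:proxyIn", "ex:A2"),
    ("ex:P-R3-in-A2", "ore:proxyFor", "ex:R3"),
    ("ex:P-R3-in-A2", "ore:proxyIn", "ex:A2"),
    ("ex:P-R-in-A2", "ex:follows", "ex:P-R3-in-A2"),
}


def objects(subject, predicate):
    """All objects of triples matching (subject, predicate, ?)."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def what_r_follows(resource, aggregation):
    """Resources that `resource` follows inside `aggregation`, via its proxy there."""
    result = set()
    for (proxy, p, target) in TRIPLES:
        if p == "ore:proxyFor" and target == resource and aggregation in objects(proxy, "ore:proxyIn"):
            for next_proxy in objects(proxy, "ex:follows"):
                result |= objects(next_proxy, "ore:proxyFor")
    return result


print(what_r_follows("ex:R", "ex:A1"))  # {'ex:R2'}
print(what_r_follows("ex:R", "ex:A2"))  # {'ex:R3'}
```

If `ex:follows` were asserted on `ex:R` itself, both statements would hang off the same subject and the graph could no longer say which one holds in which aggregation.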

The more interesting part is what you get if you dereference URI-P.  There has been some discussion but no decision, as far as I know.  One possibility is to return an atom:entry document. Or it could return a chunk of RDF concerning the resource which it is a proxy for. Or, it could return the resource itself.  I think this is another good slot for content negotiation, personally.

We could say that URI-P MUST be dereferencable and MUST return X, Y and/or Z representations.  But that's not going to go down well with the vast majority of people, as it means they have to maintain a lot of new actionable URIs.  RDF doesn't care what the URIs in the triples look like, and as much as we all hate the big URI registry in the sky requirement, that's what RDF seems to want and in this case I'm not sure it's a bad thing.
(gasp!) The reason being that 99% of the time, we don't need URI-P at all.  The vast majority of the time, you would only want to reference the URI for the base resource.  Not the URI for the resource as it appears in a particular aggregation.

So... we don't have a URI-AR.  We could rename URI-P to URI-AR, but I don't see any advantage other than confusion with URI-A and URI-R.

___URI-A vs URI-R

I don't think you actually mean that URI-A can be exactly equal to URI-R.  They identify very different things, as is shown when you look at created timestamp for the objects they identify.

___Resolving and Aggregated Aggregations

Firstly, practically every RDF triple by itself is worthless.  It's only once they become part of a graph with common predicates that they become useful.  So to say that X Y Z means you need to dereference Z to see if it's a member of class A is disingenuous.  Unless you know what X is and what Y means, you'd also have to dereference them and hope that Y resolves to some OWL that describes its semantics.
If you have the statement A aggregates B, where B is itself an aggregation, then unless the resource map is poorly written, you will also have the statement B rdf:type ore:Aggregation. Problem solved.
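That check needs nothing more than matching two triple patterns. A minimal Python sketch, using tuples with prefixed names in place of full URIs (all of them hypothetical):

```python
RDF_TYPE = "rdf:type"
ORE_AGGREGATES = "ore:aggregates"
ORE_AGGREGATION = "ore:Aggregation"


def nested_aggregations(triples, aggregation):
    """Aggregated resources that the same graph also types as ore:Aggregation."""
    aggregated = {o for (s, p, o) in triples if s == aggregation and p == ORE_AGGREGATES}
    typed = {s for (s, p, o) in triples if p == RDF_TYPE and o == ORE_AGGREGATION}
    return aggregated & typed


rem = {
    ("ex:A", ORE_AGGREGATES, "ex:B"),
    ("ex:A", ORE_AGGREGATES, "ex:paper.pdf"),
    ("ex:B", RDF_TYPE, ORE_AGGREGATION),
}
print(nested_aggregations(rem, "ex:A"))  # {'ex:B'}
```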

Secondly, it's possible to do the METS approach. Put everything in one big bag (a single aggregation) and have some more information which describes the internal structure.  There's lots of fun predicates like hasPart, hasFormat, and so on in order to establish more complicated internal structure rather than having to split everything up into millions of itsy-bitsy Aggregations/Resource Maps.  What you can't do is then refer to the structure as a separate entity. If you need to do that, then you need to assign it a URI to identify it.  If it has a URI, it should be dereferencable. And ta da, the split between internal structure (not referencable) and nested Aggregations (referencable).
 
Finally on this topic, I'm not sure I see the need for a specialisation of ore:aggregates.  If we distinguish aggregations as special, what else might we want to distinguish? ore:aggregatesText vs ore:aggregatesImage?  ore:aggregatesMetadata vs ore:aggregatesResource? and so on.  If you need to know what the aggregated resource is, then there should be further descriptive information available in the resource map.

Hope that helps :)

Rob

Mark Diggory

Apr 11, 2008, 6:30:16 PM
to oai...@googlegroups.com
I'll bite!

On Apr 11, 2008, at 2:19 PM, Robert Sanderson wrote:
> Okay, to throw in my pennies...
>
> ___Proxies.
>
> We need proxies. Proxies must have the semantics of "Resource R in
> Aggregation A".
>
> In RDF this means we need a new URI because it's a new resource. It
> is not Resource R. I hope that the 'follows' use case is
> convincing: R can appear in multiple aggregations, and in each it
> follows a different resource, therefore 'follows' must be referring
> to R in A, therefore we need a way to say (R in A) follows (R2 in
> A). Only URIs can appear in the subject and object slot of triples,
> therefore (R in A) needs a URI.
> We also don't need new predicates due to this.
>
> The more interesting part is what you get if you dereference URI-P.
> There has been some discussion but no decision, as far as I

As an aside, thanks for putting your response out there. Please
understand that my approach is always that healthy debate assists in
coming to consensus, and if not, at least in a common understanding of
differing viewpoints ;-)

Having only entered this party in the last month or so, I'm stuck
having to catch up with everything that's gone on up to this date.
Unfortunately, there is no transparent source for this body of
research, as most of it seems to have gone on behind closed doors up
to this point. But those who know me know transparency is my
mantra. So, I'll drop that thread, and I will just lay down my
position on the subject of ORE's model in RDF and leave it at that.

1.) I think a specification defining URIs that are not "opaque" is
overly restrictive and introduces significant overhead in
complexity that developers will have to implement on top of the
existing RDF tool set. It's unclear, when we talk about URI-R,
URI-A, URI-AR and URI-P, whether we are talking about opaque URI strings.
I think the spec should be blatantly and verbosely clear about this,
and it should take the stance that URIs are always opaque outside the
purview of the spec. Is this true? This includes not restricting
whether a URI is "internal structure (not referencable) or nested
Aggregations (referencable)".

2.) If it's not the case that URIs are opaque, I make the argument that
new RDF predicates are cheaper and more easily enforced in the model,
and require much less development effort to work with, than a wholly
custom URI syntax. Placing semantics in the URI makes it necessary to
require tooling to parse and interpret the URI, which makes the
information lost to generic SW/LOD clients and tooling like RDF
triplestores/SPARQL etc... this is very very very very very horribly
bad IMO. It will totally hinder the uptake of ORE and make it very
poorly matched with existing RDF/SW tooling. IMO semantics should
be reserved for the model (RDFS, OWL, Atom, whatever), and for the
record, that does not necessarily include "internal structure (not
referencable) and nested Aggregations (referencable)". Why do you
consider having more predicates a "bad thing" when that's where the
"rubber meets the road" in terms of modeling in RDF? Why create such
a custom and complex layer on top of something that can already model
it when applied correctly?

Finally, the point that we make for having a "distinction by
predicate" for an aggregated resource vs an aggregated aggregation is
strictly to provide for that functionality in the ORE model, not to
introduce the idea that "anyone can introduce their own distinctions"
in the model. And the idea is proposed because it's not always the
case that your resources and other aggregations will be represented
in the same RDF "transmission" or "document" and to allow clients to
be better informed about following links to those "things", it would
be good to have a distinction. And following my above position, that
distinction should be in the model and not hidden in the syntax of a
URI.

Sincerely,
Mark

Robert Sanderson

Apr 11, 2008, 7:25:37 PM
to oai...@googlegroups.com
On Fri, Apr 11, 2008 at 11:30 PM, Mark Diggory <mdig...@mit.edu> wrote:
> ___Resolving and Aggregated Aggregations
>
> Firstly, practically every RDF triple by itself is worthless.  It's
> only once they become part of a graph with common predicates that
> they become useful.
> Secondly, it's possible to do the METS approach. Put everything in
> one big bag (a single aggregation) and have some more information
> which describes the internal structure.  There's lots of fun
> predicates like hasPart, hasFormat, and so on in order to establish
> more complicated internal structure rather than having to split
> everything up into millions of itsy-bitsy Aggregations/Resource
> Maps.  What you can't do is then refer to the structure as a
> separate entity. If you need to do that, then you need to assign it
> a URI to identify it.  If it has a URI, it should be
> dereferencable. And ta da, the split between internal structure
> (not referencable) and nested Aggregations (referencable).

> 1.) I think a specification defining URIs that are not "opaque" is
> overly restrictive and introduces significant overhead in
> complexity that developers will have to implement on top of the
> existing RDF tool set.

Yes. Agreed. All URIs in ORE are opaque.

They became slightly more opaque in 0.3 (if that's possible) by losing the rule that URI-A == URI-R + "#aggregation".

However, I'd like to point out that URIs with fragment identifiers are still opaque, even if there's a rule for how to *construct* them.  You don't need to know the rule and  how to *deconstruct* the URI in order to use it as it is intended to be used.
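The point that a constructed URI stays opaque can be shown in a few lines: only the publisher needs the construction rule. A minimal Python sketch with hypothetical URIs, using the standard library's `urldefrag`:

```python
from urllib.parse import urldefrag

uri_r = "http://server.com/resource"  # hypothetical ReM URI
uri_a = uri_r + "#aggregation"        # the construction rule dropped in 0.3

# A consumer compares URI-A as a whole string, without knowing the rule.
descriptions = {uri_a: "ore:Aggregation"}
print(descriptions["http://server.com/resource#aggregation"])  # ore:Aggregation

# Dereferencing needs no ORE-specific rule either: the fragment is never sent
# to the server (generic RFC 3986 behaviour), so the request targets the document.
print(urldefrag(uri_a).url)  # http://server.com/resource
```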


> Is this true? This includes not restricting
> whether a URI is "internal structure (not referencable) or nested
> Aggregations (referencable)".

That's not a URI is opaque/not opaque distinction, but a URI exists/doesn't exist distinction :)

If you have a flat aggregation with all of the structure described in RDF (say a three level deep tree), then there isn't a URI which identifies the objects at level 2 branch 6. That's what I mean by internal structure not being referencable.

On the other hand, if you had nested aggregations where each branch of the tree is an aggregation with its own Resource Map(s) and Aggregation URIs, then they are referencable by those URIs.
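To make the flat-versus-nested distinction concrete, here is a small Python sketch with two toy graphs for the same tree. Prefixed names stand in for URIs, all hypothetical; `_:` marks a blank node as in Turtle/N-Triples, and `dcterms:hasPart` is the Dublin Core term behind the hasPart mentioned earlier. The helper names are illustrative.

```python
# Predicates that express structure in these toy graphs.
STRUCTURE = {"dcterms:hasPart", "ore:aggregates"}


def structural_nodes(triples):
    """Nodes that have parts or aggregated resources hanging off them."""
    return {s for (s, p, o) in triples if p in STRUCTURE}


def citable(nodes):
    """Nodes with a URI; blank nodes (_:...) cannot be referenced from outside."""
    return {n for n in nodes if not n.startswith("_:")}


# Flat: one aggregation, internal branch described with a blank node.
FLAT = {
    ("ex:A", "ore:aggregates", "ex:img1.tif"),
    ("ex:A", "ore:aggregates", "ex:img2.tif"),
    ("ex:A", "dcterms:hasPart", "_:branch1"),
    ("_:branch1", "dcterms:hasPart", "ex:img1.tif"),
    ("_:branch1", "dcterms:hasPart", "ex:img2.tif"),
}

# Nested: the branch is itself an aggregation with its own URI.
NESTED = {
    ("ex:A", "ore:aggregates", "ex:A-branch1"),
    ("ex:A-branch1", "rdf:type", "ore:Aggregation"),
    ("ex:A-branch1", "ore:aggregates", "ex:img1.tif"),
    ("ex:A-branch1", "ore:aggregates", "ex:img2.tif"),
}

print(sorted(citable(structural_nodes(FLAT))))    # ['ex:A']
print(sorted(citable(structural_nodes(NESTED))))  # ['ex:A', 'ex:A-branch1']
```

In practice `ex:A-branch1` would be described by its own Resource Map; both graphs are kept in single sets here for brevity.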

I should write some examples, but that might have to wait for the cookbook wiki.  On that front, I hope that you (all) will contribute to such a wiki!

> IMO semantics should
> be reserved for the model (RDFS, OWL, Atom, whatever), and for the
> record, that does not necessarily include "internal structure (not
> referencable) and nested Aggregations (referencable)". Why do you
> consider having more predicates a "bad thing" when that's where the
> "rubber meets the road" in terms of modeling in RDF?

Absolutely.  The only time I consider more predicates a bad thing is when they duplicate existing predicates' semantics.
 
> Finally, the point that we make for having a "distinction by
> predicate" for an aggregated resource vs an aggregated aggregation is
> strictly to provide for that functionality in the ORE model, not to
> introduce the idea that "anyone can introduce their own distinctions"
> in the model. And the idea is proposed because it's not always the
> case that your resources and other aggregations will be represented
> in the same RDF "transmission" or "document" and to allow clients to
> be better informed about following links to those "things", it would
> be good to have a distinction. And following my above position, that
> distinction should be in the model and not hidden in the syntax of a
> URI.

I'm still not convinced I'm afraid.  Yes it's possible to have access to the information that A aggregates AR1, but not what sort of thing AR1 is.  However that's true for ALL predicates, not just aggregates, and I'm sure you're not proposing new predicates for all combinations of predicates, types and roles.

Here's a second example:

I have some external metadata about my aggregated resources.  I put that into the aggregation using the regular ore:aggregates.  For example NISO MIX descriptions of a collection of images.  Here the MIX descriptions describe other aggregated resources.

My colleague then comes along and creates a second aggregation, this time of MIX descriptions as the 'primary' aggregated resources, and some file level metadata descriptions of those descriptions are put into his aggregation.

In the first case it would seem useful to distinguish between Objects and Metadata-About-Objects.  Especially as Metadata can be Objects themselves, as shown in the second aggregation.

Or perhaps the metadata is metadata about the aggregation, not the aggregated resources. Or... (and so forth)

My point is that for as many different examples as we can come up with, there could be that many predicates which are sub-properties of ore:aggregates.

So it seems to me that either:

1)  We should always use ore:aggregates, regardless of the type or role of the object. This seems simpler to me!  If the type or role is important, then add additional triples.

2)  We should come up with a list of common scenarios in which the aggregated resource is somehow special within the aggregation by role (eg metadata) or type (eg aggregation) and provide a short list, accompanied by motivating examples and use cases, of sub-properties for ore:aggregates.  This seems less simple for no significant advantage, compared to having a type or role ontology, as now there's a choice of aggregation predicates. It also fails to map into Atom, where the ore:aggregates predicate is implicit in the feed/entry syntax rather than explicit.
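The practical cost of option 2 shows up in a client that asks "what does A aggregate?". In this Python sketch, `ex:role`, `ex:Metadata` and `ex:aggregatesMetadata` are hypothetical terms, and `SUB_PROPERTIES` stands in for what an `rdfs:subPropertyOf` declaration would tell a reasoner:

```python
def aggregated(triples, aggregation, predicates):
    """Objects linked from `aggregation` by any of the given predicates."""
    return {o for (s, p, o) in triples if s == aggregation and p in predicates}


# Option 1: always ore:aggregates; the role is stated in an extra triple.
OPTION_1 = {
    ("ex:A", "ore:aggregates", "ex:img.tif"),
    ("ex:A", "ore:aggregates", "ex:img-mix.xml"),
    ("ex:img-mix.xml", "ex:role", "ex:Metadata"),
}

# Option 2: a hypothetical sub-property of ore:aggregates carries the role.
OPTION_2 = {
    ("ex:A", "ore:aggregates", "ex:img.tif"),
    ("ex:A", "ex:aggregatesMetadata", "ex:img-mix.xml"),
}
SUB_PROPERTIES = {"ex:aggregatesMetadata"}

print(sorted(aggregated(OPTION_1, "ex:A", {"ore:aggregates"})))
print(sorted(aggregated(OPTION_2, "ex:A", {"ore:aggregates"})))  # misses ex:img-mix.xml
print(sorted(aggregated(OPTION_2, "ex:A", {"ore:aggregates"} | SUB_PROPERTIES)))
```

Under option 1 one predicate finds everything; under option 2 a client that does not know the sub-property list undercounts the aggregation.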


Rob

Graham Triggs

Apr 14, 2008, 9:56:34 AM
to OAI-ORE
On Apr 11, 11:30 pm, Mark Diggory <mdigg...@MIT.EDU> wrote:
> I'll bite!
> 1.) I think a specification defining URI that are not "opaque" is
> overly restrictive and introduces a significant overhead in
> complexity that developers will have to implement on top of the
> existing RDF tool set.

I'm pretty sure you aren't arguing for 'true' opaque URIs ;)

Besides, any overhead from processing an invented URI scheme is
entirely dependent on what the details of such a scheme would be.

> 2.) If its not the case that URI are opaque, I make the argument that
> new RDF predicates are cheaper and more easily enforced in the model
> and require much less development effort to work with than a wholly
> custom URI syntax.

New predicates add verbosity to the document (increased transfer
time), and more interactions with the XML serialization toolkits. At
this point, you would have to define your concept of cheaper ;)

Also, it's only easily enforced in the model if the model allows it to
be enforceable. This is the problem with the current specification -
the ore:isAggregate (or whatever it is) predicate is not a MUST, which
means it can't be enforced, and if you are presented with an RDF
document that does not contain any of these predicates, how do you
determine if they were simply omitted, or if the aggregation is truly
not aggregating other aggregations?

For this to be practical in the real world for anything more than a
niche set of use cases for a niche community, these kinds of details
need to be:

a) enforceable
b) enforced
c) clearly and obviously described in the specification, not tucked
away where it is easily missed



> Placing semantics in the URI makes it necessary to
> require tooling to parse and interpret the URI, which makes the
> information lost to generic SW/LOD clients and tooling like RDF
> triplestores/SPARQL etc...

If a generic client encountered the URI for an aggregation, would it
know how to dereference the URI to obtain a ReM, and continue processing
that ReM?

Besides, these clients are bound to potentially encounter URIs that
aren't regular http/https/ftp, etc. schemes (as there is no RDF
specification that limits URIs to just those), so wouldn't
they have some way of coping with arbitrary URI schemes?

> And following my above position, that
> distinction should be in the model and not hidden in the syntax of a
> URI.

Or it could be both. One other argument to consider is that URI-As,
etc. can be cited... you may not be seeing those URIs only within the
confines of a semantic document describing aggregations (or semantic
links to those documents from resources). So you could argue that it
should be clear from the URI that you are talking about an
aggregation, or you can argue that the URI should be resolvable in a
web browser and redirect / display human readable information about
that aggregation. Either way, we would benefit from strong
recommendations as to how this should be approached, rather than
hoping it works itself out.

G

Robert Sanderson

Apr 15, 2008, 12:50:40 PM
to oai...@googlegroups.com

Okay, I think I totally missed the real point of this!

In the model, we have nested aggregations. In RDF it's pretty straightforward:
AggrParent ore:aggregates AggrChild
AggrChild rdf:type ore:Aggregation

In Atom, when we aggregate resources with a mimetype, no problem:

<atom:link rel="alternate" type="application/pdf" href="bla bla bla"/>

The type attribute isn't rdf:type, but dc:format. Oh dear, so where do we put ore:Aggregation?

The answer at the moment, I'm afraid, is that it has to go into an <rdf:Description rdf:about="bla bla bla"> block, as the *Aggregation* doesn't have a MIME type and you can't tell a priori which resource map serialisation of the aggregation you're going to get. Foo on you, Cool URIs.
Even if you could tell, it wouldn't be correct to say that an Aggregation has dc:format application/atom+xml, as it's the resource map that has that property.

Rob

Robert Sanderson

Apr 15, 2008, 12:56:37 PM
to oai...@googlegroups.com

No, this is the mistake I made in my slides!  It's atom:category!

<atom:entry>
  <atom:category scheme="http://www.openarchives.org/ore/terms/"
      term="http://www.openarchives.org/ore/terms/Aggregation"
      label="Aggregation"/>

Sorry, my mistake!  But a relevant one :)
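A client reading such an entry could test for that category with the standard library's ElementTree. In this Python sketch the Atom namespace is the real one from RFC 4287, the scheme and term values are the ones in the snippet above, and the `<id>` value and helper name are hypothetical:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ORE_TERMS = "http://www.openarchives.org/ore/terms/"

# Minimal, well-formed Atom entry; the <id> value is hypothetical.
ENTRY = """
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://server.com/aggregation/child</id>
  <category scheme="http://www.openarchives.org/ore/terms/"
            term="http://www.openarchives.org/ore/terms/Aggregation"
            label="Aggregation"/>
</entry>
"""


def entry_is_aggregation(xml_text):
    """True if the entry carries the ORE Aggregation category."""
    entry = ET.fromstring(xml_text.strip())
    for category in entry.findall(f"{{{ATOM_NS}}}category"):
        if (category.get("scheme") == ORE_TERMS
                and category.get("term") == ORE_TERMS + "Aggregation"):
            return True
    return False


print(entry_is_aggregation(ENTRY))  # True
```

The match is on `scheme` and `term` only; `label` is a human-readable hint in Atom and is deliberately ignored.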

R


Mark Diggory

Apr 15, 2008, 1:06:09 PM
to oai...@googlegroups.com

On Apr 14, 2008, at 6:56 AM, Graham Triggs wrote:
>
> On Apr 11, 11:30 pm, Mark Diggory <mdigg...@MIT.EDU> wrote:
>> I'll bite!
>> 1.) I think a specification defining URI that are not "opaque" is
>> overly restrictive and introduces a significant overhead in
>> complexity that developers will have to implement on top of the
>> existing RDF tool set.
>
> I'm pretty sure you aren't arguing for 'true' opaque URIs ;)
>

Correct. I simply mean the spec does not define what the structure of
a URI should be, and leaves that open to the implementer to decide.
However, there may be recommendations or best practices on the usage
of such URIs, and these should be clearly identified as such if
used within "the spec".

> Besides, any overhead from processing an invented URI scheme is
> entirely dependent on what the details of such a scheme would be.

Just the word "invented" is enough to ruffle my feathers on this
subject. Your example, "ore:aggregation;http://server.com/resource",
alone presents:

1.) The requirement for the specification to define the internal
structure of a URI.

2.) And, as such, the development of tooling to support parsing that
representation, whereas

<#myAggregation> ore:aggregatesAggregation <http://server.com/some/url>
<#myAggregation> ore:aggregates <http://server.com/some/other/url>

is ultimately transparent and requires no tooling beyond an RDF
parser and an HTTP client to acquire. Besides this, if in Atom
something like a <category> is used to identify whether an ore:aggregates
points at a resource or an aggregation (as captured in the Atom
schema and not some "invented URI"), why impose on the RDF community
that some invented URI be used over a properly defined, schema-enforced,
validatable approach?

>
>> 2.) If its not the case that URI are opaque, I make the argument that
>> new RDF predicates are cheaper and more easily enforced in the model
>> and require much less development effort to work with than a wholly
>> custom URI syntax.
>
> New predicates add verbosity to the document (increased transfer
> time), and more interactions with the XML serialization toolkits. At
> this point, you would have to define your concept of cheaper ;)

That's such a load of bunk... is custom code somehow guaranteed to be
more efficient than a standard parser?

> Also, it's only easily enforced in the model if the model allows it to
> be enforceable. This is the problem with the current specification -
> the ore:isAggregate (or whatever it is) predicate is not a MUST, which
> means it can't be enforced, and if you are presented with an RDF
> document that does not contain any of these predicates, how do you
> determine if they were simply omitted, or if the aggregation is truly
> not aggregating other aggregations?

Not sure what this has to do with URI, seems more schema/ontology
related.

>
> For this to be practical in the real world for anything more than a
> niche set of use cases for a niche community, this kind of details
> needs to be:
>
> a) enforceable
> b) enforced
> c) clearly and obviously described in the specification, not tucked
> away where it is easily missed

[warning... stereotyping ahead]

This is an argument on a continuum of evolutionism vs. creationism...
Evolutionists say: establish the smallest possible set of laws to
enforce on a system and see what emergent behavior arises... while
Creationists say: define the entire mechanism, top to bottom,
written in stone, and damn all who do not comply. Seems to me, folks
who come from the RDF world subscribe to the former and those from the
XML Schema world tend to the latter; both could stand to learn a little
from each other.

>> Placing semantics in the URI makes it necessary to
>> require tooling to parse and interpret the URI, which makes the
>> information lost to generic SW/LOD clients and tooling like RDF
>> triplestores/SPARQL etc...
>
> If a generic client encountered the URI for an aggregation, would it
> know how to dereference the URI to obtain a ReM, and continue processing
> that ReM?

If it's a #name, that is something outside the spec that suggests an
identifier referenced within the document. If it's a relative or
absolute URI, it suggests that it's resolvable separately from the
Aggregation. And in RDF it doesn't need to be either: it could be a
blank node or an ID reference that is used to locate the ReM wholly
independent of any URI-based reference, which is what Robert
Sanderson is talking about.
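The #name versus relative/absolute distinction can be made mechanically by resolving a reference against the ReM's own URI. A Python sketch using the standard library's `urljoin` and `urldefrag`; the ReM location and references are hypothetical, and blank nodes fall outside it since they have no URI at all:

```python
from urllib.parse import urldefrag, urljoin


def classify_reference(ref, rem_uri):
    """Resolve ref against the ReM's URI and say whether it points into the ReM itself."""
    absolute = urljoin(rem_uri, ref)
    target_document, fragment = urldefrag(absolute)
    if fragment and target_document == urldefrag(rem_uri).url:
        return "same-document", absolute
    return "separately resolvable", absolute


REM = "http://server.com/rem.rdf"  # hypothetical ReM location
print(classify_reference("#myAggregation", REM))
print(classify_reference("child/rem.rdf", REM))
print(classify_reference("http://other.org/agg", REM))
```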


> Besides, these clients are bound to potentially encounter URIs that
> aren't regular http/https/ftp, etc. schemes (as there is no RDF
> specification that limits URIs to just those), so wouldn't
> they have some way of coping with arbitrary URI schemes?

No, not necessarily... Why overcomplicate things?

>
>> And following my above position, that
>> distinction should be in the model and not hidden in the syntax of a
>> URI.
>
> Or it could be both. One other argument to consider is that URI-As,
> etc. can be cited... you may not be seeing those URIs only within the
> confines of a semantic document describing aggregations (or semantic
> links to those documents from resources). So you could argue that it
> should be clear from the URI that you are talking about an
> aggregation, or you can argue that the URI should be resolvable in a
> web browser and redirect / display human readable information about
> that aggregation. Either way, we would benefit from strong
> recommendations as to how this should be approached, rather than
> hoping it works itself out.

I wouldn't rely on the URI's structure. I've never seen an RDFS/OWL
ontology that explicitly says a URI representing this object has
to adhere to "X structure"... The closest thing I can think of is
DCMI Encoding Schemes. But even then, I'm not convinced I know of any
validation engine outside of those for W3C XML Schema that might be
capable of validating such structure. And even then, I think it'd be
a real pain to get right and would be much easier to do in the
"model" rather than its "referencing mechanism".

In programming, do you code a variable that may be dereferenced by
the language's standard dereferencing mechanism, or do you make up a
wholly new and different dereferencing mechanism for that language
because you think the byte addresses being dereferenced should
use N-1 bits instead of N bits? I don't think so. Same here: why go
outside the scope of the language to force developers to do
unnecessary work to manufacture semantically encoded URIs when they
could just use a predicate?

Ultimately, I finally got what Rob is saying about

AggrParent ore:aggregates AggrChild
AggrChild rdf:type ore:Aggregation

And I rather agree now: just define another statement about the
aggregation in the instance. It may not be "everything" about that
aggregation; it is simply "stuff about this aggregation in relation
to the current Aggregation", i.e. just add a third level to your RDF
with the statements necessary to identify that something is an
aggregation. This is something I am now adding to my prototype.

-Mark

Mark Diggory

Apr 15, 2008, 1:09:55 PM
to oai...@googlegroups.com

On Apr 15, 2008, at 9:50 AM, Robert Sanderson wrote:
>
> Okay, I think I totally missed the real point of this!
>
> In the model, we have nested aggregations. In RDF it's pretty
> straightforward:
> AggrParent ore:aggregates AggrChild
> AggrChild rdf:type ore:Aggregation
>

Yep, I now grok it and agree.

> In Atom, when we aggregate resources with a mimetype, no problem:
>
> <atom:link rel="alternate" type="application/pdf" href="bla bla bla"/>
>
> The type attribute isn't rdf:type, but dc:format. Oh dear, so where
> do we put ore:Aggregation?
>
> The answer at the moment, I'm afraid, is that it has to go into an
> <rdf:Description rdf:about="bla bla bla"> block, as the *Aggregation*
> doesn't have a MIME type and you can't tell a priori which resource
> map serialisation of the aggregation you're going to get. Foo on
> you, Cool URIs.
> Even if you could tell, it wouldn't be correct to say that an
> Aggregation has dc:format application/atom+xml, as it's the resource
> map that has that property.

I assume you wouldn't know the format until the URI was resolved
(content negotiation or plain ole HTTP response) and you had the
header in hand. That's not so bad if there's other detail in the RDF
suggesting the resource is another aggregation.
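In code, "having the header in hand" means sending an Accept header and reading Content-Type from the response. A Python sketch using `urllib.request`; the request URI and the preference order in `ACCEPT` are assumptions, while `application/atom+xml` and `application/rdf+xml` are the registered media types:

```python
import urllib.request

# Assumed client preference: Atom first, then RDF/XML.
ACCEPT = "application/atom+xml, application/rdf+xml;q=0.9"


def rem_request(uri):
    """Build (but do not send) a GET request that negotiates a ReM serialisation."""
    return urllib.request.Request(uri, headers={"Accept": ACCEPT})


def media_type(content_type):
    """Drop parameters such as charset from a Content-Type header value."""
    return content_type.split(";", 1)[0].strip().lower()


req = rem_request("http://server.com/aggregation/child")  # hypothetical URI
print(req.get_header("Accept"))
print(media_type("application/atom+xml; charset=utf-8"))  # application/atom+xml
```

Sending it would be `urllib.request.urlopen(req)`, after which `media_type(response.headers.get("Content-Type", ""))` gives the format. That value describes the ReM that came back, not the Aggregation, which matches the earlier point about dc:format.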

-Mark

Pete Johnston

Apr 15, 2008, 1:22:02 PM
to oai...@googlegroups.com
Just jumping in to one point in this discussion...

(BTW, I strongly agree with Mark's comments about the introduction of a
new URI scheme. It has huge cost implications for every potential
consumer of the data, and I don't really understand why it would be
required.)

...but cutting to this point, which I admit has been worrying me a bit:

Graham said:

> > Also, it's only easily enforced in the model if the model allows it to
> > be enforceable. This is the problem with the current specification -
> > the ore:isAggregate (or whatever it is) predicate is not a MUST, which
> > means it can't be enforced, and if you are presented with an RDF
> > document that does not contain any of these predicates, how do you
> > determine if they were simply omitted, or if the aggregation is truly
> > not aggregating other aggregations?

Mark said:

> Not sure what this has to do with URI, seems more
> schema/ontology related.

Yes, I think so too.

If I understood correctly, Graham wants a way to be certain that given
only a ReM graph, an aggregated resource referred to in that graph is
_not_ itself an Aggregation.

But I'm not sure there's really any way, given (a) the "open world"/"you
never know everything about X" nature of RDF and (b) a ReM containing an
ore:aggregates triple with object URI some:resource, of knowing for sure
that that aggregated resource is not itself an Aggregation.

I can add information to my ReM to indicate that the thing identified by
some:resource _is_ an Aggregation, whether that is in the form of an
additional explicit rdf:type triple or using an alternative to
ore:aggregates with a different range so that the type can be inferred.

But I'm not sure I see how I can rule out the option that, independently
of my ReM and unbeknownst to me, the owner of some:resource describes
some:resource as an Aggregation.
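Under the open-world reading, a single ReM can answer "yes" but never "no". A Python sketch of that asymmetry, with hypothetical triples and helper names; turning UNKNOWN into a firm "no" would need a rule outside RDF itself, such as the MUST Graham asks for:

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    UNKNOWN = "unknown"  # open world: a missing triple is not a "no"


def is_aggregation(triples, node):
    """What a single ReM graph can say about whether node is an ore:Aggregation."""
    if (node, "rdf:type", "ore:Aggregation") in triples:
        return Answer.YES
    return Answer.UNKNOWN


rem = {
    ("ex:A", "ore:aggregates", "some:resource"),
    ("ex:A", "ore:aggregates", "ex:B"),
    ("ex:B", "rdf:type", "ore:Aggregation"),
}
print(is_aggregation(rem, "ex:B"))           # Answer.YES
print(is_aggregation(rem, "some:resource"))  # Answer.UNKNOWN
```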

Or have I misinterpreted the question?

Pete
---
Pete Johnston
Technical Researcher, Eduserv Foundation
Web: http://www.eduserv.org.uk/foundation/people/petejohnston/
Weblog: http://efoundations.typepad.com/efoundations/
Email: pete.j...@eduserv.org.uk
Tel: +44 (0)1225 474323

Graham Triggs

Apr 16, 2008, 5:16:27 AM
to OAI-ORE
On Apr 15, 6:06 pm, Mark Diggory <mdigg...@MIT.EDU> wrote:
> > Besides, any overhead from processing an invented URI scheme is
> > entirely dependent on what the details of such a scheme would be.
>
> Just the word "invented" is enough to ruffle my feathers on this
> subject. Your example: "ore:aggregation;http://server.com/resource"
> alone presents:
>
> 1.) The requirement for the specification to define the internal
> structure of a URI.
>
> 2.) And as such that development of tooling to support parsing that
> representation when
>
> <#myAggregation> ore:aggregatesAggregation <http://server.com/some/url>
> <#myAggregation> ore:aggregates <http://server.com/some/other/url>
>
> Is ultimately transparent and requires no tooling beyond an RDF
> parser and an HTTP client to acquire. Besides this, if in Atom,
> something like a <category> is used to identify if an ore:aggregates
> points at a resource or an aggregation (as captured in the Atom
> schema and not some "invented URI" why impose on the RDF community
> that some invented URI be used over a properly defined, schema
> enforced validate-able approach?

It still requires tooling within the client to recognise the RDF
predicates, action the URI-A to discover the URI-R, and then pull the
ReM.

The "invented URI" is just an idea that has merits as well as
drawbacks. So does having semantic predicates.

But what we don't have right now is any guarantees about how this will
work, and that's necessary for this to be scalable.

> >> 2.) If its not the case that URI are opaque, I make the argument that
> >> new RDF predicates are cheaper and more easily enforced in the model
> >> and require much less development effort to work with than a wholly
> >> custom URI syntax.
>
> > New predicates add verbosity to the document (increased transfer
> > time), and more interactions with the XML serialization toolkits. At
> > this point, you would have to define your concept of cheaper ;)
>
> That's such a load of bunk... is custom code somehow guaranteed to be
> more efficient than a standard parser?

I never said that. But a standard parser isn't guaranteed to be more
efficient than custom code either.

> > For this to be practical in the real world for anything more than a
> > niche set of use cases for a niche community, these kinds of details
> > need to be:
>
> > a) enforceable
> > b) enforced
> > c) clearly and obviously described in the specification, not tucked
> > away where it is easily missed
>
> [warning... stereotyping ahead]
>
> This is an argument on a continuum of evolutionism vs. creationism...
> Evolutionists say: establish the smallest possible set of laws to
> enforce on a system and see what emergent behavior arises... while
> Creationists say: define the entire mechanism, top to bottom,
> written in stone, and damn all who do not comply. It seems to me that
> folks who come from the RDF world subscribe to the former and those
> from the XML Schema world tend to the latter; both could stand to
> learn a little from each other.

I'm not trying to stop any possible evolution of the spec, but just as
there are things it has to dictate for the spec to even say anything
useful, there are other things that need to be nailed down to make it
work in practice.

For example, I can't publish ReMs if my server is going to be hosed by
aggressive clients that are simply trying to deal with the fact that
the specification states that ReMs can only be one level deep and
aggregations can be nested, but doesn't guarantee any way to tell
which aggregated resources are themselves aggregations.
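One way a well-behaved client might limit the load when following nested aggregations is a breadth-first walk with a visited set and a depth cap. The Python sketch below assumes a dict mapping each aggregation to the nested aggregations its ReM lists, standing in for fetching and parsing each ReM; `crawl_aggregations` and `max_depth` are hypothetical names, not anything ORE defines.

```python
from collections import deque


def crawl_aggregations(root: str,
                       children: dict[str, list[str]],
                       max_depth: int = 2) -> list[str]:
    """Breadth-first walk of nested aggregations, fetching each ReM at most once.

    `children` maps an aggregation URI to the aggregation URIs its ReM lists
    (a stand-in for HTTP GET plus parsing). Returns the URIs "fetched", in order.
    Nothing more than `max_depth` levels below `root` is requested, and the
    `seen` set stops cycles (A aggregates B, B aggregates A) from looping.
    """
    fetched = []
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        uri, depth = queue.popleft()
        fetched.append(uri)  # one request per aggregation
        if depth == max_depth:
            continue  # depth cap reached: do not expand this ReM's children
        for child in children.get(uri, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return fetched


if __name__ == "__main__":
    nested = {"A": ["B", "C"], "B": ["A", "D"], "C": ["D"], "D": ["E"]}
    print(crawl_aggregations("A", nested, max_depth=2))
```

Without some way to tell aggregations from plain resources, a client like this would have to dereference every aggregated resource to find out, which is where the extra load comes from.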

> > Or it could be both. One other argument to consider is that URI-As,
> > etc. can be cited... you may not be seeing those URIs only within the
> > confines of a semantic document describing aggregations (or semantic
> > links to those documents from resources). So you could argue that it
> > should be clear from the URI that you are talking about an
> > aggregation, or you can argue that the URI should be resolvable in a
> > web browser and redirect / display human readable information about
> > that aggregation. Either way, we would benefit from strong
> > recommendations as to how this should be approached, rather than
> > hoping it works itself out.
>
> I wouldn't rely on the URI's structure; I've never seen an RDFS/OWL
> ontology that explicitly says a URI representing this object has
> to adhere to "X structure"... The closest thing I can think of is
> DCMI Encoding schemes. But even then, I'm not convinced I know of any
> validation engine outside of those for W3C XML Schema that might be
> capable of validating such structure. And even then, I think it'd be
> a real pain to get right and would be much easier to do in the
> "model" rather than its "referencing mechanism".

No, but ORE goes slightly beyond mere ontologies. It defines new
resources - Aggregations - and it's entirely within scope of defining
what those resources are to say how they should be identified (if you
choose to).

G

Phil Barker

Apr 16, 2008, 8:17:28 AM
to oai...@googlegroups.com
Hello all, I'm really not sure whether this is more relevant to this
thread or the one on use cases, but it relates to proxies and sequencing
resources, which have come up here.

My background is in eLearning. I know that this isn't high on the list
of target uses for ORE, but it would be better for all if we in eLearning
could adopt a specification being used for research outputs rather than
invent our own. For teaching and learning in general we're used to
treating resources as aggregations rather than indivisible atomic
units, so much so that rather than talk about aggregations having the
potential to "break open the package", we tend to focus more on how we
can package the disparate resources that are being brought together to
teach a concept.

I think it is fair to say that if ORE is to be used in teaching and
learning then re-ordering/sequencing resources cannot be described as an
"edge case". It's pretty typical for a teacher to create an aggregation
of resources and say something along the lines of: read resource A
first; then look at B,C & D (in any order) which discuss A; do
exercise/activity/assignment E (and perhaps they would go on to say: if
you do well in E then you're through to the next topic, if you don't do
well in E then here are some other resources to look at). Of course
another teacher or the student reusing this aggregation of resources
might want to modify the sequence.
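The sequence described above is a partial order: some steps are fixed, while resources within a step can be taken in any order, and the step after the exercise branches on the outcome. A minimal Python sketch, assuming a hypothetical representation as a list of steps where each step is a set of resources; the names are illustrative and do not come from ORE or any IMS specification.

```python
# A sequence is a list of steps; resources inside one step may be visited in any order.
Steps = list[frozenset[str]]

# Hypothetical encoding of the example: A first, then B, C, D in any order, then E.
TEACHER_SEQUENCE: Steps = [
    frozenset({"A"}),
    frozenset({"B", "C", "D"}),
    frozenset({"E"}),
]


def follows_sequence(path: list[str], sequence: Steps) -> bool:
    """True if `path` visits every step's resources, in step order, with no extras."""
    i = 0
    for step in sequence:
        chunk = path[i:i + len(step)]
        # Comparing as sets allows any order within a step; a repeated resource
        # shrinks the set, so duplicates are rejected too.
        if len(chunk) != len(step) or set(chunk) != step:
            return False
        i += len(step)
    return i == len(path)


def after_exercise(passed: bool, next_topic: str, remedial: list[str]) -> list[str]:
    """Branch after E: move on if the learner did well, otherwise offer remedial resources."""
    return [next_topic] if passed else list(remedial)


if __name__ == "__main__":
    print(follows_sequence(["A", "C", "B", "D", "E"], TEACHER_SEQUENCE))
    print(follows_sequence(["B", "A", "C", "D", "E"], TEACHER_SEQUENCE))
```

A reusing teacher or student could then supply a different `Steps` list for the same aggregated resources, without touching the resources themselves.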

At first, when I heard [I think it was] Carl talk about the metadata for
the resource map giving information about who had created the resource
map and made the assertions therein, I thought that was enough to give a
provenance to the assertions made about the aggregated resources. And I
thought that was enough to allow assertions to be made that couldn't be
trusted to be generally true of the resource outside the
aggregation. After all, if it's only me who says it's an aggregation in
the first place, and if I'm not the creator of the aggregated resources,
what would I know? So I found the argument that we need proxies surprising.

Anyway, it seems to me that if proxies are the solution then perhaps the
problem is too difficult to solve this way. We have ways of specifying
the order of resources (the organization section in an IMS Content
Package manifest, IMS Simple Sequencing, and maybe there's something
from struct maps in METS?), so why not say that ordering is out of scope
for ORE, but that a file in the aggregation provides information on the
desired ordering of resources for this resource map? It would be useful to
have a relationship type along the lines of isSequencedBy, and perhaps a
vocabulary for the types of files that specify the sequence, though I would
be happy enough for those to be left as potential extensions.
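To show how little a client would need for that suggestion, here is a minimal Python sketch that looks up the sequencing file(s) for an aggregation in a ReM's triples. The `ex:isSequencedBy` and `ex:sequenceFormat` predicates and the format labels are hypothetical stand-ins for the relationship type and vocabulary proposed above; ORE defines none of them, and the example URIs are made up.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical terms for the suggested relationship and format vocabulary.
IS_SEQUENCED_BY = "ex:isSequencedBy"
SEQUENCE_FORMAT = "ex:sequenceFormat"
KNOWN_FORMATS = {
    "ims-cp-organization",    # organization section of an IMS Content Package manifest
    "ims-simple-sequencing",  # IMS Simple Sequencing
    "mets-structmap",         # METS structural map
}


def sequencing_files(triples: list[Triple],
                     aggregation: str) -> list[tuple[str, Optional[str]]]:
    """Return (file URI, format) for each file declared to sequence `aggregation`.

    The format is None when the ReM declares no format for that file, or
    declares a term outside KNOWN_FORMATS.
    """
    declared = {s: o for (s, p, o) in triples if p == SEQUENCE_FORMAT}
    found = []
    for s, p, o in triples:
        if s == aggregation and p == IS_SEQUENCED_BY:
            fmt = declared.get(o)
            found.append((o, fmt if fmt in KNOWN_FORMATS else None))
    return found


EXAMPLE_TRIPLES: list[Triple] = [
    ("http://example.org/agg", "ore:aggregates", "http://example.org/imsmanifest.xml"),
    ("http://example.org/agg", IS_SEQUENCED_BY, "http://example.org/imsmanifest.xml"),
    ("http://example.org/imsmanifest.xml", SEQUENCE_FORMAT, "ims-cp-organization"),
]

if __name__ == "__main__":
    print(sequencing_files(EXAMPLE_TRIPLES, "http://example.org/agg"))
```

Interpreting the sequencing file itself stays with whatever tooling already understands that format, which keeps ordering out of ORE's scope as suggested.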

Regards, Phil.


--
Phil Barker Learning Technology Adviser
ICBL, School of Mathematical and Computer Sciences
Mountbatten Building, Heriot-Watt University,
Edinburgh, EH14 4AS
Tel: 0131 451 3278 Fax: 0131 451 3327
Web: http://www.icbl.hw.ac.uk/~philb/


Sean Gillies

Apr 16, 2008, 1:52:10 PM
to oai...@googlegroups.com

Hosed by aggressive clients? I wonder if the solution to that isn't to
get client developers to abide by instructions in a good old robots.txt
file. There is some precedent: the Swooglebot will attempt to fetch and
read robots.txt.

http://swoogle.umbc.edu/index.php?option=com_swoogle_manual&manual=swooglebot
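As a sketch of what honouring robots.txt could look like on the client side, Python's standard `urllib.robotparser` already handles both the allow/disallow rules and a `Crawl-delay` directive. The robots.txt content, the `ReMHarvester` user agent and the repository URLs below are hypothetical, and the helper names are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a repository might publish to throttle ReM harvesters.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text already in hand (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_to_fetch(parser: RobotFileParser, agent: str, url: str) -> tuple[bool, int]:
    """Return (allowed, seconds to wait before the request) for `agent` and `url`."""
    allowed = parser.can_fetch(agent, url)
    delay = parser.crawl_delay(agent) or 0  # None when no Crawl-delay applies
    return allowed, delay


if __name__ == "__main__":
    rules = make_parser(ROBOTS_TXT)
    print(polite_to_fetch(rules, "ReMHarvester", "http://repo.example.org/aggregations/1/rem.xml"))
    print(polite_to_fetch(rules, "ReMHarvester", "http://repo.example.org/private/rem.xml"))
```

This only helps with clients that choose to comply; it does not remove the need for the spec to say how aggregations can be identified.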

Sean
