Fwd: ORE JSON-LD context


Herbert Van de Sompel

Dec 17, 2014, 7:21:07 PM
to oai...@googlegroups.com
hi all,

I am forwarding the conversation below, which concerns the JSON-LD serialization of OAI-ORE. It started privately but is of interest to this list.

Greetings

Herbert

Forwarded conversation
Subject: ORE JSON-LD context
------------------------

From: Robert Sanderson <azar...@gmail.com>
Date: Mon, Dec 15, 2014 at 1:58 PM
To: Stian Soiland-Reyes <soilan...@cs.manchester.ac.uk>, Simeon Warner <simeon...@cornell.edu>, Herbert van de Sompel <hvd...@gmail.com>



Hi all,

We're looking to implement ORE as a generic aggregation and list construction in Hydra, on top of Fedora4, which is an implementation of LDP.

In looking at the ORE in JSON-LD spec, I have a concern with the proxies context for our use cases.  We want to use them primarily for ordering the resources, when needed. This was the selling point of ORE rather than the Ordered List Ontology, rdf:List, the Collections Ontology and so forth, where the proxy node equivalent is always required, and hence you can't use the same construction for both ordered lists and unordered sets.

The JSON pattern that seems closest to what we need would be to use @reverse on proxyFor, instead of @reverse on proxyIn, resulting in a single tree structure like:

{
  "@type": "ore:Aggregation",
  "iana:first": "_:p1",
  "iana:last": "_:p2",
  "aggregates": [
    {
      "@id": "http://example.org/1",
      "proxy": {
        "@id": "_:p1",
        "@type": "Proxy",
        "iana:next": "_:p2",
        "proxyIn": "http://example.com/aggr"
      }
    },
    {
      "@id": "http://example.org/2",
      "proxy": {
        "@id": "_:p2",
        "@type": "Proxy",
        "iana:prev": "_:p1",
        "proxyIn": "http://example.com/aggr"
      }
    }
  ]
}

Rather than two flat lists:

{
  "@type": "ore:Aggregation",
  "iana:first": "_:p1",
  "iana:last": "_:p2",
  "aggregates": [
    "http://example.org/1",
    "http://example.org/2"
  ],
  "proxies": [
    {
      "@id": "_:p1",
      "@type": "Proxy",
      "iana:next": "_:p2",
      "proxyFor": "http://example.org/1"
    },
    {
      "@id": "_:p2",
      "@type": "Proxy",
      "iana:prev": "_:p1",
      "proxyFor": "http://example.org/2"
    }
  ]
}

where the proxy is separated from the resource that it is a proxy for.
 
Could we add proxy (or similar) as a key for the @reverse of proxyFor?
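
To make the request concrete, such a term could be defined roughly as follows. This is only a sketch, written as a Python dict so it can be dumped as JSON: the term name "proxy" is the one proposed above, the prefix IRI is the ORE terms namespace, and the rest of the actual ORE context is omitted rather than reproduced.

```python
import json

# Hypothetical fragment of a JSON-LD context; NOT the published ORE context.
ORE = "http://www.openarchives.org/ore/terms/"

context_fragment = {
    "@context": {
        "ore": ORE,
        # The nested proxy points back to its aggregation.
        "proxyIn": {"@id": "ore:proxyIn", "@type": "@id"},
        # Reverse property: { resource "proxy" P } means P ore:proxyFor resource.
        "proxy": {"@reverse": "ore:proxyFor"},
    }
}

print(json.dumps(context_fragment, indent=2))
```

With a definition like this, a "proxy" key nested under an aggregated resource would expand to an ore:proxyFor statement from the proxy to that resource.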

Thanks!

Rob

--
Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305

----------
From: Stian Soiland-Reyes <soilan...@cs.manchester.ac.uk>
Date: Wed, Dec 17, 2014 at 12:43 PM
To: Robert Sanderson <azar...@gmail.com>
Cc: Herbert Van de Sompel <hvd...@gmail.com>, Simeon Warner <simeon...@cornell.edu>

One reason why I did not consider "proxy" below the aggregated resource is that it means repeating the proxyIn key for every proxy, which people might even forget to put in. I felt that a proxy identifies the link to the resource as much as the link to the resource map, but if anyone "owns" the proxy then it is the map, as a proxy can only be proxyIn a single map.

In the ORE JSON-LD spec I wanted to emphasize the map as the root container and make it clearer which statements are about this aggregation and which are about others. Would, for instance, "proxy" also be usable to list proxies from other aggregations? This could become confusing.

I can however see reasons for keeping them under the aggregated resource, e.g. you can easily check that you remembered to add it or did not have any dangling proxies after modifications.

What is behind your requirement of nesting the proxies?


----------
From: Robert Sanderson <azar...@gmail.com>
Date: Wed, Dec 17, 2014 at 12:57 PM
To: Stian Soiland-Reyes <soilan...@cs.manchester.ac.uk>
Cc: Herbert Van de Sompel <hvd...@gmail.com>, Simeon Warner <simeon...@cornell.edu>



Hi Stian,

I guess that proxy /could/ list other aggregations' proxies, but I don't think anyone would do that.  As you note in the document, if you're aggregating a proxy or otherwise referring to one, it doesn't live in the same space as the proxies for the aggregated resources of this aggregation.

The rationale behind the nesting was the maintenance and management of the data outside of a triplestore.

For example, imagine a 1000-resource aggregation.  To find the proxy for each resource you'd [in the naive implementation without pre-indexing] need to scan the list of 1000 proxies for each of the 1000 resources, as opposed to having it nested, which is O(0) as you already have it :)  With indexing you still need to scan the list of proxies once to build the index, and then look up the proxies in the list per aggregated resource ... which is pretty much the same as inverting the proxyIn/proxyFor mapping ... so we might as well do that in the JSON rather than in code.
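
As a rough illustration of the difference, here is a minimal Python sketch of the naive scan and the pre-built index over the flat form. The function names and the trimmed sample data are assumptions made for this example, not part of any ORE library.

```python
# Assumed flat form: "aggregates" holds URI strings, "proxies" holds proxy objects.
flat = {
    "aggregates": ["http://example.org/1", "http://example.org/2"],
    "proxies": [
        {"@id": "_:p1", "proxyFor": "http://example.org/1"},
        {"@id": "_:p2", "proxyFor": "http://example.org/2"},
    ],
}


def find_proxies_naive(aggregation, uri):
    """Scan the whole proxy list for one resource: O(n) per lookup."""
    return [p for p in aggregation.get("proxies", []) if p["proxyFor"] == uri]


def index_proxies(aggregation):
    """One pass over the proxy list, giving resource URI -> list of proxies."""
    index = {}
    for proxy in aggregation.get("proxies", []):
        index.setdefault(proxy["proxyFor"], []).append(proxy)
    return index


index = index_proxies(flat)
for uri in flat["aggregates"]:
    # A resource may have zero, one, or several proxies.
    print(uri, [p["@id"] for p in index.get(uri, [])])
```

The index built here is effectively the nested structure, which is the point of the argument: the inversion can be done once in the JSON instead of by every consumer.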

If you've gone to the trouble of pre-indexing the proxies, you could just use an rdf:List.  The manipulation requirements seem approximately equal.

As a resource can appear more than once in an aggregation by including multiple proxies, the number of aggregated resources and number of proxies could be different.  To me this makes more sense when the proxies are attached to the resource, otherwise the assumption that N aggregated resources = N proxies would lead to poor implementations.

If you remove an aggregated resource from an aggregation, you'd also want to remove the proxy for it ... which means searching for it in the proxies list.  Or any other operation ... it seems like if there is a proxy for a resource, you want to have both the proxy and the description of the resource to process at the same time.
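
A small sketch of that removal in both layouts, assuming the flat form lists URI strings under "aggregates" and the nested form lists objects carrying their own "proxy". The helper names are hypothetical, and neither helper repairs iana:first, iana:last, iana:next or iana:prev links, which a real implementation would also have to update.

```python
def remove_flat(aggregation, uri):
    """Flat form: filter the resource list AND search the separate proxy list."""
    aggregation["aggregates"] = [r for r in aggregation["aggregates"] if r != uri]
    aggregation["proxies"] = [
        p for p in aggregation.get("proxies", []) if p["proxyFor"] != uri
    ]


def remove_nested(aggregation, uri):
    """Nested form: dropping the resource object drops its proxy with it."""
    aggregation["aggregates"] = [
        r for r in aggregation["aggregates"] if r["@id"] != uri
    ]


flat = {
    "aggregates": ["http://example.org/1", "http://example.org/2"],
    "proxies": [
        {"@id": "_:p1", "proxyFor": "http://example.org/1"},
        {"@id": "_:p2", "proxyFor": "http://example.org/2"},
    ],
}
nested = {
    "aggregates": [
        {"@id": "http://example.org/1", "proxy": {"@id": "_:p1"}},
        {"@id": "http://example.org/2", "proxy": {"@id": "_:p2"}},
    ]
}

remove_flat(flat, "http://example.org/1")
remove_nested(nested, "http://example.org/1")
```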

Thanks!

Rob


Stian Soiland-Reyes

Dec 24, 2014, 12:47:50 AM
to oai...@googlegroups.com

I think you are touching on something important. Each aggregated resource has at least one proxy by definition.

Your indexing problem is the same if you are looking up a proxy in the other direction, e.g. you have an annotation about a proxy and you need to resolve it to the resource.

However, I think I buy your big-O argument: you would be more likely to look up the proxy for each aggregated resource than all the resources for each proxy. Getting a list of all the proxies is trickier, however, but I agree the structure should be optimized for lookup, particularly as the client of the JSON-LD structure could be lightweight JS code.

Another argument in favour is that generating the resource map can then be done in a single pass without temporary data structures.

A potential disadvantage would be that you are forced to use the full object form within "aggregates" just to include proxies, and so the JSON-LD structure could look intimidating (e.g. Semantic Web-like) even for simple cases; in my existing structure "proxies" are a separate branch you don't need to worry about unless you want to.

I must admit that in our spec https://w3id.org/bundle/ (which predates ORE JSON-LD) we use the nested "proxy" form, but with "proxyIn" implied through OWL inferences. (Hence our "proxy" maps to an inverse subproperty of ore:proxyFor.)

One question: if we were to include "proxy", then we would deprecate "proxies", right? It would be confusing to keep both as available options.

Would either of you have time to propose a patch to the HTML and JSON-LD context? I believe I have it under https://github.com/stain/ somewhere (search for ORE).


Robert Sanderson

Jan 5, 2015, 12:03:05 PM
to oai...@googlegroups.com, soilan...@cs.manchester.ac.uk

Hi Stian,

On Tuesday, December 23, 2014 9:47:50 PM UTC-8, Stian Soiland-Reyes wrote:

> I think you are touching on something important. Each aggregated resource has at least one proxy by definition. Your indexing problem is the same if you are looking up a proxy in the other direction, e.g. you have an annotation about a proxy and you need to resolve it to the resource.

Agreed, but of the two, I would expect most implementations to walk through the resource list and check whether there is a proxy, rather than walk the proxy list and find the resource.  The rationale is that you don't have to have a proxy for every resource, or even for any resource.  If you have a reference to a proxy from an external source, then the complexity is O(n) in both cases:

Nested:  Iterate resource list, check x['proxy']['@id'] == PROXYURI
Separate:  Iterate proxy list, check x['@id'] == PROXYURI
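
Spelled out as a runnable Python sketch (function names are illustrative only, and the nested form is assumed to carry at most one proxy object per resource), the two scans look like this:

```python
def resource_for_proxy_nested(aggregation, proxy_uri):
    """Nested form: walk the resources and compare each embedded proxy's @id."""
    for res in aggregation["aggregates"]:
        # Short-form entries (plain URI strings) have no proxy to inspect.
        proxy = res.get("proxy") if isinstance(res, dict) else None
        if proxy is not None and proxy["@id"] == proxy_uri:
            return res["@id"]
    return None


def resource_for_proxy_flat(aggregation, proxy_uri):
    """Separate form: walk the proxy list and follow proxyFor."""
    for proxy in aggregation.get("proxies", []):
        if proxy["@id"] == proxy_uri:
            return proxy["proxyFor"]
    return None


nested = {
    "aggregates": [
        {"@id": "http://example.org/1", "proxy": {"@id": "_:p1"}},
        {"@id": "http://example.org/2", "proxy": {"@id": "_:p2"}},
    ]
}
flat = {
    "aggregates": ["http://example.org/1", "http://example.org/2"],
    "proxies": [
        {"@id": "_:p1", "proxyFor": "http://example.org/1"},
        {"@id": "_:p2", "proxyFor": "http://example.org/2"},
    ],
}

print(resource_for_proxy_nested(nested, "_:p2"))
print(resource_for_proxy_flat(flat, "_:p2"))
```

Both functions make a single linear pass, so neither layout has an advantage for this direction of lookup.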
 

> However, I think I buy your big-O argument: you would be more likely to look up the proxy for each aggregated resource than all the resources for each proxy. Getting a list of all the proxies is trickier, however, but I agree the structure should be optimized for lookup, particularly as the client of the JSON-LD structure could be lightweight JS code.

I think that optimizing for lightweight JS code is the strongest argument in any JSON-LD discussion.  In my mind at least, having the same content accessible to both full-stack RDF engines and lightweight browser-based JavaScript systems is the main (and extremely significant) reason for using JSON-LD.

 

> Another argument in favour is that generating the resource map can then be done in a single pass without temporary data structures.

True.  It is unlikely to be a problem in practice unless the aggregation has millions of resources, but it does make the implementation one step easier.

 

> A potential disadvantage would be that you are forced to use the full object form within "aggregates" just to include proxies, and so the JSON-LD structure could look intimidating (e.g. Semantic Web-like) even for simple cases; in my existing structure "proxies" are a separate branch you don't need to worry about unless you want to.

Ahh, true. The proxy always needs to be an object (as it always has to have at least proxyIn or proxyFor), but the aggregated resource could otherwise just be a string.  The ORE docs don't recommend any relationships for aggregated resources.  
On the other hand, it forces a single structure -- the convenience of the short form is often outweighed by the annoyance of having to check for it.
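
The check in question is small but has to be repeated by every consumer. A minimal sketch, assuming a short-form entry is a bare URI string and a full-form entry is an object with "@id" (the helper name is made up for this example):

```python
def as_object(entry):
    """Normalize an "aggregates" entry to the full object form."""
    if isinstance(entry, str):
        # Short form: a bare URI string, which cannot carry a proxy.
        return {"@id": entry}
    return entry


aggregates = [
    "http://example.org/1",
    {"@id": "http://example.org/2", "proxy": {"@id": "_:p2"}},
]

for resource in (as_object(e) for e in aggregates):
    proxy = resource.get("proxy")
    print(resource["@id"], proxy["@id"] if proxy else "(no proxy)")
```

If only the object form were allowed, the as_object step would disappear and consumers could read "@id" and "proxy" directly.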

 

> I must admit that in our spec https://w3id.org/bundle/ (which predates ORE JSON-LD) we use the nested "proxy" form, but with "proxyIn" implied through OWL inferences. (Hence our "proxy" maps to an inverse subproperty of ore:proxyFor.)

While we're admitting things, I did forget to put in proxyIn the first time I wrote out the JSON and wished to be able to infer it :) 
 

> One question: if we were to include "proxy", then we would deprecate "proxies", right? It would be confusing to keep both as available options.

I think so.  Having one structure is better than having two and needing to test which one is in use.

 

> Would either of you have time to propose a patch to the HTML and JSON-LD context? I believe I have it under https://github.com/stain/ somewhere (search for ORE).

I'll have a look :)

Rob

Stian Soiland-Reyes

Jan 5, 2015, 12:14:22 PM
to oai...@googlegroups.com
Agree with all the below. :)

Source code located: https://github.com/stain/ore
--
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718

Robert Sanderson

Jan 5, 2015, 12:54:23 PM
to oai...@googlegroups.com, soilan...@cs.manchester.ac.uk

Pull request submitted :D

Rob