Using Gremlin + Linked Data Sail to query schema.org (microdata, rdfa?)


Dan Brickley

Mar 2, 2012, 6:32:23 PM
to gremli...@googlegroups.com, Joshua Shinavier, danbri
Hi folks

I had fun last year with g = new LinkedDataSailGraph(new MemoryStoreSailGraph())

...exploring linked data from DBpedia, Freebase, Identi.ca, FOAF etc.
thanks to Gremlin plus the LinkedDataSail work.

Some notes, http://danbri.org/words/2011/05/10/675

I'm now working for Google on schema.org, a project that's getting a
lot of semantic markup into mainstream Web sites.

I thought it would be fun to try Gremlin again with such sites,
especially to ensure cross-page linkages work out well. There are some
fiddly issues especially around identifying things versus pages about
those things (usual semweb stuff). For example you can find some
schema.org markup on IMDB's site, but I don't know how well the
actors/movies graph links together between sites. Gremlin seems a good
tool to help explore the issues.

To do this, the LinkedDataSail would need to consume HTML5 Microdata
syntax, and ideally also RDFa 1.1 too. Is this feasible yet?

Thanks for any pointers.

cheers,

Dan

p.s. for a Java Microdata extractor, see
http://incubator.apache.org/any23/dev-microdata-extractor.html
and for RDFa 1.1 I find http://code.google.com/p/rdfa-core-java/ but
haven't tried it.

Joshua Shinavier

Mar 2, 2012, 9:12:16 PM
to Dan Brickley, gremli...@googlegroups.com, danbri
On Fri, Mar 2, 2012 at 6:32 PM, Dan Brickley <dan...@danbri.org> wrote:
> Hi folks
>
> I had fun last year with g = new LinkedDataSailGraph(new MemoryStoreSailGraph())
>
> ...exploring linked data from DBpedia, Freebase, Identi.ca, FOAF etc.
> thanks to Gremlin plus the LinkedDataSail work.
[...]

> actors/movies graph links together between sites. Gremlin seems a good
> tool to help explore the issues.

Gremlin, shmemlin. Anyway...

> To do this, the LinkedDataSail would need to consume HTML5 Microdata
> syntax, and ideally also RDFa 1.1 too. Is this feasible yet?


Well, there's no explicit support for RDFa or Microdata yet in
LinkedDataSail, although one could drop in an RDFizer [1] for either
format using ldSail.getWebClosure().addRdfizer. These could also be
added to the 0.8 release if they don't introduce too many large
dependencies.
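
A minimal sketch of the drop-in idea described above, assuming a registry that maps media types to rdfizers. The names here (RdfizerSketch, WebClosure, Rdfizer, ToyHtmlRdfizer) are hypothetical stand-ins written for illustration; they are not the Ripple/LinkedDataSail classes, and the real addRdfizer signature is not reproduced.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only, NOT the Ripple/LinkedDataSail API.
public class RdfizerSketch {
    /** Turns a fetched document into simple subject/predicate/object statements. */
    interface Rdfizer {
        List<String[]> rdfize(String documentUri, String body);
    }

    /** Maps media types to rdfizers, loosely mirroring the addRdfizer idea. */
    static class WebClosure {
        private final Map<String, Rdfizer> rdfizers = new LinkedHashMap<>();

        void addRdfizer(String mediaType, Rdfizer rdfizer) {
            rdfizers.put(mediaType, rdfizer);
        }

        /** Returns the statements extracted by the rdfizer registered for the media type, if any. */
        List<String[]> extract(String documentUri, String mediaType, String body) {
            Rdfizer rdfizer = rdfizers.get(mediaType);
            return rdfizer == null ? new ArrayList<>() : rdfizer.rdfize(documentUri, body);
        }
    }

    /** Toy rdfizer: a real one would delegate to an RDFa or Microdata parser. */
    static class ToyHtmlRdfizer implements Rdfizer {
        public List<String[]> rdfize(String documentUri, String body) {
            List<String[]> out = new ArrayList<>();
            // Only detects that some markup is present; it does not parse it.
            if (body.contains("itemscope") || body.contains("typeof=")) {
                out.add(new String[] {documentUri, "urn:example:hasMarkup", "true"});
            }
            return out;
        }
    }

    public static void main(String[] args) {
        WebClosure closure = new WebClosure();
        closure.addRdfizer("text/html", new ToyHtmlRdfizer());
        List<String[]> statements = closure.extract(
                "http://example.org/movie", "text/html",
                "<div itemscope itemtype=\"http://schema.org/Movie\"></div>");
        for (String[] s : statements) {
            System.out.println(s[0] + " " + s[1] + " " + s[2]);
        }
    }
}
```

The point of the design is that supporting a new syntax only means registering one more rdfizer for its media type; the crawling and storage code stays untouched.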

> Thanks for any pointers.
>
> cheers,
>
> Dan
>
> p.s. for a Java Microdata extractor, see
> http://incubator.apache.org/any23/dev-microdata-extractor.html
> and for RDFa 1.1 I find http://code.google.com/p/rdfa-core-java/ but
> haven't tried it.


I'll look into those. I've been playing with an RDFa (1.0) library
from Aduna which would probably be a nice addition, as well.

Thanks for the suggestion.

Josh

[1] http://ripple.fortytwo.net/java/apidocs/net/fortytwo/linkeddata/Rdfizer.html

Dan Brickley

Jan 15, 2013, 12:03:40 PM
to Joshua Shinavier, Dan Brickley, gremli...@googlegroups.com
(Oh, I see now I asked the same question already; I had thought the
mail hadn't got through.)

On 3 March 2012 02:12, Joshua Shinavier <shi...@rpi.edu> wrote:
> On Fri, Mar 2, 2012 at 6:32 PM, Dan Brickley <dan...@danbri.org> wrote:
>> Hi folks
>>
>> I had fun last year with g = new LinkedDataSailGraph(new MemoryStoreSailGraph())
...
>> To do this, the LinkedDataSail would need to consume HTML5 Microdata
>> syntax, and ideally also RDFa 1.1 too. Is this feasible yet?
>
> Well, there's no explicit support for RDFa or Microdata yet in
> LinkedDataSail, although one could drop in an RDFizer [1] for either
> format using ldSail.getWebClosure().addRdfizer. These could also be
> added to the 0.8 release if they don't introduce too many large
> dependencies.

So the news is that https://github.com/levkhomich/semargl is said
to be pretty good. I'd definitely say RDFa 1.1 is a good focus (esp.
Lite, but note that there is no such thing as a Lite parser; parser
writers need to anticipate encountering full 1.1 markup).

Dan

Joshua Shinavier

Jan 16, 2013, 1:17:18 AM
to Dan Brickley, Dan Brickley, gremli...@googlegroups.com
Hi Dan,

Thanks for the ping and the link. I'm happy to report that we now have RDFa support in LinkedDataSail. Not through Semargl (which appears to be an integrated solution rather than just a parser, and depends on Jena), but through a new collection [1] of Sesame writers and parsers which has very confusingly been dubbed "Sesame Tools" ("SesameTools" [2] being the collection of Sesame writers, parsers, and other components which LinkedDataSail depends on). Name conflict aside, the parser is suitably licensed and appears to work perfectly well, so in one way or another, RDFa support will be part of the next Ripple release (==> probably the next TinkerPop stack release).

Best,

Josh


Dan Brickley

Jan 16, 2013, 4:41:38 AM
to Joshua Shinavier, gremli...@googlegroups.com, Dan Brickley


On 16 Jan 2013 06:17, "Joshua Shinavier" <shi...@rpi.edu> wrote:
>
> Hi Dan,
>
> Thanks for the ping and the link. I'm happy to report that we now have RDFa support in LinkedDataSail. Not through Semargl (which appears to be an integrated solution rather than just a parser, and depends on Jena), but through a new collection [1] of Sesame writers and parsers which has very confusingly been dubbed "Sesame Tools" ("SesameTools" [2] being the collection of Sesame writers, parsers, and other components which LinkedDataSail depends on). Name conflict aside, the parser is suitably licensed and appears to work perfectly well, so in one way or another, RDFa support will be part of the next Ripple release (==> probably the next TinkerPop stack release).

That's great news! :) Does it claim to handle RDFa 1.1?

Dan

Joshua Shinavier

Jan 16, 2013, 2:22:47 PM
to gremli...@googlegroups.com, Dan Brickley


On Wed, Jan 16, 2013 at 4:41 AM, Dan Brickley <dan...@danbri.org> wrote:

[...]

> That's great news! :) Does it claim to handle RDFa 1.1?

Still initiating communication with them, and the documentation doesn't specify, but the parser does correctly extract all of the RDFa 1.1 metadata in the web pages I have tried so far, e.g. this one with schema.org + Datasets [1] markup:


$ ./ripple.sh 

          rdf:type
             dcat:Dataset,
             <http://schema.org/Dataset>;
          <http://schema.org/url>
          <http://schema.org/name>
             "Seismic Hazard Zones";
             "2011";
             "This is a dataset of liquefaction and landslide zones in the state of California.";
             dbr:United_States;
             <urn:uuid:node17h2c5dfsx1>;
          <http://schema.org/about>
             dbr:Seismic_hazard;
          <http://schema.org/keyword>
             "gis",
             "maps",
             "layers",
             "geography";
          xhv:license
             "person (\"geotechnical investigation\")";
             "single study".




Josh


 

Joshua Shinavier

Jan 25, 2013, 6:13:20 AM
to gremli...@googlegroups.com, Dan Brickley, Dan Brickley, Lev Khomich
Hi Stéphane,

[note: repeating some info from a side-conversation on this topic]

Indeed, Semargl is much more frugal with its dependencies than I initially thought. In fact, the semargl-sesame module does not introduce any external dependencies which LinkedDataSail does not already have, so it's ideal in that respect. I have tested the parser with LinkedDataSail, by which I mean that I have hit a few web pages with RDFa and found that it handles them appropriately, just as the "Sesame Tools" parser does. A more thorough comparison is needed, but not necessarily at the Linked Data client level. Given that operations on Linked Data are so network-bound, I would favor the parser which distinguishes itself in terms of compliance and error handling, as opposed to performance.

At any rate, there are not one but two good RDFa options for use with LinkedDataSail.  Definitely not too late to use Semargl's.  Peter Ansell has suggested [1] an RDFFormat.RDFA constant in Sesame which would allow developers to choose their RDFa parser implementation (via the RDFParserFactory service loader) rather than having one or the other hard-coded into LinkedDataSail.
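
A rough sketch of what choosing a parser through a service loader could look like, using only java.util.ServiceLoader from the JDK. The RdfaParserFactory interface and BuiltInFactory class are hypothetical stand-ins written for illustration; they are not Sesame's RDFParserFactory, and the proposed RDFFormat.RDFA constant is not modelled here.

```java
import java.util.ServiceLoader;

// Hypothetical stand-ins for illustration only, NOT the Sesame API.
public class ParserChoiceSketch {
    /** Hypothetical factory interface that parser implementations would provide. */
    public interface RdfaParserFactory {
        String name();
    }

    /** Fallback used when no implementation has been plugged in. */
    static class BuiltInFactory implements RdfaParserFactory {
        public String name() {
            return "built-in";
        }
    }

    /** Returns the first discovered factory, or the fallback if none was found. */
    static RdfaParserFactory choose(Iterable<RdfaParserFactory> discovered,
                                    RdfaParserFactory fallback) {
        for (RdfaParserFactory factory : discovered) {
            return factory;
        }
        return fallback;
    }

    public static void main(String[] args) {
        // ServiceLoader looks for provider entries under META-INF/services on the
        // classpath, so the choice of parser is made by which jar is deployed,
        // not by code in the client.
        ServiceLoader<RdfaParserFactory> loader = ServiceLoader.load(RdfaParserFactory.class);
        RdfaParserFactory chosen = choose(loader, new BuiltInFactory());
        System.out.println("Using RDFa parser: " + chosen.name());
    }
}
```

With this arrangement, swapping one RDFa parser for another would be a packaging decision rather than a change to the client code.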

Thanks.

Josh



On Thu, Jan 24, 2013 at 1:21 PM, Stéphane Corlosquet <scorl...@gmail.com> wrote:
> Hi Josh,
>
> On Wednesday, January 16, 2013 1:17:18 AM UTC-5, Joshua Shinavier wrote:
>> Hi Dan,
>>
>> Thanks for the ping and the link. I'm happy to report that we now have RDFa support in LinkedDataSail. Not through Semargl (which appears to be an integrated solution rather than just a parser, and depends on Jena),
>
> I'm no semargl expert, but from reading the semargl docs, there is a Jena integration, though the library itself doesn't seem to depend on Jena. I'm cc'ing Lev, maintainer of the semargl library.
>
>> but through a new collection [1] of Sesame writers and parsers which has very confusingly been dubbed "Sesame Tools" ("SesameTools" [2] being the collection of Sesame writers, parsers, and other components which LinkedDataSail depends on). Name conflict aside, the parser is suitably licensed and appears to work perfectly well, so in one way or another, RDFa support will be part of the next Ripple release (==> probably the next TinkerPop stack release).
>
> Great to hear you have added RDFa support via Sesame Tools. Note also that Peter Ansell recently contributed a Sesame module for semargl [1] which is now part of semargl [2] in case you are interested (though it might be too late now).
>
> hope that helps.
> Steph.


Stéphane Corlosquet

Jan 25, 2013, 11:03:13 AM
to gremli...@googlegroups.com, Dan Brickley, Dan Brickley, Lev Khomich
Hi Josh,

On Fri, Jan 25, 2013 at 6:13 AM, Joshua Shinavier <jo...@fortytwo.net> wrote:
> Hi Stéphane,
>
> [note: repeating some info from a side-conversation on this topic]
>
> Indeed, Semargl is much more frugal with its dependencies than I initially thought. In fact, the semargl-sesame module does not introduce any external dependencies which LinkedDataSail does not already have, so it's ideal in that respect. I have tested the parser with LinkedDataSail, by which I mean that I have hit a few web pages with RDFa and found that it handles them appropriately, just as the "Sesame Tools" parser does. A more thorough comparison is needed, but not necessarily at the Linked Data client level. Given that operations on Linked Data are so network-bound, I would favor the parser which distinguishes itself in terms of compliance and error handling, as opposed to performance.

One can already see how semargl performs with regard to the official RDFa 1.1 test suite at [1]: it's currently passing 100% of the tests for RDFa 1.1 in HTML5, XHTML5, XML and SVG! (You need to log in to run the suite; this is required to keep the load on the servers low.)

One could test Sesame Tools conformance easily by hosting it as a service and putting the URI in the processor field; see [2] for more info on how to create a processor endpoint.

> At any rate, there are not one but two good RDFa options for use with LinkedDataSail. Definitely not too late to use Semargl's. Peter Ansell has suggested [1] an RDFFormat.RDFA constant in Sesame which would allow developers to choose their RDFa parser implementation (via the RDFParserFactory service loader) rather than having one or the other hard-coded into LinkedDataSail.
