Hi everyone,
We're still missing a couple of people here, but almost everyone I
invited has now signed up and the rest can read the archives, so let's
get started. Please do go ahead and invite any interested colleagues
and anyone we might have overlooked; the member list [1] is visible to
all current members.
The impetus for this group is the considerable interest at the SWAT4LS
hackathon in Cambridge [2] last week in the use of the compact HDT
binary file format for RDF data [3] to publish bioinformatics
datasets. The question posed several times was "why didn't we know
about this?"
Several major bio datasets are now in the process of being converted
to HDT. Alexander had previously converted the Reactome dataset [4,5]
to HDT, and Michel has been working on converting Bio2RDF [6].
During the hackathon, Evan & Gang began a conversion of PubChemRDF [7]
and Atsuko & Yasunori completed converting Allie [8]. Núria also
expressed an interest in converting DisGeNET datasets.
Egon and Atsuko & Yasunori independently succeeded in executing SPARQL
queries directly on local HDT files using the Jena adapter from the
HDT/Java [9] project. Atsuko & Yasunori & Arto also published part of
the Allie ontology on Dydra using Dydra's native HDT storage backend
[10].
Egon began work on a Bioclipse plugin for HDT, which he has since
completed. I've invited Egon to post a summary of that work here, as
I'm sure many of you are interested in checking it out.
Evan presented a summary of the day's findings at the wrap-up session
for the hackathon. If the slide deck could be considered public,
perhaps Evan might be so kind as to post it to the list?
The purpose of this group is to bring all these conversations about
HDT for bioinformatics into one channel, to the benefit of everyone.
(Currently, I have a dozen follow-up email threads going on, which is
unwieldy and also inevitably excludes some interested parties.)
I've also invited the original HDT spec & tooling authors to join the
group, and Mario Arias [11] already has. So we have a lot of
expertise in this room; let's make use of it to figure out and realize
the benefits that HDT can bring to the bioinformatics community!
[1] https://groups.google.com/forum/#!members/biohdt
[2] http://www.swat4ls.org/workshops/cambridge2015/programme/hackathon/
[3] http://www.rdfhdt.org
[4] https://www.ebi.ac.uk/rdf/services/reactome/
[5] https://github.com/alexgarciac/gittemp
[6] http://bio2rdf.org/
[7] https://pubchem.ncbi.nlm.nih.gov/rdf/
[8] http://allie.dbcls.jp/
[9] https://github.com/rdfhdt/hdt-java
[10] http://dydra.com/bendiken/allie-ontology
[11] https://github.com/MarioAriasGa
--
Arto Bendiken | @bendiken | @dydradata