Bigdata

Hilmar Lapp

Feb 14, 2009, 2:49:43 PM
to Phenoscape Developers, OBD Development
This is an open-source high-performance, scalable, potentially
distributed semantic web (triple) data store:

http://bigdata.sourceforge.net/pubs/bigdata-oscon-7-23-08.pdf

The presentation looks intriguing, and the system definitely looks
worth trying out. It's apparently also among the largest RDF triple
stores deployed (http://esw.w3.org/topic/LargeTripleStores).

Chris - has anyone tried to layer the OBD API on top of this?

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================


Chris Mungall

Feb 16, 2009, 1:19:36 PM
to obd...@googlegroups.com, Phenoscape Developers

On Feb 14, 2009, at 11:49 AM, Hilmar Lapp wrote:

>
> This is an open-source high-performance, scalable, potentially
> distributed semantic web (triple) data store:
>
> http://bigdata.sourceforge.net/pubs/bigdata-oscon-7-23-08.pdf
>
> It's open-source, and the presentation looks intriguing and looks like
> definitely worth trying out. It's apparently also among the largest
> RDF triple stores deployed
> (http://esw.w3.org/topic/LargeTripleStores).
>
> Chris - has anyone tried to layer the OBD API on top of this?

Nope. The RDFShard is designed for this - it is a bridge to the Jena
API. Even if BD can't be directly accessed via Jena, it presumably
supports a SPARQL endpoint, which Jena should be able to access, at
least for reads (SPARQL, as currently standardized, is a read-only
query language).
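
To make the "SPARQL endpoint for reads" idea concrete, here is a minimal sketch of how a read-only query is addressed to an endpoint over HTTP. The SPARQL Protocol passes the query text in a `query` parameter; the endpoint URL and the helper name `build_select_url` are assumptions for illustration only, since the thread does not name an actual Bigdata endpoint. The sketch only builds the request URL and does not contact any server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint URL; nothing in the thread names a real one.
ENDPOINT = "http://localhost:8080/sparql"


def build_select_url(endpoint: str, query: str) -> str:
    """Return a SPARQL Protocol GET URL carrying a read-only query.

    The query text is URL-encoded into the standard `query` parameter.
    """
    return endpoint + "?" + urlencode({"query": query})


# A generic read-only query: fetch any ten triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
url = build_select_url(ENDPOINT, QUERY)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

Any HTTP client (Jena's included) could then issue a GET on such a URL; whether BD exposes an endpoint of this kind is, as noted above, a presumption.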

However, the RDFShard hasn't been touched since the summer. In the
intervening time the core API capabilities have progressed
substantially, particularly for statistical and semantic similarity
measures. In the OBDSQL implementation, these make heavy use of
aggregate query operators (count, group by). There's no equivalent in
SPARQL as it stands, so to be honest I don't have a clue how to go
about supporting these in the RDFShard other than expensive
read-all-objects-into-memory-then-count methods.
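
For illustration, the read-all-then-count fallback amounts to something like the sketch below: every matching triple is pulled to the client and grouped there, which is what an SQL `COUNT(*) ... GROUP BY` would otherwise do inside the database. The function name `count_by_object`, the predicate names, and the data are all made up for this example; this is not the RDFShard or OBD API.

```python
from collections import Counter
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def count_by_object(triples: Iterable[Triple], predicate: str) -> Counter:
    """Client-side stand-in for: SELECT object, COUNT(*) ... GROUP BY object.

    Every triple has to be streamed to the client before it can be counted,
    which is what makes this approach expensive on a large store.
    """
    counts: Counter = Counter()
    for _subject, pred, obj in triples:
        if pred == predicate:
            counts[obj] += 1
    return counts


# Made-up example data, standing in for the result of a plain SELECT.
triples = [
    ("gene1", "has_phenotype", "small_fin"),
    ("gene2", "has_phenotype", "small_fin"),
    ("gene3", "has_phenotype", "curved_spine"),
    ("gene1", "is_a", "gene"),
]
counts = count_by_object(triples, "has_phenotype")
```

The cost is linear in the number of triples transferred, not in the number of groups returned. Note that SPARQL 1.1, standardized later, added `COUNT` and `GROUP BY`, so this workaround applies to endpoints limited to the original SPARQL.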

Then there's also reasoning. Most triplestores come with basic RDFS
reasoning, which is probably insufficient for Phenoscape's purposes.
Other triplestores claim to support the right fragment of OWL but I'm
not sure how they fare on large TBoxes. If you were going to use a
triplestore you'd probably be forced to place as much as possible in
the ABox - which probably suits Jim for his taxon-as-instance
modeling! BD has a rule engine, so it appears able to support, *in
theory* and with some work, the DLP subset of OWL, which would make it
comparable to OBD.
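
As a rough picture of what rule-based reasoning over triples involves, the sketch below forward-chains two RDFS-style rules (subclass transitivity, and propagating instance types up the class hierarchy) until nothing new is derived. It is a naive illustration under assumed names (`forward_chain`, the predicate strings, the toy data); it is not Bigdata's rule engine, and it covers far less than the DLP subset of OWL.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def forward_chain(facts: Set[Triple]) -> Set[Triple]:
    """Apply two rules repeatedly until a fixed point is reached.

    Rule 1: (a subClassOf b), (b subClassOf c) => (a subClassOf c)
    Rule 2: (x type b),       (b subClassOf c) => (x type c)
    """
    inferred = set(facts)
    while True:
        new = set()
        for (a, p1, b) in inferred:
            for (c, p2, d) in inferred:
                # Both rules join on "object of first == subject of second"
                # and require the second triple to be a subClassOf edge.
                if b != c or p2 != "subClassOf":
                    continue
                if p1 == "subClassOf":
                    new.add((a, "subClassOf", d))
                elif p1 == "type":
                    new.add((a, "type", d))
        if new <= inferred:  # nothing new was derived: fixed point
            return inferred
        inferred |= new


# Toy data: two TBox axioms and one ABox assertion.
facts = {
    ("catfish", "subClassOf", "fish"),
    ("fish", "subClassOf", "vertebrate"),
    ("specimen1", "type", "catfish"),
}
closure = forward_chain(facts)
```

The nested loop is quadratic per pass, which hints at why behavior on large TBoxes is the open question; a real engine would index triples by predicate and subject rather than scan all pairs.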

Hilmar Lapp

Feb 16, 2009, 1:39:54 PM
to Chris Mungall, obd...@googlegroups.com, Phenoscape Developers

On Feb 16, 2009, at 1:19 PM, Chris Mungall wrote:

> the core API capabilities have progressed substantially,
> particularly for statistical and semantic similarity measures. In
> the OBDSQL implementation, these make heavy use of
> aggregate query operators (count, group by). There's no equivalent
> in SPARQL, so to be honest I don't have a clue how to go about
> supporting these in the RDFShard other than expensive read-all-
> objects-into-memory-then-count methods.


That's a good point. Personally, I tend to think that statistical
and similarity searches may be better approached the way they are
for sequence databases, or for databases of high-dimensional data
like expression profiling. I.e., those should probably be external
tools with their own optimized indexes rather than being layered on
top of the capabilities of SQL. Otherwise algorithmic innovation may
be too constrained as well.

So I'm not too concerned about that part, actually, because I think
eventually this will be outside of SQL (or whatever storage model one
uses for the assertions and inferences) anyway.

Jim Balhoff

Feb 16, 2009, 1:55:53 PM
to obd...@googlegroups.com, Chris Mungall, Phenoscape Developers
I think the paper Chris attached makes it clear how different the OWL
framework is from how biologists think of ontologies - the OBO style
seems much more direct and understandable (at least to me). It seems
like OBD provides a lot of value in its correspondence with users'
understanding of the data.

- Jim
