[Obo-taxonomy] Updates and disjoints of NCBI taxonomy in OWL


Bjoern Peters

Nov 6, 2012, 5:44:34 PM
to obo-ta...@lists.sourceforge.net, Chris Mungall, Alan Ruttenberg, James A. Overton, Randi Vita
Hi, 

For IEDB work, we now want to use the OWL representation of the NCBI taxonomy from http://www.obofoundry.org/cgi-bin/detail.cgi?id=ncbi_taxonomy . We have run into two issues: 

- We continuously need the most current version of the NCBI taxonomy available. Presently, the version is about a year old, and missing several identifiers from the last release. Would it be possible to sync the updates with those at NCBI? 

- For reasoning purposes, we need disjoints between siblings in the taxonomy. Does anything speak against adding these? 

Who is the right person to contact about this? We would be happy to help in the implementation. 

- Bjoern





--
Bjoern Peters
Assistant Professor
La Jolla Institute for Allergy and Immunology
9420 Athena Circle
La Jolla, CA 92037, USA
Tel: 858/752-6914
Fax: 858/752-6987
http://www.liai.org/pages/faculty-peters

Chris Mungall

Nov 7, 2012, 1:57:10 PM
to Bjoern Peters, Randi Vita, obo-ta...@lists.sourceforge.net, James A. Overton

Hi Bjoern,

We're just transitioning the build to Jenkins so it will be on a regular release cycle. Is weekly fine? I'll report back later today.

Regarding disjointness axioms - I would rather add these as a separate ontology at least at first. These could then be imported with the main ontology. Would this work for you? The reasons not to immediately add these:

 - inflates the already large owl file - we should at least give people time to change their workflow to pull in a disjoint-free version
 - potential impact on reasoning performance - ditto the above
 - biological correctness - I think disjointness is a reasonable assumption, at least if we exclude some of the curious non-organism environmental sample branches etc, but others should comment here
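
As a rough illustration of the separate-ontology option, the sketch below writes sibling disjointness into its own document that imports the main ontology. It is only a sketch under assumptions: the `children` map, the `ncbitaxon-disjoints.owl` IRI, and the helper names are hypothetical and not part of the actual build. It uses one n-ary DisjointClasses axiom per parent, which keeps the growth linear in the number of classes, whereas pairwise axioms would grow quadratically within each sibling set.

```python
from typing import Dict, List

OBO = "http://purl.obolibrary.org/obo/"
MAIN_IRI = OBO + "ncbitaxon.owl"
# Hypothetical IRI for the add-on ontology; not an existing release product.
DISJOINTS_IRI = OBO + "ncbitaxon/ncbitaxon-disjoints.owl"


def taxon_iri(taxid: int) -> str:
    """Return the bracketed OBO PURL for an NCBI taxon ID."""
    return f"<{OBO}NCBITaxon_{taxid}>"


def disjoint_axioms(children: Dict[int, List[int]]) -> List[str]:
    """One n-ary DisjointClasses axiom per parent with at least two children."""
    axioms = []
    for parent in sorted(children):
        kids = sorted(children[parent])
        if len(kids) >= 2:  # DisjointClasses needs at least two classes
            members = " ".join(taxon_iri(k) for k in kids)
            axioms.append(f"DisjointClasses({members})")
    return axioms


def disjoints_ontology(children: Dict[int, List[int]]) -> str:
    """Serialize the add-on ontology in OWL 2 functional-style syntax."""
    lines = [f"Ontology(<{DISJOINTS_IRI}>", f"Import(<{MAIN_IRI}>)"]
    lines += disjoint_axioms(children)
    lines.append(")")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy input: genus Pan (9596) with P. paniscus (9597) and P. troglodytes (9598).
    print(disjoints_ontology({9596: [9598, 9597]}))
```

Anyone who does not want the disjoints simply keeps loading the main file; anyone who does loads the add-on, which pulls in the main ontology through the import.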

How about going all the way to disjoint union axioms between all siblings (i.e. making the taxonomy JEPD), at least for any node above species? This would be more controversial, but it is potentially a very useful assumption that gives you powerful entailments.

Is your primary use case MIREOTing, e.g. for OBI? In that case you would want to MIREOT any disjointness axioms, but you would necessarily lose disjoint union axioms unless you're MIREOTing in an entire sibling set. Not sure how OntoFox handles this. Also, the decision as to whether to distribute additional axioms as a separate ontology may affect how this could be used in OntoFox.
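
The point about losing disjoint union axioms on extraction can be sketched as follows. Assuming a hypothetical `children` map for the full taxonomy and a set of extracted (MIREOTed) taxon IDs, a DisjointUnion axiom is carried over only when the parent and its entire sibling set were extracted; otherwise only the weaker DisjointClasses axiom over the extracted siblings is kept. This is an illustration of the logic, not a description of what OntoFox actually does.

```python
from typing import Dict, List, Optional, Set, Tuple

Axiom = Tuple[str, Optional[int], Tuple[int, ...]]


def axioms_for_extract(children: Dict[int, List[int]],
                       extracted: Set[int]) -> List[Axiom]:
    """Decide, per parent, which axiom type survives the extraction.

    Returns (axiom_type, parent_or_None, sibling_ids) tuples.
    """
    axioms: List[Axiom] = []
    for parent in sorted(children):
        kids = sorted(children[parent])
        kept = [k for k in kids if k in extracted]
        whole_set = len(kept) == len(kids)
        if parent in extracted and whole_set and len(kids) >= 2:
            # Covering is still valid: every child of the parent is present.
            axioms.append(("DisjointUnion", parent, tuple(kept)))
        elif len(kept) >= 2:
            # Some siblings are missing, so only disjointness can be asserted.
            axioms.append(("DisjointClasses", None, tuple(kept)))
    return axioms


if __name__ == "__main__":
    # Toy IDs, not real taxa: 1 has children 2, 3, 4; 2 has children 5, 6.
    toy_children = {1: [2, 3, 4], 2: [5, 6]}
    for axiom in axioms_for_extract(toy_children, {1, 2, 3, 5, 6}):
        print(axiom)
```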

Chris Mungall

Nov 7, 2012, 3:59:36 PM
to Hilmar Lapp, Randi Vita, obo-ta...@lists.sourceforge.net, Bjoern Peters, James A. Overton

On Nov 7, 2012, at 12:34 PM, Hilmar Lapp wrote:

> On Nov 7, 2012, at 1:57 PM, Chris Mungall wrote:
>
>>  - potential impact on reasoning performance - ditto the above
>
> Disjointness axioms should actually improve reasoning performance, if anything, shouldn't they? They'd cut out some of the "satisfiable nonsense" (quoting a Google engineer who explained this quite nicely).

Potentially, but I would expect this to depend on the particular reasoner and the algorithm and heuristics it applies.

>>  - biological correctness - I think disjointness is a reasonable assumption, at least if we exclude some of the curious non-organism environmental sample branches etc, but others should comment here
>
> Are you thinking about hybrids here?

Not specifically - good point though, although I don't think hybrids should be represented using MI in a taxonomy.

One more prosaic scenario is that people may be creating their own intersections as a means of getting around issues with the NCBI taxonomy. For GO we create our own unions (which is inherently safer than making intersections, but you never know what people are doing).

> -hilmar
> -- 
> ===========================================================
> : Hilmar Lapp  -:- Durham, NC -:- informatics.nescent.org :
> ===========================================================




Hilmar Lapp

Nov 7, 2012, 3:34:22 PM
to Chris Mungall, Randi Vita, obo-ta...@lists.sourceforge.net, Bjoern Peters, James A. Overton
On Nov 7, 2012, at 1:57 PM, Chris Mungall wrote:

>  - potential impact on reasoning performance - ditto the above

Disjointness axioms should actually improve reasoning performance, if anything, shouldn't they? They'd cut out some of the "satisfiable nonsense" (quoting a Google engineer who explained this quite nicely).

>  - biological correctness - I think disjointness is a reasonable assumption, at least if we exclude some of the curious non-organism environmental sample branches etc, but others should comment here

Are you thinking about hybrids here?

James A. Overton

Nov 7, 2012, 5:25:32 PM
to Chris Mungall, obo-ta...@lists.sourceforge.net, Bjoern Peters, Randi Vita
Hi Chris,

I'm working with Bjoern on this.

Weekly releases would be great.

Our current use case has us extracting a subset of NCBI Taxonomy and then building a shallower tree that IEDB curators and users can navigate more easily than the full taxonomy. Our groupings are quite different from the GO extensions you linked to. We have some union classes, but there are also some intersections and complements that we'd like to use.

Keeping the disjoints in a separate file, at least initially as you suggested, makes good sense. I can see that MIREOTing disjointness axioms could be tricky. For our current purposes we can write code to add the disjointness assertions for our subset. But I'd very much like to know whether adding disjoint union axioms between all siblings (at least for any node above species, and maybe excluding some edge cases, as you suggested) is biologically correct.
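
For a subset like this, one way to add the disjointness assertions is to rebuild the tree induced by the extracted taxa and declare the siblings in that induced tree disjoint. The sketch below is a minimal illustration, assuming that sibling disjointness holds throughout the full taxonomy and that the hierarchy is a strict tree; `parent_of` and `subset` are hypothetical inputs, not an existing IEDB or OBO API. Under those assumptions, two extracted taxa that share the same nearest extracted ancestor can never be ancestor and descendant of each other, so they sit in different branches and are disjoint.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


def nearest_kept_ancestor(taxid: int,
                          parent_of: Dict[int, int],
                          subset: Set[int]) -> Optional[int]:
    """Walk up the full taxonomy until an extracted ancestor is found."""
    node = taxid
    while True:
        parent = parent_of.get(node)
        # A missing or self-referencing parent is treated as the root.
        if parent is None or parent == node:
            return None
        if parent in subset:
            return parent
        node = parent


def induced_sibling_groups(parent_of: Dict[int, int],
                           subset: Set[int]) -> List[List[int]]:
    """Group extracted taxa by nearest extracted ancestor.

    Each returned group (two or more taxa) can be asserted pairwise disjoint.
    """
    groups = defaultdict(list)
    for taxid in sorted(subset):
        groups[nearest_kept_ancestor(taxid, parent_of, subset)].append(taxid)
    return [members for members in groups.values() if len(members) >= 2]


if __name__ == "__main__":
    # Toy IDs, not real taxa: 1 is the root; 2 and 3 are intermediate nodes
    # that are dropped from the subset to make the tree shallower.
    toy_parent_of = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}
    print(induced_sibling_groups(toy_parent_of, {1, 4, 5, 6}))
```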

I'm happy to help out with the implementation.

James

Alan Ruttenberg

Nov 9, 2012, 7:49:34 PM
to Chris Mungall, James A. Overton, obo-ta...@lists.sourceforge.net, Bjoern Peters, Randi Vita

On Nov 7, 2012, at 1:57 PM, Chris Mungall <cjmu...@lbl.gov> wrote:

> Hi Bjoern,
>
> We're just transitioning the build to Jenkins so it will be on a regular release cycle. Is weekly fine? I'll report back later today.
>
> Regarding disjointness axioms - I would rather add these as a separate ontology at least at first. These could then be imported with the main ontology. Would this work for you? The reasons not to immediately add these:
>
>  - inflates the already large owl file - we should at least give people time to change their workflow to pull in a disjoint-free version
>  - potential impact on reasoning performance - ditto the above
>  - biological correctness - I think disjointness is a reasonable assumption, at least if we exclude some of the curious non-organism environmental sample branches etc, but others should comment here
>
> How about going all the way to disjoint union axioms between all siblings (i.e. making the taxonomy JEPD), at least for any node above species? This would be more controversial, but it is potentially a very useful assumption that gives you powerful entailments.

This would probably be incorrect in many cases - new species are found all the time, and even higher branches have changed. I'd be against any automatically applied (or applied without review) covering axioms.

> Is your primary use case MIREOTing, e.g. for OBI? In that case you would want to MIREOT any disjointness axioms, but you would necessarily lose disjoint union axioms unless you're MIREOTing in an entire sibling set. Not sure how OntoFox handles this. Also, the decision as to whether to distribute additional axioms as a separate ontology may affect how this could be used in OntoFox.

You can accomplish the disjointness in a distributed way by having a functional property and a hasValue axiom with a distinct value per species (the value could be the URI string, for example). This will make extraction and mirroring easier.
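
A minimal sketch of this pattern in OWL 2 functional-style syntax, generated from Python, might look like the following. It assumes a data property is used; the property IRI `http://example.org/taxon_key` is hypothetical, as are the helper names. Because the property is functional and each species class requires a different literal, no individual can belong to two of the classes, so they come out disjoint without any axiom that mentions more than one class. The pattern should only be applied to classes that are not subclasses of one another (species, in this sketch), since a subclass would inherit a conflicting value and become unsatisfiable.

```python
from typing import Iterable, List

OBO = "http://purl.obolibrary.org/obo/"
# Hypothetical property IRI, introduced only for this illustration.
KEY_PROPERTY = "<http://example.org/taxon_key>"


def has_value_axioms(species_ids: Iterable[int]) -> List[str]:
    """Emit the functional-property declaration plus one hasValue axiom per species."""
    axioms = [
        f"Declaration(DataProperty({KEY_PROPERTY}))",
        f"FunctionalDataProperty({KEY_PROPERTY})",
    ]
    for taxid in sorted(set(species_ids)):
        iri = f"{OBO}NCBITaxon_{taxid}"
        # The class IRI itself serves as the distinct literal value.
        axioms.append(
            f'SubClassOf(<{iri}> DataHasValue({KEY_PROPERTY} "{iri}"))'
        )
    return axioms


if __name__ == "__main__":
    # Homo sapiens (9606) and Pan troglodytes (9598).
    for axiom in has_value_axioms([9606, 9598]):
        print(axiom)
```

Each SubClassOf axiom travels with its own class, which is what makes extracting a subset straightforward: whatever classes are pulled out stay mutually disjoint as long as the functional property declaration comes along.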

-Alan

Chris Mungall

Nov 9, 2012, 8:22:02 PM
to Alan Ruttenberg, James A. Overton, obo-ta...@lists.sourceforge.net, Bjoern Peters, Randi Vita
On Nov 9, 2012, at 4:49 PM, Alan Ruttenberg wrote:

> On Nov 7, 2012, at 1:57 PM, Chris Mungall <cjmu...@lbl.gov> wrote:
>
>> Hi Bjoern,
>>
>> We're just transitioning the build to Jenkins so it will be on a regular release cycle. Is weekly fine? I'll report back later today.
>>
>> Regarding disjointness axioms - I would rather add these as a separate ontology at least at first. These could then be imported with the main ontology. Would this work for you? The reasons not to immediately add these:
>>
>>  - inflates the already large owl file - we should at least give people time to change their workflow to pull in a disjoint-free version
>>  - potential impact on reasoning performance - ditto the above
>>  - biological correctness - I think disjointness is a reasonable assumption, at least if we exclude some of the curious non-organism environmental sample branches etc, but others should comment here
>>
>> How about going all the way to disjoint union axioms between all siblings (i.e. making the taxonomy JEPD), at least for any node above species? This would be more controversial, but it is potentially a very useful assumption that gives you powerful entailments.
>
> This would probably be incorrect in many cases - new species are found all the time, and even higher branches have changed. I'd be against any automatically applied (or applied without review) covering axioms.

Yes, I think it would be a mistake to include these covering axioms in the main ontology. As an optional adjunct, though, they could be useful: for many bioinformatics purposes you can assume all species are known, and you're already assuming the taxonomy is correct.

>> Is your primary use case MIREOTing, e.g. for OBI? In that case you would want to MIREOT any disjointness axioms, but you would necessarily lose disjoint union axioms unless you're MIREOTing in an entire sibling set. Not sure how OntoFox handles this. Also, the decision as to whether to distribute additional axioms as a separate ontology may affect how this could be used in OntoFox.
>
> You can accomplish the disjointness in a distributed way by having a functional property and a hasValue axiom with a distinct value per species (the value could be the URI string, for example). This will make extraction and mirroring easier.

Neat idea - but I personally prefer to keep things obvious and free of artificial individuals. YMMV. Also, this DL pattern is less useful for those of us who love http://purl.obolibrary.org/obo/NCBITaxon_9860

Alan Ruttenberg

Nov 9, 2012, 9:02:48 PM
to Chris Mungall, James A. Overton, obo-ta...@lists.sourceforge.net, Bjoern Peters, Randi Vita
I do think there is a class of such things - ontology documents with assertions that can be put to computational use. If we choose to include them, we should document their purposes well enough that people don't mistake them for something they are not.

-Alan

Chris Mungall

Nov 9, 2012, 9:38:04 PM
to Bjoern Peters, Randi Vita, obo-ta...@lists.sourceforge.net, James A. Overton

Update: obo/ncbitaxon.owl now reflects the contents of the source NCBI taxonomy as of Nov 7.

James is taking a look at replacing the old Perl code that's part of the pipeline - this will allow us to run a weekly job on the OBO Jenkins server to keep the OWL file up to date.

On Nov 6, 2012, at 2:44 PM, Bjoern Peters wrote:
