[Obo-taxonomy] Resubmission of taxonomic rank ontology


Peter Midford

Jan 5, 2010, 5:14:33 PM
to obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net
Happy New Year everyone,
Most of you will remember that I submitted an ontology for taxonomic ranks a while ago. That submission generated a lot of interesting discussion about the implementation of taxonomic ontologies, but did not yield a candidate rank ontology. I still think it would be desirable to have such rank terms included in OBO in some fashion, other than the current approach of adding a second tree of rank terms to each ontology of taxonomic terms. I am therefore submitting a revised and updated vocabulary of taxonomic ranks for your consideration.

This submission reflects two major changes. First, it is strictly a vocabulary of ranks, without any interpretation of rank ordering or of whether ranks are simply tags or something else. This allows each taxonomic ontology to use these terms as it deems appropriate, and hopefully avoids the metaphysical issues that were raised with the previous submission. Second, in addition to the rank terms used in the NCBI taxonomy, it incorporates terms used by TDWG's vocabulary of taxonomic ranks - http://rs.tdwg.org/ontology/voc/TaxonRank. I have asked and been assured that the message on the TDWG page is correct and the URIs for these concepts will remain stable. In addition to adding TDWG rank terms not appearing in the NCBI taxonomy, I have added cross-references to each term that appears in the TDWG vocabulary. I think it would be premature to add these URIs as alternate ids, but I'm open to being persuaded on this.

For those concerned about NCBI compatibility, all NCBI rank terms are included and cross-referenced with NCBI prefixes. Further, I would point out that in the two cases of synonymy where NCBI uses the Latin name (forma, varietas) and TDWG the anglicized name (form, variety), I have retained the NCBI usage as the term name.
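
As a rough illustration of the naming and cross-referencing policy described above, the following Python sketch prints what one such term might look like as an OBO stanza. The identifier RANK:0000001 and both xref values are placeholders invented for this example; they are not taken from the attached file.

```python
def format_term(term_id, name, synonyms=(), xrefs=()):
    """Render one rank term as an OBO 1.2 style [Term] stanza."""
    lines = ["[Term]", f"id: {term_id}", f"name: {name}"]
    # The anglicized TDWG name is kept as a synonym of the Latin NCBI name.
    lines += [f'synonym: "{s}" EXACT []' for s in synonyms]
    # One xref per source vocabulary that uses the rank.
    lines += [f"xref: {x}" for x in xrefs]
    return "\n".join(lines)


# All identifiers below are hypothetical placeholders.
stanza = format_term(
    "RANK:0000001",
    "varietas",
    synonyms=["variety"],
    xrefs=["NCBITaxon:varietas", "TDWG:Variety"],
)
print(stanza)
```

The term name follows NCBI usage and the TDWG spelling survives as a synonym, so a client searching by either label would still find the term.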

By adding the TDWG terms, we gain interoperability with TDWG's (and hence GBIF's) usage, perhaps the first step towards a bridge between OBO and the Biodiversity community. This interoperability is an increasingly important issue for those projects that collect phenotypic data across multiple taxa, as much of the relevant data will be accessible through resources made available through the Biodiversity community.


Cheers,

Peter


taxonomic_rank.obo

Peter Midford

Jan 6, 2010, 10:55:00 AM
to willy.v...@orionbiosciences.com, obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net
Hi Willy,
In this case I'd like to just follow NCBI's lead on this, but it looks to me like NCBI isn't providing rank specification for the Baltimore categories (or numerous other terms). Nor do the corresponding terms in the OBO rendering of the NCBI taxonomy specify ranks. Likewise, TDWG's rank vocabulary doesn't seem to include anything specific to viruses either. Therefore, since this is a vocabulary of ranks (not a taxonomy) I see no way to proceed on adding rank terms for these levels until NCBI or another authority (there seem to be several for viral taxonomy) proposes suitable rank terms.

Cheers,

Peter
On Jan 5, 2010, at 18:05, Willy Valdivia-Granda wrote:

> Hi Peter,
>
> The taxonomy ontology will be an important step for several
> applications. However, I was wondering how the issue of "strain" is
> handled? In the case of viruses, NCBI uses the Baltimore classification
> as well as standard taxonomical ranks associated with specific taxon_ids.
> In both cases, NCBI assigns parent and child taxonomy ids and
> increasingly taxonomy ids at the strain level. Are you considering this
> issue in the taxonomy ontology?
>
> Best regards,
>
> Willy Valdivia

>> Peter E. Midford
>> Phenoscape Ontology Curator
>>
>> ------------------------------------------------------------------------------
>> This SF.Net email is sponsored by the Verizon Developer Community
>> Take advantage of Verizon's best-in-class app development support
>> A streamlined, 14 day to market process makes app distribution fast and easy
>> Join now and get one step closer to millions of Verizon customers
>> http://p.sf.net/sfu/verizon-dev2dev
>> _______________________________________________
>> Obo-discuss mailing list
>> Obo-d...@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/obo-discuss

Peter E. Midford
Mesquite Developer
Peter....@gmail.com

_______________________________________________
Obo-taxonomy mailing list
Obo-ta...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/obo-taxonomy

Chris Mungall

Jan 6, 2010, 12:24:33 PM
to obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net

Hi Peter

Is there a stable URL for the obo file? If so I will register it on
the OBO registry as a first step.

> <taxonomic_rank.obo>



Jonathan Rees

Jan 6, 2010, 12:38:50 PM
to Peter Midford, obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net
Hi Peter,

The point about TDWG and GBIF interoperation with OBO is dear to my
heart, and this does seem like a step in the right direction, so
thanks for persisting. I worry a bit about foundry principle 6 "The
ontologies include textual definitions for all terms." As there is no
objective way to determine whether any given taxon is a class (order,
family, genus, ...) or not, the Foundry ideal seems impossible. On the
other hand it does not seem to be necessary in this particular case,
and no one has said that this vocabulary has to participate in the
Foundry.

But I did a quick check anyhow for definitions and/or citations. The
TDWG list seems good, i.e. much more informative than a simple list,
so those xrefs look pretty happy. I couldn't follow my nose to
NCBITaxon:xxx definitions; not to say it's not possible but I just
didn't know where to look (not apparent from a quick scan of
taxonomy.dat). And there is one term in your list, species_complex,
that has no cross-reference of any kind, so does not even meet TDWG's
bar.

I suppose a pointer to taxonomy.dat is a huge usage example, but I'd
be happier with independent pointers to influential literature
(treatises, monographs, revisions, survey articles, whatever).

Is TDWG committed to helping maintain this list? In addition to, or
instead of, its own?

Is contained_in transitive? I'm thinking there might be risk to
clients of the "ontology" when a new rank is inserted between existing
ones. Is the expectation that one would compute the transitive
closure, and program against that, as opposed to against the asserted
contained_in relationships?

Just for my own edification I'd like to see examples of how the terms
would be used both in and out of OBO. E.g. ideally how would you like
to see amphibian_taxonomy or the other taxonomy .obo files changed to
make use of the controlled ranks list?

Best
Jonathan


Arlin Stoltzfus

Jan 6, 2010, 2:36:20 PM
to Jonathan Rees, obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net
On Jan 6, 2010, at 12:38 PM, Jonathan Rees wrote:

> I suppose a pointer to taxonomy.dat is a huge usage example, but I'd
> be happier with independent pointers to influential literature
> (treatises, monographs, revisions, survey articles, whatever).

The TDWG vocabulary for taxonomic ranks (http://rs.tdwg.org/ontology/voc/TaxonRank
) has reference information citing scholarly monographs as well as
individual articles and online information resources. This info could
be transferred to Peter's ontology since it is based on the TDWG
concepts. Were you volunteering to do that :-?

> Is contained_in transitive? I'm thinking there might be risk to
> clients of the "ontology" when a new rank is inserted between existing
> ones. Is the expectation that one would compute the transitive
> closure, and program against that, as opposed to against the asserted
> contained_in relationships?

I'm not sure that I understand what the issue is here. Let's suppose
that our ranking has R1 contains R2 contains R3 contains R4 . . . and
we want to insert a new rank R2.5 between R2 and R3. The definition
of ranks allows that a taxonomy need not have every rank, so it's
possible for taxonomies to have R2 and R3 but not R2.5.

Inserting a new rank only breaks the chain of transitive closure if we
fail to assert both of the relevant contained_in relationships,
right? For instance if we only assert R2.5 contained_in R2, then we
could reason R2.5 contained_in R1, but we would miss R4 contained_in
R2.5. If the relevant contained_in relationships are asserted
whenever a new rank is inserted, what is the problem? Or do I have
this all wrong?

Arlin
-------
Arlin Stoltzfus (stol...@umbi.umd.edu)
Fellow, CARB; Adj. Assoc. Prof., UMBI; Research Biologist, NIST
CARB, 9600 Gudelsky Drive, Rockville, MD
tel: 240 314 6208; web: www.molevol.org

Jonathan Rees

Jan 6, 2010, 2:49:35 PM
to Arlin Stoltzfus, obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net
On Wed, Jan 6, 2010 at 2:36 PM, Arlin Stoltzfus <stol...@umbi.umd.edu> wrote:
> On Jan 6, 2010, at 12:38 PM, Jonathan Rees wrote:
>
>> I suppose a pointer to taxonomy.dat is a huge usage example, but I'd
>> be happier with independent pointers to influential literature
>> (treatises, monographs, revisions, survey articles, whatever).
>
> The TDWG vocabulary for taxonomic ranks
> (http://rs.tdwg.org/ontology/voc/TaxonRank) has reference information citing

> scholarly monographs as well as individual articles and online information
> resources.  This info could be transferred to Peter's ontology since it is
> based on the TDWG concepts. Were you volunteering to do that :-?

I was referring in this instance to the NCBI set, not to the TDWG set.
If the NCBI rank set is a subset of the TDWG rank set then there is
no problem. Is it?

There's no reason to duplicate information that's in a cited document,
as long as the cited document is likely to be well-behaved for a long
time, and TDWG has committed to that, I think.

>> Is contained_in transitive? I'm thinking there might be risk to
>> clients of the "ontology" when a new rank is inserted between existing
>> ones. Is the expectation that one would compute the transitive
>> closure, and program against that, as opposed to against the asserted
>> contained_in relationships?
>
> I'm not sure that I understand what the issue is here.  Let's suppose that
> our ranking has R1 contains R2 contains R3 contains R4 . . . and we want to
> insert a new rank R2.5 between R2 and R3.  The definition of ranks allows
> that a taxonomy need not have every rank, so it's possible for taxonomies to
> have R2 and R3 but not R2.5.
>
> Inserting a new rank only breaks the chain of transitive closure if we fail
> to assert both of the relevant contained_in relationships, right?  For
> instance if we only assert R2.5 contained_in R2, then we could reason R2.5
> contained_in R1, but we would miss R4 contained_in R2.5.  If the relevant
> contained_in relationships are asserted whenever a new rank is inserted,
> what is the problem?  Or do I have this all wrong?

You understand perfectly. I think you are answering my question of
whether contained_in is transitive in the affirmative, and my question
of whether applications should reason using only the transitive
closure of the asserted contained_in also in the affirmative. The risk
would be only to a program that assumed R3 asserted to be contained_in
R2, and did not do transitive closure, since after the insertion of
R2.5, R3 would no longer be asserted to be contained_in R2. The short
answer is: Don't do that.
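
The difference between asserted and inferred contained_in links can be sketched in Python as follows. This is only an illustration under the assumptions of the R1 to R4 example above: the rank names are the hypothetical ones from that example, and the closure routine is a plain fixed-point loop, not anything taken from an OBO tool.

```python
def transitive_closure(asserted):
    """Return every (lower, higher) pair implied by the asserted links."""
    closure = set(asserted)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a contained_in b and b contained_in d implies a contained_in d
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Asserted links before the insertion: each rank is contained_in the one above.
before = {("R2", "R1"), ("R3", "R2"), ("R4", "R3")}

# R2.5 inserted with both new links asserted; R3 -> R2 is no longer asserted.
after = {("R2", "R1"), ("R2.5", "R2"), ("R3", "R2.5"), ("R4", "R3")}

# R2.5 inserted with only the upward link asserted (the incomplete case).
incomplete = before | {("R2.5", "R2")}

print(("R3", "R2") in after)                              # asserted link is gone
print(("R3", "R2") in transitive_closure(after))          # but still inferable
print(("R4", "R2.5") in transitive_closure(incomplete))   # missed when a link is omitted
```

A client that looks up ("R3", "R2") in the asserted set breaks after the insertion, while one that queries the closure keeps working as long as both new links were asserted.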

Peter Midford

Jan 6, 2010, 3:02:08 PM
to Jonathan Rees, obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net
Hi Jonathan,
Thanks for the comments. The contained_in links that were unintentionally left in the ontology have been removed, as was my intent - this is meant as a pure vocabulary without any imposed ordering of ranks. I've attached a corrected version. You are also right about species complex not having any cross ref - that term was suggested by Michael Ashburner in support of the taxonomic ontology for flies that is listed as an OBO candidate. I've reviewed that ontology and it appears that species_complex might be best treated as a synonym for species_subgroup, but I think that's a question for a fly taxonomist, so I'll leave it for now - it can always be formally obsoleted if this isn't easy to resolve.

I agree that there don't seem to be any definitions of ranks or taxa in the NCBI taxonomy - the only definitions in the OBO file are taxonomic_rank itself and the has_rank relation. The TDWG definitions aren't exactly up to Foundry standards, but coming up with a useful 'genus differentia' definition for a rank would either be so trivial as to be meaningless (if we treat ranks as tags) or reintroduce the metaphysical issues that caused problems with the last submission (including the rank ordering issue).

I've announced this ontology to TDWG-content, so the relevant people (e.g., Roger Hyam) know about it, but I expect keeping it up to date will fall to me or someone else getting update messages once they have finished their move to google-code. I suppose updating this vocabulary from TDWG could be automated as well after their changes have settled, but I'm not volunteering to write that script.

As for use examples, I can best speak for TTO, where we will remove the rank vocabulary that is currently included in the ontology and update each use of has_rank to point to the corresponding term in the rank ontology. It looks like the amphibian taxonomy is now set up in a similar fashion, with a tree of rank terms, so they could use the rank ontology as well (though I notice they include a few unfamiliar rank terms).
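
A minimal Python sketch of that kind of update is given below. It assumes, purely for illustration, that each taxon stanza carries its rank on a line of the form "property_value: has_rank <rank id>"; the sample stanza, the TTO:genus id and the RANK:genus id are all invented placeholders, not the real identifiers.

```python
def remap_has_rank(obo_text, rank_map):
    """Point each has_rank line at the shared rank vocabulary.

    rank_map maps an ontology's local rank id to the shared rank id.
    Lines with unmapped ranks, and all other lines, pass through unchanged.
    """
    out = []
    for line in obo_text.splitlines():
        parts = line.split()
        is_rank_line = (
            len(parts) == 3
            and parts[0] == "property_value:"
            and parts[1] == "has_rank"
        )
        if is_rank_line:
            parts[2] = rank_map.get(parts[2], parts[2])
            line = " ".join(parts)
        out.append(line)
    return "\n".join(out)


# Hypothetical taxon stanza using a rank term local to the taxonomy ontology.
sample = """[Term]
id: TTO:1000001
name: Examplegenus
property_value: has_rank TTO:genus"""

# Hypothetical mapping from local rank ids to shared rank ids.
rank_map = {"TTO:genus": "RANK:genus"}

updated = remap_has_rank(sample, rank_map)
print(updated)
```

Once every has_rank line points at the shared vocabulary, the local tree of rank terms can be dropped from the taxonomy file.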


Thanks for looking this over,

Peter


taxonomic_rank.obo

Peter Midford

Jan 6, 2010, 4:02:15 PM
to Arlin Stoltzfus, obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net
Arlin,
I see Jonathan has already responded to your comments, but as I just responded to him, the contained_in relation was left from a previous submission and was intended to be removed. The problem as I see it isn't one of transitivity, which we could handle as Jonathan outlined, but more one of domain and range and that gets back to whether taxa and ranks are individuals or classes. As terms they would be, by default, classes, but OBO relations are supposed to be defined for classes in terms of relations between individuals (see the obo relations paper, and Schutz et al. 2008 makes some points relevant to this as well). The has_rank relation that the OBO version of NCBI taxonomy, the TTO, and the ATO already use is somewhat problematic in this way, but Chris Mungall intended it to be used for annotation rather than reasoning, so these ontologies continue to use it.

Also note that different codes of nomenclature use different terms at the same relative level that may or may not be synonymous. Thus the structure we would be building with contained_in would be a lattice or at least a tree, rather than a linear chain.
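
To make the "not a linear chain" point concrete, here is a small Python sketch. The contained_in links in it are illustrative assumptions only (phylum and division are placed side by side as if they came from different codes); it takes no position on whether those two ranks are synonymous.

```python
def is_below(rank, other, contained_in):
    """True if `rank` reaches `other` by following contained_in links upward."""
    stack, seen = [rank], set()
    while stack:
        current = stack.pop()
        for parent in contained_in.get(current, ()):
            if parent == other:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def comparable(a, b, contained_in):
    """True if the two ranks are ordered relative to each other."""
    return a == b or is_below(a, b, contained_in) or is_below(b, a, contained_in)


# Illustrative links only: two ranks at the same relative level.
contained_in = {
    "phylum": ["kingdom"],
    "division": ["kingdom"],
    "class": ["phylum", "division"],
}

print(comparable("class", "kingdom", contained_in))    # ordered
print(comparable("phylum", "division", contained_in))  # neither is above the other
```

In a linear chain every pair of ranks would be comparable; here phylum and division are not, which is why any ordering built this way ends up as a branching structure.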

That said, I won't mind having an ordering, but two years after I started looking at defining one, I'll leave it for someone else.

Peter

Peter E. Midford
Mesquite Developer
Peter....@gmail.com


Michael Ashburner

Jan 6, 2010, 4:40:30 PM
to obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net, Michael Ashburner
Peter

Species complex and species subgroup are not used synonymously by
drosophila taxonomists.

the hierarchy is

species group
species subgroup
species complex

Michael

> <taxonomic_rank.obo>


Peter Midford

Jan 6, 2010, 5:01:39 PM
to Michael Ashburner, obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net
Michael,
Thanks, I'll leave the term as it is then.

Peter

Ward Blondé

Jan 7, 2010, 8:55:54 AM
to obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net, Michael Ashburner

Dear OBO people,

I wonder whether there is any policy for the OBO-Foundry to distinguish
classes from meta-classes. As the discussion about taxonomy shows,
scientists want to speak about 'concepts' like species, family, group,
set, class, etc., which have classes as instances, and are therefore
meta-classes. E.g. what is the formal relation between 'lion' and 'species'?

lion instance_of species??

Is there anything in the syntax of OBOF to see when a 'Term' is in fact
a meta-class?

thanks,
Ward

Michael Ashburner wrote:
> Peter
>
> Species complex and species subgroup are not used synonymously by
> drosophila taxonomists.
>
> the hierarchy is
>
> species group
> species subgroup
> species complex
>
>


Peter Midford

Jan 7, 2010, 11:26:15 AM
to Ward Blondé, obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net
Hi Ward,
Although a few people favor the use of meta-classes, several taxonomy ontologies (NCBI, TTO, ATO) are treating them as 'annotations' or metadata - the has_rank property is not intended to be reasoned with, therefore the ranks are not considered meta-classes. This is one reason I pulled the contained_in relation from the rank vocabulary - so that people would be less tempted to think of these as meta-classes.

cheers,

Peter

Peter E. Midford
Phenoscape Ontology Curator
Peter....@gmail.com
