[Obo-taxonomy] Proposal for a rank ontology (vocabulary)


Peter E. Midford

May 28, 2009, 5:16:21 PM
to obo-d...@lists.sourceforge.net, Phenoscape Project, obo-ta...@lists.sourceforge.net
Hello,
After over a year of discussing alternative approaches to
representing taxonomic ranks, I am submitting a proposal for an
ontology of taxonomic ranks that closely mirrors the way these are
implemented in the NCBI and Teleost taxonomy ontologies. This
ontology is a break-out of the rank terms that appear in these
taxonomy ontologies, allowing both ontologies, and others, to have a
single root for their terms.

As many of you have heard my proposals for representing taxa as
individuals, this design may come as a surprise. There are both
pragmatic and philosophical reasons for this choice. Pragmatically,
the current OBO tools (e.g., OBO-Edit) are not set up to treat
instances as first-class entities. Although I have discussed OBO-Edit
add-ons for converting taxonomies between instances and terms, storing
taxonomies as hierarchies of individuals would complicate use or reuse
by other projects. As several other projects have used the existing
TTO and NCBI taxonomy ontologies as models, changing TTO would
potentially split the community. Philosophically, I have come to see
that taxonomy ontologies are better considered as collections of
entities called 'taxon concepts' - information entities that appear in
publications - rather than clades. If taxonomy ontologies are about
published entities rather than clades, then most of the arguments for
the terms being metaphysical individuals rather than classes (e.g.,
those of Ghiselin) no longer apply.

The attached obo file contains a set of taxonomic rank terms that
includes the ranks from the NCBI taxonomy as well as a few additional
ones I picked up from reviewing the Zoological and Botanical codes.
I'll admit to a bit of zoological chauvinism in assigning phylum as a
primary term and division as a synonym, but as the plant ontology
people seem to have their own system for coding ranks in their
ontology, this preference for zoological terms reflects the majority
of the use community for this ontology. I have also removed any
ordering relation as having one seems to cause more trouble than it is
worth.

My apologies for having held this so long. I hope we can quickly
resolve this issue, either with this set or an alternative.

Thanks,

Peter

taxonomic_rank.obo
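
For readers without the attachment, the layout described in this message can be sketched roughly as follows. This is only an illustration and not the contents of taxonomic_rank.obo: the EXAMPLE_RANK identifier prefix, the numbering, the choice of ranks, and the helper names are all assumptions. It emits OBO-format [Term] stanzas in which phylum is the primary name, division is an exact synonym, every rank hangs directly under a single root, and no ordering relation between ranks is recorded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RankTerm:
    """One rank term. Identifiers are hypothetical, not those in the attachment."""
    term_id: str
    name: str
    synonyms: List[str] = field(default_factory=list)
    parent_id: Optional[str] = None


def to_obo_stanza(term: RankTerm) -> str:
    """Serialize a rank term as an OBO [Term] stanza."""
    lines = ["[Term]", f"id: {term.term_id}", f"name: {term.name}"]
    for synonym in term.synonyms:
        lines.append(f'synonym: "{synonym}" EXACT []')
    if term.parent_id is not None:
        lines.append(f"is_a: {term.parent_id}")
    return "\n".join(lines)


# A single shared root, with each rank placed directly beneath it.
ROOT = RankTerm("EXAMPLE_RANK:0000000", "taxonomic_rank")
RANKS = [
    ROOT,
    RankTerm("EXAMPLE_RANK:0000001", "phylum", ["division"], ROOT.term_id),
    RankTerm("EXAMPLE_RANK:0000002", "family", [], ROOT.term_id),
    RankTerm("EXAMPLE_RANK:0000003", "genus", [], ROOT.term_id),
]

if __name__ == "__main__":
    print("\n\n".join(to_obo_stanza(term) for term in RANKS))
```

Because the ranks are siblings under one root, nothing in the file says that family sits below phylum; any such ordering would have to come from the taxonomy that uses the terms.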

Jonathan Rees

Jun 4, 2009, 6:54:21 PM
to Peter E. Midford, Phenoscape Project, obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net
On Thu, May 28, 2009 at 5:16 PM, Peter E. Midford
<petere...@yahoo.com> wrote:
> Hello,
> After over a year of discussing alternative approaches to
> representing taxonomic ranks, I am submitting a proposal for an ontology of
> taxonomic ranks that closely mirrors the way these are implemented in the
> NCBI and Teleost taxonomy ontologies. This ontology is a break-out of the
> rank terms that appear in these taxonomy ontologies, allowing both
> ontologies, and others, to have a single root for their terms.

Maybe this is off-topic, but what if one publication defines a taxon
as being of one rank, and another defines it as being of another rank?
Are family Cicindelidae and subfamily Cicindelinae, or order
Hemiptera and suborder Hemiptera, the same taxon?

It's OK if a taxon has no rank, right? (E.g. most of the interior
nodes on most phylogenies?) So you don't need to have an exhaustive
list.

> As many of you have heard my proposals for representing taxa as individuals,
> this design may come as a surprise. There are both pragmatic and
> philosophical reasons for this choice. Pragmatically, the current OBO tools
> (e.g., OBO-Edit) are not set up to treat instances as first-class entities.
> Although I have discussed OBO-Edit add-ons for converting taxonomies
> between instances and terms, storing taxonomies as hierarchies of
> individuals would complicate use or reuse by other projects. As several
> other projects have used the existing TTO and NCBI taxonomy ontologies as
> models, changing TTO would potentially split the community.
> Philosophically, I have come to see that taxonomy ontologies are better
> considered as collections of entities called 'taxon concepts' - information
> entities that appear in publications - rather than clades.

This makes sense to me given how much people argue about phylogeny.
One should define a taxon, and then separately argue over what's in it
or whether it's monophyletic.

However I urge you to avoid "concepts" and to distinguish the
publication from the taxon. The publication may define the taxon, but
it isn't the taxon, and neither is a concept. A taxon could easily be
a class of things (not sure what, species? organisms?), while a
publication is definitely not a class.

And I agree with Barry's question - if ranks are classes, what are
their members? Taxa?

> If taxonomy
> ontologies are about published entities rather than clades, then most of the
> arguments for the terms being metaphysical individuals rather than classes
> (e.g., those of Ghiselin) no longer apply.

Reference for the uninitiated?

> The attached obo file contains a set of taxonomic rank terms that includes
> the ranks from the NCBI taxonomy as well as a few additional ones I picked
> up from reviewing the Zoological and Botanical codes. I'll admit to a bit
> of zoological chauvinism in assigning phylum as a primary term and division
> as a synonym, but as the plant ontology people seem to have their own system
> for coding ranks in their ontology, this preference for zoological terms
> reflects the majority of the use community for this ontology. I have also
> removed any ordering relation as having one seems to cause more trouble than
> it is worth.

What is the practical impact of this list of ranks? What are the risks
of getting it wrong? I guess I'm sort of in the rank-free camp so I'm
not sure why this matters.

Sorry to trip into the middle of this; I don't know your use cases or
requirements.

Best
Jonathan


Hilmar Lapp

Jun 4, 2009, 9:01:12 PM
to Jonathan Rees, Phenoscape Project, Peter E. Midford, obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net

On Jun 4, 2009, at 6:54 PM, Jonathan Rees wrote:

> On Thu, May 28, 2009 at 5:16 PM, Peter E. Midford
> <petere...@yahoo.com> wrote:
>> Hello,
>> After over a year of discussing alternative approaches to
>> representing taxonomic ranks, I am submitting a proposal for an
>> ontology of taxonomic ranks that closely mirrors the way these are
>> implemented in the NCBI and Teleost taxonomy ontologies. This
>> ontology is a break-out of the rank terms that appear in these
>> taxonomy ontologies, allowing both ontologies, and others, to have
>> a single root for their terms.
>
> Maybe this is off-topic, but what if one publication defines a taxon
> as being of one rank, and another defines it as being of another rank?
> Are family Cicindelidae and subfamily Cicindelinae, or order
> Hemiptera and suborder Hemiptera, the same taxon?

Wouldn't they be differently named then? (The rank should be implicit
from the suffix, though I think there are different conventions for
plants, animals, and bacteria.)
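
To illustrate the suffix point, here is a rough sketch of reading a rank off a name's ending. The suffix tables are a small subset of the standard endings as I understand the zoological and botanical codes, bacterial names are left out, and the function and table names are made up for this example. It is only a heuristic: many genus names happen to end in "-ina", and zoological names above the family group (orders, for instance) carry no regulated suffix at all.

```python
from typing import Dict, Optional

# Family-group endings used for animal names (subset; heuristic only).
ZOOLOGICAL_SUFFIXES: Dict[str, str] = {
    "oidea": "superfamily",
    "idae": "family",
    "inae": "subfamily",
    "ini": "tribe",
    "ina": "subtribe",
}

# Endings used for plant names (subset; heuristic only).
BOTANICAL_SUFFIXES: Dict[str, str] = {
    "ales": "order",
    "ineae": "suborder",
    "aceae": "family",
    "oideae": "subfamily",
    "eae": "tribe",
    "inae": "subtribe",
}


def infer_rank(name: str, suffixes: Dict[str, str]) -> Optional[str]:
    """Guess a rank from the name's ending, or return None if no suffix matches."""
    lowered = name.lower()
    # Try longer suffixes first so that "aceae" wins over "eae".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if lowered.endswith(suffix):
            return suffixes[suffix]
    return None


if __name__ == "__main__":
    for taxon in ("Cicindelidae", "Cicindelinae", "Hemiptera"):
        print(taxon, infer_rank(taxon, ZOOLOGICAL_SUFFIXES))
    print("Rosaceae", infer_rank("Rosaceae", BOTANICAL_SUFFIXES))
```

Note that the same ending can mean different ranks under different codes: "-inae" marks a subfamily for animals but a subtribe for plants, so the applicable code has to be known before the suffix says anything.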

> It's OK if a taxon has no rank, right? (E.g. most of the interior
> nodes on most phylogenies?)

Some of the interior nodes in the NCBI taxonomy don't have a rank I
believe, but those are typically the ones marked "not for display."
Note also that taxonomies are not phylogenies, but classifications. As
such, names are typically assigned to interior nodes.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================

Wacek Kusnierczyk

Jun 5, 2009, 4:20:52 AM
to obo-d...@lists.sourceforge.net, Phenoscape Project, Peter E. Midford, obo-ta...@lists.sourceforge.net
Jonathan Rees wrote:

[...]


> However I urge you to avoid "concepts" and to distinguish the
> publication from the taxon. The publication may define the taxon, but
> it isn't the taxon, and neither is a concept. A taxon could easily be
> a class of things (not sure what, species? organisms?), while a
> publication is definitely not a class.
>
> And I agree with Barry's question - if ranks are classes, what are
> their members? Taxa?
>

[...]

I have been loosely and passively following this discussion for a while,
and may have glossed over the details, but it seems to me that the best
approach is to start with having a look at some of the relevant
literature. The issue of what taxa and ranks are is by no means new,
and is more complex than the discussion here may seem to imply, and you
may and should want to avoid reinventing the wheel.

Unfortunately, I am no expert in this area, and have no complete
overview of the literature, but the following few publications [1-3]
seem relevant, even if they focus on species as a particular kind
(class, type, sort, have your pick) of taxa.

The problem of species in the context of formal ontology has been
(briefly and somewhat naively) considered by, e.g., Guarino and Welty [4].

You can surely find more discussion about these issues by following the
references given there.

Regards,
Wacek

[1] Ghiselin. Species concepts: the basis for controversy and
reconciliation. Fish and Fisheries 3:151 2002
[2] Mayden. On biological species, species concepts and individuation
in the natural world. Fish and Fisheries 3:171 2002
[3] Mayden. A hierarchy of species concepts: the denouement in the
saga of the species problem. In Claridge, Dawah, and Wilson, eds.
Species: the units of biodiversity. Chapman & Hall 1997, pp. 381+
[4] Guarino and Welty. Evaluating ontological decisions with
OntoClean. Comm. of the ACM 45:61+ 2002

Jonathan Rees

Jun 5, 2009, 1:01:33 PM
to Wacek Kusnierczyk, Phenoscape Project, Peter E. Midford, obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net
Thanks for the references. I'm sure Peter is an expert on them
already, but I'm not. On the other hand it appears to me that one goal
might be to attempt to sidestep completely the question of what a
species (or any other rank) is and leave the definitions up to
publications. This seems like a very good idea if the goal is data
integration.

My questions were just simple mechanical ones, I think. If OBO terms
name classes, and a rank (such as genus) is an OBO term, and classes
have members, then ranks have members, so what are they? Taxa, I
presume (e.g. the genus Pan, whatever that is, is an individual of
class 'genus'), but I don't know. (That doesn't automatically rule out
Pan also being a class itself.) If I'm wrong and ranks are entities of
an undetermined nature, then this question needn't be answered.
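
Purely as an analogy for the mechanical question, and not a claim about how OBO or any reasoner treats it: some programming languages let one object be both a member of a class and a class in its own right. The sketch below uses a Python metaclass for this; the names Rank, Genus, Pan, and an_organism are illustrative only.

```python
class Rank(type):
    """A metaclass: its instances are themselves classes (here, taxa)."""


class Genus(Rank):
    """The rank 'genus', modeled as a class whose members are taxa."""


class Pan(metaclass=Genus):
    """The taxon Pan: a member of Genus, and also a class of organisms."""


an_organism = Pan()  # a hypothetical individual chimpanzee

print(isinstance(Pan, Genus))          # True: Pan is a member of 'genus'
print(isinstance(an_organism, Pan))    # True: the organism is a member of Pan
print(isinstance(an_organism, Genus))  # False: the organism is not itself a genus
```

The last line is the point of the exercise: membership does not pass through, so saying "Pan is a genus" does not make any chimpanzee a genus.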

We've been asked to review something and I'm not sure what the review
criteria are (i.e. what risk might be associated with my saying "looks
great, go ahead"). My apologies for coming into the middle of this
conversation without proper background, but I am curious.

Jonathan

Peter E. Midford

Jun 5, 2009, 2:30:39 PM
to Jonathan Rees, Wacek Kusnierczyk, Phenoscape Project, obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net
Jonathan,
Sorry for the confusion; let me try to provide some more context.
This proposal is less than it seems - currently
OBO taxonomy ontologies (in particular the NCBI and teleost
taxonomies) contain two term trees: the actual taxonomy and a set of
rank terms. The proposal is simply to break out a set of rank terms
that NCBI, TTO and the several taxonomy ontologies in the pipeline to
joining the OBO foundry (amphibia, hymenoptera, maybe others) can share.

That being said, this discussion has exposed the issues raised by the
representation of ranks in the NCBI taxonomy, and the associated
metadata relation has_rank. As Chris Mungall, who designed the
approach in the first place, indicated:

> - ranks are not types in the sense used by BFO, RO etc. They are
> just convenient terms that are used to indicate depth in a taxonomy
> - has_rank is simply a way of associating a class in a taxonomy with
> its rank
> - the scheme is deliberately ontologically and logically weak, like,
> for example, GO slims.
>

Ranks are identifiers for relative levels in the hierarchy. You could
build sets of taxa that share a rank, but I don't think such sets
would be good candidates for classes. Treating them as simple
identifiers suggests individuals. However, the current OBO toolset
doesn't support individuals very well, as I discovered when I tried to
implement an OBO-format taxonomy which represented taxa as
individuals. Although the OBO file format supports INSTANCE stanzas,
I don't know what future plans, if any, there are to support them
beyond just being able to round-trip them through obo-tools. They are
currently invisible to an OBO-Edit user. However, the current scheme
for ranks and taxa plays pretty well with the OBO toolset.
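
As a rough sketch of the scheme described above (two separate term trees, with has_rank as a plain metadata link from a taxon to a rank identifier), something like the following captures the idea. The data structures and function names are assumptions made for this illustration, the handful of fish taxa is only sample data, and none of it reflects how the NCBI or TTO files are actually stored.

```python
from typing import Dict, Optional, Set

# Tree 1: rank identifiers. They are just labels; no ordering is recorded.
RANKS: Set[str] = {"order", "family", "genus", "species"}

# Tree 2: the taxonomy. Each taxon has a parent (is_a) and an optional has_rank.
TAXA: Dict[str, Dict[str, Optional[str]]] = {
    "Otophysi": {"is_a": None, "has_rank": None},  # an unranked interior node
    "Cypriniformes": {"is_a": "Otophysi", "has_rank": "order"},
    "Cyprinidae": {"is_a": "Cypriniformes", "has_rank": "family"},
    "Danio": {"is_a": "Cyprinidae", "has_rank": "genus"},
    "Danio rerio": {"is_a": "Danio", "has_rank": "species"},
}


def rank_of(taxon: str) -> Optional[str]:
    """Look up the rank identifier attached to a taxon, if any."""
    return TAXA[taxon]["has_rank"]


def taxa_with_rank(rank: str) -> Set[str]:
    """Collect the taxa that share a rank: a convenient set, not a class."""
    return {name for name, entry in TAXA.items() if entry["has_rank"] == rank}


def undeclared_ranks() -> Set[str]:
    """Return any has_rank values that are missing from the rank list."""
    used = {entry["has_rank"] for entry in TAXA.values()}
    return {rank for rank in used if rank is not None and rank not in RANKS}


if __name__ == "__main__":
    print(rank_of("Danio"))
    print(rank_of("Otophysi"))
    print(taxa_with_rank("family"))
    print(undeclared_ranks())
```

Keeping the rank list separate is what lets several taxonomies point at the same identifiers, and a taxon with no has_rank value is simply left unranked.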

There are other models (e.g., TDWG's taxon concept schema and ontology
format) for serializing taxonomy information. I am in the process of
extending the tool I use to construct the TTO to generate other
formats as well. I expect we will visit this issue again, especially
as multi-species OBO projects reach out to corresponding portions of
the biodiversity community.

I agree that "taxon concept" carries some baggage as a term. It seems
to be used either as the pairing of a name with publication
identifier, or as the taxonomic construct (either a species with a
type or a set of smaller taxonomic groupings) that is associated with
the name in the publication. Depending on the author's opinion of
monophyly, they would be claiming such a set to be either a
metaphysical class or a clade. These are the entities I am claiming
are the referents for terms in a taxonomic ontology. This works
well with our annotation process since we are annotating to the
appearance of a name in a publication, regardless of whether we agree
with the author's judgement as to whether their entity is a proper
class or clade.

TTO terms identify the publication, either by specifying it explicitly
in the name, or implicitly through references to entries in the Catalog
of Fishes database. The same applies to names in the NCBI ontology.

If we want to use explicitly phylogenetic reasoning, we can map these
names onto a phylogeny. Otherwise, traditional inheritance seems to
still have its place, especially in groups where information is
sparse, making character optimization dubious.

Hope this helps a bit,

Peter


--
Peter Midford
Mesquite Developer

Richard Pyle

Jun 6, 2009, 1:10:35 PM
to Peter E. Midford, Jonathan Rees, Wacek Kusnierczyk, Phenoscape Project, obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net

Forgive me for commenting, as I have not been following this thread
completely, and I'm not entirely sure I fully understand the context.
However, I did want to make one comment/clarification.

While I certainly agree with this statement:

> I agree that "taxon concept" carries some baggage as a term.

I'm not so sure this is right:

> It seems to be used either as the pairing of a name with
> publication identifier,

I am one of many who advocate using a pairing of a Name with a publication
identifier as a convenient handle/pointer to something that contains the
definition of a taxon concept, but not *as* a taxon concept per se. Sort of
like how a Citation to a publication points one to a publication, but is
not, in itself, a publication.

In the emerging Global Names Architecture, we refer to such pairings of
names & publication identifiers as "TaxonNameUsage" instances. These
instances represent convenient "anchorpoints" to both nomenclatural acts and
to taxon concept definitions (regardless of whether those definitions are
explicit or implicit within the contents represented by the TaxonNameUsage
instance itself).

Whether or not an identifier assigned to a TaxonNameUsage instance can
itself be used as an identifier to a TaxonConcept is debatable. I suspect
it is reasonable to assume 1:1 parity between the two entities in all cases,
but I don't know if such parity means that they are the "same thing".
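
To make the distinction a little more concrete, here is a minimal sketch of a name/publication pairing used as a handle that points to a taxon concept without being one. The class and field names are chosen for this illustration only and are not the Global Names Architecture data model; the publication identifier and the concept text are placeholders.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TaxonNameUsage:
    """A name as it appears in one publication: a pointer, not a definition."""
    name: str
    publication_id: str  # placeholder identifier, hypothetical


@dataclass(frozen=True)
class TaxonConcept:
    """What the publication takes the name to cover (its circumscription)."""
    circumscription: str


# One concept per usage in this sketch, mirroring the 1:1 assumption above.
concept_for_usage: Dict[TaxonNameUsage, TaxonConcept] = {}

usage = TaxonNameUsage(name="Danio rerio", publication_id="example-pub-1")
concept = TaxonConcept(circumscription="placeholder circumscription text")
concept_for_usage[usage] = concept

if __name__ == "__main__":
    # The usage resolves to the concept, yet the two remain distinct objects.
    print(concept_for_usage[usage] is concept)
    print(usage == concept)
```

Even with a strict 1:1 mapping, the usage and the concept stay separate kinds of thing here, which is the same relationship a citation has to the publication it cites.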

I'm not sure if that makes sense, or whether I'm using appropriate terms.

> or as the taxonomic construct (either
> a species with a type or a set of smaller taxonomic
> groupings) that is associated with the name in the
> publication.

I think this is how most people would use the term "taxon concept". I
further think it's likely the case that a Taxon Concept is seen as a set,
with the "type" representing only an attribute of the Concept that links it
to a Taxon Name (and vice versa).

This all may very well be irrelevant to the discussion at hand, in which
case I apologize.

Aloha,
Rich

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deep...@bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html

Peter E. Midford

Jun 6, 2009, 3:29:20 PM
to Richard Pyle, Wacek Kusnierczyk, Phenoscape Project, obo-d...@lists.sourceforge.net, obo-ta...@lists.sourceforge.net
Richard,
Not irrelevant at all. I am happy to see you and Roger joining this
discussion, as building bridges between OBO Foundry taxonomy
ontologies and TDWG and other interested parties in the diversity
community is very much on the agenda. This rank ontology is just a
small first step. I am aware that TDWG (and others) are more focused
on data exchange, while OBO Foundry is more focused on logical
correctness and supporting logically correct reasoning - this isn't
going to happen overnight.

And thank you for clearing that confusion up; I think the assumption
of a 1:1 pairing led me to the confusion here.

Cheers,

Peter

--
Peter Midford
Phenoscape Taxonomy Curator
