entry: Phenoscape Knowledgebase


Jim Balhoff

Jun 19, 2011, 2:39:26 PM
to chal...@ievobio.org, Wasila Dahdul, Hilmar Lapp, Paula Mabee, Peter Midford, Todd Vision, Monte Westerfield
Hi,

I would like to enter the following abstract into the iEvoBio data integration challenge:

The Phenoscape Knowledgebase
Authors: James Balhoff, Wasila Dahdul, Hilmar Lapp, Paula Mabee, Peter Midford, Todd Vision, and Monte Westerfield
URL: http://kb.phenoscape.org/

Phenotypic differences among species have long been systematically itemized and described by biologists in the process of investigating phylogenetic relationships and trait evolution. Traditionally, these descriptions have been expressed in natural language within the context of individual journal publications or monographs. As such, this rich store of phenotype data has been largely unavailable for statistical and computational comparisons across studies or integration with other biological knowledge. We have created the Phenoscape Knowledgebase, which consists of a database and web application (http://kb.phenoscape.org/). The database combines ontologically annotated phenotypic character data for a large and diverse group of fishes with phenotypic annotations from the ZFIN model organism database. The web application provides query and browsing interfaces that allow users to exploit the logical framework provided by the ontologies that underpin the data.

We used OBD ("Ontology-based Database") to store phenotypic data from ~50 phylogenetic publications as statements using terms from ten different OBO ontologies. The phenotypic data, taxa, and specimens in these published data sets were annotated with ontology terms using our curation application, Phenex. In this process, free-text phenotype descriptions were converted to semantic representations using an Entity-Quality (EQ) model, combining terms from separate anatomical and qualitative ontologies. The ontologies and annotated data sets, along with EQ phenotype annotations for zebrafish genes exported from the ZFIN database, were loaded into OBD using its own triple-based schema. We used the SQL-based OBD reasoner to pre-compute inferred statements and add them to the Knowledgebase.
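
As a rough illustration of what pre-computing inferred statements from EQ annotations can look like, here is a minimal Python sketch. It is not OBD's actual schema or reasoner: the term names, the is_a edges, and the function names are all hypothetical, and the only inference shown is propagation up is_a hierarchies.

```python
# Hypothetical is_a edges (child -> list of parents); not actual ontology content.
IS_A = {
    "dorsal fin": ["fin"],
    "fin": ["anatomical structure"],
    "decreased size": ["size"],
}


def ancestors(term, is_a):
    """Return the term itself plus every term reachable by following is_a edges."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer(asserted, is_a):
    """Expand each asserted (taxon, entity, quality) statement into every
    statement implied by the entity and quality hierarchies."""
    inferred = set()
    for taxon, entity, quality in asserted:
        for e in ancestors(entity, is_a):
            for q in ancestors(quality, is_a):
                inferred.add((taxon, e, q))
    return inferred


# One asserted EQ statement for a hypothetical taxon.
asserted = {("Taxon A", "dorsal fin", "decreased size")}
statements = infer(asserted, IS_A)
```

Storing the expanded set alongside the asserted statements trades space for query speed: a later search for "fin" with quality "size" becomes a direct lookup rather than a hierarchy traversal at query time.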

We developed a web services API providing access to the Knowledgebase using the Restlet Java framework. We also developed a Ruby on Rails-based end-user web interface, which allows biologists to query the Knowledgebase, accessing the data via these public web services.

The Phenoscape Knowledgebase integrates over 500,000 asserted phenotype statements, concerning ~2500 fish species, with over 20,000 phenotype statements linked to over 3700 zebrafish genes. Users can discover fish species matching arbitrary phenotypic profiles, which can be expressed as queries making use of the hierarchical nature of anatomical, qualitative, and taxonomic ontologies. Moreover, genes influencing these phenotypes can be returned simultaneously. At the same time, users can visualize the structure and explore term definitions of the included ontologies. The Knowledgebase has been used to investigate patterns of anatomical coverage within published phylogenetic characters, as well as to generate hypotheses for candidate genes underlying evolutionary losses of both scales and skeletal elements.
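
To make the idea of a hierarchy-aware query concrete, the following Python sketch matches a general query (entity "fin", quality "size") against more specific annotations and returns taxa and genes together. It is an assumption-laden toy, not the Knowledgebase's query interface: the annotations, species and gene names, and the single-parent is_a map are all hypothetical.

```python
# Hypothetical child -> parent is_a links; real ontologies allow multiple parents.
IS_A = {
    "dorsal fin": "fin",
    "pectoral fin": "fin",
    "decreased size": "size",
}

# Hypothetical annotations: (kind, subject, entity, quality).
ANNOTATIONS = [
    ("taxon", "Species A", "dorsal fin", "decreased size"),
    ("taxon", "Species B", "pectoral fin", "decreased size"),
    ("taxon", "Species C", "scale", "decreased size"),
    ("gene", "gene1", "pectoral fin", "decreased size"),
]


def is_subsumed(term, query_term, is_a):
    """True if term equals query_term or reaches it by following is_a links."""
    while term is not None:
        if term == query_term:
            return True
        term = is_a.get(term)
    return False


def search(entity, quality, annotations, is_a):
    """Collect taxa and genes whose annotations fall under the queried terms."""
    hits = {"taxon": set(), "gene": set()}
    for kind, subject, e, q in annotations:
        if is_subsumed(e, entity, is_a) and is_subsumed(q, quality, is_a):
            hits[kind].add(subject)
    return hits


results = search("fin", "size", ANNOTATIONS, IS_A)
```

In this toy data set, the query for any fin with any size quality picks up the two species annotated with specific fins plus the one gene, and skips the species annotated only for scales.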

Ontological annotations of free-text phenotypic data, built with shared community-driven ontologies, constitute a powerful resource when aggregated within a database system which makes full use of the semantic framework provided by those ontologies. For the first time, scientists can search phenotypic content from dozens of phylogenetic publications, querying across anatomical, qualitative, and taxonomic axes.

____________________________________________
James P. Balhoff, Ph.D.
National Evolutionary Synthesis Center
2024 West Main St., Suite A200
Durham, NC 27705
USA
