phylotastic tnrs use-case

Arlin Stoltzfus

Nov 21, 2012, 5:14:40 PM

to wg-...@googlegroups.com

Regarding the phylotastic use-case for taxonomic name resolution, Matt thought that the attached slide was very helpful. The text below provides explanation. Method A is to run GNRD on the PDF of a scientific article, then use the output for a phylotastic tree query (which extracts the corresponding sub-tree from a mammal supertree, based on any matching binomials). You can watch this method being performed at www.youtube.com/watch?v=uCIKsDuhQnA .

Arlin

(This slide is explained in the screencast www.youtube.com/watch?v=uCIKsDuhQnA)

Here are the results of 4 methods to get a species list to match the phylogeny of 40 mammals in Riek, 2011:

(A) extract 43 binomials automatically from the PDF using GNRD (www.gnrd.org);
(B) copy & paste 40 binomials from the main data table, Table 1;
(C) use manual keyboard entry to get a list of 40 binomials from the tree image in Figure 1;
(D) manually reconcile names to match the source tree, using an expert reading of Riek, 2011, a local copy of the source tree from Bininda-Emonds, and interactive searches of online taxonomy resources.

All 4 lists are different. Method B gives a tree with 38 species (it adds Arctocephalus gazella and Phoca vitulina, and lacks Papio cynocephalus, Felis catus, Oreamnos americanus, and Ovis ammon). The data table includes some spelling errors that are not in the text or the tree image. Why doesn’t method C work perfectly, given that the tree in Figure 1 and the tree being queried both come out of the same source tree (Bininda-Emonds et al.)? Apparently the published tree was edited, because it seems to have some names (F. catus, S. suricata, C. manticola, P. cynocephalus) that aren’t in the source tree.
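To make the "adds / lacks" comparison concrete, here is a minimal sketch of how two species lists could be compared with set differences. The function name and the toy lists are made up for illustration; they are not the actual lists from Riek, 2011, and this is not how the slide was produced.

```python
def compare_name_lists(reference, candidate):
    """Return (added, missing): names only in candidate, names only in reference."""
    ref = set(reference)
    cand = set(candidate)
    return sorted(cand - ref), sorted(ref - cand)


# Toy lists for illustration only (hypothetical, not the real method outputs).
list_x = ["Felis catus", "Ovis ammon", "Papio cynocephalus"]
list_y = ["Ovis ammon", "Phoca vitulina"]

added, missing = compare_name_lists(list_x, list_y)
print("adds:", added)
print("lacks:", missing)
```

Exact string comparison like this treats a misspelled binomial as a different species, which is one reason the lists from the four methods can disagree.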

The role of the TNRS component, ultimately, is to provide a method that is as fast and convenient as method A, but as accurate as method D. Currently, method D takes hours or days, depending on your skills in discovering and using online taxonomy resources. For some people, this would be so daunting as to constitute a major barrier.
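As a rough illustration of what an automated reconciliation step might look like, the sketch below matches query names against the names in a source tree, falling back to approximate string matching for misspellings. This is only a stand-in using Python's standard difflib module, not the actual TNRS; the function name, the cutoff value, the toy name lists, and the misspelling "Ovis amon" are all assumptions made for the example.

```python
import difflib


def resolve_names(queries, tree_names, cutoff=0.85):
    """Map each query name to a name in tree_names, or None if nothing is close.

    Exact matches are kept as-is; otherwise the single closest name above
    the similarity cutoff (an arbitrary, assumed threshold) is used.
    """
    resolved = {}
    for query in queries:
        if query in tree_names:
            resolved[query] = query
            continue
        matches = difflib.get_close_matches(query, tree_names, n=1, cutoff=cutoff)
        resolved[query] = matches[0] if matches else None
    return resolved


# Hypothetical inputs: a few names from a source tree, and a query list
# containing one invented misspelling and one name absent from the tree.
tree_names = ["Phoca vitulina", "Ovis ammon", "Oreamnos americanus"]
queries = ["Phoca vitulina", "Ovis amon", "Felis catus"]

resolved = resolve_names(queries, tree_names)
for query, match in resolved.items():
    print(query, "->", match)
```

Spelling-based matching only handles typos. Names that are absent from the source tree because of synonymy or editing (as with the names in Figure 1) would still need a real taxonomic resource, which is the part of method D that takes the time.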

-------
Arlin Stoltzfus (ar...@umd.edu)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD, 20850
tel: 240 314 6208; web: www.molevol.org