Regarding the phylotastic use case for taxonomic name resolution, Matt thought that the attached slide was very helpful. The text below explains it. Method A is to run GNRD on the PDF of a scientific article, then use the output for a phylotastic tree query (which extracts the corresponding sub-tree from a mammal supertree, based on any matching binomials). You can watch this method being performed at
www.youtube.com/watch?v=uCIKsDuhQnA .
Arlin
(this slide is explained in the screencast www.youtube.com/watch?v=uCIKsDuhQnA)

Here are the results of 4 methods to get a species list to match the phylogeny of 40 mammals in Riek, 2011:
(A) extract 43 binomials automatically from the PDF using GNRD (www.gnrd.org);
(B) copy & paste 40 binomials from the main data table, Table 1;
(C) use manual keyboard entry to get a list of 40 binomials from the tree image in Figure 1;
(D) manually reconcile names to match the source tree, using an expert reading of Riek, 2011, a local copy of the source tree from Bininda-Emonds, and interactive searches of online taxonomy resources.
All 4 lists are different. Method B gives a tree with 38 species (it adds Arctocephalus gazella and Phoca vitulina, and lacks Papio cynocephalus, Felis catus, Oreamnos americanus, and Ovis ammon). The data table includes some spelling errors that are not in the text or the tree image. Why doesn’t method C work perfectly, since the Figure 1 tree and the phylotastic sub-tree both come out of the same source tree (Bininda-Emonds et al.)? Apparently the Figure 1 tree was edited, because it seems to have some names (F. catus, S. suricata, C. manticola, P. cynocephalus) that aren’t in the source tree.
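The "adds / lacks" comparison between two species lists can be sketched as a set difference. This is only a minimal illustration in Python: the function names are made up for this example, and the two short lists are toy inputs built from names mentioned above, not the actual output of any of the 4 methods.

```python
def normalize(name):
    """Collapse whitespace and standardize case: 'felis  CATUS' -> 'Felis catus'."""
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def compare_lists(candidate, reference):
    """Return (added, missing): names only in candidate, names only in reference."""
    cand = {normalize(n) for n in candidate}
    ref = {normalize(n) for n in reference}
    return sorted(cand - ref), sorted(ref - cand)


# Toy lists for illustration only (NOT the real method outputs).
reference = ["Felis catus", "Papio cynocephalus", "Ovis ammon"]
candidate = ["ovis  ammon", "Phoca vitulina", "Arctocephalus gazella"]

added, missing = compare_lists(candidate, reference)
print("adds: ", added)    # ['Arctocephalus gazella', 'Phoca vitulina']
print("lacks:", missing)  # ['Felis catus', 'Papio cynocephalus']
```

Normalizing before comparing avoids counting differences in case or spacing as different names. It does nothing for misspellings or synonyms.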
The role of the TNRS component, ultimately, is to provide a method that is as fast and convenient as method A, but as accurate as method D. Currently, method D takes hours or days, depending on your skills in discovering and using online taxonomy resources. For some people, this would be so daunting as to constitute a major barrier.
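To give a rough feel for the kind of step a name-resolution component could automate, here is a minimal sketch that matches input names against the names in a tree, first exactly and then by approximate string matching with Python's standard difflib module. This is an assumption-laden illustration, not a description of how the TNRS works: the function name, the cutoff value, the toy tree list, and the misspelling "Ovis amon" are all hypothetical.

```python
import difflib


def reconcile(names, tree_names, cutoff=0.85):
    """Map each input name to (matched_tree_name, status).

    status is 'exact', 'fuzzy', or 'unmatched'. The cutoff is an
    arbitrary illustrative threshold, not a recommended setting.
    """
    results = {}
    for name in names:
        if name in tree_names:
            results[name] = (name, "exact")
            continue
        # get_close_matches returns the best candidates at or above the cutoff
        close = difflib.get_close_matches(name, tree_names, n=1, cutoff=cutoff)
        if close:
            results[name] = (close[0], "fuzzy")
        else:
            results[name] = (None, "unmatched")
    return results


# Toy data for illustration only; 'Ovis amon' is a made-up misspelling.
tree_names = ["Felis silvestris", "Ovis ammon", "Phoca vitulina"]
queries = ["Phoca vitulina", "Ovis amon", "Felis catus"]

for query, (match, status) in reconcile(queries, tree_names).items():
    print(f"{query!r} -> {match!r} ({status})")
```

Fuzzy matching of this kind can catch spelling errors, but it cannot resolve a name that was replaced by a synonym or edited to a different taxon. That still requires taxonomy resources, which is the slow part of method D.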