A common problem with data sharing in phylogenetics is that OTU names
do not match between files, e.g., between the alignment and the tree
from the same study. I believe Bill mentioned that this comes up often
in TreeBASE submissions. I have encountered it many times and have
thought about how to design software to deal with it.
After discussing this with Vivek, I wrote up a more formal description
of the problem, which is available here (sorry about the pptx format):
http://dl.dropbox.com/u/7727158/name_matching.pptx
This includes real examples of mismatched names collected in the wild,
an explanation of why the problem occurs, mock-ups of interactive
user sessions, and implementation notes. Vivek has already started
experimenting with some of the concepts and put an app on appspot (the
link is in the presentation).
Comments are welcome. If implemented as described, how well would
this tool serve the community need for name-matching? What would make
it better?
Arlin
-------
Arlin Stoltzfus (ar...@umd.edu)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD
tel: 240 314 6208; web: www.molevol.org
--
You received this message because you are subscribed to the Google
Groups "MIAPA" group.
For more options, visit this group at
http://groups.google.com/group/miapa-discuss?hl=en
Indeed.
> C- The basic data model of matrix-rows-matching-with-tree-OTUs works for 99%
> of datasets, but a growing number of studies use BEAST species inference
> (and other similar methods) where the tree ends in species OTUs, but the
> alignment has many more haplotype OTUs. -- i.e. there is, on purpose, a
> complete mismatch between alignment row labels and tree OTUs. Mesquite can
> handle this using a taxon association table, though I don't know that this
> is formal NEXUS or just a Mesquite invention. I don't think that NeXML or
> PhyloML can handle this. This calls for expanding the capabilities of NeXML
> and PhyloML.
Yes and no. Multiple matrix rows can reference the same otu, but
that's not quite what we want. Multiple, separately annotatable matrix
row segments would be a good feature to have, also for TreeBASE's
needs.
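
To make the association-table idea concrete, here is a minimal sketch
in Python. It is only an illustration, not how Mesquite, NeXML or
PhyloML actually represent this: the function names and the labels
(hapA1, sp_A, ...) are made up, and it assumes each matrix row maps to
at most one tree OTU.

```python
from collections import defaultdict


def rows_by_otu(association):
    """Invert a row-label -> OTU table into OTU -> sorted list of row labels."""
    grouped = defaultdict(list)
    for row_label, otu in association.items():
        grouped[otu].append(row_label)
    return {otu: sorted(rows) for otu, rows in grouped.items()}


def check_association(tree_otus, matrix_rows, association):
    """Return (tree OTUs with no matrix rows, matrix rows with no OTU)."""
    grouped = rows_by_otu(association)
    otus_without_rows = sorted(set(tree_otus) - set(grouped))
    unassigned_rows = sorted(set(matrix_rows) - set(association))
    return otus_without_rows, unassigned_rows


# Hypothetical example: several haplotype rows per species-level tree tip.
association = {"hapA1": "sp_A", "hapA2": "sp_A", "hapB1": "sp_B"}
missing, unassigned = check_association(
    tree_otus=["sp_A", "sp_B", "sp_C"],
    matrix_rows=["hapA1", "hapA2", "hapB1", "hapC1"],
    association=association,
)
print("OTUs without rows:", missing)
print("Rows without an OTU:", unassigned)
```

With an explicit table like this, the label mismatch between alignment
and tree is declared rather than accidental, so a validator can tell
the two cases apart.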
--
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading, RG6 6BX, United Kingdom
Tel: +44 (0) 118 378 7535
http://rutgervos.blogspot.com
------------------------------------------------------------------------------
uberSVN's rich system and user administration capabilities and model
configuration take the hassle out of deploying and managing Subversion and
the tools developers use with it. Learn more about uberSVN and get a free
download at: http://p.sf.net/sfu/wandisco-dev2dev
_______________________________________________
Treebase-devel mailing list
Treebas...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/treebase-devel
Mapping tree names to matrix names could be formulated as a bipartite matching problem, where we have two lists of names and want to find the best matching. See http://iphylo.blogspot.com/2007/09/matching-names-in-phylogeny-data-files.html for more details.
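
As a rough illustration of the bipartite formulation (a sketch, not
the method from the blog post): score every tree-name/matrix-name pair
with a string similarity, then pick the one-to-one pairing with the
highest total score. The normalization rule, the use of difflib's
ratio as the score, and the example names are all assumptions. The
search below is exhaustive, so it is only practical for a handful of
names; a real tool would use an assignment solver such as the
Hungarian algorithm.

```python
from difflib import SequenceMatcher
from itertools import permutations


def normalize(name):
    """Lowercase and treat underscores, dots and hyphens as spaces."""
    for ch in "_.-":
        name = name.replace(ch, " ")
    return " ".join(name.lower().split())


def similarity(a, b):
    """String similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_matching(tree_names, matrix_names):
    """Pair each tree name with a distinct matrix name, maximizing the
    summed similarity. Exhaustive search: small inputs only."""
    if len(tree_names) > len(matrix_names):
        raise ValueError("expected at least as many matrix names as tree names")
    scores = {(t, m): similarity(t, m) for t in tree_names for m in matrix_names}
    best_total, best_pairs = -1.0, []
    # Try every way of assigning distinct matrix names to the tree names.
    for perm in permutations(matrix_names, len(tree_names)):
        pairs = list(zip(tree_names, perm))
        total = sum(scores[pair] for pair in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_pairs


# Hypothetical example names, not taken from a real dataset.
tree_names = ["Homo_sapiens", "Pan_troglodytes", "Mus_musculus"]
matrix_names = ["M. musculus", "H. sapiens", "P. troglodytes"]
for tree_name, matrix_name in best_matching(tree_names, matrix_names):
    print(tree_name, "<->", matrix_name)
```

Low-scoring pairs in the result are the ones worth showing to the user
for confirmation rather than accepting automatically.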
Hi all,
I just noticed that Hilmar tweeted a link to Linnaeus:
http://linnaeus.sourceforge.net/ which seems relevant to this thread.
all the best,
Mark
On Aug 19, 2011, at 11:06 AM, Arlin Stoltzfus wrote: