Dear Dendrophiles,
I am trying to filter trees from several posterior distributions according to whether or not they match one of a set of reference trees. The trees in a given posterior distribution may contain fewer taxa than the reference trees, so before comparing I prune each reference tree down to the taxa present in that posterior sample. I then use the false_positives_and_negatives() tree member function to perform the comparisons. However, some of the values returned by this procedure seemed off: false_positives_and_negatives() returned (2,2) when comparing two trees that looked the same to me. To check, I printed both trees as Newick strings, read them back in, and repeated the comparison. This time I got the expected result of (0,0). I've probably made some simple error, but I can't figure out what it is.
Any help would be greatly appreciated. I'm using DendroPy version 3.11.0. Below, I've pasted an example script, an example .t file, and the output that the script produces.
Thanks,
Jeremy
Script:
#!/usr/bin/env python
import dendropy
# Define reference topology
refTree = dendropy.Tree.get_from_string("((galGal3,taeGut1),allMis0,(((hg19,ornAna1),chrPic0),(pytMol0,anoCar2)))",schema="newick")
# Read in posterior distribution (just one tree for this example)
postTrees = dendropy.TreeList(taxon_set=refTree.taxon_set)
postTrees.read_from_path("example.t","nexus",taxon_set=refTree.taxon_set,as_unrooted=True)
# Prune reference trees down to only taxa in posterior trees
for i in refTree.leaf_nodes():
    if not (postTrees[0].find_node_with_taxon_label(i.get_node_str())):
        refTree.prune_taxa_with_labels([i.get_node_str()],update_splits=True)
# Spit out newick trees for reference
print(postTrees[0].as_newick_string())
print(refTree.as_newick_string())
# Display trees to make sure reference tree has been pruned properly
postTrees[0].print_plot()
refTree.print_plot()
# Calculate false positives and negatives in posterior tree relative to reference
print(postTrees[0].false_positives_and_negatives(refTree))
# Convert both trees to strings, then read them back in
# Recalculate false positives and negatives in posterior tree relative to reference
testTree = dendropy.Tree.get_from_string(postTrees[0].as_newick_string(),schema="newick")
print(testTree.false_positives_and_negatives(dendropy.Tree.get_from_string(refTree.as_newick_string(),schema="newick",taxon_set=testTree.taxon_set)))
Tree file ("example.t"):
#NEXUS
[ID: 8249523985]
begin trees;
translate
1 hg19,
2 ornAna1,
3 galGal3,
4 taeGut1,
5 anoCar2,
6 chrPic0;
tree rep.3100 = (2:0.076762,(((3:0.058522,4:0.026272):0.095072,5:0.143657):0.008043,6:0.041739):0.155849,1:0.145005);
end;
Output:
(ornAna1:0.076762,(((galGal3:0.058522,taeGut1:0.026272):0.095072,anoCar2:0.143657):0.008043,chrPic0:0.041739):0.155849,hg19:0.145005)
((galGal3,taeGut1),((hg19,ornAna1),chrPic0),anoCar2)
/------------------------------------------------------------------------------- ornAna1
|
|                                                           /------------------- galGal3
|                                       /-------------------+
|                   /-------------------+                   \------------------- taeGut1
+                   |                   |
|-------------------+                   \--------------------------------------- anoCar2
|                   |
|                   \----------------------------------------------------------- chrPic0
|
\------------------------------------------------------------------------------- hg19

                                                     /-------------------------- galGal3
/----------------------------------------------------+
|                                                    \-------------------------- taeGut1
|
|                                                    /-------------------------- hg19
+                         /--------------------------+
|-------------------------+                          \-------------------------- ornAna1
|                         |
|                         \----------------------------------------------------- chrPic0
|
\------------------------------------------------------------------------------- anoCar2
(2, 2)
(0, 0)
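For comparison, here is a minimal DendroPy-free sketch that counts the splits in which the two Newick strings printed above differ. It is not DendroPy's implementation: the helper names (parse_newick, leaf_set, split_set, split_fp_fn) are hypothetical, and the parser assumes simple Newick with no internal node labels or quoted names. It compares the trees either as unrooted trees (bipartitions of the taxa) or as rooted trees (clades below the first node in the string).

```python
import re

def parse_newick(s):
    """Parse simple Newick into nested tuples of leaf labels.
    Assumes no internal node labels or quoted names; branch lengths are dropped."""
    s = re.sub(r":[0-9.eE+-]+", "", s.strip().rstrip(";"))
    pos = 0
    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]               # leaf label
    return node()

def leaf_set(t):
    """Return the set of leaf labels below a node."""
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaf_set(c) for c in t))

def split_set(tree, rooted=False):
    """Collect non-trivial clades (rooted) or bipartitions (unrooted)."""
    all_taxa = leaf_set(tree)
    anchor = min(all_taxa)                # fixes which side of a bipartition is stored
    found = set()
    def visit(t):
        clade = leaf_set(t)
        if rooted:
            if 1 < len(clade) < len(all_taxa):
                found.add(clade)
        else:
            side = all_taxa - clade if anchor in clade else clade
            if 1 < len(side) < len(all_taxa) - 1:
                found.add(side)
        if not isinstance(t, str):
            for child in t:
                visit(child)
    visit(tree)
    return found

def split_fp_fn(tree, reference, rooted=False):
    """Return (splits only in tree, splits only in reference)."""
    t, r = split_set(tree, rooted), split_set(reference, rooted)
    return len(t - r), len(r - t)

# The two Newick strings from the output above
post = parse_newick("(ornAna1:0.076762,(((galGal3:0.058522,taeGut1:0.026272):0.095072,"
                    "anoCar2:0.143657):0.008043,chrPic0:0.041739):0.155849,hg19:0.145005)")
ref = parse_newick("((galGal3,taeGut1),((hg19,ornAna1),chrPic0),anoCar2)")

print(split_fp_fn(post, ref))               # unrooted comparison
print(split_fp_fn(post, ref, rooted=True))  # rooted comparison
```

On these two trees the unrooted comparison gives (0, 0) and the rooted comparison gives (2, 2), the same pair of values as in the output above. That is only suggestive that a rooted/unrooted mismatch between the two trees may be involved; it says nothing definite about what DendroPy does internally.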
Jeremy M. Brown
Assistant Professor
Louisiana State University
Dept. of Biological Sciences
202 Life Sciences Building
Baton Rouge, LA 70803