1. When calculating the symmetric difference, DendroPy has to calculate
the splits/bipartition hashes.
2. When doing this on an unrooted tree, the basal (root) split has to
be collapsed for correct behavior.
3. This is counter to most folks' (initial) intuition, and many a
investigator has been thrown off by this, to the point of scornful
skepticism, bitter frustration, adamant denial, condescending criticism,
or righteous fury.
4. Emotional responses notwithstanding, these folks are wrong. Nothing
changes the fact that the collapse of the basal bifurcation on unrooted
trees is absolutely necessary to produce correct (and, more importantly,
arguably, predictable) results when comparing splits across trees. And
produce correct results it does.
5. The reason for this is briefly mentioned here (extracted from a series
of email responses):
***
Note that with unrooted trees, the basal bifurcation will be
collapsed. This is (briefly) mentioned here:
http://dendropy.org/primer/bipartitions.html . No real explanation is
given in the documentation, but the reason is that, on unrooted trees,
retaining the basal bifurcation results in an artifactually redundant
splits hash on the tree. The practical effect of this is that
rooted trees and unrooted trees cannot be predictably and meaningfully
compared.
Incidentally, the intuition behind this working is that, no matter where
the root is placed on a tree, the split immediately below that is
excluded from analysis. But instead of seeing this as different splits
being excluded from different trees depending on the rooting, we should
see it as an entire set of splits being excluded across all analysis,
and each unrooted tree has one and exactly one split belonging to that
set.
In other words, it is impossible for a split that has been excluded from
analysis due to the basal bifurcation being collapsed on any given
unrooted tree to be found on any other unrooted tree that has a
different pseudo-root placement, so in terms of support for or against
any particular tree, this excluded split cannot count in any way. And,
of course, if another tree has the same root placement, then it too will
have the same split excluded.
***
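To make the redundancy concrete, here is a minimal pure-Python sketch. It is not DendroPy code and makes no claim about how DendroPy stores its split hashes: the nested-tuple trees and the helper names (`internal_clades`, `edge_bipartitions`) are made up for illustration. It shows that on an unrooted tree the two edges leaving a bifurcating pseudo-root describe the same bipartition, and that two different pseudo-rootings of the same unrooted topology differ when treated as rooted but not when treated as unrooted.

```python
from collections import Counter

TAXA = frozenset("ABCDE")

def internal_clades(tree):
    """Leaf set below every internal node except the root, in post-order."""
    found = []
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves
    walk(tree)
    return found[:-1]  # the last entry is the root itself (all taxa)

def edge_bipartitions(tree):
    """One unordered bipartition per internal edge; single-taxon splits skipped."""
    return [frozenset([c, TAXA - c])
            for c in internal_clades(tree)
            if 1 < len(c) < len(TAXA) - 1]

# Two pseudo-rootings of the same unrooted topology (hypothetical encoding).
t1 = (("A", "B"), ("C", ("D", "E")))
t2 = ((("A", "B"), "C"), ("D", "E"))

# Both edges below the root of t1 give AB|CDE, so that split appears twice.
dup_counts = Counter(edge_bipartitions(t1))
print(max(dup_counts.values()))  # 2

# Read as rooted trees, the clades {C,D,E} and {A,B,C} are unshared.
rooted_diff = set(internal_clades(t1)) ^ set(internal_clades(t2))
print(len(rooted_diff))  # 2

# Read as unrooted trees, the bipartition sets are identical.
unrooted_diff = set(edge_bipartitions(t1)) ^ set(edge_bipartitions(t2))
print(len(unrooted_diff))  # 0
```

The duplicate entry is the "artifactually redundant" split: it carries no information beyond its twin, and keeping it would make per-edge counts depend on where the pseudo-root happens to sit.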
6. And here is a script that explores this:
~~~
#! /usr/bin/env python
import dendropy
import math
trees_str = """\
[&U] ((A,B),(C,(D,E)));
[&U] ((A,B),(C,(D,E)));
[&U] ((A,B),(C,(D,E)));
[&U] ((A,B),(C,(D,E)));
[&U] ((A,B),(C,(D,E)));
[&U] ((A,B),(C,(D,E)));
[&U] ((A,B),(C,(D,E)));
[&U] ((A,B),(C,(D,E)));
[&U] ((A,B),(C,(D,E)));
[&U] ((A,B),(C,(D,E)));
[&U] (((A,B),C),(D,E));
[&U] (((A,B),C),(D,E));
[&U] (((A,B),C),(D,E));
[&U] (((A,B),C),(D,E));
[&U] (((A,B),C),(D,E));
[&U] (((A,B),C),(D,E));
[&U] (((A,B),C),(D,E));
[&U] ((((A,B),C),D),E);
[&U] ((((A,B),C),D),E);
[&U] ((((A,B),C),D),E);
[&U] ((((A,B),C),D),E);
[&U] ((((A,B),C),D),E);
[&U] ((((A,B),C),E),D);
[&U] ((((A,B),C),E),D);
[&U] ((((A,B),C),E),D);
[&U] ((((A,B),C),E),D);
[&U] (A,(B,(C,(D,E))));
[&U] (A,(B,(C,(D,E))));
[&U] (A,(B,(C,(D,E))));
[&U] (A,(B,(C,(D,E))));
[&U] (B,(A,(C,(D,E))));
[&U] (B,(A,(C,(D,E))));
[&U] (B,(A,(C,(D,E))));
[&U] (B,(A,(C,(D,E))));
[&U] (B,(A,(C,(D,E))));
"""
def run(rooting):
    trees = dendropy.TreeList.get(
            data=trees_str,
            schema="newick",
            rooting=rooting)
    tree_array = trees.as_tree_array()
    log_split_supports, best_tree = tree_array.calculate_log_product_of_split_supports()
    for log_split_support in log_split_supports:
        print("{}".format(math.exp(log_split_support)))

print("-- Unrooted -- ")
run("force-unrooted")
print("-- Rooted -- ")
run("force-rooted")
~~~
with results:
~~~
-- Unrooted --
1.0
1.0
...
...
...
-- Rooted --
0.29956851312
0.29956851312
...
0.252268221574
0.252268221574
...
0.339591836735
0.339591836735
...
0.403265306122
0.403265306122
...
~~~
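The figures above can be reproduced by hand, which may help in seeing where they come from. The following is a pure-Python re-computation, not DendroPy code. It rests on two assumptions of mine: that a split's support is the fraction of the 35 trees containing it, and that splits with a single taxon on one side are left out of the product. Neither is taken from the DendroPy source; they are adopted only because they yield the same numbers. The nested tuples stand in for the six distinct topologies in trees_str, with their copy counts.

```python
from math import prod

TAXA = frozenset("ABCDE")

# (topology as nested tuples, number of copies in trees_str)
TOPOLOGIES = [
    ((("A", "B"), ("C", ("D", "E"))), 10),
    (((("A", "B"), "C"), ("D", "E")), 7),
    ((((("A", "B"), "C"), "D"), "E"), 5),
    ((((("A", "B"), "C"), "E"), "D"), 4),
    (("A", ("B", ("C", ("D", "E")))), 4),
    (("B", ("A", ("C", ("D", "E")))), 5),
]

def nontrivial_clades(tree):
    """Clades with at least two taxa on each side of the subtending edge."""
    found = set()
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        if 1 < len(leaves) < len(TAXA) - 1:
            found.add(leaves)
        return leaves
    walk(tree)
    return found

def as_unrooted(clade_set):
    """Turn each clade into an unordered bipartition; duplicates merge."""
    return {frozenset([c, TAXA - c]) for c in clade_set}

def support_products(rooted):
    """Product of split frequencies, one value per distinct topology."""
    per_tree = []
    for tree, copies in TOPOLOGIES:
        splits = nontrivial_clades(tree)
        per_tree.append((splits if rooted else as_unrooted(splits), copies))
    total = sum(copies for _, copies in per_tree)
    counts = {}
    for splits, copies in per_tree:
        for s in splits:
            counts[s] = counts.get(s, 0) + copies
    return [prod(counts[s] / total for s in splits) for splits, _ in per_tree]

unrooted_scores = support_products(rooted=False)  # 1.0 for every topology
rooted_scores = support_products(rooted=True)     # approx. 0.2996, 0.2523,
                                                  # 0.3396, 0.3396, 0.4033, 0.4033
print(unrooted_scores)
print(rooted_scores)
```

Under these assumptions every tree in the unrooted run carries the same two splits (AB|CDE and ABC|DE), so each has frequency 35/35 and the product is 1.0. In the rooted run the clades {C,D,E} and {A,B,C} are distinct and occur in only 19 and 16 of the 35 trees respectively, which is what pulls the products below 1.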
7. The moral of this story is that you always have to be aware of, and
if necessary manage, the rooting state of your trees. If you want rooted
trees from a Nexus/Newick source, use "rooting='force-rooted'" when
reading the trees or prefix your tree statements with "[&R]".
Or otherwise explicitly set the rooting state (``tree.is_rooted=True``
or ``tree.is_rooted=False``) BEFORE carrying out ANY calculations or
other operations that require the split hashes on the trees.
-- jeet
--------------------------------------
Jeet Sukumaran