(1) So, again, ``len(b.split_as_bitstring())'' is NOT reporting what you
think it is reporting. It is not reporting the size of the encoding, nor
the number of splits, nor the number of tips, nor the number of taxa, etc.
It is reporting the maximum size that the taxon namespace ever was,
which is really only of internal book-keeping interest. It is, in
fact, for most practical purposes, a pretty useless value.
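To make point (1) concrete, here is a toy sketch in plain Python. It does not use DendroPy, and the helpers `taxon_bitmask` and `as_bitstring` are hypothetical. It assumes that a split bitmask is rendered as a bitstring whose width tracks the size of the taxon namespace rather than anything about the tree itself, so a 4-tip tree in an 8-taxon namespace still yields an 8-character bitstring.

```python
# Toy model for illustration only; this is NOT DendroPy's implementation.
# Assumption: a split's bitstring is padded to the size of the taxon namespace.

namespace = ["A", "B", "C", "D", "E", "F", "G", "H"]  # 8 taxa in the namespace
tree_tips = ["A", "C", "E", "G"]                      # only 4 of them are on the tree


def taxon_bitmask(labels, namespace):
    """Set bit i for each label found at index i of the namespace."""
    mask = 0
    for label in labels:
        mask |= 1 << namespace.index(label)
    return mask


def as_bitstring(mask, width):
    """Render mask as a zero-padded binary string of the given width."""
    return format(mask, f"0{width}b")


leafset = taxon_bitmask(tree_tips, namespace)
bs = as_bitstring(leafset, len(namespace))
print(bs, len(bs))  # 01010101 8
```

The length comes out as 8 whether the tree has 4 tips or 8, which is why it tells you nothing about the bipartition encoding.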
(2) I am repeating this point because (a) it is the major source
of your confusion, and (b) I did state this in the previous email: the
length of the split representation as a bitstring is NOT the length of
the bipartition encoding.
(3) The number of elements in the bipartition encoding is exactly equal
to the number of splits plus 1. The number of splits for a fully
bifurcating unrooted tree is exactly 2N-3 (where N = number of tips on the
tree; for details, see, e.g., Felsenstein 2004). So the size of the
bipartition encoding is correct: for 5 taxa, it is 8; for 4 taxa, it is 6.
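The arithmetic in point (3) can be checked with a small plain-Python sketch. This is not DendroPy: `parse_newick`, `count_tips_and_nodes` and `encoding_size_report` are hypothetical helpers. It assumes a fully bifurcating unrooted tree written with a basal trifurcation and labels only (no branch lengths). Every non-root node contributes one edge (split), and one more element is added for the seed node's edge.

```python
# Toy sketch, independent of DendroPy; all helper names are hypothetical.
# Assumes a fully bifurcating unrooted tree written with a basal trifurcation,
# labels only (no branch lengths or comments).


def parse_newick(text):
    """Parse a minimal Newick string into nested lists (tips are strings)."""
    s = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while s[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
            while pos < len(s) and s[pos] not in ",)":
                pos += 1  # skip an internal-node label, if any
            return children
        start = pos
        while pos < len(s) and s[pos] not in ",)":
            pos += 1
        return s[start:pos]

    return parse_node()


def count_tips_and_nodes(node):
    """Return (number of tips, number of nodes) in the subtree rooted at node."""
    if isinstance(node, str):
        return 1, 1
    tips, nodes = 0, 1
    for child in node:
        t, n = count_tips_and_nodes(child)
        tips += t
        nodes += n
    return tips, nodes


def encoding_size_report(newick):
    """Return (tips, splits, encoding size) for an unrooted tree."""
    tips, nodes = count_tips_and_nodes(parse_newick(newick))
    splits = nodes - 1  # every non-root node has exactly one parent edge
    return tips, splits, splits + 1  # plus one for the seed node's edge


for nwk in ["(A,B,(C,(D,E)));", "(A,B,(C,D));"]:
    n, splits, size = encoding_size_report(nwk)
    assert splits == 2 * n - 3
    print(f"{n} tips: {splits} splits, encoding size {size}")
```

For the two example trees this prints 7 splits and a size of 8 for 5 tips, and 5 splits and a size of 6 for 4 tips, matching the figures above.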
(4) You can get the number of bipartitions on a tree by calling ``len()''
on the return value of ``encode_bipartitions()'':

    bipartitions = tree.encode_bipartitions()
    print(len(bipartitions))

or by looking up the dictionary attribute that is added to the tree when
encoding bipartitions:

    print(len(tree.bipartition_edge_map))
(4) For your purposes, there is no need to purge the taxon namespace.
The only reason to purge the taxon namespace is if you are writing a
NEXUS-format data file for a program that crashes, or otherwise
complains, unless every tree has the complete leaf set.
(5) Your stated goal can be achieved by something like the following
(pseudo-pseudo-code):

    # Both trees must share the same TaxonNamespace object
    assert tree1.taxon_namespace is tree2.taxon_namespace
    # Taxa actually present as leaves on each tree
    t1_taxa = set(nd.taxon for nd in tree1.leaf_node_iter())
    t2_taxa = set(nd.taxon for nd in tree2.leaf_node_iter())
    # Prune both trees down to the taxa they have in common
    to_keep = t1_taxa.intersection(t2_taxa)
    tree1.retain_taxa(to_keep)
    tree2.retain_taxa(to_keep)
    # Re-encode the bipartitions of the pruned trees
    bp1 = tree1.encode_bipartitions()
    bp2 = tree2.encode_bipartitions()
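For illustration only, the same prune-then-re-encode logic can be mimicked in plain Python on toy trees stored as nested tuples. Again, this is not DendroPy: `tip_set`, `prune`, `unroot` and `encoding_size` are hypothetical helpers, and the two trees are made up. `prune` stands in for `retain_taxa()`, and the encoding size is counted as one element per node, i.e. one per edge plus the seed node's edge. Whether DendroPy collapses a two-child root on an unrooted tree exactly the way `unroot` does is an assumption of this sketch.

```python
# Toy sketch, independent of DendroPy; helper names are hypothetical.
# Trees are nested tuples (tips are strings), unrooted, with a basal trifurcation.


def tip_set(node):
    """Return the set of tip labels in the subtree rooted at node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tip_set(child) for child in node))


def prune(node, keep):
    """Restrict node to the tips in keep, dropping empty subtrees and
    suppressing nodes left with a single child (unifurcations)."""
    if isinstance(node, str):
        return node if node in keep else None
    kids = [k for k in (prune(child, keep) for child in node) if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(kids)


def unroot(node):
    """Collapse a bifurcating root into a basal trifurcation (assumed convention)."""
    if isinstance(node, str) or len(node) != 2:
        return node
    a, b = node
    if not isinstance(a, str):
        return tuple(a) + (b,)
    if not isinstance(b, str):
        return (a,) + tuple(b)
    return node  # two tips only: nothing to collapse


def encoding_size(node):
    """One bipartition per edge, counting the seed node's edge: one per node."""
    if isinstance(node, str):
        return 1
    return 1 + sum(encoding_size(child) for child in node)


tree1 = ("A", "B", ("C", ("D", ("E", "F"))))
tree2 = ("C", "G", ("A", ("H", ("D", "B"))))

to_keep = tip_set(tree1) & tip_set(tree2)  # the shared taxa: A, B, C, D
pruned1 = unroot(prune(tree1, to_keep))
pruned2 = unroot(prune(tree2, to_keep))

print(pruned1, encoding_size(pruned1))
print(pruned2, encoding_size(pruned2))
```

The two trees together span an 8-taxon namespace and each has 6 tips (encoding size 10) before pruning; after pruning to the 4 shared taxa, both have an encoding size of 6, whatever the size of the namespace.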
(5) "Pruning the trees and recalculating the encodings results in
encodings of length 5? Which 5 taxa would these refer to?" I am not sure
that this question makes sense given a proper understanding of what
bipartition encodings are: the encoding has one element per split (plus
one), not one element per taxon.
(6) "As expected, if I create new trees (TX and TY) from the Newick
representations of the pruned trees they both have bipartition encoding
of length 4." Unfortunately, this is a case of two wrongs making a
wrong. First, your expectation is wrong: the bipartition encoding of
the new trees should be of length 6. Second, your incorrect
expectation is only met because you are printing out the wrong value and
reading it as the bipartition encoding length.
(7) "Before pruning, the common taxon namespace has 8 taxa so it
surprises me that the encoding for tree 1 (T1) has a length of 6." Here
is another case of two wrongs making another wrong. Your expectation
that a tree in an 8-taxon namespace cannot have an encoding length of 6
is wrong (it can, and it does, because the encoding length depends on the
number of tips on the tree, not on the size of the taxon namespace),
and the value you are printing out
(len(tree1.seed_node.bipartition.split_as_bitstring())) is, as I noted
in points (1) and (2) and in the previous email, meaningless anyway.
> --
> --------------------------------------
> Jeet Sukumaran
> --------------------------------------
> jeetsu...@gmail.com
> --------------------------------------
> Blog/Personal Pages:
> http://jeetworks.org/
> GitHub Repositories:
> http://github.com/jeetsukumaran
> http://www.flickr.com/photos/jeetsukumaran/sets/
> --------------------------------------
>
> --
> You received this message because you are subscribed to the Google
> Groups "DendroPy Users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to dendropy-user...@googlegroups.com.