Non-equiprobable breaking of polytomies

Yan Wong

Jun 26, 2020, 11:01:21 AM
to DendroPy Users
As I understand it, the resolve_polytomies() method is intended to resolve a polytomy so that each possible topology is equiprobable. But (on my computer) that doesn't seem to work for the simple case of resolving a 4-tip star tree into one of the 15 possible rooted binary topologies. Here's some code to demonstrate:

import collections
import dendropy
from dendropy.simulate import treesim
import random
taxa = dendropy.TaxonNamespace(['a', 'b', 'c', 'd'])
random.seed(123)
trees = collections.defaultdict(int)

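def lexicographic_sort_tree(tree):
    """
    A hacky way to get a unique identifier for each topology - there must be a better way.
    Note that this relies on the private _child_nodes attribute.
    """
    # Postorder traversal ensures every child has a _sort_key before its parent is visited
    for nd in tree.postorder_node_iter():
        nd._child_nodes.sort(key=lambda x: x._sort_key)
        # Tips are keyed by taxon label, internal nodes by the joined keys of their sorted children
        nd._sort_key = nd.taxon.label if nd.taxon and nd.taxon.label else "\n".join(ch._sort_key for ch in nd._child_nodes)
    return tree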

for i in range(500000):
    tree = treesim.star_tree(taxa)
    tree.resolve_polytomies(rng=random)
    trees[str(lexicographic_sort_tree(tree))] += 1
for k, v in sorted(trees.items()):
    print(k, v)

For which I get the following counts, which are clearly biased in favour of the 3 evenly balanced topologies. Do others find this?

(((a,b):0.0,c):0.0,d) 31438
(((a,b):0.0,d):0.0,c) 31474
(((a,c):0.0,b):0.0,d) 31134
(((a,c):0.0,d):0.0,b) 31108
(((a,d):0.0,b):0.0,c) 31151
(((a,d):0.0,c):0.0,b) 31246
((a,(b,c):0.0):0.0,d) 31442
((a,(b,d):0.0):0.0,c) 31234
((a,(c,d):0.0):0.0,b) 31373
((a,b):0.0,(c,d):0.0) 41816
((a,c):0.0,(b,d):0.0) 41283
((a,d):0.0,(b,c):0.0) 41680
(a,((b,c):0.0,d):0.0) 31119
(a,((b,d):0.0,c):0.0) 31142
(a,(b,(c,d):0.0):0.0) 31360

mholder

Jun 26, 2020, 8:30:52 PM
to DendroPy Users
Thanks for the bug report!
This is fixed in https://github.com/jeetsukumaran/DendroPy/tree/equiprobable, at least when you call the function with the default limit=2, as you were.

Jeet, I'd recommend that we just note that the distribution is not equiprobable when limit > 2. I can implement an equiprobable version for that case if we really need it, but I think that it is a bit tedious to do correctly...


Mark
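
For readers wondering what an equiprobable fully binary resolution can look like, here is a rough sketch of one standard approach: add the tips one at a time, attaching each new tip to a branch chosen uniformly from the current tree (counting the branch above the root). A rooted binary tree with m tips has 2m - 1 such attachment points, so for 4 tips there are 1 x 3 x 5 = 15 equally likely insertion sequences, one per topology. This is plain Python on nested tuples, not DendroPy code, and it is not necessarily how the equiprobable branch does it; the function names are hypothetical.

```python
import collections
import random

def canonical(tree):
    """Return a string for a nested-tuple tree that ignores child order."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(sorted(canonical(ch) for ch in tree)) + ")"

def count_nodes(tree):
    """Count all nodes (tips and internal) in a nested-tuple tree."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_nodes(ch) for ch in tree)

def attach(tree, index, leaf):
    """Attach `leaf` as sister to the node with preorder number `index`."""
    if index == 0:
        return (tree, leaf)
    index -= 1  # skip the current node, then descend into the correct child
    left, right = tree
    n_left = count_nodes(left)
    if index < n_left:
        return (attach(left, index, leaf), right)
    return (left, attach(right, index - n_left, leaf))

def random_binary_topology(labels, rng):
    """Build a rooted binary topology by uniform sequential tip insertion."""
    tree = labels[0]
    n_nodes = 1
    for leaf in labels[1:]:
        # Every node stands for the branch above it, so choosing a node
        # uniformly is the same as choosing an attachment branch uniformly.
        tree = attach(tree, rng.randrange(n_nodes), leaf)
        n_nodes += 2  # each insertion adds one tip and one internal node
    return tree

rng = random.Random(123)
counts = collections.Counter(
    canonical(random_binary_topology(["a", "b", "c", "d"], rng))
    for _ in range(15000)
)
for topology, n in sorted(counts.items()):
    print(topology, n)
```

With 15,000 draws each of the 15 topologies should appear roughly 1,000 times, balanced and unbalanced alike. Extending this to limit > 2, where some polytomies are left partly unresolved, is the harder case mentioned above and is not covered by this sketch.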

Yan Wong

Jun 27, 2020, 4:08:39 AM
to DendroPy Users
Thanks so much for this, Mark. It's really helpful. I'm a little surprised that others hadn't noticed this before, but perhaps it's not a very commonly used function.