Frequency of bipartition

Carl Oliveros

Oct 8, 2015, 1:47:53 PM
to DendroPy Users
Hi Jeet, Mark,

I'm trying to get the frequency of a bipartition in a tree list in which some of the trees don't have all the taxa defined in my bipartition.  My code looks something like:

import dendropy

# `files` is a list of paths to Newick tree files
trees = dendropy.TreeList()
for f in files:
    trees.read_from_path(f, 'newick')
taxon_list = ['a', 'b', 'c', 'd']
freq = trees.frequency_of_bipartition(labels=taxon_list)

And I get the error:

IndexError: Not all taxa could be mapped to bipartition (000000000000000000000000): 4

I checked the spelling of the taxa in my taxon lists and it does not seem to be the cause of the error.  My suspicion is that the error is caused by the fact that some taxa in my taxon list are not present in some of the trees.

Do you have any suggestions as to how I can get this information?

My other thought was to build a consensus tree with the consensus function. However, if the frequency of my bipartition of interest is low, it may not appear in the consensus tree, and then I cannot read off its frequency. At best I'd know that the frequency is below the minimum I set for the consensus function.

I'd appreciate your thoughts.

Cheers
Carl

 

Jeet Sukumaran

Oct 8, 2015, 6:18:44 PM
to dendrop...@googlegroups.com
Carl,

Almost certainly something is wrong with the taxon labels that you are
passing in. I say this because "000000000000000000000000" indicates that
not ONE of your taxa could be found in the taxon namespace of your trees.

DendroPy is actually designed to work with trees with incomplete
leafsets. The "could not be mapped to bipartition" error indicates that
no taxon could be found for one or more of your labels
*across the entire taxon namespace*, rather than in any single tree.

It would be helpful if you could attach a minimal data file that
replicates this error.

You could also troubleshoot it by adding lines like:

for t in trees.taxon_namespace:
    # Quote each label so stray leading/trailing whitespace is visible
    print("'{}'".format(t.label))

to see which taxon labels you have actually ingested.

You could also check if your labels are associated with taxa:

labels = [t.label for t in trees.taxon_namespace]
assert "a" in labels
assert "b" in labels
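
The asserts above stop at the first missing label. A slightly more informative check, sketched here in plain Python, reports every requested label that has no exact match and suggests a near match where one exists. The helper names `normalize` and `check_labels` are hypothetical and not part of DendroPy. The underscore handling reflects the fact that DendroPy's Newick reader, by default, converts unquoted underscores to spaces, which is one plausible source of mismatched labels; the sample labels are made up for illustration.

```python
def normalize(label):
    """Collapse case, underscores, and extra whitespace for fuzzy comparison."""
    return " ".join(label.replace("_", " ").split()).lower()


def check_labels(namespace_labels, wanted):
    """Map each wanted label lacking an exact match to a near match (or None)."""
    exact = set(namespace_labels)
    by_norm = {}
    for lbl in namespace_labels:
        # Keep the first label seen for each normalized form
        by_norm.setdefault(normalize(lbl), lbl)
    problems = {}
    for w in wanted:
        if w not in exact:
            problems[w] = by_norm.get(normalize(w))
    return problems


# Illustrative data; with DendroPy you would instead use:
#   namespace_labels = [t.label for t in trees.taxon_namespace]
namespace_labels = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla "]
wanted = ["Homo_sapiens", "Pan troglodytes", "Pongo abelii"]

report = check_labels(namespace_labels, wanted)
for label, suggestion in report.items():
    print(repr(label), "->", repr(suggestion))
```

A suggestion of `None` means nothing similar was found, so that label is probably absent from every tree that was read.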

FYI, the following works for me:

~~~
#! /usr/bin/env python

import dendropy

trees_str1 = """\
(c, (b, (d, e)));
(a, (b, (c, d)));
(a,(b,(c,d)));
(f, (g, (a, (b, (c, d)))));
"""

def f1(trees_str):
    trees = dendropy.TreeList.get_from_string(
        trees_str,
        schema="newick",
        rooting="force-rooted",
    )
    labels = ['a', 'b', 'c', 'd']
    freq = trees.frequency_of_bipartition(labels=labels)
    print("---")
    print(freq)

f1(trees_str1)

~~~

As an aside, you should probably explicitly tell DendroPy the rooting of
the trees unless (a) all your trees have [&R]/[&U] associated with them
or (b) you are OK with unrooted trees. The results of
splits/bipartitions operations on unrooted trees confuse a LOT of people
(it has recently been one of the more common pseudo-"bug" reports I get
...).
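
One reason unrooted results surprise people can be illustrated without DendroPy. On an unrooted tree, a bipartition and its complement describe the same edge, so asking about {a, b, c, d} is the same question as asking about {f, g} when those are the only other taxa. The sketch below is plain Python with hypothetical helper names (`split_mask`, `unrooted_key`); it illustrates the idea only and is not how DendroPy stores or normalizes its bitmasks.

```python
def split_mask(clade, all_taxa):
    """Encode a set of taxa as a bitmask, one bit per taxon in all_taxa."""
    index = {t: i for i, t in enumerate(all_taxa)}
    mask = 0
    for t in clade:
        mask |= 1 << index[t]
    return mask


def unrooted_key(mask, n_taxa):
    """Give a split and its complement the same key (unrooted view)."""
    full = (1 << n_taxa) - 1
    complement = full & ~mask
    return min(mask, complement)


taxa = ["a", "b", "c", "d", "f", "g"]
abcd = split_mask(["a", "b", "c", "d"], taxa)
fg = split_mask(["f", "g"], taxa)

# Rooted view: the two clades are different groupings
print(abcd != fg)
# Unrooted view: both masks describe the same edge
print(unrooted_key(abcd, len(taxa)) == unrooted_key(fg, len(taxa)))
```

On rooted trees the two masks stay distinct, which is usually what people have in mind when they ask for the frequency of a clade.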

-- jeet

