Weighted RF Distance w. Support as Node Labels

Matt Stata

Dec 22, 2015, 9:48:04 PM
to DendroPy Users
Hello,

I am trying to calculate the weighted Robinson-Foulds distance for a bunch of trees in Newick format.  The support values are stored as node labels, and I'm not clear from the DendroPy documentation if/how I can direct the weighted RF function to use these labels as weights.  If it's not possible directly, is there some sort of conversion I could do?

Thanks very much!

Matt

Jeet Sukumaran

Dec 31, 2015, 12:00:20 AM
to dendrop...@googlegroups.com
Terminologically, "weighted" RF means weighting the distance by the
subtending edge length. If you really do mean weighted by edge length,
then this functionality already exists in DendroPy: see
``treecompare.weighted_robinson_foulds_distance()``.
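
For intuition, the weighted distance can be thought of as the sum, over every split found in either tree, of the absolute difference between the two weights for that split, with a split that is missing from a tree counted as weight 0.0. Below is a minimal pure-Python sketch of that idea. It is not DendroPy's actual implementation: the ``weighted_rf`` function and the dict-of-frozensets representation of splits are made up for illustration.

```python
def weighted_rf(weights1, weights2):
    """Sum |w1 - w2| over every split in either tree (missing split = 0.0)."""
    all_splits = set(weights1) | set(weights2)
    return sum(
        abs(weights1.get(split, 0.0) - weights2.get(split, 0.0))
        for split in all_splits
    )

# Hypothetical data: each split is the set of taxa on one side of an edge,
# mapped to that edge's weight (an edge length, a support value, ...).
tree1_weights = {frozenset("AB"): 0.9, frozenset("ABC"): 0.6}
tree2_weights = {frozenset("AB"): 0.7, frozenset("ABD"): 0.8}

d = weighted_rf(tree1_weights, tree2_weights)
print(d)
```

Here the shared split AB contributes 0.2, and the two unshared splits contribute their full weights, 0.6 and 0.8, so the total is about 1.6.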

From what I can make out, however, this is not what you mean. I think
you mean that each split has a weight associated with it (here, its
support value) and that this weight should contribute to the distance.

Is that correct?

If so, it can be done very easily, but does require some prepping.

First, you will have to properly parse the node weights. When you read
the trees, the internal node labels are read and stored as string values
in the ``label`` attribute of each node, unless you instruct DendroPy to
parse them as taxa by specifying ``suppress_internal_node_taxa=False``.
You will need to convert these labels to numeric support values and
store each one on the associated edge as a new attribute, ``support``,
e.g.:

~~~
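import itertools
import dendropy

# Placeholder paths: substitute your own tree files.
tns = dendropy.TaxonNamespace()
# Keep internal node labels as plain strings in ``nd.label`` instead of
# parsing them as taxa (``suppress_internal_node_taxa=True``).
trees1 = dendropy.TreeList.get(
    path="f1.tre",
    schema="newick",
    taxon_namespace=tns,
    suppress_internal_node_taxa=True)
trees2 = dendropy.TreeList.get(
    path="f2.tre",
    schema="newick",
    taxon_namespace=tns,
    suppress_internal_node_taxa=True)
# Copy each node's label onto its subtending edge as a float.
for tree in itertools.chain(trees1, trees2):
    for nd in tree:
        if nd.label:
            nd.edge.support = float(nd.label)
        else:
            nd.edge.support = 0.0  # or something else ...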
~~~
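
Real tree files often contain labels that are empty, padded with whitespace, or not numeric at all, in which case a bare ``float(nd.label)`` raises ``ValueError``. One way to guard against that is a small conversion helper, sketched below. ``parse_support`` is a hypothetical name, not part of DendroPy, and falling back to 0.0 is an assumption you may want to change.

```python
def parse_support(label, default=0.0):
    """Convert a node label to a float support value.

    Returns `default` when the label is missing, empty, or not numeric.
    """
    if label is None:
        return default
    try:
        return float(str(label).strip())
    except ValueError:
        return default

# Hypothetical labels of the kind that show up in Newick files.
values = [parse_support(x) for x in ("95", " 0.87 ", "", None, "cladeA")]
print(values)
```

With a helper like this, the loop body above would reduce to ``nd.edge.support = parse_support(nd.label)``.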

Now you will need to call
``dendropy.calculate.treecompare.weighted_robinson_foulds_distance()``,
but specify that instead of using the edge length, you want DendroPy to
use the ``support`` attribute of each edge. You do this by specifying
``edge_weight_attr="support"``.

For example:

~~~
from dendropy.calculate import treecompare

# Compare every tree in the first list against every tree in the second.
for tree1 in trees1:
    for tree2 in trees2:
        d = treecompare.weighted_robinson_foulds_distance(
            tree1,
            tree2,
            edge_weight_attr="support")
        print(d)
~~~
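
The loop above prints the distances without saying which pair each one belongs to. If you would rather keep them, one option is to collect them into a matrix indexed by position in each list. The sketch below does this with a generic distance function; ``pairwise_distances`` is a hypothetical helper, and it is exercised here with integers standing in for trees so that it runs without any tree files.

```python
def pairwise_distances(trees_a, trees_b, distance_fn):
    """Return matrix[i][j] = distance_fn(trees_a[i], trees_b[j])."""
    return [[distance_fn(a, b) for b in trees_b] for a in trees_a]

# Stand-in data: integers instead of trees, and absolute difference
# instead of the weighted RF distance.
matrix = pairwise_distances([1, 4], [2, 6, 9], lambda a, b: abs(a - b))

for i, row in enumerate(matrix):
    for j, d in enumerate(row):
        print(f"tree1[{i}] vs tree2[{j}]: {d}")
```

With real trees, ``distance_fn`` would be something like ``lambda t1, t2: treecompare.weighted_robinson_foulds_distance(t1, t2, edge_weight_attr="support")``.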

Code is not tested, so caveat coder ...

-- jeet