sorting tree branches lexicographically

24 views
Skip to first unread message

Erick Matsen

unread,
Jan 17, 2018, 7:16:01 AM1/17/18
to DendroPy Users
Hallo Jeet--


We're working on a project in which we'd like to have sorted Newick representations of trees via the lexicographically smallest taxon name. That is, we'd like to be able to put in a tree like (b,(a,c)) and have ((a,c),b) come out.

nw_order from Newick utilities does this. 

If we were going to do this in DendroPy, it seems like we could sort each internal node, visiting via postorder_internal_node_iter. This wouldn't be super-ideal, because ideally we'd like to pass some state back up the tree (the smallest taxon label). Suggestions?

Is this functionality you would want via a PR? It's nice to have a unique Newick representation of a tree.


Thanks,

Erick

Jeet Sukumaran

unread,
Jan 17, 2018, 8:11:08 AM1/17/18
to dendrop...@googlegroups.com, Erick Matsen
Hi Erick,

Depending on your performance demands and your tolerance for hackiness,
you could get this very easily and right away without much effort by
something like the following:

~~~
import dendropy

def lexicographic_sort_tree(tree):
for nd in tree.postorder_node_iter():
nd._child_nodes.sort(key=lambda x: x._sort_key)
nd._sort_key = nd.taxon.label if nd.taxon and
nd.taxon.label else "\n".join(ch._sort_key for ch in nd._child_nodes)
return tree

s1 = "(((b,c),(a,d)),e);"
tree = dendropy.Tree.get(data=s1, schema="newick")
print(tree.as_string("newick"))
lexicographic_sort_tree(tree)
print(tree.as_string("newick"))
~~~

So, what this does is a post-order traversal of the tree, constructing a
sort key on each internal node by some algorithm (here, just a
concatenation of the daughter taxon labels), and sorting the underlying
child node storage container (which is a list) on this key.

Cons are:

(1) efficiency; the tree is traversed twice, once to sort and then
another time when writing out the NEWICK string
(2) reliance on implementation details: (a) the underlying child node
storage container and (b) the assumption that the native Newick writer
emits child nodes in the order of the storage container

I suspect that for most applications the performance hit of (1) will be
negligible. So the question is whether you can live with (2).

If either of these or both are objectionable, then what needs to be done
is to derive a specialized Newick writer (e.g.,
LexicographicalNewickWriter) from the existing one, overriding the core
writing function to hardwire in the sorting and node order logic. This
is not too difficult to do ... though if you can get away with the above
it is not worth the effort. Let me know if you want to go this route.

-- jeet

On 1/17/18 7:16 AM, Erick Matsen wrote:
> Hallo Jeet--
>
>
> We're working on a project in which we'd like to have sorted Newick
> representations of trees via the lexicographically smallest taxon name.
> That is, we'd like to be able to put in a tree like (b,(a,c)) and have
> ((a,c),b) come out.
>
> nw_order from Newick utilities <http://cegg.unige.ch/newick_utils> does
> this.
>
> If we were going to do this in DendroPy, it seems like we could sort
> each internal node, visiting via postorder_internal_node_iter. This
> wouldn't be super-ideal, because ideally we'd like to pass some state
> back up the tree (the smallest taxon label). Suggestions?
>
> Is this functionality you would want via a PR? It's nice to have a
> unique Newick representation of a tree.
>
>
> Thanks,
>
> Erick
>
> --
> You received this message because you are subscribed to the Google
> Groups "DendroPy Users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to dendropy-user...@googlegroups.com
> <mailto:dendropy-user...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.

--



--------------------------------------
Jeet Sukumaran
--------------------------------------
jeetsu...@gmail.com
--------------------------------------
Blog/Personal Pages:
http://jeetworks.org/
GitHub Repositories:
http://github.com/jeetsukumaran
Photographs (as stream):
http://www.flickr.com/photos/jeetsukumaran/
Photographs (by galleries):
http://www.flickr.com/photos/jeetsukumaran/sets/
--------------------------------------

Erick Matsen

unread,
Jan 17, 2018, 4:14:27 PM1/17/18
to Jeet Sukumaran, dendrop...@googlegroups.com
Jeet--

I think this is exactly what we need! 

Thanks a ton. 

Erick


--
Frederick "Erick" Matsen, Associate Member
Fred Hutchinson Cancer Research Center
http://matsen.fredhutch.org/
Reply all
Reply to author
Forward
0 new messages