Hi Erick,
Depending on your performance demands and your tolerance for hackiness,
you could get this very easily and right away without much effort by
something like the following:
~~~
import dendropy
def lexicographic_sort_tree(tree):
for nd in tree.postorder_node_iter():
nd._child_nodes.sort(key=lambda x: x._sort_key)
nd._sort_key = nd.taxon.label if nd.taxon and
nd.taxon.label else "\n".join(ch._sort_key for ch in nd._child_nodes)
return tree
s1 = "(((b,c),(a,d)),e);"
tree = dendropy.Tree.get(data=s1, schema="newick")
print(tree.as_string("newick"))
lexicographic_sort_tree(tree)
print(tree.as_string("newick"))
~~~
So, what this does is a post-order traversal of the tree, constructing a
sort key on each internal node by some algorithm (here, just a
concatenation of the daughter taxon labels), and sorting the underlying
child node storage container (which is a list) on this key.
Cons are:
(1) efficiency; the tree is traversed twice, once to sort and then
another time when writing out the NEWICK string
(2) reliance on implementation details: (a) the underlying child node
storage container and (b) the assumption that the native Newick writer
emits child nodes in the order of the storage container
I suspect that for most applications the performance hit of (1) will be
negligible. So the question is whether you can live with (2).
If either of these or both are objectionable, then what needs to be done
is to derive a specialized Newick writer (e.g.,
LexicographicalNewickWriter) from the existing one, overriding the core
writing function to hardwire in the sorting and node order logic. This
is not too difficult to do ... though if you can get away with the above
it is not worth the effort. Let me know if you want to go this route.
-- jeet
On 1/17/18 7:16 AM, Erick Matsen wrote:
> Hallo Jeet--
>
>
> We're working on a project in which we'd like to have sorted Newick
> representations of trees via the lexicographically smallest taxon name.
> That is, we'd like to be able to put in a tree like (b,(a,c)) and have
> ((a,c),b) come out.
>
> nw_order from Newick utilities <
http://cegg.unige.ch/newick_utils> does
> this.
>
> If we were going to do this in DendroPy, it seems like we could sort
> each internal node, visiting via postorder_internal_node_iter. This
> wouldn't be super-ideal, because ideally we'd like to pass some state
> back up the tree (the smallest taxon label). Suggestions?
>
> Is this functionality you would want via a PR? It's nice to have a
> unique Newick representation of a tree.
>
>
> Thanks,
>
> Erick
>
> --
> You received this message because you are subscribed to the Google
> Groups "DendroPy Users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to
dendropy-user...@googlegroups.com
> <mailto:
dendropy-user...@googlegroups.com>.
> For more options, visit
https://groups.google.com/d/optout.
--
--------------------------------------
Jeet Sukumaran
--------------------------------------
jeetsu...@gmail.com
--------------------------------------
Blog/Personal Pages:
http://jeetworks.org/
GitHub Repositories:
http://github.com/jeetsukumaran
Photographs (as stream):
http://www.flickr.com/photos/jeetsukumaran/
Photographs (by galleries):
http://www.flickr.com/photos/jeetsukumaran/sets/
--------------------------------------