Reading in newick files with blank names


Yan Wong

Dec 1, 2014, 7:30:47 AM
to dendrop...@googlegroups.com
Is there a way to get DendroPy to read in newick files with blank names, such as the example from http://evolution.genetics.washington.edu/phylip/newicktree.html:

(,(,,),);

At the moment, I get
>>> from dendropy import Tree, TreeList
>>> Tree.get_from_string("(,(,,),);", schema="newick")
...
dendropy.utility.error.DataParseError: Error parsing data source on line 1 at column 4: Missing taxon specifier in a tree -- found either a '(,' or ',,' construct.

Mark Holder

Dec 1, 2014, 7:51:25 AM
to dendrop...@googlegroups.com
Hi,
I'm not 100% sure, but I don't think that this is possible using DendroPy.
Sorry.
Mark

--
You received this message because you are subscribed to the Google Groups "DendroPy Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dendropy-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Mark Holder


==============================================
Department of Ecology and Evolutionary Biology
University of Kansas
6031 Haworth Hall
1200 Sunnyside Avenue
Lawrence, Kansas 66045

lab phone:  785.864.5789
fax (shared): 785.864.5860
==============================================

Jeet Sukumaran

Dec 1, 2014, 9:45:11 AM
to dendrop...@googlegroups.com
Hi Yan Wong,

Mark is correct with respect to DendroPy 3: this is not possible.

With DendroPy 4, however, this *is* possible, if you are willing to
forgo the rich taxon object mapping and read in the labels as *node*
labels rather than taxon labels:

######
import dendropy

tree = dendropy.Tree.get_from_string(
        "(,(,,),);",
        schema="newick",
        suppress_internal_node_taxa=True,
        suppress_external_node_taxa=True)
print(tree.as_string("newick"))
#####

(The cost of this is that there is no rich taxon management, so all
functionality that relies on it will not be applicable. In particular,
bipartition/split hashes cannot be calculated, and many functions, from
patristic distances to RF distances, will not be available.
The reason anonymous taxa in NEWICK format are not supported is that it
is more important, IMHO, to support incomplete leaf-set trees, and the
book-keeping for these would be a nightmare to handle with anonymous taxa.)

I might back-port support for the above to DendroPy 3 in the future, but
it will not be for a while, and definitely not before DendroPy 4 goes
public.

On the other hand, apart from SumTrees and functionality in the
`treesum` module, DendroPy 4 should be stable enough for you to work
with. Documentation is lagging, but most of the DendroPy 3 API is still
supported, either directly as in DendroPy 3 or with deprecation warnings,
and, of course, you could always turn to this group for questions on the
new API if there is any confusion or lack of legacy support.

At this point, I would encourage you to (permanently) switch to using
DendroPy 4. But if you are uncomfortable with that and really need to
process files with anonymous taxa, you could run the following in
DendroPy 4 to convert the labeling scheme to one parseable by DendroPy 3:

####

import dendropy

tree = dendropy.Tree.get_from_string(
        "(,(,,),);",
        schema="newick",
        suppress_internal_node_taxa=True,
        suppress_external_node_taxa=True)
# Give every node a generated taxon label (t1, t2, ...)
for idx, nd in enumerate(tree):
    nd.taxon = tree.taxon_namespace.new_taxon(
            label="t{}".format(idx+1))
print(tree.as_string("newick"))

###
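
Where DendroPy 4 is not at hand, the same idea (giving blank names generated labels so that a stricter parser accepts the string) can be sketched with plain string handling. This is only an illustration under stated assumptions, not part of DendroPy: `fill_blank_leaf_names` and its `prefix` parameter are hypothetical names, and the input is assumed to contain no whitespace, quoted labels, or comments. Unlike the DendroPy loop above, which assigns a taxon to each node it visits, this sketch only names blank leaves.

```python
def fill_blank_leaf_names(newick, prefix="t"):
    """Insert generated names wherever a leaf name is empty.

    Hypothetical helper: assumes no whitespace, quoted labels, or
    comments in the Newick string.
    """
    out = []
    count = 0
    prev = ""
    for ch in newick:
        # A leaf name is blank when a character that ends a label
        # (',', ')' or ':') directly follows one that starts a label
        # ('(' or ',').
        if prev in ("(", ",") and ch in (",", ")", ":"):
            count += 1
            out.append("{}{}".format(prefix, count))
        out.append(ch)
        prev = ch
    return "".join(out)


print(fill_blank_leaf_names("(,(,,),);"))  # (t1,(t2,t3,t4),t5);
```

Strings that already carry names, such as "(a,(b,c)d,a);", pass through unchanged.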



-- jeet

--



--------------------------------------
Jeet Sukumaran
--------------------------------------
jeetsu...@gmail.com
--------------------------------------
Blog/Personal Pages:
http://jeetworks.org/
GitHub Repositories:
http://github.com/jeetsukumaran
Photographs (as stream):
http://www.flickr.com/photos/jeetsukumaran/
Photographs (by galleries):
http://www.flickr.com/photos/jeetsukumaran/sets/
--------------------------------------

Yan Wong

Dec 1, 2014, 5:21:07 PM
to dendrop...@googlegroups.com
On Monday, 1 December 2014 14:45:11 UTC, Jeet Sukumaran wrote:
> Hi Yan Wong,
>
> Mark is correct with respect to DendroPy 3: this is not possible.
>
> With DendroPy 4, however, this *is* possible,

Thanks for that. I did indeed download version 4 and it works. Surprisingly, v4 is fine without either of the suppress_*** flags: I don't know if this is unintended, but so far it works fine for me. Thanks again for your work on this.

Jeet Sukumaran

Dec 1, 2014, 7:49:35 PM
to dendrop...@googlegroups.com
Ah, yes:

s1 = "(,(,,),);"

will be parsed in DendroPy 4 with no special handling. Nodes with no
explicit label are created without any label value or taxon assignment.

What *might* be considered a semantically identical tree representation
(but not necessarily: the standard is ambiguous in this regard):


s2 = "(_,(_,_,)_,_);"

requires usage of the ``suppress_internal_node_taxa=True`` and
``suppress_external_node_taxa=True`` specifications. Here the `_` represents
an actual label, and the problem is not so much that it is a
blank/space, but the redundancy of the labels: strings such as

s3 = "(a,(a,a)a,a);"
s4 = "(a,(b,c)d,a);"

would also require the ``suppress_internal_node_taxa=True`` and
``suppress_external_node_taxa=True`` specifications to be parsed successfully
(in DendroPy 4 only; DendroPy 3 cannot handle any of these cases).
In general, the ``suppress_internal_node_taxa=True`` and
``suppress_external_node_taxa=True`` specifications are useful in cases where
there are duplicate/redundant labels on the tree, whether these are blanks
or otherwise.
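
A quick way to tell whether a tree string falls into the duplicate-label case described above is to count its labels before parsing. The following is a minimal sketch, not DendroPy functionality: `duplicate_labels` is a hypothetical helper, and it assumes unquoted labels with no whitespace or comments.

```python
import re
from collections import Counter


def duplicate_labels(newick):
    """Return the sorted labels that occur more than once.

    Hypothetical helper: assumes unquoted labels, no whitespace,
    no comments.
    """
    # A label starts right after '(', ',' or ')'. Branch lengths follow
    # ':' and are therefore never matched.
    labels = re.findall(r"(?<=[(),])[^(),:;]+", newick)
    counts = Counter(labels)
    return sorted(label for label, n in counts.items() if n > 1)


print(duplicate_labels("(a,(a,a)a,a);"))  # ['a']
print(duplicate_labels("(a,(b,c)d,a);"))  # ['a']
print(duplicate_labels("(,(,,),);"))      # []
```

A non-empty result suggests that the suppress options are needed; an empty one, as for the all-blank string, suggests the tree can be read without them.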

(Note that "s1" and "s2" produce subtly different trees. In the first
case, all node labels are `None`, while in the latter the node labels
are actually a space character.)
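
The distinction in the note above follows from the Newick convention that an underscore in an unquoted label stands for a blank. As a rough sketch of that convention only (this is not DendroPy's code, and `interpret_label` is a hypothetical name), an unquoted label token could be interpreted like this:

```python
def interpret_label(token):
    """Interpret an unquoted Newick label token (hypothetical helper)."""
    if token == "":
        # Nothing was written for this node: no label at all.
        return None
    # Unquoted underscores stand for blanks in Newick.
    return token.replace("_", " ")


print(repr(interpret_label("")))   # None
print(repr(interpret_label("_")))  # ' '
```

So a node written with nothing between its delimiters has no label, while a node written as `_` has a real label consisting of one space.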

Yan Wong

Dec 2, 2014, 4:34:21 AM
to dendrop...@googlegroups.com
On Tuesday, 2 December 2014 00:49:35 UTC, Jeet Sukumaran wrote:
> Ah, yes:
>
>      s1 = "(,(,,),);"
>
> will be parsed in DendroPy 4 with no special handling. Nodes with no
> explicit label are created without any label value or taxon assignment.

That turns out to be perfect for my use case, and I think it is a sensible way for DendroPy to interpret the standard. Thanks 