Checking tip labels


Emily Jane McTavish

Jun 8, 2015, 9:34:23 AM
to dendrop...@googlegroups.com
Hi Jeet + dendropyrates,

I want to run a check that my alignment file and my tree have the same tip names, and report any that differ.
I'm reading them into the same namespace using:

d = dendropy.DnaCharacterMatrix.get(path='example.aln', schema='fasta')
tree = dendropy.Tree.get(path='tree.tre', schema='newick', preserve_underscores=True, taxon_namespace=d.taxon_namespace)

but I can't figure out how to make sure that the taxon sets are the same for the tree and the character matrix. Any tips?
Along those lines - is it possible to give a taxon in the namespace a new label, so that when I write out the tree and alignment it has the new label for both?

I'm using 4.0!

Thanks for the help,
EJM

(tree.tre and example.aln are rather large so I haven't attached them here)

Mark Holder

Jun 8, 2015, 10:01:51 AM
to dendrop...@googlegroups.com
Hi,
I'm not sure if it is the best method, but check: https://gist.github.com/mtholder/25633ad86b0c14221e18

I think that you need to:
    1. make the taxon_namespace immutable after reading the data, and
    2. verify that the tree uses the taxa at the tips. Checking the # of tips may be sufficient.
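
A minimal sketch of step 2 that compares the label sets directly, so the differing taxa are named rather than only counted. The function name `compare_labels` is hypothetical, and the commented lines showing how to pull labels out of the DendroPy objects assume DendroPy 4 behaviour (iterating a character matrix yields its taxa; `leaf_node_iter()` yields the tips).

```python
def compare_labels(alignment_labels, tree_labels):
    """Return (labels only in the alignment, labels only in the tree)."""
    alignment_set = set(alignment_labels)
    tree_set = set(tree_labels)
    only_in_alignment = sorted(alignment_set - tree_set)
    only_in_tree = sorted(tree_set - alignment_set)
    return only_in_alignment, only_in_tree


# Assumed way to collect the labels from the objects in this thread:
#   alignment_labels = [taxon.label for taxon in d]
#   tree_labels = [leaf.taxon.label for leaf in tree.leaf_node_iter()]
# Toy labels are used here so the sketch runs on its own.
only_in_alignment, only_in_tree = compare_labels(
    ["A", "B", "C"],
    ["A", "B", "Xtaxon1"],
)
print("only in alignment:", only_in_alignment)
print("only in tree:", only_in_tree)
```

If both lists come back empty, the tip labels and the alignment labels agree.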

Mark

--
You received this message because you are subscribed to the Google Groups "DendroPy Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dendropy-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Mark Holder


==============================================
Department of Ecology and Evolutionary Biology
University of Kansas
6031 Haworth Hall
1200 Sunnyside Avenue
Lawrence, Kansas 66045

lab phone:  785.864.5789
fax (shared): 785.864.5860
==============================================

Emily Jane McTavish

Jun 8, 2015, 10:28:41 AM
to dendrop...@googlegroups.com
Thanks Mark!

I think that is a good place for me to start. I don't think lines 14+ are necessary: in my test, if the labels are not exactly the same, Tree.get() fails with an error reporting the mismatched tip:
 File "/usr/local/lib/python2.7/dist-packages/dendropy/datamodel/taxonmodel.py", line 825, in new_taxon
    raise error.ImmutableTaxonNamespaceError("Taxon '{}' cannot be added to an immutable TaxonNamespace".format(label))
dendropy.utility.error.ImmutableTaxonNamespaceError: Taxon 'Xtaxon1' cannot be added to an immutable TaxonNamespace

So I can use it to make sure that all labels are the same.

EJM

Jeet Sukumaran

Jun 8, 2015, 10:57:12 AM
to dendrop...@googlegroups.com
Yay for DendroPy 4!

(1) What you are doing looks correct, i.e., explicitly passing in the
taxon namespace. You can check that you are referencing the same taxon
namespace, of course, by something like:

~~~
assert tree.taxon_namespace is d.taxon_namespace
~~~

Now, if the above is true, then the issue is whether or not the taxa are
correctly being mapped to the proper taxon instances across the data.
The ``preserve_underscores=True`` is almost certainly correct and needed
when working with taxa spanning FASTA and Newick/Nexus formats, so that
looks fine too. What do the taxon labels look like across the data sets?
Can you post the data or representative data?
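
Since unquoted underscores in Newick labels are conventionally read as spaces, one quick diagnostic is to look for labels that differ only in underscores versus spaces. The sketch below assumes the labels have already been extracted as plain strings; `underscore_near_misses` is a hypothetical helper, not part of DendroPy.

```python
def underscore_near_misses(alignment_labels, tree_labels):
    """Pair up labels that match only after treating '_' and ' ' as equal."""

    def normalize(label):
        return label.replace("_", " ")

    # Index alignment labels by their normalized form.
    by_normalized = {}
    for label in alignment_labels:
        by_normalized.setdefault(normalize(label), []).append(label)

    exact = set(alignment_labels)
    pairs = []
    for label in tree_labels:
        if label in exact:
            continue  # identical label, nothing to report
        for candidate in by_normalized.get(normalize(label), []):
            pairs.append((candidate, label))
    return pairs


# Toy example: the tree label lost its underscore on reading.
print(underscore_near_misses(
    ["Homo_sapiens", "Pan_troglodytes"],
    ["Homo sapiens", "Pan_troglodytes"],
))
```

Any pair reported here suggests the two files were read with different underscore handling rather than carrying different names.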

(2) For the renaming of the taxa, just change the label of the taxon
instance, and everything will update across the board, as long as the
taxon and taxon namespace references are correct. E.g.:
~~~
import dendropy

# Read the tree first, then read the alignment into the same namespace.
tree = dendropy.Tree.get(
    data="(A,(B,(C,D)));",
    schema="newick")
dna = dendropy.DnaCharacterMatrix.get(
    data=">A\nACGT\n>B\nACGT\n>C\nACGT\n>D\nACGT\n",
    schema="fasta",
    taxon_namespace=tree.taxon_namespace)
print(tree.as_string("newick"))
print(dna.as_string("fasta"))
assert tree.taxon_namespace is dna.taxon_namespace
assert tree.taxon_namespace[0].label == "A"
tree.taxon_namespace[0].label = "hello"
s = dna.taxon_namespace.get_taxon(label="B")
s.label = "world"
print(tree.as_string("newick"))
print(dna.as_string("fasta"))
~~~

--



--------------------------------------
Jeet Sukumaran
--------------------------------------
jeetsu...@gmail.com
--------------------------------------
Blog/Personal Pages:
http://jeetworks.org/
GitHub Repositories:
http://github.com/jeetsukumaran
Photographs (as stream):
http://www.flickr.com/photos/jeetsukumaran/
Photographs (by galleries):
http://www.flickr.com/photos/jeetsukumaran/sets/
--------------------------------------

Emily Jane McTavish

Jun 9, 2015, 5:23:23 AM
to dendrop...@googlegroups.com
Thanks! That looks like just what I need.

Labels I am dealing with include stuff like
"Nostoc_spLukesova_40/93_r__strain_Lukesova_40/93__1295_bp" and I was
slightly concerned that tree tip labels had been recoded, and that
alignments had been deposited with the original labels, so I wanted to
double check. (I forget the rules for slashes but I know at least some
analysis software cannot handle them in tip names).
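
A rough sketch of how labels like this could be recoded before writing the files out. Which characters are actually unsafe depends on the downstream software, so the allowed set below (letters, digits, underscore, dot, hyphen) is only an assumption, and `safe_label` and `build_rename_map` are hypothetical helper names.

```python
import re


def safe_label(label, replacement="_"):
    """Replace every character outside the assumed safe set."""
    return re.sub(r"[^A-Za-z0-9_.-]", replacement, label)


def build_rename_map(labels):
    """Map each original label to its recoded form, refusing collisions."""
    mapping = {}
    seen = {}  # recoded label -> original label that produced it
    for label in labels:
        new_label = safe_label(label)
        if new_label in seen and seen[new_label] != label:
            raise ValueError(
                "{!r} and {!r} both recode to {!r}".format(
                    seen[new_label], label, new_label))
        seen[new_label] = label
        mapping[label] = new_label
    return mapping


mapping = build_rename_map(
    ["Nostoc_spLukesova_40/93_r__strain_Lukesova_40/93__1295_bp"])
for old, new in mapping.items():
    print(old, "->", new)

# Assumed way to apply the mapping to a shared namespace, by setting
# each taxon's label:
#   for taxon in tree.taxon_namespace:
#       taxon.label = mapping[taxon.label]
```

The collision check matters because two distinct original labels could otherwise end up with the same recoded name.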

I can use an immutable taxon namespace to assert they are matched, or if
I leave it mutable (changing line 8 of Mark's gist,
https://gist.github.com/mtholder/25633ad86b0c14221e18, to
d.taxon_namespace.is_mutable = True) it will report mismatches.

I appreciate the help!
ejm

p.s. Some dendropy links are broken e.g.
https://pythonhosted.org/DendroPy/tutorial/writing.html
Emily Jane McTavish
Humboldt Research Fellow, Scientific Computing Group
Heidelberg Institute for Theoretical Studies
Schloss-Wolfsbrunnenweg 35
D-69118 Heidelberg
+49 157 53005470
