Multiple datasets mapping to the same taxon

April Wright

Oct 28, 2016, 3:19:49 PM

to DendroPy Users

Hi Jeet and other dendropyers-

This is probably going to be a dumb question, but I had a question about taxon namespaces. I thought these were mutable objects, so if I have two datasets, and I read them both in with the same taxon namespace, like below:

all_taxa = dendropy.TaxonNamespace()

ds0 = dendropy.StandardCharacterMatrix.get_from_path(file, taxon_namespace=all_taxa)
ds1 = dendropy.StandardCharacterMatrix.get_from_path(file, taxon_namespace=all_taxa)

the taxon_namespace would be (all the ds0 taxa) + (all the ds1 taxa not contained in ds0). As in, the namespace would be a comprehensive list of all unique taxa in both sets. Instead I'm getting this error:

TooManyTaxaError: Error parsing data source 'file' on line XXX at column Y: Cannot add taxon with label 'taxon': Declared number of taxa (XXX) already defined.

What would be the correct way to read in multiple datasets to the same namespace if some taxa may be unique to one dataset?

Thanks!

a

Jeet Sukumaran

Oct 28, 2016, 3:45:37 PM

to dendrop...@googlegroups.com

Hi April,

Your intuition is correct in principle. However, a complication here is
that NEXUS files have an NTAX specification, and DendroPy uses this to
make sure that the number of taxa read == number of taxa expected. If
you are reading multiple NEXUS files with the same taxa into the same
TaxonNamespace, this is not a problem. But if you are reading multiple
NEXUS files with different taxa, then as you read in some new taxa from
the second file, an error is thrown once the number of TOTAL defined
taxa in the ENTIRE TaxonNamespace exceeds the NTAX specification in the
second file.

So, e.g., given the following files:

file1.nex : NTAX=5 : A,B,C,D,E
file2.nex : NTAX=6 : F,G,H,I,J,K

An error is thrown when encountering taxon "G", as this now results in
7 taxa in the TaxonNamespace, whereas file2.nex specified only 6.

What is happening here is that you are essentially combining two taxon
namespaces, and you need to explicitly tell DendroPy that this is what
is happening. When reading NEXUS files, if you pass the keyword argument
``unconstrained_taxa_accumulation_mode=True``, this will relax the
check that the number of taxa being read is <= NTAX.

So:

ds0 = dendropy.StandardCharacterMatrix.get_from_path(
    file0,
    schema="nexus",
    unconstrained_taxa_accumulation_mode=True,
    taxon_namespace=all_taxa)
ds1 = dendropy.StandardCharacterMatrix.get_from_path(
    file1,
    schema="nexus",
    unconstrained_taxa_accumulation_mode=True,
    taxon_namespace=all_taxa)

Note that there should be NO error if you are reading the same file twice,
as no new taxa are created during the second read of the same file.
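
To make the counting concrete, here is a toy model of the check in plain Python. It is only a sketch of the behaviour described above, not DendroPy's actual implementation: the function ``read_taxa``, its ``unconstrained`` parameter, and the local ``TooManyTaxaError`` class are hypothetical stand-ins, and the shared namespace is modelled as a plain list of labels.

```python
class TooManyTaxaError(Exception):
    """Local stand-in for the parser error; defined only for this toy model."""


def read_taxa(labels, ntax, namespace, unconstrained=False):
    """Add each new label to the shared namespace, mimicking the NTAX check."""
    for label in labels:
        if label in namespace:
            continue  # taxon already known: nothing new is created
        if not unconstrained and len(namespace) >= ntax:
            raise TooManyTaxaError(
                f"Cannot add taxon {label!r}: "
                f"declared number of taxa ({ntax}) already defined"
            )
        namespace.append(label)
    return namespace


file1 = (list("ABCDE"), 5)   # file1.nex : NTAX=5
file2 = (list("FGHIJK"), 6)  # file2.nex : NTAX=6

# Strict mode: the shared namespace reaches 6 taxa after "F", so "G" fails.
strict = []
read_taxa(*file1, strict)
try:
    read_taxa(*file2, strict)
except TooManyTaxaError as err:
    print("strict:", err)

# Relaxed mode: every unique label from both files accumulates.
relaxed = []
read_taxa(*file1, relaxed, unconstrained=True)
read_taxa(*file2, relaxed, unconstrained=True)
print("relaxed:", len(relaxed), "taxa")

# Reading the same file twice in strict mode creates no new taxa, so no error.
same = []
read_taxa(*file1, same)
read_taxa(*file1, same)
```

In this model the strict read stops at "G" because the namespace already holds A through F, which matches the file1.nex / file2.nex example; the relaxed read simply keeps appending.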

--

--------------------------------------
Jeet Sukumaran
--------------------------------------
jeetsu...@gmail.com
--------------------------------------
Blog/Personal Pages:
http://jeetworks.org/
GitHub Repositories:
http://github.com/jeetsukumaran
Photographs (as stream):
http://www.flickr.com/photos/jeetsukumaran/
Photographs (by galleries):
http://www.flickr.com/photos/jeetsukumaran/sets/
--------------------------------------

April Wright

Oct 28, 2016, 3:58:37 PM

to DendroPy Users

Awesome, thanks. I wrote a little loop to add taxa to the namespace if they're present in ds1 but not ds0, but I realized you had probably already thought of a better way.
