How to add character data to the tree for phylogenetic contrasts computation


Liang Xu

Jun 29, 2019, 12:36:48 PM
to DendroPy Users
Hi, I just started using DendroPy to analyze the correlation of traits on a phylogenetic tree. Can I add trait/character data to an existing Nexus tree in Python?

I would also like to simulate traits repeatedly and compute contrasts once each simulation is done.

Concretely, I have a tree of baleen whales (see the attached nex file), and I have trait values for the species, for example a list like this

[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]

with species labels

['B.acutorostrata','B.bonaerensis','B.borealis','B.brydei','B.edeni','B.musculus','B.mysticetus','B.omurai','B.physalus','C.marginata','E.australis','E.glacialis','E.japonica','E.robustus','M.novaeangliae']

How can I add this trait to the tree to compute contrasts?

Furthermore, if I have an ndarray of traits

array([[1, ..., 15],
       [1, ..., 15],
       ...,
       [1, ..., 15]])

can I add them to the tree and compute contrasts for each row?

Thanks!
bw.nex

Jeet Sukumaran

Jun 29, 2019, 7:33:41 PM
to dendrop...@googlegroups.com, Liang Xu
Have you tried looking at the documentation?

https://dendropy.org/primer/phylogenetic_character_analyses.html

--

----------------------------------------------------
Jeet Sukumaran
----------------------------------------------------
Assistant Professor
Biology Department
San Diego State University
----------------------------------------------------
Lab:
https://sukumaranlab.org/
Blog:
https://jeetblogs.org/
Repositories:
https://github.com/jeetsukumaran
Photography:
https://www.flickr.com/photos/jeetsukumaran/
Instagram:
https://www.instagram.com/jeetsukumaran/
Calendar:
https://goo.gl/dG5Axs
----------------------------------------------------
Email:
jsuku...@sdsu.edu (work)
jeetsu...@gmail.com (personal)
----------------------------------------------------
Mailing Address:
Biology Department, LS 262
San Diego State University
5500 Campanile Drive
San Diego, CA 92182-4614
----------------------------------------------------

Liang Xu

Jul 1, 2019, 5:40:54 AM
to DendroPy Users
Thanks a lot! Jeet helped me work it out.

I am recording the solution here in case future users have the same question.

If you want to attach one simulated trait set, or a batch of them, to a loaded tree, you should specify the taxon_namespace when loading the characters from a dictionary.

For example, first load a tree without characters:
import dendropy
taxa = dendropy.TaxonNamespace()
tree_sim = dendropy.Tree.get(
    path="a/tree/without/characters",  # placeholder path to the Nexus file
    schema="nexus",
    taxon_namespace=taxa)

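Then define a dictionary that maps each taxon label to its list of character values:
simchar_dict = {}
keys = ["A", "B", ..., "Z"]
values = [[1.0], [2.0], ..., [26.0]]  # one list of values per taxon
for i in range(26):
    simchar_dict[keys[i]] = values[i]
# If values is instead a NumPy array with one row per trait set and one
# column per taxon, use values[:, i].tolist() here.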

After that, build a character matrix from the dictionary, using the same taxon namespace as the tree:
simchars = dendropy.ContinuousCharacterMatrix.from_dict(simchar_dict, taxon_namespace=taxa)

Note that the keys of the dictionary should be identical to the species names of the tree.
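
For the array case from the original question, where each row is one trait set and each column is one species, the dictionary can be built by transposing the rows. This is only a minimal sketch in plain Python: traits_to_dict and label_mismatches are hypothetical helper names, not part of DendroPy, and the labels are assumed to be in the same order as the columns.

```python
def traits_to_dict(labels, rows):
    """Map each species label to its list of trait values, one per row."""
    rows = [list(row) for row in rows]
    for row in rows:
        if len(row) != len(labels):
            raise ValueError("each row needs one value per label")
    # zip(*rows) walks the columns: column i holds every value for labels[i]
    return {label: [float(v) for v in column]
            for label, column in zip(labels, zip(*rows))}


def label_mismatches(trait_dict, tree_labels):
    """Return (keys missing from the tree, tree labels without traits)."""
    keys, tips = set(trait_dict), set(tree_labels)
    return sorted(keys - tips), sorted(tips - keys)


labels = ["B.acutorostrata", "B.bonaerensis", "B.borealis"]
rows = [[1, 2, 3],
        [4, 5, 6]]  # two trait sets for three species
sim_chars = traits_to_dict(labels, rows)
```

After the tree is loaded, the tree side of the comparison can be taken from [t.label for t in taxa], so that both returned lists should come back empty before the dictionary is passed to from_dict.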

Finally, you can compute the PIC (phylogenetic independent contrasts):
from dendropy.model import continuous
simpic = continuous.PhylogeneticIndependentContrasts(tree=tree_sim, char_matrix=simchars)


If you have a batch of simulated trait sets in a dictionary like this
sim_chars = {"A":[1.0,2.0,...],...,"Z":[1.0,2.0,...]}

you can add it to the tree in the same way:
simchars = dendropy.ContinuousCharacterMatrix.from_dict(sim_chars, taxon_namespace=taxa)

To compute the contrasts for all sets, I just loop over the character indices:
sim_pic_thisbatch = []
# num_sims is the number of trait sets, i.e. the number of values per taxon
for pic_each in range(num_sims):
    sim_ctree = simpic.contrasts_tree(character_index=pic_each,
                                      annotate_pic_statistics=True,
                                      state_values_as_node_labels=False,
                                      corrected_edge_lengths=False)
    sim_pic = []
    for nd in sim_ctree.postorder_internal_node_iter():
        sim_pic.append(nd.pic_contrast_raw)

    sim_pic_thisbatch.append(sim_pic)

It works, but it is slow. If anyone has an idea of how to speed it up, I would appreciate it. Thanks.

Liang Xu

Jul 2, 2019, 7:00:22 AM
to DendroPy Users
The speed issue has been solved.

I used multiprocessing to parallelize the computation, and it is now substantially faster.

I don't know why the single-threaded computation is so time-consuming for multiple trait sets. One guess is that storing a large character matrix takes time, so the parallelized computation, which feeds in only one set at a time, is faster. I am not sure of that.
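
The parallel code itself was not posted, so the following is only a sketch of how such a setup might look, not the version actually used. It assumes the batch dictionary format shown earlier and reuses the DendroPy calls from this thread; split_trait_sets, raw_contrasts and contrasts_in_parallel are hypothetical names. Reloading the tree inside each worker is also an assumption, made so that only a path and a small dictionary have to be sent to each process.

```python
import multiprocessing


def split_trait_sets(sim_chars):
    """Turn {label: [v0, v1, ...]} into one {label: [v]} dict per trait set."""
    if not sim_chars:
        return []
    num_sims = len(next(iter(sim_chars.values())))
    return [{label: [values[i]] for label, values in sim_chars.items()}
            for i in range(num_sims)]


def raw_contrasts(job):
    """Worker: compute the raw contrasts for a single trait set."""
    # Imported here so that each worker process loads DendroPy itself.
    import dendropy
    from dendropy.model import continuous

    tree_path, one_set = job
    taxa = dendropy.TaxonNamespace()
    tree = dendropy.Tree.get(path=tree_path, schema="nexus",
                             taxon_namespace=taxa)
    chars = dendropy.ContinuousCharacterMatrix.from_dict(
        one_set, taxon_namespace=taxa)
    pic = continuous.PhylogeneticIndependentContrasts(
        tree=tree, char_matrix=chars)
    # Each job holds exactly one character, so its index is always 0.
    ctree = pic.contrasts_tree(character_index=0,
                               annotate_pic_statistics=True,
                               state_values_as_node_labels=False,
                               corrected_edge_lengths=False)
    return [nd.pic_contrast_raw
            for nd in ctree.postorder_internal_node_iter()]


def contrasts_in_parallel(tree_path, sim_chars, processes=None):
    """Compute the contrasts for every trait set, one set per task."""
    jobs = [(tree_path, one_set) for one_set in split_trait_sets(sim_chars)]
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(raw_contrasts, jobs)
```

On platforms that start worker processes by spawning rather than forking, contrasts_in_parallel should be called from under an if __name__ == "__main__": guard.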