merge external data to phylo with aim to plot rescaled trees


carmen murall

Dec 19, 2020, 4:09:41 PM
to ggtree
Hello everyone,

I am trying to plot several rescaled trees, and to do so I need to combine external data with the tree.
My situation differs from this example (which combines a BEAST tree and an ML tree with merge_tree)
in that, instead of having two trees to merge, I have a table of values for all nodes AND tips (see 'edgetable').
I tried combining the data using only the labels (as in the example above), but I lose the information for the internal nodes (see the middle tree below).
I have an additional issue with non-numerical dates: 'mrsd=' doesn't assign the correct date (see the tree on the right).
The code is below.
Any suggestions on how to fix these two problems (which are likely related) would be appreciated.
Carmen 



library(ggtree)   # ggtree(), geom_tiplab(), theme_tree2()
library(treeio)   # read.tree(), as.treedata(), rescale_tree()
library(dplyr)    # full_join()
library(tibble)   # tibble() replaces the deprecated data_frame()
library(ggplot2)  # coord_cartesian(), ggtitle()
library(cowplot)  # plot_grid()

egtree <- read.tree("example_tree.nwk") # 'phylo' object; edge.length in the tree equals clock_length in edgetable
edgetable <- read.csv("edgetable_cladet.csv", header = TRUE)

# tip-only tables keyed by label (assumes the tip rows of edgetable are in the same order as egtree$tip.label)
m <- tibble(label = egtree$tip.label, mutation_length = edgetable$mutation_length[edgetable$isTip == TRUE])
d <- tibble(label = egtree$tip.label, date = edgetable$date[edgetable$isTip == TRUE])

egtree <- as.treedata(egtree) # now an S4 object
egtree2 <- full_join(egtree, m, by = "label")
egtree2 <- full_join(egtree2, d, by = "label")

# plot tree (scale is clock length)
p1 <- ggtree(egtree2) + geom_tiplab() +
  coord_cartesian(clip = 'off') + theme_tree2() + ggtitle("clock length")

# plot rescaled by mutation length (internal nodes have no value here, which is the first problem)
p2 <- ggtree(rescale_tree(egtree2, branch.length = 'mutation_length')) + geom_tiplab() +
  coord_cartesian(clip = 'off') + theme_tree2() + ggtitle("mutation length")

# extra problem (non-numerical date): scale is incorrect, the same decimal date appears on all ticks
p3 <- ggtree(egtree2, mrsd = as.Date(max(edgetable$date))) + geom_tiplab() +
  coord_cartesian(clip = 'off') + theme_tree2() + ggtitle("date")

plot_grid(p1, p2, p3, ncol = 3)

edgetable_cladet.zip

carmen murall

Dec 19, 2020, 4:11:52 PM
to ggtree
oops, visible image: 

example_3plots.png

Yu, Guangchuang

Dec 19, 2020, 10:06:20 PM
to carmen murall, ggtree
The mutation lengths for the internal nodes are missing.
You can construct a data.frame whose first column is the node number and whose second column is the mutation length, and then map that data onto the tree structure.
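
A minimal sketch of this suggestion, using a small random tree in place of the poster's data. The tree, the object names (toy_tree, node_data, toy_td) and the mutation_length values are all made up for illustration; only the idea of joining by node number rather than by tip label comes from the thread.

```r
library(ape)     # rtree(), Ntip(), Nnode()
library(treeio)  # as.treedata()
library(dplyr)   # full_join()
library(tibble)  # tibble(), as_tibble()

set.seed(42)
toy_tree <- rtree(4)                          # hypothetical stand-in for egtree
n_total  <- Ntip(toy_tree) + Nnode(toy_tree)  # tips are numbered first, then internal nodes

# One row per node number, so tips and internal nodes are both covered
node_data <- tibble(
  node            = seq_len(n_total),
  mutation_length = seq(0, 1, length.out = n_total)  # made-up values
)

# Joining by "node" attaches a value to every node, not only to labelled tips
toy_td <- full_join(as.treedata(toy_tree), node_data, by = "node")
as_tibble(toy_td)
```

Joining by "label" can only match rows that have a label, which is typically just the tips; joining by "node" avoids that gap, provided the node numbers in the table follow the same numbering as the tree object.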


--
Guangchuang Yu PhD
Professor, Associate Director
Department of Bioinformatics
School of Basic Medical Sciences
Southern Medical University
Guangzhou, China

carmen murall

Dec 20, 2020, 9:56:14 PM
to Yu, Guangchuang, ggtree

Thank you for the quick response. Using the node numbers as you suggested fixed the mutation length issue.
I still have problems with mrsd and with rescaling by decimal date. I believe this is because the dates span a few months rather than years. Here are three attempts, all of which give incorrect scales. I would like to plot the tree with either the decimal date or the character date.
The code is below.
 
example_3plots_time.png

# one table per variable, keyed by node number (edgetable$child), covering tips and internal nodes
nm <- tibble(node = edgetable$child, mutation_length = edgetable$mutation_length)
nnd <- tibble(node = edgetable$child, numdate = edgetable$numdate)
nd <- tibble(node = edgetable$child, date = edgetable$date)
nb <- tibble(node = edgetable$child, branch = edgetable$branch_length)


egtree <- as.treedata(egtree) # now an S4 object

egtree2 <- full_join(egtree, nm, by = "node")
egtree2 <- full_join(egtree2, nnd, by = "node")
egtree2 <- full_join(egtree2, nd, by = "node")
egtree2 <- full_join(egtree2, nb, by = "node")

p3<-ggtree(rescale_tree(egtree2, branch.length = 'numdate'))+ geom_tiplab()+
  coord_cartesian(clip = 'off') +  theme_tree2()+ggtitle("decimal date")

p4<-ggtree(egtree2, mrsd = max(edgetable$date))+ geom_tiplab()+
  coord_cartesian(clip = 'off') +  theme_tree2()+ggtitle("date, w/clock length")

p5<- ggtree(rescale_tree(egtree2, branch.length = 'branch'), mrsd = max(edgetable$date))+ geom_tiplab()+
  coord_cartesian(clip = 'off') +  theme_tree2()+ggtitle("date, w/branch length")

library(cowplot)
plot_grid( p3, p4, p5, ncol=3)

Yu, Guangchuang

Dec 21, 2020, 12:24:49 AM
to carmen murall, ggtree


For example, in your p4, all the x-axis breaks are labeled 2020.284, which is equal to max(edgetable$date). That looks odd, but if you plot the tree without setting mrsd, you can see why: the tree you are plotting is not time-scaled, and the branch lengths are all close to 0.

image.png

The ggtree command works as expected.


If you believe something is wrong, please check your data first.
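
A rough numeric sketch of why the axis labels collapse, assuming (as the explanation above implies) that mrsd places the most recent tip at that date and treats branch lengths as years. The distances below are hypothetical; only the value 2020.284 comes from the thread.

```r
# Hypothetical root-to-node distances in a tree that is NOT time-scaled
x <- c(root = 0, nodeA = 0.0002, tipA = 0.0004, tipB = 0.0008)

mrsd_decimal <- 2020.284  # most recent sampling date as a decimal year

# Shift the x positions so that the most recent tip lands on mrsd
x_dated <- x + mrsd_decimal - max(x)

diff(range(x_dated))  # the whole axis spans less than 0.001 "years"
round(x_dated, 2)     # every position rounds to the same value
```

With branch lengths this small, any axis break inside the plotted range is labeled with practically the same decimal year, which matches what p4 shows.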

carmen murall

Dec 21, 2020, 12:04:59 PM
to Yu, Guangchuang, ggtree
Yes, you are correct: the scale in the tree is divergence adjusted by time, and the corresponding time scales are stored separately in the table. This tree was built with Augur (IQ-TREE first) and then time-scaled with TreeTime, which gives trees with clock length, not time, as the edge length. Therefore, to get mrsd to work, I had to divide the edge lengths by the rate used to make the tree (in this case 0.0008 substitutions per site per year). Below is the code that gives the tree with the correct numerical time scale. Thanks again.
Best,
Carmen 

# convert clock length to years by dividing by the clock rate (0.0008 substitutions per site per year)
nc <- tibble(node = edgetable$child, clock = edgetable$clock_length / 0.0008)
egtree2 <- full_join(egtree2, nc, by = "node")

ggtree(rescale_tree(egtree2, branch.length = 'clock'), mrsd = max(edgetable$date)) + geom_tiplab() +
  coord_cartesian(clip = 'off') + theme_tree2() + ggtitle("date, w/clock length")

example_1plot.png
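
The unit conversion behind this fix can be checked with a few numbers. This is only an illustrative sketch: the rate is the one quoted in the post, while the edge lengths are hypothetical and assumed to be in substitutions per site.

```r
clock_rate   <- 0.0008                     # substitutions per site per year (from the post)
clock_length <- c(0.0002, 0.0004, 0.0008)  # hypothetical edge lengths, substitutions per site

# (substitutions / site) / (substitutions / site / year) = years
branch_years <- clock_length / clock_rate
branch_years

# the same durations in approximate days, for a sanity check against sampling dates
branch_years * 365.25
```

Edges that looked close to 0 on the original scale become a quarter of a year, half a year and a full year here, which is the kind of range a date axis anchored at mrsd can display.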
 

