ggtreeExtra: bring back geom_vline


Xabier Vázquez-Campos

May 25, 2021, 1:56:00 AM
to ggtree

Hi there,

I recently upgraded R to v4.0. While regenerating some of my trees, I realised that the geom_vline() function from ggtreeExtra had been completely removed and replaced by geom_segment().

I tried to use geom_segment() as a replacement, but its behaviour is completely different, as it uses coordinates based on the whole graph. geom_vline(), on the other hand, took an xintercept based on the actual tree length, which was extremely convenient when plotting data derived from the branch lengths. For example, I was adding vertical lines to a rectangular tree based on the calculated RED (relative evolutionary divergence), so the RED values were very easy to plot (e.g. xintercept = 0.5 would be right in the middle of the tree). With geom_segment() this is not straightforward at all, as it depends on the area of the graph.

I usually wouldn't bother with this, but I found geom_vline() extremely useful here. Any chance of bringing it back to ggtreeExtra?

Thank you,
Xabi

xush...@gmail.com

May 27, 2021, 9:20:35 AM
to ggtree
I don't quite understand. In the early versions, `geom_vline` was only used to draw the grid lines of the external layer.

Xabier Vázquez-Campos

May 31, 2021, 8:22:08 PM
to xush...@gmail.com, ggtree
Hi,
I've been digging into this a bit more, and it has nothing to do with ggtreeExtra. I was using geom_vline() from ggplot2, not from ggtreeExtra. Sorry, my bad. Nonetheless, there is still something going on, which I have tried to replicate (details below).

I've run the same code in R 3.6.1 (left, original plot) and R 4.0.5 (right, 'wrong'), with the same version of ggplot2 on both (3.3.3) and the closest versions of ggtree (2.3.0 vs 2.3.0.992) and tidytree (0.3.3 vs 0.3.4) that I could install in each environment.
What I was trying to explain in my initial post was the difference in the positions of the vertical lines added via geom_vline() (from ggplot2, not ggtreeExtra, sorry). I also noticed that geom_treescale() gives very different scale bars in the two situations; the one in the R 4.0.5 plot is wrong, since the total length of the tree is 1.

Btw, the R 4.0.5 output is the same with ggtree installed from Bioconductor (2.4.2), from repo (3.1.0), or using the older version (2.3.0.992) installed from GitHub. I also got the same output when I installed R 3.6.x and all the packages with conda, regardless of whether I installed them from the r or conda-forge channels. I could only replicate the original, correct output when I loaded my original R libraries (from before the upgrade to R 4.0).

[Attachment: Screenshot from 2021-05-31 11-59-39.png]

This is the original code:
library(ggtree)
library(tidytree)
library(stringr)
library(ggplot2)
library(phyloseq)
mytree <- read_tree_greengenes("path/to/tree/mytree.tree")
ggtree(mytree)

mytree.tibble <- as_tibble(mytree)
mytree.tibble

mytree.tibble.fix <- mytree.tibble %>% mutate(label = word(label, 1, sep = ":"))

p<-ggtree(as.treedata(mytree.tibble.fix), ladderize = TRUE, right = TRUE,layout = "rectangular") +
  geom_treescale(x=0, y=45, width=0.25) +
  geom_point2(aes(subset=!isTip , fill=cut(as.numeric(label), c(0, 75, 90, 100))),
              shape=21, size=1.5) +
  scale_fill_manual(values=c("black", "grey", "white"), guide='legend',
                    name='Ultrafast Bootstrap Support\n(UFBoot)',
                    breaks=c('(90,100]', '(75,90]', '(0,75]'),
                    labels=expression(BP >= 90, 90 > BP * " => 75", 75 > BP)) +
  theme(legend.text.align = 0) +
  geom_tiplab(aes(label = label),
              size = 2)

p +
  # order
  geom_vline(xintercept = c(0.43, 0.57), linetype = 1, color = "blue", alpha = 0.5) +
  geom_vline(xintercept = 0.51, linetype = 2, color = "blue", alpha = 0.5) +
  # family
  geom_vline(xintercept = c(0.59, 0.78), linetype = 1, color = "darkseagreen", alpha = 0.5) +
  geom_vline(xintercept = 0.72, linetype = 2, color = "darkseagreen", alpha = 0.5) +
  # genus
  geom_vline(xintercept = c(0.85, 0.95), linetype = 1, color = "purple", alpha = 0.5) +
  geom_vline(xintercept = 0.91, linetype = 2, color = "purple", alpha = 0.5) +
  theme(legend.position = "none")

In all situations the tree is read and imported into R with the same results (no alterations to the branch lengths or anything else). So I went on to compare the actual ggtree objects using waldo::compare().
The largest differences are in p$data$branch and p$data$x (there are some minor differences in other columns too). These are the first 10 values of data$branch in each plot:

> p_good$data$branch[1:10]
 [1] 0.7168397 0.8978873 0.8978873 0.9475526 0.9475526 0.7697012 0.7131633 0.9683917 0.9683917 0.9521250

> p_bad$data$branch[1:10]
 [1] 21.0 21.5 21.5 21.5 21.5 21.0 20.5 21.5 21.5 21.0

Likewise, based on p$data$x, the tips are located at 1.0 in p_good (as expected) but at 22.0 in p_bad!
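As a rough check on what kind of change this is, the ten values quoted above can be compared directly in base R. This is only a sketch on that small sample (the variable names are mine), not a diagnosis: it asks whether the bad values look like a uniform stretch of the good ones, and whether they sit on a regular 0.5 grid.

```r
# First ten p$data$branch values, copied from the output quoted in this post
branch_good <- c(0.7168397, 0.8978873, 0.8978873, 0.9475526, 0.9475526,
                 0.7697012, 0.7131633, 0.9683917, 0.9683917, 0.9521250)
branch_bad  <- c(21.0, 21.5, 21.5, 21.5, 21.5, 21.0, 20.5, 21.5, 21.5, 21.0)

# If the bad plot were only a uniform stretch of the good one,
# this ratio would be (nearly) constant across nodes
scale_ratio <- branch_bad / branch_good
print(range(scale_ratio))

# Exact multiples of 0.5 would be unusual for real branch lengths
on_half_grid <- all(branch_bad %% 0.5 == 0)
print(on_half_grid)
```

The ratio varies from roughly 22 to 29, so the bad coordinates are not just the good ones multiplied by a constant. Together with every value being a multiple of 0.5, this would be consistent with x positions derived from node depth rather than from branch lengths, although that remains a guess.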

Based on this, my guess is that geom_vline() and geom_treescale() are working as they should: their output makes sense now that I see the total length of the tree is 22 and not 1. They take their position/size from the data$x and data$branch vectors, which hold odd values, and that is why they look smaller or out of place.

I still haven't figured out which package is responsible for this, but you might, given that you know how ggtree works internally. Below I paste the sessionInfo() from the two instances used to generate the plots, both running R 3.6.3 to reduce the differences.
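To illustrate how geom_vline() follows whatever x units ggtree stored in p$data, here is a minimal sketch on a made-up three-tip tree (toy_tree is hypothetical, not my data). It draws the same tree with and without branch lengths using ggtree's branch.length = "none" option. It does not show what actually happened in my bad environment, only that ignoring branch lengths produces coordinates of a similar shape.

```r
library(ape)
library(ggplot2)
library(ggtree)

# Toy ultrametric tree with a root-to-tip length of 1 (hypothetical data)
toy_tree <- read.tree(text = "((A:0.3,B:0.3):0.7,C:1.0);")

p_len  <- ggtree(toy_tree)                          # x taken from branch lengths
p_none <- ggtree(toy_tree, branch.length = "none")  # branch lengths ignored

# Compare the x ranges that geom_vline() and geom_treescale() will work in
print(range(p_len$data$x))
print(range(p_none$data$x))

# The same xintercept lands in a different place relative to the tree,
# because geom_vline() works in the x units stored in p$data
p_len_marked  <- p_len  + geom_vline(xintercept = 0.5, linetype = 2, color = "blue")
p_none_marked <- p_none + geom_vline(xintercept = 0.5, linetype = 2, color = "blue")
```

Printing p_len_marked and p_none_marked side by side shows the line in the middle of the first tree but close to the root in the second. Running range(p$data$x) on the real plot before adding any lines is a quick way to see which situation applies.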


Good:
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS/LAPACK: /home/xabi/miniconda3/envs/r363-clean/lib/libopenblasp-r0.3.15.so

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8  
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] phyloseq_1.28.0 ggplot2_3.3.3   stringr_1.4.0   tidytree_0.3.3
[5] ggtree_2.3.0  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6          ape_5.4-1           lattice_0.20-44    
 [4] tidyr_1.1.3         ps_1.4.0            Biostrings_2.52.0  
 [7] digest_0.6.27       assertthat_0.2.1    foreach_1.5.1      
[10] utf8_1.2.1          R6_2.5.0            plyr_1.8.6        
[13] stats4_3.6.3        pillar_1.6.0        zlibbioc_1.30.0    
[16] rlang_0.4.10        lazyeval_0.2.2      rstudioapi_0.13    
[19] data.table_1.14.0   vegan_2.5-6         S4Vectors_0.22.1  
[22] Matrix_1.3-3        labeling_0.4.2      splines_3.6.3      
[25] igraph_1.2.6        munsell_0.5.0       compiler_3.6.3    
[28] pkgconfig_2.0.3     BiocGenerics_0.30.0 multtest_2.40.0    
[31] mgcv_1.8-35         biomformat_1.12.0   tidyselect_1.1.0  
[34] tibble_3.1.1        IRanges_2.18.3      codetools_0.2-18  
[37] fansi_0.4.1         permute_0.9-5       crayon_1.4.1      
[40] dplyr_1.0.5         withr_2.4.2         MASS_7.3-54        
[43] grid_3.6.3          nlme_3.1-152        jsonlite_1.7.2    
[46] gtable_0.3.0        lifecycle_1.0.0     DBI_1.1.0          
[49] magrittr_2.0.1      scales_1.1.1        cli_2.4.0          
[52] stringi_1.5.3       farver_2.0.3        XVector_0.24.0    
[55] reshape2_1.4.4      ellipsis_0.3.1      rvcheck_0.1.8      
[58] generics_0.1.0      vctrs_0.3.7         Rhdf5lib_1.6.3    
[61] iterators_1.0.13    tools_3.6.3         treeio_1.8.2      
[64] ade4_1.7-15         Biobase_2.44.0      glue_1.4.2        
[67] purrr_0.3.4         parallel_3.6.3      survival_3.2-11    
[70] colorspace_2.0-0    rhdf5_2.28.1        cluster_2.1.2      
[73] BiocManager_1.30.15 aplot_0.0.6         patchwork_1.1.1   

Bad:
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS/LAPACK: /home/xabi/miniconda3/envs/r363-clean-noconda/lib/libopenblasp-r0.3.15.so

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8  
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] phyloseq_1.30.0 ggplot2_3.3.3   stringr_1.4.0   tidytree_0.3.4
[5] ggtree_2.0.4  

loaded via a namespace (and not attached):
 [1] treeio_1.10.0       progress_1.2.2      tidyselect_1.1.1  
 [4] purrr_0.3.4         reshape2_1.4.4      splines_3.6.3      
 [7] rhdf5_2.30.1        lattice_0.20-44     colorspace_2.0-1  
[10] vctrs_0.3.8         generics_0.1.0      stats4_3.6.3      
[13] mgcv_1.8-35         survival_3.2-11     utf8_1.2.1        
[16] rlang_0.4.11        pillar_1.6.1        glue_1.4.2        
[19] withr_2.4.2         BiocGenerics_0.32.0 rvcheck_0.1.8      
[22] foreach_1.5.1       lifecycle_1.0.0     plyr_1.8.6        
[25] zlibbioc_1.32.0     Biostrings_2.54.0   munsell_0.5.0      
[28] gtable_0.3.0        codetools_0.2-18    Biobase_2.46.0    
[31] permute_0.9-5       IRanges_2.20.2      biomformat_1.14.0  
[34] parallel_3.6.3      fansi_0.5.0         Rcpp_1.0.6        
[37] scales_1.1.1        BiocManager_1.30.15 vegan_2.5-7        
[40] S4Vectors_0.24.4    jsonlite_1.7.2      XVector_0.26.0    
[43] hms_1.1.0           stringi_1.6.2       dplyr_1.0.6        
[46] grid_3.6.3          ade4_1.7-16         tools_3.6.3        
[49] magrittr_2.0.1      lazyeval_0.2.2      tibble_3.1.2      
[52] cluster_2.1.2       crayon_1.4.1        ape_5.5            
[55] tidyr_1.1.3         pkgconfig_2.0.3     ellipsis_0.3.2    
[58] MASS_7.3-54         Matrix_1.3-3        data.table_1.14.0  
[61] prettyunits_1.1.1   iterators_1.0.13    Rhdf5lib_1.8.0    
[64] R6_2.5.0            multtest_2.42.0     igraph_1.2.6      
[67] nlme_3.1-152        compiler_3.6.3    
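
To narrow down which package might be responsible, the two listings above can be diffed by version. The sketch below uses base R only and a hand-copied subset of the entries (the attached packages plus ape and treeio); parse_versions is a helper name I made up, and the same approach should extend to the full lists.

```r
# Subset of "package_version" entries copied from the two sessionInfo() listings
good <- c("phyloseq_1.28.0", "ggplot2_3.3.3", "stringr_1.4.0",
          "tidytree_0.3.3", "ggtree_2.3.0", "ape_5.4-1", "treeio_1.8.2")
bad  <- c("phyloseq_1.30.0", "ggplot2_3.3.3", "stringr_1.4.0",
          "tidytree_0.3.4", "ggtree_2.0.4", "ape_5.5", "treeio_1.10.0")

# Split "package_version" strings into a two-column data frame
parse_versions <- function(x) {
  parts <- strsplit(x, "_", fixed = TRUE)
  data.frame(package = vapply(parts, `[`, character(1), 1),
             version = vapply(parts, `[`, character(1), 2),
             stringsAsFactors = FALSE)
}

# Join the two environments on package name and keep the rows that differ
versions <- merge(parse_versions(good), parse_versions(bad),
                  by = "package", suffixes = c("_good", "_bad"))
changed <- versions[versions$version_good != versions$version_bad, ]
print(changed, row.names = FALSE)
```

In this subset ggtree itself differs between the two listings (2.3.0 vs 2.0.4), along with ape, treeio, tidytree and phyloseq, so those may be the first candidates to pin to a single version when testing.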



--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA