Annotating both bootstrap (UFboot + SH-aLRT) nodes and species nodes


Michelle Demers

May 6, 2021, 11:21:24 PM
to ggtree
Greetings ggtree community,

I am trying to build a tree using output from IQtree2 with two legends: one for grouping my organisms by species, and a second for annotating nodes based on ultrafast bootstrap (UFBoot) and SH-aLRT values. I want only 2 categories in the bootstrap legend: high confidence (UFBoot >= 90 && SH-aLRT >= 80) and low confidence (UFBoot < 90 && SH-aLRT < 80). I have attached a tree ("desired_tree") that I am trying to model mine after.

Just as a note, the Newick tree is not direct output from IQtree2.  I first load my tree file output into Archaeopteryx to root it at the midpoint, then output it as Newick for processing in R.

Here is my code, but it does not work properly. I think it may be annotating the nodes incorrectly by bootstrap confidence, and I am having difficulty making two separate legends with colours. I have modelled this code on the answer you gave to the question "Annotating nodes with bootstrap cutoff", but I can't get it to work the way I want.

Can you help?

library(ggtree)
library(ggplot2)
library(tidyverse)
library(treeio)

tree <- read.newick("scabrum_midpoint_rooted.treefile") 
 #Extract node label
 bootsv <-data.frame(tree[["node.label"]])
 ntip <- length(tree[["tip.label"]])
 nnode <- tree[["Nnode"]]
 SHaLRT <- data.frame(lapply(bootsv, function(x) as.numeric(sub("/.*", "",x))))
 UFBoot <-data.frame(lapply(bootsv, function(x) as.numeric(sub(".*/", "",x))))
 support <- cbind(SHaLRT, UFBoot)
 colnames(support) <- c("SHaLRT", "UFBoot")
 row.names(support) <- c((ntip + 1) : (ntip + nnode))
 #Remove values below SH-aLRT80 <80 and UFB <90
 support <- support[which(support$SHaLRT >=80 & support$UFBoot >=90),]
 #cut data
 support$UFBoot <- cut(as.numeric(support$UFBoot), breaks = 90,right = F,include.lowest = TRUE)
 #Put the results back into the tree.
 tree[["node.label"]] <- support$UFBoot
 #make a color list
 col = c("blue", "orange", "yellow", "green", "purple", "pink")
 names(col) = c(1, 2, 3, 4, 5)
 #plot the tree with bootstrap cutoffs
 tree2 <- ggtree(tree) +
   geom_point2(aes(subset = !isTip,colour = label), size=2.2)+
   geom_tiplab(size = 2.6, align=TRUE, linesize=0.25)+
   ggplot2::xlim(0, 0.4) +
   #geom_text(aes(label=node)) +
   scale_colour_manual("High bootstrap Confidence", #legend name
                       values = col, #legend color
                       labels = c("High confidence", "low confidence"))

#add labelling for forma species
tipcategories <- read.csv("tree_metadata.csv", header = TRUE, stringsAsFactors = FALSE, col.names = c("isolate", "fsp"),)
dd = as.data.frame(tipcategories)

tree2 %<+% dd +
  theme(legend.position = c(0.1, 0.9)) + 
  geom_tippoint(aes(colour = factor(fsp))) +
  guides(colour=guide_legend(title="Forma specialis")) 

Thank you!

Michelle
PhD candidate | The University of Sydney
desired_tree.png

Yu, Guangchuang

Nov 10, 2021, 11:18:59 PM
to Michelle Demers, ggtree
I recommend using the tidytree package to extract the support values from the node labels and create a variable that stores the high or low confidence category. Then convert the object to a treedata object and visualize it with ggtree.
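
A minimal sketch of that approach, not a definitive implementation. It assumes the internal node labels have the form "SH-aLRT/UFBoot" (as the code in the original post also assumes) and reuses the file and column names from that post. The helper `classify_support()` is hypothetical, not part of any package. Treating every node that is not high confidence as low confidence is also an assumption, since the post does not say how to handle nodes that pass only one threshold. Mapping support to `fill` and forma specialis to `colour` is one way to get two separate legends.

```r
# Hypothetical helper: classify "SH-aLRT/UFBoot" labels into two categories.
# Labels that cannot be parsed (empty, non-numeric) become NA.
classify_support <- function(labels, alrt_min = 80, ufboot_min = 90) {
  parts <- strsplit(as.character(labels), "/", fixed = TRUE)
  pick <- function(i) {
    suppressWarnings(as.numeric(vapply(parts, function(p) p[i], character(1))))
  }
  alrt <- pick(1)
  ufboot <- pick(2)
  # Assumption: anything that is not high confidence counts as low confidence.
  out <- ifelse(alrt >= alrt_min & ufboot >= ufboot_min,
                "High confidence", "Low confidence")
  out[is.na(alrt) | is.na(ufboot)] <- NA_character_
  out
}

# Plotting part; it only runs when the input files from the post are present.
if (file.exists("scabrum_midpoint_rooted.treefile") &&
    file.exists("tree_metadata.csv") &&
    requireNamespace("ggtree", quietly = TRUE)) {
  library(ggtree)
  library(ggplot2)
  library(treeio)
  library(tidytree)
  library(dplyr)

  tree <- read.newick("scabrum_midpoint_rooted.treefile")
  ntip <- length(tree$tip.label)

  # Tidy representation of the tree: one row per node, labels in `label`.
  tbl <- as_tibble(tree)
  internal <- tbl$node > ntip

  # One support category per internal node, joined back by node number.
  support_df <- data.frame(
    node = tbl$node[internal],
    support = classify_support(tbl$label[internal])
  )
  td <- as.treedata(full_join(tbl, support_df, by = "node"))

  # Tip metadata; the first column must match the tip labels.
  dd <- read.csv("tree_metadata.csv", header = TRUE,
                 col.names = c("isolate", "fsp"))

  p <- ggtree(td) %<+% dd +
    # Node support uses `fill` (shape 21), so it gets its own legend.
    geom_point2(aes(subset = !isTip & !is.na(support), fill = support),
                shape = 21, size = 2.2) +
    # Tip grouping uses `colour`, giving the second legend.
    geom_tippoint(aes(colour = fsp)) +
    geom_tiplab(size = 2.6, align = TRUE, linesize = 0.25) +
    scale_fill_manual("Branch support",
                      values = c("High confidence" = "black",
                                 "Low confidence" = "grey80")) +
    guides(colour = guide_legend(title = "Forma specialis"))
  print(p)
}
```

Because the two annotations use different aesthetics, each scale can be styled independently, for example with `scale_colour_manual()` for the tip groups.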

--
--~--~---------~--~----~------------~-------~--~----~
Guangchuang Yu PhD
Professor, Director
Department of Bioinformatics
School of Basic Medical Sciences
Southern Medical University
Guangzhou, China
-~----------~----~----~----~------~----~------~--~---

George Pacheco

Dec 10, 2021, 9:55:27 AM
to ggtree
Hello Guangchuang,

Would you be so kind as to quickly demonstrate how to do that? I have the same issue: I would like to show bootstrap values in categories, not only those above a certain number.

Thanks a lot in advance, George. 

 basePhylo_annot +
  geom_point2(aes(label = label, subset = !is.na(as.numeric(label)) & as.numeric(label) > 85), shape = 21, size = 1.65, fill = "#155211", colour = "#155211", alpha = .85, stroke = .075, show.legend = FALSE) +
  geom_tippoint(aes(fill = BioStatus, subset = !is.na(BioStatus)), size = 1.65, stroke = .075, colour = "#000000", alpha = 1, shape = 21, na.rm = TRUE) +
  geom_strip("PigeonIsland_05-GBS", "Trincomalee_01-GBS", barsize = 3.5, color = "#d9d9d9", label = "Group A", fontsize = 6, offset = .75, offset.text = 1.5) +
  geom_strip("Abadeh_04-GBS", "Torshavn_02-GBS", barsize = 3.5, color = "#bdbdbd", label = "Group B", fontsize = 6, offset = .75, offset.text = 1.5) +
  geom_strip("TelAviv_07-GBS", "Isfahan_03-GBS", barsize = 3.5, color = "#969696", label = "Group C", fontsize = 6, offset = .75, offset.text = 7) +
  geom_strip("Barcelona_15-GBS", "Monterrey_05-GBS", barsize = 3.5, color = "#636363", label = "Group D", fontsize = 6, offset = .75, offset.text = 9) +
  geom_strip("Berlin_04-GBS", "London_05-GBS", barsize = 3.5, color = "#252525", label = "Group E", fontsize = 6, offset = .75, offset.text = 1.5) +
  scale_fill_manual(values = c("#44AA99", "#F0E442", "#E69F00", "#56B4E9", "#ff0000"), labels = gsub("_", " ", levels(Data_annot$BioStatus)), na.translate = FALSE) +
  theme(panel.spacing = margin(t = 0, b = 0, r = 0, l = 0),
        panel.border = element_blank(),
        plot.margin = margin(t = 0, b = 0, r = 0, l = 0),
        legend.position = c(.11, .875),
        legend.spacing.y = unit(.4, "cm"),
        legend.key.height = unit(.45, "cm"),
        legend.margin = margin(t = 0, b = 0, r = 0, l = 0),
        legend.box.margin = margin(t = 5, b = -20, r = 0, l = 30)) +
  guides(colour = guide_legend(title = "Species", title.theme = element_text(size = 14, face = "bold"),
                               label.theme = element_text(size = 10, face = "italic"), override.aes = list(size = .8, starshape = NA), order = 1),
         fill = guide_legend(title = "Biological Status", title.theme = element_text(size = 14, face = "bold"),
                             label.theme = element_text(size = 10),
                             override.aes = list(starshape = 21, size = 2.85, alpha = .9, starstroke = .0015), order = 2))