How to automatize the coloring of taxonomic units in big data sets with introgression?


jeama...@gmail.com

Nov 11, 2016, 5:30:46 PM
to ggtree
Hi!

I would like to create a tree in which the taxonomic units are colored until a conflict is found. There is a description of how to do this in one of your vignettes, which can be used to produce the tree below and is a perfect example of what I want. (http://bioconductor.org/packages/3.1/bioc/vignettes/ggtree/inst/doc/ggtree.html)

In that case you used a list to define the OTUs. Unfortunately, my dataset is too big to create the list manually, and the members of a species are not all connected to the same most recent common ancestor (due to introgression). For instance, in the previous plot, L, K, J and I could be part of the same species as A, B, C, D and E (and hence should be colored pink), despite G, F and H, a different species, being in the middle.


For now, I'm able to color my branches (see a close-up of my tree below), but something like a majority consensus is applied to resolve conflicting nodes. I'm using the following code:


# Assumes data$species gives the species of each tip, in the same order as tree$tip.label
groupInfo <- split(tree$tip.label, data$species)
tree <- groupOTU(tree, groupInfo)

p <- ggtree(tree, aes(color=group))
p <- p +
  scale_color_manual(values=c("black","chocolate3","darkorange","cadetblue","darkblue","darkgreen","aquamarine","yellow","gray","brown","darkolivegreen4"))
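For large datasets, the group list can be built from a metadata table instead of by hand. The following is a minimal sketch in base R, assuming a hypothetical data frame `meta` with `label` and `species` columns (neither name comes from the original post); `match()` aligns the species with the tip labels so that the row order of the table does not matter.

```r
# Hypothetical metadata: one row per tip, in arbitrary order
meta <- data.frame(
  label   = c("t3", "t1", "t2", "t5", "t4"),
  species = c("sp_B", "sp_A", "sp_A", "sp_C", "sp_B"),
  stringsAsFactors = FALSE
)

# Stands in for tree$tip.label
tip_labels <- c("t1", "t2", "t3", "t4", "t5")

# Look up the species of every tip by label rather than by position
species_of_tip <- meta$species[match(tip_labels, meta$label)]
stopifnot(!anyNA(species_of_tip))  # every tip must appear in the metadata

# Named list of tip labels per species, the shape groupOTU() expects
groupInfo <- split(tip_labels, species_of_tip)
```

The resulting `groupInfo` can then be passed to `groupOTU(tree, groupInfo)` as in the code above.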



Is it possible to automate the detection of OTUs in these cases and get black edges when conflicting nodes are found? How?


Thanks!

Auto Generated Inline Image 1

jeama...@gmail.com

Nov 11, 2016, 8:59:50 PM
to ggtree

After toying with R, I was able to create a function that finds which tips belong to each OTU, but the plotting problem persists and seems to be deeply embedded in the code. Here is an example. There is no problem if the OTUs are clearly defined. In the following code and plot (OK.png), the remaining edges become black once t1, t2, t7 and t8 come into conflict with t3, t4 and t10.
library(ggtree)
set.seed(123)
tree <- rtree(10)

cls <- list(c1=c("t1", "t2", "t8", "t7"),
            c2=c("t3", "t4", "t10"),
            c3=c("t9", "t6", "t5"))

tree <- groupOTU(tree, cls)
ggtree(tree, aes(color=group)) + geom_text(aes(label=label)) +
  scale_color_manual(values=c("black", "red", "green", "blue")) + theme(legend.position="right")


By contrast, if t1 belongs to the same species as t3, t4 and t10, I get this horrible green line running backwards to the origin of the tree. Since t1 and t2 are in conflict, I would expect the remaining lines to be black.
library(ggtree)
set.seed(123)
tree <- rtree(10)

cls <- list(c1=c("t2", "t8", "t7"),
            c2=c("t3", "t4", "t10", "t1"),
            c3=c("t9", "t6", "t5"))

tree <- groupOTU(tree, cls)
ggtree(tree, aes(color=group)) + geom_text(aes(label=label)) +
  scale_color_manual(values=c("black", "red", "green", "blue")) + theme(legend.position="right")


Is it possible to solve this problem? It makes for very unappealing plots when trying to convey information about introgression.
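One way to see in advance which OTUs will run into this kind of conflict is to test each group for monophyly. This is a minimal sketch using `is.monophyletic()` from the ape package on a small hand-written tree; the tree and the group names `sp1` and `sp2` are illustrative assumptions, not data from this thread.

```r
library(ape)

# Small hand-written rooted tree so the outcome is easy to verify by eye
tree <- read.tree(text = "((A,B),((C,D),E));")

cls <- list(
  sp1 = c("A", "B"),  # A and B form a clade of their own
  sp2 = c("C", "E")   # the smallest clade holding C and E also contains D
)

# TRUE where the tips of a group make up a complete clade
is_mono <- vapply(cls, function(tips) is.monophyletic(tree, tips), logical(1))

# Groups that are interrupted by tips from elsewhere
conflicting <- names(is_mono)[!is_mono]
print(conflicting)
```

Groups listed in `conflicting` are the ones whose tips are separated by other taxa, which is where the coloring has to decide how to treat the shared internal branches.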

gc...@connect.hku.hk

Nov 16, 2016, 12:51:29 AM
to ggtree

tree <- groupOTU(tree, cls, overlap='abandon')

ggtree(tree, aes(color=group)) + geom_text(aes(label=label)) +
  scale_color_manual(values=c("black", "red","green","blue")) +
   theme(legend.position="right")


The 'overlap' parameter was introduced in ggtree v1.7.4.
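To check what `overlap='abandon'` changes without relying on the plot, the group assignments can be tabulated directly. This is a rough sketch on the conflicting example from earlier in the thread; it assumes that `groupOTU()` stores its result on a `phylo` object as a factor in the `"group"` attribute, which may differ between package versions.

```r
library(ggtree)

set.seed(123)
tree <- rtree(10)

cls <- list(c1 = c("t2", "t8", "t7"),
            c2 = c("t3", "t4", "t10", "t1"),
            c3 = c("t9", "t6", "t5"))

# Same grouping, with and without abandoning conflicting nodes
tree_default <- groupOTU(tree, cls)
tree_abandon <- groupOTU(tree, cls, overlap = "abandon")

# Assumption: the assignments live in the "group" attribute of the tree
print(table(attr(tree_default, "group")))
print(table(attr(tree_abandon, "group")))
```

If the second table assigns more nodes to the unnamed first level than the first table does, those are the nodes that were abandoned because of a conflict.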

jeama...@gmail.com

Jan 23, 2017, 10:41:46 AM
to ggtree
Hi!

Sorry for reviving this thread after several months, but there is still a small detail to fix in this function (I only noticed it now that my dataset has become more complex). I will try to explain as best I can:

If one internal node connects two clades, each a different OTU, the function produces the expected result, i.e., a black edge once the clades coalesce at their common ancestor, like the green and red OTUs in this tree.

library(ggtree)
set.seed(123)
tree <- rtree(14)

cls <- list(c1=c("t14","t9","t4"),
            c2=c("t1","t11","t2"))

tree <- groupOTU(tree, cls)
ggtree(tree, aes(color=group)) + geom_text(aes(label=label)) +
  scale_color_manual(values=c("black", "red","green")) + theme(legend.position="right")



Now, if a single tip belonging to, let's say, OTU "blue" connects to a node that is also connected to two or more tips of a different OTU, "green", the edges from that node towards the root of the tree take the color of that tip, i.e., blue, as here with t2 and t3:

library(ggtree)
set.seed(123)
tree <- rtree(14)

cls <- list(c1=c("t14","t9","t4"),
            c2=c("t1","t11","t2"))

tree <- groupOTU(tree, cls)
ggtree(tree, aes(color=group)) + geom_text(aes(label=label)) +
  scale_color_manual(values=c("black", "red","green")) + theme(legend.position="right")



But, to be scientifically accurate, the tree should look like this:

In other words, as soon as a discrepancy arises, the remaining branches closer to the root must be colored black, even if it is a tip connecting to a very internal node. Could you correct the overlap function to fix this problem?

jeama...@gmail.com

Jan 26, 2017, 4:17:37 PM
to ggtree
Correction: in my last message, the code for the second tree was wrong. That tree was produced with the following code:

library(ggtree)
set.seed(123)
tree <- rtree(14)

cls <- list(c1=c("t14","t9","t4"),
            c2=c("t1","t11"),
            c3=c("t2","t3"),
            c4=c("t5","t10"))

tree <- groupOTU(tree, cls)
ggtree(tree, aes(color=group)) + geom_text(aes(label=label)) +
  scale_color_manual(values=c("black", "red","green","blue","yellow")) + theme(legend.position="right")

Yu, Guangchuang

Dec 11, 2017, 10:53:36 PM
to jeama...@gmail.com, ggtree

The overlap parameter was introduced to solve the conflict issue. In your last plot there is no conflict, and groupOTU does the right thing by tracing back from t2 and t3 to their MRCA.
In the next release, I will introduce another parameter, connect = FALSE (the default), which will not trace t2 and t3 back to their MRCA in the situation shown in your last plot, where the induced graph contains only nodes with degree = 1. With connect = TRUE, it will always trace back to the MRCA.




--
You received this message because you are subscribed to the Google Groups "ggtree" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bioc-ggtree+unsubscribe@googlegroups.com.
To post to this group, send email to bioc-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bioc-ggtree/d77fc1ab-8775-49cb-81e7-2db8dcb5941d%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
--~--~---------~--~----~------------~-------~--~----~
Guangchuang Yu PhD
Postdoc researcher
State Key Laboratory of Emerging Infectious Diseases
School of Public Health
The University of Hong Kong
Hong Kong SAR, China
-~----------~----~----~----~------~----~------~--~---

Yu, Guangchuang

Dec 11, 2017, 11:04:26 PM
to jeama...@gmail.com, ggtree
Sorry, I meant that the induced graph contains only nodes with degree 1, except for the MRCA node.
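The effect of `connect` can be examined on the corrected four-group example by comparing the assignments produced with each setting. This is a sketch only: it assumes a ggtree release that already accepts `connect`, and that the assignments are stored as a factor in the `"group"` attribute of the returned `phylo` object.

```r
library(ggtree)

set.seed(123)
tree <- rtree(14)

cls <- list(c1 = c("t14", "t9", "t4"),
            c2 = c("t1", "t11"),
            c3 = c("t2", "t3"),
            c4 = c("t5", "t10"))

# Assumption: the assignments live in the "group" attribute of the tree
grp_false <- attr(groupOTU(tree, cls, overlap = "abandon", connect = FALSE), "group")
grp_true  <- attr(groupOTU(tree, cls, overlap = "abandon", connect = TRUE),  "group")

# Indices of the nodes whose assignment depends on the connect setting
changed <- which(as.character(grp_false) != as.character(grp_true))
print(changed)
```

Any node listed in `changed` is one that is only colored when tracing back to the MRCA is forced with `connect = TRUE`.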

Carles Borredá

Jul 10, 2018, 8:50:40 AM
to ggtree
Hello

I just want to know whether the 'connect=T' option has been implemented. I can't find it in the GitHub NEWS, so I wanted to check.

Thank you

Yu, Guangchuang

Jul 10, 2018, 9:33:17 AM
to Carles Borredá, ggtree
yes

--
G Yu, DK Smith, H Zhu, Y Guan, TTY Lam*. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data, Methods in Ecology and Evolution, 2017, 8(1):28-36. doi:10.1111/2041-210X.12628
 
Homepage: https://guangchuangyu.github.io/software/ggtree

aw...@cornell.edu

Apr 20, 2019, 1:36:24 PM
to ggtree

Hello,


I am encountering the problem of traceback to mrca despite setting "overlap='abandon'" and "connect = F". 

You can see some of the branches labeled green that should be labeled gray in the attached snapshot of the tree.

Groups are four species, and tip labels of each species have a unique string of four characters at the beginning of the label.
Curious as to where I am going wrong here?

groupInfo <- split(TREE2$tip.label, substr(TREE2$tip.label, start = 1, stop = 4))

TREE3 <- groupOTU(TREE2, groupInfo, overlap='abandon', connect=F)

ggtree(TREE3, aes(color=group), layout='circular', size=0.75) +
  geom_treescale(x=5.5,y=1,fontsize = 6,color="gray87",linesize=1,offset=5) +
  theme(panel.background=element_rect(fill="gray20")) + 
  geom_tiplab(size=2,aes(angle=angle),align=T,linesize=0.5,offset=1) +
  scale_color_manual(values=c("gray87","red", "yellow", "magenta", "green"))

- Andrew
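When colors are given to `scale_color_manual()` as an unnamed vector, they are matched to the group levels by position, which makes it hard to tell which color stands for unassigned branches. A possible way to make the mapping explicit is a named color vector, sketched below in base R. The tip labels are hypothetical, and the name `"0"` for unassigned branches is an assumption about how groupOTU labels them.

```r
# Hypothetical tip labels; the first four characters encode the species
tip_labels <- c("Aaaa_01", "Aaaa_02", "Bbbb_01", "Cccc_01", "Cccc_02", "Dddd_01")

# Same grouping rule as in the post above
groupInfo <- split(tip_labels, substr(tip_labels, start = 1, stop = 4))

species_colors <- c("red", "yellow", "magenta", "green")
stopifnot(length(species_colors) == length(groupInfo))

# Assumption: branches without a group are labeled "0"
group_colors <- c("0" = "gray87", setNames(species_colors, names(groupInfo)))
print(group_colors)
```

Passing `values = group_colors` to `scale_color_manual()` then ties each color to a group by name, so a branch drawn in green can be traced to a specific species instead of to its position in the vector.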
tree_snapshot.png