Concordance factor values & hybridization

yaf...@gmail.com

Nov 3, 2015, 6:36:25 PM

to BUCKy users

Hi BUCKy users,

I have six taxa and more than 1000 gene trees, but no outgroup. I would like to detect whether reticulate evolution happened among them.

One of my split CFs is 0.78 (0.764-0.796). Could this suggest that reticulation happened?

My question is: what CF cutoff would support reticulate evolution rather than ILS?

Thanks for reading, and I hope to get your reply^^

Thanks a lot again~

Best,

Yafei

Cécile Ané

Nov 3, 2015, 10:24:42 PM

to bucky...@googlegroups.com

Hi Yafei,

There is no cutoff to distinguish ILS-only from ILS + hybridization.

You can get a split with CF of 0.78 from ILS only and no hybridization, in which case the alternative splits contradicting this "major" split would have roughly equal CFs. For example there may be 2 alternative conflicting splits, each with CF of 0.11, roughly.

But you could also get a split with CF of 0.78 from hybridization only, and no ILS. In this case, there should be one other split that would reflect the other parental origin of genes at the hybridization event. For example, this alternative split could have a CF of 0.22, and all other splits would have a CF of 0 (never represented in any gene tree). But that's at the opposite end of the spectrum: in most cases there would be ILS too, on top of hybridization (if there is indeed hybridization).

In short: you need to look at the CFs of alternative, conflicting splits, to see a signature of hybridization. This paper is a great example, I think: http://onlinelibrary.wiley.com/doi/10.1111/evo.12099/abstract
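For a single four-taxon set (quartet), the ILS-only expectation can be written down explicitly. Under the multispecies coalescent with no hybridization, an internal branch of length t coalescent units gives a major quartet CF of 1 - (2/3)e^(-t) and two equal minor CFs of (1/3)e^(-t) each. The Python sketch below assumes exactly that setting (one quartet, ILS only); the function names are hypothetical and none of this is part of BUCKy. It converts a major CF into t, and measures how unequal the two minor CFs are. BUCKy's split CFs on six taxa are not quartet CFs, so treat this only as intuition for the "roughly equal alternatives" argument. Estimated CFs also carry sampling noise, so the gap is never exactly 0 in real data, and there is still no cutoff.

```python
import math

def ils_only_quartet_cfs(t):
    """Expected (major, minor, minor) quartet CFs under the multispecies
    coalescent with no hybridization; t = internal branch length in coalescent units."""
    minor = math.exp(-t) / 3.0
    return (1.0 - 2.0 * minor, minor, minor)

def branch_length_from_cf(major_cf):
    """Invert major CF = 1 - (2/3) * exp(-t); needs 1/3 < major_cf < 1."""
    if not 1.0 / 3.0 < major_cf < 1.0:
        raise ValueError("major CF must lie strictly between 1/3 and 1")
    return -math.log(1.5 * (1.0 - major_cf))

def minor_cf_gap(cfs):
    """Absolute difference between the two smaller of three CFs.
    ILS alone predicts a gap near 0; hybridization can make it large."""
    _, second, third = sorted(cfs, reverse=True)
    return abs(second - third)

t = branch_length_from_cf(0.78)
print(round(t, 3))                                      # 1.109
print([round(x, 2) for x in ils_only_quartet_cfs(t)])   # [0.78, 0.11, 0.11]
print(round(minor_cf_gap((0.78, 0.11, 0.11)), 2))       # 0.0  (ILS-only pattern)
print(round(minor_cf_gap((0.78, 0.22, 0.0)), 2))        # 0.22 (hybridization-only pattern)
```

With a major CF of 0.78, the ILS-only model puts about 0.11 on each alternative, matching the example above, whereas the extreme hybridization-only pattern (0.78, 0.22, 0) is as asymmetric as it can be.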

Cecile.

-- 
Cécile Ané
Departments of Statistics and of Botany
University of Wisconsin - Madison
www.stat.wisc.edu/~ane/

CALS statistical consulting lab:
www.cals.wisc.edu/calslab/stat_consulting.php

yaf...@gmail.com

Nov 4, 2015, 12:32:55 AM

to BUCKy users

Hi Ane,

Thanks for your reply; it is very useful to me^^

But I am confused about the concept of an "alternative split concordance factor". My understanding is that it is the concordance value between one taxon in the 'major' clade and the other taxa, like split 4 in the results I attached. Is my understanding right?

Also, the paper you recommended^^ says "alternative bipartitions with CFs higher than 10% were further investigated for evidence of hybridization". Do you know what 'alternative bipartitions' means here? Is it the same idea as 'alternative split CFs'?

Thanks for your kind reply again.

Best,

Yafei

[Attachment: Screen Shot 2015-11-04 at 14.03.00.png]

Cécile Ané

Nov 4, 2015, 9:57:52 AM

to bucky...@googlegroups.com

Hi again Yafei,

Yes. In your example, split number 4 conflicts with split 1, 2 or 3 (I can't tell which one from your screenshot, but you can click on split 4 to see which other split has to go away to display split 4 on the tree). In Cui et al. (2013), 'alternative bipartitions' are bipartitions (or splits) that are not represented in the primary concordance tree, because they are not compatible with some other split in that tree. In your example, split number 4 is one such bipartition.
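Whether a split can sit on the primary concordance tree comes down to split compatibility: two bipartitions A|B and C|D are compatible exactly when at least one of A∩C, A∩D, B∩C, B∩D is empty. Below is a minimal Python sketch of that check on a hypothetical six-taxon tree (6,((3,5),(1,(2,4)))), whose internal splits are {3,5}, {2,4} and {1,2,4}. The function names are made up for illustration; this is not BUCKy's own code.

```python
def compatible(split1, split2, taxa):
    """Each split is given by one of its sides (a set of taxa)."""
    everyone = frozenset(taxa)
    a, b = frozenset(split1), everyone - frozenset(split1)
    c, d = frozenset(split2), everyone - frozenset(split2)
    # Compatible iff at least one of the four pairwise intersections is empty.
    return any(not (x & y) for x in (a, b) for y in (c, d))

def conflicting_splits(candidate, tree_splits, taxa):
    """Splits of the tree that would have to go away to display `candidate`."""
    return [s for s in tree_splits if not compatible(candidate, s, taxa)]

taxa = {1, 2, 3, 4, 5, 6}
tree_splits = [{3, 5}, {2, 4}, {1, 2, 4}]   # internal splits of (6,((3,5),(1,(2,4))))
print([sorted(s) for s in conflicting_splits({1, 2}, tree_splits, taxa)])  # [[2, 4]]
```

Here the candidate split {1,2}|{3,4,5,6} conflicts only with {2,4}, so it is an "alternative" split in the sense used above: displaying it would require removing {2,4} from the tree.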

Two more resources to test a model with ILS only, versus a model with ILS + hybridization:

Stenz et al. (2015, Syst. Biol.): TICR can test the adequacy of a tree with ILS only. pdf here:
http://sysbio.oxfordjournals.org/cgi/reprint/syv039?ijkey=PGiptM62iGhH0zu&keytype=ref
and software here: https://github.com/nstenz/TICR/
although in your case, you might have too few taxa to use this test.

Solís-Lemus and Cécile Ané (arXiv): http://arxiv.org/abs/1509.06075
SNaQ is a method to estimate a network, including hybridization or gene flow events (with ILS on top). The software is here: https://github.com/crsl4/PhyloNetworks

Cheers,
Cecile.

yaf...@gmail.com

Nov 5, 2015, 9:12:30 PM

to BUCKy users

Hi Cecile,

Sorry for asking again. *^*

Thanks for your pointers^^; they are very useful to me.

I also tried the software you mentioned, and I have some questions about SNaQ.

1. I tested my trees with branch lengths produced by RAxML, but got this error in Julia:

'ERROR: MethodError: `unionTaxa` has no method matching unionTaxa(::Void)'

My tree file looks like this:

" (((1:0.00000100000050002909,(5:0.00543345007380428846,3:0.01389751016355925822):0.00541256517336659478):0.00000100000050002909,2:0.00270503078140669172):0.03338650760066354944,6:0.00000100000050002909,4:0.00000100000050002909):0.0; "

For your example, the tree looks like

"(6:2.728,((3:0.655,5:0.655):1.202,(1:0.881,(2:0.783,4:0.783):0.098):0.976):0.871);"

I am wondering whether "2.728" is a branch length or a CF.

If it is a CF, could I run BUCKy on the gene trees produced by MrBayes, take the "Primary Concordance Tree with Sample Concordance Factors" from each run1.concordance file, and put them together to build the tree file?

2. Could I make the table of CF values directly from the BUCKy output (i.e., run1.concordance), which summarizes all the gene trees produced by MrBayes? Is there a script for this? I looked at TICR, and one of its scripts, 'bucky.pl', seems to do it, but I am not sure; I tried it, but it was not easy for me to use directly. Honestly, I do not clearly know how to build my CF table from the BUCKy output.
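On the "branch length or CF?" question in item 1: in standard Newick, the number after a colon is the length of the branch above that node, and a label written right after a closing parenthesis (before the colon) is an internal-node label, which is where programs typically put support values. The regex-based Python sketch below lists the labels and branch lengths of the example tree. It is not a full Newick parser (no quoted labels, bracketed comments, or extended-Newick network annotations), and the helper name `edges` is hypothetical.

```python
import re

# An optional label (taxon name or internal-node label) followed by ':length'.
EDGE = re.compile(r"([^(),:;\s]*):([0-9.eE+-]+)")

def edges(newick):
    """Return (label, branch_length) pairs in the order they appear.
    Internal nodes without a label get an empty string."""
    return [(label, float(length)) for label, length in EDGE.findall(newick)]

tree = "(6:2.728,((3:0.655,5:0.655):1.202,(1:0.881,(2:0.783,4:0.783):0.098):0.976):0.871);"
for label, length in edges(tree):
    print(label or "(internal)", length)
```

Every value it reports, including 2.728 for the edge leading to taxon 6, is a branch length; this tree carries no internal-node labels, so its internal edges come out unlabeled.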

BTW, I also tried PhyloNet before, and I could detect the signal in my case, but as you mentioned in your paper, it takes a very long time to run when the number of reticulations is over 3 (almost a week in my case). I am wondering how you decide how many reticulation events happened with SNaQ. With PhyloNet, I try different numbers of reticulations and it gives a log probability; when the log probability stops increasing, I roughly know how many reticulation events happened in my case. But with SNaQ, how can I tell?

Sorry for asking so many questions=-=

Thanks a lot!

Best,

Yafei

Cécile Ané

Nov 7, 2015, 6:30:52 PM

to bucky...@googlegroups.com

Hi again Yafei,
Your questions below are more about SNaQ than BUCKy, so I will answer on a different email list; see here:
https://groups.google.com/forum/#!forum/phylonetworks-users

To summarize quickly:

1. More information is needed to reproduce the error and to fix it.
Branch lengths are in substitutions per site in gene trees, but in coalescent units in species trees, when estimated by many methods.

2. Yes, you can get a table of quartet CFs using the script bucky.pl. We'll include better instructions (really just 2 command lines).

3. Yes, PhyloNet is another great option! You will need to root your gene trees in some good way, though, for input to PhyloNet.
With 6 species, I doubt that any method can accurately detect 3 hybridization events. That's a hard problem.
Deciding how many reticulation events are needed for a given data set is a tough problem too. In PhyloNet, as in SNaQ, looking at the increase in the log-likelihood is a good thing to do, but there is no hard-and-fast rule about how much of an increase is significant.
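To show what a quartet CF table contains: one row per 4-taxon set (15 of them for 6 taxa), each with the CFs of the three possible resolutions ab|cd, ac|bd, ad|bc, which sum to 1. The Python sketch below is only an illustration with hypothetical names and a made-up column layout. It simply tallies which resolution each fully resolved gene tree displays, whereas bucky.pl derives quartet CFs from BUCKy's Bayesian concordance analysis; it does not reproduce that script or its output format.

```python
from itertools import combinations

def displayed_resolution(quartet, splits, taxa):
    """Index of the resolution (0: ab|cd, 1: ac|bd, 2: ad|bc) that a tree,
    given as a list of splits (one side of each split), displays for the
    quartet (a, b, c, d); None if no split resolves it."""
    a, b, c, d = quartet
    everyone = set(taxa)
    pairings = [({a, b}, {c, d}), ({a, c}, {b, d}), ({a, d}, {b, c})]
    for i, (p, q) in enumerate(pairings):
        for s in splits:
            side = set(s)
            other = everyone - side
            if (p <= side and q <= other) or (q <= side and p <= other):
                return i
    return None

def quartet_cf_table(gene_trees, taxa):
    """One row per 4-taxon set: (quartet, CF_ab|cd, CF_ac|bd, CF_ad|bc, n),
    where n counts the gene trees that resolve the quartet."""
    rows = []
    for quartet in combinations(sorted(taxa), 4):
        counts = [0, 0, 0]
        for splits in gene_trees:
            r = displayed_resolution(quartet, splits, taxa)
            if r is not None:
                counts[r] += 1
        n = sum(counts)
        if n:
            rows.append((quartet, *(k / n for k in counts), n))
    return rows

# Two hypothetical, fully resolved gene trees on taxa 1-6, each as its internal splits.
taxa = [1, 2, 3, 4, 5, 6]
gene_trees = [
    [{3, 5}, {2, 4}, {1, 2, 4}],
    [{3, 5}, {1, 2}, {1, 2, 4}],
]
table = quartet_cf_table(gene_trees, taxa)
print(len(table))   # 15
print(table[0])     # ((1, 2, 3, 4), 0.5, 0.5, 0.0, 2)
```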

Cheers,
Cecile.

Simon Uribe-Convers

Nov 12, 2015, 7:17:37 AM

to BUCKy users

Dear Cécile,

Thank you for recommending this paper (Cui et al. 2013). We read it in our journal club yesterday and really liked it! However, one question came up, and I'm still thinking about it.

The final dataset the authors analyzed with BUCKy had ~2300 transcripts, but instead of running each of those in MrBayes to get a gene tree for each locus, they split the data in half and ran only two MrBayes analyses, each with 1183 concatenated transcripts. In the end they analyzed posterior probabilities for only two large partitions/loci. I understand that there are computational and time limits to running ~2300 MrBayes analyses, but don't you think this is what they should have done? Would BUCKy be capable of analyzing posterior probabilities for so many loci? What is the point of generating such a great dataset if you are going to concatenate it? Finally, what do you think is the impact of their two-locus approach on assessing hybridization: would they have gotten different results, or weaker support for the patterns they found?

Thanks a lot for your input.

Best,

Simon

--
Simon Uribe-Convers, Ph.D.

Postdoctoral Fellow

Muchhala Lab; Biology Department; University of Missouri - St. Louis; St. Louis, Missouri, 63121, USA

Research Associate

Missouri Botanical Garden; St. Louis, Missouri, 63110, USA

www.simonuribe.com

http://www.umsl.edu/~muchhalan/

Cecile Ane

Nov 12, 2015, 12:32:01 PM

to bucky...@googlegroups.com

Hi Simon,

You could ask Cui et al. to be sure, but my understanding is that they definitely estimated a separate gene tree for each of the 2366 transcripts. They used the option of unlinking tree topologies in MrBayes 3.2.1 to do so easily. Their supplementary information, for the analysis of transcripts obtained using the X. maculatus genome, says that they "unlinked tree topologies and branch lengths" across all transcripts. So even though 1183 transcripts were analyzed together, MrBayes estimated 1183 unlinked gene trees, one for each transcript. This makes things easier in that there is only one (long) run to start: you don't need to write an additional script to start 1183 separate MrBayes runs. So my understanding is that Cui et al. split the jobs into 2 long MrBayes runs, each one estimating 1183 gene tree posterior distributions, one for each of the 1183 unlinked transcripts. Their BUCKy analysis then really did use all 2366 transcripts as separate loci, not just 2 (very long) loci.
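To make the "unlinked topologies" setup concrete, here is a Python sketch that writes the relevant MrBayes commands for transcripts concatenated into one alignment. It assumes MrBayes 3.2 syntax (`charset`, `partition`, `set partition`, `unlink topology=(all) brlens=(all)`); the transcript lengths and the names `t1`, `t2`, ... are hypothetical, and this is not the exact block Cui et al. used. A real run would also need substitution models, `mcmc` settings, and a step to split the per-transcript tree samples into the one-file-per-locus input that BUCKy expects.

```python
def mrbayes_unlinked_block(lengths, partition_name="transcripts"):
    """Build a MrBayes block for transcripts concatenated in the given order.

    lengths: number of alignment columns in each transcript (hypothetical input).
    Each transcript becomes its own character set and data partition, and
    topology and branch lengths are unlinked so MrBayes estimates one gene
    tree per transcript within a single run."""
    lines = ["begin mrbayes;"]
    names = []
    start = 1
    for i, n in enumerate(lengths, start=1):
        name = f"t{i}"
        names.append(name)
        # Columns start..start+n-1 belong to this transcript.
        lines.append(f"  charset {name} = {start}-{start + n - 1};")
        start += n
    lines.append(f"  partition {partition_name} = {len(names)}: {', '.join(names)};")
    lines.append(f"  set partition = {partition_name};")
    lines.append("  unlink topology=(all) brlens=(all);")
    lines.append("end;")
    return "\n".join(lines)

print(mrbayes_unlinked_block([300, 450, 120]))
```

For three transcripts of 300, 450 and 120 sites this yields character sets 1-300, 301-750 and 751-870, a three-part partition, and a single `unlink` line, which is what lets one long run stand in for many separate ones.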

Cécile

Simon Uribe-Convers

Nov 12, 2015, 4:51:21 PM

to BUCKy users

Hi Cécile,

Thanks for the clarification. I really liked the paper!

Cheers,

Simon
