bestTree vs. bootstrap consensus

pancho3f

Aug 17, 2012, 12:39:32 PM
to ra...@googlegroups.com
Hi ,

I'm using RAxML BlackBox on the CIPRES portal and MEGA to visualize the resulting trees. When I download the bootstrap.result file, I use it to generate the bootstrap consensus in MEGA. I was wondering what the main difference is between the bootstrap consensus and the bestTree, since their topologies differ. Would both of them be valid for publication? Which one would be best? I hope somebody can help me out with this. Thanks!

Francisco Flores
Oklahoma State University
 

Alexandros Stamatakis

Aug 17, 2012, 6:01:13 PM
to ra...@googlegroups.com
That's a tough question. The bestTree is the ML tree on the original
alignment and, as such, the best possible single topology you can get
for the datasets you generated, given that you are willing to accept
that the maximum likelihood tree is the one that best corresponds to the
"true" tree ... I don't have a strong opinion though and I am not sure
that anyone could give a decisive/definitive one.

Why don't you just show both trees and highlight the differences between
them?

That should be valid as long as we don't know how to best do it
anyway ... albeit I'd have a slight personal preference for the ML
tree...

Alexis
--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

pancho3f

Aug 17, 2012, 7:02:53 PM
to ra...@googlegroups.com
Alexis,

Thank you very much for your prompt response! It is certainly helpful to know your opinion on this matter.
I just wanted to mention that there is a single difference between the trees (which contain 81 taxa): one taxon sits inside a clade in the bootstrap consensus, with a support value of 88, but falls outside that clade in the bestTree. Other than that, and some really minor differences in the support values, they are exactly the same. Unfortunately, this taxon is one of the type species that I included in the analysis, which makes this difference a prominent one.

Francisco

Alexandros Stamatakis

Aug 18, 2012, 4:50:18 AM
to ra...@googlegroups.com
In that case it's definitely worth discussing the difference ...
regarding the support values: keep in mind that when you build a
consensus tree on bootstrap replicates, the consensus tree algorithm
actually strives to maximize support ...

alexis

pancho3f

Aug 20, 2012, 3:57:35 PM
to ra...@googlegroups.com
Thanks for the follow up! I'll keep that in mind

Francisco

Tobias Klöpper

Aug 18, 2012, 5:14:54 AM
to ra...@googlegroups.com
Can you rule out long branch attraction?

Tobias

Alexandros Stamatakis

Aug 21, 2012, 9:48:41 AM
to ra...@googlegroups.com
Hi Tobias,

> Can you rule out long branch attraction?

With respect to the differences between ML and bootstrap trees I'm
pretty sure that they are not due to long branch attraction.

Long branch attraction may be present, but if so, it should equally
affect BS and ML trees.

Alexis

pesthoney

Aug 22, 2012, 9:04:48 AM
to ra...@googlegroups.com
Hi pancho3f,
   I use CIPRES too, but I got a best tree with BS values, and some of the values are very low, such as 16 or 2. Can you tell me how to use MEGA to generate the bootstrap consensus tree? I don't know how to open the bootstrap.result file.
thanks,
yi

Alexandros Stamatakis

Aug 22, 2012, 10:04:19 AM
to ra...@googlegroups.com
you can also do this with RAxML directly:

-J
Compute majority rule consensus tree with "-J MR" or extended majority
rule consensus tree with "-J MRE"
or strict consensus tree with "-J STRICT".
Options "-J STRICT_DROP" and "-J MR_DROP" will execute an
algorithm that identifies dropsets which contain
rogue taxa as proposed by Pattengale et al. in the paper
"Uncovering hidden phylogenetic consensus".
You will also need to provide a tree file containing
several UNROOTED trees via "-z"

e.g.:

./raxmlHPC -m GTRCAT -J MR -z RAxML_bootstrap.XY -s alignment.phy -n T1

alexis
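
For readers who want to see what a majority rule consensus does with a set of bootstrap replicates, here is a minimal Python sketch. It is not RAxML's implementation; the function names (clades, splits, majority_rule) and the toy trees are made up for illustration, and the parser only handles plain Newick strings with unquoted taxon names. It reads each tree, reduces it to its non-trivial bipartitions, and keeps those found in more than half of the trees.

```python
from collections import Counter

def clades(newick):
    """Return every clade (frozenset of taxon names) of a Newick string.
    Branch lengths and internal labels are ignored."""
    stack = []            # one set of leaf names per open parenthesis
    found = set()
    token = ""
    after_close = False   # True while reading the label that follows ')'
    for ch in newick.strip():
        if ch == "(":
            stack.append(set())
            token, after_close = "", False
        elif ch in ",);":
            name = token.split(":")[0].strip()
            if name and not after_close and stack:
                stack[-1].add(name)          # a leaf name
            token, after_close = "", False
            if ch == ")":
                done = stack.pop()
                found.add(frozenset(done))
                if stack:
                    stack[-1] |= done        # pass leaves up to the parent
                after_close = True
        else:
            token += ch
    return found

def splits(newick):
    """Non-trivial bipartitions of the tree, read as unrooted. Each split is
    stored as the side that does NOT contain a fixed reference taxon."""
    found = clades(newick)
    taxa = max(found, key=len)   # the outermost clade holds all taxa
    ref = min(taxa)              # arbitrary but consistent reference taxon
    result = set()
    for clade in found:
        side = taxa - clade if ref in clade else clade
        if 2 <= len(side) <= len(taxa) - 2:
            result.add(frozenset(side))
    return result

def majority_rule(trees, threshold=0.5):
    """Map each split found in more than `threshold` of the trees to its
    frequency (assumes all trees share the same taxon set)."""
    counts = Counter(s for tree in trees for s in splits(tree))
    n = len(trees)
    return {s: c / n for s, c in counts.items() if c / n > threshold}

# Toy bootstrap replicates (hypothetical data, five taxa).
replicates = [
    "((A,B),(C,D),E);",
    "((A,B),(C,E),D);",
    "((A:0.1,B:0.2)90:0.05,(C:0.1,D:0.1):0.2,E:0.3);",
    "((A,C),(B,D),E);",
]
for split, freq in majority_rule(replicates).items():
    print(sorted(split), freq)
```

On this toy input only the split separating C, D and E from A and B passes the threshold (frequency 0.75); the C+D grouping sits at exactly 0.5 and is dropped, which is the kind of branch an extended majority rule (MRE) consensus could still add.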

pancho3f

Aug 22, 2012, 10:38:57 AM
to ra...@googlegroups.com
Hi Pesthoney,

Sure. I downloaded the bootstrap.result file in Windows Explorer, using the "save as" option and adding a .tre extension to the name. I then opened the file in MEGA5 and used the "compute consensus" option, which is one of the buttons that appear above the tree image.

Francisco 

pesthoney

Aug 22, 2012, 11:42:46 AM
to ra...@googlegroups.com
thanks

Alexandre Selvatti

Aug 22, 2012, 12:44:06 PM
to ra...@googlegroups.com
Hello Alexis and everyone,

Where can I find a formal description of these different kinds of consensus trees, so I can choose the one that best fits my dataset?

Thanks in advance,

Alex

Alexandros Stamatakis

Aug 23, 2012, 2:07:03 AM
to ra...@googlegroups.com
Hi Alex,

I believe that they are described in the standard textbooks by Joe
Felsenstein and Ziheng Yang.

A shorter description can be found in this paper here:

http://www.sciencedirect.com/science/article/pii/S1877750310000086

Alexis

Grimm

Aug 30, 2012, 5:50:42 AM
to ra...@googlegroups.com
I just joined this Google Group because Alexi forced me to. I browsed through the topics and got stuck here.
I have quite a bit of experience with relatively highly supported bipartitions that are not realised as branches in the best-found topology, since I work mostly at the intrageneric level, at most up to intergeneric relationships.

If you really want to explore competing signals in your bootstrap sample, it's worth taking a look at "bipartition networks". You read the RAxML bootstrap replicates file into SplitsTree and then make a consensus network using the "COUNT" option. If it takes ages to calculate, use the cut-off option (naturally, a BS sample can contain myriads of random bipartitions that are represented in only very few replicates). The edge lengths in the resulting graph are proportional to the frequency of the corresponding bipartition in the BS sample. For single-gene data, a cutoff of 15 or 20 gives you a pretty good overview. In multigene data, in particular if you have combined data from different genomes, you may want to see all alternatives to branches with support <100. In the plant datasets I have looked into so far, which usually combine one or two nuclear partitions with few to many plastid partitions, incongruent nuclear signal will eventually be wiped out by the overwhelming plastid data, but it remains detectable in the BS sample.

Regarding the review process: We (me, Alexi, and some others) already used them back in 2006 in a broadly sampled ITS-based paper on the genus Acer (maples). Here's the link in case you're interested:
http://www.la-press.com/a-nuclear-ribosomal-dna-phylogeny-of-acer-inferred-with-maximum-likeli-article-a132
I have used them for both genetic and morphological data since then in all my phylogenetic publications. Our reviewers were mostly not familiar with them, but obviously, they could live with them :)
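
The bipartition frequencies described in the post above can also be inspected without drawing a network. The following Python sketch does not reproduce SplitsTree or its consensus network; it only tabulates which bootstrap bipartitions at or above a cutoff cannot coexist with the branches of the best tree. The function names and the six-taxon data are hypothetical, loosely modelled on the case from earlier in the thread where a taxon sits inside a clade with support 88 but outside it in the bestTree. Each split is assumed to be given as the set of taxa on the side that excludes a shared reference taxon (here "O").

```python
def compatible(a, b):
    """Two splits, each given as the side that excludes the same reference
    taxon, fit on one tree only if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a

def competing_splits(bs_support, best_tree_splits, cutoff=20):
    """List bootstrap splits with support >= cutoff (in percent) that
    conflict with at least one split of the best tree."""
    hits = []
    for split, support in bs_support.items():
        if support < cutoff:
            continue
        clashes = [s for s in best_tree_splits if not compatible(split, s)]
        if clashes:
            hits.append((support, sorted(split), [sorted(c) for c in clashes]))
    return sorted(hits, reverse=True)   # strongest competing signal first

# Hypothetical best tree (O,((A,B),C),(T,D)); splits exclude reference taxon O.
best = [frozenset("AB"), frozenset("ABC"), frozenset("TD")]

# Hypothetical bootstrap frequencies in percent.
bootstrap = {
    frozenset("AB"): 100,
    frozenset("ABCT"): 88,   # taxon T placed inside the A+B+C clade
    frozenset("ABC"): 40,
    frozenset("TD"): 10,
    frozenset("CT"): 5,
}

for support, split, clashes in competing_splits(bootstrap, best, cutoff=20):
    print(support, split, "conflicts with", clashes)
```

With a cutoff of 20 the toy data reports only the 88% split grouping T with A, B and C, which clashes with the T+D branch of the best tree. Lowering the cutoff brings in weaker alternatives, at the cost of more noise.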