'0' in bootstrap values

Leo

Oct 2, 2011, 3:41:32 PM
to raxml
Dear all, I've been using RAxML for some time although this is my
first post ;)

I'm working with quite a big dataset of sequences of a bacterial
protein, and it's well-known that bacterial sequences often give
phylogenetic trees which cannot be completely resolved at the most
internal nodes. I have tried different alignment parameters to improve
it, because the sequences are quite divergent, but some of these
internal nodes are still assigned a bootstrap value of '0'.

My actual question is this: how is it possible that the best tree
has a node with 0% support? If none of the 1000 bootstrap replicates
I used supported that node even once, how can it end up in the
final tree?

I hope you can answer my question, and that what I'm trying to say
is clear ;)

Leo

Joseph Brown

Oct 2, 2011, 4:25:40 PM
to ra...@googlegroups.com
It is not uncommon for your MLE to not be identical to your bootstrap consensus tree. This seems (in my experience) to happen when only a small number of characters have consistent signal in favour of your MLE; during bootstrap resampling these characters can easily be missed, yielding low support. The issue is exacerbated when dealing with sparse (or gappy) matrices (those with a lot of missing data); it is possible (as you've seen) within a finite number of bootstrap replicates to _never_ recover a particular node in your MLE tree.

There is unfortunately no way for you to improve your bootstrap scores; they are what they are. As for interpretation, I think there is only one way to summarize the situation. _Given_ (or _conditional_ upon) the empirical data, your MLE tree is your best estimate of phylogeny, but some nodes within the tree are not supported by resampling methods. [As an aside, I am not convinced that nonparametric bootstrapping is the best or most informative way to summarize uncertainty in a sparse matrix; I am much more inclined to summarize uncertainty by conditioning on the data (which requires a Bayesian framework).] If you want to _do_ something with your tree (that is, use the tree as input for another analysis), I think it is defensible to use your MLE tree, as long as you are upfront with the limitations of your data.

HTH.
Joseph.

Leo

Oct 2, 2011, 4:57:38 PM
to raxml
Ok, thanks for your answer!
So, as I'm doing finite bootstrapping of my data, maybe none of these
re-sampling steps gives support to some of the nodes of the ML tree,
and that's why they appear as '0'-labeled.
I was thinking of using the RAxML approach of not setting the number of
replicates a priori (-I autoMR) and letting the program decide when to
stop bootstrapping. Maybe that way I could improve the support of some
of the nodes. What do you think?

Leo

Andre J. Aberer

Oct 2, 2011, 6:35:48 PM
to ra...@googlegroups.com
Dear Leo,

I agree, bootstrap support of 0 is pretty tough given 1000 replicates.

I'd explain your observation as follows:
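1. Consider that ~37% of the characters of the original alignment do not
occur at all in a bootstrap replicate. To see this, count in R how many
distinct columns are drawn when 1000 columns are resampled with
replacement (the result is typically around 632):
> length(unique(sample.int(1000, replace = TRUE)))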
2. If there is not much phylogenetic signal in the alignment for
resolving a specific inner node, then the ML search will yield just
"some" arrangement of the taxa (because it does not make much of a
difference in terms of likelihood).
3. So, given that there is not enough phylogenetic signal to resolve some
inner nodes, it may well be that the original alignment slightly
favors the inference of an arrangement of taxa that does not occur in
any of the bootstrap replicates (since bootstrap replicates and
original alignment are pretty different (see 1.)).
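
Point 1 can also be checked with a short simulation. The following is only a minimal Python sketch, not anything RAxML does internally: it assumes an alignment of 1000 columns and 200 replicates, and the function name fraction_missing is made up for illustration. It compares the simulated fraction of columns that never get drawn with the closed form (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows.

```python
import math
import random


def fraction_missing(n_sites, rng):
    """Resample n_sites columns with replacement and return the
    fraction of original columns that were never drawn."""
    drawn = {rng.randrange(n_sites) for _ in range(n_sites)}
    return 1 - len(drawn) / n_sites


rng = random.Random(42)   # fixed seed so the sketch is repeatable
n_sites = 1000            # assumed alignment length
estimates = [fraction_missing(n_sites, rng) for _ in range(200)]
mean_missing = sum(estimates) / len(estimates)

# Probability that one given column is skipped in all n_sites draws.
expected = (1 - 1 / n_sites) ** n_sites

print(f"simulated: {mean_missing:.3f}")
print(f"closed form: {expected:.3f}  (limit 1/e = {math.exp(-1):.3f})")
```

Each replicate therefore sees only about 63% of the distinct columns, which is why a weak signal carried by a handful of columns can vanish from many replicates.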

Another example from the bootstrapping perspective (ignoring likelihood
considerations): Assume, we have just 1 rogue taxon R, that largely
lacks phylogenetic signal. In all bootstrap replicates R is placed as
follows:
(((a,b),c),d,(e,(f,R)));
Now assume that the ML search yielded the following tree:
((((a,R),b),c),d,(e,f));
If we draw bootstrap support on the ML tree, we get:
((((a,R)[0],b)[0],c)[0],d,(e,f)[0]);
If we prune R from bootstrap set and ML tree, we instead obtain:
(((a,b)[100],c)[100],d,(e,f)[100]);

So in conclusion, one single rogue taxon can suffice (in theory) to
destroy any bootstrap support.
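
The rogue-taxon example above can be reproduced with a small Python sketch. It assumes topology-only Newick strings (no branch lengths or labels) and 100 identical bootstrap trees, as in the example; the helper names parse_newick, clades, splits and support_on_tree are made up for illustration and are not RAxML functions. Pruning is done by simply deleting the taxon from every leaf set, which gives the same bipartitions as pruning the trees themselves.

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested tuples of leaf names."""
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos
        while text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return node()


def clades(tree, out):
    """Append the leaf set below every internal node to out; return own leaf set."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def splits(newick, pruned=frozenset()):
    """Return the non-trivial bipartitions of a tree, ignoring pruned taxa."""
    out = []
    taxa = clades(parse_newick(newick.strip().rstrip(";")), out) - pruned
    anchor = min(taxa)                    # store the side without this taxon
    result = set()
    for clade in out:
        side = clade - pruned
        if anchor in side:
            side = taxa - side
        if 2 <= len(side) <= len(taxa) - 2:
            result.add(side)
    return result


def support_on_tree(reference, replicates, pruned=frozenset()):
    """Percentage of replicates containing each bipartition of the reference tree."""
    rep_splits = [splits(r, pruned) for r in replicates]
    return {
        tuple(sorted(s)): 100 * sum(s in r for r in rep_splits) // len(rep_splits)
        for s in splits(reference, pruned)
    }


ml_tree = "((((a,R),b),c),d,(e,f));"
bootstrap_trees = ["(((a,b),c),d,(e,(f,R)));"] * 100

with_r = support_on_tree(ml_tree, bootstrap_trees)
without_r = support_on_tree(ml_tree, bootstrap_trees, pruned={"R"})
print("with R:   ", sorted(with_r.values()))
print("without R:", sorted(without_r.values()))
```

With R present, all four inner branches of the ML tree get 0; after pruning R, the three remaining inner branches get 100, matching the trees written out above.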

If you want to give rogue taxa identification a try for your data set,
RAxML currently offers two algorithms that prune rogue taxa from your
bootstrap tree set in order to increase the overall support in a
consensus tree. There is
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5710874
which is implemented as -J MR_DROP | STRICT_DROP in RAxML
and
http://sco.h-its.org/exelixis/pubs/Exelixis-RRDR-2011-9.pdf
for which the code can be downloaded under
http://sco.h-its.org/exelixis/aberer/rogue-raxml.tbz

Besides, within a week at the latest, we will release a program that
offers an option for identification of rogue taxa in the context of
bootstrap support drawn on the ML tree.

--
Best regards,
Andre J. Aberer

M.Sc. (Bioinformatics)
Scientific Computing Group

Heidelberg Institute for Theoretical Studies (HITS gGmbH)
Schloss-Wolfsbrunnenweg 35
D-69118 Heidelberg

Tel.: +49 6221 533 264
Fax: +49 6221 533 298
Email: andre....@h-its.org
WWW: http://www.exelixis-lab.org
http://www.h-its.org/english/research/sco/index.php

Amtgericht Mannheim / HRB 337446
Managing Directors: Dr. h.c. Dr.-Ing. E.h. Klaus Tschira, Prof. Dr.-Ing. Andreas Reuter

Alexis

Oct 3, 2011, 4:46:05 AM
to raxml
Thanks for your replies Andre and Leo,

It can indeed happen that if you have few sites and many taxa some
bipartitions (this is not node support, in unrooted trees
support values ALWAYS refer to inner branches and NOT to nodes) in the
ML tree do not occur in any of the bootstrap replicates.

This happens of course if you draw the bipartition support values as
induced by the collection of bootstrap trees onto the ML tree.

Evidently, if you build a MR or MRE consensus tree out of the BS
replicates, then you will get non-zero values.
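
To illustrate why a consensus built from the replicates cannot carry zero values, here is a minimal Python sketch of the majority-rule idea. It is not RAxML's implementation: bipartitions are represented simply as frozensets of taxon names, the ten replicates are toy data invented for this example, and the function name majority_rule is hypothetical.

```python
from collections import Counter


def majority_rule(replicate_splits, threshold=0.5):
    """Keep every bipartition found in more than `threshold` of the replicates,
    together with its frequency."""
    counts = Counter(s for rep in replicate_splits for s in rep)
    n = len(replicate_splits)
    return {s: c / n for s, c in counts.items() if c / n > threshold}


# Toy data: 6 replicates contain the split {a,b}, 4 contain {a,c}.
replicates = [{frozenset("ab")}] * 6 + [{frozenset("ac")}] * 4

# A split that is in the (hypothetical) ML tree but in no replicate.
ml_only_split = frozenset("ad")

consensus = majority_rule(replicates)
for split, freq in consensus.items():
    print(sorted(split), freq)
print("ML-only split in consensus:", ml_only_split in consensus)
```

Every branch in the consensus is there because it was counted in the replicates, so its value is above the threshold by construction. A branch of the ML tree has no such guarantee, which is where the zeros come from.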

What you are observing indicates that you probably don't have enough
data.

Alexis

Leo

Oct 3, 2011, 6:29:49 AM
to raxml
Thanks all!!
Your replies have been very useful ;)

Alexandre Selvatti

Jan 17, 2013, 2:26:05 PM
to ra...@googlegroups.com
Dear Andre,

While searching the group for "low bootstrap values", in order to get more familiar with interpreting such values correctly, I found this topic and tried the result in the roguenarok online server. I just wanted a brief explanation of the tree I obtained from the results, if I may.

As far as I understood by reading the paper cited above (Aberer & Stamatakis) and running some tests at the online server, the resulting "taxa list" shows how good the support values get when removing potential (or true) rogue taxa. Is that correct?
If so, what would be the threshold (if any) of the "sum of support value" for a reliable, rogue-removed alignment? That is, should I remove all taxa flagged as improving the support values when removed, or is there a minimum value? I'm interested in this because I would like to keep as many taxa as possible, removing only those that really disturb the support values.

I beg your pardon for any lack of clarity in these questions, and please let me know where I can be more precise.

Thank you enormously for the attention,

Alex 

Andre J. Aberer

Jan 17, 2013, 3:51:33 PM
to ra...@googlegroups.com
Hi Alex,


> While searching the group for "low bootstrap values", in order to get more
> familiar with interpreting such values correctly, I found this topic and
> tried the result in the roguenarok online server. I just wanted a brief
> explanation of the tree I obtained from the results, if I may.
>
> As far as I understood by reading the paper cited above (Aberer &
> Stamatakis) and running some tests at the online server, the resulting
> "taxa list" shows how good the support values get when removing potential
> (or true) rogue taxa. Is that correct?

That's correct. A value of 3.0 would mean that the sum of all bootstrap
values increases by 300%. However, the order of the taxa is important:
you only get your x% if you remove all taxa from the top of the list
down to and including this one.


> Thus, if yes, what would be the threshold (if any) of the "sum of
> support value" for a reliable, rogue-removed alignment?

I fear there is none. The roguenarok output gives you the full view of
what's going on: this gives you the opportunity to decide whether 50% (or
just 30%) additional support is worth sacrificing a taxon.

> That is, should I remove all taxa flagged as improving the support
> values when removed, or is there a minimum value?

My personal cutoff is usually 30%. But again: you may have a rogue in
your list that improves the support values only marginally, yet only
once you remove that one does the next rogue (further down the list)
yield an additional 50%.
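
One possible way to turn this into a rule of thumb is sketched below in Python. This is an assumption-laden illustration, not roguenarok's own logic and not its actual output format: the list `ranked` is invented toy data holding (taxon, cumulative relative improvement) pairs in list order, the function name choose_prefix is hypothetical, and applying the cutoff to the gain of each single step is just one reading of the 30% cutoff mentioned above.

```python
def choose_prefix(ranked, cutoff):
    """Return the taxa to prune: everything from the top of the list down to
    the last entry whose own step adds at least `cutoff` to the cumulative gain."""
    previous = 0.0
    prune_until = 0
    for position, (taxon, cumulative) in enumerate(ranked, start=1):
        if cumulative - previous >= cutoff:
            prune_until = position
        previous = cumulative
    return [taxon for taxon, _ in ranked[:prune_until]]


# Toy data: taxonA alone gives only 5%, but taxonB adds 50% once taxonA is gone.
ranked = [("taxonA", 0.05), ("taxonB", 0.55), ("taxonC", 0.60)]

to_prune = choose_prefix(ranked, cutoff=0.30)
print(to_prune)
```

Because the gains only hold for a complete prefix of the list, the sketch keeps taxonA in the pruning set even though its own step is below the cutoff. Taxa that matter for the study would still need to be protected by hand.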

> I'm interested in this because I would like to keep as many taxa as
> possible, removing only those that really disturb the support
> values.
>
> I beg your pardon for any lack of clarity in these questions, and please
> let me know where I can be more precise.

Your question raises a very valid point.

A few more hints:
* make use of the option to exclude taxa from the search if roguenarok
detects certain taxa that are very important for your study
* the reduced consensus you obtain from roguenarok is already a valid
result => the inner branches in this consensus tree were computed on
your bootstrapped dataset (removing the rogues just reveals support
that has been hidden by their presence)
* this is less true if you computed rogues based on the support
values of a ML tree (after all, the ML tree may look pretty different
when you remove a taxon).

=> If you recompute the dataset, I'd suggest you combine a rogue search
on the bootstrap with a search on the support values of a best-known
ML tree and decide on a set of rogue taxa to remove from the study,
that fits best the question that you strive to answer.


Finally, a nasty side-note: in forums it is usually considered best
practice to open a new thread for a new topic (even if continuing an
existing thread appears logical).



HTH,
Andre

Alexandre Selvatti

Jan 18, 2013, 9:31:52 AM
to ra...@googlegroups.com, andre....@googlemail.com
Dear Andre, my sincere apologies for not creating a proper new topic. This will not happen again.

Thank you immensely for the explanations, they're really helpful. I just did not fully understand the last point (the one introduced with =>). How exactly can this combination of a rogue search with a search on the support values of the best-known likelihood tree be performed?

Finally, I found the resulting tree rather different from a raxml output (viewing in Archaeopteryx, as recommended). Some branches were drawn vertically and some clades seem to be polytomies. My question here would be: how should I interpret this roguenarok output tree? Is it only for visualising the increase in the BS values? Should I run another ML inference on the "rogue excluded" alignment under the same parameters used for the "original rogued" one?

Many thanks for the attention so far,

bests,

Alex

Andre J. Aberer

Jan 19, 2013, 3:37:27 AM
to Alexandre Selvatti, ra...@googlegroups.com
Hi Alex,

> Dear Andre, my sincere apologies for not creating a proper new topic. This
> will not happen again.
>

no problem!

>
> Thank you immensely for the explanations, they're really helpful. I
> just did not fully understand the last point (the one introduced with
> =>). How exactly can this combination of a rogue search with a search on
> the support values of the best-known likelihood tree be performed?
>

I just meant to play around a bit:
1 do a search on the bootstrap treeset (start with a low "maximal
dropset" parameter, then increase)
2 do a search on the best-known tree with support values (you have to
have uploaded a ML tree and a bootstrap tree for this)
3 you could take the union of both sets (maybe decide on 2 different
cut-offs and re-do the analysis each time) or only take the rogues
from 1) and check strong rogues from 2) for what could happen to the
ML tree, if you also prune these (you can always use the visualization
tool to have a look at different pruning options)
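
Step 3 can be sketched in a few lines of Python. This is only an illustration with invented taxon names: the two input sets stand for whatever the searches in 1) and 2) returned, `protected` stands for taxa you decided to keep regardless, and the function name combine is hypothetical.

```python
def combine(bootstrap_rogues, ml_rogues, protected=frozenset()):
    """Return (union, intersection) of the two rogue sets, minus protected taxa."""
    first, second = set(bootstrap_rogues), set(ml_rogues)
    keep = set(protected)
    return (first | second) - keep, (first & second) - keep


# Toy results from the two searches (names invented for this example).
from_bootstrap_search = {"taxon3", "taxon7", "taxon12"}
from_ml_tree_search = {"taxon7", "taxon20"}

union, both = combine(from_bootstrap_search, from_ml_tree_search,
                      protected={"taxon12"})
print("prune generously:  ", sorted(union))
print("prune cautiously:  ", sorted(both))
```

The union is the generous option and the intersection the cautious one; re-running the analysis with each gives an impression of how sensitive the result is to the choice.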


> Finally, I found the resulting tree rather different from a raxml
> output (viewing in Archaeopteryx, as recommended). Some branches were
> drawn vertically and some clades seem to be polytomies.

That is because this is a bootstrap tree (just with the rogues excluded;
it then runs raxml -J <someBootstrapOption>).

> My question for this would be: how to interpret this roguenarok output
> tree, is it only for visualising the increase on the BS values?

if you visualize once the complete tree (w/o pruning) and afterwards
select some rogue taxa and click prune/visualize, you get a side-by-side
view of how the rogues affect the *consensus* tree.

You can do the same again (that's what i meant above) with the support
values ON the ML-tree...you may get different rogues, since different
taxa can be responsible for reducing support values.


> Should I run another ML inference on the "rogue excluded" alignment
> under the same parameters used for the "original rogued" one?
>

Yes. Decide on 1 or more sets of rogues, remove and re-run.
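
Removing the chosen taxa from the alignment before the re-run can be scripted. The Python sketch below is a minimal illustration under explicit assumptions: the alignment is in relaxed sequential PHYLIP format with one "name sequence" pair per line, taxon names contain no whitespace, and the function name prune_phylip and the toy alignment are made up for this example. Interleaved files would need a proper parser.

```python
def prune_phylip(text, rogues):
    """Drop the given taxa from a relaxed sequential PHYLIP alignment
    (one 'name sequence' line per taxon) and rewrite the header."""
    lines = [line for line in text.splitlines() if line.strip()]
    n_sites = lines[0].split()[1]          # header is 'n_taxa n_sites'
    kept = [line for line in lines[1:] if line.split()[0] not in rogues]
    return "\n".join([f"{len(kept)} {n_sites}"] + kept) + "\n"


alignment = (
    "4 8\n"
    "taxA ACGTACGT\n"
    "taxB ACGTACGA\n"
    "rogue1 ACG-ACGT\n"
    "taxC ACGTTCGT\n"
)

pruned_alignment = prune_phylip(alignment, rogues={"rogue1"})
print(pruned_alignment, end="")
```

The pruned file is then used for a fresh ML search and a fresh set of bootstrap replicates, with the same settings as the original run.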


HTH
-Andre

Guanyang Zhang

Jan 22, 2017, 8:02:56 PM
to raxml, phyl...@gmail.com
Hi Joseph, 

Do you know any published studies that discuss the issue of bootstrap sampling based on a sparse matrix? Thanks. 

Guanyang