RAxML help - Model, number of runs and bootstrap

Lucas Andrade Meirelles

Sep 8, 2013, 1:27:26 AM
to ra...@googlegroups.com
Hello

I'm starting to use RAxML for my ML analyses, mainly because the software is less computationally demanding.
I need to analyse about 50-100 taxa and build a tree. I'm a beginner Linux user and I'm enjoying it. I've already compiled the software and run some simple analyses, but I have questions on three points:

1- As the program uses different models (different from those provided by jModelTest), which model should I use? Is there a program that analyses my DNA alignment and tells me the best model, as jModelTest does for GARLI? Or should I use the standard GTRGAMMA?

2- What is a reliable number of runs (trees) for this number of taxa? In the manual, the standard is #20; is that a good number?

3- I think 1000 bootstrap pseudoreplicates are enough (although computationally heavier than the 100 pseudoreplicates in the manual), aren't they?

Thank you guys


Fernando Izquierdo

Sep 9, 2013, 5:05:43 AM
to ra...@googlegroups.com
Hi Lucas,

> 1- As the program uses different models (different from those provided by jModelTest), which model should I use? Is there a program that analyses my DNA alignment and tells me the best model, as jModelTest does for GARLI? Or should I use the standard GTRGAMMA?

As far as I know there is no such option. For DNA data you can use GTRGAMMA or GTRCAT (faster and less memory-consuming). If you use CAT, you can later optimize the branch lengths under GAMMA.
 

> 2- What is a reliable number of runs (trees) for this number of taxa? In the manual, the standard is #20; is that a good number?

That can be a good number to start with, but it depends mostly on your data. You can compute RF distances (see -f r) on your final trees; this will give you an idea of how different the best topologies you are inferring are. Then you can decide whether to run more independent searches or stick with the best tree so far for the bootstrap analysis.
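The two steps described here (several independent ML searches, then pairwise RF distances over the resulting trees) could be scripted roughly as below. This is only a sketch: the binary name `raxmlHPC`, the file names `dna.phy` and `all_trees.txt`, the run names and the seed are assumptions made for illustration, and the flags should be checked against `raxmlHPC -h` for the installed version. `-N` is used as the equivalent of `-#`.

```python
import shlex


def ml_search_cmd(alignment, run_name, runs=20, model="GTRGAMMA", seed=12345):
    """Build the argv for `runs` independent ML searches on one alignment."""
    return ["raxmlHPC", "-m", model, "-p", str(seed),
            "-N", str(runs), "-s", alignment, "-n", run_name]


def rf_distance_cmd(tree_file, run_name, model="GTRGAMMA"):
    """Build the argv for pairwise RF distances (-f r) over the trees in tree_file."""
    return ["raxmlHPC", "-f", "r", "-z", tree_file, "-m", model, "-n", run_name]


if __name__ == "__main__":
    # File and run names are hypothetical; nothing is executed here.
    for cmd in (ml_search_cmd("dna.phy", "T1"),
                rf_distance_cmd("all_trees.txt", "RF1")):
        print(shlex.join(cmd))
```

`all_trees.txt` is assumed to hold the result trees of the individual searches, one Newick string per line. The lists can be passed to `subprocess.run` once the paths are real.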
 

> 3- I think 1000 bootstrap pseudoreplicates are enough (although computationally heavier than the 100 pseudoreplicates in the manual), aren't they?

Yes, but you can do an a posteriori bootstrap analysis (see -f I), which will give you some confidence as to whether the number of bootstrap replicates you have so far is enough.

 Cheers,
Fernando



Lucas Andrade Meirelles

Sep 10, 2013, 12:29:08 AM
to ra...@googlegroups.com
Thanks Fernando,

I did what you said for the number of trees (calculating the RF distances) and for testing the bootstrap (calculating TC scores for 100 bootstrap pseudoreplicates).

1- For the RF I used the "-f r" command on a text file containing all 20 trees and I got: "Average relative RF in this set: 0.034903". Is it a good value? I mean, I didn't find this information, but I think the value is good when it is near zero, am I right?

2- I used the "-f i" command on a file containing 100 bootstrap pseudoreplicates and I got:

"Tree certainty for this tree: 20.306742
Relative tree certainty for this tree: 0.534388

Tree certainty including all conflicting bipartitions (TC-All) for this tree: 20.384441
Relative tree certainty including all conflicting bipartitions (TC-All) for this tree: 0.536433"

I took a look at the Salichos & Rokas 2013 paper and saw that the values they classified as good (if I understand correctly) were lower than 20. Does it mean that my bootstrap is ok?

Sorry for taking your time, but I haven't had time to study this part yet and I didn't find the commands "-f r" and "-f i" or their explanations in the version of the manual that I have.
If you have some useful material that explains these things, I would be grateful.

Cheers
Lucas

Fernando Izquierdo

Sep 10, 2013, 5:19:43 AM
to ra...@googlegroups.com
Hi Lucas,

> I did what you said for the number of trees (calculating the RF distances) and for testing the bootstrap (calculating TC scores for 100 bootstrap pseudoreplicates).

I was actually referring to the a posteriori bootstopping analysis (e.g., -I autoMRE -z replicates_file), with a capital i.
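To make the distinction concrete: `-I` (capital i) is its own option for the a posteriori bootstopping test, whereas `-f i` selects the IC/TC computation on a given tree. A hedged sketch of how the two calls might be assembled follows; the binary name, file names, run names, seed and the GTRCAT default are assumptions for illustration, not something stated in this thread.

```python
BOOTSTOP_CRITERIA = {"autoFC", "autoMR", "autoMRE", "autoMRE_IGN"}


def bootstop_cmd(replicates_file, run_name, criterion="autoMRE",
                 model="GTRCAT", seed=12345):
    """argv for the a posteriori bootstopping test (-I, capital i)."""
    if criterion not in BOOTSTOP_CRITERIA:
        raise ValueError(f"unknown bootstopping criterion: {criterion}")
    return ["raxmlHPC", "-m", model, "-p", str(seed),
            "-z", replicates_file, "-I", criterion, "-n", run_name]


def ic_tc_cmd(reference_tree, replicates_file, run_name, model="GTRCAT"):
    """argv for IC/TC scores (-f i, lowercase) on a reference tree."""
    return ["raxmlHPC", "-f", "i", "-t", reference_tree,
            "-z", replicates_file, "-m", model, "-n", run_name]


if __name__ == "__main__":
    # Hypothetical file names; the commands are only printed, not run.
    print(" ".join(bootstop_cmd("RAxML_bootstrap.T14", "BSTOP")))
    print(" ".join(ic_tc_cmd("best_tree.nwk", "RAxML_bootstrap.T14", "TC")))
```

The first command answers "do I have enough replicates?"; the second answers a different question about conflict among the replicate trees.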
 

> 1- For the RF I used the "-f r" command on a text file containing all 20 trees and I got: "Average relative RF in this set: 0.034903". Is it a good value? I mean, I didn't find this information, but I think the value is good when it is near zero, am I right?

The relative RF distance lies in the range [0,1] and tells you the fraction of bipartitions that differ between two topologies. Thus, 0.0 means all bipartitions are shared and the topologies are identical.
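As a small illustration of that definition, the sketch below computes a relative RF distance for two toy trees. It assumes unrooted, fully resolved trees on the same taxa and the usual normalization by 2(n-3); each tree is written directly as its non-trivial bipartitions rather than parsed from Newick, and the taxon names are made up.

```python
def canonical_split(side, taxa):
    """Represent a bipartition by the side that excludes a fixed reference taxon."""
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa) - side if reference in side else side


def relative_rf(splits_a, splits_b, taxa):
    """Relative RF distance between two unrooted binary trees on the same taxa.

    Each tree is given as its non-trivial bipartitions (one side of each split).
    """
    a = {canonical_split(s, taxa) for s in splits_a}
    b = {canonical_split(s, taxa) for s in splits_b}
    max_rf = 2 * (len(taxa) - 3)  # each fully resolved tree has n-3 such splits
    return len(a ^ b) / max_rf


taxa = {"A", "B", "C", "D", "E"}
tree_1 = [{"A", "B"}, {"D", "E"}]  # ((A,B),C,(D,E))
tree_2 = [{"A", "C"}, {"D", "E"}]  # ((A,C),B,(D,E))

print(relative_rf(tree_1, tree_1, taxa))  # 0.0, identical topologies
print(relative_rf(tree_1, tree_2, taxa))  # 0.5, one of two splits differs in each tree
```

Read this way, an average of 0.034903 would mean that, on average, about 3.5% of the bipartitions differ between pairs of trees in the set.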
 

> 2- I used the "-f i" command on a file containing 100 bootstrap pseudoreplicates and I got:
>
> "Tree certainty for this tree: 20.306742
> Relative tree certainty for this tree: 0.534388
>
> Tree certainty including all conflicting bipartitions (TC-All) for this tree: 20.384441
> Relative tree certainty including all conflicting bipartitions (TC-All) for this tree: 0.536433"
>
> I took a look at the Salichos & Rokas 2013 paper and saw that the values they classified as good (if I understand correctly) were lower than 20. Does it mean that my bootstrap is ok?
>
> Sorry for taking your time, but I haven't had time to study this part yet and I didn't find the commands "-f r" and "-f i" or their explanations in the version of the manual that I have.
> If you have some useful material that explains these things, I would be grateful.

See above about -f i. If you are also interested in IC/TC, please look carefully at the original paper, and also at the manual describing its RAxML implementation, which can be found by searching this group.
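One piece of arithmetic may help when reading the TC output quoted above. Assuming the relative TC is the TC divided by the number of internal branches of the tree (n-3 for n taxa), which is an assumption to verify against the paper and the implementation notes, the two reported numbers determine each other:

```python
def implied_internal_branches(tc, relative_tc):
    """Number of internal branches implied by a TC / relative TC pair.

    Assumes relative TC = TC / (n - 3); this normalization is an assumption.
    """
    return round(tc / relative_tc)


# Values copied from the output quoted in this thread.
reported = {
    "TC": (20.306742, 0.534388),
    "TC-All": (20.384441, 0.536433),
}

for label, (tc, rel) in reported.items():
    branches = implied_internal_branches(tc, rel)
    print(f"{label}: {branches} internal branches, i.e. {branches + 3} taxa")
```

If that normalization holds, the absolute TC grows with the size of the tree, so a value such as 20 cannot be compared directly with values from trees with a different number of taxa; the relative TC is the comparable quantity.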

Cheers,
Fernando

Lucas Andrade Meirelles

Sep 10, 2013, 11:54:01 PM
to ra...@googlegroups.com
Thank you again Fernando,

This is the last thing I'll ask, I promise.

I did the a posteriori bootstrap analysis as you suggested and the result was:

"Found 100 trees in File RAxML_bootstrap.T14

# Trees    Avg WRF in %    # Perms: wrf <= 3.00 %
50         5.85            0
100        4.98            1
Bootstopping test did not converge after 100 trees"

What do these averages mean? Does it mean that 100 isn't enough (as I think)? What value indicates a good bootstrap analysis?

Thank you for your time and for your advice

Cheers

Lucas

Fernando Izquierdo

Sep 11, 2013, 4:27:52 AM
to ra...@googlegroups.com
Hi Lucas,

Your interpretation is correct. Once there are enough replicates according to the criterion, raxml will print a message like "converged after n replicates".
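For anyone scripting this check, here is a minimal sketch that reads the bootstopping table quoted earlier in the thread. As the criterion is usually described, each row is a check after that many trees, the second column is the average weighted RF distance between consensus trees built from random halves of the replicates, and the third counts the permutations whose WRF fell under the 3% threshold. The exact wording of the success message is an assumption based on the paraphrase above.

```python
import re

# A data row looks like: "<trees>  <avg WRF>  <permutations under threshold>"
ROW = re.compile(r"^\s*(\d+)\s+(\d+(?:\.\d+)?)\s+(\d+)\s*$")


def parse_bootstop(log_text):
    """Return ([(trees, avg_wrf_percent, perms_under_threshold)], converged)."""
    rows = []
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            trees, avg_wrf, perms = match.groups()
            rows.append((int(trees), float(avg_wrf), int(perms)))
    if "did not converge" in log_text:
        converged = False
    elif "converged after" in log_text:  # assumed wording of the success message
        converged = True
    else:
        converged = None  # no verdict found in the text
    return rows, converged


sample = """Found 100 trees in File RAxML_bootstrap.T14

# Trees      Avg WRF in %      # Perms: wrf <= 3.00 %
50                  5.85                             0
100                4.98                              1
Bootstopping test did not converge after 100 trees"""

rows, converged = parse_bootstop(sample)
print(rows, converged)
```

On the quoted output this yields two checks (after 50 and 100 trees) and a verdict of not converged, which matches the reading that 100 replicates were not enough for that dataset.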

But for a small dataset you don't need to fix the number of bootstraps and compute the criterion a posteriori every time. You can also let raxml stop automatically; please read carefully the description of bootstrapping usage here:

http://sco.h-its.org/exelixis/hands-On.html
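A hedged sketch of what such an automatically stopping bootstrap call might look like when assembled from a script: the criterion name is passed where the replicate count would normally go. The binary name, alignment file, run name and seeds are assumptions for illustration, and the tutorial linked above remains the authority on the exact usage.

```python
def auto_bootstrap_cmd(alignment, run_name, criterion="autoMRE",
                       model="GTRGAMMA", parsimony_seed=12345,
                       bootstrap_seed=12345):
    """argv for a bootstrap run that stops once the criterion is met."""
    return ["raxmlHPC", "-m", model,
            "-p", str(parsimony_seed),
            "-b", str(bootstrap_seed),
            "-N", criterion,  # criterion name instead of a fixed replicate count
            "-s", alignment, "-n", run_name]


if __name__ == "__main__":
    # Hypothetical names; the command is only printed, not run.
    print(" ".join(auto_bootstrap_cmd("dna.phy", "BS_AUTO")))
```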

The details of bootstopping and how many replicates are required are described in this paper:

http://link.springer.com/chapter/10.1007%2F978-3-642-02008-7_13

Cheers,
Fernando


Lucas Andrade Meirelles

Sep 12, 2013, 1:42:28 PM
to ra...@googlegroups.com
Thank you Fernando for the valuable advice and suggestions

You're doing very good work with RAxML and with user support.

Cheers

Lucas 

Lucas Andrade Meirelles

Sep 16, 2013, 10:35:16 PM
to ra...@googlegroups.com
Fernando, I was analysing another dataset which seems to be very difficult to resolve.

The mean RF for the resulting trees is 0.29. I did the analysis using 20, 100 and 1000 trees and the result is the same.
What is a good value for the RF distance in this case? I mean, 0.29 is high, but even if I increase the number of runs, the result is the same and the best tree is also similar.

When I tested the bootstrap a posteriori, it converged after 850 replicates. But the support values, of course, were low for several nodes (especially in the ancestral part of the tree).

It means that my sample is very difficult to group, doesn't it?

Cheers,

Lucas

Fernando Izquierdo

Sep 17, 2013, 3:52:46 AM
to ra...@googlegroups.com
Hi Lucas,

Yes, I guess that is expected and makes sense.

But please keep in mind that the group is meant solely to provide help with raxml usage and technical problems. We cannot really help you that much with the interpretation of results.

Cheers,
Fernando


Alexandros Stamatakis

Sep 17, 2013, 8:09:04 AM
to ra...@googlegroups.com


On 09/17/2013 09:52 AM, Fernando Izquierdo wrote:
> Hi Lucas,
>
> Yes, I guess that is expected and makes sense.
>
> But please keep in mind that the group is meant to provide solely help with
> raxml usage and technical problems. We cannot really help you that much
> with interpretation of results.

well put Fernando :-)

Lucas, the results you are getting indeed indicate that you are dealing
with a hard-to-resolve dataset.

Alexis
--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

Lucas Andrade Meirelles

Sep 18, 2013, 12:27:00 AM
to ra...@googlegroups.com
Hi Fernando and Alexis,

I know about the interpretation of the results. =)
They're only some tests that I'm running to learn how to use the software and how the results look with different datasets.
It's hard to learn when nobody in the lab has experience with the technique, but I'm trying.

Again, thanks for the support

Cheers

Lucas 