Re: [raxml] substitution model

Alexandros Stamatakis

Sep 13, 2022, 10:59:48 AM

to ra...@googlegroups.com

In case of doubt run Modeltest-NG for model selection to be on the safe
side:

https://academic.oup.com/mbe/article/37/1/291/5552155

Alexis

On 13.09.22 17:57, Florinda D'Archivio wrote:
> I tried to check for the substitution model to use for my analysis.
> RAxML (I am using the GUI version) is suggesting the GTR model. If I
> check with the same file on MEGA, it suggests using the GTR+G+I model.
> Which one am I supposed to use then?
>
> Thank you in advance.

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Affiliated Scientist, Evolutionary Genetics and Paleogenomics (EGP) lab,
Institute of Molecular Biology and Biotechnology, Foundation for
Research and Technology Hellas

www.exelixis-lab.org

Grimm

Sep 14, 2022, 8:46:33 AM

to raxml

Hi Florinda,

to be on the really safe side, and if feasible (time-wise), make three inferences.

One using the simplest model, JC; one with the model suggested by Modeltest-NG (if it lies in between); and one using the most parameter-rich model (GTR+G+I). Then compare the results:

1. If applying a model is necessary (i.e. JC is too simple), the JC topology will be substantially different from those of the suggested-best model and GTR+G+I, but the latter two will give you about the same result.
2. If the model choice is very important, i.e. the results are strongly model-dependent, then the suggested-best model result may differ from the GTR+G+I result.
3. If you get the same trees, your data follow a quasi-strict clock, i.e. evolve peacefully and straightforwardly (quite rare with big datasets).
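To make the comparison of the three trees a bit more concrete, here is a minimal sketch in Python. It is not part of RAxML or Modeltest-NG: the function names (`splits`, `rf_distance`, `scenario`), the use of the Robinson-Foulds distance as the measure of "different", and the `tolerance` cut-off are all my own assumptions. It expects plain Newick strings over the same taxa and looks at the topology only; in practice you would also want to look at branch support before calling two trees "substantially different".

```python
import re

def clades(newick):
    """Return every clade (as a frozenset of leaf names) found in a Newick string."""
    s = re.sub(r":[^,();]+", "", newick)      # strip branch lengths
    s = re.sub(r"\s+", "", s).rstrip(";")     # strip whitespace and the final ';'
    stack, found, prev = [], [], ""
    for tok in re.findall(r"[(),]|[^(),]+", s):
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            clade = frozenset(stack.pop())
            found.append(clade)
            if stack:
                stack[-1] |= clade
        elif tok != "," and prev != ")":
            stack[-1].add(tok)                # a leaf name
        # a name directly after ')' is an internal label/support value: skipped
        prev = tok
    return found

def splits(newick):
    """Non-trivial bipartitions of the (unrooted) tree, one side per split."""
    found = clades(newick)
    taxa = found[-1]                          # the outermost clade holds all taxa
    ref = min(taxa)                           # fixed reference taxon to orient splits
    result = set()
    for clade in found:
        side = taxa - clade if ref in clade else clade
        if 2 <= len(side) <= len(taxa) - 2:
            result.add(side)
    return result

def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: number of splits found in only one of the trees."""
    return len(splits(tree_a) ^ splits(tree_b))

def scenario(jc_tree, best_tree, rich_tree, tolerance=0):
    """Map the three inferences onto outcomes 1, 2 or 3 of the list above.

    tolerance is a hypothetical cut-off for "about the same" (in RF units).
    """
    d_best_rich = rf_distance(best_tree, rich_tree)
    d_jc_best = rf_distance(jc_tree, best_tree)
    d_jc_rich = rf_distance(jc_tree, rich_tree)
    if max(d_best_rich, d_jc_best, d_jc_rich) <= tolerance:
        return 3    # all three trees agree
    if d_best_rich <= tolerance:
        return 1    # only JC deviates
    return 2        # suggested-best and GTR+G+I disagree: model-dependent result

if __name__ == "__main__":
    jc = "((A,C),(B,D));"
    best = "((A:0.1,B:0.2):0.05,(C:0.1,D:0.1):0.05);"
    rich = "(A,B,(C,D));"
    print(scenario(jc, best, rich))
```

The trees in the `__main__` part are toy examples; you would paste in the best-tree Newick strings from your three runs instead.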

For by far the most datasets, it'll be 1. The reason is that RAxML-ng, like RAxML before it, does a pretty good job of optimising a fitting model during the final optimisation step. I.e. if your data evolved according to an HKY-like model, Modeltest-NG will give you this as the result. When you then run RAxML with GTR+G+I, it will optimise the model parameters so that they approach an HKY-ish model, and give you a similar tree. You can check the optimised model parameters in the log output files.
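As an illustration of what "approaching an HKY-ish model" means for the optimised GTR rates: under HKY the four transversion rates are equal to each other, and so are the two transition rates. A rough check could look like the sketch below. The function name, the 20% tolerance and the example numbers are made up by me, and I am assuming the six rates are given in the order A-C, A-G, A-T, C-G, C-T, G-T (check the order in your own log file before trusting this).

```python
def looks_hky_like(rates, rel_tol=0.2):
    """Rough check whether six optimised GTR rates resemble an HKY model.

    rates   -- six values, assumed order: AC, AG, AT, CG, CT, GT
    rel_tol -- hypothetical tolerance: allowed (max - min) / mean within a class
    """
    ac, ag, at, cg, ct, gt = rates
    transversions = [ac, at, cg, gt]   # purine <-> pyrimidine changes
    transitions = [ag, ct]             # A <-> G and C <-> T

    def spread(values):
        mean = sum(values) / len(values)
        return (max(values) - min(values)) / mean

    return spread(transversions) <= rel_tol and spread(transitions) <= rel_tol


def ts_tv_ratio(rates):
    """Mean transition rate divided by mean transversion rate (a kappa-like value)."""
    ac, ag, at, cg, ct, gt = rates
    return ((ag + ct) / 2) / ((ac + at + cg + gt) / 4)


if __name__ == "__main__":
    invented_rates = [1.05, 4.1, 0.95, 1.0, 3.9, 1.0]   # not from a real run
    print(looks_hky_like(invented_rates), ts_tv_ratio(invented_rates))
```

If the check fails clearly, the data probably need more than HKY, which fits with Modeltest-NG suggesting a richer model in such cases.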

But running a dataset with more complex evolution under a parameter-poor model (one without any free parameters in the case of JC) prevents RAxML from properly fitting the model to the data and invites branching artefacts, so-called "false positives".

The fewer variables we have to optimise, the faster the analysis, which is (actually) the only reason we do partition and model testing (in the case of nucleotide data): to save CPU time. There are also data situations in which over-parametrising may have a detrimental effect, which is why using the suggested best model is usually safe. But not always: biological data hardly follow models, and any model testing averages across the entire data or the pre-defined partitions. So, if it's 2, this can be an indication to re-partition the data and/or to do a more in-depth analysis of the signal in your nucleotide matrix.
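To give a feeling for how many variables are at stake, here is a small sketch that counts the free parameters of the substitution model itself for the models mentioned in this thread. The helper name `free_parameters` is mine, and the count deliberately leaves out branch lengths, which are optimised in any case: JC has none, HKY has one transition/transversion ratio plus three free base frequencies, GTR has five free rates plus three free base frequencies, and +G and +I add one parameter each.

```python
# Free parameters of the substitution matrix plus base frequencies.
# Branch lengths are not counted here.
BASE_MODEL_PARAMETERS = {
    "JC": 0,    # equal rates, equal base frequencies
    "K80": 1,   # one transition/transversion ratio
    "F81": 3,   # three free base frequencies
    "HKY": 4,   # ratio + three free base frequencies
    "GTR": 8,   # five free rates + three free base frequencies
}

def free_parameters(model):
    """Count free model parameters for names such as 'JC', 'HKY+G' or 'GTR+G+I'."""
    parts = model.upper().split("+")
    count = BASE_MODEL_PARAMETERS[parts[0]]
    for extra in parts[1:]:
        if extra in ("G", "I"):
            count += 1      # gamma shape (alpha) or proportion of invariant sites
        else:
            raise ValueError("unknown model component: " + extra)
    return count

if __name__ == "__main__":
    for name in ("JC", "HKY+G", "GTR+G+I"):
        print(name, free_parameters(name))
```

So the gap between the simplest and the most parameter-rich model here is ten parameters per partition, which adds up quickly once the data are split into many partitions.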