Hi Florinda,
To be on the really safe side, and if feasible time-wise, make three inferences:
one using the simplest model (JC), one with the model suggested by Modeltest-NG (if it lies in between the two extremes), and one using the most parameter-rich model (GTR+G+I). Then compare the results:
1. If applying a model is necessary, the JC topology will differ substantially from those of the suggested-best model and GTR+G+I, while the latter two will give you about the same result.
2. If the model choice is very important, i.e. the results are strongly model-dependent, then the suggested-best model result may differ from the GTR+G+I result.
3. If you get the same trees from all three, your data follow a quasi-strict clock and evolve in a simple, well-behaved way (quite rare with big datasets).
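To make the three runs concrete, here is a minimal sketch that only builds the three RAxML-NG command lines, it does not run them. The alignment name, the run prefixes, the seed and the "HKY+G" middle model are placeholders I made up; put in whatever Modeltest-NG suggested for your data. The options used (`--msa`, `--model`, `--prefix`, `--seed`, `--threads`) are standard RAxML-NG options, but check the model spelling against the version you have installed.

```python
import shlex

# Hypothetical inputs: replace with your own alignment and the
# model that Modeltest-NG actually suggested for your data.
ALIGNMENT = "alignment.fasta"
MODELS = {
    "jc": "JC",        # simplest model
    "best": "HKY+G",   # placeholder for the Modeltest-NG suggestion
    "gtr": "GTR+G+I",  # most parameter-rich model, spelled as in the text
}


def build_commands(msa, models, seed=12345, threads=2):
    """Return one RAxML-NG tree-search command (as an argument list) per model."""
    commands = {}
    for tag, model in models.items():
        commands[tag] = [
            "raxml-ng",
            "--msa", msa,
            "--model", model,
            "--prefix", f"run_{tag}",   # keeps the output files of each run apart
            "--seed", str(seed),        # same seed, so only the model differs
            "--threads", str(threads),
        ]
    return commands


COMMANDS = build_commands(ALIGNMENT, MODELS)

# Print the commands so they can be pasted into a shell or a job script.
for command in COMMANDS.values():
    print(shlex.join(command))
```

Each run should then leave its best tree in `run_<tag>.raxml.bestTree`, and those three files are what you compare.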
For by far the most datasets, it'll be 1. The reason for this is that RAxML-NG, like RAxML before it, does a pretty good job of optimising a fitting model during the final optimisation step. I.e. if your data evolve according to an HKY-like model, Modeltest-NG will give you this as the result. When you then run RAxML with GTR+G+I, it will optimise the model parameters so that they approach an HKY-ish model, and give you a similar tree. You can check the optimised model parameters in the log output files.
But running a dataset with more complex evolution under a parameter-poor model (JC has no free model parameters at all) hinders RAxML from properly fitting the model to the data and invites branching artefacts, so-called "false positives".
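What "approaching an HKY-ish model" means for the optimised GTR rates can be sketched like this: under HKY the two transition rates (AG, CT) are equal and the four transversion rates (AC, AT, CG, GT) are equal. The function name, the tolerance of 1.25 and the example rates below are made up for illustration; copy the real rates by hand from your own log file, this sketch does not parse RAxML output.

```python
def hky_likeness(rates, tolerance=1.25):
    """Summarise how close six GTR rates are to an HKY-like pattern.

    rates: dict with the keys AC, AG, AT, CG, CT, GT (all values > 0).
    tolerance: arbitrary cut-off for the max/min ratio within a rate class.
    """
    transitions = [rates["AG"], rates["CT"]]
    transversions = [rates["AC"], rates["AT"], rates["CG"], rates["GT"]]

    # Spread within each class: 1.0 means the rates are identical.
    ts_spread = max(transitions) / min(transitions)
    tv_spread = max(transversions) / min(transversions)

    # Rough analogue of HKY's kappa: mean transition / mean transversion rate.
    kappa = (sum(transitions) / 2) / (sum(transversions) / 4)

    return {
        "ts_spread": ts_spread,
        "tv_spread": tv_spread,
        "kappa": kappa,
        "hky_like": ts_spread <= tolerance and tv_spread <= tolerance,
    }


# Made-up example rates, NOT from a real analysis.
example = {"AC": 1.05, "AG": 4.1, "AT": 0.95, "CG": 1.02, "CT": 4.3, "GT": 1.0}
print(hky_likeness(example))
```

If both spreads are close to 1 and kappa is clearly above 1, GTR has effectively collapsed to something HKY-like; if kappa is also close to 1, even JC-like rates would do (base frequencies aside).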
The fewer parameters we have to optimise, the faster the analysis. That is (actually) the only reason we do partition and model testing in the case of nucleotide data: to save CPU time. There are also data situations in which over-parametrising may have a detrimental effect, which is why using the suggested best model is usually safe. But not always: biological data hardly ever follow models, and any model test averages across the entire data or the pre-defined partitions. So, if it's 2, this can be an indication to re-partition the data and/or to do a more in-depth analysis of the signal in your nucleotide matrix.
If it's 3, you can skip ML: a quick-to-infer NJ tree will be just as good :)
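To decide which of the three cases you are in, you need some measure of how different the topologies are. One common choice (my suggestion here, not the only option) is the Robinson-Foulds distance, i.e. the number of splits found in one tree but not the other. Below is a minimal pure-Python sketch, assuming simple Newick strings without quoted labels and with the same taxa in both trees; the function names and example trees are made up. RAxML-NG also has a `--rfdist` command that does this for a file of trees, which is more convenient for real data.

```python
import re


def clade_sets(newick):
    """Return (all leaf names, list of leaf sets below each internal node)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    stack, clades, leaves = [], [], set()
    previous = None
    for token in tokens:
        if token == "(":
            stack.append(set())
        elif token == ")":
            finished = stack.pop()
            clades.append(frozenset(finished))
            if stack:
                stack[-1] |= finished
        elif token not in (",", ";") and previous != ")":
            # A leaf label; anything after ':' is a branch length.
            name = token.split(":")[0].strip()
            if name:
                leaves.add(name)
                if stack:
                    stack[-1].add(name)
        # Tokens directly after ')' are support values or branch
        # lengths of internal nodes and are ignored.
        previous = token
    return frozenset(leaves), clades


def splits(newick):
    """Non-trivial splits of the unrooted tree, each stored as one side."""
    leaves, clades = clade_sets(newick)
    reference = min(leaves)  # always store the side without this taxon
    result = set()
    for clade in clades:
        side = clade if reference not in clade else leaves - clade
        if 1 < len(side) < len(leaves) - 1:
            result.add(frozenset(side))
    return leaves, result


def rf_distance(newick_a, newick_b):
    """Robinson-Foulds distance: number of splits present in only one tree."""
    leaves_a, splits_a = splits(newick_a)
    leaves_b, splits_b = splits(newick_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same taxa")
    return len(splits_a ^ splits_b)


# Made-up toy trees: the second is the first one rooted differently.
tree_1 = "((A,B),(C,D),E);"
tree_1_rerooted = "(((A:0.1,B:0.2):0.05,E:0.3),C:0.1,D:0.1);"
tree_2 = "((A,C),(B,D),E);"

print(rf_distance(tree_1, tree_1_rerooted))  # 0, same unrooted topology
print(rf_distance(tree_1, tree_2))           # 4, no shared internal split
```

A distance of 0 between all three trees is case 3; JC far away but the other two close together is case 1; a clear distance between the suggested-best model and GTR+G+I points to case 2.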
Cheers, Guido