Empirical vs ML-optimised frequencies

Kenta Renard

Jul 3, 2024, 7:06:22 AM

to raxml

Dear All,

In your experience, to what extent does the use of ML-estimated vs empirical frequencies (+FO vs +F in RAxML-NG) impact phylogenetic analyses? I am aware that in most cases the likelihood is better with ML-estimated frequencies, but I have often seen that the best trees in terms of likelihood do not always reflect the most biologically plausible hypothesis.

I am specifically worried about over-fitting the data due to too many free parameters, but in practice how does using ML-estimated frequencies impact the number of free parameters? Have you seen cases where ML-estimated frequencies impacted the inferred trees in negative ways (e.g. artefacts, topologies that are likely incorrect, or inaccurate branch lengths)?

Best wishes,

Kenta

Alexandros Stamatakis

Jul 4, 2024, 2:51:54 AM

to ra...@googlegroups.com

Dear Kenta,

You can use ModelTest-NG to determine the best-fit model.

My gut feeling is that the base frequencies shouldn't have too much of
an impact on topologies, but there are some studies on this:

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0985-x

If you do an ML estimate of base freqs, the increase in the number of free
parameters is 3 on a DNA data partition and 19 on an AA data partition.
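
As a rough illustration of where 3 and 19 come from: the frequencies of a K-state alphabet must sum to 1, so only K - 1 of them are free. The sketch below counts them and shows how much they add to the parameter penalty of an AIC score (AIC = 2k - 2 lnL). The function names are hypothetical, and this is not meant to reproduce how RAxML-NG or ModelTest-NG do their bookkeeping.

```python
def free_frequency_params(num_states: int) -> int:
    """Number of free stationary frequencies for an alphabet of num_states.

    The frequencies sum to 1, so the last one is fixed by the others.
    """
    if num_states < 2:
        raise ValueError("an alphabet needs at least two states")
    return num_states - 1


def aic(log_likelihood: float, num_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 lnL (lower is better)."""
    return 2 * num_params - 2 * log_likelihood


def extra_aic_penalty(num_states: int) -> int:
    """Increase in the 2k term of AIC caused by the free frequencies alone."""
    return 2 * free_frequency_params(num_states)


dna_extra = free_frequency_params(4)    # 3 extra parameters for DNA
aa_extra = free_frequency_params(20)    # 19 extra parameters for AA

print("DNA:", dna_extra, "extra params, AIC penalty +", extra_aic_penalty(4))
print("AA: ", aa_extra, "extra params, AIC penalty +", extra_aic_penalty(20))
```

Under AIC, the extra parameters only pay off if the log-likelihood improves by more than their count (more than 3 units for DNA, more than 19 for AA), which is one way to put a number on the over-fitting concern.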

Alexis

--
Alexandros (Alexis) Stamatakis

ERA Chair, Institute of Computer Science, Foundation for Research and
Technology - Hellas
Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.biocomp.gr (Crete lab)
www.exelixis-lab.org (Heidelberg lab)

Oleksiy Kozlov

Jul 4, 2024, 4:11:53 AM

to ra...@googlegroups.com

Dear Kenta,

IIRC empirical frequencies are also considered free parameters in ModelTest-NG and RAxML-NG.

The reasoning is that in both +FO and +F modes frequencies are estimated from the MSA, just using
different methods (ML vs. simple counting).
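
To make "simple counting" concrete, here is a minimal sketch of estimating empirical base frequencies from a toy DNA alignment. The function name and the toy data are made up for illustration, and the handling of gaps and ambiguity codes (they are simply skipped here) is an assumption; actual tools may treat them differently.

```python
from collections import Counter


def empirical_frequencies(sequences, alphabet="ACGT"):
    """Estimate state frequencies by counting characters in an alignment.

    Characters outside `alphabet` (gaps, N, other ambiguity codes) are
    skipped; this is a simplifying assumption for the sketch.
    """
    counts = Counter()
    for seq in sequences:
        for ch in seq.upper():
            if ch in alphabet:
                counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no countable characters in the alignment")
    return {state: counts[state] / total for state in alphabet}


# Toy alignment (hypothetical data): three sequences, eight columns.
toy_msa = ["ACGTAC--", "ACGTTCGA", "ACNTACGA"]
freqs = empirical_frequencies(toy_msa)

for state, freq in freqs.items():
    print(f"{state}: {freq:.3f}")
```

In +FO mode these values would instead be optimised together with the other model parameters to maximise the likelihood, but in both cases the numbers come from the same MSA.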

Best,
Oleksiy
