Indeed, heterotachy models were a poor fit for my data.
A related question concerns cases where K >> n, as in my dataset. I can try removing closely related sequences (i.e., clustering to remove sequences with >0.9 identity), which might help somewhat, but I know this will still leave me with K >> n when running ModelFinder. When K >> n, are the parameter estimates strictly unreliable, or are they just unstable? That is, if I run ModelFinder on this data 10 times and observe low variance in the log-likelihoods (or BIC/AIC values), can I conclude that the results are reliable, or do I need to worry about a systematic issue with the parameter estimates that could bias every run?
Would it help to estimate the gamma shape parameter, the FreeRate parameters, etc. beforehand using IQ-TREE and then specify them for ModelFinder?
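The repeat-run check I have in mind would look roughly like the Python sketch below. The function name and its input are placeholders of my own: the scores would be the BIC (or AIC, or log-likelihood) values copied from each run's output, and the sketch only measures spread across runs, so it would not detect a bias shared by all runs.

```python
from statistics import mean, stdev


def summarize_scores(scores):
    """Summarize the spread of one criterion (e.g. BIC) across repeated runs.

    `scores` is a list of floats, one per run. This is a hypothetical helper,
    not part of IQ-TREE.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate a spread")
    return {
        "runs": len(scores),
        "mean": mean(scores),
        "stdev": stdev(scores),  # sample standard deviation across runs
        "range": max(scores) - min(scores),
    }
```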
Details of my analysis:
608 sequences x 680 sites
I've run ModelFinder 5 times on this data, and it always returns WAG+R7 (BIC), WAG+R8 (AIC), or WAG+G4 (AICc) as the best model, depending on the criterion.
I estimated a tree beforehand using FastTree and used it as the starting tree (-t). I've also tried running ModelFinder without a starting tree and with a fixed topology, and it returns the same result regardless.
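To make the K vs. n comparison concrete, here is a rough Python sketch of how I am counting parameters. It assumes an unrooted, fully resolved tree (2N - 3 branch lengths), that WAG with its fixed empirical frequencies adds no free parameters, that +G4 adds one (the shape), and that +Rk adds 2k - 2. It takes n to be the number of alignment sites. The function names are my own, and the counts ModelFinder reports may differ.

```python
def branch_count(num_taxa):
    """Branch lengths in an unrooted, fully resolved tree: 2N - 3 (assumed)."""
    return 2 * num_taxa - 3


def free_rate_params(categories):
    """Free parameters assumed for +Rk: k rates + k weights - 2 constraints."""
    return 2 * categories - 2


def total_params(num_taxa, model_params):
    """K = branch lengths + free parameters of the substitution/rate model."""
    return branch_count(num_taxa) + model_params


def aicc_correction(k, n):
    """Small-sample term 2K(K+1)/(n-K-1) added to AIC to give AICc.

    Returns None when n - K - 1 <= 0, where the usual derivation of the
    correction no longer applies.
    """
    denom = n - k - 1
    if denom <= 0:
        return None
    return 2 * k * (k + 1) / denom


NUM_TAXA = 608   # sequences in my alignment
NUM_SITES = 680  # alignment columns, used here as n

# Extra free parameters per model, under the assumptions stated above.
MODEL_PARAMS = {
    "WAG+G4": 1,
    "WAG+R7": free_rate_params(7),
    "WAG+R8": free_rate_params(8),
}

for name, extra in MODEL_PARAMS.items():
    k = total_params(NUM_TAXA, extra)
    print(name, "K =", k, "n =", NUM_SITES,
          "AICc correction =", aicc_correction(k, NUM_SITES))
```

Under these assumptions the branch lengths alone (1213) already exceed the 680 sites, so K > n for every candidate model, whatever rate model is chosen.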
Thank you,
Will