When does ModelFinder skip models?


Heiko Schmidt

Sep 1, 2021, 10:17:40 AM
to IQ-TREE Forum
Dear Minh,

I have a question about the ModelFinder step.

If one compares the output of IQ-TREE 1 and IQ-TREE 2, one notices
that version 2 reports fewer models in the ModelFinder step.

So my question is: how does the ModelFinder code in IQ-TREE 2
decide which models can be skipped?

I know roughly how MF decides when to stop increasing
the number of rate classes in FreeRate models (+R), but
I do not know what the decision to leave out
certain models is based on.


Furthermore, is there an option to force IQ-TREE 2 to
still inspect all models (e.g. all 88 DNA models with
-m TESTONLY) if one is interested in the distributions
of likelihoods and/or BIC/AIC, even though in most
cases they may be 'hopeless' ;)

Best wishes from Vienna,
Heiko


-----------------------------------------------------------------------------
Heiko Schmidt
Center for Integrative Bioinformatics Vienna (CIBIV)
University of Vienna / Max Perutz Labs
http://www.cibiv.at/
-----------------------------------------------------------------------------

Bui Quang Minh

Sep 2, 2021, 9:23:51 AM
to iqt...@googlegroups.com
Hi Heiko

On Thu, Sep 2, 2021 at 12:17 AM Heiko Schmidt <heiko....@univie.ac.at> wrote:
> Dear Minh,
>
> I have a question about the ModelFinder step.
>
> If one compares the output of IQ-TREE 1 and IQ-TREE 2, one notices
> that version 2 reports fewer models in the ModelFinder step.
>
> So my question is: how does the ModelFinder code in IQ-TREE 2
> decide which models can be skipped?

This feature in IQ-TREE 2 is still unpublished, but it works as follows, assuming that you have DNA data:

1. Build a quick tree under the GTR+I+G model, using nearest neighbor interchange (NNI) starting from the parsimony tree. This tree topology stays fixed throughout model selection.
2. Test all the remaining rate heterogeneity models in combination with GTR, e.g., GTR+I, GTR+G, GTR+Rx, ...
3. Among these GTR+XXX variants, find the best rate heterogeneity model. Any rate heterogeneity model whose BIC score is worse than the best one by more than 10 units is labelled "bad"; the others are "good".
4. Test all the remaining substitution models, but only in combination with the "good" rate heterogeneity models.
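
As an illustration only, the filtering in steps 2 to 4 can be sketched in Python. This is not IQ-TREE code: the function names, the model lists and the BIC values below are all hypothetical, and the sketch assumes that a model exactly 10 BIC units worse than the best still counts as "good".

```python
# Sketch of the "good"/"bad" rate-model filter described in steps 2-4.
# All names and BIC numbers are made up; this is not IQ-TREE source code.

BIC_THRESHOLD = 10.0  # worse than the best by MORE than this -> "bad"


def good_rate_models(reference_bic, threshold=BIC_THRESHOLD):
    """Return the rate models whose BIC is within `threshold` of the best.

    `reference_bic` maps a rate-heterogeneity suffix (e.g. "+I+G") to the
    BIC obtained with the reference substitution model (GTR for DNA).
    Lower BIC is better.
    """
    best = min(reference_bic.values())
    return [rate for rate, bic in reference_bic.items() if bic - best <= threshold]


def candidate_models(substitution_models, reference_bic, reference="GTR"):
    """List the models still to be tested in step 4."""
    good = good_rate_models(reference_bic)
    return [
        sub + rate
        for sub in substitution_models
        if sub != reference  # the reference model was already tested in steps 1-2
        for rate in good
    ]


# Hypothetical BIC scores for GTR combined with each rate model (steps 1-2).
gtr_bic = {"": 10250.0, "+I": 10120.0, "+G": 10031.5, "+I+G": 10025.0, "+R2": 10029.0}

good = good_rate_models(gtr_bic)
remaining = candidate_models(["JC", "HKY", "GTR"], gtr_bic)
print(good)       # rate models kept after step 3
print(remaining)  # models tested in step 4
```

With these made-up scores, the plain GTR and GTR+I variants are more than 10 BIC units worse than GTR+I+G, so no other substitution model is tested without rate heterogeneity or with +I alone.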

Hope that answers your question. For protein data, we simply use LG in place of GTR.


> Furthermore, is there an option to force IQ-TREE 2 to
> still inspect all models (e.g. all 88 DNA models with
> -m TESTONLY) if one is interested in the distributions
> of likelihoods and/or BIC/AIC, even though in most
> cases they may be 'hopeless' ;)


You can use the "-mrate ALL" option. -mrate specifies the list of rate heterogeneity models to consider, and the ALL keyword tells ModelFinder to use all of them, ignoring the strategy I explained above.
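
For illustration, a small Python helper that assembles such a command line might look like the sketch below. The options -m TESTONLY and -mrate ALL are the ones mentioned in this thread. The binary name iqtree2, the -s option for the alignment, and the file name example.phy are assumptions about a typical setup, and the helper function itself is hypothetical.

```python
import shlex


def modelfinder_command(alignment, all_rates=True, binary="iqtree2"):
    """Build an argument list for a ModelFinder-only run.

    `binary` and the "-s" alignment option are assumptions about a typical
    IQ-TREE 2 installation; "-m TESTONLY" and "-mrate ALL" come from the thread.
    """
    args = [binary, "-s", alignment, "-m", "TESTONLY"]
    if all_rates:
        # Consider every rate heterogeneity model, skipping the good/bad filter.
        args += ["-mrate", "ALL"]
    return args


cmd = modelfinder_command("example.phy")  # example.phy is a placeholder file name
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd) once the binary and alignment path match your own installation.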

Cheers
Minh
 