Heterotachy models in ModelFinder

Will Chase

Dec 30, 2017, 2:05:58 PM
to IQ-TREE
Hello, I gather that in v1.6 the heterotachy models can be tested directly in ModelFinder (great addition!). I'm wondering if there's a flag that tells ModelFinder to test heterotachy models (i.e., H1-H4); I can't find it in the documentation.

I ran a quick test today with the latest release using -m TESTNEWONLY and didn't see the heterotachy models, so I assume they have to be requested with another flag.

Thanks!

Bui Quang Minh

Dec 31, 2017, 4:38:11 AM
to IQ-TREE, Will Chase
Dear Will, 
Thanks for your interest. We will add this to the documentation soon, but the quick answer is to use the -mrate option. For example, “-mrate G,R,H” will test all combinations of base models (JC, …, GTR) plus Gamma (+G), free-rate (+R) or heterotachy (+H) models. For +H the procedure is similar to +R, i.e., ModelFinder will successively test +H2, +H3, … until an additional class no longer improves the AIC/BIC score. So the number of classes is determined automatically.
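The class-number search described above (try +H2, +H3, … and stop once AIC/BIC no longer improves) can be sketched as a simple greedy loop. This is a minimal illustration, not IQ-TREE's code: `choose_num_classes` and the `score` callback (a stand-in for fitting a +Hk model and returning its BIC or AIC, lower is better) are hypothetical, and IQ-TREE's actual stopping rule may differ in detail.

```python
from typing import Callable, Dict, Tuple


def choose_num_classes(
    score: Callable[[int], float],
    start: int = 2,
    max_classes: int = 10,
) -> Tuple[int, Dict[int, float]]:
    """Greedy search over the number of classes (hypothetical helper).

    `score(k)` stands in for fitting a model with k classes (e.g. +Hk)
    and returning an information criterion where lower is better.
    The search stops at the first k that does not improve on the best
    score so far, so larger k are never evaluated.
    """
    scores: Dict[int, float] = {start: score(start)}
    best = start
    for k in range(start + 1, max_classes + 1):
        scores[k] = score(k)
        if scores[k] >= scores[best]:
            break  # no gain from the extra class: stop here
        best = k
    return best, scores


# Usage with made-up BIC values (illustrative only, not real results):
toy_bic = {2: 1000.0, 3: 980.0, 4: 975.0, 5: 990.0, 6: 900.0}
best_k, evaluated = choose_num_classes(toy_bic.__getitem__, max_classes=6)
print(best_k, sorted(evaluated))
```

The toy values also show the trade-off of a greedy stop: the lower score at six classes is never evaluated.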

Note that since we haven’t thoroughly tested this feature, feedback is welcome!

Cheers,
Minh


--
Bui Quang Minh
Center for Integrative Bioinformatics Vienna (CIBIV)
Campus Vienna Biocenter 5, VBC5, Ebene 1
A-1030 Vienna, Austria
Phone: ++43 1 4277 74326
Email: minh.bui (AT) univie.ac.at

Will Chase

Jan 1, 2018, 12:39:25 PM
to IQ-TREE
Hi Minh, I tried this and got the heterotachy models to work. I did run into numerical underflow with the +H models ("Numerical underflow for lh-derivative-mixlen"), but turning on safe mode let me get around it. For reference, my data is 608 sequences x 680 sites, and it's a rather gappy and divergent alignment, but this is the only model with which I've encountered numerical underflow.

Thanks for the help!
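For reference, the kind of command discussed so far can be assembled programmatically. This is a sketch under assumptions: the binary name `iqtree` (v1.6), the alignment file name and the helper `modelfinder_command` are placeholders; `-m TESTNEWONLY`, `-mrate G,R,H` and `-t` come from this thread, `-safe` is assumed to be the switch for IQ-TREE's safe likelihood kernel, and combining them this way is also an assumption (check `iqtree -h` for your version).

```python
import shlex
from typing import List, Optional


def modelfinder_command(
    alignment: str,
    rate_types: str = "G,R,H",
    start_tree: Optional[str] = None,
    safe: bool = False,
) -> List[str]:
    """Build an argument list for a ModelFinder run (hypothetical helper)."""
    cmd = ["iqtree", "-s", alignment, "-m", "TESTNEWONLY", "-mrate", rate_types]
    if start_tree is not None:
        cmd += ["-t", start_tree]  # user-supplied starting tree
    if safe:
        cmd.append("-safe")  # assumed flag: safe likelihood kernel against underflow
    return cmd


argv = modelfinder_command("aln.fasta", safe=True)
print(shlex.join(argv))  # shell-quoted form, for pasting into a terminal
```

Returning a list rather than a single string avoids shell-quoting problems if the list is later passed to `subprocess.run`.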

Bui Quang Minh

Jan 2, 2018, 7:09:37 AM
to IQ-TREE, Will Chase
Hi Will,

Then I’m not surprised by the numerical underflow, because each extra heterotachy class adds 2n-3 branch length parameters (n = #sequences), i.e. 1213 per class for n = 608, whereas you have only 680 sites. Fitting that many parameters causes estimation problems, leading to numerical instability. Actually, I wouldn’t recommend using +H models for such short alignments (ModelFinder might have concluded that anyway).
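To put numbers on this point: if each heterotachy class carries its own set of 2n-3 branch lengths, the branch-length parameters alone exceed the 680 sites even with a single class. The sketch below counts only branch lengths (class weights and substitution-model parameters are ignored), and the function name is illustrative.

```python
def branch_length_params(n_seqs: int, n_classes: int) -> int:
    """Branch-length parameters for n_classes heterotachy classes.

    An unrooted binary tree with n_seqs leaves has 2*n_seqs - 3 branches;
    each class is assumed to have its own full set of branch lengths.
    """
    if n_seqs < 3 or n_classes < 1:
        raise ValueError("need at least 3 sequences and 1 class")
    return n_classes * (2 * n_seqs - 3)


N_SEQS, N_SITES = 608, 680  # alignment size from this thread
for k in range(1, 5):
    p = branch_length_params(N_SEQS, k)
    print(f"{k} class(es): {p} branch lengths, {p / N_SITES:.2f} per site")
```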

Cheers, Minh

Will Chase

Jan 2, 2018, 4:31:23 PM
to IQ-TREE
Indeed, heterotachy models were a poor fit for my data. 

A related question regarding cases where K>>n (many more free parameters K than alignment sites n), as in my dataset. I can try removing closely related sequences (i.e., clustering to remove sequences with >90% identity), which might help somewhat, but I know this will still leave me with K>>n when running ModelFinder. When K>>n, are the parameter estimates strictly unreliable, or are they just unstable? That is, if I run ModelFinder on this data 10 times and observe low variance in the log-likelihoods (or BIC/AIC values), can I conclude that the results are reliable, or do I need to worry about a systematic issue with the parameter estimates that could bias all the runs?
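One way to quantify the run-to-run spread asked about here is to summarize the log-likelihoods (or BIC/AIC values) of the repeated runs. A minimal sketch using only the standard library; the function name and the values in the usage line are hypothetical. A small spread only shows that the runs agree with one another; by itself it cannot reveal a bias shared by all runs.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def summarize_runs(values: Sequence[float]) -> Dict[str, float]:
    """Mean, sample standard deviation and range of repeated-run scores."""
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return {
        "mean": mean(values),
        "stdev": stdev(values),  # sample (n - 1) standard deviation
        "range": max(values) - min(values),
    }


# Hypothetical log-likelihoods from repeated runs (not real data):
print(summarize_runs([-1000.0, -1001.0, -999.0]))
```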

Would it help to estimate the gamma shape parameter, free-rate parameters, etc. beforehand with IQ-TREE and specify them for ModelFinder?

Details of my analysis:
608 sequences x 680 sites

I've run ModelFinder 5 times on this data, and it always returns WAG+R7 (BIC), WAG+R8 (AIC), or WAG+G4 (AICc) as the best model.

Before running ModelFinder I estimated a tree with FastTree and used it as the starting tree (-t). I've also tried running ModelFinder without a starting tree, and with a fixed topology, and it returns the same result regardless.
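The three criteria can disagree because they penalize parameters differently. Below are the textbook formulas, taking the number of alignment sites as the sample size (an assumption; programs may define it differently), with illustrative function names. With 680 sites, BIC charges ln(680) ≈ 6.52 per parameter versus 2 for AIC, which is consistent with BIC picking fewer rate categories (R7) than AIC (R8). The AICc correction 2k(k+1)/(n-k-1) grows sharply as k approaches n and stops being meaningful once k ≥ n - 1; if the 2n-3 = 1213 branch lengths are counted among the free parameters, that already happens here, and how a given program handles that case is not covered by this sketch.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: -2 lnL + 2k (lower is better)."""
    return -2.0 * log_lik + 2.0 * k


def aicc(log_lik: float, k: int, n: int) -> float:
    """Small-sample corrected AIC; the correction needs n - k - 1 > 0."""
    if n - k - 1 <= 0:
        raise ValueError(f"AICc correction requires n - k - 1 > 0 (k={k}, n={n})")
    return aic(log_lik, k) + 2.0 * k * (k + 1) / (n - k - 1)


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion: -2 lnL + k ln(n)."""
    return -2.0 * log_lik + k * math.log(n)


N_SITES = 680
print(f"per-parameter penalty: AIC 2.00, BIC {math.log(N_SITES):.2f}")
try:
    aicc(-1000.0, 1213, N_SITES)  # k > n: textbook AICc breaks down
except ValueError as err:
    print(err)
```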

Thank you,
Will

Bui Quang Minh

Jan 4, 2018, 2:02:45 PM
to IQ-TREE, Will Chase
Hi Will,

This is a typically difficult data set because the sequences are short. One remedy is to reduce the number of sequences; another is to sequence more genes to increase the alignment length. If neither is possible, I would worry more about unreliable estimates of the tree topology than about the model parameter estimates. Here the trees will likely differ between runs and also between programs. So I would use different programs (ML, Bayesian) with multiple runs per program, then collect all the trees and perform tree topology tests. I would also test congruence with the 16S/18S rRNA tree, e.g. with internode/tree certainty (IC/TC) scores.
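Once trees from several runs and programs have been collected, a simple first look at how much they disagree is the Robinson-Foulds (RF) distance: the number of bipartitions (splits) present in one unrooted tree but not the other. This is a generic sketch, not the topology tests or IC/TC scores mentioned above; to stay self-contained it takes trees as nested Python tuples rather than Newick files, and all names are illustrative.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is a taxon name or a tuple of subtrees (rooted view of an unrooted tree).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """All taxon names below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial splits, each stored as the side without a reference taxon."""
    taxa = leaves(tree)
    ref = min(taxa)  # fixed reference taxon makes the split encoding canonical
    result: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = below if ref not in below else taxa - below
        if 1 < len(side) < len(taxa) - 1:  # skip trivial (single-taxon) splits
            result.add(side)
        return below

    walk(tree)
    return result


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Unnormalized Robinson-Foulds distance between two trees on the same taxa."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must have the same taxa")
    return len(splits(t1) ^ splits(t2))


# Toy trees (hypothetical): the two runs disagree on where B and C go.
run1 = ((("A", "B"), "C"), ("D", "E"))
run2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(run1, run2))
```

Pairwise RF distances over all collected trees give a quick matrix of disagreement before running formal tests.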

Other people here may have further advice for dealing with your short data set.

Cheers,
Minh