Numerical underflow error with GHOST model

Sergio Andrés Muñoz Gomez

Sep 27, 2017, 10:10:26 AM
to IQ-TREE
Dear Stephen,

I've been getting 'ERROR: Numerical underflow (lh-derivative-mixlen)' every time I try to specify the GHOST model with more than 4 classes, even when I use the -safe argument. I have been trying the GHOST model on an amino acid dataset of 120 taxa and 54,400 sites.
This is the command line I've been using: iqtree-1.6-beta3 -s 120x200.phy -nt 12 -pre iqtree3.3.4 -m LG+H5 -wspm -bb 1000 -alrt 1000. I've also tried LG+FO*H5 with no success. With -safe, iqtree doesn't crash and keeps running, but the log fills up with ERROR messages. I wonder whether I should ignore them and wait for the result.

Thank you,
Sergio

jp.fland...@gmail.com

Sep 28, 2017, 2:17:19 PM
to IQ-TREE
I got the same message. For small trees this does not seem to affect the result, but I had to stop the job for a much bigger tree because the test for the optimal number of cores seemed to be blocked.
JPF

Stephen Crotty

Sep 28, 2017, 2:19:54 PM
to iqt...@googlegroups.com
Hi Sergio,

There are a couple of things to be wary of here. The error message you are receiving should not be ignored, especially if it occurs many times. It usually occurs when many branch lengths are close to zero. The default minimum branch length in IQ-TREE is 10^-6, so increasing the minimum branch length might help. To set a larger minimum branch length, use the option -blmin. For example, add "-blmin 0.00001" (that is, 10^-5) to your command line. You can play around with the value and hopefully it helps.

The second thing is the number of parameters you are estimating versus the number of sites in your alignment. With 120 taxa and 5 classes you are trying to estimate (2*120 - 3)*5 = 1185 branch lengths. Adding the +FO option means estimating another 19 amino acid frequency parameters per class, so all up in the vicinity of 1300 parameters. It is not clear to us at this stage how the GHOST model performs when estimating so many parameters. We have not yet done simulation studies to test the limits of the GHOST model, but it is on our to-do list. It would be great to get feedback (either positive or negative) from users about what they find when using GHOST on large datasets.
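
The arithmetic above can be reproduced with a short Python sketch. It counts only the two kinds of parameters mentioned in this message (one set of branch lengths per class, plus 19 frequency parameters per class with +FO); the function name and its arguments are made up for illustration and are not part of IQ-TREE.

```python
def ghost_parameter_count(n_taxa, n_classes, freq_params_per_class=0):
    """Count the parameters named in the message above (illustrative helper).

    Assumption: only per-class branch lengths and optional per-class
    frequency parameters are counted; any other model parameters are ignored.
    """
    branches = 2 * n_taxa - 3                     # branches in an unrooted binary tree
    branch_length_params = branches * n_classes   # one set of branch lengths per class
    frequency_params = freq_params_per_class * n_classes
    total = branch_length_params + frequency_params
    return branch_length_params, frequency_params, total


# Sergio's dataset: 120 taxa, 5 classes, +FO adds 19 frequencies per class.
branch_lengths, frequencies, total = ghost_parameter_count(120, 5, freq_params_per_class=19)
print(branch_lengths, frequencies, total)  # 1185 95 1280
```

The total of 1280 is what "in the vicinity of 1300" refers to.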

Good luck getting some results,

Stephen


Stephen Crotty

Sep 28, 2017, 2:23:11 PM
to iqt...@googlegroups.com
Thanks for the feedback, JPF. I replied to Sergio just before I saw your message; in that reply I discussed the number of parameters as a possible problem. Would you mind telling me how big the dataset is that has given you trouble, and also the size of the datasets that have run well?

Thanks,

Stephen

jp.fland...@gmail.com

Sep 29, 2017, 2:47:05 AM
to IQ-TREE
Dear Stephen,

I was running this:
bin/iqtree-1.6.beta4-MacOSX/bin/iqtree -s ACTINO_EXTENDED_WORKSHOP/METAPROT_H/TREES/concatenat.fst -m bin/RPRM-ACTINO.paml+FO*H4 -mredo -alrt 1000 -bb 1000 -nt auto -safe
My tree has 813 leaves and the alignment has 6063 columns.
RPRM_ACTINO.paml is the specific exchange matrix for these data.
I tried setting the -nt parameter to 4, but this does not help.
The dataset is free of very similar sequences, but I know that some distances between internal nodes may be small.
I have run a similar reconstruction with ~300 leaves; the error message also appeared many times, but the program gave a satisfying result.
Of course (?) with other, smaller datasets (subsets of the big tree) I don't get the error message, or only a few times.
I will try your suggested solution.

I am using the OSX binaries.

Sincerely

JPF

Bui Quang Minh

Sep 29, 2017, 9:40:54 AM
to iqt...@googlegroups.com, jp.fland...@gmail.com
Dear JPF,

As Stephen explained, the GHOST model is very parameter-rich. The number of free parameters is equal to k*(2n-3), where k is the number of classes and 2n-3 is the number of branches (n = number of taxa). For your analysis (k=4, n=813), that is 4*1623 = 6492 parameters, which is more than the sample size (typically the number of columns in the alignment, here 6063), and thus you can't do statistical inference here. The .iqtree report file will contain a WARNING about this issue. Having more parameters than sites is in turn what led to the numerical underflow problem.

Thus you should try the GHOST model with smaller k. That also explains why the numerical problem did not occur for smaller data sets (smaller n).
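
As a rough guide to what "smaller k" means, the counting rule above can be turned into a small Python sketch. The helper name is hypothetical, and requiring strictly fewer parameters than alignment columns is an assumption made for illustration, not a threshold taken from IQ-TREE.

```python
def max_ghost_classes(n_taxa, n_sites):
    """Largest k with k*(2n-3) strictly below the number of sites (illustrative).

    Assumption: only branch-length parameters are counted, and "fewer
    parameters than sites" is used as the cut-off.
    """
    branches = 2 * n_taxa - 3
    # Largest integer k such that k * branches <= n_sites - 1.
    return (n_sites - 1) // branches


# JPF's dataset: 813 leaves, 6063 columns.
print(max_ghost_classes(813, 6063))  # 3, since 3*1623 = 4869 but 4*1623 = 6492
```

This is only an upper bound from counting. Staying below it does not guarantee a well-behaved analysis: Sergio's run with 5 classes on 54,400 sites is far below the bound and still reported underflow errors.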

Hope that helps,
Minh
--
Bui Quang Minh
Center for Integrative Bioinformatics Vienna (CIBIV)
Campus Vienna Biocenter 5, VBC5, Ebene 1
A-1030 Vienna, Austria
Phone: ++43 1 4277 74326
Email: minh.bui (AT) univie.ac.at

jp.fland...@gmail.com

Sep 29, 2017, 11:25:29 AM
to IQ-TREE
Thank you Minh

I should have paid attention to the number of parameters! This is a very basic point that I missed; I was too excited by the new possibilities that GHOST gives me.
I tried to simplify the model, but the best approach will be to limit the number of leaves to well-selected representatives.
I will do this next week.

Sincerely

JPF