codeml error: Warning: Hessian matrix may be unreliable for zero branch lengths

Heidi Viitaniemi

Oct 1, 2013, 3:01:23 AM
to pamlso...@googlegroups.com
Hi again,

I'm running PAML 4.7 on 64-bit Windows 7. I have a list of multiple alignments (26 genes) that I want to analyse with codeml. I used PHYLIP to build a tree based on the 26 genes to use as the tree file for codeml.
When I start codeml with ndata set to 26, choosing codons and M0 (from the typical codon model selection pane), the run is killed while analysing the first data set, with the error listed in codeml_screen_output.txt.
I cannot work out what is wrong with my input files: I managed to run codeml with a self-made test tree for 4 genes and that ran fine, but I cannot get the real data set started.

My tree file looks like this
 3 1
(Orig:0.00019,Alt:0.00098,NS:0.06584);

But I also tried
 3 1
((Orig:0.00019,Alt:0.00098),NS:0.06584);

But neither option helped.

Any advice would be appreciated!
Thanks,
Heidi
codeml.ctl.tmp
tr_OrigAltNS_align_to_PAML.nuc
codeml_screen_output.txt

Heidi Viitaniemi

Oct 1, 2013, 3:20:05 AM
to pamlso...@googlegroups.com
I'm answering my own question with a further question.
Playing around with the parameters led me to untick getSE, after which the codeml run finished analysing the 26 genes without errors.
So what in my data could be causing the algorithm to diverge, given that my tree does not have zero branch lengths?

Heidi


cajawe

Oct 5, 2013, 6:53:48 PM
to pamlso...@googlegroups.com
Your input tree doesn't have branches with length = 0, but because of your control settings (fix_blength), codeml is ignoring the branch lengths in the tree file and coming up with its own estimates.

Since sequences 2 and 3 are identical in alignment #1, the branch lengths are estimated as zero.  And since zero branch lengths cause problems for the SE estimation algorithm (as reported in your mlc file), this is causing your run to crash.
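
One way to spot this situation before running codeml is to look for identical sequences within each alignment. The following is only a minimal sketch: the function name `identical_pairs` and the toy sequences are made up for illustration (they are not the poster's data), and it assumes the alignment has already been read into a dict mapping sequence name to sequence string.

```python
from itertools import combinations


def identical_pairs(alignment):
    """Return all pairs of sequence names whose aligned sequences are identical.

    `alignment` is a dict {name: sequence}; comparison is case-insensitive.
    """
    names = list(alignment)
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if alignment[a].upper() == alignment[b].upper()
    ]


# Toy alignment (hypothetical data): two of the three sequences are identical,
# so the branches separating them would be expected to be estimated as zero.
toy = {
    "Orig": "ATGGCCAAA",
    "Alt": "ATGGCAAAA",
    "NS": "ATGGCAAAA",
}
print(identical_pairs(toy))
```

Any alignment for which this list is non-empty is a candidate for zero branch lengths when codeml estimates the branch lengths itself.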

Heidi Viitaniemi

Oct 8, 2013, 2:18:20 AM
to pamlso...@googlegroups.com
Thank you for your response, cajawe.
The run worked fine when I left the SE calculation out and changed the fix_blength setting.

Heidi

Raquel Dias

Jan 8, 2014, 2:33:27 PM
to pamlso...@googlegroups.com
Hey guys. I had trouble generating ancestral sequences with codeml. I got the same errors: "Hessian matrix may be unreliable for zero branch lengths" and "xmax = 0.0000e+00".

Following your discussion, I found that my problem was in the getSE and fix_blength options of my ctl file.

Thanks for this information!

Ziheng

Feb 1, 2014, 4:36:36 AM
to pamlso...@googlegroups.com
Yes, cajawe is right.
It is possible that the program just printed out a warning message but didn't crash. In that case you can ignore the warning message. Also, getSE = 0 should suffice if you are not trying to calculate the SEs.
Ziheng.
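
When many control files need the same change (for example, switching off the SE calculation for every data set), the edit can be scripted. This is a rough sketch under stated assumptions: `set_ctl_option` is a hypothetical helper, not part of PAML, and it assumes the usual `key = value` layout of a ctl file with an optional `*` comment after the value.

```python
import re


def set_ctl_option(ctl_text, key, value):
    """Return ctl_text with `key` set to `value`.

    Any trailing comment on the line is kept. If the key is absent,
    a new `key = value` line is appended at the end.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*)(\S+)", re.MULTILINE
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), ctl_text)
    if count == 0:
        new_text = ctl_text.rstrip("\n") + f"\n{key} = {value}\n"
    return new_text


example = "noisy = 3\ngetSE = 1  * want SEs\n"
print(set_ctl_option(example, "getSE", 0))
```

The same helper could be applied to other options mentioned in this thread, such as fix_blength or Small_Diff.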

HZ

Mar 5, 2014, 10:23:44 AM
to pamlso...@googlegroups.com
then please give the useful answers a thumb-up so other users can find them quicker :-)

Alexandros Vasilikopoulos

Nov 10, 2017, 11:07:39 AM
to PAML discussion group
Dear all,

I am having a similar problem. I am trying to estimate the Hessian matrix for a series of amino acid alignments. I need this to estimate divergence times with the approximate likelihood method, separately for each dataset, with mcmctree.

I am doing it with codeml and the tmp0001.ctl files generated when you run mcmctree with the option usedata=3, e.g.:

seqfile = tmp0001.txt
treefile = tmp0001.trees
outfile = tmp0001.out
noisy = 3
seqtype = 2
model = 2
aaRatefile = $model
fix_alpha = 0
alpha = 0.5
ncatG = 4
Small_Diff = 0.1e-6
getSE = 2
method = 1

Now the output of the log file for some of them looks like this:

lnL  = -20360.141952
Out..
lnL  = -20360.141952
164 lfun, 0 eigenQcodon, 219888 P(t)
Calculating SE's

Warning: Hessian matrix may be unreliable for zero branch lengths

xmax = 0.0000e+00 close to zero at 233!   

The resulting rst2 file with the Hessian is generated, but the program exits without completing (see above).
Is the calculation of the Hessian reliable in this case? If you use getSE=0, the rst2 file is empty
and the Hessian is not calculated.

Many thanks
Alex

Ziheng

Mar 27, 2018, 3:54:15 PM
to PAML discussion group
It is quite troublesome to calculate the Hessian matrix numerically, as codeml is doing.
First, the step length is specified by
Small_Diff = 0.1e-6
You can change this and see whether it makes any difference. In general, if you have lots of sites so that the SE is small, you should use a small value, but it is hard to specify. I have used anything between 1e-6 and 1e-9 in the past.
The second issue is that your tree seems to have some zero branch lengths. If you have very few sites in the alignment/partition, perhaps you should think about merging some partitions to have more sites. Or, if some sequences are nearly identical, will it work if you use only one of the identical sequences?
ziheng
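
To illustrate why the step length matters, here is a toy example that is not codeml's implementation. It assumes a two-sequence JC69 model with made-up counts (1000 sites, 100 differences), estimates the curvature of the log-likelihood at the MLE of the branch length by central differences with several step sizes, and compares the resulting SE with the analytical value. All function names are hypothetical.

```python
import math

N_SITES, N_DIFF = 1000, 100  # made-up data for illustration


def lnl(t):
    """JC69 log-likelihood (up to a constant) for two sequences at distance t."""
    p = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    return N_DIFF * math.log(p) + (N_SITES - N_DIFF) * math.log(1.0 - p)


def curvature(t, h):
    """Central-difference estimate of the second derivative of lnl at t."""
    return (lnl(t + h) - 2.0 * lnl(t) + lnl(t - h)) / (h * h)


def se_from_curvature(d2):
    """SE = sqrt(-1 / d2); returns NaN if the curvature is not negative."""
    return math.sqrt(-1.0 / d2) if d2 < 0 else float("nan")


def se_exact():
    """Analytical SE of the JC69 distance at the MLE."""
    p = N_DIFF / N_SITES
    return math.sqrt(p * (1.0 - p) / N_SITES) / (1.0 - 4.0 * p / 3.0)


t_hat = -0.75 * math.log(1.0 - 4.0 * (N_DIFF / N_SITES) / 3.0)
print("exact SE:", se_exact())
for h in (1e-2, 1e-4, 1e-6, 1e-8):
    print("h =", h, "numerical SE:", se_from_curvature(curvature(t_hat, h)))
```

A large step introduces truncation error, while a very small step amplifies floating-point round-off in the difference of nearly equal log-likelihoods, which is one reason a suitable value is hard to specify in advance.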

Alexandros Vasilikopoulos

Nov 24, 2018, 7:40:10 AM
to PAML discussion group
Hello Ziheng, thanks for your reply, and sorry for my late answer.

The problem is that I can't remove taxa from the metapartitions because I want to summarize the results from each partition into one combined MCMC file later.
I have 209 partition files. All of them have the same taxa, and I use one universal tree with all taxa in each separate analysis. Therefore, if I remove some taxa, the datasets
are incomplete and cannot be combined.

My question, more specifically, concerned two parts of the problem.
For some of the 209 analyses I get the following message:

Warning: Hessian matrix may be unreliable for zero branch lengths.

and then the program finishes normally.

For some others I get the second error as well:

e.g. xmax = 0.0000e+00 close to zero at 233!
and then the program exits. What is the difference between the two?

Since all of these very closely related sequences (e.g. node 233 above) belong only to the very shallow nodes of the tree, they define nodes that are not
so important for me. Would the Hessian matrix be completely unreliable for all nodes of the tree (including the deep ones)? Or does the problem
only affect the nodes defined by the identical sequences?

best wishes and thanks again
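
With a couple of hundred runs, it may help to sort the logs automatically into those that only show the warning and those that also show the xmax message. The sketch below is an illustration only: `classify_log` is a hypothetical helper, the category labels are invented, and the patterns are based solely on the two messages quoted in this thread.

```python
import re

WARNING_TEXT = "Hessian matrix may be unreliable for zero branch lengths"
XMAX_PATTERN = re.compile(r"xmax\s*=\s*\S+\s+close to zero at\s+(\d+)")


def classify_log(log_text):
    """Classify screen output from one run.

    Returns (label, index), where label is "xmax_error", "warning_only"
    or "clean", and index is the number reported after "at" in the xmax
    message (None for the other two labels).
    """
    match = XMAX_PATTERN.search(log_text)
    if match:
        return ("xmax_error", int(match.group(1)))
    if WARNING_TEXT in log_text:
        return ("warning_only", None)
    return ("clean", None)


sample = (
    "Calculating SE's\n\n"
    "Warning: Hessian matrix may be unreliable for zero branch lengths\n\n"
    "xmax = 0.0000e+00 close to zero at 233!\n"
)
print(classify_log(sample))
```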


Alexandros Vasilikopoulos

Nov 24, 2018, 7:44:45 AM
to PAML discussion group
I forgot to mention that I tried Small_Diff values from 1e-05 to 1e-10, but it did not change anything.

best and many thanks again
alex

Ziheng

Feb 16, 2019, 11:34:21 AM
to PAML discussion group
Warning: Hessian matrix may be unreliable for zero branch lengths.

This is a warning because some branch lengths are 0. Obviously, if you have some identical sequences, some branch lengths will be 0. I think you can ignore the warning.


eg. xmax = 0.0000e+00 close to zero at 233!

This one means that the Hessian matrix is singular. The program tries to invert the matrix to get the approximate SEs, and if the matrix is singular, inversion will not be possible. This can be caused by zero branch lengths as well.
ziheng
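
The failure mode can be reproduced with a toy matrix. This is a sketch, not codeml's code: it assumes a plain Gauss-Jordan inversion with partial pivoting and a hypothetical tolerance `tol`, and the 2x2 matrices are made up. When one parameter contributes no curvature (a zero row and column, as might happen for a branch estimated at zero), no usable pivot is found and the inversion stops.

```python
def invert(matrix, tol=1e-12):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting.

    Raises ValueError if the largest available pivot in a column is
    smaller than `tol` in absolute value, i.e. the matrix is singular.
    """
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    a = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(a[r][col]))
        pivot = a[pivot_row][col]
        if abs(pivot) < tol:
            raise ValueError(f"pivot {pivot:.4e} close to zero at column {col + 1}")
        a[col], a[pivot_row] = a[pivot_row], a[col]
        a[col] = [v / pivot for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [row[n:] for row in a]


print(invert([[2.0, 0.0], [0.0, 4.0]]))

# Toy "Hessian" in which the second parameter has zero curvature.
singular = [[-50.0, 0.0], [0.0, 0.0]]
try:
    invert(singular)
except ValueError as err:
    print("inversion failed:", err)
```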

Frank Papa

Oct 21, 2019, 1:49:52 PM
to PAML discussion group
I'm running into this same error.  I have getSE=1 and fix_blength = 1.  However, I get this error randomly.  About half of the time, the codeml process completes correctly.  About half of the time, the exact same run (same codeml.ctl, same input files) will fail with this error.  Why would this error not appear consistently?

yuzhenp...@gmail.com

Feb 8, 2021, 10:21:35 PM
to PAML discussion group
Set getSE = 0.