nan error in rst file

Jules

unread,

Dec 21, 2017, 8:57:44 AM12/21/17

to PAML discussion group

Hey google group members,

I just figured out that my BEB analysis of codeml has some "nan" entries. I was wondering what it means?
Here some extra information. My codeml analysis ran on one gene with 4 species. The phylogenetic tree is a species tree based on other genes (raxML). I ran the M0 1 2 3 7 8 and 8a model of codeml and everything went fine.

But I got this result in the rst file under model 8:

Bayes Empirical Bayes (BEB) probabilities for 11 classes (class) & postmean_w
(amino acids refer to 1st sequence: GPE)

   1 S   0.00000   nan   nan   nan   nan   nan   nan   nan   nan 0.00000   nan ( 1) 0.000 +- 0.000
   2 A   0.00000   nan   nan   nan   nan   nan   nan   nan   nan 0.00000   nan ( 1) 0.000 +- 0.000
   3 H   0.00000 0.00068 0.00150 0.00273 0.00455 0.00731 0.01175 0.01979 0.03866 0.00000 0.95432 (11) 1.494 +- 0.000
   4 G   0.00000   nan   nan   nan   nan   nan   nan   nan   nan 0.00000   nan ( 1) 0.000 +- 0.000
   5 R   0.00000   nan   nan   nan   nan   nan   nan   nan   nan 0.00000   nan ( 1) 0.000 +- 0.000
   6 R   0.00000   nan   nan   nan   nan   nan   nan   nan   nan 0.00000   nan ( 1) 0.000 +- 0.000
   7 V   0.00000   nan   nan   nan   nan   nan   nan   nan   nan 0.00000   nan ( 1) 0.000 +- 0.000
   8 T   0.00000   nan   nan   nan   nan   nan   nan   nan   nan 0.00000   nan ( 1) 0.000 +- 0.000
   9 P   0.00000   nan   nan   nan   nan   nan   nan   nan   nan 0.00000   nan ( 1) 0.000 +- 0.000
10 Q   0.00000   nan   nan   nan   nan   nan   nan   nan   nan 0.00000   nan ( 1) 0.000 +- 0.000
11 ...

During the run there was also a permanent terminal output - something like:

strange: f[105] = -0.0319803 very small
(but several times with different values)

I know that this value should not be negative. But how can I change that? I already used different trees (optimized one), set the branch length to "initial" and "fixed", but nothing changed in the output. The alignment is fine (no gaps, same length)
Actually, this is just for a graphic, but I wondered if this also means that my analysis got totally wrong?
All others results seem to be OK.

Cheers and THANK YOU IN ADVANCE!
Jules

Ziheng

unread,

Jan 8, 2018, 10:59:51 PM1/8/18

to PAML discussion group

nan means not a number.

yes, you are right: this means that something has gone horribly wrong.

but i am not sure what happened. what happens if you run the analysis another time?

if you can't figure out, you can send me the three files to me by email, and let me know which version you used.

ziheng

cajawe

unread,

Mar 16, 2018, 2:33:57 PM3/16/18

to PAML discussion group

Dear Ziheng,

I ran into this same problem where BEB fails under M8. I was running codeml v4.9e, but I then saw that you posted an updated version (v4.9g) that introduces a fix for this bug so I downloaded this new version and re-did my analyses.

But when running this new version, I found that BEB under M8 (with ncatG=10) operated assuming 5 site classes. The M8 run optimized over 11 site classes (10 from the beta-10 distribution, plus the pos-sel class), and NEB showed 11 classes as well: it was only BEB that used a lower number of classes. This confused me, so I went back to 4.9a, which performs BEB the way I'm used to.

To demonstrate the issue, I ran M8 on the HIVNSsites example using 4.9g, with the only change to the /examples/HIVNSites/codeml.ctl file being NSsites = 0 1 2 ----> NSsites = 8. Here are the first few lines of the NEB and BEB sections:

Supplemental results for CODEML (seqf: HIVenvSweden.txt treef: HIVenvSweden.trees)

MLEs of dN/dS (w) for site classes (K=11)

p: 0.07995 0.07995 0.07995 0.07995 0.07995 0.07995 0.07995 0.07995 0.07995 0.07995 0.20050

w: 0.00000 0.00087 0.01830 0.12591 0.42349 0.77529 0.95196 0.99483 0.99983 1.00000 3.47036

Naive Empirical Bayes (NEB) probabilities for 11 classes & postmean_w & P(w>1)

(amino acids refer to 1st sequence: U68496)

1 V 0.00000 0.00000 0.00000 0.00009 0.00721 0.04665 0.08000 0.08907 0.09015 0.09018 0.59666 (11) 2.455 0.597

2 V 0.20277 0.20241 0.19536 0.15706 0.08623 0.04276 0.03016 0.02772 0.02745 0.02744 0.00066 ( 1) 0.207 0.001

...

Bayes Empirical Bayes (BEB) probabilities for 5 classes (class) & postmean_w

(amino acids refer to 1st sequence: U68496)

1 V 0.00031 0.00716 0.03134 0.10007 0.86113 ( 5) 2.185 +- 0.800

2 V 0.69536 0.16724 0.07863 0.04811 0.01066 ( 1) 0.265 +- 0.293

I am unsure whether this reduced number of site classes for BEB is an undocumented feature or a bug. Can you clarify for me? If it is indeed a feature, what is the rationale?

Many thanks.

Cam

Ziheng

unread,

Mar 28, 2018, 9:26:27 AM3/28/18

to PAML discussion group

int n1d = 4, M2a = 1, ternary = 1, trianglePriorM8 = 0, ip[4] = { 0 };

this line in codeml.c in the function lfunNSsites_M2M8 should be

int n1d = 10, M2a = 1, ternary = 1, trianglePriorM8 = 0, ip[4] = { 0 };

its a bug. i probably changed n1d = 10 to n1d = 4 in the code to do some testing in my working copy, and afterwards forgot to change it back. i'll check and fix this.

thanks for pointing this out.

ziheng

BMG

unread,

Apr 11, 2024, 3:13:37 PMApr 11

to PAML discussion group

Dear Ziheng,

I hope this message finds you well. I am revisiting results from an older codeml run (v4.9, model M8) and noticed significant differences and higher omega values when I reran it with a different version. Could you please clarify if there were any changes to codeml that might have caused this discrepancy? I encountered the same issue as the original poster, where I received 'nan' values, but this problem was resolved when I used the newer version. For version 4.9, I ran the analysis twice and obtained similar results and 'nan' values both times. The inputs for both versions were exactly the same.

Version 4.10.0

Snippet of output file

Bayes Empirical Bayes (BEB) analysis (Yang, Wong & Nielsen 2005. Mol. Biol. Evol. 22:1107-1118)

Positively selected sites (*: P>95%; **: P>99%)

(amino acids refer to 1st sequence: ref)

Pr(w>1) post mean +- SE for w

32 G 0.861 8.839 +- 3.410

70 G 0.897 9.191 +- 3.005

174 D 0.857 8.803 +- 3.445

Snippet of rst file

Bayes Empirical Bayes (BEB) probabilities for 11 classes (class) & postmean_w

(amino acids refer to 1st sequence: ref)

1 M 0.53875 0.08336 0.05745 0.04458 0.03627 0.03014 0.02521 0.02101 0.01735 0.01778 0.12812 ( 1) 1.468 +- 3.348

2 G 0.55965 0.08617 0.05894 0.04540 0.03667 0.03026 0.02514 0.02081 0.01708 0.01739 0.10250 ( 1) 1.209 +- 3.040

3 G 0.55965 0.08617 0.05894 0.04540 0.03667 0.03026 0.02514 0.02081 0.01708 0.01739 0.10250 ( 1) 1.209 +- 3.040

4 N 0.59409 0.09124 0.06217 0.04768 0.03834 0.03148 0.02602 0.02143 0.01749 0.01772 0.05233 ( 1) 0.705 +- 2.231

5 G 0.55965 0.08617 0.05894 0.04540 0.03667 0.03026 0.02514 0.02081 0.01708 0.01739 0.10250 ( 1) 1.209 +- 3.040

.

32 G 0.02437 0.01086 0.01227 0.01315 0.01358 0.01364 0.01334 0.01270 0.01177 0.01333 0.86099 (11) 8.839 +- 3.410

Version 4.9, March 2015 (first run)

Snippet of output file

Bayes Empirical Bayes (BEB) analysis (Yang, Wong & Nielsen 2005. Mol. Biol. Evol. 22:1107-1118)

Positively selected sites (*: P>95%; **: P>99%)

(amino acids refer to 1st sequence: ref)

Pr(w>1) post mean +- SE for w

23 A 0.969* 1.554 +- 0.000

32 G 0.959* 1.500 +- 0.000

70 G 0.958* 1.497 +- 0.000

133 A 0.969* 1.554 +- 0.000

174 D 0.958* 1.498 +- 0.000

217 G 0.958* 1.503 +- 0.000

219 G 0.958* 1.470 +- 0.141

Snippet of rst file

1 M 0.00000 nan nan nan nan nan nan nan nan 0.00000 nan(1) 0.000+-0.000

2 G 0.00000 nan nan nan nan nan nan nan nan 0.00000 nan(1) 0.000+-0.000

3 G 0.00000 nan nan nan nan nan nan nan nan 0.00000 nan(1) 0.000+-0.000

4 N 0.00000 nan nan nan nan nan nan nan nan 0.00000 nan(1) 0.000+-0.000

5 G 0.00000 nan nan nan nan nan nan nan nan 0.00000 nan(1) 0.000+-0.000

.

32 G 0.00000 0.00051 0.00126 0.00245 0.00425 0.00702 0.01152 0.01971 0.03898 0.00000 0.95851(11) 1.500+-0.000

Thanks inadvance!

Janet Young

unread,

Apr 23, 2024, 8:27:16 PMApr 23

to PAML discussion group

I have also seen some discrepancies like this in the past. I found the results with the "nan" lines made a lot less sense - the "nan"s themselves were suspicious, but also the overall estimates of the dN/dSs for each class were not as realistic. Ziheng was able to see why the different PAML versions gave different results.

The note here https://github.com/abacus-gene/paml/blob/master/doc/pamlHistory.txt about version 4.9g tells us that versions 4.9b-f had a bug relating to BEB calculations. I did a bunch of testing - if I remember right, those versions were the ones that gave me frequent "nan" lines.

I had also realized, when tracking down another issue, that there were some discrepancies in the github files for version 4.10.6 - the source code tarballs on github are actually from older versions, whereas the compiled versions truly are from v4.10.6.

I'd go with 4.10.7 if I were you to avoid any doubt about version numbering and to avoid that BEB bug