Hi PAML users,
I am looking for positively selected sites in plant color genes and have a question about the site classes reported by the codeml site models M2a and M8 (PAML 4.7a). My control file is based on the example HIVenvSweden dataset (thus model = 0, NSsites = 0 1 2 7 8), and I can replicate the results from the HIV example just fine. However, when I run my own data, the final site class in M2a and M8 (the one intended for positively selected sites) is always estimated at w = 1.0000, never higher (w>1).
For example, the M2a output has two classes with w = 1.0000. I would have expected one neutral class (w = 1.0000, p: 0.187) AND a separate positive selection class with w>1, whose proportion would simply be p: 0.00 if there were no positive selection.
Model M2a
dN/dS (w) for site classes (K=3)
p: 0.74765 0.18752 0.06483
w: 0.10420 1.00000 1.00000
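To make the puzzle concrete, here is a minimal Python sketch (my own helper code, not part of PAML) that takes the p and w values from the M2a output above, lists any class with w>1, and pools the proportions of the classes sitting at w = 1. The function names and the tolerance are assumptions chosen for illustration. If the third class really is bounded below at 1, which is what I am asking about, then the two w = 1 classes would together behave like a single neutral class.

```python
# Site-class estimates copied from the M2a output above.
p = [0.74765, 0.18752, 0.06483]
w = [0.10420, 1.00000, 1.00000]


def classes_above_one(props, omegas, tol=1e-5):
    """Return indices of site classes whose w exceeds 1 by more than tol."""
    return [i for i, (pi, wi) in enumerate(zip(props, omegas))
            if pi > 0 and wi > 1.0 + tol]


def pooled_neutral_proportion(props, omegas, tol=1e-5):
    """Sum the proportions of all classes whose w is within tol of 1."""
    return sum(pi for pi, wi in zip(props, omegas) if abs(wi - 1.0) <= tol)


positive = classes_above_one(p, w)
neutral = pooled_neutral_proportion(p, w)

print("classes with w>1:", positive)
print("pooled proportion at w=1:", round(neutral, 5))
```

For these numbers no class has w>1, and the two w = 1 classes together account for about a quarter of the sites.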
Similarly, the output for M7 and M8 is shown below, and the 11th category in M8 does not have w>1.0.
Model M7
dN/dS (w) for site classes (K=10)
p: 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000
w: 0.00061 0.00874 0.03020 0.06849 0.12659 0.20743 0.31426 0.45121 0.62451 0.84781
Model M8
dN/dS (w) for site classes (K=11)
p: 0.09101 0.09101 0.09101 0.09101 0.09101 0.09101 0.09101 0.09101 0.09101 0.09101 0.08989
w: 0.00162 0.01263 0.03321 0.06354 0.10457 0.15811 0.22741 0.31857 0.44538 0.65602 1.00000
Bayes Empirical Bayes (BEB) analysis (Yang, Wong & Nielsen 2005. Mol. Biol. Evol. 22:1107-1118)
Positively selected sites (*: P>95%; **: P>99%)
          Pr(w>1)    post mean +- SE for w
18 T      0.590      1.764 +- 1.413
Is it simply the case that no significant positive selection is detected, and so the output will never show a category where w>1? (E.g., in the case of M2a, is the third site class constrained to w>1, so that if there is no positive selection and codeml does not estimate a w value above 1, it automatically reports exactly 1.0000 again?)
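As I understand it, the usual way to judge significance here is a likelihood ratio test between the nested pairs (M1a vs M2a, M7 vs M8), comparing 2*(lnL_alt - lnL_null) with a chi-square distribution with 2 degrees of freedom. Below is a minimal Python sketch of that calculation. The function name is my own, and the lnL values are placeholders for illustration only, not results from my run. For 2 degrees of freedom the chi-square upper tail probability is exp(-x/2), so no extra library is needed.

```python
import math


def lrt_df2(lnL_null, lnL_alt):
    """Likelihood ratio test with 2 degrees of freedom.

    Returns (statistic, p_value), where statistic = 2 * (lnL_alt - lnL_null)
    and the p-value uses the chi-square(2) upper tail, exp(-statistic / 2).
    Small negative statistics (optimizer noise) are clamped to 0.
    """
    stat = max(0.0, 2.0 * (lnL_alt - lnL_null))
    return stat, math.exp(-stat / 2.0)


# Placeholder log-likelihoods (hypothetical, NOT taken from my output).
# If the extra class sits at w = 1, the two models would fit equally well.
stat_same, p_same = lrt_df2(-1000.0, -1000.0)
print(stat_same, p_same)

# A hypothetical case where the alternative model fits better by 3 lnL units.
stat_diff, p_diff = lrt_df2(-1003.0, -1000.0)
print(stat_diff, round(p_diff, 4))
```

With identical log-likelihoods the statistic is 0 and the p-value is 1, which is what I would expect to see if the positive selection class has collapsed onto w = 1.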
If anyone has any thoughts on this, that would be much appreciated - thanks!