Is there a cutoff for omega to be considered reliable

671 views
Skip to first unread message

Rosaline Macharia

unread,
Aug 11, 2015, 7:23:30 AM8/11/15
to PAML discussion group
Dear Paml users,

  I am a newbie to codeml and would like to ask if there is a maximum value of omega that is considered reliable when analyzing for selection in gene sequences.

I have carried out analysis on genes across 5 species of the same genus that are not too divergent. I compared M2a-M0, M2a-M1a and M8-M8a models and found that the latter was representing the data better compared to the first two models.
My concern though is  the omega values I got with M8 model  which range between 1 - 999. I am wondering if there is a cutoff to consider the results reliable. from my reading it seems that high omega values result due to lack of synonymous change across sequences.

You advise is highly appreciated.


Kind regards;
Rosaline

haoya...@gmail.com

unread,
May 3, 2016, 8:16:30 AM5/3/16
to PAML discussion group
Hi, Rosaline
Have you solved the problem about reasonable omega value? I have the same question as yours, could you give me some suggestion about outliers-omega?

Best wishes
Yan Hao 

Alex Sullivan

unread,
Jun 3, 2016, 6:33:31 AM6/3/16
to PAML discussion group
I also have the same question.

I have a gene 334 codons in length, characterized in 65 species/subspecies in the same genus. dN/dS over the whole gene is typical, 0.3584. 

Both M2a and M8 are significant and identify the same selected sites. But the parameter estimate for w2 is very high: 

Model 2: PositiveSelection
dN/dS (w) for site classes (K=3)
p:   0.91088  0.07718  0.01195
w:   0.00000  1.00000 106.13246

I see that this omega corresponds to the dN/dS value estimated using NEB. The BEB method is more reasonable, omegas around 10.4.

I've tried running codeml using different trees and different t and w starting values and get the same results.

So can I trust the results of this analysis? And is w=10.4 a better a better estimate for w2 than 106.13?

cajawe

unread,
Jun 7, 2016, 4:39:34 AM6/7/16
to PAML discussion group
w2 = 106.13 is your best estimate—your maximum likelihood estimate—for the strength of selection in the positively selected site class when considering your entire data set.

w2 = 10.4 is your best estimate—your adjusted (Bayesian) posterior mean estimate—for the strength of selection for a particular site.  

Which is better depends on your requirement: to speak about the gene as a whole, or to speak about a particular site.

In terms of whether your w2 ML estimate is high...
  • Your w2 = 106.13 estimate probably has a large margin of error; is it significantly greater than, say, 100?  50?  10?  
  • Is there a meaningful difference between w2 = 100, 50, or 10, etc. given that all of them are quite high relative to w2 = 1, and given that the size of the positively selected site class (p2) is small?
The first question can be addressed by either estimating the standard error for your w2 parameter (via the getSE flag in the control file) or by fitting a null model with w2 fixed at 100, 50, 10, etc.  

The second question is more difficult to answer with certainty, but it's been argued elsewhere on this message board that no, there isn't really much of a meaningful difference between such large values.

Vigzy 77

unread,
Mar 28, 2019, 12:12:43 PM3/28/19
to PAML discussion group
I have a similar question about NEB and BEB but in Branch Site tests. 
For some branches, I get w2 estimates of 200,500,999 etc. 

How can I find the omega value using BEB?

What exactly are these values below  and how to interpret them?
The grid (see ternary graph for p0-p1)

w0:   0.050  0.150  0.250  0.350  0.450  0.550  0.650  0.750  0.850  0.950
w2:   1.500  2.500  3.500  4.500  5.500  6.500  7.500  8.500  9.500 10.500


Posterior on the grid

w0:   0.166  0.834  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
w2:   0.723  0.094  0.050  0.033  0.024  0.019  0.016  0.015  0.013  0.013


Regards,
Vigzy

Ziheng

unread,
Mar 30, 2019, 5:03:09 AM3/30/19
to PAML discussion group
there is no cutoff.  if you have the SE, it tells you how large the sampling errors are.  or if you use the LRT, the test result also indicate how reliable the estimate is.
this is analogous to a coin tossing experiment.  if you toss a coin once and observe 0 heads, the MLE of probability of heads is 0, but it may not be surprising.  but if you toss a coin 10 times and observe 0 heads, the MLE of probability of heads is still 0, but it is surprising.  the LRT assesses how likely the extreme data occur by chance.
ziheng

Ziheng

unread,
Mar 30, 2019, 5:04:43 AM3/30/19
to PAML discussion group
the beb results are explained in the following paper:
Yang, Z., et al. (2005). "Bayes empirical Bayes inference of amino acid sites under positive selection." Mol. Biol. Evol. 22: 1107-1118.
see figure 1 in the paper.
ziheng

Reply all
Reply to author
Forward
0 new messages