Is there a cutoff for omega to be considered reliable

Rosaline Macharia

unread,

Aug 11, 2015, 7:23:30 AM8/11/15

to PAML discussion group

Dear Paml users,

I am a newbie to codeml and would like to ask if there is a maximum value of omega that is considered reliable when analyzing for selection in gene sequences.

I have carried out analysis on genes across 5 species of the same genus that are not too divergent. I compared M2a-M0, M2a-M1a and M8-M8a models and found that the latter was representing the data better compared to the first two models.
My concern though is the omega values I got with M8 model which range between 1 - 999. I am wondering if there is a cutoff to consider the results reliable. from my reading it seems that high omega values result due to lack of synonymous change across sequences.

You advise is highly appreciated.

Kind regards;
Rosaline

haoya...@gmail.com

unread,

May 3, 2016, 8:16:30 AM5/3/16

to PAML discussion group

Hi, Rosaline

Have you solved the problem about reasonable omega value? I have the same question as yours, could you give me some suggestion about outliers-omega?

Best wishes

Yan Hao

Alex Sullivan

unread,

Jun 3, 2016, 6:33:31 AM6/3/16

to PAML discussion group

I also have the same question.

I have a gene 334 codons in length, characterized in 65 species/subspecies in the same genus. dN/dS over the whole gene is typical, 0.3584.

Both M2a and M8 are significant and identify the same selected sites. But the parameter estimate for w2 is very high:

Model 2: PositiveSelection

dN/dS (w) for site classes (K=3)

p: 0.91088 0.07718 0.01195

w: 0.00000 1.00000 106.13246

I see that this omega corresponds to the dN/dS value estimated using NEB. The BEB method is more reasonable, omegas around 10.4.

I've tried running codeml using different trees and different t and w starting values and get the same results.

So can I trust the results of this analysis? And is w=10.4 a better a better estimate for w2 than 106.13?

cajawe

unread,

Jun 7, 2016, 4:39:34 AM6/7/16

to PAML discussion group

w2 = 106.13 is your best estimate—your maximum likelihood estimate—for the strength of selection in the positively selected site class when considering your entire data set.

w2 = 10.4 is your best estimate—your adjusted (Bayesian) posterior mean estimate—for the strength of selection for a particular site.

Which is better depends on your requirement: to speak about the gene as a whole, or to speak about a particular site.

In terms of whether your w2 ML estimate is high...

Your w2 = 106.13 estimate probably has a large margin of error; is it significantly greater than, say, 100? 50? 10?
Is there a meaningful difference between w2 = 100, 50, or 10, etc. given that all of them are quite high relative to w2 = 1, and given that the size of the positively selected site class (p2) is small?

The first question can be addressed by either estimating the standard error for your w2 parameter (via the getSE flag in the control file) or by fitting a null model with w2 fixed at 100, 50, 10, etc.

The second question is more difficult to answer with certainty, but it's been argued elsewhere on this message board that no, there isn't really much of a meaningful difference between such large values.

Vigzy 77

unread,

Mar 28, 2019, 12:12:43 PM3/28/19

to PAML discussion group

I have a similar question about NEB and BEB but in Branch Site tests.

For some branches, I get w2 estimates of 200,500,999 etc.

How can I find the omega value using BEB?

What exactly are these values below and how to interpret them?

The grid (see ternary graph for p0-p1)

w0: 0.050 0.150 0.250 0.350 0.450 0.550 0.650 0.750 0.850 0.950

w2: 1.500 2.500 3.500 4.500 5.500 6.500 7.500 8.500 9.500 10.500

Posterior on the grid

w0: 0.166 0.834 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

w2: 0.723 0.094 0.050 0.033 0.024 0.019 0.016 0.015 0.013 0.013

Regards,

Vigzy

Ziheng

unread,

Mar 30, 2019, 5:03:09 AM3/30/19

to PAML discussion group

there is no cutoff. if you have the SE, it tells you how large the sampling errors are. or if you use the LRT, the test result also indicate how reliable the estimate is.

this is analogous to a coin tossing experiment. if you toss a coin once and observe 0 heads, the MLE of probability of heads is 0, but it may not be surprising. but if you toss a coin 10 times and observe 0 heads, the MLE of probability of heads is still 0, but it is surprising. the LRT assesses how likely the extreme data occur by chance.

ziheng

Ziheng

unread,

Mar 30, 2019, 5:04:43 AM3/30/19

to PAML discussion group

the beb results are explained in the following paper:

Yang, Z., et al. (2005). "Bayes empirical Bayes inference of amino acid sites under positive selection." Mol. Biol. Evol. 22: 1107-1118.