How to determine the positive selection and purifying selection based on codeml output

Li Edwin

unread,

Mar 28, 2024, 4:49:22 AM3/28/24

to PAML discussion group

Hi everyone,

I‘m running codeml program to determine the positive selection of genes. I have 3 problems.

First, I found some cases that gene clusters are positively selected based on LRT test of M1a vs M2a and M7 vs M8. However, the M0 output indicates omega less than 1 (even very low in some cases). So I wonder why it happens, and can I determine the gene clusters under positive selection in these cases?

Second, inversely in some cases, some gene clusters don't pass the LRT test, indicating no significant positive selection happens. But M0 output indicates omega more than one. So how can I interpret this result? Also, how can I confidently judge the purifying selection (just like the LRT test of positive selection)? If not, can I determine purifying selection based on omega? (Since we hope to compare the purifying selection in the microbial communities)

Third, some omega values are 999 or 0.0001, the upper and lower limit of M0 output. I wonder how to treat these omega values. How about dismissing them?

I really appreciate any help you can provide.

Sandra AC

unread,

Apr 3, 2024, 1:58:34 PM4/3/24

to PAML discussion group

Hi there,

Thanks for your message!

I assume that you are testing for positive selection under the sites model, but I may be wrong (sorry!). The first LRT you could run is M0 vs M1a given that these models are nested (i.e., this is a test for variability of selective pressure among amino acid sites rather than a test of positive selection). If you cannot reject the null hypothesis ("M0"), then a model with one ω ratio for all sites and across all lineages may fit your dataset better than a site model. If you have an a priori hypothesis about a lineage in your phylogeny that may be under selective pressure, you may want to run CODEML under the branch model and the branch-site model A (check our protocol for details, examples and code are available in the `positive-selection` GitHub repository). If an M0 model fits your data better, you could interpret the estimated value of ω under M0 as "an overall measure of selective constrain averaged over sites and lineages" (quote from our protocol). In that case, if the value of the ω ratio is lower than 1 for a given gene, then you may say that this gene is overall under purifying selection.

Regarding the extreme values that you are getting, have you run the same analysis twice to confirm that the runs are stable? If there is a lack of synonymous substitutions, you may get 999 (i.e., what the program uses as equivalent to "infinity"). If you lack nonsynonymous substitutions, you may get 0 (the other extreme). In such cases, the LRT is still valid even though estimating a precise value of omega is hard. You may want to check the dN and dS estimates in the output file (nonsynonymous and synonymous rates, respectively), which may help you (biologically) explain the extreme values you are getting for some genes. We show an example of how to interpret results when ω=999 under a branch model in our protocol, which perhaps you find useful. Lastly, please note that you may also get extreme estimates if (i) your sequences are too similar or too divergent or (ii) there are issues with your sequence alignment (see original thread of messages where this issue was discussed).

These are just some thoughts on the matter and I may not be entirely correct. Other PAML users may have better suggestions about what can be happening :)

Hope this helps!
S.

Li Edwin

unread,

Apr 11, 2024, 10:36:27 PM4/11/24

to PAML discussion group

Hi Sandra,

I appreciate your useful explanation!

Your assumption is right. I'm testing for positive selection under the site model. So can I conclude that the M0 output omega doesn't fit my data if LRT test of M0 vs M1a rejects M0, and most cases abovementioned in question 1 and 2 happened when LRT test rejected M0 (p < 0.05)? So it's hard to use a single omega value to describe the selection happening on these gene clusters.

Thank you for your suggestion!

Best,

Edwin

Reply all

Reply to author

Forward