Positive selection on branch but not on sites?

702 views
Skip to first unread message

camilla....@gmail.com

unread,
Apr 27, 2016, 1:01:56 PM4/27/16
to PAML discussion group

Hello,

I've been running codeml on a large dataset and investigating possible positive selection along certain branches. I used the likelihood ratio test to compare my test model (model = 2, fix_omega=0, NSsites =2) with the null model (model = 2, fix_omega=1, NSsites=2). I used LR = 2*(lnL1-lnL2)  then I calculated p-values with 1 d.f.

The results look ok apart from one branch that is under pos selection with high significance (p-value = 1.68E-12, lnL1= -179449.502514, lnL2 = -179474.413117, LR = 24.91) but has no sites identified by NEB or BEB analyses:

kappa (ts/tv) =  1.52404

dN/dS (w) for site classes (K=4)

site class             0        1       2a       2b
proportion       0.92529  0.07471  0.00000  0.00000
background w     0.06474  1.00000  0.06474  1.00000
foreground w     0.06474  1.00000  1.00000  1.00000

Bayes Empirical Bayes (BEB) analysis (Yang, Wong & Nielsen 2005. Mol. Biol. Evol. 22:1107-1118)
Positive sites for foreground lineages Prob(w>1):


There don't look to be convergence problems with this branch but I have tried starting omegas of <1 and >1 anyway. This doesn't change the BEB output. Is there a scenario where a foreground branch would be under positive selection but there are no individual sites that are significantly under PS? Or, is it more likely that this is a problem with my dataset?

Any help greatly appreciated, I am fairly new to PAML.

Thanks!

Milly


Message has been deleted

Jeremy Chase

unread,
Apr 28, 2016, 4:26:09 PM4/28/16
to PAML discussion group
Are you sure you're calculating the p-value correctly?

Using python and the chi-squared distribution...
>from scipy import stats
>lnL1 = -179449.502514
>lnL2 = -179474.413117
>print stats.chi2.cdf(2*(lnL1 - lnL2), 1)

0.999999999998


camilla....@gmail.com

unread,
Apr 28, 2016, 5:24:20 PM4/28/16
to PAML discussion group
Hi Jeremy,

Thank you for your reply. I actually quoted my LR value incorrectly it should be 49.82121 but that doesn't affect the results here. To calculate my P-value I'm using these commands in R:

> lnL1=-179449.50251
> lnL2=-179474.413117
> lr=2*(lnL1-lnL2)
> pchisq(lr, df=1, lower.tail=F)
[1] 1.684132e-12

I can replicate your P-value in python but now I'm not sure why we are getting wildly different values. I'm not familiar with stats.chi2.cdf. I checked the literature and thought the R function pchisq with lower.tail=F was what I should be using. I can also replicate that result in excel. Is there a problem with my P-value calculation? That would certainly explain the discrepancy between my branch and site results...

Thanks again,

Milly





cajawe

unread,
Apr 28, 2016, 6:27:25 PM4/28/16
to PAML discussion group
No, you shouldn't expect to see any BEB or NEB sites: your parameter values show p0 + p1 = 1 (and p2 = 0), indicating that none of your alignment is falling under the "positive selection" class.  

Why you have a significant test result is the more confusing issue, as your branch-site alternative model is fitting the same as a M1a random-sites model.  If specified properly, your branch-site null model should do the same, giving you a LRT of 0 and a P of 1.  I'd double-check the fit of the null.

Jeremy Chase

unread,
Apr 29, 2016, 8:15:19 PM4/29/16
to PAML discussion group
Does lnL1 refer to your null or alternative model?

camilla....@gmail.com

unread,
Apr 29, 2016, 9:11:22 PM4/29/16
to PAML discussion group
Hmm, that is strange. I ran the null model twice (giving the same output) but apart from that I'm not sure how best to check the fit of my model. What would be the best indicator? The parameters for the null and alternative models are exactly the same apart from changing fix_omega = 0 (alternative) to fix_omega = 1 (null).

lnL1 is from my alternative model (-179449.502514) and lnL2 (-179474.413117) is from my null model. I've rechecked these and they are correct.

Thanks again for the replies!

Milly
 


Jeremy Chase

unread,
Apr 29, 2016, 10:37:39 PM4/29/16
to PAML discussion group
Ah, for the record the python results are essentially the same, just forgot to subtract from 1.
>>> print (1 - stats.chi2.cdf(2*(lnL1 - lnL2), 1))
1.68409730605e-12

If you are willing to share, what does your tree file look like?

Ziheng

unread,
Apr 30, 2016, 8:17:16 AM4/30/16
to PAML discussion group
As pointed out above, the parameter estimates suggest that the two lnL should be equal, and there is no evidence for positive selection.
It may be some kind of numerical problem with the iteration, but I can't tell what it may be.
How many sequences do you have? How long are they? Perhaps copy the control file here, as well as the tree with estimated branch lengths.
Ziheng

camilla....@gmail.com

unread,
May 2, 2016, 3:07:28 PM5/2/16
to PAML discussion group
I've attached my two control files (for both alternative "ALT.ctl" and "NULL.ctl" model runs). I've also attached a copy of my tree with estimated branch lengths with the problematic branch labeled (#1). I have replaced the sequence names with numbers but apart from that they should be the same as those I used in my analysis.

There are 312 sequences and the majority of these should be ~1,100 bp however there are some gene fragments included. 

Thanks again!

Milly





ALT.ctl
NULL.ctl
unlabelled.tree_539.treefile
Reply all
Reply to author
Forward
0 new messages