Positive selection on branch but not on sites?

camilla....@gmail.com

unread,

Apr 27, 2016, 1:01:56 PM4/27/16

to PAML discussion group

Hello,

I've been running codeml on a large dataset and investigating possible positive selection along certain branches. I used the likelihood ratio test to compare my test model (model = 2, fix_omega=0, NSsites =2) with the null model (model = 2, fix_omega=1, NSsites=2). I used LR = 2*(lnL1-lnL2) then I calculated p-values with 1 d.f.

The results look ok apart from one branch that is under pos selection with high significance (p-value = 1.68E-12, lnL1= -179449.502514, lnL2 = -179474.413117, LR = 24.91) but has no sites identified by NEB or BEB analyses:

kappa (ts/tv) = 1.52404

dN/dS (w) for site classes (K=4)

site class             0        1       2a       2b
proportion       0.92529 0.07471 0.00000 0.00000
background w     0.06474 1.00000 0.06474 1.00000
foreground w     0.06474 1.00000 1.00000 1.00000

Bayes Empirical Bayes (BEB) analysis (Yang, Wong & Nielsen 2005. Mol. Biol. Evol. 22:1107-1118)
Positive sites for foreground lineages Prob(w>1):

There don't look to be convergence problems with this branch but I have tried starting omegas of <1 and >1 anyway. This doesn't change the BEB output. Is there a scenario where a foreground branch would be under positive selection but there are no individual sites that are significantly under PS? Or, is it more likely that this is a problem with my dataset?

Any help greatly appreciated, I am fairly new to PAML.

Thanks!

Milly

Message has been deleted

Jeremy Chase

unread,

Apr 28, 2016, 4:26:09 PM4/28/16

to PAML discussion group

Are you sure you're calculating the p-value correctly?

Using python and the chi-squared distribution...

>from scipy import stats

>lnL1 = -179449.502514

>lnL2 = -179474.413117

>print stats.chi2.cdf(2*(lnL1 - lnL2), 1)

0.999999999998

camilla....@gmail.com

unread,

Apr 28, 2016, 5:24:20 PM4/28/16

to PAML discussion group

Hi Jeremy,

Thank you for your reply. I actually quoted my LR value incorrectly it should be 49.82121 but that doesn't affect the results here. To calculate my P-value I'm using these commands in R:

> lnL1=-179449.50251

> lnL2=-179474.413117

> lr=2*(lnL1-lnL2)

> pchisq(lr, df=1, lower.tail=F)

[1] 1.684132e-12

I can replicate your P-value in python but now I'm not sure why we are getting wildly different values. I'm not familiar with stats.chi2.cdf. I checked the literature and thought the R function pchisq with lower.tail=F was what I should be using. I can also replicate that result in excel. Is there a problem with my P-value calculation? That would certainly explain the discrepancy between my branch and site results...

Thanks again,

Milly

cajawe

unread,

Apr 28, 2016, 6:27:25 PM4/28/16

to PAML discussion group

No, you shouldn't expect to see any BEB or NEB sites: your parameter values show p0 + p1 = 1 (and p2 = 0), indicating that none of your alignment is falling under the "positive selection" class.

Why you have a significant test result is the more confusing issue, as your branch-site alternative model is fitting the same as a M1a random-sites model. If specified properly, your branch-site null model should do the same, giving you a LRT of 0 and a P of 1. I'd double-check the fit of the null.

Jeremy Chase

unread,

Apr 29, 2016, 8:15:19 PM4/29/16

to PAML discussion group

Does lnL1 refer to your null or alternative model?

camilla....@gmail.com

unread,

Apr 29, 2016, 9:11:22 PM4/29/16

to PAML discussion group

Hmm, that is strange. I ran the null model twice (giving the same output) but apart from that I'm not sure how best to check the fit of my model. What would be the best indicator? The parameters for the null and alternative models are exactly the same apart from changing fix_omega = 0 (alternative) to fix_omega = 1 (null).

lnL1 is from my alternative model (-179449.502514) and lnL2 (-179474.413117) is from my null model. I've rechecked these and they are correct.

Thanks again for the replies!

Milly

Jeremy Chase

unread,

Apr 29, 2016, 10:37:39 PM4/29/16

to PAML discussion group

Ah, for the record the python results are essentially the same, just forgot to subtract from 1.

>>> print (1 - stats.chi2.cdf(2*(lnL1 - lnL2), 1))

1.68409730605e-12

If you are willing to share, what does your tree file look like?

Ziheng

unread,

Apr 30, 2016, 8:17:16 AM4/30/16

to PAML discussion group

As pointed out above, the parameter estimates suggest that the two lnL should be equal, and there is no evidence for positive selection.
It may be some kind of numerical problem with the iteration, but I can't tell what it may be.
How many sequences do you have? How long are they? Perhaps copy the control file here, as well as the tree with estimated branch lengths.
Ziheng

camilla....@gmail.com

unread,

May 2, 2016, 3:07:28 PM5/2/16

to PAML discussion group

I've attached my two control files (for both alternative "ALT.ctl" and "NULL.ctl" model runs). I've also attached a copy of my tree with estimated branch lengths with the problematic branch labeled (#1). I have replaced the sequence names with numbers but apart from that they should be the same as those I used in my analysis.

There are 312 sequences and the majority of these should be ~1,100 bp however there are some gene fragments included.

Thanks again!

Milly

ALT.ctl

NULL.ctl

unlabelled.tree_539.treefile

Reply all

Reply to author

Forward