Likelihood Ratio tests for pairwise comparisons

675 views
Skip to first unread message

Dave Carlson

unread,
May 14, 2015, 6:54:29 PM5/14/15
to pamlso...@googlegroups.com
Hi PAML users,

I have a question about pairwise comparisons in codeml.  Previously, I have used the branch and branch-site models to test for positive selection along specific branches of a phylogeny.  I now have pairwise comparison data.  I have used the pairwise option in codeml to estimate, dN, dS, and omega and found some results suggesting an elevated omega among a biologically-interesting subset of my pairwise comparisons.  To explore this further, I would like to explicitly test whether a model that allows omega to be greater than one is a better fit to the data than a model that doesn't (similar to tests using the branch, sites and branch-site models) .  I am curious, however, if it is possible (and reasonable!) to implement a likelihood ratio test of positive selection from these pairwise comparisons?  Is there any reason why I couldn't or shouldn't do this?

Thanks!
Dave

cajawe

unread,
May 15, 2015, 4:40:08 AM5/15/15
to pamlso...@googlegroups.com
Conducting hypothesis tests after having first searching for patterns to guide the hypothesis testing isn't a good idea.  Since you're already checked for and observed elevated dN/dS in particular pairwise comparisons, P values obtained from tests of H0: dN/dS = 1 in this subset will be biased.  

So, if you proceed down this road, you might get called out by reviewers for data dredging.  Perhaps an alternative approach is to report the confidence intervals for the dN/dS estimates obtained from each pairwise comparison (not just your focal subset) and to simply conclude that a survey of the data set strongly suggests that dN/dS is elevated in a particular subset.  



Dave Carlson

unread,
May 15, 2015, 1:34:10 PM5/15/15
to pamlso...@googlegroups.com
Hi cajawe,

Thanks for your response.  What you've suggested is, I think, fairly close to what I've done.  I guess my question, however, is a little bit different.  I'm wondering if it's even possible to implement LRTs to test different hypotheses using pairwise comparison data.  And if it is possible, what type of chi-square distribution would the test statistic follow?  In the paml manual and and the paml 4 paper, LRTs derived from codeml's pairwise comparison option are not even discussed, which makes me think that their implementation is not appropriate, however I am hoping to confirm whether or not this is the case.

杜康

unread,
Mar 17, 2017, 11:12:53 AM3/17/17
to PAML discussion group
Dear Dave,

I'm facing the same problem as you had. Do you have any conclusion for this question now?

Hoping for you  reply. Thank you.

                Sincerely,
                         Kang


P.S. I've searched on the internet for a long time, following is something relevant but not useful, just for your information.

https://faculty.washington.edu/wjs18/lecture2-1.pdf  This one talked about likelihood ratio test for pairwise comparisons, but only calculated the 2*delta lnL, no d.f. was referred.

http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.0030170   This one do the likelihood ratio test but not with  chi-square distribution.

Ziheng

unread,
Apr 28, 2017, 12:39:52 PM4/28/17
to PAML discussion group

there is nothing wrong with the LRT applied to a pair of sequences. The power may be low.
there are two cases.

case 1.
H0: w = 1 (fix_omega = 1, omega = 1)
Ha: w > 0 (fix_omega = 0)

you use the LRT or chi square with df = 1.  The critical values are 3.84 for 5% and 6.63 for 1%.


case 2.
H0: w = 1 (fix_omega = 1, omega = 1)
Ha: w <= 1 (fix_omega = 0. If the estimated omega < 1, you set omega = 1 and \Delta \ell = 0, and the test is not significant)

you use the 1:1 mixture of 0 and the chi square with df = 1.  The critical values are 2.71 at 5% and 5.41 at 1%.

I think you are trying to use case 2. this was described somewhere, but I can't seem to remember where it was.

ziheng

Jilguero ostras

unread,
Feb 13, 2019, 3:11:51 AM2/13/19
to PAML discussion group
Hi,

I am following these examples in order to make a pairwise comparison of the CDS of orthologous genes of two plant genomes. I am using case 1:

H0: runmode = -2;model = 0;fix_omega = 1;omega = 1
H1: runmode = -2;model = 0;fix_omega = 0;omega = .4

and after using a LRT in order to identify statistical significance. I am suspect of the result because among the significant ones, all of them have an omega << 1 (e.g. 0.1219; e.g. 0.0786, etc), and none of the pairwise comparisons with omega > 1 are identified as significant (e.g. 1.473; e.g. 2.8465, etc). I am attaching a picture with the distribution of omega values I get (blue=as significant; red=as non significant based on the LRT test).

Please, how can I check up if I am introducing kind of bias in the test by mistake or by any other reason?, because I would expect that some of the genes under positive selection are identified as significant.

Comparison of the omega values regardless of the significance (test) would be correct?. Thank you.

Jilguero.


pairwise.omegas.png
Reply all
Reply to author
Forward
0 new messages