Correct way to do LRT statistics?

131 views
Skip to first unread message

kim.ve...@gmail.com

unread,
Feb 27, 2023, 12:26:34 PM2/27/23
to PAML discussion group
I am calculating 2*(modelLR - nestedModelLR) and then in R doing pchisq(value, df, lower.tail=FALSE). I decided on lower.tail=FALSE because my hypothesis is that the model with positive selection will be a better fit (so a one-direction prediction).

My questions are:
(1) Is this the correct chi-squared test?
(2) Do I have the tails correct in my chi-squared test?
(3) Do I use the same statistical method for branch, site, and branch-site models?

Thank you in advance, and apologies for the basic question--I haven't found a good explanation for this step.

Best,
Kim

Sandra AC

unread,
May 10, 2023, 5:59:07 AM5/10/23
to PAML discussion group
Hi Kim, 

You may want to have a look at the R scripts we provide as part of the GitHub repository for the recently published CODEML protocol. If you clone the GitHub repository and go to directory `01_protocol_analyses`, you have R scripts called `Find_bestmodel.R` that you can adapt to your analyses inside `01_protocol_analyses/01_site_models`, ` 01_protocol_analyses/02_branch_models`, and ` 01_protocol_analyses/03_branchsite_models` (note that the content you will see depends on the analyses with a specific test data we describe in the protocol). You will see that the procedure to calculate the LRT is also explained in the protocol :) 

Hope this helps!
Sandra

Denis

unread,
May 10, 2023, 11:56:08 AM5/10/23
to PAML discussion group
Dear Kim,

Many thanks for the links you shared. They are very informative. I quickly checked the  `Find_bestmodel.R` files for the  branch, site, and branch-site models. In all cases you have used the same "pchisq(value, df, lower.tail=FALSE)" R command to calculate p-values for the LRT tests. Am i right ? Could you confirm please that is recommended way now , as previously i saw information that this approach for p-values calculation is suitable only for the branch models?

Many thanks,
Denis



среда, 10 мая 2023 г. в 12:59:07 UTC+3, sandr...@gmail.com:

Denis

unread,
May 10, 2023, 12:17:16 PM5/10/23
to PAML discussion group
Dear  Sandra,

Thank you for the links you provided!

Regards,
Denis

среда, 10 мая 2023 г. в 18:56:08 UTC+3, Denis:
Message has been deleted

Sandra AC

unread,
May 12, 2023, 12:06:17 PM5/12/23
to PAML discussion group
Hi Denis, 

I just saw the other message that was addressed to Kim, but just realised it might have been addressed to my reply to her message. 

You are correct, the chi-square approximation has some regularity conditions (e.g., H0 corresponds to H1 with the parameter inside the parameter space; not at the boundary). This does not apply to the site models or branch-site models. For example, p2=0 in the M1vsM2 comparison is at the boundary of the space. However, the "correct" null distribution is in general not known under the branch and the branch-site models, and the use of the chi-square tends to make the test more conservative (i.e., not rejecting the null as often as it should). See for a reference lines 50-62 in the R script `Find_bestmodel.R` used for the branch-site model and also Yang 2007.

Hope this helps!
Sandra

Denis

unread,
Jul 14, 2023, 7:57:09 AM7/14/23
to PAML discussion group
Dear  Sandra,
I have to apologize for my super slow response. I much appreciate your detailed explanation. May i ask you one more question?

According to the PAML documentation, we can use regular chi-square distribution in branch and branch-site models for p-value calculation.  But what should we do to calculate p-values for the site models? I didn't find that in the documentation.

P.S. Will you plan to release a similar tutorial about the application of the clade models?

Many thanks,
Denis

пятница, 12 мая 2023 г. в 19:06:17 UTC+3, sandr...@gmail.com:

Sandra AC

unread,
Jul 14, 2023, 8:36:57 AM7/14/23
to PAML discussion group
Dear Denis, 

You will find the R script for the site models if you follow this link -- hope this is what you were looking for! Please take a look at the GitHub repository we released as it is supposed to be used together with the CODEML tutorial :)

Despite the fact that I would like to contribute with other tutorials for PAML software, I do not know when I am going to have some free time to do so, unfortunately -- working on (too many) other projects at the moment... When the time comes, however, I shall let the PAML community know :)

Have a good one!
Sandra

Denis

unread,
Jul 14, 2023, 9:07:57 AM7/14/23
to PAML discussion group
Dear  Sandra,

Thank you so much for your detailed response and the link you provided!
Now it's clear that the regular chi-square distribution along with the pchisq () R function could be used for p-value calculation in the site, branch and branch-site models.

I'm looking forward to seeing new PAML tutorials!

Many thanks,
Denis

пятница, 14 июля 2023 г. в 15:36:57 UTC+3, sandr...@gmail.com:
Reply all
Reply to author
Forward
0 new messages