Different definitions of the posP parameter between the m2 and m8 models

22 views
Skip to first unread message

Karolis Ramanauskas

unread,
Jul 16, 2021, 8:07:51 PM7/16/21
to bali-phy-users
Hi,

Sorry for the possibly silly question. I am a little confused. Below is the relevant output from bali-phy (version 3.6.0) help commands:

bali-phy help m2:       posP: The fraction of positively selected sites.
bali-phy help m2a:      posP: The fraction of positively selected sites.
bali-phy help m8:       posP: The fraction of neutral sites.
bali-phy help m8a:      posP: The fraction of neutral sites.

Is this a mistake in the documentation or does posP value have different meanings between m2 and m8 models?

Thanks,
Karolis

Benjamin Redelings

unread,
Jul 16, 2021, 8:24:05 PM7/16/21
to bali-ph...@googlegroups.com

Hi Karolis,

Yeah, that is definitely a mistake in the documentation!  I have fixed it on the master branch.  If I make a 3.7.0 release before I do a 4.0, then I will include it.

All the site models have `posP` as the fraction of sites that are positively selected with dN/dS = `posW`.

-BenRI

--
You received this message because you are subscribed to the Google Groups "bali-phy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bali-phy-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bali-phy-users/0dc8756b-71e9-4472-8c8b-2f20a7fcb0e2n%40googlegroups.com.

Benjamin Redelings

unread,
Jul 17, 2021, 5:56:49 PM7/17/21
to bali-ph...@googlegroups.com

OK, so actually my original response here wasn't quite right.  I've updated the command-line documentation so that we have:

m2a         posP: The fraction of positively selected sites.
m2a_test    posP: The fraction of positively selected sites.
m8a         posP: The fraction of *neutral* sites.
m8          posP: The fraction of positively selected sites.
m8a_test    posP: The fraction of positively-selected or neutral sites.

On the master branch, I've updated the command-line documentation for the m8a and m8 models, and added new documentation for the m8a_test model

It is still the case that all the site models have `posP` as the fraction of sites with dN/dS=`posW`.  But m8a (which is the same as m8[posW=1]) does not have an explicit posW parameter.  Since posW=1 for m8a, posP is the fraction of NEUTRAL sites.

The relationship between all the m<number> models from PAML is a bit weird and complicated.  You may have to read the references in the m7, m8a, and m2a models to get the full picture.

-BenRI

P.S. The command line documentation comes from JSON files, like this one:

https://github.com/bredelings/BAli-Phy/blob/master/bindings/models/m8.json

P.P.S. Here is the new description for m8a_test -- it is somewhat long:

Description:

   This is a Bayesian version of the M8a test for positive selection. The
   original M8a test is a likelihood-ratio test. The Bayesian version of the
   test chooses a prior with 50% mass on the null hypothesis H0 and 50% on
   the alternative hypothesis H1:
   
      H0: no positive selection
      H1: some positive selection.
   
   When posSelection = 0, the value of posW is ignored and dN/dS=1. When
   posSelection = 1, the value of posW is used and dN/dS=posW. The posterior
   probability of posSelection gives the posterior probability of H1.  The
   Bayes Factor is given by Pr(posSelection=1)/Pr(posSelection=0).
   
   The M8a test (Swanson et al, 2003) has null and alternative hypotheses
   given by:
   
      H0: m8[posW=1] = m8a
      H1: m8
   
   This is an improvement over the test originally proposed by Yang (2000):
   
      H0: m8[posP=1] = m7
      H1: m8
   
   In the originally proposed likelihood-ratio test, a mixture of conserved
   and neutral sites would not fit the null hypothesis (m7), leading to the
   erroneous inference of positive selection. In the M8a test, the null
   hypothesis (m8a) has been improved to handle both neutral and conserved
   sites, so that positive selection is not incorrectly inferred.

Karolis Ramanauskas

unread,
Jul 19, 2021, 11:57:16 AM7/19/21
to bali-phy-users
Thank you. I was wondering about M8a, this clarifies a lot. For some reason, I am not getting the emails on replies (even though I am subscribed). I actually tracked this down yesterday in the Haskell file SModel.hs:

m8a_test_omega_dist mu gamma n_bins posP posW 0 = m8_omega_dist mu gamma n_bins posP 1.0
m8a_test_omega_dist mu gamma n_bins posP posW _ = m8_omega_dist mu gamma n_bins posP posW

It is nice to have this documented, however.

Thanks again.

Karolis Ramanauskas

unread,
Jul 19, 2021, 12:37:25 PM7/19/21
to bali-phy-users
Just to confirm,

In a hypothetical scenario where m8a_test is used. Assuming the resulting estimate of the parameter posSelection is zero (or very close to zero). The estimate of posW should always be one (or close to one). Is that correct?

Thanks

Benjamin Redelings

unread,
Jul 20, 2021, 4:10:25 PM7/20/21
to bali-ph...@googlegroups.com
-------- Forwarded Message --------
Subject: Re: Different definitions of the posP parameter between the m2 and m8 models
Date: Mon, 19 Jul 2021 10:32:40 -0700
From: Benjamin Redelings <benjamin....@gmail.com>
To: Karolis Ramanauskas <kra...@uic.edu>


On 7/19/21 9:33 AM, Karolis Ramanauskas wrote:
Just to confirm,

In a hypothetical scenario where m8a_test is used. Assuming the resulting estimate of the parameter posSelection is zero (or very close to zero). The estimate of posW should always be one (or close to one). Is that correct?

Not the way it is logged right now.

1. When posSelection=0, the value of posW is never used.  We just use 1.0 instead.  So when posSelection=0, posW is actually drawn from the prior of posW.

When I have used these models, I reported Pr(posSelection=1) and E(posW|posSelection=1) instead of E(posW), so this did not come up.

2. To get expectations conditional on posSeletion=1, I used a command like this:

stats-select -sm8a_test:posSelection=1 < C1.log | statreport --mean -
Note that Pr(posSelection=1) = E(posSelection).

3. It is certainly possible IN THEORY to log `if posSelection == 1 then posW else 1.0` instead of `posW`.  That is a Haskell expression that would do what you want.  However, until version 4 is ready, this might be difficult.

Hmm... do you think E(if posSelection == 1 then posW else 1.0) makes sense mathematically?  I guess it would  correspond to estimating posW under a prior with a point mass on 1.0....

-BenRI

Karolis Ramanauskas

unread,
Jul 20, 2021, 4:31:25 PM7/20/21
to bali-phy-users
Hi,

Points #1 and #2 make sense and clarify the behavior of BAli-Phy. Regarding your question in #3 about the correctness of conditionally logging the sampled posW value (if posSelection=1) or 1.0 (if posSelection=0). I am afraid this question is a little over my head, I have a very rudimentary (and fragmented) understanding of the related theory (although I am slowly learning) and it would be dishonest of me to weigh in. I am sorry.

Thank you for taking the time to help me understand all of this better,
Karolis
Reply all
Reply to author
Forward
0 new messages