initial values for a Bayesian mixture model


FLORA XU

May 31, 2016, 4:46:49 PM
to Stan users mailing list

Hi All,

 

I am in the middle of coding a Bayesian mixture model in Stan and got stuck during the process. I would very much like to seek your help.

This is the link to the paper (the model is discussed in Section 2, Methods): http://bioinformatics.oxfordjournals.org/content/30/15/2098.long

 

In summary, the paper proposed a folded normal-gamma mixture model for the z-scores of SNPs:

f(z_i) = pi_0 f_0(z_i) + pi_1 f_1(z_i),

where            

1)        f_0(z) = 2 phi(z) I_{z >= 0}, where phi(z) is the normal density with mean 0 and standard deviation sigma_0, and I_{z >= 0} is an indicator function that equals 1 when z >= 0 and 0 otherwise. Additionally, if |z| < 0.68, that SNP is considered null.

2)        f_1(z) ~ gamma( a(x), beta ); covariates x are used to modulate the shape parameter a(x) = exp( x' * alpha ), with the vector alpha following a normal distribution, and the rate parameter beta is assumed to have a gamma distribution.

3)        pi_0 + pi_1 = 1, and the covariates x are again used to modulate the non-null proportion via a logistic regression

            pi_1(x) = Pr( delta_i = 1 | x ) = inv_logit( x' * gamma ),

where the latent indicators delta_i in {0, 1} are drawn from their Bernoulli full conditional distribution.

Weakly informative priors on the unknown parameters (alpha, beta, gamma, sigma_0^2) were:

            alpha ~ Normal( 0, Sigma_alpha )

            beta ~ Gamma( a_0, b_0)

            gamma ~ Normal( 0, Sigma_gamma )

            sigma_0^2 ~ InverseGamma( a_0, b_0 )

where the diagonals of both Sigmas have variance 10000, and a_0 = b_0 = .001.
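To sanity-check what this density looks like numerically, here is a hypothetical Python sketch (not the authors' code; the covariate row x below is made up, and the truncation shift at mu = .68 is ignored for simplicity) that evaluates the mixture f(z) = pi_0 f_0(z) + pi_1 f_1(z):

```python
import math

def folded_normal_pdf(z, sigma0):
    # f0(z) = 2 * phi(z; 0, sigma0) for z >= 0 (half-normal)
    return 2.0 * math.exp(-z**2 / (2 * sigma0**2)) / (math.sqrt(2 * math.pi) * sigma0)

def gamma_pdf(z, a, beta):
    # Gamma density with shape a and rate beta, computed on the log scale
    return math.exp(a * math.log(beta) + (a - 1) * math.log(z) - beta * z - math.lgamma(a))

def mixture_pdf(z, x, alpha, gamma_vec, beta, sigma0):
    a = math.exp(sum(xi * ai for xi, ai in zip(x, alpha)))   # shape a(x) = exp(x' alpha)
    eta = sum(xi * gi for xi, gi in zip(x, gamma_vec))       # linear predictor x' gamma
    pi1 = 1.0 / (1.0 + math.exp(-eta))                       # pi_1(x) = inv_logit(x' gamma)
    return (1.0 - pi1) * folded_normal_pdf(z, sigma0) + pi1 * gamma_pdf(z, a, beta)

# Hypothetical covariate row (intercept + 3 covariates), with the simulation
# parameters quoted later in the thread
x = [1.0, 0.2, -0.1, 0.3]
dens = mixture_pdf(1.5, x, [1.3, .03, .3, .4], [-2.94, 1, .5, -.5], 2.0, math.sqrt(1.2))
print(dens)
```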

 

The Stan code I came up with is shown below, and is also attached to the post:

 

functions {
  real cal_theta(real xgamma, real a, real z, real mu, real beta, int ind, real sigmasq){
    /*
      Function cal_theta calculates the parameter theta that will be
      used to draw the latent indicator delta (0 null; 1 non-null) from a
      Bernoulli distribution
    */
    real log_P_delta[2];
    real P_delta[2];

    log_P_delta[1] <- xgamma;
    log_P_delta[2] <- 0;

    log_P_delta[1] <- log_P_delta[1] + ( (a-1)*log(fabs(z) - mu) -
      beta*(fabs(z) - mu) ) + a*log(beta) - log(exp(lgamma(a)));
    log_P_delta[2] <- log_P_delta[2] +
      log(2) - 0.5*log(2*pi()*sigmasq) - (1/(2*sigmasq))*z^2;

    P_delta[1] <- exp(log_P_delta[1]/log_sum_exp(log_P_delta));
    P_delta[2] <- exp(log_P_delta[2]/log_sum_exp(log_P_delta));

    if (ind == 1){
      P_delta[1] <- 1/(1 + exp(log_P_delta[2] - log_P_delta[1]));
    }
    P_delta[2] <- 1/(1 + exp(log_P_delta[1] - log_P_delta[2]));

    if (ind == 0){ // nulls
      P_delta[1] <- 0;
      P_delta[2] <- 1;
    }
    return P_delta[2];
  }
}

data {
  int N;              // number of SNPs
  int K;              // number of covariates
  matrix[N, K+1] X;   // matrix of covariates, first column all ones (intercept)
  real Z[N];          // z-scores (observations)
  // location parameter for truncation
  real<lower=0> mu;   // constant cutoff = .68
}

transformed data {
  // covariance matrix for alpha & gamma
  matrix[K+1, K+1] Sigma;
  Sigma <- diag_matrix(rep_vector(10000, K+1));
}

parameters {
  vector[K+1] alpha;    // normal: shape parameters of the gamma in the mixture
  vector[K+1] gamma;    // normal: fit through logistic regression for pi
  real beta;            // gamma: rate parameter of the gamma in the mixture
  real<lower=0> sigmasq;// inverse gamma: squared scale of the normal in the mixture
  real<lower=0, upper=.5> pi1;
}

transformed parameters {
  vector[N] a;          // shape parameter
  vector[N] xgamma;     // dependence of pi1 on x
  real sigma;           // scale parameter

  a <- exp(X*alpha);
  xgamma <- X*gamma;
  sigma <- sqrt(sigmasq);
}

model {
  real ps[2];     // temp for log component densities
  real theta[N];  // probability that delta is 1
  int delta[N];   // latent indicators

  alpha ~ multi_normal(rep_vector(0, K+1), Sigma);
  gamma ~ multi_normal(rep_vector(0, K+1), Sigma);

  beta ~ gamma(.001, .001);
  sigmasq ~ inv_gamma(.001, .001);

  // calculate theta for drawing the latent indicators
  for (n in 1:N){
    if (fabs(Z[n]) <= mu) {
      theta[n] <- cal_theta(xgamma[n], a[n], Z[n], mu, beta, 0, sigmasq);
    } else {
      theta[n] <- cal_theta(xgamma[n], a[n], Z[n], mu, beta, 1, sigmasq);
    }
  }
  delta ~ bernoulli(theta);

  // two-group mixture model
  for (n in 1:N){
    if (fabs(Z[n]) <= mu )
      // f0 folded normal
      ps[1] <- log(2) + log1m(pi1) + normal_log(fabs(Z[n]), 0, sigma);
    else
      // f1 gamma
      ps[2] <- delta[n]*(log(pi1) + gamma_log(fabs(Z[n] - mu), a[n], beta));

    increment_log_prob(log_sum_exp(ps));
  }
}

 

At first I just called “stan” and got several error messages suggesting that I should specify the initial values. Now I have supplied an init function “initf1” and call run.R (both the data and the run script are attached):

 

load("simData.RData")

data <- list(N = length(Z),
             K = ncol(X)-1,
             X = X,
             Z = Z,
             mu = .68)

K <- ncol(X)-1  # number of covariates

initf1 <- function() {
  list(alpha = rep(0, K+1),
       gamma = c(logit(.05), rep(0, K)),
       beta = 0.1,
       sigmasq = 1,
       pi1 = .05)
}

# Fit the model with Stan
fit <- stan(#file = "/space/synwes/1/sxu/R/packages/pathLoo/example/gamma.stan",
            file = "~/R/bayesFactor_pathway/selectPath/gamma.stan",
            data = data, chains = 1, iter = 2000, warmup = 200, thin = 4,
            init = initf1,
            seed = 123)

 

 

Now in addition to the warning message before the run:

DIAGNOSTIC(S) FROM PARSER:

Warning (non-fatal):

Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.

If so, you need to call increment_log_prob() with the log absolute determinant of the Jacobian of the transform.

Left-hand-side of sampling statement:

    delta ~ bernoulli(...)


I’m also getting an error message that aborts the run:

SAMPLING FOR MODEL 'gamma' NOW (CHAIN 1).

[1] "Error : Rejecting initial value:"                             

[2] "  Log probability evaluates to log(0), i.e. negative infinity."

[3] "  Stan can't start sampling from this initial value."         

error occurred during calling the sampler; sampling not done

 

I obtained these initial values from the software generously shared by the authors of the paper.

 

In addition, the simulation data X (10000 by 4) and Z (10000) are generated with parameters

alpha <- c(1.3, .03, .3, .4)

beta <- 2

gamma <- c(-2.94, 1, .5, -.5)

sigmaSq <- 1.2

 

Please let me know if there’s any additional information I could provide for troubleshooting. As a newbie (I started this last week), I don’t know how to debug the program; e.g., when it complains about log(0), which log is it computing, and how do I locate it? I also wasn’t quite sure about my way of coding the latent indicators in Stan. I’d really appreciate any input.


Thank you very much in advance for taking a look at this long question! Any insight would be welcome!

 

Thanks!

Flora

run.R
gamma.stan
simData.RData

Bob Carpenter

Jun 1, 2016, 1:00:50 AM
to stan-...@googlegroups.com
You need to track down where the log density goes off
the rails. You can use print statements to track it as
it goes and to evaluate inputs and outputs to make sure
everything's well defined before being called.

But I'd really suggest starting with a simpler working
model and building up. That way, you can see where things
go wrong. It's hard to debug a complicated program that
doesn't work.

Stan should have finite log densities for any input meeting
the constraints of the parameters.

You almost never need initial values in Stan. It can
solve some problems, but it's rare.

What you're calling weakly informative are very diffuse
priors. We don't recommend the gamma(0.001, 0.001) things
anywhere, or really even normal(0, 100) when you expect
the result to have order 1.

I couldn't follow your model. Lots of pieces of it
look problematic (like dividing by a log_sum_exp)
and it looks like you're redefining things
for which there are built-in functions.
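To spell out that log_sum_exp point: normalized probabilities come from subtracting the log normalizer and then exponentiating, never from dividing one log quantity by another. A minimal Python check of the correct pattern (the log weights are made up):

```python
import math

def log_sum_exp(xs):
    # Stable log(sum(exp(x))) via the max trick
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

lp = [-3.2, -1.1]                          # unnormalized log component weights
norm = log_sum_exp(lp)
probs = [math.exp(x - norm) for x in lp]   # subtract on the log scale, then exp
print(sum(probs))                          # 1.0 up to rounding
```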

You really don't want to do this for efficiency:

> transformed data {
>
> // covariance matrix for alpha & beta
>
> matrix[K+1, K+1] Sigma;
>
> Sigma <- diag_matrix(rep_vector(10000, K+1));
>
> }
> gamma ~ multi_normal(rep_vector(0, K+1), Sigma);

You can just use the vectorized form

gamma ~ normal(0, 100);

which should be orders of magnitude faster.

- Bob


Flora Xu Shu Jing

Jun 3, 2016, 3:01:45 PM
to stan-...@googlegroups.com
Hi Bob, 

Thanks so much for showing me ways of boosting the efficiency, as well as flagging some problematic expressions! I was too focused on translating from the software to Stan line by line (to keep mismatches to a minimum) and hadn't gotten to the code-optimization phase yet.

I'm still figuring out what Stan deems legal. I was able to print out the parameters in the "transformed parameters" block, but it didn't work very well when I placed the print inside "model". Now it seems the major problem lies in drawing those deltas, the latent variables in {0, 1}, even after changing the inverse gamma prior to a uniform or a normal.

To the best of my knowledge, this model is most similar to what is shown in the User's Guide (version 2.8.0), Chapter 9, Finite Mixtures. Instead of a mixture of K normals, this one is a mixture of a folded normal and a gamma, each with hyperparameters. As the manual points out, Stan doesn't support discrete parameters; instead it uses a simplex parameter theta for the mixing proportions.

In the paper, the authors make the mixing proportions (written pi(x_i)) depend on the covariate matrix X (data). With no explicit example in the manual (maybe I missed it somewhere), I am not sure whether I can model the mixing proportions the way the paper proposes. Do you have any comments or recommendations on how to work around this?

I tried two naive approaches, yet both gave me the same error messages (code attached; uncommenting the // lines in the code gives the second):
[476] "Initialization between (-2, 2) failed after 100 attempts. "
[477] " Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model."
error occurred during calling the sampler; sampling not done

1) Ignore the dependence on the data and simply adopt the simplex parameter from the manual.
2) First define a function to calculate theta, the parameter of the Bernoulli distribution; draw the latent indicators from the Bernoulli; then calculate one mixing proportion as sum(delta==1)/N.

I could really use some help: can this model be coded in Stan at all? Thank you very much!!!

- Flora
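[For readers following along: a covariate-dependent mixing proportion does not require sampling delta at all; delta can be marginalized out per observation, with pi_1 computed from x inside the loop. A hypothetical Python sketch of the per-observation log-likelihood (the function names and inputs are illustrative, not from the paper's code):]

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def log_sum_exp2(a, b):
    # Stable two-term log-sum-exp
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def obs_loglik(eta_n, lp_null, lp_alt):
    # Marginalize the latent delta_n: pi1 varies per observation
    # via inv_logit(x_n' gamma); lp_null/lp_alt are the component
    # log densities at z_n
    pi1 = inv_logit(eta_n)
    return log_sum_exp2(math.log1p(-pi1) + lp_null, math.log(pi1) + lp_alt)

# As eta -> -inf, pi1 -> 0 and the likelihood reduces to the null component
print(abs(obs_loglik(-30.0, -1.5, -4.0) - (-1.5)) < 1e-6)  # True
```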



gamma.stan

Bob Carpenter

Jun 3, 2016, 4:03:35 PM
to stan-...@googlegroups.com
Yes, the model can be coded in Stan. At least in the
sense that you can make a mixture out of anything.

What's legal is defined in the language reference
part of the manual, but it's about 90 pages of definitions.

What you're probably missing given the error is that every
value of the parameters that satisfies the declared constraints
should have a finite log density.

When you say the print statements didn't work in the model,
what I would recommend is removing all of the model, then
adding it back line-by-line --- there's a bug in the
current Stan or RStan that's ignoring print statements in
the model when the log density is illegal.

- Bob

Flora Xu Shu Jing

Jun 27, 2016, 11:33:49 PM
to stan-...@googlegroups.com
Hi Bob,

Thanks a lot for the helpful comments!
 
I followed your suggestions and started small. After getting accurate estimates for a single-distribution model, I moved up to a mixture model (a simplified version of the desired model): a folded normal and a gamma. What I noticed is that in Stan, if the data are declared as one-sided, e.g., greater than 0, then adding log(2) to the log posterior doesn't make much of a difference.
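[A quick numeric check of that observation: the folding constant log(2) enters once per observation but involves none of the parameters, so it shifts the log posterior without reshaping it. Sketch with made-up half-normal data:]

```python
import math

z = [0.3, 1.2, 0.7, 2.1]  # made-up half-normal observations

def loglik(sigma, include_fold_const):
    # Normal log-likelihood, optionally plus the per-observation log(2) fold constant
    ll = sum(-0.5 * math.log(2 * math.pi * sigma**2) - zi**2 / (2 * sigma**2) for zi in z)
    return ll + len(z) * math.log(2) if include_fold_const else ll

grid = [0.5 + 0.05 * i for i in range(40)]
best_plain = max(grid, key=lambda s: loglik(s, False))
best_fold = max(grid, key=lambda s: loglik(s, True))
print(best_plain == best_fold)  # True: the constant never moves the maximum
```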

Anyhow, I simulated some data from the mixture density .95 f_0 + .05 f_1, where f_0 is the folded normal(0, sigma) density and f_1 is the gamma(alpha, beta) density.

N <- 10000
pi1 <- .05
pi0 <- 1 - pi1

set.seed(1234)
delta <- sample(c(0, 1), N, replace = T, prob = c(pi0, pi1))

dat <- rep(NA, N)
# normal
sigma <-  1.1
dat[delta==0] <- abs(rnorm(n = sum(1-delta), mean = 0, sd = sigma))

# gamma
alpha <- 1
beta <- 2
dat[delta==1] <- rgamma(sum(delta), shape = alpha, rate = beta)

mix.model <- "
  data {
    int N;
    vector<lower=0>[N] Z;
  }
  parameters {
    real<lower=0> a;
    real<lower=0> beta;
    real<lower=0> sigma;
    simplex[2] pi;
  }

  transformed parameters {
    print(1); print(a);
    print(2); print(beta);
    print(3); print(sigma);
    print(4); print(pi[1]);
  }
  model {
    real ps[2];

    for (n in 1:N){
      ps[1] <- log(pi[1]) + normal_log(Z[n], 0, sigma);
      ps[2] <- log(pi[2]) + gamma_log(Z[n], a, beta);
      increment_log_prob(ps);
    }
  }
"
mix.dat <- list(N = N, Z = abs(dat))

mix.fit <-  stan(model_code = mix.model,
                 data = mix.dat,
                 iter = 5e3, chains = 4, warmup = 1000,
                 thin = 4, seed = 1234)
mix.e <- extract(mix.fit)
lapply(mix.e  , summary)

####################################################
####################################################
# $a
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 1.229   1.284   1.295   1.295   1.306   1.357
#
# $beta
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 1.441   1.509   1.525   1.525   1.540   1.620
#
# $sigma
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 1.048   1.069   1.074   1.074   1.079   1.099
#
# $pi
# V1               V2
# Min.   :0.4876   Min.   :0.4884
# 1st Qu.:0.4976   1st Qu.:0.4975
# Median :0.5000   Median :0.5000
# Mean   :0.5000   Mean   :0.5000
# 3rd Qu.:0.5025   3rd Qu.:0.5024
# Max.   :0.5116   Max.   :0.5124

From the results, there is a clear overestimate of both the shape and rate parameters and a pretty good estimate of sigma; however, the proportions are far from the true values.

Could you point out any coding mistakes I made, or anything else that led to these results?

I really appreciate the help and insights!!!

Best, 
Flora

Michael Betancourt

Jun 28, 2016, 6:12:17 AM
to stan-...@googlegroups.com
increment_log_prob(ps)

should be

increment_log_prob(log_sum_exp(ps[1], ps[2]))

Or you can simplify the model by having a single
probability,

real<lower=0, upper=1> pi;

and then using the log_mix function:

increment_log_prob(log_mix(pi, normal_log(Z[n], 0, sigma), gamma_log(Z[n], a, beta)));
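[Stan's log_mix(theta, lp1, lp2) is the numerically stable form of log(theta*exp(lp1) + (1-theta)*exp(lp2)); a Python check of that identity with arbitrary values:]

```python
import math

def log_sum_exp2(a, b):
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_mix(theta, lp1, lp2):
    # Mirrors Stan's log_mix: component lp1 with weight theta, lp2 with 1 - theta
    return log_sum_exp2(math.log(theta) + lp1, math.log1p(-theta) + lp2)

lp1, lp2 = -2.0, -5.0
direct = math.log(0.3 * math.exp(lp1) + 0.7 * math.exp(lp2))
print(abs(log_mix(0.3, lp1, lp2) - direct) < 1e-12)  # True
```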

Flora Xu Shu Jing

Jun 28, 2016, 11:27:07 AM
to stan-...@googlegroups.com
Thanks, Michael, for catching the error! That was really silly of me.

After changing increment_log_prob(ps); to increment_log_prob(log_sum_exp(ps)); and reducing the number of chains from 4 to 1 (just to see results faster), I ran the model and sigma just blew up... So I added a prior on sigma:
sigma ~ cauchy(0, 2.5). Both the rate and shape parameters were overestimated, while sigma was underestimated (I focus on the median).

# lapply(mix.e  , summary)
# $a
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 1.251   1.294   1.311   1.313   1.331   1.378
#
# $beta
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 1.451   1.521   1.542   1.543   1.566   1.632
#
# $sigma
# Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
# 0.0007   0.0034   0.0070   5.2070   1.7540 521.2000
#
# $pi
# V1                  V2
# Min.   :2.300e-08   Min.   :0.9935
# 1st Qu.:1.323e-04   1st Qu.:0.9975
# Median :1.128e-03   Median :0.9989
# Mean   :1.457e-03   Mean   :0.9985
# 3rd Qu.:2.531e-03   3rd Qu.:0.9999
# Max.   :6.477e-03   Max.   :1.0000

Is there any way I could improve this? I also ran an experiment making pi[1] and pi[2] known:

# lapply(mix2.e  , summary)
# $a
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 1.245   1.287   1.298   1.299   1.310   1.358
#
# $beta
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 1.463   1.519   1.536   1.537   1.554   1.618
#
# $sigma
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 0.9221  1.0870  1.1400  1.1370  1.1840  1.3850


Thanks!
Flora






Flora Xu Shu Jing

Jun 28, 2016, 1:58:48 PM
to stan-...@googlegroups.com
Corrections on the experiment (I mislabeled the mixing proportions; apologies for the confusion). The results should be

# lapply(mix2.e  , summary)
# $a
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 0.6049  0.7928  0.8861  0.9105  0.9960  2.5450
#
# $beta
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 0.841   1.830   2.397   2.538   3.032   8.855
#
# $sigma
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 1.053   1.087   1.093   1.093   1.098   1.124

Compared to the true values (a = 1, beta = 2, sigma = 1.1), this outcome looks fine. 

Yet the result from the model where pi needs to be estimated seems far from what one would expect. Have I missed something?

Thanks,
Flora
 

Michael Betancourt

Jun 28, 2016, 2:02:18 PM
to stan-...@googlegroups.com
Look at the mixing coefficient: it's pegging at 1,
reducing the model to a pure gamma. The Gaussian
component of the data biases the fit toward larger
values when modeled with the gamma, while sigma is
left entirely unconstrained by the data and hence
roams to very small and very large values. I'll
leave you to diagnose what's going on when trying
to fit pi.
