initial values for a Bayesian mixture model


FLORA XU

May 31, 2016, 4:46:49 PM
to Stan users mailing list

Hi All,

 

I am in the middle of coding a Bayesian mixture model in Stan and got stuck during the process. I would very much like to seek your help.

This is the link to the paper (the model is discussed in Section 2, Methods): http://bioinformatics.oxfordjournals.org/content/30/15/2098.long

 

In summary, the paper proposed a folded normal-gamma mixture model for the z-scores of SNPs:

f(z_i) = pi_0 f_0(z_i) + pi_1 f_1(z_i),

where            

1)        f_0(z) = 2 phi(z) I_{z >= 0}, where phi(z) is the normal density with mean 0 and standard deviation sigma_0, and I_{z >= 0} is an indicator function that equals 1 when z >= 0 and 0 otherwise. Additionally, if |z| < 0.68, that SNP is considered null.

2)        f_1(z) ~ gamma( a(x), beta ); covariates x are used to modulate the shape parameter a(x) = exp( x' * alpha ), with the vector alpha following a normal distribution, and the rate parameter beta is assumed to have a gamma distribution.

3)        pi_0 + pi_1 = 1, and the covariates x are again used to modulate the non-null proportion via a logistic regression

            pi_1(x) = Pr( delta_i = 1 | x ) = inv_logit( x' * gamma ),

where the latent indicators delta_i in {0, 1} are drawn from their Bernoulli full conditional distribution.

Weakly informative priors on the unknown parameters (alpha, beta, gamma, sigma_0^2) were:

            alpha ~ Normal( 0, Sigma_alpha )

            beta ~ Gamma( a_0, b_0)

            gamma ~ Normal( 0, Sigma_gamma )

            sigma_0^2 ~ InverseGamma( a_0, b_0 )

where the diagonals of both Sigmas have variance 10000, and a_0 = b_0 = .001.
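To sanity-check what this density looks like numerically, here is a hypothetical Python sketch (not the authors' code; the covariate row x below is made up, and the truncation shift at mu = .68 is ignored for simplicity) that evaluates the mixture f(z) = pi_0 f_0(z) + pi_1 f_1(z):

```python
import math

def folded_normal_pdf(z, sigma0):
    # f0(z) = 2 * phi(z; 0, sigma0) for z >= 0 (half-normal)
    return 2.0 * math.exp(-z**2 / (2 * sigma0**2)) / (math.sqrt(2 * math.pi) * sigma0)

def gamma_pdf(z, a, beta):
    # Gamma density with shape a and rate beta, computed on the log scale
    return math.exp(a * math.log(beta) + (a - 1) * math.log(z) - beta * z - math.lgamma(a))

def mixture_pdf(z, x, alpha, gamma_vec, beta, sigma0):
    a = math.exp(sum(xi * ai for xi, ai in zip(x, alpha)))   # shape a(x) = exp(x' alpha)
    eta = sum(xi * gi for xi, gi in zip(x, gamma_vec))       # linear predictor x' gamma
    pi1 = 1.0 / (1.0 + math.exp(-eta))                       # pi_1(x) = inv_logit(x' gamma)
    return (1.0 - pi1) * folded_normal_pdf(z, sigma0) + pi1 * gamma_pdf(z, a, beta)

# Hypothetical covariate row (intercept + 3 covariates), with the simulation
# parameters quoted later in the thread
x = [1.0, 0.2, -0.1, 0.3]
dens = mixture_pdf(1.5, x, [1.3, .03, .3, .4], [-2.94, 1, .5, -.5], 2.0, math.sqrt(1.2))
print(dens)
```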

 

The Stan code I came up with is shown below, and is also attached to the post:

 

functions {
  real cal_theta(real xgamma, real a, real z, real mu, real beta, int ind, real sigmasq){
    /*
      Function cal_theta calculates the parameter theta that will be
      used to draw the latent indicator delta (0 null; 1 non-null) from a
      Bernoulli distribution
    */
    real log_P_delta[2];
    real P_delta[2];

    log_P_delta[1] <- xgamma;
    log_P_delta[2] <- 0;

    log_P_delta[1] <- log_P_delta[1] + ( (a-1)*log(fabs(z) - mu) -
      beta*(fabs(z) - mu) ) + a*log(beta) - log(exp(lgamma(a)));
    log_P_delta[2] <- log_P_delta[2] +
      log(2) - 0.5*log(2*pi()*sigmasq) - (1/(2*sigmasq))*z^2;

    P_delta[1] <- exp(log_P_delta[1]/log_sum_exp(log_P_delta));
    P_delta[2] <- exp(log_P_delta[2]/log_sum_exp(log_P_delta));

    if (ind == 1){
      P_delta[1] <- 1/(1 + exp(log_P_delta[2] - log_P_delta[1]));
    }
    P_delta[2] <- 1/(1 + exp(log_P_delta[1] - log_P_delta[2]));

    if (ind == 0){ // nulls
      P_delta[1] <- 0;
      P_delta[2] <- 1;
    }
    return P_delta[2];
  }
}

data {
  int N;              // number of SNPs
  int K;              // number of covariates
  matrix[N, K+1] X;   // matrix of covariates, first column all ones (intercept)
  real Z[N];          // z-scores (observations)
  // location parameter for truncation
  real<lower=0> mu;   // constant cutoff = .68
}

transformed data {
  // covariance matrix for alpha & gamma
  matrix[K+1, K+1] Sigma;
  Sigma <- diag_matrix(rep_vector(10000, K+1));
}

parameters {
  vector[K+1] alpha;    // normal: shape parameters of the gamma in the mixture
  vector[K+1] gamma;    // normal: fit through logistic regression for pi
  real beta;            // gamma: rate parameter of the gamma in the mixture
  real<lower=0> sigmasq;// inverse gamma: squared scale of the normal in the mixture
  real<lower=0, upper=.5> pi1;
}

transformed parameters {
  vector[N] a;          // shape parameter
  vector[N] xgamma;     // dependence of pi1 on x
  real sigma;           // scale parameter

  a <- exp(X*alpha);
  xgamma <- X*gamma;
  sigma <- sqrt(sigmasq);
}

model {
  real ps[2];     // temp for log component densities
  real theta[N];  // probability that delta is 1
  int delta[N];   // latent indicators

  alpha ~ multi_normal(rep_vector(0, K+1), Sigma);
  gamma ~ multi_normal(rep_vector(0, K+1), Sigma);

  beta ~ gamma(.001, .001);
  sigmasq ~ inv_gamma(.001, .001);

  // calculate theta for drawing the latent indicators
  for (n in 1:N){
    if (fabs(Z[n]) <= mu) {
      theta[n] <- cal_theta(xgamma[n], a[n], Z[n], mu, beta, 0, sigmasq);
    } else {
      theta[n] <- cal_theta(xgamma[n], a[n], Z[n], mu, beta, 1, sigmasq);
    }
  }
  delta ~ bernoulli(theta);

  // two-group mixture model
  for (n in 1:N){
    if (fabs(Z[n]) <= mu )
      // f0 folded normal
      ps[1] <- log(2) + log1m(pi1) + normal_log(fabs(Z[n]), 0, sigma);
    else
      // f1 gamma
      ps[2] <- delta[n]*(log(pi1) + gamma_log(fabs(Z[n] - mu), a[n], beta));

    increment_log_prob(log_sum_exp(ps));
  }
}

 

At first I just called “stan” and got several error messages suggesting that I should specify the initial values. Now I have supplied an init function “initf1” and call run.R (both the data and the run script are attached):

 

load("simData.RData")

data <- list(N = length(Z),
             K = ncol(X)-1,
             X = X,
             Z = Z,
             mu = .68)

K <- ncol(X)-1  # number of covariates

initf1 <- function() {
  list(alpha = rep(0, K+1),
       gamma = c(logit(.05), rep(0, K)),
       beta = 0.1,
       sigmasq = 1,
       pi1 = .05)
}

# Fit the model with Stan
fit <- stan(#file = "/space/synwes/1/sxu/R/packages/pathLoo/example/gamma.stan",
            file = "~/R/bayesFactor_pathway/selectPath/gamma.stan",
            data = data, chains = 1, iter = 2000, warmup = 200, thin = 4,
            init = initf1,
            seed = 123)

 

 

Now in addition to the warning message before the run:

DIAGNOSTIC(S) FROM PARSER:

Warning (non-fatal):

Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.

If so, you need to call increment_log_prob() with the log absolute determinant of the Jacobian of the transform.

Left-hand-side of sampling statement:

    delta ~ bernoulli(...)


I’m also getting an error message that aborts the run:

SAMPLING FOR MODEL 'gamma' NOW (CHAIN 1).

[1] "Error : Rejecting initial value:"                             

[2] "  Log probability evaluates to log(0), i.e. negative infinity."

[3] "  Stan can't start sampling from this initial value."         

error occurred during calling the sampler; sampling not done

 

I obtained these initial values from the software generously shared by the authors of the paper.

 

In addition, the simulation data X (10000 by 4) and Z (10000) are generated with parameters

alpha <- c(1.3, .03, .3, .4)

beta <- 2

gamma <- c(-2.94, 1, .5, -.5)

sigmaSq <- 1.2

 

Please let me know if there’s any additional information I could provide for troubleshooting. As a newbie (I started this last week), I don’t know how to debug the program; e.g., when it complains about log(0), which log is it computing, and how do I locate it? I also wasn’t quite sure about my way of coding the latent indicators in Stan. I’d really appreciate any input.


Thank you very much in advance for taking a look at this long question! Any insight would be welcome!

 

Thanks!

Flora

run.R
gamma.stan
simData.RData

Bob Carpenter

Jun 1, 2016, 1:00:50 AM
to stan-...@googlegroups.com
You need to track down where the log density goes off
the rails. You can use print statements to track it as
it goes and to evaluate inputs and outputs to make sure
everything's well defined before being called.

But I'd really suggest starting with a simpler working
model and building up. That way, you can see where things
go wrong. It's hard to debug a complicated program that
doesn't work.

Stan should have finite log densities for any input meeting
the constraints of the parameters.

You almost never need initial values in Stan. It can
solve some problems, but it's rare.

What you're calling weakly informative are very diffuse
priors. We don't recommend the gamma(0.001, 0.001) things
anywhere, or really even normal(0, 100) when you expect
the result to have order 1.

I couldn't follow your model. Lots of pieces of it
look problematic (like dividing by a log_sum_exp)
and it looks like you're redefining things
for which there are built-in functions.
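To spell out that log_sum_exp point: normalized probabilities come from subtracting the log normalizer and then exponentiating, never from dividing one log quantity by another. A minimal Python check of the correct pattern (the log weights are made up):

```python
import math

def log_sum_exp(xs):
    # Stable log(sum(exp(x))) via the max trick
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

lp = [-3.2, -1.1]                          # unnormalized log component weights
norm = log_sum_exp(lp)
probs = [math.exp(x - norm) for x in lp]   # subtract on the log scale, then exp
print(sum(probs))                          # 1.0 up to rounding
```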

You really don't want to do this for efficiency:

> transformed data {
>
> // covariance matrix for alpha & beta
>
> matrix[K+1, K+1] Sigma;
>
> Sigma <- diag_matrix(rep_vector(10000, K+1));
>
> }
> gamma ~ multi_normal(rep_vector(0, K+1), Sigma);

You can just use the vectorized form

gamma ~ normal(0, 100);

which should be orders of magnitude faster.

- Bob


Flora Xu Shu Jing

Jun 3, 2016, 3:01:45 PM
to stan-...@googlegroups.com
Hi Bob, 

Thanks so much for showing me ways of boosting the efficiency, as well as flagging some problematic expressions! I was too focused on translating from the software to Stan line by line (to keep mismatches to a minimum) and hadn't gotten to the code-optimization phase yet.

I'm still figuring out what Stan deems legal. I was able to print out the parameters in the "transformed parameters" block, but it didn't work very well when I placed the print inside "model". Now it seems the major problem lies in drawing those deltas, the latent variables in {0, 1}, even after changing the inverse gamma prior to a uniform or a normal.

To the best of my knowledge, this model is most similar to what is shown in the User's Guide (version 2.8.0), Chapter 9, Finite Mixtures. Instead of a mixture of K normals, this one is a mixture of a folded normal and a gamma, each with hyperparameters. As the manual points out, Stan doesn't support discrete parameters; instead it uses a simplex parameter theta for the mixing proportions.

In the paper, the authors make the mixing proportions (written pi(x_i)) depend on the covariate matrix X (data). With no explicit example in the manual (maybe I missed it somewhere), I am not sure whether I can model the mixing proportions the way the paper proposes. Do you have any comments or recommendations on how to work around this?

I tried two naive approaches, yet both gave me the same error messages (code attached; uncommenting the // lines in the code gives the second):
[476] "Initialization between (-2, 2) failed after 100 attempts. "
[477] " Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model."
error occurred during calling the sampler; sampling not done

1) Ignore the dependence on the data and simply adopt the simplex parameter from the manual.
2) First define a function to calculate theta, the parameter of the Bernoulli distribution; draw the latent indicators from the Bernoulli; then calculate one mixing proportion as sum(delta==1)/N.

I could really use some help: can this model be coded in Stan at all? Thank you very much!!!

- Flora
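[For readers following along: a covariate-dependent mixing proportion does not require sampling delta at all; delta can be marginalized out per observation, with pi_1 computed from x inside the loop. A hypothetical Python sketch of the per-observation log-likelihood (the function names and inputs are illustrative, not from the paper's code):]

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def log_sum_exp2(a, b):
    # Stable two-term log-sum-exp
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def obs_loglik(eta_n, lp_null, lp_alt):
    # Marginalize the latent delta_n: pi1 varies per observation
    # via inv_logit(x_n' gamma); lp_null/lp_alt are the component
    # log densities at z_n
    pi1 = inv_logit(eta_n)
    return log_sum_exp2(math.log1p(-pi1) + lp_null, math.log(pi1) + lp_alt)

# As eta -> -inf, pi1 -> 0 and the likelihood reduces to the null component
print(abs(obs_loglik(-30.0, -1.5, -4.0) - (-1.5)) < 1e-6)  # True
```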



gamma.stan

Bob Carpenter

Jun 3, 2016, 4:03:35 PM
to stan-...@googlegroups.com
Yes, the model can be coded in Stan. At least in the
sense that you can make a mixture out of anything.

What's legal is defined in the language reference
part of the manual, but it's about 90 pages of definitions.

What you're probably missing given the error is that every
value of the parameters that satisfies the declared constraints
should have a finite log density.

When you say the print statements didn't work in the model,
what I would recommend is removing all of the model, then
adding it back line-by-line --- there's a bug in the
current Stan or RStan that's ignoring print statements in
the model when the log density is illegal.

- Bob

Flora Xu Shu Jing

Jun 27, 2016, 11:33:49 PM
to stan-...@googlegroups.com
Hi Bob,

Thanks a lot for the helpful comments!
 
I followed your suggestions and started small. After getting accurate estimates for a single-distribution model, I moved up to a mixture model (a simplified version of the desired model): a folded normal and a gamma. What I noticed is that in Stan, if the data are declared as one-sided, e.g., greater than 0, then adding log(2) to the log posterior doesn't make much of a difference.
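[A quick numeric check of that observation: the folding constant log(2) enters once per observation but involves none of the parameters, so it shifts the log posterior without reshaping it. Sketch with made-up half-normal data:]

```python
import math

z = [0.3, 1.2, 0.7, 2.1]  # made-up half-normal observations

def loglik(sigma, include_fold_const):
    # Normal log-likelihood, optionally plus the per-observation log(2) fold constant
    ll = sum(-0.5 * math.log(2 * math.pi * sigma**2) - zi**2 / (2 * sigma**2) for zi in z)
    return ll + len(z) * math.log(2) if include_fold_const else ll

grid = [0.5 + 0.05 * i for i in range(40)]
best_plain = max(grid, key=lambda s: loglik(s, False))
best_fold = max(grid, key=lambda s: loglik(s, True))
print(best_plain == best_fold)  # True: the constant never moves the maximum
```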

Anyhow, I simulated some data from the mixture density .95 f_0 + .05 f_1, where f_0 is the folded normal(0, sigma) density and f_1 is the gamma(alpha, beta) density.

N <- 10000
pi1 <- .05
pi0 <- 1 - pi1

set.seed(1234)
delta <- sample(c(0, 1), N, replace = T, prob = c(pi0, pi1))

dat <- rep(NA, N)
# normal
sigma <-  1.1
dat[delta==0] <- abs(rnorm(n = sum(1-delta), mean = 0, sd = sigma))

# gamma
alpha <- 1
beta <- 2
dat[delta==1] <- rgamma(sum(delta), shape = alpha, rate = beta)

mix.model <- "
  data {
    int N;
    vector<lower=0>[N] Z;
  }
  parameters {
    real<lower=0> a;
    real<lower=0> beta;
    real<lower=0> sigma;
    simplex[2] pi;
  }

  transformed parameters {
    print(1); print(a);
    print(2); print(beta);
    print(3); print(sigma);
    print(4); print(pi[1]);
  }
  model {
    real ps[2];

    for (n in 1:N){
      ps[1] <- log(pi[1]) + normal_log(Z[n], 0, sigma);
      ps[2] <- log(pi[2]) + gamma_log(Z[n], a, beta);
      increment_log_prob(ps);
    }
  }
"
mix.dat <- list(N = N, Z = abs(dat))

mix.fit <-  stan(model_code = mix.model,
                 data = mix.dat,
                 iter = 5e3, chains = 4, warmup = 1000,
                 thin = 4, seed = 1234)
mix.e <- extract(mix.fit)
lapply(mix.e  , summary)

####################################################
####################################################
# $a
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 1.229   1.284   1.295   1.295   1.306   1.357
#
# $beta
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 1.441   1.509   1.525   1.525   1.540   1.620
#
# $sigma
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 1.048   1.069   1.074   1.074   1.079   1.099
#
# $pi
# V1               V2
# Min.   :0.4876   Min.   :0.4884
# 1st Qu.:0.4976   1st Qu.:0.4975
# Median :0.5000   Median :0.5000
# Mean   :0.5000   Mean   :0.5000
# 3rd Qu.:0.5025   3rd Qu.:0.5024
# Max.   :0.5116   Max.   :0.5124

From the results, there is a clear overestimate of both the shape and rate parameters and a pretty good estimate of sigma; however, the proportions are far from the true values.

Could you point out any coding mistakes I made, or anything else that led to these results?

I really appreciate the help and insights!!!

Best, 
Flora

Michael Betancourt

Jun 28, 2016, 6:12:17 AM
to stan-...@googlegroups.com
increment_log_prob(ps)

should be

increment_log_prob(log_sum_exp(ps[1], ps[2]))

Or you can simplify the model by having a single
probability,

real<lower=0, upper=1> pi;

and then using the log_mix function:

increment_log_prob(log_mix(pi, normal_log(Z[n], 0, sigma), gamma_log(Z[n], a, beta)));
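[Stan's log_mix(theta, lp1, lp2) is the numerically stable form of log(theta*exp(lp1) + (1-theta)*exp(lp2)); a Python check of that identity with arbitrary values:]

```python
import math

def log_sum_exp2(a, b):
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_mix(theta, lp1, lp2):
    # Mirrors Stan's log_mix: component lp1 with weight theta, lp2 with 1 - theta
    return log_sum_exp2(math.log(theta) + lp1, math.log1p(-theta) + lp2)

lp1, lp2 = -2.0, -5.0
direct = math.log(0.3 * math.exp(lp1) + 0.7 * math.exp(lp2))
print(abs(log_mix(0.3, lp1, lp2) - direct) < 1e-12)  # True
```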

Flora Xu Shu Jing

Jun 28, 2016, 11:27:07 AM
to stan-...@googlegroups.com
Thanks, Michael, for catching the error! That was really silly of me.

After changing increment_log_prob(ps); to increment_log_prob(log_sum_exp(ps)); and reducing the number of chains from 4 to 1 (just to see results faster), I ran the model and sigma just blew up... So I added a prior on sigma:
sigma ~ cauchy(0, 2.5). Both the rate and shape parameters were overestimated, while sigma was underestimated (I focus on the median).

# lapply(mix.e  , summary)
# $a
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 1.251   1.294   1.311   1.313   1.331   1.378
#
# $beta
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 1.451   1.521   1.542   1.543   1.566   1.632
#
# $sigma
# Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
# 0.0007   0.0034   0.0070   5.2070   1.7540 521.2000
#
# $pi
# V1                  V2
# Min.   :2.300e-08   Min.   :0.9935
# 1st Qu.:1.323e-04   1st Qu.:0.9975
# Median :1.128e-03   Median :0.9989
# Mean   :1.457e-03   Mean   :0.9985
# 3rd Qu.:2.531e-03   3rd Qu.:0.9999
# Max.   :6.477e-03   Max.   :1.0000

Is there any way I could improve this? I also ran an experiment making pi[1] and pi[2] known:

# lapply(mix2.e  , summary)
# $a
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 1.245   1.287   1.298   1.299   1.310   1.358
#
# $beta
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 1.463   1.519   1.536   1.537   1.554   1.618
#
# $sigma
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 0.9221  1.0870  1.1400  1.1370  1.1840  1.3850


Thanks!
Flora






Flora Xu Shu Jing

Jun 28, 2016, 1:58:48 PM
to stan-...@googlegroups.com
Corrections on the experiment (I mislabeled the mixing proportions; apologies for the confusion). The results should be

# lapply(mix2.e  , summary)
# $a
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 0.6049  0.7928  0.8861  0.9105  0.9960  2.5450
#
# $beta
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 0.841   1.830   2.397   2.538   3.032   8.855
#
# $sigma
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 1.053   1.087   1.093   1.093   1.098   1.124

Compared to the true values (a = 1, beta = 2, sigma = 1.1), this outcome looks fine. 

Yet the result from the model where pi needs to be estimated seems far from what one would expect. Have I missed something?

Thanks,
Flora
 

Michael Betancourt

Jun 28, 2016, 2:02:18 PM
to stan-...@googlegroups.com
Look at the mixing coefficient: it's pegging at 1,
reducing the model to a pure gamma. The Gaussian
component of the data biases the fit toward larger
values when modeled with the gamma, while sigma is
left entirely unconstrained by the data and hence
roams to very small and very large values. I'll
leave you to diagnose what's going on when trying
to fit pi.
