Handling a latent variable in stan

1739 views
Skip to first unread message

Jangho Yang

unread,
Sep 28, 2015, 9:31:47 PM9/28/15
to Stan users mailing list
Hi,

I've been playing around with a mixed Gaussian model with a latent variable and have tried to run this model in stan. Here is a brief description of the model. 

Two Gaussians are mixed with the weigh parameter \q, and the latent variable \z turns on and off each one of the Gaussians.  \z follows the Bernoulli with \q. Simply, the model is

 f(y_i| q, theta1, theta2) = q*Normal1(theta1) + (1-q)*Normal2(theta2)

and 

 f(y_i| q, z_i, theta1, theta2) = Normal1(theta1)^z_i * Normal2(theta2)^(1-z_i)

finally,

z_i ~ Bernoulli(q).

Using stan, I want to estimate theta1, theta2, q and z, among which z is a vector of 0,1 and gives how many times each one of Gaussians is chosen in the simulation. (As Swupnil kindly pointed out) This model boils down to finding the posterior: p(y | z, theta1, theta2) p(z | q) p(q) p(theta1) p(theta2) because the likelihood of y is not anymore dependent on q when it is conditioned on z. 

Here is my code, which does not work at all..

##normal 
latent<-
"
data{
  int<lower=0> N;
  real y[N];
}
parameters{
  real signal_mu;
  real<lower=0,upper=10> signal_sig;
  real noise_mu;
  real<lower=0,upper=100> noise_sig;
  real<lower=0,upper=1> q;
  int<lower=0,upper=1> z[N];

}
model{
  real ps[N];
  for(i in 1:N){
    ps[i] <- z[i]*normal_log(y[i],signal_mu,signal_sig) +
      (1-z[i])*normal_log(y[i],noise_mu,noise_sig);
  }
    increment_log_prob(log_sum_exp(ps));

    signal_sig~uniform(0,10);
    noise_sig~uniform(0,100);

    z~bernoulli(q);
}
"
N<-length(data_all$bias)
y<-data_all$bias

library(rstan)
fit<-stan(model_code = "latent",data=c("N","y"),iter=100,chain=2)
 

The following error sign keeps coming out. 

integer parameters or transformed parameters are not allowed;  found declared type int, parameter name=z
Problem with declaration.

Then I checked out the manual and noticed that stan does not support sampling discrete parameters and I have to marginalize out those parameters. And I tried to derive the likelihood function marginalized on z

p(y | theta1, theta2, q) =  sum_z (p(z,y | theta1, theta2, q)

                                    =  sum_z (p(z | q)*p(y | theta1, theta2, z, q)

                                    =  sum_z (p(z | q)*p(y | theta1, theta2, z)
                                   
                                    =  sum_z Bernoulli (z_i, q) * Normal1(theta1)^z_i * Normal2(theta2)^(1-z_i)

But I do not know how to code this in stan mainly because I still feel confused with the latent variable z, especially where and how to declare it. Could you kindly tell me how to make my code work? Any advice would be greatly appreciated! 

Jangho

Bob Carpenter

unread,
Sep 29, 2015, 5:06:16 PM9/29/15
to stan-...@googlegroups.com
That's not right. Check out the mixtures chapter of the manual
and work through what log_sum_exp is doing.

The correct form for a mixture is this:

for (n in 1:N) {
real lp1;
real lp2;
lp1 <- bernoulli_log(1, q) + normal_log(y[i], mu[1], sigma[1]);
lp2 <- bernoulli_log(0, q) + normal_log(y[i], mu[2], sigma[2]);
increment_log_prob(log_sum_exp(lp1,lp2));
}

or

for (n in 1:N)
increment_log_prob(log_mix(q,
normal_log(y[i], mu[1], sigma[1]),
normal_log(y[i], mu[2], sigma[2])));

- Bob


>
> for(i in 1:N){

> ps[1] <- z[i]*normal_log(y[i],signal_mu,signal_sig); +
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> To post to this group, send email to stan-...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Jangho Yang

unread,
Sep 30, 2015, 1:11:21 PM9/30/15
to stan-...@googlegroups.com
Thanks Bob for your reply. Actually, I did try the second form before and it worked fine. The problem is that I want to know not only the mixture weight \q but also the latent variable \z for each data point (in each iteration). In the above mixture model, one of two Gaussians is selected according to the z_i and I want to see whether 1 or 0 has been assigned to each data point. The primary reason I am doing this is to remove outliers (noise points) in my data. If a certain data point has many 0s, then it means this point comes from the noise distribution and I can remove it based on some criterion such as 95% of zeros in the iterations. (In the actual model, the signal distribution is the exponential normal, and the noise distribution is the normal with a large standard deviation.)

I wanted to code this up in stan but it is not working as you can see above. It would be appreciated, if you could give some advice! 

JH



You received this message because you are subscribed to a topic in the Google Groups "Stan users mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/Wkeitft_J7c/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stan-users+...@googlegroups.com.

Michael Betancourt

unread,
Sep 30, 2015, 2:34:06 PM9/30/15
to stan-...@googlegroups.com
There is no “true latent variable”.  Because you don’t measure the latent variable
directly you’ll always be uncertain about it and hence have some mixture of 0 and 1,
which is what q tells you directly.  If you want to remove outliers you can now do
something well posed, like construct a loss function and choose the option that
minimizes expected loss.  Or just take z = 1 if  q > 0.5 and z = 0 otherwise.

Jangho Yang

unread,
Oct 1, 2015, 5:34:38 PM10/1/15
to stan-...@googlegroups.com
Thank you so much for your reply. Reading your email, I just realized that the q in my simulation should have been estimated for each data point. I ran it again and I was able to use the method you suggested. It worked perfectly. Thanks again. I really appreciate it. 

JH

Mikey T

unread,
Nov 1, 2016, 2:22:02 PM11/1/16
to Stan users mailing list
Hi all,
I'm trying to properly code a marginalized latent discrete parameter and I'm having trouble. I know that I am writing the code improperly, but I'm not sure what is wrong and wondering if someone can please help. The model is as follows:

Process model:

logit(psi[i]) = mu + X*Beta


Observation model:

z[i] ~ Bernoulli(psi[i]*rho + (1-psi[i])*(1-phi))


Data model:

y[i] ~ Bernoulli(z[i]*pi)


I simulated data using the following code:
set.seed(135)
n <- 1000
mu <- 0.1 # background infection rate
rho <- 0.90 # detection probability
phi <- 0.95 # 1-probabiity of false detection (probbaility not detected when not present)
pi <- 0.2
covariates <- replicate(3, rnorm(n, 0, 1))
colnames(covariates) <- c('X1', 'X2', 'X3')
X <- cbind(covariates)
coefs <- c(1, -0.5, 0.3)
resp <- X %*% coefs + mu
inv.logit <- function(x){
  exp(x)/(exp(x)+1)
}
psi <- inv.logit(resp)
z <- rbinom(n, 1, psi*rho + (1-psi)*(1-phi))
y <- rbinom(n, 1, z*pi)

Here is the stan model:
# specify model in stan
mod7 <- '
  data{
    int<lower=1> N;
    int y[N];
    int<lower=1> K;
    matrix[N,K] X;
  }
parameters{
  vector[K] beta;
  real<lower=0,upper=1> mu;
  real<lower=0,upper=1> rho;
  real<lower=0,upper=1> phi;
  real<lower=0,upper=1> pi;
}
transformed parameters{
  vector[N] Xbeta;
  vector[N] psi;
  vector[N] omega;
 
  Xbeta <- X*beta;
 
  for(n in 1:N){
    psi[n] <- inv_logit(mu + Xbeta[n]);
  }
}
model{
 
  // priors - note strong priors just to see if I can get model to fit
  beta ~ normal(0,1);
  rho ~ beta(31.5, 3.5);
  phi ~ beta(10, 1.47);
  mu ~ beta(3.5, 31.5);
 
  for(n in 1:N){
    real z[2];
    if (y[n] > 0){
      increment_log_prob(bernoulli_log(y[n], psi[n]*rho + (1-psi[n])*(1-phi)));
      z[1] <- bernoulli_log(1, psi[n]*rho);// true presence
      z[2] <- bernoulli_log(0, (1-psi[n])*(1-phi)); // false positive
      increment_log_prob(log_sum_exp(z)); 
    } else {
      z[1] <- bernoulli_log(1, psi[n]*(1-rho)); // false negative
      z[2] <- bernoulli_log(0, (1-psi[n])*phi); // true negative
      increment_log_prob(log_sum_exp(z));
    }
    increment_log_prob(bernoulli_log(y[n], z[1]*pi));
  }
}'


mod7c <- stan_model(model_code=mod7)
mod7.fit <- sampling(mod7c, dat, iter=1000, chains=4, thin=1)

I get several errors including:
"Exception thrown at line 64: stan::math::bernoulli_log: Probability parameter is -1.64453, but must be between (0, 1)" 

I know that I'm improperly marginalizing the parameter, but I don't know how to fix it. I've played with it a lot, but I haven't been successful. I'd appreciate any help you all can provide.
Thank you,
Mikey

Bob Carpenter

unread,
Nov 1, 2016, 6:27:47 PM11/1/16
to stan-...@googlegroups.com

> On Nov 1, 2016, at 2:22 PM, Mikey T <tab...@gmail.com> wrote:
>
> Hi all,
> I'm trying to properly code a marginalized latent discrete parameter and I'm having trouble. I know that I am writing the code improperly, but I'm not sure what is wrong and wondering if someone can please help. The model is as follows:
> Process model:
>
> logit(psi[i]) = mu + X*Beta
>
>
>
> Observation model:
>
> z[i] ~ Bernoulli(psi[i]*rho + (1-psi[i])*(1-phi))
>
>
>
> Data model:
>
> y[i] ~ Bernoulli(z[i]*pi)


The trick is to write the joint density out then just turn the
crank to marginalize, then just code it up directly in Stan. First,
you have the joint distribution of the latent z and observed y based
on parameters

p(z, y | mu, X, beta, rho, psi, phi)

= PRODUCT_i p(z[i], y[i] | ...)

and then you can work on the inner term:

p(z[i], y[i] | ...)

= Bernoulli(z[i] | psi[i]*rho + (1-psi[i])*(1-phi))
* Bernoulli(y[i] | z[i] * pi)

where

psi[i] = inv_logit(mu + X * Beta); // sure that's not X[i] * Beta?

and then marginalize out the z[i]:

= SUM_{k in 0:1}
Bernoulli(k | psi[i]*rho + (1-psi[i])*(1-phi))
* Bernoulli(y[i] | k * pi)

Then put it on the log scale:

log(p(z[i], y[i] | ...)

= log(SUM_{k in 0:1}
Bernoulli(k | psi[i]*rho + (1-psi[i])*(1-phi))
* Bernoulli(y[i] | k * pi)

= log_sum_exp_{k in 0:1}
log Bernoulli(k | psi[i]*rho + (1-psi[i])*(1-phi))
+ log Bernoulli(y[i] | k * pi)

and you can just code that up directly using log_sum_exp
in Stan:

psi = inv_logit
log_sum_exp(bernoulli_lpmf(0 | psi[i] * rho + (1-psi[i]) * (1-phi))
+ bernoulli_lpmf(y[i] | 0),
bernoulli_lpmf(1 | psi[i] * rho + (1-psi[i]) * (1-phi))
+ bernoulli_lpmf(y[i] | pi));

You can also look at the zero-inflation and hurdle models
in the manual, which are similar.

- Bob

Mikey T

unread,
Nov 2, 2016, 10:15:06 AM11/2/16
to Stan users mailing list
Hi Bob,
Thanks a lot for your help with this. I really appreciate your clear and thorough explanation. I also appreciate all of your work on the software and manuals.
Thank you,
Mikey

Bob Carpenter

unread,
Nov 2, 2016, 3:48:30 PM11/2/16
to stan-...@googlegroups.com

> On Nov 2, 2016, at 10:15 AM, Mikey T <tab...@gmail.com> wrote:
>
> Hi Bob,
> Thanks a lot for your help with this. I really appreciate your clear and thorough explanation.

You're welcome, but I missed a step. You probably want to
branch this on y[i] == 0 vs. y[i] == 1,
because the first term is -infinity when y[i] == 1.

real psi;
real alpha;
psi = inv_logit(mu + X[i] * Beta);
alpha = psi * rho + (1 - psi) * (1 - phi);
if (y[i] == 1)
target += bernoulli_lpmf(1 | alpha) + bernoulli_lpmf(1 | pi);
else
target += log_sum_exp(bernoulli_lpmf(0 | alpha),
bernoulli_lpmf(1 | alpha)
+ bernoulli_lpmf(0 | pi));

It'll avoid NaN values in derivatives (not sure if they'll pop
up with a -infinity term and then an exp).

It'd be more arithmetically stable if you stayed on the logit scale
for alpha and used bernoulli_logit_lpmf.

You'll want to double-check the algebra, too.

- Bob

Mikey T

unread,
Nov 7, 2016, 11:42:57 AM11/7/16
to Stan users mailing list
Hi Bob,
Thanks again for your help. I have run both of these suggestions and tried tinkering with the code slightly, but I am not able to get the chains to converge. If I put informed priors on the parameters, some of the chains correctly estimate while others are estimating a completely different area of parameter space. However, the model works well if I remove the last two components of the logit function [i.e., (1-psi[i])*(1-phi)].
Also, I'm wondering if the code you recommended last should be modified so that the probability of each condition is explicitly expressed in the conditional. I'm pasting this code here, but it did not work either. 

 // separate out detection components?
  for(n in 1:N){
   if(y[n] ==1){
    target += log_sum_exp(bernoulli_lpmf(1|psi[n]*rho),
                          bernoulli_lpmf(0|(1-psi[n])*(1-phi))
                          + bernoulli_lpmf(1|pi));
   } else{
    target += log_sum_exp(bernoulli_lpmf(0|(1-psi[n])*phi),
                          bernoulli_lpmf(1|psi[n]*(1-rho))
                         + bernoulli_lpmf(0|pi));
   }
  }

I appreciate your help.
I have attached all R code below in case this helps.

Thank you,
Mikey

# r code to simulate data

set.seed(135)
n <- 1000
mu <- 0.1 # background infection rate
rho <- 0.90 # detection probability
phi <- 0.95 # 1-probabiity of false detection (probbaility not detected when not present)
pi <- 0.2
covariates <- replicate(3, rnorm(n, 0, .5))

colnames(covariates) <- c('X1', 'X2', 'X3')
X <- cbind(covariates)
coefs <- c(1, -0.5, 0.3)
resp <- X %*% coefs + mu
inv.logit <- function(x){
  exp(x)/(exp(x)+1)
}
psi <- inv.logit(resp)
#z <- rbinom(n, 1, psi*rho) # ignoring false detection here. model works when I do this

z <- rbinom(n, 1, psi*rho + (1-psi)*(1-phi))
y <- rbinom(n, 1, z*pi)

# specify model in stan
model <- '

  data{
    int<lower=1> N;
    int y[N];
    int<lower=1> K;
    matrix[N,K] X;
  }
parameters{
  vector[K] beta;
  real<lower=0,upper=1> mu;
  real<lower=0,upper=1> rho;
  real<lower=0,upper=1> phi;
  real<lower=0,upper=1> pi;
}
transformed parameters{
  vector[N] Xbeta;
  vector[N] psi;
  vector[N] omega;
 
  Xbeta = X*beta;
 
  for(n in 1:N){
    psi[n] = inv_logit(mu + Xbeta[n]);
    omega[n] = psi[n]*rho +(1-psi[n])*(1-phi);
    //omega[n] = psi[n]*rho;
    //omega[n] = inv_logit(psi[n]*rho +(1-psi[n])*(1-phi));
  }
}
model{
 
  // priors
  beta ~ normal(0,1);
  //beta[1] ~ normal(1,1);
  //beta[2] ~ normal(-.5, 1);
  //beta[3] ~ normal(0.3, 1);
  rho ~ beta(3.92, 1.32); //beta(31.5, 3.5);  //
  phi ~ beta(3.46, 1.13); // beta(17.1, 0.9); //
  mu ~ beta(1.32, 3.92); //beta(3.5, 31.5); //
  pi ~ beta(3.28, 10.12); //beta(12.6, 50.4); //

  // separate out detection components
  //for(n in 1:N){
  // if(y[n] ==1){
  //  target += log_sum_exp(bernoulli_lpmf(1|psi[n]*rho),
  //                        bernoulli_lpmf(0|(1-psi[n])*(1-phi))
  //                        + bernoulli_lpmf(1|pi));
  // } else{
  //  target += log_sum_exp(bernoulli_lpmf(0|(1-psi[n])*phi),
  //                        bernoulli_lpmf(1|psi[n]*(1-rho))
  //                       + bernoulli_lpmf(0|pi));
  // }
  //}

  // separating out conditional
  //for(n in 1:N){
   //if(y[n] == 1){
    //target += bernoulli_lpmf(y[n]|omega[n]) + bernoulli_lpmf(y[n]|pi);
   //} else{
    //target += log_sum_exp(bernoulli_lpmf(0|omega[n]),
    //                      bernoulli_lpmf(1|omega[n])
    //                      + bernoulli_lpmf(y[n]|pi));
   //}
  //}
 

  // one option
  for(n in 1:N){
   target += log_sum_exp(bernoulli_lpmf(0|omega[n]) +
             bernoulli_lpmf(y[n]|0),
             bernoulli_lpmf(1|omega[n])
             + bernoulli_lpmf(y[n]|pi));
  }
}'

Mikey T

unread,
Feb 16, 2017, 1:52:54 PM2/16/17
to Stan users mailing list
Hi all,
I changed my model structure and I'm once again having trouble with the latent discrete parameter (Z). I think I marginalized it correctly, but I am having trouble getting the chains to converge. Here is the new model structure:

y_i ~ bernoulli(Z_i * rho + (1-Z_i)*(1-phi)

Z_i ~ bernoulli(psi_i)

logit(psi_i) = X_i*B

I marginalized the latent parameter (Z), so that the code looks like this (in the model block):

for(n in 1:N){
    target += log_sum_exp(bernoulli_lpmf(1|psi[n]) + bernoulli_lpmf(y[n]|rho),
                          bernoulli_lpmf(0|psi[n]) + bernoulli_lpmf(y[n]|(1-phi)));
  }

The full code including code for simulation is below. (I have the same problem when I don't use those hyperpriors and just estimate rho and phi with the priors beta(1,1).)

Thank you in advance for your help,
Mikey

#- simulate some data
set.seed(135)
n <- 1000

rho <- 0.90 # detection probability
phi <- 0.95 # 1-probabiity of false detection (probbaility not detected when not present)
pi <- 0.2

covariates <- replicate(3, rnorm(n, 0, .5))
colnames(covariates) <- c('X1', 'X2', 'X3')
X <- cbind(rep(1, length(covariates[,1])),covariates)
coefs <- c(1, -1, -0.5, 0.3)

resp <- X %*% coefs
inv.logit <- function(x){
  exp(x)/(exp(x)+1)
}
psi <- inv.logit(resp)
z <- rbinom(n, 1, psi)
y <- rbinom(n, 1, z*rho +(1-z)*(1-phi))



# specify model in stan
mod <-'

data{
    int<lower=1> N;
    int y[N];
    int<lower=1> K;
    matrix[N,K] X;
  }
parameters{
  vector[K] beta;
  real<lower=0,upper=1> rho;
  real<lower=0,upper=1> phi;
  real<lower=0> sigma[K];
  //int<lower=0> lp2[N];
  real<lower=0,upper=1> mu_rho;
  real<lower=0> sigma_rho;
  real<lower=0,upper=1> mu_phi;
  real<lower=0> sigma_phi;

}
transformed parameters{
  vector[N] Xbeta;
  vector[N] psi;

  Xbeta = X*beta;

  for(n in 1:N){
    psi[n] = inv_logit(Xbeta[n]);
  }
}
model{
  // hyperpriors
  mu_rho ~ beta(42.573, 5.619222);
  sigma_rho ~ gamma(10.63844, 0.2);
  mu_phi ~ beta(42.573, 5.619222);
  sigma_phi ~ gamma(10.63844, 0.2);

  // priors
  rho ~ beta(mu_rho*sigma_rho, (1-mu_rho)*sigma_rho);
  phi ~ beta(mu_phi*sigma_phi, (1-mu_phi)*sigma_phi);
  //sigma ~ cauchy(0,2.5);
  //beta ~ normal(0,sigma);
  beta ~ normal(0,1);
 

  //**** Here is the marginalized parameter
  for(n in 1:N){
    target += log_sum_exp(bernoulli_lpmf(1|psi[n]) + bernoulli_lpmf(y[n]|rho),
                          bernoulli_lpmf(0|psi[n]) + bernoulli_lpmf(y[n]|(1-phi)));
  }
}
generated quantities{
  real<lower=0> mean_psi;
  vector[N] log_lik;
  real<lower=0> pi[N];
  real<lower=0> mean_pi;

  mean_psi = sum(psi)/N;
 
  // calculating log likelihood to get waic
  for(n in 1:N){
    log_lik[n] = bernoulli_lpmf(y[n]|psi[n]);
    pi[n] = psi[n]*(1-rho) + psi[n]*phi;
  }
 
  mean_pi = mean(pi);
}'


modc <- stan_model(model_code=mod)

Bob Carpenter

unread,
Feb 16, 2017, 2:36:13 PM2/16/17
to stan-...@googlegroups.com

> On Feb 16, 2017, at 1:52 PM, Mikey T <tab...@gmail.com> wrote:
>
> Hi all,
> I changed my model structure and I'm once again having trouble with the latent discrete parameter (Z). I think I marginalized it correctly, but I am having trouble getting the chains to converge. Here is the new model structure:
>
> y_i ~ bernoulli(Z_i * rho + (1-Z_i)*(1-phi)
>
> Z_i ~ bernoulli(psi_i)
>
> logit(psi_i) = X_i*B
>
> I marginalized the latent parameter (Z), so that the code looks like this (in the model block):
>
> for(n in 1:N){
> target += log_sum_exp(bernoulli_lpmf(1|psi[n]) + bernoulli_lpmf(y[n]|rho),
> bernoulli_lpmf(0|psi[n]) + bernoulli_lpmf(y[n]|(1-phi)));
> }

That looks OK given that you define the equivalent of

psi = inv_logit(X * B);

(It's just shorter and more efficient written as above assuming X is
a matrix.)

It's more efficient to just the bernoulli_logit directly on X[n] * B (saving
the X[n] * B calculation to be reused).

If you want to do the inv_logit yourself, then you can just use

target += log_mix(psi[n], bernoulli_lpmf(y[n] | rho),
bernoulli_lpmf(y[n] | 1 - phi));

Your priors for the sigmas might be causing problems

>
> sigma_rho ~ gamma(10.63844, 0.2);
>
> sigma_phi ~ gamma(10.63844, 0.2);

because they're not on the unit scale (mean 0, variance 1) but
instead on a roughly (mean 11, variance 200) scale.

But it's not extreme enough it should cause the fitting to fail.

What actually happens when you run it?

You also don't need a beta(1, 1) prior for a probability --- that's
just uniform, and uniform is the default. But again, that's just
for efficiency.

- Bob

Mikey T

unread,
Feb 16, 2017, 3:07:27 PM2/16/17
to Stan users mailing list
Thank you for your quick response. When I run the model, the mean estimates for all of the parameters are good, but there are really large credible intervals. Looking at the traceplots (attached), the chains don't appear to be converging properly. Also, the R_hats and n_eff are off what they should be.

Just to be clear, are you suggesting that I replace:

   target += log_sum_exp(bernoulli_lpmf(1|psi[n]) + bernoulli_lpmf(y[n]|rho),
                                         bernoulli_lpmf(0|psi[n]) + bernoulli_lpmf(y[n]|(1-phi)));
with:

  target += log_mix(psi[n], bernoulli_lpmf(y[n] | rho),
                            bernoulli_lpmf(y[n] | 1 - phi));
I'm not sure if these are identical? In the attached traceplot I did make this substitution and I used gamma(1,0.001) for the priors on sigmas.
trace.pdf

Mikey T

unread,
Feb 16, 2017, 3:27:19 PM2/16/17
to Stan users mailing list
I changed the priors for sigmas to gamma(2,0.0001), as I think this is more appropriate. As I mentioned, the estimates are getting close to proper values, but the n_eff is really low and Rhat is higher than it should be. Also, I get the error message that there are many divergent pairs. I attached the traceplot from using this prior.
Thank you for your help!
trace.pdf

Bob Carpenter

unread,
Feb 22, 2017, 1:23:55 PM2/22/17
to stan-...@googlegroups.com
Noooo! You don't want those super-fat gamma priors.
Andrew Gelman has written a bunch of papers on this and goes over
it in BDA.

If you expect really big values for sigma, you might want to
rescale the problem.

- Bob
> <trace.pdf>

Mikey T

unread,
May 18, 2017, 3:53:30 PM5/18/17
to Stan users mailing list
Hi all,
I'm working on another model now that has latent discrete parameters (this time they are unbounded, which makes it more complicated). I'll present with just the most simple model structure. It is an open population mixture model where I'm looking N (true population size) as a function of Survival (S) and gains (G), with n_it observations. I get the error message: "Rejecting initial value: Error evaluating log probability at initial value". I tried specifying initial values for omega and gamma, but maybe I am missing something. 

S_it ~ Binomial(N_it-1, omega)

G_it ~ Binomial(N_it-1, gamma)

N_it = S_it + G_it

n_it ~ Poisson(N_it)


Here is my Stan code:

'
data{
  int<lower=0> N_cam;    // number cameras (i)
  int<lower=0> N_t;      // number of occasions (t)
  int<lower=0> obs[N_cam, N_t];   // observations (nit)
  int<lower=1> K;        // upper bound of population size could allow this to change at each time?
}
transformed data{
  int<lower=0> max_obs[N_cam];

  for(i in 1:N_cam){
    max_obs[i] = max(obs[i]);
  }
}
parameters{
  real<lower=0,upper=1> gamma;                 // arrival rate
  real<lower=0,upper=1> omega;                 // survival probability
  //real<lower=0,upper=1> pi;                    // probability of detecting same individual more than once
  real<lower=0> sigma;
}
transformed parameters{

}
model{
  // specify priors
  gamma ~ beta(1,1);
  omega ~ beta(1,1);

  for(i in 1:N_cam){
   for(t in 2:N_t){

   int S[i,t];
   int G[i,t];
   int N;
   vector[K-max_obs[i] + 1] lp;

   // try specifying values at timestep 1
   S[i,1] = 10;
   G[i,1] = 2;

    // this is the part of the code that I know is incorrect, but Im not sure how to do it properly
    for(j in 1:(K-max_obs[i] +1)){ // or for (j in 1:K)
          lp[j] = binomial_lpmf(S[i,t]|(G[i,(t-1)] + S[i,(t-1)]), omega)
                + binomial_lpmf(G[i,t]|(G[i,(t-1)] + S[i,(t-1)]), gamma);
         target += log_sum_exp(lp);
         N = S[i,t] + G[i,t];
         sigma ~ gamma(2,1);
         //obs[i,t] ~ neg_binomial(-pow(mu,2)/(mu-sigma), mu/(mu-sigma));
         obs[i,t] ~ poisson(N);
      }

    }   
  }

}',


# And here is the code to simulate data

mean_pop <- 20
ncam <- 10
Nt <- 5 # might need to make this specific to each camera
gamma <- 0.1 # arrival rate
omega <- 0.8 # survival rate
pi <- 0.2 # pr detecting more than once
S <- matrix(NA, ncam, Nt)
G <- matrix(NA, ncam, Nt)
N_tot <- matrix(NA, ncam, Nt)
N1 <- rpois(ncam, mean_pop)
N_tot[,1] <- N1
for(i in 1:ncam){
  for(t in 2:Nt){
    S[i,t] <- rbinom(1,N_tot[,t-1],(omega*(1-pi)))
    G[i,t] <- rbinom(1,N_tot[,t-1], gamma)
    N_tot[i,t] <- rpois(1,S[i,t] + G[i,t])
  }
}


Thank you in advance for your help,

Mikey


Mikey T

unread,
May 18, 2017, 3:53:57 PM5/18/17
to Stan users mailing list

Thank you in advance for your help,

Mikey


On Wednesday, February 22, 2017 at 11:23:55 AM UTC-7, Bob Carpenter wrote:

Mikey T

unread,
May 18, 2017, 3:56:13 PM5/18/17
to Stan users mailing list
Sorry, Google cut off the actual model structure:
On Thursday, May 18, 2017 at 1:53:57 PM UTC-6, Mikey T wrote:
Hi all,
I'm working on another model now that has latent discrete parameters (this time they are unbounded, which makes it more complicated). I'll present with just the most simple model structure. It is an open population mixture model where I'm looking N (true population size) as a function of Survival (S) and gains (G), with n_it observations. I get the error message: "Rejecting initial value: Error evaluating log probability at initial value". I tried specifying initial values for omega and gamma, but maybe I am missing something. 

<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed

Bob Carpenter

unread,
May 18, 2017, 4:24:44 PM5/18/17
to stan-...@googlegroups.com
Stan can't handle latent discrete unbounded parameters unless you have an analytic marginalization.
The variables you used in your math didn't match your code, so I couldn't follow where you had
the latent discrete parameter.

If you can't get it to initialize, it means you don't have a finite log density at initial values.
That's usually because you've given something an illegal argument somewhere. If you're not getting
error messages, I think that's a bug in 2.15 RStan that shut off the messages and we'll be putting
out a patch release soon. Sorry about that.

- Bob

Mikey T

unread,
May 19, 2017, 11:00:28 AM5/19/17
to Stan users mailing list
Thank you Bob. I updated the math so that the symbols match the code. I used a maximum bound on population size (K) so that N (the latent discrete parameter) is bounded. The trouble is that S and G are also discrete and their sum makes up N.


S_it ~ Binomial(N_it-1, omega)

G_it ~ Binomial(N_it-1, gamma)

N_it = S_it + G_it

obs_it ~ Poisson(N_it)

         //sigma ~ gamma(2,1);

         //obs[i,t] ~ neg_binomial(-pow(mu,2)/(mu-sigma), mu/(mu-sigma));
         obs[i,t] ~ poisson(N);
      }

    }   
  }

}'

 

# And here is the R code to simulate data

mean_pop <- 20
ncam <- 10
Nt <- 5 # might need to make this specific to each camera
gamma <- 0.1 # arrival rate
omega <- 0.8 # survival rate
pi <- 0.2 # pr detecting more than once
S <- matrix(NA, ncam, Nt)
G <- matrix(NA, ncam, Nt)
N_tot <- matrix(NA, ncam, Nt)
N1 <- rpois(ncam, mean_pop)
N_tot[,1] <- N1
for(i in 1:ncam){
  for(t in 2:Nt){
    S[i,t] <- rbinom(1,N_tot[,t-1],(omega*(1-pi)))
    G[i,t] <- rbinom(1,N_tot[,t-1], gamma)
    N_tot[i,t] <- rpois(1,S[i,t] + G[i,t])
  }
}

data <- list(N_cam=ncam, N_t=Nt, obs=N_tot, K=1000)

Thank you for any help you can provide,
Mikey

Bob Carpenter

unread,
May 20, 2017, 3:16:45 PM5/20/17
to stan-...@googlegroups.com



> On May 19, 2017, at 11:00 AM, Mikey T <tab...@gmail.com> wrote:
>
> Thank you Bob. I updated the math so that the symbols match the code. I used a maximum bound on population size (K) so that N (the latent discrete parameter) is bounded. The trouble is that S and G are also discrete and their sum makes up N.
>
> S_it ~ Binomial(N_it-1, omega)
>
> G_it ~ Binomial(N_it-1, gamma)
>
> N_it = S_it + G_it

>
> obs_it ~ Poisson(N_it)

I still don't quite get it. I sorted out from obs_it that
this is just a not-quite-TeX not-quite code way to write obs[i,t]
But then I don't see any indexed N in your code, just an N_t.
This gets really confusing when N_ sometimes is used for naming
and sometimes for subscripting. Just remember it's *a lot* harder
for someone to read math than it is to write it.

Is N_it-1 supposed to be N[i, t-1]. And only obs[i,t] is observed
among N, S, and G?

If so, you can't code this model in Stan---it's combinatorially
intractable and we don't let you code discrete parameters.

- Bob

Mikey T

unread,
May 22, 2017, 8:02:41 AM5/22/17
to Stan users mailing list
Hi Bob,
Yes-you are correct about the improper way in which I wrote out the math (I'm not sure why I wrote it this way). I understand and thank you for your help. I'll work on building the model differently.
Thank you,
Mikey
Reply all
Reply to author
Forward
0 new messages