Divergence warnings and poor convergence in a hierarchical ZIP model


Zachary Marion

Aug 26, 2015, 4:10:00 PM
to Stan users mailing list
Hello, 
I am trying to model species abundance with a hierarchical zero-inflated Poisson (ZIP) model. Currently I have 4 levels: a "global" level, a regional level with 10 regions, a patch level with 10 patches within each region, and the observation level with the results of 5 transects within each patch. At the higher levels, the probability of presence is modeled as beta-distributed and the abundance as lognormal. With simulated data the results largely make sense, and a simpler three-level model (essentially excluding the transects) converges reasonably well. However, the full model has several parameters with an ESS below 10% of the iterations, and some parameters have Rhat values greater than 1.1. Additionally, if I run 4 chains with 10,000 iterations each, 19,993 iterations are flagged as divergent. The ESS for lp__ is 16, and its Rhat is 1.2. If I raise adapt_delta to 0.9, the model takes MUCH longer without any noticeable reduction in divergences, and many iterations hit the maximum treedepth. Is there anything I can do to improve this? Thanks for your help!

- Zach Marion

Model:

data {
  int<lower=1> nR;            // number of regions
  int<lower=1> nP;            // number of patches
  int<lower=1> nObs;          // number of observations
  int<lower=1> region[nObs];  // vector of region identities
  int<lower=1> patch[nObs];   // vector of patches
  int<lower=0> obs[nObs];     // vector of abundance observations
}

parameters {
//----------------Mean prob. of presence-------------------
  real<lower=0, upper=1> piG;
  vector<lower=0, upper=1>[nR] piR;
  vector<lower=0, upper=1>[nP] piP;

//---------------Beta precision parameters-----------------
  real<lower=0> kappaG;
  vector<lower=0>[nR] kappaR;
  vector<lower=0>[nP] kappaP;

//---------------Global mean abundance---------------------
  real<lower=0> lambdaG;

//----------Std-normal parameters for non-centering--------
  vector[nR] logLambdaRawR;
  vector[nP] logLambdaRawP;
  vector[nObs] logLambdaRaw;

//----------Lognormal scale parameters---------------------
  real<lower=0> sigmaG;
  vector<lower=0>[nR] sigmaR;
  vector<lower=0>[nP] sigmaP;

//-----------Individual-level prob. of presence------------
  vector<lower=0, upper=1>[nObs] eta;
}

transformed parameters {
//----------------Mean abundances--------------------------
  vector<lower=0>[nR] lambdaR;
  vector<lower=0>[nP] lambdaP;
  vector<lower=0>[nObs] lambda;

//----------------Lognormal mu's---------------------------
  real muG;
  vector[nR] muR;
  vector[nP] muP;

//-----------Conditional mean abundances-------------------
  real<lower=0> meanG;
  vector<lower=0>[nR] meanR;
  vector<lower=0>[nP] meanP;
  vector<lower=0>[nObs] meanS;

  muG <- log(lambdaG) - 0.5 * square(sigmaG);
  meanG <- lambdaG * piG;

  for (r in 1:nR) {
    lambdaR[r] <- exp(muG + sigmaG * logLambdaRawR[r]);
    muR[r] <- log(lambdaR[r]) - 0.5 * square(sigmaR[r]);
    meanR[r] <- lambdaR[r] * piR[r];
  }

  for (p in 1:nP) {
    lambdaP[p] <- exp(muR[region[p]] + sigmaR[region[p]] *
                      logLambdaRawP[p]);
    muP[p] <- log(lambdaP[p]) - 0.5 * square(sigmaP[p]);
    meanP[p] <- lambdaP[p] * piP[p];
  }

  for (n in 1:nObs) {
    lambda[n] <- exp(muP[patch[n]] + sigmaP[patch[n]] *
                     logLambdaRaw[n]);
    meanS[n] <- lambda[n] * eta[n];
  }
}

model {
  kappaG ~ cauchy(0, 2);
  kappaR ~ cauchy(0, 2);
  kappaP ~ cauchy(0, 2);

  sigmaG ~ cauchy(0, 2);
  sigmaR ~ cauchy(0, 2);
  sigmaP ~ cauchy(0, 2);

  lambdaG ~ lognormal(0, 2.5);

  logLambdaRawR ~ normal(0, 1);
  logLambdaRawP ~ normal(0, 1);
  logLambdaRaw  ~ normal(0, 1);

  piR ~ beta(piG * kappaG, (1 - piG) * kappaG);

  for (p in 1:nP) {
    piP[p] ~ beta(piR[region[p]] * kappaR[region[p]],
                  (1 - piR[region[p]]) * kappaR[region[p]]);
  }

  for (n in 1:nObs) {
    // probability of presence is drawn from a beta with
    // patch-specific parameters
    eta[n] ~ beta(piP[patch[n]] * kappaP[patch[n]],
                  (1 - piP[patch[n]]) * kappaP[patch[n]]);

    if (obs[n] == 0) {
      increment_log_prob(log_sum_exp(bernoulli_log(0, eta[n]),
        bernoulli_log(1, eta[n]) + poisson_log(obs[n], lambda[n])));
    }
    else {
      increment_log_prob(bernoulli_log(1, eta[n]) +
        poisson_log(obs[n], lambda[n]));
    }
  }
}

                                                                      

system profile:

R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

other attached packages:
[1] markdown_0.7.7  rstan_2.7.0-1   inline_0.3.14   Rcpp_0.12.0
[5] shinystan_2.0.0 shiny_0.12.1    ggplot2_1.0.1

loaded via a namespace (and not attached):
 [1] plyr_1.8.3        base64enc_0.1-3   shinyjs_0.1.0
 [4] tools_3.2.2       xts_0.9-7         digest_0.6.8
 [7] jsonlite_0.9.16   gtable_0.1.2      lattice_0.20-33
[10] yaml_2.1.13       parallel_3.2.2    proto_0.3-10
[13] gridExtra_2.0.0   stringr_1.0.0     dygraphs_0.4.5
[16] htmlwidgets_0.5   gtools_3.5.0      stats4_3.2.2
[19] grid_3.2.2        DT_0.1            R6_2.1.0
[22] reshape2_1.4.1    magrittr_1.5      threejs_0.2.1
[25] scales_0.2.5      codetools_0.2-14  htmltools_0.2.6
[28] shinythemes_1.0.1 MASS_7.3-43       mime_0.3
[31] xtable_1.7-4      colorspace_1.2-6  httpuv_1.3.3
[34] labeling_0.3      stringi_0.5-5     munsell_0.4.2
[37] zoo_1.7-12





ZIP Model LN 3.0.txt
zipData.R

Ben Goodrich

Aug 26, 2015, 4:14:03 PM
to Stan users mailing list
On Wednesday, August 26, 2015 at 4:10:00 PM UTC-4, Zachary Marion wrote:
If I up the adapt_delta to 0.9, the model takes MUCH longer without any noticeable improvement in divergence, and with many iterations hitting maximum tree depth. Is there anything I can do to optimize this? Thanks for your help!

You need to first know where the divergent and / or treedepth limited transitions are occurring in the parameter space. The pairs() plot is best suited for that.
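As a concrete sketch of what this looks like in R (hypothetical names: `fit` is assumed to be the stanfit object returned by stan(); the parameter names come from the model above):

```r
library(rstan)

# Look at a small, high-level subset of parameters at a time;
# rstan's pairs() method for stanfit objects marks divergent
# transitions in the panels, so funnels and problem regions stand out.
pairs(fit, pars = c("lambdaG", "sigmaG", "piG", "kappaG", "lp__"))

# Equivalently, on the raw draws:
draws <- extract(fit)
plot(draws$lambdaG, draws$sigmaG, pch = ".")
```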

Ben


Zachary Marion

Aug 27, 2015, 10:31:17 AM
to stan-...@googlegroups.com
Hi, thank you for the quick response! I apologize for being obtuse, but I am a bit unsure of what you are suggesting. Should I look at each parameter to see where this is occurring (i.e., all 500+ parameters)? Or are there certain ones I should focus on? I am also curious as to whether my poor mixing is the result of having so many hierarchical levels, or whether I have some model misspecification. 

Thanks again for the help!

Zach



Jan

Aug 27, 2015, 11:23:32 AM
to stan-...@googlegroups.com
You could extract the samples

fit <- stan(...)
samples <- extract(fit)

and then look at correlations between them to troubleshoot the problem, e.g. for one region:

plot(samples$muG, samples$muR[, 1])

One can also plot all variables at once using a scatter plot matrix, e.g. pairs(). But it is probably advisable to get a slightly simpler model to work first, before troubleshooting a complex multilevel model.

J

Ben Goodrich

Aug 27, 2015, 11:30:49 AM
to Stan users mailing list
On Thursday, August 27, 2015 at 10:31:17 AM UTC-4, Zachary Marion wrote:
Hi, thank you for the quick response! I apologize for being obtuse, but I am a bit unsure of what you are suggesting. Should I look at each parameter to see where this is occurring (i.e., all 500+ parameters)? Or are there certain ones I should focus on? I am also curious as to whether my poor mixing is the result of having so many hierarchical levels, or whether I have some model misspecification. 

The pairs() plot is only legible for 10 or 12 variables at a time. Definitely look at common parameters, variances / standard deviations, and lp__ . You might also look at hyperparameters and one or two of the parameters that depend on them.

Ben

Bob Carpenter

Aug 27, 2015, 1:21:00 PM
to stan-...@googlegroups.com
A common problem is the weak identification and correlation of
hierarchical parameters. If there's not a lot of data informing each
level, then using the non-centered parameterization as described in
the manual will help.

- Bob
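For reference, the non-centered pattern Bob is describing looks like this for one lognormal level; this fragment mirrors the regional level of the model above and is not a complete program:

```stan
// non-centered parameterization: instead of sampling
//   lambdaR[r] ~ lognormal(muG, sigmaG)
// directly, sample a standard normal and shift/scale it
// deterministically, which decorrelates the level from its
// hyperparameters when the data are weak
parameters {
  vector[nR] logLambdaRawR;
}
transformed parameters {
  vector<lower=0>[nR] lambdaR;
  for (r in 1:nR)
    lambdaR[r] <- exp(muG + sigmaG * logLambdaRawR[r]);
}
model {
  logLambdaRawR ~ normal(0, 1);
}
```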

Zachary Marion

Aug 27, 2015, 1:22:17 PM
to stan-...@googlegroups.com
I think I already did that for the lognormal parts of the model. I don’t know if there is a way to do it for the parameters that are beta distributed.

Bob Carpenter

Aug 27, 2015, 1:38:37 PM
to stan-...@googlegroups.com
Thanks --- didn't see the code. I don't know how to change
the beta priors.

You can vectorize some of the code, and it'd be best to create
the beta parameters for region, piR .* kappaR and (1 - piR) .* kappaR,
ahead of time and reuse them in the loop over nP. Same for the piP.

And there's a log_mix function you could use to replace that log_sum_exp.

But those will only make the code go faster --- it won't help with
mixing.

- Bob
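A sketch of both suggestions applied to the model above (fragments, not a compilable program):

```stan
// precompute the beta parameters once, instead of rebuilding the
// same products on every loop iteration
transformed parameters {
  vector<lower=0>[nP] aP;
  vector<lower=0>[nP] bP;
  aP <- piP .* kappaP;
  bP <- (1 - piP) .* kappaP;
}
model {
  // ...
  // the zero-outcome branch collapses to log_mix(), where
  // log_mix(theta, lp1, lp2)
  //   == log_sum_exp(log(theta) + lp1, log1m(theta) + lp2)
  if (obs[n] == 0)
    increment_log_prob(log_mix(eta[n],
                               poisson_log(0, lambda[n]), // present, saw a zero
                               0));                       // absent: log(1) = 0
}
```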

Ben Goodrich

Aug 27, 2015, 9:13:55 PM
to Stan users mailing list
Even if I increase adapt_delta and narrow some of the priors, there are still a ton of divergent transitions. The lambdaG-sigmaG plane is very funnelish.

Basically, Stan isn't going to be able to sample from this posterior well unless you can do some reparameterizing.

Ben
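One possible reparameterization along these lines (an illustrative sketch, not something proposed in the thread) is to move the presence probabilities from the beta scale to the logit scale, where the same non-centering trick already used for the lognormal levels applies:

```stan
// logit-normal hierarchy for presence probabilities (sketch);
// hypothetical names: logitPiG, tauR, logitPiRawR are not in the
// original model
parameters {
  real logitPiG;               // global presence prob. on the logit scale
  real<lower=0> tauR;          // between-region scale on the logit scale
  vector[nR] logitPiRawR;      // standard-normal raw deviates
}
transformed parameters {
  vector<lower=0, upper=1>[nR] piR;
  for (r in 1:nR)
    piR[r] <- inv_logit(logitPiG + tauR * logitPiRawR[r]);
}
model {
  logitPiRawR ~ normal(0, 1);
}
```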
