Slow convergence with a hierarchical model


Odile.S

Jun 1, 2016, 12:36:31 PM
to Stan users mailing list
Hi everyone,

I apologize in advance if my questions are too naive. I have only been using Stan for a few days.

I have a dataset which consists of repeated measurements on a group of p (p = 250) individuals. Each individual has a positive number n_i of observations (n_i is not necessarily the same for every individual). I would like to fit the following model to this dataset:

functions {
  real gamma0(real p, real t, real v, real s){
    real r;
    r <- 1/(1+((1/p)-1)*exp( -v*(s-t)/(p*(1-p)) ));
    return r;
  }
}

data {
  // int p = number of individuals
  // int K = number of observations
  // real t = time of observations (tij)
  // real y = observations (yij)
  // int group = grouping variables
  int<lower=1> p;
  int<lower=1> K;
  real t[K];
  real y[K];
  int<lower=1> group[K];
}

parameters {
  real<lower=0,upper=1> p0;
  real<lower=0,upper=100> t0;
  real v0;
  real<lower=0> sigma_eta;
  real<lower=0> sigma_tau;
  real<lower=0> sigma;
  real eta[p];
  real tau[p];
}

transformed parameters {
  real mu[K];
  for (j in 1:K) {
    mu[j] <- gamma0(p0,t0,v0,eta[group[j]]*(t[j]-t0-tau[group[j]])+t0);
  }
}

model {
  eta ~ lognormal(0,sigma_eta);
  tau ~ normal(0,sigma_tau);
  y ~ normal(mu,sigma);
}

With these definitions, I assume uniform priors on the parameters p0 and t0, and flat improper priors on the other parameters. eta and tau are the random effects of the model. group is a grouping variable (a vector of integers) which determines which elements of the data vector y belong to a given individual.
I run the analysis using:

fit <- stan(file = 'mymodel.stan', data = testdata, init = init_stan,
            chains = 1, iter = 5000,
            control = list(adapt_delta = 0.9, max_treedepth = 11),
            pars = c("p0","t0","v0","sigma_eta","sigma_tau","sigma","lp__"))

The number of iterations is chosen arbitrarily. I supply my own random initialisations even though this is not necessary. After a few tests, Stan advised me to increase adapt_delta and max_treedepth, which I did. Once the 5000 iterations are done, I run:

 summary(do.call(rbind, args = get_sampler_params(fit, inc_warmup = FALSE)),digits = 2)

and notice that the number of leapfrog steps after warmup is quite high. I believe this is why Stan takes quite some time to perform the 5000 iterations. (It might also be because my code is not well written.) I also notice that the mean accept_stat__ is quite close to 1 (is this a problem?). The results are reported below.

accept_stat__    stepsize__      treedepth__  n_leapfrog__  n_divergent__
 Min.   :0.46   Min.   :0.0029   Min.   :10   Min.   :1023   Min.   :0   
 1st Qu.:0.92   1st Qu.:0.0029   1st Qu.:10   1st Qu.:1023   1st Qu.:0   
 Median :0.97   Median :0.0029   Median :10   Median :1023   Median :0   
 Mean   :0.94   Mean   :0.0029   Mean   :10   Mean   :1023   Mean   :0   
 3rd Qu.:0.99   3rd Qu.:0.0029   3rd Qu.:10   3rd Qu.:1023   3rd Qu.:0   
 Max.   :1.00   Max.   :0.0029   Max.   :10   Max.   :1023   Max.   :0   

My question is the following: what can I change to make convergence faster / reduce the number of leapfrog steps?

Thank you for your help and your advice.

For reproducibility, I attach the dataset I used. The R steps to load it are the following:

data <- read.csv("Dataset_test_full.txt", sep = ",", col.names = c('X','Y','group'))
X <- data$X
Y <- data$Y
group <- data$group
testdata <- list(p = group[length(group)], K = length(group), t = X, y = Y, group = group)
 

Dataset_test_full.txt

Bob Carpenter

Jun 2, 2016, 12:09:13 AM
to stan-...@googlegroups.com
You're maxing out treedepth, which indicates a problem with
the model (you probably get warnings about hitting max treedepth),
and causes computational problems because you won't hit the U-turns
and will wind up devolving to a simple diffusion.

I would try simulating data from the model and seeing if you can
fit it. If there's a bad misspecification (model doesn't fit
the data well), then sampling can be a problem. It's also a good
way to help with debugging.

If you look at the posterior, you should be able to detect
(for example, with the pairs() plot restricted to a subset
of parameters) whether there's non-identifiability (very highly
correlated parameters).

I have no idea what that gamma0 function is supposed to
be doing and how it affects what's going on. By the way, you
don't need to assign to r and then return r. You can just
return the expression directly.
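That is, the same function collapses to something like:

```stan
functions {
  real gamma0(real p, real t, real v, real s) {
    // same expression as before, just returned directly
    return 1 / (1 + ((1 / p) - 1) * exp(-v * (s - t) / (p * (1 - p))));
  }
}
```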

Some general suggestions:

Non-centered parameterizations for the hierarchical parameters. For
example, instead of

> tau ~ normal(0,sigma_tau);


you can define a parameter tau_raw and set

tau <- tau_raw * sigma_tau;

and take

tau_raw ~ normal(0, 1);

You can do this for lognormals, but it's trickier.
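A sketch of what that might look like for your model (the _raw names are just illustrative; the lognormal case works by exponentiating a non-centered normal, since exp(sigma_eta * z) with z ~ normal(0, 1) has a lognormal(0, sigma_eta) distribution):

```stan
parameters {
  // ... other parameters as before, but replace eta and tau with:
  real eta_raw[p];
  real tau_raw[p];
}
transformed parameters {
  real eta[p];
  real tau[p];
  for (i in 1:p) {
    eta[i] <- exp(sigma_eta * eta_raw[i]);  // implies eta[i] ~ lognormal(0, sigma_eta)
    tau[i] <- sigma_tau * tau_raw[i];       // implies tau[i] ~ normal(0, sigma_tau)
  }
  // ... compute mu from eta and tau as before
}
model {
  eta_raw ~ normal(0, 1);
  tau_raw ~ normal(0, 1);
  y ~ normal(mu, sigma);
}
```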

I would also suggest more informative priors. But you should
look at where your fitted values go. If values are running away
into long tails, you definitely want priors.
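For example, something along these lines in the model block (the particular distributions and scales here are just placeholders to adapt to your problem, not recommendations for your data):

```stan
p0 ~ beta(2, 2);              // instead of the implicit uniform on (0,1)
t0 ~ normal(50, 20);          // weakly informative, within the (0,100) bounds
v0 ~ normal(0, 1);
sigma_eta ~ cauchy(0, 2.5);   // half-Cauchy, given the <lower=0> constraint
sigma_tau ~ cauchy(0, 5);
sigma ~ cauchy(0, 2.5);
```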

Use Stan's built-in random inits --- you shouldn't need to
specify them yourself.

- Bob

Odile.S

Jun 2, 2016, 6:04:23 AM
to Stan users mailing list
Dear Bob,

Thank you for your answer.

I apologize, I should have mentioned that the dataset attached to the original post is indeed generated from the model. The data was generated using:

p0 = 0.24, t0 = 70, v0 = 0.034, sigma_eta = 0.5, sigma_tau = 7, sigma = 0.01

Stan estimated (with 5000 iterations and flat priors) the following values:

p0 = 0.21, t0 = 68.8, v0 = 0.030, sigma_eta = 0.53, sigma_tau = 6.7, sigma = 0.099.

I guess these estimates are not so bad. From these, I deduce that it is possible to fit the model on "synthetic data". Still, I followed your advice and tried the pairs() plot for the parameters p0, t0 and v0. The point clouds are all oriented along the first diagonal, which suggests strong pairwise positive correlations. In theory, the triplet (p,t,v) which defines the function gamma0 is not unique, which would clearly cause non-identifiability. Still, if t (for example) is fixed, the triplet becomes unique. Because of the definition

gamma0(p0,t0,v0,eta[group[j]]*(t[j]-t0-tau[group[j]])+t0);

where t0 plays a particular role, I thought that, in theory, the model is indeed identifiable. Am I mistaken?
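(For reference, the call I used is along these lines, using rstan's pairs() method for stanfit objects:)

```r
pairs(fit, pars = c("p0", "t0", "v0"))
```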

I ran Stan on a subset of the dataset attached to the original post, including only the first 50 individuals (p = 50). The performance of Stan seems better (see results below) even though the pairs() plots have not really improved.

 accept_stat__    stepsize__       treedepth__    n_leapfrog__   n_divergent__
 Min.   :0.044   Min.   :0.0074   Min.   : 8.0   Min.   : 255   Min.   :0   
 1st Qu.:0.907   1st Qu.:0.0074   1st Qu.: 8.0   1st Qu.: 255   1st Qu.:0   
 Median :0.970   Median :0.0074   Median : 8.0   Median : 255   Median :0   
 Mean   :0.931   Mean   :0.0074   Mean   : 8.7   Mean   : 489   Mean   :0   
 3rd Qu.:0.993   3rd Qu.:0.0074   3rd Qu.: 9.0   3rd Qu.: 511   3rd Qu.:0   
 Max.   :1.000   Max.   :0.0074   Max.   :12.0   Max.   :4095   Max.   :0

It seems that Stan converges more slowly as the size of the dataset increases. Is this due to identifiability issues? What is a usual workaround for such issues? Using priors?

Thank you

Odile.S

Jun 5, 2016, 7:18:26 AM
to Stan users mailing list
Hello everyone, 

I would appreciate some advice regarding these problems. Can someone please help me?

Thank you !

Daniel Lee

Jun 6, 2016, 7:18:14 AM
to stan-...@googlegroups.com
Hi,

What Bob suggested is exactly what I would suggest:
1. Add tighter priors
2. Look at the non-centered reparameterization for your model. 

If you know you have an additive or multiplicative non-identifiability, then you know your posterior distribution isn't proper, so you'll need to think about your model.

If synthetic data fits fine and real data doesn't, then I'd double-check the model for misfit.

Daniel

Bob Carpenter

Jun 6, 2016, 10:30:59 AM
to stan-...@googlegroups.com
The general advice is

* if possible, reparameterize so that the models are identifiable

* if not (say, more predictors than data points in a regression),
then control the behavior with priors (and accept the fact that
it'll be relatively slow)

- Bob


> On Jun 5, 2016, at 7:09 AM, Jean-Baptiste SCHIRATTI <jean.baptis...@gmail.com> wrote:
>
> Hello,
>
> Can anyone, please, give me some advice regarding these identifiability questions?
>
> Thank you !
>
> On Thursday, June 2, 2016 at 12:04:23 PM UTC+2, Odile.S wrote:

Odile.S

Jun 6, 2016, 1:19:18 PM
to Stan users mailing list
Thank you for your advice! I will look into this and try to improve the performance of the model.