Tree depth and Leapfrog


Ben Lambert

Jul 13, 2015, 9:35:10 PM
to stan-...@googlegroups.com
Hi,

I am fitting a model that seems to have an ok Rhat for all parameters. I have recently moved to a more natural parameterisation which has resulted in very few n_divergent iterations, but at the expense of tree depth being greater than 10; almost always 11 in fact. Correspondingly, n_leapfrog is now often much greater than 1000. 

My questions here are threefold:

1. What are the problems with treedepth being greater than 10 here? Does this mean that NUTS is terminating prematurely? What are the consequences of this for my posterior densities? I am trying to get to grips with the terminology here, and the concrete effects on my inferences of failing the specific diagnostics.

2. How should I deal with this? I assume that increasing max_treedepth will do the job, but is there anything else I need to do?

3. There is a trade off between parameterisations that have no divergent iterations, with large tree sizes, and those with many divergent iterations, but small tree sizes. I can understand why this trade off would come about when I vary the parameter adapt_delta, or stepsize, but am less sure why changing parameterisations would have this effect. In my case, I am moving from:

neg_binomial_2(mu,kappa) to neg_binomial_2(mu,1/phi)

To me, the second case is somewhat easier to cope with using priors for my data, since the bulk is not over-dispersed; meaning that kappa = 1/phi -> infinity, or alternatively phi -> 0. In the first case, I am using a lognormal density for kappa. For the second, I am using a gamma density for phi.
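
To make the comparison concrete, here is a stripped-down sketch of the two options (the priors shown and the names y and mu are just illustrative placeholders, not my actual model):

parameters {
  real<lower=0> kappa;   # direct dispersion: kappa -> infinity means no over-dispersion
  real<lower=0> phi;     # reciprocal: kappa = 1/phi, so phi -> 0 means no over-dispersion
}
model {
  # Option 1: lognormal prior directly on kappa
  kappa ~ lognormal(0, 1);
  # y ~ neg_binomial_2(mu, kappa);

  # Option 2: gamma prior on phi = 1/kappa
  phi ~ gamma(2, 0.1);
  # y ~ neg_binomial_2(mu, 1 / phi);
}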

Do you know why I may be seeing this trade off, and am I likely going about this the correct way?

My apologies for the verbosity of this message!

Best,

Ben

Bob Carpenter

Jul 13, 2015, 10:07:53 PM
to stan-...@googlegroups.com

> On Jul 13, 2015, at 9:35 PM, Ben Lambert <ben.c....@googlemail.com> wrote:
>
> Hi,
>
> I am fitting a model that seems to have an ok Rhat for all parameters. I have recently moved to a more natural parameterisation which has resulted in very few n_divergent iterations, but at the expense of tree depth being greater than 10; almost always 11 in fact. Correspondingly, n_leapfrog is now often much greater than 1000.
>
> My questions here are threefold:
>
> 1. What are the problems with treedepth being greater than 10 here? Does this mean that NUTS is terminating prematurely?

Yes, if it bumps up against the limit, NUTS is stopping before
it hits a U-turn.

> What are the consequences of this for my posterior densities?

HMC becomes more of a random walk.

> I am trying to get to grips with the terminology here, and the concrete effects on my inferences of failing the specific diagnostics.
>
> 2. How should I deal with this? I assume that increasing max_treedepth will do the job, but is there anything else I need to do?

You should increase max_treedepth until you no longer butt up
against the limit.

It may also help to rescale parameters (usually by rescaling
predictors) and/or try to change parameterizations so there's not
so much posterior curvature (which is what causes the need for small
step sizes for a first-order approximation with gradients).
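
For example, here's a minimal sketch of what I mean by rescaling a predictor (the names N, x, and beta_std are illustrative, not from your model):

data {
  int<lower=1> N;
  vector[N] x;                      # raw predictor on its natural scale
}
transformed data {
  vector[N] x_std;
  x_std <- (x - mean(x)) / sd(x);   # standardized predictor
}
parameters {
  real beta_std;                    # coefficient on the standardized scale
}
model {
  beta_std ~ normal(0, 1);          # a unit-scale prior is now reasonable
  # ... likelihood written in terms of x_std * beta_std ...
}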

>
> 3. There is a trade off between parameterisations that have no divergent iterations, with large tree sizes, and those with many divergent iterations, but small tree sizes.

The problem with the divergent iterations is that they can bias the sampler;
rather than reducing to a random walk, which at least has the right stationary
properties, the divergences are due to arithmetic failures near boundaries that
might be important for estimates.

It's not such a big deal during warmup, though divergences there often
force the initial stepsize down.

> I can understand why this trade off would come about when I vary the parameter adapt_delta, or stepsize, but am less sure why changing parameterisations would have this effect. In my case, I am moving from:
>
> neg_binomial_2(mu,kappa) to neg_binomial_2(mu,1/phi)
>
> To me, the second case is somewhat easier to cope with using priors for my data, since the bulk is not over-dispersed; meaning that kappa = 1/phi -> infinity, or alternatively phi -> 0. In the first case, I am using a lognormal density for kappa. For the second, I am using a gamma density for phi.

What do the posteriors for phi (or 1 / phi) look like? Is it trying to get
close to 0?

How much do you care about getting really close to zero? Because
otherwise the direct kappa parameterization seems easier --- once kappa
gets fairly large, the overdispersion is almost zero, and in those cases
you probably don't care exactly what it is.

> Do you know why I may be seeing this trade off, and am I likely going about this the correct way?

It would help to see the actual model to debug the parameterization.

> My apologies for the verbosity of this message!

Mine too for the answer :-)

- Bob

Ben Goodrich

Jul 13, 2015, 11:21:54 PM
to stan-...@googlegroups.com, ca...@alias-i.com
On Monday, July 13, 2015 at 10:07:53 PM UTC-4, Bob Carpenter wrote:
> My questions here are threefold:
>
> 1. What are the problems with treedepth being greater than 10 here? Does this mean that NUTS is terminating prematurely?

> Yes, if it bumps up against the limit, NUTS is stopping before
> it hits a U-turn.

I myself have wondered whether, when it leapfrogs all the way up to max_treedepth, there is a way to distinguish between
  1. Deciding that it would want to U-turn there anyway
  2. Deciding that it would want to take at least one more leapfrog step but isn't allowed to

In this sense, hitting max_treedepth + 1 is not as unambiguous as n_divergent__ is.


Ben


Ben Lambert

Jul 14, 2015, 8:35:43 AM
to stan-...@googlegroups.com
Hi,

Thanks Bob and Ben for your answers; those are helpful for my understanding. If I understand it correctly, does NUTS terminating early just mean HMC is like bog standard Metropolis; meaning that the posteriors should be ok, but the model will take a long time to run?

Ok - it actually seems, on reflection, that the dispersion parameter of the negative binomial is not the part that is causing the problem. If I move to a hierarchical Poisson regression model of the form below, I still run into issues of large tree depth. Below, K = 106 is the total number of individual time series, and nSpecies = 21 is the total number of distinct species across all the studies.

I have tried moving to a non-centered parameterisation, do you think this could help? If so, is there any way you could hint how I can do this, as the centered models I have tried are not working?

Are there any other ideas for how to make the below less likely to run into the issues of large tree sizes?

parameters {
  real mPsi[nSpecies];
  real<lower=0> sigmaPsi[nSpecies];
  real<lower=0,upper=1> psi[K];
  real mMu[nSpecies];
  real<lower=0> sigmaMu[nSpecies];
  real<lower=0,upper=1> mu[K];
  real<lower=0,upper=1000> kappa[K];
}

transformed parameters {
  real phi[K];
  real eta[K];
  for (i in 1:K) {
    phi[i] <- logit(psi[i]);
    eta[i] <- logit(mu[i]);
  }
}

model {
  for (i in 1:K) {
    real lambda[S[i]];
    real R_times_psi;
    int tempSpeciesIndex;
    tempSpeciesIndex <- SpeciesIndex[i];
    R_times_psi <- RelNumber[PosRel[i]] * psi[i];
    ## S[i] denotes the number of data points for a given time series
    for (j in 1:S[i]) {
      lambda[j] <- R_times_psi * (1 - mu[i])^t[Pos[i] + j - 1];
      Y[Pos[i] + j - 1] ~ poisson(lambda[j]);
    }
  }

  ## K = 106 is the total number of individual studies in my model
  for (i in 1:K) {
    int speciesTemp;
    speciesTemp <- SpeciesIndex[i];
    eta[i] ~ student_t(1, mMu[speciesTemp], sigmaMu[speciesTemp]);
    phi[i] ~ student_t(1, mPsi[speciesTemp], sigmaPsi[speciesTemp]);
  }

  ## Species-level parameters
  mMu ~ normal(0, 10);
  mPsi ~ normal(0, 10);
  sigmaMu ~ cauchy(0, 2.5);
  sigmaPsi ~ cauchy(0, 2.5);

  ## Jacobian adjustments for the logit transforms phi = logit(psi), eta = logit(mu)
  for (i in 1:K) {
    increment_log_prob(-log(mu[i] * (1 - mu[i])));
    increment_log_prob(-log(psi[i] * (1 - psi[i])));
  }
}

Thanks,

Ben

Bob Carpenter

Jul 15, 2015, 1:26:50 AM
to stan-...@googlegroups.com

> On Jul 14, 2015, at 8:35 AM, Ben Lambert <ben.c....@googlemail.com> wrote:
>
> Hi,
>
> Thanks Bob and Ben for your answers; those are helpful for my understanding. If I understand it correctly, does NUTS terminating early just mean HMC is like bog standard Metropolis;

More like Langevin, but yes, more like a random walk.
The big advantage of HMC is that it can follow the Hamiltonian
across the posterior.

> meaning that the posteriors should be ok, but the model will take a long time to run?
>
> Ok - it actually seems, on reflection, that the dispersion parameter of the negative binomial is not the part that is causing the problem. If I move to a hierarchical Poisson regression model of the form below, I still run into issues of large tree depth. Below, K = 106 is the total number of individual time series, and nSpecies = 21 is the total number of distinct species across all the studies.
>
> I have tried moving to a non-centered parameterisation, do you think this could help?

Maybe --- depends on how much data you have. Read Michael's paper for
more details.

> If so, is there any way you could hint how I can do this, as the centered models I have tried are not working?
>

What are you trying to center?

> Are there any other ideas for how to make the below less likely to run into the issues of large tree sizes?
>

Well, you don't need locals for integer indexing --- they won't
hurt, but they won't help with time as they're not involved in
autodiff. You can just use Cauchy instead of a Student-t with 1 degree
of freedom. Those aren't very tight priors, which can be problematic
in some cases. I have no idea what the effect of the logit transform
is going to be on phi and eta. I take it that's what the Jacobian's for.
Is that really

  log | d/du log(u / (1 - u)) | ?
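
Working out the algebra, just to check what that implies:

  d/du logit(u) = d/du [ log(u) - log(1 - u) ] = 1/u + 1/(1 - u) = 1 / (u * (1 - u))

so log | d/du logit(u) | = -log(u * (1 - u)), which is what those
increment_log_prob statements are adding.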

Rather than logit transforming a constrained variable between (0,1),
it might be better to do it the other way around --- just declare the
variables you care about to be unconstrained.
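
For instance, here is roughly what I mean for the mu/eta pair (just a sketch, assuming the rest of the model stays as you posted it; I've dropped the psi/phi pair for brevity):

data {
  int<lower=1> K;
  int<lower=1> nSpecies;
  int<lower=1,upper=nSpecies> SpeciesIndex[K];
}
parameters {
  real eta[K];                       # unconstrained, so no Jacobian is needed
  real mMu[nSpecies];
  real<lower=0> sigmaMu[nSpecies];
}
transformed parameters {
  real<lower=0,upper=1> mu[K];
  for (i in 1:K)
    mu[i] <- inv_logit(eta[i]);      # derived (0,1) quantity
}
model {
  for (i in 1:K)
    eta[i] ~ student_t(1, mMu[SpeciesIndex[i]], sigmaMu[SpeciesIndex[i]]);
  mMu ~ normal(0, 10);
  sigmaMu ~ cauchy(0, 2.5);
  # ... Poisson likelihood written in terms of mu, as in your model ...
}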

- Bob

Ben Lambert

Jul 15, 2015, 8:10:31 AM
to stan-...@googlegroups.com
On Wednesday, July 15, 2015 at 6:26:50 AM UTC+1, Bob Carpenter wrote:

> On Jul 14, 2015, at 8:35 AM, Ben Lambert <ben.c....@googlemail.com> wrote:
>
> Hi,
>
> Thanks Bob and Ben for your answers; those are helpful for my understanding. If I understand it correctly, does NUTS terminating early just mean HMC is like bog standard Metropolis;

> More like Langevin, but yes, more like a random walk.
> The big advantage of HMC is that it can follow the Hamiltonian
> across the posterior.

> meaning that the posteriors should be ok, but the model will take a long time to run?
>
> Ok - it actually seems, on reflection, that the dispersion parameter of the negative binomial is not the part that is causing the problem. If I move to a hierarchical Poisson regression model of the form below, I still run into issues of large tree depth. Below, K = 106 is the total number of individual time series, and nSpecies = 21 is the total number of distinct species across all the studies.
>
> I have tried moving to a non-centered parameterisation, do you think this could help?

> Maybe --- depends on how much data you have. Read Michael's paper for
> more details.

> If so, is there any way you could hint how I can do this, as the centered models I have tried are not working?
>

> What are you trying to center?

I was trying to move away from the centered parameterisation for eta. For example:

transformed parameters {
  real eta[K];
  for (i in 1:K) {
    int speciesTemp;
    speciesTemp <- SpeciesIndex[i];
    eta[i] <- mMu[speciesTemp] + sigmaMu[speciesTemp] * eta_raw[i];
  }
}

model {
  ...
  eta_raw ~ normal(0, 1);
  mMu ~ normal(0, 10);
  sigmaMu ~ cauchy(0, 2.5);
}
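
In case it helps, here is the fuller sketch of what I have in mind, with the declarations I am assuming to go with the fragment above (data declarations for K, nSpecies, and SpeciesIndex omitted):

parameters {
  real mMu[nSpecies];
  real<lower=0> sigmaMu[nSpecies];
  real eta_raw[K];                   # standard-normal "raw" parameters
}
transformed parameters {
  real eta[K];
  for (i in 1:K)
    eta[i] <- mMu[SpeciesIndex[i]] + sigmaMu[SpeciesIndex[i]] * eta_raw[i];
}
model {
  eta_raw ~ normal(0, 1);            # implies eta[i] ~ normal(mMu[...], sigmaMu[...])
  mMu ~ normal(0, 10);
  sigmaMu ~ cauchy(0, 2.5);
  # ... likelihood as before, written in terms of eta ...
}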

Does the above make sense? I am following Michael's paper, and the Stan guide as far as I can see...

Bob Carpenter

Jul 15, 2015, 7:40:06 PM
to stan-...@googlegroups.com
Yes, that looks reasonable. You don't need to define the
local for just array indexing --- it won't help measurably
with efficiency.

- Bob
