Hey Ben, I like the horseshoe example, thanks. I noticed that setting the acceptance target close to 1, as you did, helps cut down on the divergent transitions, but there are still a few during sampling (post-warmup). Are we comfortable saying it has converged unless there are exactly zero?
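(For anyone following along, this is roughly what I mean by raising the acceptance target; the object names here are placeholders.)

library(rstan)

## Push the target acceptance rate toward 1 to force smaller step sizes,
## which usually reduces (but need not eliminate) divergent transitions.
fit <- stan(model_code = horseshoe_code, data = standata,
            control = list(adapt_delta = 0.999))

## Count post-warmup divergent transitions in each chain.
sapply(get_sampler_params(fit, inc_warmup = FALSE),
       function(x) sum(x[, "divergent__"]))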
Umm, no p-values, please.
split R-hat < 1.1
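(In rstan that rule of thumb is a one-liner to check; the summary reports the split version of R-hat. Assuming a fitted object named fit:)

max(summary(fit)$summary[, "Rhat"])  # want everything below 1.1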
I noticed that too (the whole posterior blows up for some chains). I even tried it with plain Phi() in case the fast approximation was not accurate enough, and it still blew up. This is definitely a concerning aspect of the horseshoe prior, since I have never seen the geometry get worse when you do the normal -> uniform -> inverse CDF dance.
mc2 <-
'
data {
  int<lower=0> n;                  // number of observations
  int<lower=0> p;                  // number of predictors
  matrix[n, p] X;
  vector[n] y;
}
parameters {
  vector[p] z;                     // standardized coefficients
  vector<lower=0>[p] lambda_num;   // numerator of the half-Cauchy ratio
  vector<lower=0>[p] lambda_denom; // denominator of the half-Cauchy ratio
  real<lower=0> z_tau;             // half-normal driver for tau
  real<lower=0> sigma;             // error scale (implicit flat prior)
}
transformed parameters {
  vector[p] beta;
  real<lower=0> tau;
  // half-normal -> uniform(0.5, 1) -> half-Cauchy via the inverse CDF
  tau <- tan(pi() * (Phi_approx(z_tau) - 0.5));
  // a ratio of independent half-normals is half-Cauchy, so the local
  // scales lambda_num ./ lambda_denom are half-Cauchy(0, 1)
  beta <- z .* lambda_num ./ lambda_denom * tau;
}
model {
  lambda_num ~ normal(0, 1);
  lambda_denom ~ normal(0, 1);
  z_tau ~ normal(0, 1);
  z ~ normal(0, 1);
  y ~ normal(X * beta, sigma);
}
'
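As a quick sanity check (plain R, with the exact pnorm rather than Phi_approx), the z_tau construction does recover a half-Cauchy:

## z_tau is half-normal, so pnorm(z_tau) is uniform on (0.5, 1), and
## tan(pi * (u - 0.5)) is the inverse CDF of a half-Cauchy(0, 1).
z_tau <- abs(rnorm(1e6))
tau   <- tan(pi * (pnorm(z_tau) - 0.5))
quantile(tau, c(0.25, 0.5, 0.75))               # compare against...
quantile(abs(rcauchy(1e6)), c(0.25, 0.5, 0.75)) # ...half-Cauchy draws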
How about Tomi's version?
From http://becs.aalto.fi/en/research/bayes/diabcvd/wei_hs.stan
We talked about this briefly with Mike at the Stan meeting today. Without any data, the last three should be fine because the priors amount to products of independent standard normals (or some inverse gammas in Tomi's parameterization). But with the data, the posterior distribution is less amenable to NUTS than the prior distribution.
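(Concretely, the "without any data" check is just dropping the likelihood statement so NUTS samples the prior, which factors into those independent (half-)normals. A sketch, reusing the mc2 string above and a placeholder data list:)

mc2_prior <- sub("y ~ normal(X * beta, sigma);", "", mc2, fixed = TRUE)
fit_prior <- stan(model_code = mc2_prior, data = standata)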
One issue might be that a horseshoe prior may not be reasonable for an intercept, since there is no obvious reason to concentrate prior beliefs about it near zero. But even if I center the outcome variable, the divergences remain. Another issue might be that some prior is needed on the standard deviation of the errors, but I have tried a Jeffreys prior and an exponential prior for that without success.
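(For concreteness, the centering plus the two error-scale priors I tried; the exponential rate of 1 here is just illustrative. Neither helped.)

## Center the outcome before fitting:
standata$y <- standata$y - mean(standata$y)
## Priors on sigma, added to the model block (old-style Stan syntax):
##   sigma ~ exponential(1);
##   increment_log_prob(-log(sigma));  // Jeffreys: p(sigma) proportional to 1/sigma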
Under the ratio-of-half-normals approach to tau, you get this pairs plot (with non-divergent transitions below the diagonal and divergent transitions above). It seems that for small (but not too small) values of the denominator, there is a concentration of divergent transitions that coincides with above-average values of lp__. I think this suggests that the curvature near the mode along this dimension is large while the curvature away from the mode is minimal, and thus NUTS cannot adapt well.
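(The plot came from something like rstan's pairs method; which parameters to show is your choice:)

## With condition = "divergent__", non-divergent transitions are drawn
## below the diagonal and divergent ones above, as described.
pairs(fit, pars = c("lambda_denom[1]", "tau", "lp__"),
      condition = "divergent__")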
Thus, I think we need to see whether this is a more general issue with models like this before we recommend that horseshoe priors be used in something like stan_lm, particularly if we are not expecting users of stan_lm to do much in the way of diagnostics.
Ben
<lars.R.dump>
> It seems that for small (but not too small) values of the denominator, there is a concentration of divergent transitions that coincides with above-average values of lp__. I think this suggests that the curvature near the mode along this dimension is large while the curvature away from the mode is minimal, and thus NUTS cannot adapt well.
That seems like what it's suggesting. Are the scales the same on the vertical and horizontal axes? It's hard to tell with R's default labels.