Hi Bob,
Thanks a lot for your advice and patience! I tried to follow your advice as good as possible:
1. Fixing one of the parameters. The only parameter that was possible to fix was ksi. I changed omega ~ gamma(a, ksi) to omega ~ gamma(a, 0.1). a is a constant, in this example 1. This did improve number of divergent transitions but there were still 171 divergent transition with adapt_delta=0.99. Additionally, number of effect samples sharply decreased to <10 for many variables.
2. Starting with a simple model. I'm always trying to do this but for hierarchical shrinkage priors this is difficult. I performed a test by omitting omega, thus changing beta = beta_raw .* (0.5 * omega * (psi .* phi)) to beta = beta_raw .* (0.5 * (psi .* phi)). This did not improve number of divergent transitions but effective number of samples and estimates looked good. I also removed psi from the model (instead of psi), thus beta = beta_raw .* (0.5 * omega * phi), but this did also not improve number of divergent transitions.
3. Tighten the exponential(0.5) prior. I changed this prior to exponential(10) but this did unfortunately not improve the number of divergent transitions and because of the multiplication in the model omega increased largely. I therefore additionally fixed omega to omega ~ gamma(1, 0.1) but this also did not change number of divergent transitions (but estimates looked good).
4. Reformulate in something not multiplicatively scaled. With the values of a_pi I'm using, the model is in fact equivalent to:
ksi ~ gamma(b, 1);
lambda ~ gamma(a_pi, ksi);
beta ~ double_exponential(0, sqrt_vec(lambda * 0.5)); //sqrt_vec is square root of a vector
This model needs less multiplication but unfortunately did not improve the number of divergent transitions. Additionally, it is hard to make hierarchical shrinkage priors without multiplication of the scale (the horseshoe prior needs also multiplication of the scale for example). The model I would like to implement in Stan can be found in formula 8 of
https://arxiv.org/pdf/1609.00046v1.pdf. This paper also shows how to implement a Gibbs sampler but because I understood that Hamiltonian Monte Carlo is more reliable I would really like to implement this in Stan. Do you think this is possible?
Many thanks,
D
Op woensdag 7 september 2016 16:21:38 UTC+2 schreef Bob Carpenter: