Hi all,
I have been having some issues implementing the horseshoe prior in Stan.
Using half-Cauchy priors for both the local and global scale parameters makes sampling difficult: it is slow, and divergent transitions occur even with a high acceptance rate and a small step size. Reparameterizing the Cauchy helps but does not resolve the issue.
After much exploration of the posterior samples, the issue seems to occur when the global scale parameter approaches 0. In this situation, the local scales for the relevant parameters are forced to extreme values far in the tails of the Cauchy. Changing the priors on the scale parameters to student_t(3, 0, 1) alleviates the sampling issues, but results in less shrinkage of the less-relevant parameters towards zero.
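To put rough numbers on this, here is a small sketch in plain Python (standard library only, not Stan). It compares the upper-tail probability of a half-Cauchy(0, 1) local scale with that of a half-student_t(3, 0, 1) local scale. The function names and the assumption that a relevant coefficient needs a total scale (global times local) of about 1 are purely illustrative and not taken from my model.

```python
import math

def half_cauchy_sf(x):
    """P(lambda > x) for lambda ~ half-Cauchy(0, 1), with x >= 0."""
    return 1.0 - (2.0 / math.pi) * math.atan(x)

def half_t3_sf(x):
    """P(lambda > x) for lambda ~ half-student_t(3, 0, 1), with x >= 0.

    Uses the closed-form CDF of the Student-t distribution with 3 degrees
    of freedom.
    """
    z = x / math.sqrt(3.0)
    return 1.0 - (2.0 / math.pi) * (z / (1.0 + z * z) + math.atan(z))

# Illustrative assumption: a relevant coefficient needs tau * lambda of
# about 1, so the local scale must be roughly 1 / tau.
needed_scale = 1.0
for tau in (1.0, 0.1, 0.01, 0.001):
    lam = needed_scale / tau
    print(f"tau={tau:<6} lambda={lam:<8.0f} "
          f"half-Cauchy tail={half_cauchy_sf(lam):.2e} "
          f"half-t3 tail={half_t3_sf(lam):.2e}")
```

The half-Cauchy tail probability falls off roughly like 1/lambda, while the half-t(3) tail falls off roughly like 1/lambda^3. So under the Cauchy, very large local scales remain plausible as the global scale shrinks, whereas the student_t prior penalizes them much more strongly.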
I was wondering if anyone has experienced this issue and could offer guidance. In particular, I am thinking of using a boundary-avoiding prior (such as a gamma(2, _) per BDA) on the global scale parameter to avoid this degenerate case. Or is it preferable to stick with student_t priors rather than the Cauchy?
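To illustrate what I mean by boundary-avoiding, here is a minimal sketch in plain Python (standard library only, not Stan) comparing the prior density near zero under a half-Cauchy(0, 1) and a gamma with shape 2. The rate value of 1.0 is a hypothetical placeholder chosen only for the comparison (I have not settled on a rate), and the function names are mine.

```python
import math

def half_cauchy_pdf(tau, scale=1.0):
    """Density of half-Cauchy(0, scale) at tau >= 0."""
    return 2.0 / (math.pi * scale * (1.0 + (tau / scale) ** 2))

def gamma2_pdf(tau, rate):
    """Density of gamma(shape=2, rate) at tau >= 0."""
    return rate ** 2 * tau * math.exp(-rate * tau)

rate = 1.0  # hypothetical placeholder; the rate is not decided yet
for tau in (1e-4, 1e-2, 1.0):
    print(f"tau={tau:<7} half-Cauchy pdf={half_cauchy_pdf(tau):.4f} "
          f"gamma(2, {rate}) pdf={gamma2_pdf(tau, rate):.4f}")
```

The half-Cauchy density stays at about 2/pi as the global scale goes to 0, while the gamma(2, rate) density goes to 0 linearly, which is what should keep the sampler away from the degenerate region. Whether this also costs shrinkage is exactly what I am unsure about.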
Thanks in advance,