converted into reproducible case studies. Then we can
> On Mar 8, 2016, at 4:46 PM, Michael Betancourt <betan...@gmail.com> wrote:
>
> In every controlled experiment I’ve done, n_eff < 0.1 * N has led
> to atrocious behavior of the autocorrelation/n_eff estimators, even
> in Stan and even with multiple chains. The estimators are just way
> too noisy with such strong dependencies.
>
> On Mar 8, 2016, at 9:31 PM, Andrew Gelman <gel...@stat.columbia.edu> wrote:
>
>> Sure, it’s not perfect, but if R-hat is less than 1.1, I’m guessing that the chains have mixed OK. At least, back in the bad old days before NUTS, I’d see this sort of thing all the time, and the simulations were inefficient but eventually got there.
>>
>>> On Mar 8, 2016, at 4:28 PM, Michael Betancourt <betan...@gmail.com> wrote:
>>>
>>> Yes, 400 effective samples is sufficient for most applications, but
>>> if n_eff < 0.1 * N then I’m arguing that the n_eff estimator itself is dubious.
>>>
>>> On Mar 8, 2016, at 9:12 PM, Andrew Gelman <gel...@stat.columbia.edu> wrote:
>>>
>>>> Hi, if n_eff is less than 10% of the number of draws then, yes, that means the chains are moving slowly. But if you’ve run 1000 iterations on each of 4 chains, that still gives you an n_eff of 400 or so, which should be fine for most practical purposes.
>>>> A
>>>>
>>>>> On Mar 8, 2016, at 11:19 AM, Roy Martin <royw...@gmail.com> wrote:
>>>>>
>>>>> Hello again,
>>>>>
>>>>> As it turns out, although this model was converging well with my empirical dataset, in that all Rhat < 1.1, after doing some more reading on convergence diagnostics in Stan I decided to download shinystan (amazing tool!) to look at some other measures. It turns out that, after runs with warmups of 2k-10k, there were still quite a few parameters where NEff was less than 10% of the total samples (1000 iter x 4 chains). Also, the Monte Carlo std err was greater than 10% of the posterior sd for some parameters. (A rough sketch of these checks is below.)
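>>>>>
>>>>> For reference, the checks I mean look roughly like this in rstan. This is a minimal sketch only: `fit` stands in for the stanfit object from my run, and the 10% thresholds are the ones discussed in this thread.
>>>>>
>>>>>   library(rstan)
>>>>>   s <- summary(fit)$summary  # matrix with one row per parameter
>>>>>   # total post-warmup draws across all chains (assuming thin = 1)
>>>>>   n_draws <- sum(sapply(fit@stan_args, function(a) a$iter - a$warmup))
>>>>>   # parameters with NEff below 10% of the total draws
>>>>>   low_neff <- rownames(s)[s[, "n_eff"] < 0.1 * n_draws]
>>>>>   # parameters whose Monte Carlo std err exceeds 10% of the posterior sd
>>>>>   high_mcse <- rownames(s)[s[, "se_mean"] > 0.1 * s[, "sd"]]
>>>>>   # and the usual R-hat check
>>>>>   bad_rhat <- rownames(s)[s[, "Rhat"] > 1.1]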
>>>>>
>>>>> So, I decided to simulate a similar dataset to see whether the same convergence issues appeared and what parameter recovery looked like, and I ran into similar issues with the sim data. Although the runs generally recovered the individual- and group-level predictors (beta, gamma) quite well, the correlations at both the units and group (traits) level left a good bit to be desired. In particular, the true values of the large correlations usually fell outside the 95% CIs. I thought, probably naively, that maybe this was because lkj_corr_cholesky(eta=4) was too strong a constraint for the simulated correlations, several of which were quite large. So I lowered eta to 2, which may have made some slight difference, but still not great. I attached coefplots from a run (4000 warmup / 5000 iter, eta=2) for illustration. Of course, lowering eta also occasionally led to some divergent transitions, and certainly didn't help push NEff above 10%, etc. There was also often concerning autocorrelation in the chains. All of these diagnostics varied a little from run to run when adjusting things like adapt_delta, but I've yet to find the combination of adapt_delta, max_treedepth, warmup, etc. that would get the model through all of the QA/QC, particularly NEff > 10%. (An example of the kind of call I've been varying is below.)
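>>>>>
>>>>> A sketch of that call; the file name, data list, and seed here are just placeholders, and the specific control values are only examples of the settings mentioned above:
>>>>>
>>>>>   library(rstan)
>>>>>   fit <- stan(file = "model.stan", data = sim_data,
>>>>>               chains = 4, iter = 5000, warmup = 4000, seed = 123,
>>>>>               control = list(adapt_delta = 0.95, max_treedepth = 12))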
>>>>>
>>>>> In the end, I'd guess that this model has high parameter correlation in the posterior (?) and needs some optimization. I have been reading the manual and trying to understand the "Matt trick" / non-centered parameterization in this context, and I think I get the gist and am trying to figure out how to implement it now. However, I wanted to make this post in hopes that someone might be able to give me a nudge. These operations are a little confusing to me because (1) matrix algebra is not yet a native language to me (an ecologist), and (2) the multivariate scale is fixed at 1 in this latent model, so I'm having trouble translating the manual's examples to this case. (My current understanding is sketched below.)
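>>>>>
>>>>> Here is my current understanding of the non-centered version, as a sketch only: K, J, mu, z, and L_Omega are placeholder names, not the variables from my actual model, and with unit scales there should be no diag_pre_multiply(tau, L_Omega) step from the manual's example.
>>>>>
>>>>>   data {
>>>>>     int<lower=1> J;   // number of units
>>>>>     int<lower=1> K;   // correlated coefficients per unit
>>>>>   }
>>>>>   parameters {
>>>>>     vector[K] mu;                      // group-level means
>>>>>     cholesky_factor_corr[K] L_Omega;   // Cholesky factor of correlation matrix
>>>>>     vector[K] z[J];                    // standard normal innovations
>>>>>   }
>>>>>   transformed parameters {
>>>>>     vector[K] beta[J];
>>>>>     for (j in 1:J)
>>>>>       beta[j] <- mu + L_Omega * z[j];  // implies beta[j] ~ multi_normal(mu, Omega), unit scales
>>>>>   }
>>>>>   model {
>>>>>     L_Omega ~ lkj_corr_cholesky(2);
>>>>>     for (j in 1:J)
>>>>>       z[j] ~ normal(0, 1);
>>>>>     // ... likelihood in terms of beta ...
>>>>>   }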
>>>>>
>>>>> On another front, and sorry if this is documented somewhere I haven't looked, I'm not sure what a reasonable number of warmup iterations is for a model of this complexity, or at what point I should give up on increasing warmup to improve the diagnostics. I did feel that, in this case, increasing warmup led to slight improvements. Yet even after 10k warmup iterations (runs of 1-2 hrs), there were still quite a few parameters with NEff < 10%, and parameter recovery for the correlations was similarly poor (maybe slightly better?). As such, I wouldn't bet that running 100k iterations would make the difference; and from what I've read, even 10k seems large for Stan. So I wanted to check here before potentially spending further hours or days of processing time tuning warmup and control options, even after optimizing the model parameterization.
>>>>>
>>>>> Many thanks again, in advance. And sorry for the long-winded post. Below are the coefficient plots, the Stan model, and dataset sim in R.
>>>>>
>>>>> -Roy
>>>>>