That's fantastic. Did you see if the results were similar
to those you got with MCMC?
Some answers inline.
On Jul 16, 2015, at 11:52 PM, Sebastian Weber <sdw....@gmail.com> wrote:
>
> Hi!
>
> I have started to play with ADVI and I must say I am impressed by the speed, but I still need to check if I get garbage (on a first look everything looks ok). Now, a couple of questions (and if you answer with RT(F)M that's fine - just let me know which document to read):
Other than the doc in the Stan and CmdStan manuals (2.7.0 versions
of both are now available), you'll want to read Alp's
arXiv paper for a deeper description of how it works:
http://arxiv.org/abs/1506.03431
> - A run with the default settings took 3h, a run with eta_agrad=0.05 (half of the default) took only 20 minutes. Is this normal? On what scale should I vary eta_agrad? I.e. try multiples or offsets (so 1/2^n or eta_agrad= 0.1, 0.09, 0.08 ...)
>
> - ADVI complained about diverging ELBO during the first 1000 iterations and the delta_ELBO_mean was in the thousands whereas the delta_ELBO_median was around or below 1. Do I need to worry now?
These first two, Dustin or Alp are going to have to tackle.
> - For which type of problems does ADVI work well and which ones not?
It's labeled "experimental" exactly because we don't know
under what conditions it'll work well or how best to tune it.
So all these data points are super helpful.
> - Is the resulting CSV a dump of MCMC-like samples? So can I use it in the usual way?
As to the draws, those are from the variational approximation,
as explained in the paper. But they're meant to be usable in
the same way as the MCMC output. So if everything's done right,
you should be able to read them in just like MCMC output.
I think the plan is to perhaps do some importance weighting
in the future to make expectation calculations closer to the
true posterior, but I'm not 100% sure.
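To make the importance-weighting idea concrete, here is a minimal sketch of
self-normalized importance sampling. It is not what ADVI does today and not
necessarily what will be implemented. The function names and the toy normal
densities are made up for illustration; in practice log_p would be the model's
log density and log_q the log density of the fitted approximation.

```python
import math
import random

def log_normal(x, mu, sigma):
    # Log density of Normal(mu, sigma), dropping the constant -0.5*log(2*pi).
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma)

def importance_weighted_mean(draws, log_p, log_q, f):
    # Self-normalized estimate of E_p[f(theta)] using draws from q.
    # log_p may be unnormalized; the normalizing constant cancels.
    log_w = [log_p(x) - log_q(x) for x in draws]
    shift = max(log_w)  # subtract the max before exponentiating, for stability
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(wi * f(x) for wi, x in zip(w, draws)) / sum(w)

# Toy setup (assumed): "posterior" p = Normal(1, 1), approximation q = Normal(0, 1.5).
rng = random.Random(0)
draws = [rng.gauss(0.0, 1.5) for _ in range(20000)]

est = importance_weighted_mean(
    draws,
    log_p=lambda x: log_normal(x, 1.0, 1.0),
    log_q=lambda x: log_normal(x, 0.0, 1.5),
    f=lambda x: x,
)
print(est)  # should land near the mean of p, which is 1
```

This only behaves well when q has tails at least as heavy as p; otherwise a few
draws end up with enormous weights.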
> - Does Rhat mean anything? What else do I need to check for non-convergence?
Rhat won't mean anything here. n_eff should come out close to the
number of draws, because these are pure Monte Carlo (not Markov chain)
draws.
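As a rough illustration of why n_eff should match the number of draws, here
is a sketch that compares independent draws with an autocorrelated chain.
The estimator is a deliberately simplified single-chain one (sum the sample
autocorrelations until the first negative one); it is not the estimator Stan
uses, and the AR(1) chain is just a stand-in for correlated MCMC output.

```python
import random

def n_eff(x):
    # Simplified effective sample size: N / (1 + 2 * sum of autocorrelations),
    # truncating the sum at the first negative sample autocorrelation.
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    var = sum(c * c for c in centered) / n
    total = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho < 0:
            break
        total += rho
    return n / (1 + 2 * total)

rng = random.Random(1)
N = 4000

# Independent draws, as from a variational approximation.
independent = [rng.gauss(0.0, 1.0) for _ in range(N)]

# AR(1) chain with coefficient 0.9, mimicking correlated Markov chain draws.
chain = [0.0]
for _ in range(N - 1):
    chain.append(0.9 * chain[-1] + rng.gauss(0.0, 1.0))

print(n_eff(independent))  # close to N
print(n_eff(chain))        # far below N
```

So a large n_eff on ADVI output says nothing about whether the approximation
is any good, only that the draws are independent.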
> - Can read_stan_csv from rstan read in these csv files or should I read these in with R and skip over the first 1000 warmups?
If everything's coded correctly, read_stan_csv should be able to read them. I haven't tried it.
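If it doesn't work, reading the file by hand is not hard, because Stan's CSV
output keeps its configuration in comment lines starting with '#'. Here is a
minimal sketch in Python; read_draws is a made-up helper, not part of any Stan
interface, and it assumes every non-comment row after the header is numeric.
Whether any leading rows should be dropped is something you'd have to check
against the output you actually get.

```python
import csv

def read_draws(path):
    # Read a Stan-style CSV file into {column name: list of floats},
    # skipping blank lines and '#' comment lines wherever they appear.
    with open(path, newline="") as f:
        rows = [line for line in f if line.strip() and not line.startswith("#")]
    reader = csv.DictReader(rows)  # first remaining line is the header
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns
```

With the columns in hand, posterior means and intervals are computed the same
way as for MCMC draws, e.g. sum(cols["mu"]) / len(cols["mu"]) for a
hypothetical parameter mu.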
> I am still evaluating, but just let me say that ADVI crunched an ODE problem which takes almost 3 days per chain in just 20-30 minutes! If the result is at all usable then this is just awesome!
This is really exciting news. Thanks again for trying this out. I
can see a whole stream of PK/PD publications coming out of this if
it really is that great. Do you have an estimate of how long NONMEM
would take to fit a similar problem?
- Bob