On 8/28/13 6:43 PM,
eligu...@yandex.ru wrote:
> Dear STANsters,
>
> First, thanks for developing an exciting, well-designed, and well-documented tool for Bayesian sampling.
>
> I would like to use STAN to estimate parameters that are nested inside the mean and variance-covariance matrix of a
> multivariate normal distribution, i.e. X ~ MVN(Mu(theta), Sigma(theta)), where "thetas" are the parameters and the
> likelihood is just the probability density of the data vector X. I am using STAN from within R, and the ultimate
> objective is to develop a package that obtains these estimates (and C.I.'s) from data.
So it's like a Gaussian process example where Mu and Sigma are
functions of parameters? There are some examples in the manual.
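To make that concrete, here's a minimal sketch of that kind of model for the AR(1) case described below. The data names (N, y) and the way Sigma is filled in are illustrative assumptions on my part, not taken from the attached files:

    data {
      int<lower=1> N;    // length of the observed series
      vector[N] y;       // the data vector X
    }
    parameters {
      real<lower=-1,upper=1> rho;   // AR(1) autocorrelation
      real mu;                      // common mean
    }
    transformed parameters {
      matrix[N,N] Sigma;
      for (i in 1:N)
        for (j in 1:N)
          Sigma[i,j] <- pow(rho, abs(i - j));   // Sigma[i,j] = rho^|i-j|
    }
    model {
      y ~ multi_normal(rep_vector(mu, N), Sigma);
    }

Building Sigma in transformed parameters means it is reconstructed once per log-probability evaluation, which is the general pattern for covariance matrices that depend on parameters.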
> As a simplistic example, I attach an example [.stan and .r files] where I estimate the autocorrelation coefficient rho
> of an AR(1) process by using the variance-covariance of the complete time-series [i.e. Sigma[i,j] = rho^(abs(i-j))].
> This is not the best way to estimate rho, but it is exactly the template I'm looking for.
>
> I got this to work, but have several questions. (I am a bit of a novice here, so I apologize in advance if the answers
> are well-known or documented elsewhere - I really did dig around):
>
> 1) The compilation worked using "set_cppo("fast")", but it would not compile using "set_cppo("debug")", giving me a
> non-specific error, with only a Cygwin POSIX path warning. How/why is that possible?
set_cppo("fast") sets the C++ compiler optimization level to 3;
set_cppo("debug") sets it to 0. The current Stan (1.3.0) also uses
set_cppo("fast") to turn off some error handling, which can lead to
R crashes. In 2.0, the options will only set the optimization level,
not error handling.
The problem is the C++ toolchain on Windows, which typically runs out
of memory or just dies at optimization level 0.
> It takes quite a while to compile
> - and if the "debug" is faster [plus more informative output], then
> that would be much more convenient for development.
We don't want to rely on error messages from C++, but rather catch
them in Stan directly. The debug version won't have more informative
output as far as Stan's concerned.
> 2) I see that the compiled C++ code is stored somewhere deep (Users/eli/AppData/Temp/Local ... etc.) with some
> complicated auto-generated name. I couldn't figure out how to control where that code goes or how it is named.
Why would you want to control this?
Jiqiang should be able to help if you really need to.
> A
> further question: Will it eventually be straightforward to implement the compiled C++ code in an R package via Rcpp
> (which is the ultimate goal)?
I'm not sure what you mean by "implement the compiled C++ code".
You can use the command-line version of Stan to compile the C++
code for a model and then do whatever you want with it. But the I/O
is tricky given the constraints.
> 3) If there are any comments on the optimality/efficiency of the "stan" code for sampling from the MVN itself, they
> would be appreciated. For the desired application, speed is highly desired.
We're working on making multivariate normal faster in a few different
ways. One, we'll vectorize the distributions, and two, we'll support
Cholesky factor representations everywhere. For now, Ben and Marcus have
provided a lot of examples on the mailing list. But the basic idea is
to do the heavy lifting outside the loop as much as possible. So rather than

    for (n in 1:N)
      y[n] ~ multi_normal(mu, Sigma);

which solves Sigma on every iteration, you can do something like

    L <- cholesky_decompose(Sigma);
    for (n in 1:N)
      y[n] ~ multi_normal_cholesky(mu, L);
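In a full program, the factorization can be hoisted into the transformed
parameters block so it runs once per log-probability evaluation rather than
once per data vector. This fragment is a sketch; it assumes Sigma has
already been built from the parameters earlier in the same block:

    transformed parameters {
      matrix[N,N] Sigma;
      matrix[N,N] L;
      // ... fill in Sigma as a function of the parameters ...
      L <- cholesky_decompose(Sigma);   // one O(N^3) factorization
    }
    model {
      for (n in 1:N)
        y[n] ~ multi_normal_cholesky(mu, L);   // O(N^2) per vector given L
    }
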
- Bob