[off issue; back on dev list]
I'd rather keep these higher-level design discussions on
the mailing list and then put the actual feature request
or bug in the tracker. Bigger designs can go on a wiki,
but I think this one's pretty isolated.
Now to answer Tamas's question, there's a lot going on
with variables.
The saving of variables can be further subdivided into
1. variables we want to save for monitoring convergence,
2. variables we want to report means and quantiles for,
3. variables we want to use for downstream posterior inference.
As Stan stands, if you want a variable for any of these purposes
***and you need to use it in the model***, then it has to be declared
as a transformed parameter.
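For example (just a toy sketch, not from the thread): a rescaled
location that shows up in a sampling statement has to live in
transformed parameters, since the model block needs to see it:

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu_raw;
}
transformed parameters {
  real mu;
  mu <- 10 * mu_raw;   // saved in the output by default
}
model {
  mu_raw ~ normal(0, 1);
  y ~ normal(mu, 1);   // mu is used here, so it can't go in generated quantities
}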
One idea we've been toying with is just saving the sufficient
stats for (1) or (2), because really (2) is just a special case
of (1). And in some ways, so is (3) if we do all the downstream
inference in the generated quantities block. Otherwise, if
we need the draws externally, then the variable has to be saved.
Then there's a scoping issue. Variables declared in transformed
parameters are visible in the model. In fact, if you look at the
generated C++ code, then aside from I/O, transforming

transformed parameters {
  ...A...
}
model {
  ...B...
}

into

model {
  ...A...
  ...B...
}

produces almost identical code. Importantly, everything gets autodiffed.
Now, *** if you don't need the variable in the model block ***, then
it should be defined as a generated quantity. Importantly, here it
is evaluated as a double in C++ with no autodiff (hence it's anywhere
from 2 to 10 or more times faster and uses a small fraction of the
memory).
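To sketch the contrast (again a toy example, not from the thread): a
derived quantity like mu squared that's only wanted for reporting
belongs in generated quantities:

parameters {
  real mu;
}
model {
  mu ~ normal(0, 1);
}
generated quantities {
  real mu_sq;
  mu_sq <- mu * mu;   // plain double arithmetic once per saved draw, no autodiff
}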
Also, the variables in the parameters block often don't need to
be saved, either.
There is a *very big difference* between declaring something as
a parameter and as a transformed parameter --- the parameters, after
being transformed to the unconstrained scale, represent the actual
variables being sampled and define the actual model density we care
about. Everything else is a kind of intermediate quantity.
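To make that concrete (toy example): a declaration like

parameters {
  real<lower=0> sigma;
}

means the sampler actually moves on the unconstrained variable
log(sigma), with the Jacobian adjustment added to the log density
automatically; transformed parameters, by contrast, are just
deterministic functions of the parameters evaluated along the way.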
We should put some more thought into all of this. One idea we toyed
with is marking variables as ones to save or not.
- Bob
> On Nov 8, 2015, at 11:23 AM, Tamas K. Papp <
notifi...@github.com> wrote:
>
> Responding to the mention on the mailing list. Since I opened the original issue, I have been thinking about this. I think that there are two, almost orthogonal, issues here:
>
> 1. what additions to the final log (posterior or likelihood) function are enabled for which mode (ML or Bayesian).
>
> 2. what variables are recorded. E.g., transformed parameters are. But sometimes it would make sense to save the posterior sample for some intermediate variables, or even to disable saving some parameters if one wants to make the resulting file smaller.
>
> (1) is about ~ (or equivalently incrementation) statements, while (2) is about variables.
>
> IMO Stan should have semantics for designating this information in a stan file.
>
> @bob-carpenter: I would rather dispense with the transformed parameters block altogether, and have everything in the model. The user would just declare variables, make transformations, and then designate that some variables are recorded, some aren't. By default all params would be recorded, and nothing else (except generated quantities). The user could change this for all variables, eg at the declaration:
>
> parameters {
>   real x discarded;
>   real y; // kept by default
> }
> model {
>   real z saved;
>   real v; // discarded by default
> }
>
> Blocks like
>
> jacobian_adj {
> }
> prior {
> }
>
> would designate the semantics of pieces of code.
>
> —
> Reply to this email directly or view it on GitHub.
>