[off issue; back on dev list]
I'd rather keep these higher-level design discussions on
the mailing list and then put the actual feature request
or bug in the tracker. Bigger designs can go on a wiki,
but I think this one's pretty isolated.
Now to answer Tamas's question, there's a lot going on
with variables.
The saving of variables can be further subdivided into
1. variables we want to save for monitoring convergence,
2. variables we want to report means and quantiles for,
3. variables we want to use for downstream posterior inference.
As Stan stands, if you want a variable for any of these purposes
***and you need to use it in the model***, then it has to be declared
as a transformed parameter.
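For example (just a toy sketch, not from the thread): a rescaled
location that shows up in a sampling statement has to live in
transformed parameters, since the model block needs to see it:

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu_raw;
}
transformed parameters {
  real mu;
  mu <- 10 * mu_raw;   // saved in the output by default
}
model {
  mu_raw ~ normal(0, 1);
  y ~ normal(mu, 1);   // mu is used here, so it can't go in generated quantities
}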
One idea we've been toying with is just saving the sufficient
stats for (1) or (2), because really (2) is just a special case
of (1). And in some ways, so is (3) if we do all the downstream
inference in the generated quantities block. Otherwise, if
we need the draws externally, then the variable has to be saved.
Then there's a scoping issue. Variables declared in transformed
parameters are visible in the model. In fact, if you look at the
generated C++ code, then aside from I/O, transforming

transformed parameters {
  ...A...
}
model {
  ...B...
}

into

model {
  ...A...
  ...B...
}

produces almost identical code. Importantly, everything gets autodiffed.
Now, *** if you don't need the variable in the model block ***, then
it should be defined as a generated quantity. Importantly, here it
is evaluated as a double in C++ with no autodiff (hence it's anywhere
from 2 to 10 or more times faster and uses a small fraction of the
memory).
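To sketch the contrast (again a toy example, not from the thread): a
derived quantity like mu squared that's only wanted for reporting
belongs in generated quantities:

parameters {
  real mu;
}
model {
  mu ~ normal(0, 1);
}
generated quantities {
  real mu_sq;
  mu_sq <- mu * mu;   // plain double arithmetic once per saved draw, no autodiff
}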
Also, the variables in the parameters block often don't need to
be saved, either.
There is a *very big difference* between declaring something as
a parameter and as a transformed parameter --- the parameters, after
being transformed to the unconstrained scale, represent the actual
variables being sampled and define the actual model density we care
about. Everything else is a kind of intermediate quantity.
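To make that concrete (toy example): a declaration like

parameters {
  real<lower=0> sigma;
}

means the sampler actually moves on the unconstrained variable
log(sigma), with the Jacobian adjustment added to the log density
automatically; transformed parameters, by contrast, are just
deterministic functions of the parameters evaluated along the way.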
We should put some more thought into all of this. One idea we toyed
with is marking variables as ones to save or not.
- Bob
> On Nov 8, 2015, at 11:23 AM, Tamas K. Papp <
notifi...@github.com> wrote:
>
> Responding to the mention on the mailing list. Since I opened the original issue, I have been thinking about this. I think that there are two, almost orthogonal, issues here:
>
> 1. what additions to the final log (posterior or likelihood) function are enabled for which mode (ML or Bayesian).
>
> 2. what variables are recorded. E.g., transformed parameters are. But sometimes it would make sense to save the posterior sample for some intermediate variables, or even to disable saving some parameters if one wants to make the resulting file smaller.
>
> (1) is about ~ (or equivalently incrementation) statements, while (2) is about variables.
>
> IMO Stan should have semantics for designating this information in a stan file.
>
> @bob-carpenter: I would rather dispense with the transformed parameters block altogether, and have everything in the model. The user would just declare variables, make transformations, and then designate that some variables are recorded, some aren't. By default all params would be recorded, and nothing else (except generated quantities). The user could change this for all variables, eg at the declaration:
>
> parameters {
>   real x discarded;
>   real y; // kept by default
> }
> model {
>   real z saved;
>   real v; // discarded by default
> }
>
> Blocks like
>
> jacobian_adj {
> }
> prior {
> }
>
> would designate the semantics of pieces of code.
>
> —
> Reply to this email directly or view it on GitHub.
>