Scope of "model" block variables and access to them subsequently

950 views
Skip to first unread message

Rudolf Cardinal

unread,
Mar 9, 2013, 10:11:55 PM3/9/13
to stan-...@googlegroups.com
Dear all,

I've just come across Stan and am very impressed; thanks to all those who have developed it!

I'm trying to use it for reinforcement learning models in which some information is stored locally (in the "model" block) and used to calculate things on a trial-by-trial, cumulative basis. Stan seems potentially ideal for this, given that you can do imperative coding within the "model" block.

The Stan manual (v1.2.0) says (under "Variable scope") that "The variables declared in each block have scope over all subsequent statements. Thus a variable declared in the transformed data block may be used in the model block. But a variable declared in the generated quantities block may not be used in any earlier block, including the model block."

[There's one qualification I've noticed: variables in an earlier block are read-only for later blocks (with the exception of parameter sampling statements, which I'm aware are not the same sort of thing as conventional assignments).]

However, variables declared in the "model" block (even at its top level) are not in scope in the "generated quantities" block. Accordingly, if I want to retrieve values of my calculated probabilities of choosing the better stimulus on each trial ("p_best" in the skeleton code below), I can't retrieve it from the "model" block, and if I declare it in an earlier block, I can't write to it from the "model" block -- so my only option is to recalculate it within the "generated quantities" block -- which is somewhat inefficient since I had to calculate it anyway as part of the model. (Not as inefficient as it could be, as I understand that the "generated quantities" block is executed less often than the "model" block -- but it's still noticeable.)

Could "model" variables be put in read-only scope for the "generated quantities" block (clearly, with the caveat that only variables declared in the "generated quantities" should be reported back to e.g. RStan or the final output, or there'd be too much junk coming back in many cases -- but a simple copy would suffice, e.g.: generated quantities { real my_copy; my_copy <- my_model_var; })?

Or is there some other obvious way of retrieving posterior distributions of things calculated within the model block that I've not seen?

Simplest code to reproduce is:

    model {
        real modelthing;
        modelthing <- 1.0;
    }
    generated quantities {
        real thing;
        thing <- modelthing;
    }

... generates error:
        thing <- modelthing;
                ^-- here
DIAGNOSTIC(S) FROM PARSER:
variable "modelthing" does not exist.

More realistic skeleton code below.

Many thanks!

all the best,
Rudolf.

    data {
        int<lower=0> T; // number of trials in total
        int<lower=0> S; // number of subjects
        int<lower=0> G; // number of groups
        int<lower=1,upper=G> group_membership[S]; // which group is each subject in?
        int<lower=1,upper=S> subject[T]; // which subject?
        int<lower=1> discrimination[T]; // which discrimination?
        int<lower=0,upper=1> chose_best[T]; // did the subject choose the best stimulus (0 no, 1 yes)?
        int<lower=0,upper=1> reward[T]; // was the trial rewarded?
        // ...
    }
    parameters {
        // Group level, assuming separate means and shared SD
        real<lower=0,upper=1> group_mean_reinf_rate[G];
        real<lower=0> group_sd_reinf_rate;
        // ...
        
        // Subject level
        real<lower=0,upper=1> subject_reinf_rate[S];
        // ...
    }
    transformed parameters {
        // Constants
        real softmax_b;

        softmax_b <- 1.0;        
    }
    model {
        // What we will optimize
        real p_best[T]; // calculated probability of choosing the best stimulus
        
        // Keeping track of the current subject/discrimination
        int s; // current subject
        real reinf_rate;
        // ...
        
        // Value tracking
        vector[2] stim_value; // worst, best
        // ...

        // Specify priors
        group_mean_reinf_rate ~ beta(1.1, 1.1);
        group_sd_reinf_rate ~ cauchy(0, 5);
        // ...
        
        // Determine subject parameters
        for (subj in 1:S) {
            int g;
            g <- group_membership[subj];
            subject_reinf_rate[subj] ~ normal(group_mean_reinf_rate[g], group_sd_reinf_rate);
            // ...
        }

        // Iterate through trials
        s <- 0; // invalid value
        for (t in 1:T) {
        
            // Starting a new subject? Read subject parameters and reset discrimination.
            if (subject[t] != s) {
                s <- subject[t];
                reinf_rate <- subject_reinf_rate[s];
                // ...
            }

            // ...
            
            // Calculate p
            overall_value <- stim_value; // + ...
            // ... calculate softmax
            s_exp_products <- exp( (overall_value * softmax_b) - mean(overall_value) );
            p_best[t] <- s_exp_products[2] / sum(s_exp_products);

            // Update
            stim_value[chosen_index[t]] <- ((1 - reinf_rate) * stim_value[chosen_index[t]]) + (reinf_rate * reward[t]);
        }
        
        // Optimize fit
        chose_best ~ bernoulli(p_best);
    }
    generated quantities {
        // Group mean differences (in all cases: group 1 - group 2 = sham - lesion)
        real gmd_reinf_rate;

        gmd_reinf_rate <- group_mean_reinf_rate[1] - group_mean_reinf_rate[2];
    }


Bob Carpenter

unread,
Mar 9, 2013, 10:22:21 PM3/9/13
to stan-...@googlegroups.com
Answers inline.

On 3/9/13 5:11 PM, Rudolf Cardinal wrote:
> Dear all,
>
> I've just come across Stan and am very impressed; thanks to all those who have developed it!
>
> I'm trying to use it for reinforcement learning models in which some information is stored locally (in the "model"
> block) and used to calculate things on a trial-by-trial, cumulative basis. Stan seems potentially ideal for this, given
> that you can do imperative coding within the "model" block.
>
> The Stan manual (v1.2.0) says (under "Variable scope") that "*The variables declared in each block have scope over all
> subsequent statements.* Thus a variable declared in the transformed data block may be used in the model block. But a
> variable declared in the generated quantities block may not be used in any earlier block, including the model block."
>
> [There's one qualification I've noticed: variables in an earlier block are read-only for later blocks (with the
> exception of parameter sampling statements, which I'm aware are not the same sort of thing as conventional assignments).]

Right -- variables can only be assigned to (show up on the
left side of an "<-") in the block in which they are declared.

> However, *variables declared in the "model" block (even at its top level) are not in scope in the "generated quantities"
> block.* Accordingly, if I want to retrieve values of my calculated probabilities of choosing the better stimulus on each
> trial ("p_best" in the skeleton code below), I can't retrieve it from the "model" block, and if I declare it in an
> earlier block, I can't write to it from the "model" block -- so my only option is to recalculate it within the
> "generated quantities" block -- which is somewhat inefficient since I had to calculate it anyway as part of the model.
> (Not as inefficient as it could be, as I understand that the "generated quantities" block is executed less often than
> the "model" block -- but it's still noticeable.)

I'll try to clarify in the manual.

The variables in the model block are all local variables, which only persist
in the block in which they are defined.

To define a variable you need in the model block, use the
transformed parameter block.

> *Could "model" variables be put in read-only scope for the "generated quantities" block* (clearly, with the caveat that
> only variables declared in the "generated quantities" should be reported back to e.g. RStan or the final output, or
> there'd be too much junk coming back in many cases -- but a simple copy would suffice, e.g.: *generated quantities {
> real my_copy; my_copy <- my_model_var; }*)?

The could be, but they're not going to be :-)

> Or is there some other obvious way of retrieving posterior distributions of things calculated within the model block
> that I've not seen?

Yes, put them in the transformed parameter block.

> Simplest code to reproduce is:
>
> model {
> real modelthing;
> modelthing <- 1.0;
> }
> generated quantities {
> real thing;
> thing <- modelthing;
> }
>
> ... generates error:
> thing <- modelthing;
> ^-- here
> DIAGNOSTIC(S) FROM PARSER:
> variable "modelthing" does not exist.
>
> More realistic skeleton code below.

In this case, you'd do this:

transformed parameters {
real modelthing;
modelthing <- 1.0;
}

Then modelthing is available in the model and will
also be printed.

If the variable being defined does not depend on any parameters (implicitly,
this includes other transformed parameters or local variables), as
in this example, then it's more efficient to define it in the transformed data block.

- Bob

Rudolf Cardinal

unread,
Mar 10, 2013, 5:21:46 PM3/10/13
to stan-...@googlegroups.com
Many thanks! With local blocks to be able to define integer locals within the "transformed parameters" block, that works perfectly. Thank you!
all the best,
Rudolf.
Reply all
Reply to author
Forward
0 new messages