working with rstan output


Daniel Hocking

Oct 17, 2012, 11:58:12 PM
to stan-...@googlegroups.com
I am currently going through Marc Kery's book, "Introduction to WinBUGS for Ecologists" and trying to translate all of the WinBUGS code into stan (using rstan). It's my way of learning stan so the current code might not be the most efficient (e.g. I'm starting out with BUGS-like loops instead of vectorized functions). My primary question is how best to use the posterior samples to calculate Bayesian p-values, posterior predictive distributions, and various goodness of fit metrics.

For example, in Chapter 8 there is a simple linear regression assuming normality. The R and WinBUGS (using R2WinBUGS) code is attached including data generation. In the model statement below, you can see that he generates new data and calculates some derived parameters within the WinBUGS code.

### 8.4.1. Fitting the model
# Write model
sink("linreg.txt")
cat("
model {
   
    # Priors
    alpha ~ dnorm(0,0.001)
    beta ~ dnorm(0,0.001)
    sigma ~ dunif(0, 100)
   
    # Likelihood
    for (i in 1:n) {
        y[i] ~ dnorm(mu[i], tau)
        mu[i] <- alpha + beta*x[i]
    }
   
    # Derived quantities
    tau <- 1/ (sigma * sigma)
    p.decline <- 1-step(beta)        # Probability of decline
   
    # Assess model fit using a sums-of-squares-type discrepancy
    for (i in 1:n) {
        residual[i] <- y[i]-mu[i]        # Residuals for observed data
        predicted[i] <- mu[i]            # Predicted values
        sq[i] <- pow(residual[i], 2)     # Squared residuals for observed data

        # Generate replicate data and compute fit stats for them
        y.new[i] ~ dnorm(mu[i], tau)     # one new data set at each MCMC iteration
        sq.new[i] <- pow(y.new[i]-predicted[i], 2)    # Squared residuals for new data
    }
    fit <- sum(sq[])            # Sum of squared residuals for actual data set
    fit.new <- sum(sq.new[])        # Sum of squared residuals for new data set
    test <- step(fit.new - fit)        # Test whether new data set more extreme
    bpvalue <- mean(test)            # Bayesian p-value
}
    "
,fill=TRUE)
sink
()


He then uses these new data to calculate and plot various assessments of model fit as such:


### 8.4.2. Goodness-of-Fit assessment in Bayesian analyses
plot(out$mean$predicted, out$mean$residual, main = "Residuals vs. predicted values",
     las = 1, xlab = "Predicted values", ylab = "Residuals")
abline(h = 0)

lim <- c(0, 3200)
plot(out$sims.list$fit, out$sims.list$fit.new, main = "Graphical posterior predictive check",
     las = 1, xlab = "SSQ for actual data set", ylab = "SSQ for ideal (new) data sets", xlim = lim, ylim = lim)
abline(0, 1)

mean(out$sims.list$fit.new > out$sims.list$fit)    # Bayesian p-value


### 8.4.3. Forming predictions
predictions <- array(dim = c(length(x), length(out$sims.list$alpha)))
for(i in 1:length(x)){
  predictions[i,] <- out$sims.list$alpha + out$sims.list$beta*i
}
LPB <- apply(predictions, 1, quantile, probs = 0.025) # Lower bound
UPB <- apply(predictions, 1, quantile, probs = 0.975) # Upper bound



I am wondering the best way to calculate these same parameters and how to derive the new data using rstan? I have the following rstan code that runs the model but I am not sure how to best generate the new data and extract the parameter estimates from the stan model fit to calculate the fit metrics. I'm not sure how much would be easier to do within stan (generated quantities) or back in R (I am functionally literate in R and illiterate in C++). Any suggestions would be appreciated. I hope this question is not too vague and that I didn't dump too much code in this post.

# for 64 bits RStudio
Sys.setenv(R_ARCH = '/x86_64')
library(inline)
library(Rcpp)
library(rstan)
file.path(R.home(component = 'etc'), .Platform$r_arch, 'Makeconf')
set_cppo(mode = "fast")

m8_4_stan <- '
data{
  int n;
  real y[n];
  real x[n];
}

parameters{
  real alpha;
  real beta;
  real<lower=0, upper=100> sigma;
}

transformed parameters{
  real mu[n];
  for(i in 1:n){
    mu[i] <- alpha + beta*x[i];
  }
}

model{
  // Priors
  alpha ~ normal(0, 100);
  beta ~ normal(0, 100);
  sigma ~ uniform(0, 100);

  // Likelihood
  for(i in 1:n){
    y[i] ~ normal(mu[i], sigma);
  }
}

generated quantities{
  real tau;
  real residual[n];
  real predicted[n];
  real sq[n];
  //real y_new[n];
  //real sq_new[n];

  // Assess model fit using a sums-of-squares-type discrepancy
  for(i in 1:n){
    residual[i] <- y[i] - mu[i];
    predicted[i] <- mu[i];
    sq[i] <- pow(residual[i], 2); // Squared residuals for observed data
    // Generate replicate data and compute fit stats for them - how to do in Stan?
    //y_new[i] ~ normal(mu[i], sigma); // One new data set at each MCMC iteration
    //sq_new[i] <- pow(y_new[i] - predicted[i], 2); // Squared residuals for new data
  }
  tau <- 1/(sigma*sigma);
  // compute fit statistics back in R
}
'


m8_4_data <- list(n = n, x = x, y = y)

system.time(m8_4_stanfit <- stan(model_code = m8_4_stan,
                                 data = m8_4_data,
                                 iter = 10000,
                                 chains = 3,
                                 thin = 1,
                                 warmup = 5000)) # 30.7, 2.6, 36.3

print(m8_4_stanfit, dig = 3)

# Extract estimates
mu.est <- mean(extract(m8_4_stanfit, pars = 'mu', inc_warmup = FALSE))
sigma.est <- mean(extract(m8_4_stanfit, pars = 'sigma', inc_warmup = FALSE))
residual.est <- (extract(m8_4_stanfit, pars = 'residual', inc_warmup = FALSE))

mu_est
for(i in 1:n){
  mu_est[i] <- mean(extract(m8_4_stanfit, pars = 'mu[i]', inc_warmup = FALSE)) # doesn't work - can't index mu
}
mu.1 <- (extract(m8_4_stanfit, pars = 'mu[1]', inc_warmup = FALSE))

# Generate new dataset from MCMC estimates
y.new <- rnorm(n, mu.1, sigma.est) # not sure how to do this for all values of mu (should be n = 16)
sq.new <- pow(Resid, 2)







Ben Goodrich

Oct 18, 2012, 2:11:06 AM
to stan-...@googlegroups.com
On Wednesday, October 17, 2012 11:58:12 PM UTC-4, Daniel Hocking wrote:
I am currently going through Marc Kery's book, "Introduction to WinBUGS for Ecologists" and trying to translate all of the WinBUGS code into stan (using rstan).

You are the second person to mention that book recently. Maybe you can get together with Shige on stan-users. When you are done, it would be great if you could post your translations somewhere or email them to us or maybe even include them in a future version of Stan.
 
I am wondering the best way to calculate these same parameters and how to derive the new data using rstan? I have the following rstan code that runs the model but I am not sure how to best generate the new data and extract the parameter estimates from the stan model fit to calculate the fit metrics. I'm not sure how much would be easier to do within stan (generated quantities) or back in R (I am functionally literate in R and illiterate in C++). Any suggestions would be appreciated. I hope this question is not too vague and that I didn't dump too much code in this post.

No problem, although we would prefer code as attachments unless you are trying to focus on a few lines of code. We always planned for Stan to support the following
 
generated quantities{
  real tau;
  real residual[n];
  real predicted[n];
  real sq[n];
  real y_new[n];
  real sq_new[n];

  // Assess model fit using a sums of squares type discrepancy
  for(i in 1:n){
    residual[i] <- y[i] - mu[i];
    predicted[i] <- mu[i];
    sq[i] <- pow(residual[i], 2); // Squared residuals for observed data
    // Generate replicate data and compute fit stats for them
    y_new[i] ~ normal_rand(mu[i], sigma); // One new data set at each MCMC iteration DOES NOT ACTUALLY WORK YET
    sq_new[i] <- pow(y_new[i] - predicted[i], 2); // Squared residuals for new data
  }
  tau <- 1/(sigma*sigma);
  // compute fit statistics back in R
}


but we had to take support for things like y_new[i] ~ normal_rand(mu[i], sigma); out of Stan after v1.0.0 for some reason that I don't remember. Look for it to return soon, but probably not in v1.0.3.

So, you would have to do it in R for now. If you replace these lines

mu_est
for (i in 1:n) {
  mu_est[i] <- mean(extract(m8_4_stanfit, pars = "mu[i]", inc_warmup = FALSE))
}

with

mu_est <- rep(NA, n)
for (i in 1:n) {
  mu_est[i] <- mean(extract(m8_4_stanfit, pars = paste0("mu[", i, "]"), inc_warmup = FALSE))
}

it would work. But that only gives you Bayesian point estimates.

If you want to average over the whole posterior, I would work with the whole array.

posterior <- extract(m8_4_stanfit, permuted = FALSE, inc_warmup = FALSE)
dim(posterior) # 5000 iterations, by 3 chains, by 20 parameters / transformed parameters / generated quantities
dimnames(posterior)[[3]]
 [1] "alpha"  "beta"   "sigma"  "mu[1]"  "mu[2]"  "mu[3]"  "mu[4]"  "mu[5]"  "mu[6]"  "mu[7]"  "mu[8]"  "mu[9]"
[13] "mu[10]" "mu[11]" "mu[12]" "mu[13]" "mu[14]" "mu[15]" "mu[16]" "lp__"  

y_rep <- apply(posterior, MARGIN = 1:2, FUN = function(draw) {
  rnorm(n, mean = draw[grepl("^mu", names(draw))], sd = draw["sigma"])
})
dim(y_rep) # 16 replicates by 5000 iterations by 3 chains
error_rep <- y - y_rep # same dimensions
mean(error_rep^2) # somewhat greater than sigma^2

and so forth.
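Carrying that through to Kery's fit and fit.new discrepancies works the same way. Here is a base-R sketch with simulated stand-in draws (the array mu_rep, the seed, and the constants are invented for illustration, not output from the actual fit):

```r
# Stand-ins for posterior draws (invented; a real run would use the extracted arrays)
set.seed(123)
n <- 16; n_iter <- 5000; n_chains <- 3
x <- 1:n
y <- 40 - 1.6 * x + rnorm(n, 0, 5)  # fake "observed" data

# Fake draws of mu, shaped 16 observations x 5000 iterations x 3 chains
mu_rep <- array(rep(40 - 1.6 * x, n_iter * n_chains), dim = c(n, n_iter, n_chains))

# One replicate data set per draw, as in the apply() call above
y_rep <- array(rnorm(length(mu_rep), mean = mu_rep, sd = 5), dim = dim(mu_rep))

# Per-draw discrepancies: sum the squared residuals over the 16 observations
fit     <- apply((y - mu_rep)^2,     MARGIN = 2:3, FUN = sum)  # observed SSQ per draw
fit_new <- apply((y_rep - mu_rep)^2, MARGIN = 2:3, FUN = sum)  # replicate SSQ per draw

bpvalue <- mean(fit_new > fit)  # Bayesian p-value across all draws and chains
```

Note that the observed discrepancy compares y to mu for each draw, and the replicate discrepancy compares y_rep to that same mu.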

Ben

Bob Carpenter

Oct 18, 2012, 4:35:51 AM
to stan-...@googlegroups.com

On 10/17/12 11:58 PM, Daniel Hocking wrote:

> I am currently going through Marc Kery's book, "Introduction to WinBUGS for Ecologists"
> and trying to translate all of
> the WinBUGS code into stan (using rstan).

Fantastic. I hope you can share it on a web site
or something -- we can certainly supply space and
distribution for models.

Also, if you have time, we'd love to hear about what
you found hard or easy to use, and what features you'd
like to see. We learned a whole lot when we coded the
BUGS examples, but we weren't trying to do posterior
inference, just ensure we got the same results as JAGS
did.

> It's my way of learning stan so the current code might not be the most
> efficient (e.g. I'm starting out with BUGS-like loops instead of vectorized functions).

Starting with code that works and then making it
more efficient is MUCH easier than the other way
around.

Vectorizing is pretty easy to do after the fact, too.

> My primary question is how best
> to use the posterior samples to calculate Bayesian p-values, posterior predictive distributions, and various goodness of
> fit metrics.

You can calculate posterior inferences based on
parameters and data in the generated quantities block
from within Stan or you can calculate in R with
the results. RStan is set up just like R2WinBUGS
and R2jags, so examples of posterior analysis using
those packages will transfer directly to RStan.

The advantage of doing it within Stan is that you can
define new random variables based on parameters and
treat them like their natural types. If you do the
work back in R, you have to iterate over the vector
yourself (though R makes that pretty easy in many cases).

You can define new variables in the generated quantities
block the way you'd define deterministic nodes in BUGS.
But if you need them in the model, they have to be
transformed parameters. Both generated quantities and
transformed parameters get printed.

(The reason to use generated quantities instead of transformed
parameters is that it's much more efficient -- the values are double
data types instead of auto-dif for gradients
at that point and we only compute generated quantities once per
iteration, not once per leapfrog step. It's also clearer about
what's part of the model and what's posterior/transform analysis.)
Derived quantities that are not used in the model
should go in the generated quantities block. But for transforms
of parameters like tau (precision) that the model does need,
you have to use a transformed parameter in Stan. Of course, in Stan
you don't need the precision, so it could be moved out
anyway if you even care about it.

> He then uses these new data to calculate and plot various assessments of model fit as such:
>
>
> ### 8.4.2. Goodness-of-Fit assessment in Bayesian analyses
> plot(out$mean$predicted, out$mean$residual, main = "Residuals vs. predicted values",
> las = 1, xlab = "Predicted values", ylab = "Residuals")
> abline(h = 0)
>
> lim <- c(0, 3200)
> plot(out$sims.list$fit, out$sims.list$fit.new, main = "Graphical posterior predictive check",
> las = 1, xlab = "SSQ for actual data set", ylab = "SSQ for ideal (new) data sets", xlim = lim, ylim = lim)
> abline(0, 1)
>
> mean(out$sims.list$fit.new > out$sims.list$fit) # Bayesian p-value
>
>
> ### 8.4.3. Forming predictions
> predictions <- array(dim = c(length(x), length(out$sims.list$alpha)))
> for(i in 1:length(x)){
> predictions[i,] <- out$sims.list$alpha + out$sims.list$beta*i
> }
> LPB <- apply(predictions, 1, quantile, probs = 0.025) # Lower bound
> UPB <- apply(predictions, 1, quantile, probs = 0.975) # Upper bound

Anything collective like LPB and UPB needs to be calculated
back in R because everything in Stan works on a draw-by-draw
basis.

For instance, even posterior means need to be calculated back in R.

So I guess the basic rule is:

-- If it's a draw-by-draw operation on parameters/data, calculate
in generated quantities

-- If it's an operation over the entire sample of many draws,
it has to be done back in R
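As a toy illustration of that rule (invented draws, not from any fit): a per-draw indicator like Kery's p.decline could live in generated quantities, while its posterior mean and interval are cross-draw summaries that belong in R.

```r
set.seed(1)
beta_draws <- rnorm(5000, mean = -1.5, sd = 0.3)  # pretend posterior draws of beta

# Draw-by-draw: one value per draw, so it could be a generated quantity
p_decline_draw <- as.numeric(beta_draws < 0)

# Cross-draw: collapses over the whole sample, so it has to be done in R
p_decline <- mean(p_decline_draw)                    # Pr(beta < 0)
ci <- quantile(beta_draws, probs = c(0.025, 0.975))  # 95% posterior interval
```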

> I am wondering the best way to calculate these same parameters and how to derive the new data using rstan? I have the
> following rstan code that runs the model but I am not sure how to best generate the new data and extract the parameter
> estimates from the stan model fit to calculate the fit metrics. I'm not sure how much would be easier to do within stan
> (generated quantities) or back in R (I am functionally literate in R and illiterate in C++).

You shouldn't need C++ for this.

> Any suggestions would be
> appreciated. I hope this question is not too vague and that I didn't dump too much code in this post.

More code is better than less. As long as it's set off,
it's easy to skim. If it's not there, I just wind up
guessing.

- Bob

Bob Carpenter

Oct 18, 2012, 4:49:01 AM
to stan-...@googlegroups.com


On 10/18/12 2:11 AM, Ben Goodrich wrote:
> On Wednesday, October 17, 2012 11:58:12 PM UTC-4, Daniel Hocking wrote:
...
> No problem, although we would prefer code as attachments
> unless you are trying to focus on a few lines of code.

I'd prefer code in the message body if it's not something I need
to run and attachments if it's something I'm going to try to compile.
The goal in both cases is to minimize fiddling on my part.

> We
> always planned for Stan to support the following
...
> but we had to take support for things like y_new[i] ~ normal_rand(mu[i], sigma); out of Stan after v1.0.0 for some
> reason that I don't remember.

The reason was that it was in the doc but
never implemented! I have the framework for
the implementation now, but when that's done, we
still need to integrate the samplers for all the
different distributions. The technical hurdle is
getting the seeded RNG down to the generated quantities
block. The compilation and code generation will be
simple.

> Look for it to return soon, but probably not in v1.0.3 .

Not 1.0.3, but it's near the top of my to-do list.

For now, you have to define things like the above as
parameters -- when we have random number generators
in generated quantities it'll be much much more
efficient to do them there.

Or you could do it in R ...

> So, you would have to do it In R for now. If you replace these lines

- Bob

Richard McElreath

Oct 18, 2012, 11:33:48 AM
to stan-...@googlegroups.com
I do a lot of simulating from the posterior, and what has simplified a lot of my R code for doing that is to convert the samples in the rstan object into a list of arrays. This is useful, because a matrix of varying effects samples could have dimension (200,9,10000), i.e. 200 clusters, 9 effects, and 10k samples. Simulating across these samples is inconvenient with rstan results, because the samples are stored in a "flat" format in which every parameter is a separate vector, not an element of an array. So it's hard to use apply or even merely loop over parameter indices. I got spoiled by mcarray output from rjags, perhaps.

Here's the function I use to make rstan samples into a list of arrays:

stripstan <- function( object , pars=NULL , permuted=TRUE , ... ) {
    result <- list()
    if ( is.null(pars) )
        pseq <- 1:length(object@par_dims)
    else
        pseq <- pars
    for ( i in pseq ) {
        name <- names(object@par_dims)[i]
        result[[ name ]] <- extract( object , name , permuted=permuted , ... )[[1]]
    }
    result
}

That will produce a list of arrays, so I can then use apply to marginalize or for to loop over parameters as needed, using vectorized code to apply calculations to all samples.

This isn't a direct answer to your question, but it's a bit of infrastructure that has saved me a lot of awkward extract() fiddling.
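For instance, with a toy stand-in for the kind of list stripstan() produces (the names and dimensions here are invented for illustration, not real rstan output):

```r
set.seed(42)
post <- list(
  a     = array(rnorm(200 * 9 * 1000), dim = c(200, 9, 1000)),  # 200 clusters x 9 effects x 1000 samples
  sigma = rnorm(1000, 5, 0.5)
)

# With the array structure intact, marginalizing over samples is one apply() call
a_means <- apply(post$a, MARGIN = c(1, 2), FUN = mean)  # 200 x 9 matrix of posterior means
```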

Daniel Hocking

Oct 18, 2012, 12:18:33 PM
to stan-...@googlegroups.com
Thank you Ben and Bob, the general discussion is very helpful. I will be happy to share my rstan code for Kery's WinBUGS examples, although they may be better in combination with Shige since I am still quite a novice in R as well as BUGS and Stan.

I would also be happy to share my experiences to help you improve Stan and rstan for end users like myself. I'd imagine it's quite hard to foresee the challenges that users will have, because the inner workings make sense when you write the code (or even have the ability to write programs like this). I think the Stan manual is quite good. The only suggestion I can think of would be to include (perhaps as a separate quick-start guide) a skeleton of the different parts of Stan code like:

# for 64 bits RStudio
Sys.setenv(R_ARCH = '/x86_64')
library(inline)
library(Rcpp)
library(rstan)
file.path(R.home(component = 'etc'), .Platform$r_arch, 'Makeconf')
set_cppo(mode = "fast")


model_code <- '
data{
// Data that will be imported for use in the model should be defined here
}

parameters{
// parameters that will be used in the model should be defined here
// These will be saved and printed with the output
}

transformed parameters{
// Generation of derived parameters from parameters above should be defined here
// These will be saved and printed with the output
}

model{
// This is area for describing the model including the priors and likelihood
// This is the only place where sampling can be done within Stan
// All data and parameters used in this section must be defined above
}

generated quantities{
// This section is for derived parameters not used in the model
// If a quantity is not used in the model it is more efficient to include it here rather than in transformed parameters
// These values will be saved and printed along with parameters and transformed parameters
}
'


stan_data <- list()

stanfit <- stan(file, model_name = "anon_model", model_code = "",
     fit = NA, data = list(), pars = NA, chains = 4,
     iter = 2000, warmup = floor(iter/2), thin = 1,
     init = "random", seed = sample.int(.Machine$integer.max, 1),
     sample_file, save_dso = TRUE,
     verbose = FALSE, ...,
     boost_lib = NULL,
     eigen_lib = NULL)

print(stanfit)

posterior <- extract(stanfit)

It took a while for me to find all of this information when first wanting to use rstan, and I think some information about what goes in each section, like you described in your initial response to me, would be helpful for people starting out. Also, in the help file associated with ?rstan there is no list of functions such as extract. I'm not even sure where I ended up finding out that there was an extract function, but it wasn't obvious.

There seems to be sufficient information for running all the models I've tried, but more information on post-processing would be very helpful. The only other thing that has been confusing is lp__. There is a little about it in the manual but more information would be useful.

Back to my example: I may have been generous when I claimed functional literacy in R, because I am not sure I fully grasp your method of extracting and using the posterior samples:


y_rep <- apply(posterior, MARGIN = 1:2, FUN = function(draw) {
  rnorm(n, mean = draw[grepl("^mu", names(draw))], sd = draw["sigma"])
})

I can understand what it does in the end but I can't figure it out enough to use it myself in other circumstances. For example, how would I go about creating a vector of the mean mu values if I wanted to make plots similar to Kery:


### 8.4.2. Goodness-of-Fit assessment in Bayesian analyses
plot(out$mean$predicted, out$mean$residual, main = "Residuals vs. predicted values",
     las = 1, xlab = "Predicted values", ylab = "Residuals")
abline(h = 0)

lim <- c(0, 3200)
plot(out$sims.list$fit, out$sims.list$fit.new, main = "Graphical posterior predictive check",
     las = 1, xlab = "SSQ for actual data set", ylab = "SSQ for ideal (new) data sets", xlim = lim, ylim = lim)
abline(0, 1)

mean(out$sims.list$fit.new > out$sims.list$fit) # Bayesian p-value

The current form of the extract function makes it difficult to manipulate and handle the posteriors for post-processing, at least for novices such as myself.

Thanks again!
Dan

Daniel Hocking

Oct 18, 2012, 12:20:25 PM
to stan-...@googlegroups.com
Thank you, I think this will really help with what I'm trying to do!

Ben Goodrich

Oct 18, 2012, 12:42:00 PM
to stan-...@googlegroups.com
On Thursday, October 18, 2012 12:18:33 PM UTC-4, Daniel Hocking wrote:
Thank you Ben and Bob, the general discussion is very helpful. I will be happy to share my rstan code for Kery's WinBUGS examples, although they may be better in combination with Shige since I am still quite a novice in R as well as BUGS and Stan.

I would also be happy to share my experiences to help you improve Stan and rstan for end users like myself.

Thanks. We'll probably adjust the manual along the lines of what you suggested.
 
There seems to be sufficient information for running all the models I've tried but more information for post-processing would be very helpful. The only other thing that I can think of that has been confusing is lp__. There is a little about it in the manual but more information would be useful.

It is just the value of the log-posterior evaluated at the realization of the parameters.
 
Back to my example, I may have been generous when I claimed functional literacy in R because I am not sure I fully grasp your method of extracting and using the posterior samples


y_rep
<- apply(posterior, MARGIN = 1:2, FUN = function(draw) {
  rnorm
(n, mean = draw[grepl("^mu", names(draw))], sd = draw["sigma"])
})

I can understand what it does in the end but I can't figure it out enough to use it myself in other circumstances. For example, how would I go about creating a vector of the mean mu values if I wanted to make plots similar to Kery:

I can't actually make those plots because I don't have WinBUGS, but I can guess what they look like. Richard had a nice suggestion about using lists of arrays. In your case, I think a big array is fine, but you just need to know how to manipulate it.

If you wanted a vector of 16 values where the i-th element was mu[i] averaged over the 5000 iterations from each of 3 chains, I would do

mu_means <- apply(posterior[,,grepl("^mu",dimnames(posterior)[[3]])], 3, mean)

In your case, posterior is 5000 iterations by 3 chains by 69 parameters. So, grepl("^mu", dimnames(posterior)[[3]]) extracts the parameters whose first two letters are mu, leaving a 5000 by 3 by 16 array. The apply() command applies the specified function to the specified dimension of the resulting array, in this case the mean function to the third dimension. Thus, mu_means is

> as.matrix(mu_means)
           [,1]
mu[1]  41.98327
mu[2]  40.31490
mu[3]  38.64653
mu[4]  36.97816
mu[5]  35.30979
mu[6]  33.64142
mu[7]  31.97306
mu[8]  30.30469
mu[9]  28.63632
mu[10] 26.96795
mu[11] 25.29958
mu[12] 23.63121
mu[13] 21.96284
mu[14] 20.29447
mu[15] 18.62611
mu[16] 16.95774
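The same apply() pattern can be tried on a small invented array first to see what each MARGIN does (toy numbers, not real output):

```r
arr <- array(1:24, dim = c(2, 3, 4))  # say, iterations x chains x parameters

apply(arr, MARGIN = 3, FUN = mean)  # one value per "parameter": a length-4 vector
apply(arr, MARGIN = 1:2, FUN = sum) # one value per iteration/chain pair: a 2 x 3 matrix
```

MARGIN names the dimensions you keep; the function collapses the rest.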

Ben
 

Bob Carpenter

Oct 18, 2012, 12:48:01 PM
to stan-...@googlegroups.com


On 10/18/12 12:42 PM, Ben Goodrich wrote:
> On Thursday, October 18, 2012 12:18:33 PM UTC-4, Daniel Hocking wrote:

> There seems to be sufficient information for running all the models I've tried but more information for post
> processing would be very helpful. The only other thing that I can think of that has been confusing is lp_. There is
> a little about it in the manual but more information would be useful.
>
>
> It is just the value of the log-posterior evaluated at the realization of the parameters.

In the output, lp__ is the accumulated total unnormalized
log probability. By Bayes's rule, with theta params and
y data, we have

p(theta|y) ∝ p(theta,y) ∝ exp(lp__)

At any given point in the program's execution, the value
of lp__ is the log probability accumulated up to that point
in the program's execution. That's why you can add to it.

Writing

y ~ normal(mu,sigma);

is really just syntactic sugar for

lp__ <- lp__ + normal_log(y,mu,sigma);
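To make the accumulation concrete, here is the R analogue with toy numbers (not from this model); note that for sampling statements Stan may drop additive constants, so only the terms involving parameters are guaranteed to match:

```r
y     <- c(1.2, -0.5, 0.3)
mu    <- 0
sigma <- 1

# The increment a normal sampling statement contributes to lp__ (up to a constant)
lp_increment <- sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
```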

- Bob


Bob Carpenter

Oct 18, 2012, 1:14:15 PM
to stan-...@googlegroups.com
Should this go in what the R folks call a "vignette"?

I'm thinking something like a sequence of
hands-on case studies aimed at a user who's
not super-expert at R and hasn't necessarily
used R2WinBUGS or R2jags. It'd give people
basic project skeletons to start with and
some idea of how to organize their code, data,
model files, etc.

If we started very gently and did a halfway decent
job of this, we could probably publish it as a textbook
with a name such as "Practical Bayesian Inference
with Stan and R". The manual we already have is partway
down that path already.

I'll add more to the manual section "Overview of
Stan's Program Blocks". That section of the manual
is more about what Stan does than how you use it.
I tried to explain that in the "Statistical
Variable Taxonomy" section, but I agree that something
like what Daniel suggests is a good addition.

I'd be happy to help with the structure, but
we'd have to design whether to go with Ben's
expert-level code, something simpler and more
loop oriented, or both. I'd love to see both,
with the simple version first, then the
bind-and-apply versions, because it'd help me
learn R better.

The list of RStan functions is available by following
the "Index" link at the bottom of the rstan help
page. I had to hunt for that myself, still being a
novice R user.

- Bob

Ben Goodrich

Oct 18, 2012, 1:59:05 PM
to stan-...@googlegroups.com
On Thursday, October 18, 2012 1:14:18 PM UTC-4, Bob Carpenter wrote:
Should this go in what the R folks call a "vignette"?

Yes, but we should implement foo_rand() inside Stan first.

I'd be happy to help with the structure, but
we'd have to design whether to go with Ben's
expert-level code, something simpler and more
loop oriented, or both.  I'd love to see both,
with the simple version first, then the
bind-and-apply versions, because it'd help me
learn R better.

We can do both, but what I think would be most useful would be a function that hides the details of apply(). Like

stan_apply <- 
  function(fit, dimensions = c("parameters", "chains","iterations"),
           pars = fit@sim$pars_oi, inc_warmup = FALSE, FUN, ...) {
    
    dimensions <- match.arg(dimensions, several.ok = TRUE)
    if(length(dimensions) == 3) {
      stop("the length of 'dimensions' must be either 1 or 2")
    }
    MARGIN <- sapply(dimensions, switch, iterations = 1L,
                     chains = 2L, parameters = 3L)
    return(apply(extract(fit, pars = pars, permuted = FALSE, inc_warmup = inc_warmup),
                 MARGIN = MARGIN, FUN = FUN, ...))
  }

As in

> oxford <- stan_demo("oxford")
> IQRs <- stan_apply(oxford, "parameters", FUN = IQR)
> length(IQRs)
[1] 246
> head(IQRs)
    mu[1]     mu[2]     mu[3]     mu[4]     mu[5]     mu[6] 
0.9217132 0.6696871 0.7633586 0.6213006 0.5682908 1.1354033 
> quantiles <- stan_apply(oxford, "parameters", FUN = quantile, probs = c(.25, .75))
> dim(quantiles)
[1]   2 246
> diff(quantiles[,1:6])
        mu[1]     mu[2]     mu[3]     mu[4]     mu[5]    mu[6]
75% 0.9217132 0.6696871 0.7633586 0.6213006 0.5682908 1.135403


 
The list of RStan functions is available by following
the "Index" link at the bottom of the rstan help
page.  I had to hunt for that myself, still being a
novice R user.

That is actually an index of everything, including class definitions and things that might not be of interest. To get a list of just the functions that rstan exports, execute

> ls("package:rstan")
 [1] "extract"             "get_adaptation_info" "get_cppcode"         "get_cppo"            "get_cppo_mode"      
 [6] "get_cxxflags"        "get_inits"           "get_logposterior"    "get_sampler_params"  "get_seed"           
[11] "get_seeds"           "get_stancode"        "get_stanmodel"       "makeconf_path"       "plot"               
[16] "print"               "read_rdump"          "rstan_options"       "sampling"            "set_cppo"           
[21] "sflist2stanfit"      "show"                "stan"                "stanc"               "stan_demo"          
[26] "stan_model"          "stan_rdump"          "stan_version"        "summary"             "traceplot"          

Ben

Daniel Hocking

Oct 19, 2012, 10:06:20 PM
to stan-...@googlegroups.com
Thanks everyone for the help with processing rstan output. This has been extremely helpful for learning stan, plus I learned some new useful R functions!

I have finished translating the BUGS example I was working on into Stan but I get a different result for the derived sum of squared residuals. I seem to get similar parameter estimates for the BUGS and Stan models so I think this is another post-processing issue (sorry if this isn't really what this forum is intended for).

In WinBUGS/JAGS the predictive and realized sums-of-squares discrepancies are calculated in WinBUGS, while I am doing it in R from Stan output (because new derived data can't be generated in Stan as noted in the earlier post). In WinBUGS the code is:

    # Assess model fit using a sums-of-squares-type discrepancy
    for (i in 1:n) {
        residual[i] <- y[i]-mu[i]        # Residuals for observed data
        predicted[i] <- mu[i]            # Predicted values
        sq[i] <- pow(residual[i], 2)     # Squared residuals for observed data

        # Generate replicate data and compute fit stats for them
        y.new[i] ~ dnorm(mu[i], tau)     # one new data set at each MCMC iteration
        sq.new[i] <- pow(y.new[i]-predicted[i], 2)    # Squared residuals for new data
    }
    fit <- sum(sq[])                # Sum of squared residuals for actual data set
    fit.new <- sum(sq.new[])        # Sum of squared residuals for new data set
    test <- step(fit.new - fit)     # Test whether new data set more extreme
    bpvalue <- mean(test)           # Bayesian p-value

From the rstan output I tried to calculate the new data and SS residuals using the following code in R:

# Extract entire posterior (for all parameters) - 3 chains
posterior <- extract(m8_4_stanfit, inc_warmup = FALSE, permuted = FALSE)

# Generate new data
y_rep <- apply(posterior, MARGIN = 1:2, FUN = function(draw) {
  rnorm(n, mean = draw[grepl("^mu", names(draw))], sd = draw["sigma"])
})

dim(y_rep) # 16 replicates by 5000 iterations by 3 chains
error_rep <- y - y_rep # y = 16 replications
mean(error_rep^2) # far greater than sigma2 (92 vs true = 25) - also differs from estimated sigma2 of ~40

# alternative way to extract mu MCMC estimates
mu_rep <- apply(posterior, MARGIN = 1:2, FUN = function(draw) {
  mu1 = draw[grepl("^mu", names(draw))]
})

# Fit statistics that are done within WinBUGS
residual <- y - y_rep
sq <- residual^2
sq.new <- (y_rep - mu_rep)^2

for(i in 1:5000){
  fit[i] <- sum(sq[ , i, 1]) # Not sure how to get this to loop right for 3 chains - might be easier with permuted extract function
}

for(i in 1:5000) fit.new[i] <- sum(sq.new[, i, 1])

# Posterior predictive distributions and Bayesian p-values
test <- ifelse(test = (fit.new - fit) > 0, yes = 1, no = 0) # Test whether new data set more extreme
(bpvalue <- mean(test)) # Bayesian p-value = 0.0358 indicates poor fit (should be ~0.5)

plot(fit, fit.new)
abline(0, 1) # Also shows poor fit - bias of idealized (new) fit data being lower than fit data. The SSQ are also MUCH larger than those found in WinBUGS. Seems like I did something incorrectly since the summary output was the same for WinBUGS and Stan.



I'm not sure why, but my residuals are much larger using my Stan estimates than with WinBUGS/JAGS, and they are biased, resulting in a poor fit as indicated by the posterior predictive check (plot) and the Bayesian p-value. Does anyone see anything I did incorrectly in my post-processing R code?

Thanks for any suggestions,
Dan

P.S. The full R files for the Stan and JAGS code would be attached but I get "An error (#340) occurred while communicating with the server" and I can't post on the google group with an attachment.


Ben Goodrich

Oct 19, 2012, 11:17:54 PM10/19/12
to stan-...@googlegroups.com
On Friday, October 19, 2012 10:06:20 PM UTC-4, Daniel Hocking wrote:
P.S. The full R files for the Stan and JAGS code would be attached but I get "An error (#340) occurred while communicating with the server" and I can't post on the google group with an attachment.

Google Groups (sometimes) objects to attachments with obscure extensions like .stan and .R, which is annoying. If you make your text files have a .txt extension, then it works.

Ben

Daniel Hocking

Oct 19, 2012, 11:55:40 PM10/19/12
to stan-...@googlegroups.com
Thanks Ben - the files are attached now as text documents (1 JAGS, 1 Stan).
Chp8_stan.txt
Chp8_JAGS.txt

Ben Goodrich

Oct 20, 2012, 4:58:16 AM10/20/12
to stan-...@googlegroups.com
Hi Daniel,


On Friday, October 19, 2012 10:06:20 PM UTC-4, Daniel Hocking wrote:

residual <- y - y_rep

This is the error. To correspond to your WinBUGS code it should be

residual <- y - mu_rep # although mu_rep is not really a "replicate" of anything

From this point, you can just do

sq <- residual^2
sq.new <- (y_rep - mu_rep)^2

dim(sq) # 16 observations by 5000 iterations by 3 chains
fit <- apply(sq, 2:3, sum) # yields a 5000 by 3 matrix
fit.new <- apply(sq.new, 2:3, sum)
bpvalue <- mean(fit.new > fit) # I got 0.085
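
For completeness, the same quantities can also be computed from the permuted extract, which pools the chains into one matrix of draws and sidesteps the per-chain indexing. This is only a sketch: it assumes the fitted object is named m8_4_stanfit with a length-n parameter mu and a scalar sigma, and that the observed data vector y is in the workspace.

```r
## Sketch (assumes m8_4_stanfit, mu of length n, sigma, and observed y exist).
## extract(..., permuted = TRUE) pools all chains into one set of draws.
post <- rstan::extract(m8_4_stanfit, permuted = TRUE)
mu_draws    <- post$mu      # draws x n matrix
sigma_draws <- post$sigma   # vector, one sigma per draw

## Replicate data: column-major recycling pairs each row (draw) of mu_draws
## with its own sigma, so each replicate uses that iteration's parameters.
y_rep <- matrix(rnorm(length(mu_draws), mean = mu_draws, sd = sigma_draws),
                nrow = nrow(mu_draws))

fit     <- rowSums(sweep(mu_draws, 2, y)^2)  # observed discrepancy per draw
fit_new <- rowSums((y_rep - mu_draws)^2)     # replicated discrepancy per draw
bpvalue <- mean(fit_new > fit)               # Bayesian p-value
```

This should agree (up to Monte Carlo error in y_rep) with the per-chain apply() version above.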

Ben

Daniel Hocking

Oct 20, 2012, 9:40:28 AM10/20/12
to stan-...@googlegroups.com
That's embarrassing, I was so worried about it being a problem with extracting the right information that I missed a simple mistake. Thanks again for all your help on this!
-Dan

Jiqiang Guo

Oct 20, 2012, 9:57:41 PM10/20/12
to stan-...@googlegroups.com
A couple of questions regarding this function:

1. this function would not work when permuted = FALSE, as in the latter case the returned object is not a list but an array.
2. when permuted = TRUE, this function seems to do just what extract(..., permuted = TRUE) does.

--
Jiqiang

Daniel Hocking

Oct 21, 2012, 12:10:30 AM10/21/12
to stan-...@googlegroups.com
I have another question about working with rstan output. Shige's question about Stan's trouble with large datasets made me want to try some examples (using a similar version of the code he had, I had the same problem of very small n_eff and huge Rhat when using a large dataset of n = 100,000). I decided to try increasing the size of the dataset for another problem I was working on. It is the same simple linear regression that I've been posting about in this thread. The main part of the model is:

transformed parameters {
  real mu[n];
  for (i in 1:n) {
    mu[i] <- alpha + beta * x[i];
  }
}

model {
  // Priors
  alpha ~ normal(0, 100);
  beta ~ normal(0, 100);
  sigma ~ uniform(0, 100);

  // Likelihood
  for (i in 1:n) {
    y[i] ~ normal(mu[i], sigma);
  }
}

When my sample size (n) is 10 or even 100, it's not a big deal to print stanfit. However, when I increase n to 1000 or more, I can't realistically print the Stan model fit (stanfit) because it will include a mu[i] for every one of the n observations. I know I could get around this with the code:

model {
  // Priors
  alpha ~ normal(0, 100);
  beta ~ normal(0, 100);
  sigma ~ uniform(0, 100);

  // Likelihood
  for (i in 1:n) {
    y[i] ~ normal(alpha + beta * x[i], sigma);
  }
}

That way there is no mu, since I am really only interested in the alpha, beta, and sigma estimates. However, in more complex hierarchical models with lots of covariates, I find the code including mu more pleasing. My resulting question is: can only some of the parameters be printed? Using the extract function would be okay for then calculating means and credible intervals, but I wouldn't get Rhat or n_eff. I'm not terribly familiar with working with S4 objects, but I know head() and stanfit[1:3] don't work. So is there any way to use some version of the uppermost code with mu[i] but not have mu[i] printed? This would be similar to WinBUGS/JAGS, where you can choose which parameters to save and report.
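
(For reference, one post-hoc option is to index the summary matrix directly, which retains Rhat and n_eff for the selected rows. This is a sketch assuming the fitted object is named stanfit and the parameters are named as in the model above.)

```r
## Sketch: subset the monitor-style summary of a stanfit object (here assumed
## to be named `stanfit`); rows are parameters, columns include Rhat and n_eff.
s <- summary(stanfit)$summary
s[c("alpha", "beta", "sigma"), ]   # show only these parameters
```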

Thanks again,
Dan



Ben Goodrich

Oct 21, 2012, 12:52:59 AM10/21/12
to stan-...@googlegroups.com
On Sunday, October 21, 2012 12:10:30 AM UTC-4, Daniel Hocking wrote:
I have another question about working with rstan output. Shige's question about Stan's trouble with large datasets made me want to try some examples (using a similar version of the code he had, I had the same problem of very small n_eff and huge Rhat when using a large dataset of n=100,000). I decided to try increasing the size of the dataset for another problem I was working on.

In Shige's case, it is more an issue of initial values than large n, per se. However, a large n can compound the problem of poor initial values. Also, for large n, you really would be well served to vectorize wherever possible.

That way there is no mu since I am really only interested in alpha, beta, and sigma estimations. However, in more complex hierarchical models with lots of covariates, I find the code including mu to be more pleasing. My resulting question is, can only some of the parameters be printed?

Yes, if mu were defined in the model block rather than the transformed parameters block, then mu would not be saved. As in
 
model {
  real mu;

  // Priors
  alpha ~ normal(0, 100);
  beta ~ normal(0, 100);
  sigma ~ uniform(0, 100);

  // Likelihood
  for (i in 1:n) {
    mu <- alpha + beta * x[i];
    y[i] ~ normal(mu, sigma);
  }
}

or better yet

model {
  vector[N] mu;
  mu <- alpha + beta * x;

  // priors
  y ~ normal(mu, sigma);
}

Also, the stan() function accepts an argument called pars which can be a character vector of named quantities in the parameters, transformed parameters, or generated quantities blocks to save. There is an argument to be made that you should save everything in order to best judge convergence. However, that argument is less compelling when talking about a linear function of x.
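
A minimal sketch of that pars usage (the file name and data list here are placeholders, not from the thread):

```r
## Sketch: save only alpha, beta, and sigma, dropping the length-n mu.
## "linreg.stan" and the data list are hypothetical placeholders.
fit <- stan(file = "linreg.stan",
            data = list(n = n, x = x, y = y),
            pars = c("alpha", "beta", "sigma"))
print(fit)   # the printed summary now omits mu
```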

I'm not terribly familiar with working with S4 objects but I know head() and stanfit[1:3] wouldn't work.

You shouldn't have to know anything about S4 objects to use stan, but those functions indeed do not work. We are working in v1.0.3 and future versions to make more functions work as expected with stanfit objects.

Ben

Tim Triche, Jr.

Oct 22, 2012, 12:05:06 AM10/22/12
to stan-...@googlegroups.com
On Sat, Oct 20, 2012 at 9:52 PM, Ben Goodrich <goodri...@gmail.com> wrote:

You shouldn't have to know anything about S4 objects to use stan, but those functions indeed do not work. We are working in v1.0.3 and future versions to make more functions work as expected with stanfit objects.

Ooh, something I could theoretically actually help out with.  Depending on how ugly/not ugly the accessors are.  Will have a look.

What sort of plans exist to get rstan into CRAN?  It would sure help with popularizing Stan and to have packages depend upon it.  



--
A model is a lie that helps you see the truth.

Ben Goodrich

Oct 22, 2012, 12:18:07 AM10/22/12
to stan-...@googlegroups.com, ttr...@usc.edu
On Monday, October 22, 2012 12:05:27 AM UTC-4, Tim wrote:
On Sat, Oct 20, 2012 at 9:52 PM, Ben Goodrich <goodri...@gmail.com> wrote:

You shouldn't have to know anything about S4 objects to use stan, but those functions indeed do not work. We are working in v1.0.3 and future versions to make more functions work as expected with stanfit objects.

Ooh, something I could theoretically actually help out with.  Depending on how ugly/not ugly the accessors are.  Will have a look.

We could use the help, regardless of whether it ever makes it to CRAN. Particularly with suggestions from you or anyone else as to what existing functions could be expected to work on an stanfit object.
 
What sort of plans exist to get rstan into CRAN?  It would sure help with popularizing Stan and to have packages depend upon it.

I'm sure it would. Our plans to get rstan onto CRAN are currently up in the air. 

Ben
