How to translate a Hidden Markov Model in WinBUGS/JAGS code to RStan?

1,047 views
Skip to first unread message

Luke Mansillo

unread,
May 25, 2016, 9:25:44 AM5/25/16
to stan-...@googlegroups.com

I have been re-working in WinBUGS a Hidden Markov Model that was initially written by Simon Jackman to pool the polls over an election campaign. I have included the original article, replication package, and a recent update by Professor Jackman. I also have included my R code, WinBUGS code and data. I have been prompted to increase my computational efficiency because running the model takes quite a while. I have only this last week downloaded RStan, then went hunting for a HMM example so I could learn from example. I came across a lot more complex examples that I anticipated, for example, and I am struggling to interpret the Stan language despite reading the Stan reference manual. I do not have a statistical or computational background and I am intending to learn Stan as I do my PhD. As there is an election campaign in Australia where I live and I research Australian elections, I've found a sudden need to improve my computational efficiency.

I was hoping someone could explain how to translate the BUGS model below into Stan.

Please have some patience and many thanks in advance.

model{
 ## measurement model
 for(i in 1:NPOLLS){
       mu[i] <- house[org[i]] + alpha[date[i]]
       y[i] ~ dnorm(mu[i],prec[i])
 }

 ## transition model (aka random walk prior)
 for(i in 2:NPERIODS){
        mu.alpha[i] <- alpha[i-1]
        alpha[i] ~ dnorm(mu.alpha[i],tau)
 }

 ## priors
 tau <- 1/pow(sigma,2)   ## deterministic transform to precision
 sigma ~ dunif(0,.01)    ## uniform prior on standard deviation

 alpha[1] ~ dunif(.4,.6) ## initialization of daily track

 for(i in 1:13){          ## vague normal priors for house effects
       house[i] ~ dnorm(0,.01)
 }

}


The R code I'm using to set up my WinBUGS code is below.
data <- read.csv("8 may polls.csv",header=TRUE)
data$date.j <- julian(as.Date(as.vector(data$date),"%d-%m-%y"),origin=as.Date("1996-03-02"))
data$date.st <- as.numeric(data$date.j) 

## convert result and sample size to variance
data$y <- data$LNP2PP/100
data$var <- data$y*(1-data$y)/data$sample


## compute upper and lower bounds
data$sd <- sqrt(data$var)
data$lo <- data$y - 1.96*data$sd
data$up <- data$y + 1.96*data$sd

data$org <- as.integer(data$pollster)

## Date to predict to

predend <- 1 + julian(as.Date("2016-05-26"),format="%Y-%m-%d", origin=as.Date("1996-03-02"))

## set up WinBUGS

library(R2WinBUGS)

## set up data lists

foo <- list()

foo$y <- data$LNP2PP/100 # set recale percentage to 0 to 1
var <- foo$y*(1-foo$y)/data$sample # calculate variance
foo$prec <- 1/var
foo$date <- as.numeric(data$date.j)  ## set 2 March 1996, election day, as 1
foo$org <- data$org ## there are 13 in the data set
foo$NPOLLS <- length(data$y) ## there are 2001 polls
foo$alpha <- c(.5363,rep(NA,length(2:945)), ## there are 7 election results; Coalition 2PP
               .4902,rep(NA,length(947:2079)),
               .5095,rep(NA,length(2081:3143)),
               .5274,rep(NA,length(3145:4284)),
               .4730,rep(NA,length(4286:5285)),
               .4988,rep(NA,length(5287:6398)),
               .5349,rep(NA,length(6400:predend))) ## 26 May 2016 is day 7391
foo$NPERIODS <- length(foo$alpha)

## initial values for priors

initfunc <- function(){
  house <- rnorm(n=13,sd=.05)
  NPERIODS <- length(1:(max(foo$date)))
  alpha <- c(runif(n=7,.4,.6),  ## 7 election results
             runif(n=NPERIODS-7), ## length minus 7 election results
             NA) ## missing chain point
  sigma <- runif(n=1,0,.01)  ## uniform prior on standard deviation
  list(house=house, ## parameter - house effect
       alpha=alpha, ## parameter - chain
       sigma=sigma) ## standard deviation
}

## send to WinBUGS

daily <- bugs(data=foo,
              inits=initfunc,
              debug=TRUE,
              n.burnin=15000,
              n.iter=70000,
              n.thin=100,
              codaPkg=TRUE,
              parameters.to.save=c("alpha","house","sigma"),
              model.file="kalman bug 13.txt",bugs.directory="c:/WinBUGS14",
              program=c("winbugs"),useWINE=FALSE)

######## MUST MANUALLY CLOSE WinBUGS #####

## times for readings BUGS output

periodend <- length(foo$alpha)-7
housestart <- length(foo$alpha)-6
houseend <- length(foo$alpha)+6
sigmaend <- length(foo$alpha)+7


## read BUGS output back into R
library(coda)
alpha <- read.bugs(daily)
house <- alpha[,housestart:houseend]
sigma <- alpha[,sigmaend]
alpha <- alpha[,1:periodend]

alpha.bar <- apply(as.matrix(alpha),2,mean)
alpha.ci <- apply(as.matrix(alpha),2,quantile,c(.025,.975))

## estimates

alpha.bar <- c(.5363,as.matrix(alpha.bar)[1:944],.4902,as.matrix(alpha.bar)[945:2077],.5095,as.matrix(alpha.bar)[2078:3140],.5274,as.matrix(alpha.bar)[3141:4280],.4730,as.matrix(alpha.bar)[4281:5280],.4988,as.matrix(alpha.bar)[5281:6392],.5349,as.matrix(alpha.bar)[6393:length(alpha.bar)])
alpha.ci <- t(matrix(c(.5363,as.matrix(alpha.ci)[1,1:944],.4902,as.matrix(alpha.ci)[1,945:2077],.5095,as.matrix(alpha.ci)[1,2078:3140],.5274,as.matrix(alpha.ci)[1,3141:4280],.4730,as.matrix(alpha.ci)[1,4281:5280],.4988,as.matrix(alpha.ci)[1,5281:6392],.5349,as.matrix(alpha.ci)[1,6393:length(alpha.ci)],
                       .5363,as.matrix(alpha.ci)[2,1:944],.4902,as.matrix(alpha.ci)[2,945:2077],.5095,as.matrix(alpha.ci)[2,2078:3140],.5274,as.matrix(alpha.ci)[2,3141:4280],.4730,as.matrix(alpha.ci)[2,4281:5280],.4988,as.matrix(alpha.ci)[2,5281:6392],.5349,as.matrix(alpha.ci)[2,6393:length(alpha.ci)]),nrow=length(alpha.bar),ncol=2))

### estimate on final day of chain
alpha.bar[length(alpha.bar)]
### lower credibility interval
alpha.ci[1,length(alpha.bar)]
### upper credibility interval
alpha.ci[2,length(alpha.bar)]       
         
         #####
         
houseEffects <- round(apply(as.matrix(house),2,function(x)c(mean(x),quantile(x,c(.025,.975))))*100,1)
dimnames(houseEffects)[[2]] <- levels(data$pollster)
         
         ## summary statistics for average
houseSum <- apply(as.matrix(house),1,mean)
mean(houseSum)*100
quantile(houseSum,c(.025,.975))*100
write.csv(houseEffects,file = "In-houseeffectsMansillo26mayl2016.csv")
         
         #### save daily estimates
         
dest <- cbind(alpha.bar,alpha.ci[1,],alpha.ci[2,])
colnames(dest)[1]<-"alpha.bar.mean"
colnames(dest)[2:3]<-c("alpha.bar.low.95","alpha.bar.upper.95")
write.csv(dest,file = "Coalition2PP2March1996Mansillo26may2016.csv")
8 may polls.csv
Jackman 2005.pdf
election poll pooling working file 25 may 2.R
kalman bug 13.txt
jackman_PoolingPolls.zip

Bob Carpenter

unread,
May 25, 2016, 10:54:23 AM5/25/16
to stan-...@googlegroups.com
There's an HMM example in the manual, but that involves
discrete state spaces that get marginalized out.
Your model has only continuous parameters (I think --- the
BUGS/JAGS models don't declare variable types), so you
don't need to do that (though in some cases, if everything's
conjugate, you can marginalize out the continuous states,
as in Kalman filters).

We don't recommend interval priors like this, and they
behave very differently in BUGS/JAGS than Stan for scale
parameters, because they use 1 / sigma^2 parameterizations.

> sigma ~ dunif(0,.01) ## uniform prior on standard deviation

are almost certainly bad news. It means your parameter with
scale sigma is going to have small variation. We recommend (see
the manual chapter on regression and Wiki page on prior
recommendations) unbounded <lower=0> constrained parameters for
scales with weakly informative priors.

Same here, only with no lower or upper-bound constraints.

> alpha[1] ~ dunif(.4,.6) ## initialization of daily track


- Bob
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> To post to this group, send email to stan-...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
> <8 may polls.csv><Jackman 2005.pdf>

Luke Mansillo

unread,
May 30, 2016, 1:29:50 AM5/30/16
to stan-...@googlegroups.com
I have made a few attempts but I think I'm missing something crucial.

Below is my Stan code:

data {
    // data size
    int<lower=1> n_polls;
    int<lower=1> n_span;
    int<lower=1> n_houses;

    // poll data
    real<lower=0,upper=1> y[n_polls];
    real<lower=0> prec[n_polls];
    int<lower=1> houses[n_polls];
    int<lower=1> day[n_polls];
}
parameters {
    real<lower=0,upper=1> hidden_voting_intention[n_span];
    real<lower=0,upper=1> houseEffect[n_houses];
    real<lower=0,upper=0.01> sigma;
}
model{
    // -- temporal model
    sigma ~ uniform(0, 0.01);
    hidden_voting_intention[1] ~ normal(0.5363, 0.001);
    hidden_voting_intention[946] ~ normal(0.4902, 0.001);
    hidden_voting_intention[2080] ~ normal(0.5095, 0.001);
    hidden_voting_intention[3144] ~ normal(0.5274, 0.001);
    hidden_voting_intention[4285] ~ normal(0.473, 0.001);
    hidden_voting_intention[5280] ~ normal(0.4988, 0.001);
    hidden_voting_intention[6399] ~ normal(0.5349, 0.001);
    for(i in 2:n_span) {
        hidden_voting_intention[i] ~ normal(hidden_voting_intention[i-1], sigma);
    }

    // -- observational model
    for(poll in 1:n_polls) {
        y[poll] ~ normal(houseEffect[houses[poll]] + hidden_voting_intention[day[poll]], prec[poll]);
    }
}


I am not entirely sure how to anchor the voting intention to the election results in the Stan framework. I have the election results for 7 dates and I'm not entirely sure on the best way to incorporate them into the model given I cannot (or have failed to) create hidden_voting_intention as data with the election results and missing data points to be estimated.

Here is my R code:
##### Luke Mansillo - Pooling the Polls
##### Department of Government & International Relations, University of Sydney
##### 30 May 2016

## get data

setwd("C:/Users/Luke/Documents/stan")
data <- read.csv("8 may polls.csv",header=TRUE)
data$date.j <- julian(as.Date(as.vector(data$date),"%d-%m-%y"),origin=as.Date("1996-03-02"))
data$date.st <- as.numeric(data$date.j) 

## convert result and sample size to variance
data$y <- data$LNP2PP/100
data$var <- data$y*(1-data$y)/data$sample


## compute upper and lower bounds
data$sd <- sqrt(data$var)
data$lo <- data$y - 1.96*data$sd
data$up <- data$y + 1.96*data$sd

data$org <- as.integer(data$pollster)

## Date to predict to

predend <- 1 + julian(as.Date("2016-05-28"),format="%Y-%m-%d", origin=as.Date("1996-03-02"))

####

library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())


foo <- list()
coo <- list()
foo$y <- data$LNP2PP/100 # set recale percentage to 0 to 1
var <- foo$y*(1-foo$y)/data$sample # calculate variance
foo$prec <- 1/var
foo$day <- as.numeric(data$date.j)  ## set 2 March 1996, election day, as 1
foo$houses <- data$org ## there are 13 in the data set
foo$n_polls <- length(data$y) ## there are 2001 polls
coo$alpha <- c(.5363,rep(NA,length(2:945)), ## there are 7 election results; Coalition 2PP
               .4902,rep(NA,length(947:2079)),
               .5095,rep(NA,length(2081:3143)),
               .5274,rep(NA,length(3145:4284)),
               .4730,rep(NA,length(4286:5285)),
               .4988,rep(NA,length(5287:6398)),
               .5349,rep(NA,length(6400:predend))) ## 26 May 2016 is day 7391
foo$n_span <- length(coo$alpha)
foo$n_houses <- nlevels(data$pollster)



fit <- stan(file = 'model3.stan', data = foo, 
            iter = 70000, warmup = 10000,  chains = 3)

la <- extract(fit, permuted = TRUE)
mu <- la$hidden_voting_intention
house <- la$houseEffect
a <- extract(fit, permuted = FALSE)

a2 <- as.array(fit)
m <- as.matrix(fit)

alpha <- apply(mu,2,mean)
alpha.ci <- apply(mu,2,quantile,c(.025,.975))
house.ef <- apply(house,2,mean)
house.ef.ci <- apply(house,2,quantile,c(.025,.975))

The results have not been intuitive and to be frank I'm somewhat lost.

Am I anchoring my daily estimates of public opinion correctly from the polls to the elections appropriately? If not how can I best do this in the Stan environment?

Is my measurement model of the in-house bias of the polls doing what I think it is, i.e. the original WinBUGS code?

Kind regards,
Luke

andre.p...@googlemail.com

unread,
May 30, 2016, 3:01:18 AM5/30/16
to stan-...@googlegroups.com
Luke,

just to scratch down the idea. 

In R you do:
 hidden <- c(1,946) # ....
 nonhidden <- setdiff(seq(1,n_span),hidden)

then you init in stan

hidden_raw[1] ~ normal( ... );
hidden_raw[2] ~ normal( ... );
hidden_voting[hidden] <- hidden_raw;

And then you loop over the nonhidden

for(i in 1:length(nonhidden)) {
   hidden_voting[nonhidden[i]]  ~ normal(hidden_voting[nonhidden[i]-1], sigma);
}

One may also think about using the 'MATT' trick. 

Andre

Luke Mansillo

unread,
May 30, 2016, 3:04:33 AM5/30/16
to stan-...@googlegroups.com
Thank you for your patience.
I will have another go shortly.

Sent from my iPhone


On 30 May 2016, at 5:01 PM, andre.pfeuffer via Stan users mailing list <stan-...@googlegroups.com> wrote:

Luke,

just to scratch down the idea. 

In R you do:
 hidden <- c(1,946) # ....
 nonhidden <- setdiff(seq(1,n_span),hidden)

then you init in stan

hidden_raw[1] ~ normal( ... );
hidden_raw[2] ~ normal( ... );
hidden_voting[hidden] <- hidden_raw;

And then you loop over the nonhidden

for(i in 1:length(nonhidden)) {
   hidden_voting[nonhidden[i]]  ~ normal(hidden_voting[nonhidden[i]-1], sigma);
}

Andre

--
You received this message because you are subscribed to a topic in the Google Groups "Stan users mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/8mIMbpum_Qk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stan-users+...@googlegroups.com.

Luke Mansillo

unread,
May 30, 2016, 3:37:12 AM5/30/16
to stan-...@googlegroups.com
Wouldn't the 7 election results that we know be the nonhidden cases whereas the 7386 other days in the set be the hidden cases?

On 30 May 2016 at 17:01, andre.pfeuffer via Stan users mailing list <stan-...@googlegroups.com> wrote:
Luke,

just to scratch down the idea. 

In R you do:
 hidden <- c(1,946) # ....
 nonhidden <- setdiff(seq(1,n_span),hidden)

then you init in stan

hidden_raw[1] ~ normal( ... );
hidden_raw[2] ~ normal( ... );
hidden_voting[hidden] <- hidden_raw;

And then you loop over the nonhidden

for(i in 1:length(nonhidden)) {
   hidden_voting[nonhidden[i]]  ~ normal(hidden_voting[nonhidden[i]-1], sigma);
}

Andre

--
You received this message because you are subscribed to a topic in the Google Groups "Stan users mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/8mIMbpum_Qk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stan-users+...@googlegroups.com.

Bob Carpenter

unread,
May 30, 2016, 10:22:49 AM5/30/16
to stan-...@googlegroups.com
The supervised (non-hidden) and unsupervised (hidden)
cases should look the same, the difference being
whether the latent variables are variable or observed.

- Bob
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.

Andrew Gelman

unread,
May 30, 2016, 4:14:34 PM5/30/16
to stan-...@googlegroups.com
Hi, a few things.  First, all those uniform bounded parameters are bad news.  They should all be unbounded (except of course that the sigmas should have lower bound of 0).  If you have prior info that a parameter should be close to some range, put it in as a "soft constraint" (that is, a prior distribution such as normal or whatever) not a "hard constraint" (that is, bounds).  Second, you can run into trouble if you have parameters with priors such as normal(0.5363, 0.001).  I think you'd be better off just specifying these as data.  In this case you can define your time series as random variables during the periods between the measured values.
A


On May 30, 2016, at 1:29 AM, Luke Mansillo <lukema...@gmail.com> wrote:

I have made a few attempts but I think I'm missing something crucial.

Below is my Stan mode:

Luke Mansillo

unread,
May 31, 2016, 11:34:07 AM5/31/16
to Stan users mailing list, gel...@stat.columbia.edu, ca...@alias-i.com

When I thought I was embarrassing myself horribly with my utter ignorance, I emailed Bob. I am very thankful for his assistance given how stumped I have been. Below highlighted is my text responding to his comments.


This model's going to be fast and easy to fit in Stan.
 
I took a look at the model, and indeed the error scale is observed (it comes with the poll).  Jackman uses that same hacked uniform(0.4, 0.6) on the initial state.  You shouldn't need an extreme measure like that in Stan --- in fact, as Andrew pointed out, we actively discourage it (if estimates are near boundaries, the constraint dominates and the truncation chops off probability mass and has an extreme effect on posterior means;  if not, you don't need the interval constraint).
 
Also, the whole model is degenerate if you want to assume alpha[i] in (0,1).  You either need to put that constraint on alpha OR you need to work on the logit scale, where the time series is log odds in (-inf, inf), and you use inverse-logit to transform back to (0,1).  That's the usual GLM approach to general models for variables constrained to lie in (0,1).
 

I could put the constraint on alpha between 0.32 and 0.65 which is well within my polling data and would be intuitive. But that being said the initial state is an election outcome, so ought to start at 0.5363 given that is the observed measured election result. 

 

 
Of course, setting sigma in (0, 0.01) will also keep things from wandering away, but will cause real issues for Stan if the true answer is near 0.01 for the same reason as any other interval constraint (it chops off support above 0.01 and thus pushes the posterior means lower than 0.01).
 

 

 

########

 

 
I see the BUGS program, but need to know what's observed and not to translate it to Stan.

 

Over NPERIODS a span of dates from the 2nd March 1996 to the 28th May 2016 (1:7393) in the initial write up. The two-party preferred vote or alpha, is observed at points 1, 946, 2080, 3144, 4285, 5286, and 6399. These values are 0.5363, 0.4902, 0.5095, 0.5274, 0.4730, 0.4988, and 0.5349. The unobserved points are 2:945, 947:2079, 2081:3143, 3145:4284, 4286:5285, 5287:6398, and 6400:7393 with the end date of the 28th May 2016.

 
As Andrew pointed out, this model is bad news with
 
  sigma ~ uniform(0, 0.01);
 
That means sigma has to be very very small.
 
I acknowledge that this is very small and I adopted the prior from the paper (Jackman 2005) given the assumption that a day’s opinion will be much the same as the previous day’s opinion. I can also see good reasons to adjust the prior as the assumption that public opinion moves by a maximum of 1% between days is invalidated in the time series.
 
And if you really insist on
 
  alpha[1] ~ dunif(0.4, 0.6)
 
Then you have to break alpha apart into components, because alpha[1] would need to be declared with a constraint.
Same with that sigma, by the way.

 

It would be appropriate to have a hard constraint on alpha[1] as it is the known election result.

alpha [1] ~ normal(0.5363, 0.001);

    alpha [946] ~ normal(0.4902, 0.001);

   alpha [2080] ~ normal(0.5095, 0.001);

    alpha [3144] ~ normal(0.5274, 0.001);

    alpha [4285] ~ normal(0.473, 0.001);

    alpha [5280] ~ normal(0.4988, 0.001);

    alpha [6399] ~ normal(0.5349, 0.001);

 

 

Why is there a prec[i] for each y[i]?  That won't work if prec is a parameter (you can't fit a variance with a single observation).
 

prec[i] is the precision on each poll result which is a function of y and the poll’s sample size. It is derived from 1/(y*((1-y)/n)) i.e. vars= y*(1-y)/n 1/vars; see Jackman (2005: 503) if unsure.

 

The Stan model's going to look something like this (I took the liberty of eliminating the hacks to intervals and converted the precision variable to a scale):
 

I am unsure what the difference between a precision and a scale. Could you please elaborate on what the scale entails.

 
data {
  int NPOLLS;
  int NPERIODS;
  vector[13] house;
  int<lower=1, upper=NPERIODS> date[NPOLLS];
  int<lower=1, upper=13> org[NPOLLS];
  real<lower=0> scale[NPOLLS];
}
parameters {
  real<lower=0> sigma;
  vector[NPERIODS] alpha;
}
model {
  sigma ~ normal(0, 0.1);
  alpha[1] ~ normal(0.5, 0.1);
  alpha[2:NPERIODS] ~ normal(alpha[1:(NPERIODS-1)], sigma);
  house ~ normal(0, 10);
  y ~ normal(house[org] + alpha[date], scale);
}
 
I vectorized everything, including the sampling for y. 
 
There's a complication with missing data (still not sure what's missing here).
 

I am unsure how to approach what Andrew suggested regarding specifying my observed values and defining the times as random variables during the periods between the measured values.

 

I thought about the missing observations model in the documentation, below:

 

data {

 

int<lower=0> N_obs;

int<lower=0> N_mis;

real y_obs
[N_obs];

}

parameters
{

real mu
;

real
<lower=0> sigma;

real y_mis
[N_mis];

}

model
{

for (n in 1:N_obs)

y_obs
[n] ~ normal(mu,sigma);

for (n in 1:N_mis)

y_mis
[n] ~ normal(mu,sigma);

}


 

I’m quite possibly quite far off here with my attempt to incorporate both the observed and missing values in the time series. Perhaps I should break NPERIODS into 7 separate periods? but here goes nothing. I am concerned I've broken the link in the chain of alphas, e.g. there is no declaration between alpha[945] and alpha[946].

 

data {

  int NPOLLS;

  int NPERIODS;

  int<lower=0> N_obs;

  int<lower=0> N_mis;

  real alpha_obs[N_obs];

  vector[13] house;

  int<lower=1, upper=NPERIODS> date[NPOLLS];

  int<lower=1, upper=13> org[NPOLLS];

  real<lower=0> scale[NPOLLS];

}

parameters {

  real<lower=0> sigma;

  vector[NPERIODS] alpha;

}

model {

  sigma ~ normal(0, 0.1);

  alpha[1] ~ normal(alpha_obs[1], 0.1);

  alpha[2:NPERIODS] ~ normal(alpha[1:(NPERIODS-1)], sigma);

  alpha[946] ~ normal(alpha_obs[2], 0.1);

  alpha[947:NPERIODS] ~ normal(alpha[946:(NPERIODS-1)], sigma);

  alpha[2080] ~ normal(alpha_obs[3], 0.1);

  alpha[2081:NPERIODS] ~ normal(alpha[2080:(NPERIODS-1)], sigma);

  alpha[3144] ~ normal(alpha_obs[4], 0.1);

  alpha[3145:NPERIODS] ~ normal(alpha[3144:(NPERIODS-1)], sigma);

  alpha[4285] ~ normal(alpha_obs[5], 0.1);

  alpha[4286:NPERIODS] ~ normal(alpha[4285:(NPERIODS-1)], sigma);

  alpha[5286] ~ normal(alpha_obs[6], 0.1);

  alpha[5287:NPERIODS] ~ normal(alpha[5286:(NPERIODS-1)], sigma);

  alpha[6399] ~ normal(alpha_obs[7], 0.1);

  alpha[6400:NPERIODS] ~ normal(alpha[6399:(NPERIODS-1)], sigma);

  house ~ normal(0, 10);

  y ~ normal(house[org] + alpha[date], scale);

}


Perhaps the model below that explicitly predicts alpha_mis from alpha and the observed alphas are modeled from the given data.


data {

  int NPOLLS;

  int NPERIODS;

  int<lower=0> N_obs;

  int<lower=0> N_mis;

  real alpha_obs[N_obs];

  vector[13] house;

  int<lower=1, upper=NPERIODS> date[NPOLLS];

  int<lower=1, upper=13> org[NPOLLS];

  real<lower=0> scale[NPOLLS];

}

parameters {

  real<lower=0> sigma;

  vector[NPERIODS] alpha;

  vector[N_mis] alpha_mis;

}

model {

  sigma ~ normal(0, 0.1);

  alpha[1] ~ normal(alpha_obs[1], 0.1);

  alpha_mis[2:NPERIODS] ~ normal(alpha[1:(NPERIODS-1)], sigma);

  alpha[946] ~ normal(alpha_obs[2], 0.1);

 alpha_mis [947:NPERIODS] ~ normal(alpha[946:(NPERIODS-1)], sigma);

  alpha[2080] ~ normal(alpha_obs[3], 0.1);

 alpha_mis [2081:NPERIODS] ~ normal(alpha[2080:(NPERIODS-1)], sigma);

  alpha[3144] ~ normal(alpha_obs[4], 0.1);

 alpha_mis [3145:NPERIODS] ~ normal(alpha[3144:(NPERIODS-1)], sigma);

  alpha[4285] ~ normal(alpha_obs[5], 0.1);

 alpha_mis [4286:NPERIODS] ~ normal(alpha[4285:(NPERIODS-1)], sigma);

  alpha[5286] ~ normal(alpha_obs[6], 0.1);

 alpha_mis [5287:NPERIODS] ~ normal(alpha[5286:(NPERIODS-1)], sigma);

  alpha[6399] ~ normal(alpha_obs[7], 0.1);

 alpha_mis [6400:NPERIODS] ~ normal(alpha[6399:(NPERIODS-1)], sigma);

  house ~ normal(0, 10);

  y ~ normal(house[org] + alpha[date], scale);

appendix.pdf

Bob Carpenter

unread,
May 31, 2016, 4:26:50 PM5/31/16
to stan-...@googlegroups.com

> On May 31, 2016, at 10:34 AM, Luke Mansillo <lukema...@gmail.com> wrote:
>
> When I thought I was embarrassing myself horribly with my utter ignorance, I emailed Bob. I am very thankful for his assistance given how stumped I have been. Below highlighted is my text responding to his comments.
>
>
>
> This model's going to be fast and easy to fit in Stan.
>
> I took a look at the model, and indeed the error scale is observed (it comes with the poll). Jackman uses that same hacked uniform(0.4, 0.6) on the initial state. You shouldn't need an extreme measure like that in Stan --- in fact, as Andrew pointed out, we actively discourage it (if estimates are near boundaries, the constraint dominates and the truncation chops off probability mass and has an extreme effect on posterior means; if not, you don't need the interval constraint).
>
> Also, the whole model is degenerate if you want to assume alpha[i] in (0,1). You either need to put that constraint on alpha OR you need to work on the logit scale, where the time series is log odds in (-inf, inf), and you use inverse-logit to transform back to (0,1). That's the usual GLM approach to general models for variables constrained to lie in (0,1).
>
>
>
>
> I could put the constraint on alpha between 0.32 and 0.65 which is well within my polling data and would be intuitive. But that being said the initial state is an election outcome, so ought to start at 0.5363 given that is the observed measured election result.

This is *not* a good idea. Remember that we're doing Bayesian
statistics here, not point estimation. Given that we have a time
series, that original alpha could be anything and isn't constrained
to be near the initial value.

Also, the constraint between 0.32 and 0.65 is arbitrary --- we only
know the value has to be between 0 and 1. So what we're recommending
is only constraining between 0 and 1, and then if you want, including
something like a beta prior that's concentrated between 0.3 and 0.7 if
you know that's what's reasonable.
>
> It would be appropriate to have a hard constraint on alpha[1] as it is the known election result.
>
> alpha [1] ~ normal(0.5363, 0.001);

This is a bad idea for the same reason. I know it was in Jackman's
original model.
>
> The Stan model's going to look something like this (I took the liberty of eliminating the hacks to intervals and converted the precision variable to a scale):
>
>
> I am unsure what the difference between a precision and a scale. Could you please elaborate on what the scale entails.

"Scale" is the name of the "standard deviation" parameter.
In general, variance = sd^2, precision = 1 / sd.

So I just mean parameterize in terms of scale.

>
> I am unsure how to approach what Andrew suggested regarding specifying my observed values and defining the times as random variables during the periods between the measured values.

I'm missing the context here.
>
> I’m quite possibly quite far off here with my attempt to incorporate both the observed and missing values in the time series. Perhaps I should break NPERIODS into 7 separate periods? but here goes nothing. I am concerned I've broken the link in the chain of alphas, e.g. there is no declaration between alpha[945] and alpha[946].

You want to merge your observed and unobserved vectors into a single
vector in transformed parameters. See the chapter on missing data in
the manual for examples.

But the new indexing features make this easier.

data {
int N;
int N_obs;
int N_unobs;
vector[N_obs] y_obs;
int ii_y_obs[N_obs];
int ii_y_unobs[N_unobs];
...

parameters {
vector[N_unobs] y_unobs;
...

transformed pareters {
vector[N] y;
y[ii_y_obs] <- y_obs;
y[ii_y_unobs] <- y_unobs;

...

now you're good to go in the rest of the program. The ii
values are the indexes of which values are observed and which
aren't.

- Bob





Andrew Gelman

unread,
May 31, 2016, 4:28:09 PM5/31/16
to stan-...@googlegroups.com
Just very quickly: if you want to set a parameter to a fixed value, we recommend specifying it as data rather than giving it a very sharp prior.
A

Bob Carpenter

unread,
Jun 1, 2016, 1:09:51 AM6/1/16
to stan-...@googlegroups.com
This isn't what's going on Andrew. They don't want to
set to a fixed value, they want to cap the evolution of
the time series so that inter-day, there's a 0.01 sd on
the probability scale. Same for starting values, which they're
trying to coax to a very narrow range. It's probably not
necessary --- I'd actually fit that SD rather than hard
coding it.

It's all in Jackman's paper. Luke's just trying to implement
Jackman's model, which is the source of all of these
modeling decisions.

I think they can and should all be relaxed, and to the
extent possible, estimated rather than hard-coded.

But the whole model's defective in that it doesn't
constraint proportions to (0, 1).

- Bob
Reply all
Reply to author
Forward
0 new messages