model selection for automised bsts model


tina....@googlemail.com

Apr 7, 2015, 7:00:21 AM
to causal...@googlegroups.com

Hi Kay, 


I would appreciate it if you could answer some general questions about the package. 


1) What are the default state space model components used in the CausalImpact package when bsts.model is not specified, in other words, when the model is run as in the example (***) below?


From the paper, I understood that the default model includes: 

- a local linear trend (eq. 2.3 in the paper), 

- a seasonality component (eq. 2.5) (the package seems to set S to 1 by default. Is that always the case?)

- and the regression component with static coefficients.


I tested this with the given example using StateSizes(bsts.model$state.specification) inside the RunWithData() function.

This shows that only a local linear trend was used, whereas I was expecting a local linear trend, a seasonal component and the regression component. Hence, I wonder how the package chooses the state space components if they are not specified?


example (***)

library(CausalImpact)
set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = 100)
y <- 1.2 * x1 + rnorm(100)
y[71:100] <- y[71:100] + 10
data <- cbind(y, x1)
pre.period <- c(1, 70)
post.period <- c(71, 100)

debug(CausalImpact)
debug(RunWithData)
impact <- CausalImpact(data, pre.period, post.period)
StateSizes(bsts.model$state.specification)

output:

StateSizes(bsts.model$state.specification)
trend 
    1 

2) Ways to evaluate the model fit with the package: For the purpose of model selection with a customised bsts model, I wanted to look at some metrics of how well the state space model fits the pre-intervention period. The package produces plots, but I also want to look at the standardised prediction errors (ADF, PAC, QStat, etc.).

I noticed that Harvey's goodness-of-fit statistic is included in the bsts package, given by summary(object, burn = SuggestBurn(.1, object)). Is this the recommended measure of model fit? I also found an example in the CRAN documentation of the bsts package that includes an ACF plot (p. 4) and the functions. I would welcome other suggestions on how the package can help with fitting the bsts model.

3) Specifying the prior sample size: Could you explain how you chose the prior sample size of 32 (nu in the paper) and the 50 prior degrees of freedom for the spike-and-slab prior in the paper?

Reference: p. 21: “For the inverse-Gamma prior on its diffusion variance we used a prior estimate of 0.1 sigma_y and a prior sample size nu = 32. We used a spike-and-slab prior with an expected model size of M = 3, an explained variance of 0.8 and 50 prior degrees of freedom.”

4) How to specify prior inclusion probabilities when the state space model includes a regression component: I understand that the package defaults the prior inclusion probabilities to zero. I am looking for the argument that lets me specify the value of the inclusion probability. 


Many Thanks & Regards, 

Tina

Kay Brodersen

Apr 14, 2015, 4:14:10 PM
to Tina Wenzel, causal...@googlegroups.com
Hi Tina,

Thanks for reaching out. Please see my comments inline - I hope this is helpful.

With best wishes,
Kay


On 4 April 2015 at 15:23, <tina....@googlemail.com> wrote:

Hi Kay, 


I would appreciate it if you could answer some general questions about the package. 


1) What are the default state space model components used in the CausalImpact package when bsts.model is not specified, in other words, when the model is run as in the example (***) below?


From the paper, I understood that the default model includes: 

- a local linear trend (eq. 2.3 in the paper), 

- a seasonality component (eq. 2.5) (the package seems to set S to 1 by default. Is that always the case?)

- and the regression component with static coefficients.



The default model is constructed in impact_model_ss.R:152 and typically consists of:
  • a local level
  • a static regression component
unless otherwise specified by model.args$. See also the default priors specified at the top of the file.
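
As an illustration of the answer above, here is a minimal sketch that inspects the fitted model without stepping through the debugger, and then requests a seasonal component through model.args. It assumes that the fitted bsts object is available as impact$model$bsts.model and that nseasons and season.duration are accepted in model.args, as described in the CausalImpact documentation; DescribeState is a hypothetical helper name introduced here. As far as I can tell, the regression component is specified through the model formula rather than the state specification, which would explain why only "trend" shows up in StateSizes(), and a state size of 1 is what a local level would give (a local linear trend carries a level and a slope).

```r
library(CausalImpact)

set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = 100)
y <- 1.2 * x1 + rnorm(100)
y[71:100] <- y[71:100] + 10
data <- cbind(y, x1)
pre.period <- c(1, 70)
post.period <- c(71, 100)

# Hypothetical helper: return the class of each state component in the
# bsts model that CausalImpact fitted internally.
DescribeState <- function(impact) {
  ss <- impact$model$bsts.model$state.specification
  sapply(ss, function(s) class(s)[1])
}

# Default settings: no seasonal component is requested.
impact.default <- CausalImpact(data, pre.period, post.period)
state.default <- DescribeState(impact.default)
print(state.default)

# Same data, but asking for a seasonal component with period 7.
impact.seasonal <- CausalImpact(data, pre.period, post.period,
                                model.args = list(nseasons = 7,
                                                  season.duration = 1))
state.seasonal <- DescribeState(impact.seasonal)
print(state.seasonal)
```

Comparing the two printed vectors shows which components each call added; with the defaults only the trend component should appear.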
 


I tested this with the given example using StateSizes(bsts.model$state.specification) inside the RunWithData() function.

This shows that only a local linear trend was used, whereas I was expecting a local linear trend, a seasonal component and the regression component. Hence, I wonder how the package chooses the state space components if they are not specified?


example (***)

library(CausalImpact)
set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = 100)
y <- 1.2 * x1 + rnorm(100)
y[71:100] <- y[71:100] + 10
data <- cbind(y, x1)
pre.period <- c(1, 70)
post.period <- c(71, 100)

debug(CausalImpact)
debug(RunWithData)
impact <- CausalImpact(data, pre.period, post.period)
StateSizes(bsts.model$state.specification)

output:

StateSizes(bsts.model$state.specification)
trend 
    1 


2) Ways to evaluate the model fit with the package: For the purpose of model selection with a customised bsts model, I wanted to look at some metrics of how well the state space model fits the pre-intervention period. The package produces plots, but I also want to look at the standardised prediction errors (ADF, PAC, QStat, etc.).

I noticed that Harvey's goodness-of-fit statistic is included in the bsts package, given by summary(object, burn = SuggestBurn(.1, object)). Is this the recommended measure of model fit? I also found an example in the CRAN documentation of the bsts package that includes an ACF plot (p. 4) and the functions. I would welcome other suggestions on how the package can help with fitting the bsts model.


All of these are good starting points. In addition, you could evaluate predictive accuracy using a backtesting strategy: train on time points 1, ..., n, test the accuracy of the forecast for time points n+1, ..., n+k, then slide the training window along the time series and repeat. All of this you would do on the pre-period. Besides giving you an idea of goodness of fit, this also lets you estimate statistical power, since any potential treatment would have to lift the response metric outside the prediction intervals to be detected.
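
The suggestions above could be sketched roughly as follows, using bsts directly on the pre-period of the example data. This is only an illustration under stated assumptions: the model (local level plus a static regression on x1), the fold origins, the horizon of 5 and niter = 500 are arbitrary choices made here, and FitModel and Backtest are hypothetical helper names. The in-sample part relies on the one.step.prediction.errors element and the relative.gof summary entry as documented for bsts objects.

```r
library(bsts)

set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = 100)
y <- 1.2 * x1 + rnorm(100)
# Keep only the pre-period (time points 1..70), as plain numeric columns.
pre <- data.frame(y = as.numeric(y)[1:70], x1 = as.numeric(x1)[1:70])

# Hypothetical helper: local level + static regression on x1.
FitModel <- function(train) {
  ss <- AddLocalLevel(list(), train$y)
  bsts(y ~ x1, state.specification = ss, data = train,
       niter = 500, ping = 0, seed = 1)
}

# In-sample diagnostics on the whole pre-period.
full.fit <- FitModel(pre)
burn <- SuggestBurn(0.1, full.fit)
draws <- full.fit$one.step.prediction.errors
keep <- seq_len(nrow(draws)) > burn          # drop burn-in draws
errors <- colMeans(draws[keep, , drop = FALSE])
fit.summary <- summary(full.fit, burn = burn)
print(fit.summary$relative.gof)              # Harvey's goodness of fit
print(acf(errors, plot = FALSE))             # autocorrelation of the errors

# Rolling-origin backtest: train on 1..n, forecast n+1..n+horizon.
horizon <- 5
origins <- seq(40, 65, by = 5)
Backtest <- function(n.train) {
  train <- pre[1:n.train, ]
  test <- pre[(n.train + 1):(n.train + horizon), ]
  pred <- predict(FitModel(train), newdata = test,
                  quantiles = c(0.025, 0.975))
  inside <- test$y >= pred$interval[1, ] & test$y <= pred$interval[2, ]
  data.frame(n.train = n.train,
             mae = mean(abs(pred$mean - test$y)),
             coverage = mean(inside))
}
results <- do.call(rbind, lapply(origins, Backtest))
print(results)
```

The coverage column gives a rough check on whether the 95% prediction intervals are calibrated, and the interval width on each fold hints at how large an effect would have to be before it falls outside them.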


3) Specifying the prior sample size: Could you explain how you chose the prior sample size of 32 (nu in the paper) and the 50 prior degrees of freedom for the spike-and-slab prior in the paper?

These are simply default suggestions that worked well across a range of scenarios we've been working with at Google.
 

Reference: p. 21: “For the inverse-Gamma prior on its diffusion variance we used a prior estimate of 0.1 sigma_y and a prior sample size nu = 32. We used a spike-and-slab prior with an expected model size of M = 3, an explained variance of 0.8 and 50 prior degrees of freedom.”

4) How to specify prior inclusion probabilities when the state space model includes a regression component: I understand that the package defaults the prior inclusion probabilities to zero. I am looking for the argument that lets me specify the value of the inclusion probability. 


The prior inclusion probabilities are set to: expected.model.size / number.of.variables. The default expected.model.size is 3. For example, if your data frame contains 10 predictor variables, the prior inclusion probability for each of them will be set to 3/10. You can modify this prior by adjusting kStaticRegressionExpectedModelSize.
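
To make the arithmetic concrete, here is a small sketch of the rule described above. PriorInclusionProbability is a hypothetical helper written for this illustration, not a function from CausalImpact or bsts, and capping the result at 1 when there are fewer predictors than the expected model size is an assumption made here.

```r
# Hypothetical helper: prior inclusion probability per predictor,
# following expected.model.size / number.of.variables.
PriorInclusionProbability <- function(expected.model.size, n.predictors) {
  # Assumption: a probability cannot exceed 1, so cap the ratio.
  pmin(1, expected.model.size / n.predictors)
}

# The example from the reply: 10 predictors, expected model size 3.
p10 <- PriorInclusionProbability(3, 10)
print(p10)  # 0.3

# How the prior changes with the number of predictors.
n.predictors <- c(2, 5, 10, 30)
probs <- PriorInclusionProbability(3, n.predictors)
print(data.frame(n.predictors, prior.inclusion.prob = probs))
```

For a custom model fitted with bsts directly, expected.model.size can be passed to bsts(), which forwards it to SpikeSlabPrior.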


Many Thanks & Regards, 

Tina


tina....@googlemail.com

Apr 16, 2015, 5:00:15 AM
to causal...@googlegroups.com, tina....@googlemail.com
Thank you very much!
Best, 
Tina