Inconsistency in Confidence Interval Terminology in statsmodels


Timmy Jimmy

Mar 23, 2022, 8:22:58 AM3/23/22
to pystatsmodels

Statsmodels mixes confidence interval and prediction interval terminology, despite the two having different definitions. For example, the statsmodels.tsa.arima.model.ARIMAResults.get_forecast function is documented as producing “Out-of-sample forecasts and prediction intervals” but is described as returning “...out-of-sample forecasts and results including confidence intervals.” Correspondingly, the summary_frame method of the get_forecast result outputs mean values and CIs (confidence intervals).

Confidence intervals give me intervals around a parameter. What I really care about is quantifying the probability of my forecasts lying within a specified interval (i.e., prediction intervals). What exactly is statsmodels presenting here? Do the mean values of the point forecasts represent the average of the distribution of potential future values, or some other mean value?

To add even more confusion, other models such as ETS have a summary_frame method that outputs the mean and prediction intervals, instead of the mean and confidence intervals as in ARIMA. Why the change in intervals between the two methods? And if the confidence intervals in ARIMA forecasts really are confidence intervals and not prediction intervals, how should I interpret them, given that they don't tell me anything about the variability of the point forecasts?



The difference between confidence and prediction intervals is explained here by Rob Hyndman https://robjhyndman.com/hyndsight/intervals/

Here is an additional post by him stating that “There is almost no use for a confidence interval in forecasting.” https://stats.stackexchange.com/questions/62188/confidence-or-prediction-limits-for-significant-difference-between-forecast-and/62197#62197

ETS model documentation showcasing prediction intervals instead of confidence intervals:

https://www.statsmodels.org/devel/examples/notebooks/generated/ets.html?highlight=prediction%20interval




josef...@gmail.com

Mar 23, 2022, 8:47:47 AM3/23/22
to pystatsmodels
I don't know the specific answer

On Wed, Mar 23, 2022 at 8:23 AM Timmy Jimmy <whiteroc...@gmail.com> wrote:

Statsmodels mixes confidence interval and prediction interval terminology, despite the two having different definitions. For example, the statsmodels.tsa.arima.model.ARIMAResults.get_forecast function is documented as producing “Out-of-sample forecasts and prediction intervals” but is described as returning “...out-of-sample forecasts and results including confidence intervals.” Correspondingly, the summary_frame method of the get_forecast result outputs mean values and CIs (confidence intervals).

Confidence intervals give me intervals around a parameter. What I really care about is quantifying the probability of my forecasts lying within a specified interval (i.e., prediction intervals). What exactly is being presented here? Do the mean values of the point forecasts represent the average of the distribution of potential future values, or some other mean value?

To add even more confusion, other models such as ETS have a summary_frame method that outputs the mean and prediction intervals, instead of the mean and confidence intervals as in ARIMA. Why the change in intervals between the two methods? And if the confidence intervals in ARIMA forecasts really are confidence intervals and not prediction intervals, how should I interpret them, given that they don't tell me anything about the variability of the point forecasts?



The difference between confidence and prediction intervals is explained here by Rob Hyndman https://robjhyndman.com/hyndsight/intervals/

quote:

The distinction is mostly retained in the statistics literature. However, in econometrics it is common to use “confidence intervals” for both types of interval (e.g., Granger & Newbold, 1986). I once asked Clive Granger why he confused the two concepts, and he dismissed my objection as fussing about trivialities. I disagreed with him then, and I still do.

:)
 

Here is an additional post by him stating that “There is almost no use for a confidence interval in forecasting.” https://stats.stackexchange.com/questions/62188/confidence-or-prediction-limits-for-significant-difference-between-forecast-and/62197#62197

That's a strange statement.

A forecast is a conditional expectation, and we want to know how much uncertainty there is in the expected "mean" of the prediction.
For example, compare the long-run prediction of a stationary ARMA model with that of an ARIMA model with a unit root, and its associated confidence interval for the conditional expectation.
 

ETS model documentation showcasing prediction intervals instead of confidence intervals:

https://www.statsmodels.org/devel/examples/notebooks/generated/ets.html?highlight=prediction%20interval


An aside, because I'm just struggling with something similar for Poisson:

An important distinction is "prediction interval" with or without parameter uncertainty.

For example, the OLS prediction interval, which I called the "confidence interval for a new observation" in get_prediction, includes uncertainty about the estimate of the mean parameters beta, but not the uncertainty about the estimate of the residual standard deviation sigma.
AFAIK, most "prediction intervals" in tsa ignore parameter uncertainty.

I recently discovered the term "tolerance interval", which uses the confidence interval of the mean estimate to widen the "prediction interval" to include parameter uncertainty.

Terminology in this area is not consistent across fields.

 Josef

Chad Fulton

Mar 23, 2022, 11:56:30 AM3/23/22
to Statsmodels Mailing List
TL;DR: Overall, my impression is that (a) we should be more explicit about what intervals we are computing, but (b) this specific semantic argument about "prediction interval" versus "confidence interval" is not very useful.

--------

This comes up from time to time, so let me explain the situation as I understand it, and then hopefully (a) others can correct me if I'm misunderstanding or wrong, and (b) we can chart a path to improving the documentation.

In the context of a linear regression, we have:

y(i) = x(i)' beta + e(i)

And I guess the idea of the terminology is that a confidence interval is an interval around the mean "parameter" E[ y | x_0 ] = x_0' beta, estimated by x_0' b, and so it is based on var[ x_0' b ], where b is the estimator of beta and x_0 is some conditioning value of interest for the explanatory variables that is taken as given.  So this interval is fundamentally based on the variance of the estimator b.

On the other hand, a prediction interval for a hypothesized "observation" y_0 would be based on var[ x_0' b + e_0 ], where e_0 is the applicable error term (usually just iid with the e(i) terms).  Thus, here we account for both the uncertainty from the estimated parameters (coming from the variance of the estimator b) as well as the uncertainty from the error term (coming from variance of the error term e_0).

To go further, we need to be more specific about what Hyndman is claiming about the terminology:

- Hyndman says that the prediction interval is for a random variable yet to be observed and can arise in either Bayesian or frequentist contexts
- While a confidence interval is a frequentist concept for a non-random but unknown parameter.
- But in a Bayesian context, the corresponding concept for a parameter is a credible interval. In the Bayesian perspective, parameters are treated as random variables, and so have probability distributions
- And finally, "a Bayesian confidence interval is like a prediction interval, but associated with a parameter rather than an observation".

Let me just note here that others do not make these hard distinctions. For example, in Wooldridge's Introduction to Econometrics, he writes (section 6.4): "[we can] put a confidence interval around the OLS estimate of E(y|x1,...,xk), for any values of the explanatory variables. But this is not the same as obtaining a confidence interval for a new, as yet unknown, outcome on y. In forming a confidence interval for an outcome on y, we must account for another very important source of variation: the variance in the unobserved error."  and then he says: "Let y0 denote the value for which we would like to construct a confidence interval, which we sometimes call a prediction interval."

Now, let's move to a generic state space model (of which ARIMA is a special case), which is of the form:

y(t) = d + Z alpha(t) + e(t)    with e(t) ~ N(0, H)
alpha(t) = c + T alpha(t - 1) + z(t)    with z(t) ~ N(0, Q)
alpha(1) ~ N(a_1, P_1)

When we are performing "frequentist" inference here (e.g. the usual maximum likelihood estimation that we do for ARIMA models), the "non-random but unknown parameters" are elements of the c, d, Z, T, H, and Q matrices. Meanwhile, the unobserved state vector alpha(t) is considered as a sequence of random variables, which themselves are estimated by the Kalman filter as a byproduct of evaluating the likelihood function for the "frequentist parameters".

Now, the intervals that we are producing for an ARIMA model in e.g. get_forecast are associated with a future observation, e.g. y(t+h), and they are based on Var[y(t+h)] = Var[Z alpha(t+h) + e(t+h)].  So this "looks like" what we described above as a prediction interval.  But notice that unlike the earlier "prediction interval", this does *not* take into account uncertainty coming from the estimated parameters, because c, d, Z, T, H, and Q are taken as given at this stage. What it *does* take into account is uncertainty coming from the estimated state, alpha(t), and the error term e(t).

Moreover, the intervals that we are generating are Bayesian credible intervals, because the object of interest is associated with a probability distribution, and the variance of interest here is generated from the posterior distribution.

So what we are computing is not identical to either the "confidence interval" or "prediction interval" that Hyndman refers to.  Note that, similar to Wooldridge above, two of the leading books on the econometrics of state space models do not make a distinction between confidence and prediction intervals. Durbin and Koopman (2012) always uses the term "confidence interval" for this interval, while Harvey (1989) refers to them as "prediction (confidence) intervals" (see pg. 31). Meanwhile, there are methods to compute intervals that do account for the uncertainty from parameter estimation (either via frequentist or Bayesian methods) in state space models, but those are not what Durbin and Koopman / Harvey are referring to by either confidence or prediction intervals.

Finally, let me note that we can cast the linear regression model itself as a state space model, with:

d = c = 0, Z = x(t)', T = I, Q = 0

so that we have:

y(t) = x(t)' alpha(t) + e(t)
alpha(t) = alpha(t - 1)
alpha(1) ~ N(0, k I)

where k -> infinity (i.e. a diffuse prior).

In this case, our unobserved state vector alpha(t) is identical to the estimator b from OLS, and so the interval that we compute from Var[y(t+h)] = Var[Z alpha(t+h) + e(t+h)] = Var[x(t+h)' alpha(t+h) + e(t+h)] is, in this case, identical to the "prediction interval" as defined by Hyndman.

I think that one key feature here is that the Kalman filter is essentially a Bayesian filter for the unobserved state vector alpha(t), in that it starts with a prior for alpha(1) and then performs Bayesian updating to construct posterior estimates for each of the alpha(t). This is one reason why the semantic distinctions that Hyndman is making run into trouble in the context of state space models, because there are now more moving parts than just thinking about "variance of the estimator b" versus "variance of the estimator b combined with variance of the error term".

Hope that helps,
Chad

josef...@gmail.com

Mar 23, 2022, 1:22:32 PM3/23/22
to pystatsmodels
To clarify:

Does tsa in statsmodels distinguish between
- confidence interval for y_hat (the conditional expectation)
- confidence interval for y (the value of endog), i.e. some kind of prediction interval

or is it always the second across tsa in statsmodels?

(
one problem for my parts of statsmodels is the normality assumption behind prediction intervals.
There is no law of large numbers for a single observation, so those standard prediction intervals are only accurate if the distribution of an observation is approximately Gaussian.
This will not be the case for discrete distributions, for example Poisson or Binomial, or for distributions on the positive real line.

On the other hand, parameter estimates by least squares or quasi-MLE are much more robust to distributional assumptions.
)
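To illustrate with a hypothetical Poisson count with small mean: the normal-approximation interval even extends below zero, while the actual distribution of a single new observation is asymmetric and nonnegative:

```python
from scipy import stats

mu = 2.0  # predicted Poisson mean
# normal-approximation 95% interval for a new observation: mean +/- 1.96 * sd
approx_lower = mu - 1.96 * mu ** 0.5
approx_upper = mu + 1.96 * mu ** 0.5
# interval from the actual Poisson quantiles
exact_lower, exact_upper = stats.poisson.ppf([0.025, 0.975], mu)
```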

Josef



Chad Fulton

Mar 23, 2022, 1:59:57 PM3/23/22
to Statsmodels Mailing List
In v0.13, you can only get the second version.

In the development version, we have recently added more flexibility to the state space model predictions, and you can specify the first version using the argument `signal_only=True` in `get_prediction`, `get_forecast`, etc. (in state space models, the Z alpha(t) term is sometimes called the "signal").

josef...@gmail.com

Mar 23, 2022, 3:20:03 PM3/23/22
to pystatsmodels
On Wed, Mar 23, 2022 at 1:59 PM Chad Fulton <chadf...@gmail.com> wrote:
In v0.13, you can only get the second version.

If the PredictionResults conf_int, e.g. in SARIMAX, is a prediction interval, then the naming convention with the postfix `_mean` is misleading

e.g. in linear models, we have se_mean and se_obs


 