SARIMAX: Do exog vars need to be made stationary before fitting?


Adrian Scholl

Aug 25, 2022, 7:05:07 AM
to pystatsmodels
Hello everyone,

I used statsmodels to build a SARIMAX (regression with SARIMA errors) model for my thesis on conflict prediction.

y/endog: sum of monthly fatalities
X/exog: socio-economic indicators from the World Bank and the IMF
Aggregation: country-level; monthly

Model pipeline:
  1. Imputation of X: linear interpolation/last observation carried forward
  2. Standardization of X: sklearn.StandardScaler
  3. PCA for dimensionality reduction of X
  4. SARIMAX model fitting with y and the first n principal components of X
My questions: 
  • Do the exog vars in X need to be stationary or can SARIMAX handle non-stationary exog vars?
  • Is there an automated way to make X stationary (e.g. by differencing) and ready for the SARIMAX model without human interference?
  • How would non-stationary exog vars impact the model results and predictions?

I appreciate any help!

Best regards,
Adrian


Chad Fulton

Aug 26, 2022, 11:44:21 PM
to pystat...@googlegroups.com
Hi Adrian,

One of the main concerns with non-stationary time series is that even if two such series are completely unrelated, they can display similar trending behavior over short time horizons.  This is called spurious regression and one result can be that the estimated coefficients are large and appear to be "statistically significant".  This is problematic for typical inference, but also for forecasting, since there is no causal relationship between the two trending series. If they are both truly non-stationary then they will eventually diverge again, leading to poor forecasts.
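
A small simulation can make the spurious regression point concrete. The sketch below is not from the original thread: it assumes two independent Gaussian random walks, uses plain NumPy OLS, and the helper names `slope_t_stat` and `rejection_rates` are hypothetical. It counts how often the slope looks "significant" at the nominal 5% level, first in levels and then in first differences.

```python
import numpy as np


def slope_t_stat(y, x):
    """t-statistic of the slope in an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])


def rejection_rates(n_sims=500, n_obs=200, seed=0):
    """Share of simulations with |t| > 1.96, in levels and in first differences."""
    rng = np.random.default_rng(seed)
    levels = 0
    diffs = 0
    for _ in range(n_sims):
        # Two random walks that are unrelated by construction.
        y = np.cumsum(rng.standard_normal(n_obs))
        x = np.cumsum(rng.standard_normal(n_obs))
        levels += int(abs(slope_t_stat(y, x)) > 1.96)
        diffs += int(abs(slope_t_stat(np.diff(y), np.diff(x))) > 1.96)
    return levels / n_sims, diffs / n_sims


rate_levels, rate_diffs = rejection_rates()
print(f"'significant' slope in levels:      {rate_levels:.0%}")
print(f"'significant' slope in differences: {rate_diffs:.0%}")
```

Because the two series are unrelated, a well-behaved 5% test should reject roughly 5% of the time. In levels the rate is typically far higher, while in first differences it falls back close to the nominal level.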

However, SARIMAX does not require that your exog variables be stationary - the model will usually run with no problems with trending exog variables. As noted above, the problems occur if the model estimates a spurious relationship.

There are procedures for automatically transforming data to fit various requirements, including stationarity, and these typically fall under the description of "automatic forecasting" (for example, the pmdarima package).  Usually the series is differenced until it passes a stationarity or unit root test, such as the KPSS test (null hypothesis: stationarity) or the Augmented Dickey–Fuller test (null hypothesis: unit root).

Best,
Chad


Adrian Scholl

Aug 29, 2022, 7:35:01 AM
to pystatsmodels
Hi Chad,

thanks for your fast and detailed response!

1. To prevent problems caused by spurious relationships, I am thinking about using the first difference (absolute change) or the log change of my exog vars instead of their levels.
However, the last step of my preprocessing pipeline before model fitting is a PCA. Do you know whether the PCA can "destroy" the stationarity properties of my exog vars?

2. I am also in the process of formally writing down the regression with SARIMA errors model. I already went through the sarimax.py code and found the definitions of the initial transition, design and selection matrices. However, I couldn't figure out the rest of the model and didn't find any source (no website, book or paper) that mathematically formulates all the matrices needed to fully understand the state space model.
Do you have any resources that specifically define the regression with SARIMA errors state space model?

I think that would be very helpful, also for improving the SARIMAX documentation.

Thanks a lot!
Adrian



