automatic forecasting - tools ?


josef...@gmail.com

Jun 1, 2016, 11:05:09 AM
to pystatsmodels
I never looked systematically into automatic forecasting. However, it looks like statsmodels is getting more use in this application area.
e.g.
and Stack Overflow questions.

We have several PRs in progress that will provide parts of the functionality that can be used for forecasting and for automatic model selection.

This would be a great area for more users/developers to contribute and help get better coverage.
Help is welcome with writing unit tests, with usage and code review, and with identifying and/or implementing missing functions, as well as with writing some high-level functions that bring things together for the *automatic* part.


For example, Hyndman uses unit root (KPSS) tests to decide between differencing and modeling through explanatory variables (trend, seasonal dummies). ...
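A minimal sketch of the differencing half of that decision, using only numpy. The helper names `kpss_level_stat` and `choose_differencing` are made up for illustration, the bandwidth rule is an assumption, and the series is only compared against the tabulated 5% critical value instead of computing a p-value. In practice the `kpss` function in `statsmodels.tsa.stattools` would likely replace the hand-rolled statistic.

```python
import numpy as np

# 5% critical value of the KPSS test for level stationarity
KPSS_CRIT_5PCT = 0.463


def kpss_level_stat(x, nlags=None):
    """KPSS statistic for the null of stationarity around a constant."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if nlags is None:
        # rule-of-thumb bandwidth; an assumption of this sketch
        nlags = int(np.ceil(12.0 * (n / 100.0) ** 0.25))
    resid = x - x.mean()
    partial_sums = np.cumsum(resid)
    # Newey-West long-run variance with Bartlett weights
    lrv = resid @ resid / n
    for lag in range(1, nlags + 1):
        weight = 1.0 - lag / (nlags + 1.0)
        lrv += 2.0 * weight * (resid[lag:] @ resid[:-lag]) / n
    return (partial_sums @ partial_sums) / (n ** 2 * lrv)


def choose_differencing(x, max_d=2):
    """Difference until the KPSS test no longer rejects at the 5% level."""
    x = np.asarray(x, dtype=float)
    d = 0
    while d < max_d and kpss_level_stat(x) > KPSS_CRIT_5PCT:
        x = np.diff(x)
        d += 1
    return d


# usage on a synthetic trending series
rng = np.random.default_rng(0)
series = 0.5 * np.arange(200) + rng.normal(size=200)
print(kpss_level_stat(series), choose_differencing(series))
```

The alternative branch (keeping the level and adding trend or seasonal dummies as explanatory variables) is not covered here.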

One problem is outliers when we work with data that is not so clean or that is disturbed by special events, e.g.


The current examples that I found on the internet mainly use SARIMAX, but I think there are several other statespace models that could easily be used for automatic forecasting.
We need some recipes in that direction.
Hyndman is the big forecasting person in R, but we don't need to restrict ourselves to the models that he prefers to write and use.



A quick example:
estimating a variance-stabilizing Box-Cox transformation parameter.
We have a PR in progress for this.
The method is by Guerrero and is also available in R's forecast package.

I was preparing some examples and tried something similar.
Here I just use pandas rolling windows to find the Box-Cox lambda for which the rolling standard deviation has the least significant relationship with the rolling mean.
(The p-values themselves are not correct because they don't take the autocorrelation from the rolling window into account. But we need them only for ranking different values of the Box-Cox parameter. Guerrero uses non-overlapping windows to avoid the extra serial correlation.)

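import numpy as np
from scipy import special
import statsmodels.api as sm

def box_cox_rolling_coeffvar(box_cox_param, endog, freq):
    # regress the rolling std on the rolling mean of the transformed series
    roll_air = special.boxcox(endog, box_cox_param).rolling(window=freq)
    y = roll_air.std()
    m = roll_air.mean()
    x = sm.add_constant(m)
    # robust regression; the NaN rows of the first incomplete windows are dropped
    res_rlm = sm.RLM(y, x, missing='drop').fit()
    return res_rlm

# df_air is assumed to be a DataFrame that holds the monthly AirPassengers series
endog = df_air['AirPassengers']
freq = 12
# p-value of the slope (second parameter) for each candidate lambda
tt = [(lam, np.asarray(box_cox_rolling_coeffvar(lam, endog, freq).pvalues)[1])
      for lam in np.linspace(-1, 1, 21)]

tt = np.asarray(tt)
print(tt)
# lambda with the largest slope p-value, i.e. the least significant relationship
print(tt[tt[:, 1].argmax()])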

[-0.2         0.62121147]

which is close to R's guerrero estimate of -0.29
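For comparison, a rough numpy sketch of the non-overlapping-window idea attributed to Guerrero above: split the series into blocks of one period, and pick the lambda that makes `std / mean**(1 - lambda)` as constant as possible across blocks (smallest coefficient of variation). The function names and the grid search are assumptions of this sketch; it is not the code from the PR and makes no claim to reproduce R's numbers.

```python
import numpy as np


def guerrero_cv(lam, x, period):
    """Coefficient of variation of std / mean**(1 - lam) over non-overlapping blocks."""
    x = np.asarray(x, dtype=float)  # the series must be strictly positive
    n_blocks = len(x) // period
    # drop leading observations so that only complete blocks remain
    blocks = x[len(x) - n_blocks * period:].reshape(n_blocks, period)
    means = blocks.mean(axis=1)
    stds = blocks.std(axis=1, ddof=1)
    ratio = stds / means ** (1.0 - lam)
    return ratio.std(ddof=1) / ratio.mean()


def guerrero_lambda(x, period, grid=None):
    """Grid search for the lambda with the smallest coefficient of variation."""
    if grid is None:
        grid = np.linspace(-1, 1, 21)
    cvs = np.array([guerrero_cv(lam, x, period) for lam in grid])
    return grid[cvs.argmin()]


# usage on a synthetic series whose spread grows in proportion to its level,
# which is the case where a log transform (lambda = 0) stabilizes the variance
t = np.arange(120)
y = np.exp(0.01 * t) * (1.0 + 0.2 * np.sin(2 * np.pi * (t % 12) / 12.0))
print(guerrero_lambda(y, 12))
```

A grid keeps the sketch comparable with the rolling-window version; a scalar optimizer over lambda would be the obvious refinement.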


Josef

josef...@gmail.com

Jun 3, 2016, 10:29:29 AM
to pystatsmodels
Here are the slides I used yesterday for a talk on forecasting


It's the first time I tried using jupyter notebooks directly to build the slides. That works pretty well, except that there is no option for font size.

Josef

Niels Wouda

Jul 7, 2016, 9:22:12 AM
to pystatsmodels
As a part of my job, I've written quite a few of these nice little 'helper' functions, like automatic differencing based on unit root test results, etc.

For forecasting large numbers of product sales time series, manual forecasting quickly became intractable, so I had to speed things up. I believe Hyndman also wrote a method for parameter selection for ARIMA models, based on an information criterion. That should be quite doable.

I could look into this once I finish up on the transformations, and also contribute those differencing functions in a new PR.
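To illustrate the information-criterion idea in its simplest form, here is a numpy-only sketch that picks the lag order of a pure AR model by AIC. This is a deliberately reduced stand-in, not Hyndman's ARIMA search: there are no MA or seasonal terms and no differencing step, and the names `ar_aic` and `select_ar_order` are hypothetical. With statsmodels, the same loop could run over SARIMAX orders and compare the `aic` of each fit.

```python
import numpy as np


def ar_aic(x, p, max_p):
    """AIC of an AR(p) with constant, fit by least squares on a common sample."""
    x = np.asarray(x, dtype=float)
    # all orders use observations max_p, ..., end so that the AICs are comparable
    y = x[max_p:]
    n = len(y)
    cols = [np.ones(n)] + [x[max_p - k:len(x) - k] for k in range(1, p + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n
    # Gaussian AIC up to an additive constant; p lags + constant + variance
    return n * np.log(sigma2) + 2 * (p + 2)


def select_ar_order(x, max_p=5):
    """Return the AIC-minimizing lag order and the AIC of every candidate."""
    aics = [ar_aic(x, p, max_p) for p in range(max_p + 1)]
    return int(np.argmin(aics)), aics


# usage on a simulated AR(1) series
rng = np.random.default_rng(0)
e = rng.normal(size=500)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + e[t]
order, aics = select_ar_order(x)
print(order)
```

Fitting every candidate on the same sample is the one design choice that matters here; otherwise models with fewer lags see more data and the criteria are not comparable.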

On Wednesday, June 1, 2016 at 17:05:09 UTC+2, josefpktd wrote: