Still trying with ARIMA and ARMA: a plea for help!


Dartdog

Nov 18, 2013, 11:44:11 AM
to pystat...@googlegroups.com

I have spent the last few days reading up on ARMA and ARIMA, to limited effect. There are very few completely worked examples on the web, and none to speak of that use StatsModels.


My current notebook is here http://nbviewer.ipython.org/7473989


I’d like to fix that but need some help. I think that the resulting notebook would be a great resource for Statsmodels users as well as those coming up to speed with the procedures.


There are clearly two parts: first, using the tools, getting them to work, and understanding the output; and second, developing useful, accurate models.


At this point I’m more concerned with the former: explaining the tools.


Note that I have produced very accurate forecasts with primitive methods; see here: http://tbrander.wordpress.com/2013/10/14/september-birmingham-area-sales-start-of-a-strong-fall-season/

It is a slightly different dataset, but the year-to-date (YTD) error is under 1% (yup, I realize it may be chance!)


So I have posted my best attempt as an IPython notebook (see the link above) that uses ARIMA and ARMA to project statewide home sales in Alabama.


In particular, how do I get cells #9 and #10 to work?


At this point I’m more concerned with getting the StatsModels tools working than with precise models. I have embedded a series of questions in the notebook.


I found a great tutorial on ARMA in R here: http://www.youtube.com/watch?v=zFo7QixEKvg

It seems, as had been suggested, that the data should be "differenced" to remove the business cycle effects. The tutorial indicates that the ARIMA procedure can do the differencing itself when the differencing order d is set to 1 (which I did in the referenced notebook).
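For readers new to differencing, here is a minimal sketch of what d=1 does. In an ARIMA(p, d, q) model with d=1, the ARMA(p, q) part is fitted to the first differences y_t - y_{t-1} rather than to the levels. The series below is a synthetic random walk (an assumption standing in for the sales data, which is not reproduced here); only pandas and NumPy are used.

```python
import numpy as np
import pandas as pd

# Synthetic monthly random walk (assumption: illustrative data only).
# A random walk has a unit root, the situation where differencing helps.
rng = np.random.default_rng(0)
idx = pd.date_range("2000-01-01", periods=120, freq="MS")
level = pd.Series(np.cumsum(rng.normal(size=120)), index=idx)

# First difference y_t - y_{t-1}: the transformation that d=1 applies
# before the ARMA part of an ARIMA(p, 1, q) model is estimated.
diffed = level.diff().dropna()

# Undo the differencing: cumulative sum of the changes plus the first level.
restored = diffed.cumsum() + level.iloc[0]
print(diffed.head())
```

Passing d=1 to the model rather than differencing by hand has the practical advantage that forecasts come back on the original (level) scale.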


I have added headings and numbers to the notebook for easy reference, so please either download it and add to or fix it, or post comments here and I’ll update it.


Chad Fulton

Nov 18, 2013, 4:21:33 PM
to Statsmodels Mailing List
Hi,

Here is some quick help: http://nbviewer.ipython.org/7535278.

Issues with using Statsmodels:
- The data was not sorted in ascending order by date, which is why predict failed (I don't know if this is intended behavior, but it is existing behavior :)
- Cell #10 does not work because ARIMAResults has no built-in plotting method. I'm not sure what you were trying to plot there, however. Possibly you meant predict1_units.plot()?
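The sorting fix from the first point can be sketched with pandas alone. The figures below are made-up values listed newest-first, as data copied from a report often is (an assumption; the thread's data is not reproduced). Attaching an explicit monthly frequency with `asfreq("MS")` is an extra assumption not mentioned above; it tends to make date-based `predict`/`forecast` calls less error-prone.

```python
import pandas as pd

# Hypothetical monthly sales, newest first (illustrative values only).
sales = pd.Series(
    [510, 495, 530, 470],
    index=pd.to_datetime(["2013-04-01", "2013-03-01", "2013-02-01", "2013-01-01"]),
)

# Sort ascending by date before building a time-series model.
# Assumption: also attach an explicit month-start frequency ("MS").
sales = sales.sort_index().asfreq("MS")
print(sales)
```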

Issues with econometrics:
- Your data has two components, a trend (the unit root) and a cycle (the stationary / ARMA portion). One way to deal with a unit root is to difference the data, which the ARIMA function does with d=1. However, the cycles themselves likely depend on more than just the previous period, so an ARIMA(1,1,1) is not necessarily the correct model; I used ARIMA(12,1,0), but you would likely want to test for the appropriate lag length.
- The Augmented Dickey-Fuller test (tsa.adfuller) can be used to test for the presence of a unit root.
- Since your data has a unit root, an ARMA(12,0) model on the original data will not be useful. However, an ARMA(12,0) on the differenced data would be (which is what an ARIMA(12,1,0) is).

Chad

Dartdog

Nov 18, 2013, 5:48:55 PM
to pystat...@googlegroups.com
Awesome! Thank you so much; I hope it will help others as well. It's the best I've seen so far on this topic, and more thorough than anything else we have in statsmodels.
This should be added to the Wiki examples.

josef...@gmail.com

Nov 18, 2013, 8:21:55 PM
to pystatsmodels
Chad, thanks for looking into this. (I'm currently not up to date on
using ARMA or ARIMA.)

One comment: The seasonality looks dominant. In that case using
annual/seasonal differencing instead of differencing to the previous
period (month) would most likely be more appropriate.

Josef
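Josef's suggestion can be illustrated with a minimal sketch using pandas only. The series is synthetic (an assumption): a linear trend plus a fixed 12-month seasonal pattern. Differencing at lag 1 leaves the seasonal swing in place, while differencing at lag 12 (year over year) cancels a fixed seasonal pattern and leaves only the 12-month trend growth.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: trend of 2 units/month plus a fixed seasonal cycle.
idx = pd.date_range("2005-01-01", periods=96, freq="MS")
months = np.arange(96)
seasonal = 100 * np.sin(2 * np.pi * months / 12)
y = pd.Series(1000 + 2 * months + seasonal, index=idx)

# Month-over-month difference (lag 1): the seasonal swing survives.
d1 = y.diff(1).dropna()

# Seasonal (year-over-year) difference (lag 12): the fixed seasonal pattern
# cancels, leaving 12 months of trend growth (2 * 12 = 24) in every period.
d12 = y.diff(12).dropna()
print(d1.std(), d12.std())
```

In statsmodels, the same idea can be expressed inside the model with the `seasonal_order=(P, D, Q, 12)` argument of `statsmodels.tsa.arima.model.ARIMA` (or `SARIMAX`) instead of differencing by hand.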

Chad Fulton

Nov 19, 2013, 3:03:15 PM
to Statsmodels Mailing List
On Mon, Nov 18, 2013 at 5:21 PM, <josef...@gmail.com> wrote: 

> One comment: The seasonality looks dominant. In that case using
> annual/seasonal differencing instead of differencing to the previous
> period (month) would most likely be more appropriate.

This is a good point.

Running stats.normaltest on the ARMA(12,0) residuals from the original notebook strongly rejected the null of normally distributed residuals, and part of that was due to the unit root. However, I noticed that even the residuals from the ARIMA(12,1,0) rejected the null of normality at the 1% level. I suspect that doing a better job with seasonality, as Josef suggests, would improve things.