On Thu, Nov 28, 2013 at 7:49 PM, <user1...@gmail.com> wrote:
> Do you mean something like this?
>
> (a,b,c) = (0,0,4)
> olsResults = sm.OLS(train[endogenous],train.drop(endogenous,axis=1)).fit()
> prediction = olsResults.predict(x.drop(endogenous,axis=1))
> arima = tsa.ARIMA(train[endogenous],order=(a,b,c),freq='B')
If I understand your variables correctly, I meant:
arima = tsa.ARIMA(olsResults.resid, order=(a, b, c), freq='B')
> results = arima.fit(transparams=True)
> prediction = prediction[b:] + results.predict(start=b, end=len(x)-1, dynamic=True)
>
> OLS by itself did much better than ARIMA by itself. Doing the above
> procedure negligibly improved the results from OLS. Also, I don't quite
> understand the intuition behind this. Is this a way to approximate ARIMA?
Using resid in ARIMA will give you approximately the full ARIMAX.
(Doing it in two stages won't be quite right: OLS will not give
unbiased or consistent parameter estimates unless the exog are
strictly or strongly (?) exogenous, or unless there is no
autocorrelation in the residuals. The standard errors are also wrong.)
The idea is that you use OLS to get the effect of your explanatory
variables. However, if there is autocorrelation in the residuals, then
the past residuals still contain information that you can use to get a
better short-term forecast. Using ARMA on the residuals can capture
that part of the forecast.
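Roughly, in code, a minimal sketch of the two-stage procedure with
made-up toy data (the variable names, the (4, 0) order, and the
20-period horizon are just for illustration, using the tsa.ARMA class):

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.tsa.api as tsa

np.random.seed(12345)

# Made-up business-day data: one dependent variable, two regressors.
idx = pd.date_range('2013-01-01', periods=220, freq='B')
exog = pd.DataFrame(np.random.randn(220, 2), index=idx, columns=['x1', 'x2'])
y = 1.5 * exog['x1'] - 0.5 * exog['x2'] + np.random.randn(220)

train_endog, train_exog = y.iloc[:200], exog.iloc[:200]
future_exog = exog.iloc[200:]   # known (or separately forecast) regressors

# Stage 1: OLS picks up the effect of the explanatory variables.
ols_results = sm.OLS(train_endog, train_exog).fit()
exog_part = ols_results.predict(future_exog)

# Stage 2: ARMA on the OLS residuals uses the leftover autocorrelation.
arma_results = tsa.ARMA(ols_results.resid, order=(4, 0)).fit()
resid_part = arma_results.predict(start=200, end=219)

# Combined short-term forecast: explanatory part plus residual part.
forecast = exog_part + resid_part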
ARIMAX combines both in an efficient way.
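For comparison, the one-step version of the same toy example: with
d=0 this is just ARMA with the regressors passed through the exog
argument, so the regression coefficients and the ARMA parameters are
estimated jointly:

# Joint (one-step) estimation of the regression and ARMA parts.
armax_results = tsa.ARMA(train_endog, order=(4, 0), exog=train_exog).fit()

# Out-of-sample prediction still needs the future exog values.
forecast_armax = armax_results.predict(start=200, end=219, exog=future_exog)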
ARIMA or ARIMAX is largely used for univariate forecasting. If you
have to forecast the next several periods, then you also still need a
forecast of your explanatory variables. It might be difficult to
forecast the explanatory variables well enough that we actually do
better than the univariate ARIMA forecast.
(Alternatively, we could regress on our explanatory variables after
they have been lagged by the number of periods that we want to
forecast; then we don't need future values of the explanatory
variables.)
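Continuing the toy example, a sketch of that lagged-regressor variant;
the horizon h is an arbitrary choice for illustration:

h = 20  # forecast horizon

# Lag the regressors by h periods, so forecasting h steps ahead only
# needs exog values that are already observed.
exog_lagged = exog.shift(h).dropna()          # drops the first h rows
endog_aligned = y.iloc[h:200]                 # endog matched to lagged exog
exog_aligned = exog_lagged.iloc[:200 - h]     # same dates as endog_aligned

armax_lagged = tsa.ARMA(endog_aligned, order=(4, 0), exog=exog_aligned).fit()

# The lagged exog for the next h forecast periods is already observed.
future_lagged = exog_lagged.iloc[200 - h:200]
forecast_lagged = armax_lagged.predict(start=len(endog_aligned),
                                       end=len(endog_aligned) + h - 1,
                                       exog=future_lagged)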
Josef