Using SARIMAX apply on new data - first value

Samuel

Sep 4, 2019, 1:15:37 PM
to pystatsmodels
Hello,

I'm currently fitting a SARIMAX model on a very clean, near-ideal time series and then applying it to a noisier but similar time series to predict. This works quite well, especially since SARIMAX couldn't fit the noisy data well directly. The only issue is that when I make a prediction, for instance with a (0,0,1)x(0,1,0,101) model with 'ct' trend (which works much better than just 't' or 'c'), the first predicted value is way off, although the predictions eventually converge to good values. In some cases this initial value throws off the entire set of predictions. My guess is that it has something to do with the constant term. Does anyone know why the initial prediction is so far off, and how to mitigate this?

Thanks for the help!

Sam

Chad Fulton

Sep 4, 2019, 9:32:14 PM
to Statsmodels Mailing List
Hi Sam,

This is a little difficult to diagnose without seeing an example of exactly what you're doing.  Can you post an example?

Best,
Chad 

Samuel

Sep 5, 2019, 10:58:10 AM
to pystatsmodels
Hi Chad,

Thanks for the response. Here's the data that I used to fit, with the in-sample prediction:
[plot not preserved]
Here I apply the model to another time series, with out-of-sample forecasting starting at x=404:
[plot not preserved]
You can see the first value is off, but by the time it gets to the out-of-sample range it generally works. Here you can see the issue (out-of-sample forecasting again starting at x=404):
[plot not preserved]
Any idea why it's doing this or how to fix it?

Thanks again, really appreciate it!

Sam

Chad Fulton

Sep 5, 2019, 7:15:25 PM
to Statsmodels Mailing List
Hi Sam,

Sorry, I should have been more clear - can you share the code you're using to do this, including setting up and fitting the first model, and setting up and smoothing the second model?

In fact, if you could give a minimal working example with your actual data or some test data, that would be very helpful.

Best,
Chad 

Samuel

Sep 6, 2019, 10:37:09 AM
to pystatsmodels
Sorry, here are two time series and the code I used:
with statsmodels version '0.11.0dev0+482.gfad69bc'
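```python
from statsmodels.tsa.statespace.sarimax import SARIMAX

# ts1 and ts2 are the series from the attached ts1.csv and ts2.csv
model = SARIMAX(ts1, order=(0, 0, 1), seasonal_order=(0, 1, 0, 101), trend='ct')
# maxiter is an argument to fit(), not to the SARIMAX constructor
model_fit = model.fit(maxiter=10000)

# Apply the fitted parameters to the first 404 points of ts2
newmodel = model_fit.apply(ts2[:404])  # 404:505 supposed to be out of sample
preds = newmodel.predict(0, 505)
```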
ts1.csv
ts2.csv

Chad Fulton

Sep 9, 2019, 5:25:06 PM
to Statsmodels Mailing List
Thanks!

Basically, the issue is that the two series do not appear to be generated by the same parameters: the intercept in particular differs, and possibly the trend as well. This means that estimating the parameters on one dataset and then applying them to the other is unlikely to give good predictions.

One simple way to see this is to look at the means of the differenced variables:

import pandas as pd

print(pd.Series(ts1).diff(101).mean())
print(pd.Series(ts2).diff(101).mean())

yields: 

5841.73
148.10
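
To make the arithmetic behind that check concrete, here is a minimal sketch in plain Python. It is not the data from this thread: `series_a`, `series_b`, the period of 4, and the helper `seasonal_diff_mean` are all made up for illustration. The helper computes the same quantity as `pd.Series(x).diff(period).mean()`. The two series share an identical seasonal pattern and differ only in their drift, and the seasonal-difference means pick that up directly.

```python
def seasonal_diff_mean(values, period):
    """Mean of values[t] - values[t - period] over all t >= period."""
    diffs = [values[t] - values[t - period] for t in range(period, len(values))]
    return sum(diffs) / len(diffs)


period = 4
pattern = [0, 5, 3, 1]  # hypothetical seasonal shape shared by both series

# Same seasonal pattern, different drift per time step (10 versus 1).
series_a = [10 * t + pattern[t % period] for t in range(40)]
series_b = [1 * t + pattern[t % period] for t in range(40)]

# The seasonal pattern cancels in the difference, leaving drift * period.
mean_a = seasonal_diff_mean(series_a, period)
mean_b = seasonal_diff_mean(series_b, period)
print(mean_a, mean_b)
```

When the two means differ by a large factor, as they do for ts1 and ts2, an intercept estimated on one series will be badly off for the other.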

There are a variety of things you could try here (e.g. taking logs might help, and taking one non-seasonal difference might help), but at the end of the day, the series just don't seem to be similar enough, so I think you'll need to refit the parameters.

Best,
Chad