HELP: SARIMA Forecasting Model Not Working?


Kyle Vincson Mabbayad

Sep 3, 2023, 10:41:33 AM
to pystatsmodels
I am trying to forecast carbon emissions up to the year 2030. The .csv file reads as follows:

Year,Total Carbon Footprint
2018-5-31,16
2018-7-31,15
2018-12-31,92
2019-5-31,33
2019-7-31,25
2019-12-31,98
2020-5-31,31
2020-7-31,51
2020-12-31,104
2021-5-31,99
2021-7-31,44
2021-12-31,110
2022-5-31,175
2022-7-31,125
2022-12-31,116
2023-5-31,153
2023-7-31,55
2023-12-31,129

Using the code below:
model = SARIMAX(y, order=(0, 0, 0), seasonal_order=(1, 1, 0, 3))
model_fit = model.fit()

# Project the data for the years 2023 to 2030
predictions = model_fit.predict(start=y_to_train[0], end=+(3*6))

# Plot the data and predictions
plt.plot(y)
plt.plot(predictions, color='red')
plt.title('SARIMA Predictions')
plt.show()

where y holds the data from the CSV file, the graph looks like this:
[Image: Untitled.png]

If I remove the start and end arguments from the predict call:
model = SARIMAX(y, order=(0, 0, 0), seasonal_order=(1, 1, 0, 3))
model_fit = model.fit()

# Project the data for the years 2023 to 2030
predictions = model_fit.predict()

# Plot the data and predictions
plt.plot(y)
plt.plot(predictions, color='red')
plt.title('SARIMA Predictions')
plt.show()

The resulting plot becomes:
[Image: Untitled.png]

I am new to this. Are there other ways to extend the prediction line to 2030?

Thank you!

Chad Fulton

Sep 3, 2023, 12:44:18 PM
to pystat...@googlegroups.com
Hello,

Your date index does not have a defined frequency, so the index for the forecasts is being set to incrementing integers. When you plot these integers, they are interpreted by matplotlib as timestamps starting in 1970. When you run your code, you should be receiving a warning that your index has no frequency and so will not be used for forecasting.
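
The missing frequency can be confirmed directly in pandas. The following is a minimal sketch, assuming the dates from the question have been loaded into a `DatetimeIndex`; the variable name `dates` is illustrative and not from the original post.

```python
import pandas as pd

# The first two years of dates from the question: May 31, July 31, December 31
dates = pd.DatetimeIndex([
    "2018-05-31", "2018-07-31", "2018-12-31",
    "2019-05-31", "2019-07-31", "2019-12-31",
])

# The gaps between observations are 2, 5, and 5 months, so no regular
# frequency is attached to the index and none can be inferred.
print(dates.freq)            # None
print(pd.infer_freq(dates))  # None
```

Because neither check yields a frequency, statsmodels has no rule for generating the dates that follow the last observation.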

I guess you want your forecasts to also be for May 31, July 31, and December 31 for the next 6 years?

Regardless, since you do not have a regular frequency, the easiest thing is probably to drop your index before passing the data to statsmodels (e.g. `y.reset_index(drop=True)`), then perform your prediction / forecasting, and then set a new index that uses whatever the appropriate dates are.

A second option is to pass the desired index directly to the `predict` call, e.g.:

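import pandas as pd
import statsmodels.api as sm

# Toy data on an irregular date index (no frequency)
dta = [0.5, 1.0, 0.2, 0.7]
index = ['2020-05-31', '2020-07-31', '2020-12-31', '2021-05-31']
y = pd.Series(dta, index=pd.DatetimeIndex(index))

# Default SARIMAX is an AR(1); the parameters are fixed here, not estimated
mod = sm.tsa.SARIMAX(y)
res = mod.smooth([0.5, 1.0])

# Dates for the 4 in-sample points plus 4 forecasts (positions 0 through 7)
fcast_index = ['2021-07-31', '2021-12-31', '2022-05-31', '2022-07-31']
predict_index = pd.DatetimeIndex(index + fcast_index)
res.predict(0, 7, index=predict_index)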


Hope that helps,
Chad

