Hello!
I'm currently developing some Time Series models using SARIMAX.
I only have two weeks' worth of data at this point, but I still wanted to start the modeling process.
The data is hourly and covers the full two weeks (all 7 days of each week, so 336 observations).
Based on the seasonal order documentation I've read, I thought it appropriate to use 24 as the seasonal period to capture the hour-of-day pattern.
The SARIMAX model specification is:
mod = sm.tsa.statespace.SARIMAX(view_hour['distinct_freq_sum'], order=(5, 0, 0), seasonal_order=(3, 0, 0, 24))
The warnings are:
1. ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
2. Covariance matrix calculated using the outer product of gradients (complex-step).
3. Covariance matrix is singular or near-singular, with condition number 1.82e+31. Standard errors may be unstable.
I've run the standard diagnostics both before and after fitting the model.
After fitting, the residuals are not normally distributed (per the Jarque-Bera test and stats.normaltest), but they do not exhibit serial correlation (per the Durbin-Watson statistic).
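The residual checks mentioned above can be sketched roughly as follows. The `resid` array here is random placeholder data, not the model's actual residuals; in practice it would be replaced by the fitted results' `resid` attribute. Note that the Durbin-Watson statistic only looks at lag-1 autocorrelation, so it would not reveal leftover correlation at the daily (lag 24) or weekly (lag 168) level.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.stattools import durbin_watson, jarque_bera

# Placeholder residuals (assumption): swap in the fitted model's residuals.
rng = np.random.default_rng(0)
resid = rng.standard_normal(336)

# Durbin-Watson: values near 2 suggest no lag-1 autocorrelation.
dw = durbin_watson(resid)

# Jarque-Bera: returns the statistic, its p-value, skewness, and kurtosis.
jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(resid)

# D'Agostino-Pearson normality test from SciPy.
k2_stat, k2_pvalue = stats.normaltest(resid)

print(f"Durbin-Watson: {dw:.3f}")
print(f"Jarque-Bera p-value: {jb_pvalue:.3f}")
print(f"normaltest p-value: {k2_pvalue:.3f}")
```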
Below is the output graph. On closer inspection, you will see that the model is not very accurate: the RMSE is 3218.933.
As I write this I'm beginning to think that the primary problem may be the weekends.
So, in summary, I'd like to better understand what is driving the warnings and how the seasonal order can account for the weekends.
I've attached the CSV data file in case you'd like to reproduce this. The code that builds the datetime index SARIMAX needs is:
view_hour['datetime'] = pd.to_datetime(view_hour['date_hour'])
view_hour.reset_index(inplace=True)
view_hour = view_hour.set_index('datetime')
view_hour.sort_index(inplace=True)
Thanks for your time and consideration.
Steve