statsmodels.tsa.arima_model.ARMA.predict doesn't predict out-of-sample data


shen gao

Aug 7, 2015, 8:59:16 AM
to pystatsmodels
Hello,

I fitted my data with an ARMA model and I want to predict a couple of steps beyond the end of the sample.

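        import numpy as np
        import pandas as pd
        from statsmodels.tsa.arima_model import ARMA

        # get_pricing, symbol, histdate, todayDate, startdate and enddate
        # are not shown in the post.
        data = get_pricing([symbol], start_date=histdate, end_date=todayDate, frequency='daily')
        df = pd.DataFrame({"value": data.price.values.ravel()}, index=data.major_axis.ravel())
        result = df.pct_change().dropna()

        # Fit every ARMA(p, q) with p, q in 0..4 and record its AIC.
        degree = {}
        for x in range(0, 5):
            for y in range(0, 5):
                try:
                    arma = ARMA(result, (x, y)).fit()
                    degree[str(x) + str(y)] = arma.aic
                except Exception:
                    # Skip orders that fail to fit.
                    continue

        # Sort by AIC (ascending) and take the order with the lowest value.
        dic = sorted(degree.iteritems(), key=lambda d: d[1])

        p = int(dic[0][0][0])
        q = int(dic[0][0][1])
        arma = ARMA(result, (p, q)).fit()
        predicts = arma.predict()  # in-sample prediction
        ex = np.array([1, 14])
        predictoos = arma.predict(startdate, enddate, ex)  # out-of-sample: raises the error below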

The in-sample prediction works well, but I get an error on the last line.

ValueError                                Traceback (most recent call last)
<ipython-input-210-b387ec79e894> in <module>()
     27         predicts = arma.predict()
     28         ex = np.array([1,14])
---> 29         predicts = arma.predict(startdate, enddate,ex)
     30         #plot
     31         xaxis = data.major_axis.ravel()[1:]

/usr/local/lib/python2.7/dist-packages/statsmodels/base/wrapper.pyc in wrapper(self, *args, **kwargs)
     90         results = object.__getattribute__(self, '_results')
     91         data = results.model.data
---> 92         return data.wrap_output(func(results, *args, **kwargs), how)
     93 
     94     argspec = inspect.getargspec(func)

/usr/local/lib/python2.7/dist-packages/statsmodels/tsa/arima_model.pyc in predict(self, start, end, exog, dynamic)
   1439 
   1440     def predict(self, start=None, end=None, exog=None, dynamic=False):
-> 1441         return self.model.predict(self.params, start, end, exog, dynamic)
   1442     predict.__doc__ = _arma_results_predict
   1443 

/usr/local/lib/python2.7/dist-packages/statsmodels/tsa/arima_model.pyc in predict(self, params, start, end, exog, dynamic)
    710         # will return an index of a date
    711         start = self._get_predict_start(start, dynamic)
--> 712         end, out_of_sample = self._get_predict_end(end, dynamic)
    713         if out_of_sample and (exog is None and self.k_exog > 0):
    714             raise ValueError("You must provide exog for ARMAX")

/usr/local/lib/python2.7/dist-packages/statsmodels/tsa/arima_model.pyc in _get_predict_end(self, end, dynamic)
    651     def _get_predict_end(self, end, dynamic=False):
    652         # pass through so predict works for ARIMA and ARMA
--> 653         return super(ARMA, self)._get_predict_end(end)
    654 
    655     def geterrors(self, params):

/usr/local/lib/python2.7/dist-packages/statsmodels/tsa/base/tsa_model.pyc in _get_predict_end(self, end)
    158                     freq = self.data.freq
    159                     out_of_sample = datetools._idx_from_dates(dates[-1], dtend,
--> 160                                             freq)
    161                 else:
    162                     if freq is None:

/usr/local/lib/python2.7/dist-packages/statsmodels/tsa/base/datetools.pyc in _idx_from_dates(d1, d2, freq)
     92     from pandas import DatetimeIndex
     93     return len(DatetimeIndex(start=d1, end=d2,
---> 94                              freq = _freq_to_pandas[freq])) - 1
     95 
     96 _quarter_to_day = {

/usr/local/lib/python2.7/dist-packages/pandas/util/decorators.pyc in wrapper(*args, **kwargs)
     86                 else:
     87                     kwargs[new_arg_name] = new_arg_value
---> 88             return func(*args, **kwargs)
     89         return wrapper
     90     return _deprecate_kwarg

/usr/local/lib/python2.7/dist-packages/pandas/tseries/index.pyc in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, **kwargs)
    223 
    224         if data is None and freq is None:
--> 225             raise ValueError("Must provide freq argument if no data is "
    226                              "supplied")
    227 

ValueError: Must provide freq argument if no data is supplied

I've been thinking about this for some days, but I couldn't figure it out. Thanks in advance for your help!

josef...@gmail.com

Aug 7, 2015, 2:59:09 PM
to pystatsmodels
Somewhere the time frequency `freq` is not defined properly, so pandas cannot create a DatetimeIndex.

That's all I can guess. I don't have a working example, and I never tried to figure out the date and time handling details in pandas and statsmodels.tsa.

Josef

shen gao

Aug 10, 2015, 8:15:22 AM
to pystatsmodels
Thanks, I see the problem as well, but I don't know how to define the "freq". Could you shed some light on this, please?

Padarn Wilson

Aug 11, 2015, 4:01:54 AM
to pystatsmodels


On Monday, August 10, 2015 at 1:15:22 PM UTC+1, shen gao wrote:
Thanks, I see the problem as well, but I don't know how to define the "freq". Could you shed some light on this, please?


I've used this in statsmodels a bit. I'll look into your specific problem in more detail soon, but the 'freq' is a property of the 'index' of your pandas.Series object:

    
series.index.freq
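
A quick way to check this is sketched below. It is only an illustration: the dates and return values are made up, and the assumption is that the real price data is missing some business days (for example holidays), which would leave the index without a frequency.

```python
import pandas as pd

# Made-up daily returns with one business day (2015-08-12) missing.
dates = pd.to_datetime(["2015-08-10", "2015-08-11", "2015-08-13", "2015-08-14"])
returns = pd.Series([0.01, -0.02, 0.005, 0.003], index=dates)

# An index built from a plain list of dates carries no frequency.
print(returns.index.freq)            # None

# pandas cannot infer a frequency either, because of the gap.
print(pd.infer_freq(returns.index))  # None

# For comparison, an index created with an explicit frequency keeps it.
regular_index = pd.date_range("2015-08-10", periods=4, freq="B")
print(regular_index.freqstr)         # B
```

If `series.index.freq` prints `None` for your data, that would be consistent with the "Must provide freq argument" error in the traceback.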

You might have to build a new index, or use the 'reindex' method of the series.
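
Here is a minimal sketch of both options, again on made-up data. The business-day frequency "B" and filling the missing day with a zero return are assumptions made for illustration; pick whatever frequency and fill rule suit your data. The gaps are filled here on the assumption that the model will not fit a series containing NaN.

```python
import pandas as pd

# Made-up daily returns with one business day (2015-08-12) missing.
dates = pd.to_datetime(["2015-08-10", "2015-08-11", "2015-08-13", "2015-08-14"])
returns = pd.Series([0.01, -0.02, 0.005, 0.003], index=dates)

# Option 1: asfreq conforms the series to a business-day frequency.
# The missing day appears as NaN, which is filled with 0.0 here
# (an assumption: no price change on that day).
regular = returns.asfreq("B").fillna(0.0)

# Option 2: build the full index explicitly and reindex onto it.
full_index = pd.date_range(returns.index[0], returns.index[-1], freq="B")
reindexed = returns.reindex(full_index).fillna(0.0)

print(regular.index.freqstr)    # B
print(reindexed.index.freqstr)  # B
```

Either series now has an index with a defined frequency, so it can be passed to the model in place of the original one.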

Padarn