ValueError: On entry to DLASCL parameter number 4 had an illegal value


Jory

Jan 3, 2018, 8:13:00 PM
to pystatsmodels
I am getting this error message when using statsmodels.tsa.arima_model.ARIMA.

I believe this means there is a nan or inf value in the data that I am passing to the ARIMA function; however, I don't understand why the error depends on the order passed to ARIMA. For example, with the same dataset, order (11, 0, 1) can produce a forecast, but (11, 0, 5) throws an error.
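For reference, a quick way to rule out non-finite values in the input itself before fitting (using a hypothetical stand-in array, not the real dataset) is:

```python
import numpy as np

# hypothetical stand-in for the actual series passed to ARIMA
series = np.asarray([1.2, 3.4, np.nan, 5.6])

# ARIMA cannot handle non-finite inputs, so verify before fitting
if not np.isfinite(series).all():
    bad = np.where(~np.isfinite(series))[0]
    print("non-finite values at indices:", bad)
```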

The error message is:

File "<ipython-input-52-2b1315f831ff>", line 26, in run_ARIMA
    model_fit = model.fit(disp=0)
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/tsa/arima_model.py", line 932, in fit
    callback=callback, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/base/model.py", line 425, in fit
    full_output=full_output)
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/base/optimizer.py", line 184, in _fit
    hess=hessian)
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/base/optimizer.py", line 382, in _fit_lbfgs
    **extra_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/lbfgsb.py", line 193, in fmin_l_bfgs_b
    **opts)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/lbfgsb.py", line 328, in _minimize_lbfgsb
    f, g = func_and_grad(x)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/lbfgsb.py", line 273, in func_and_grad
    f = fun(x, *args)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/optimize.py", line 292, in function_wrapper
    return function(*(wrapper_args + args))
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/base/model.py", line 403, in <lambda>
    f = lambda params, *args: -self.loglike(params, *args) / nobs
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/tsa/arima_model.py", line 761, in loglike
    return self.loglike_kalman(params, set_sigma2)
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/tsa/arima_model.py", line 771, in loglike_kalman
    return KalmanFilter.loglike(params, self, set_sigma2)
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/tsa/kalmanf/kalmanfilter.py", line 649, in loglike
    R_mat, T_mat)
  File "statsmodels/tsa/kalmanf/kalman_loglike.pyx", line 342, in statsmodels.tsa.kalmanf.kalman_loglike.kalman_loglike_double (statsmodels/tsa/kalmanf/kalman_loglike.c:5245)
  File "statsmodels/tsa/kalmanf/kalman_loglike.pyx", line 74, in statsmodels.tsa.kalmanf.kalman_loglike.kalman_filter_double (statsmodels/tsa/kalmanf/kalman_loglike.c:2572)
  File "/usr/local/lib/python2.7/dist-packages/numpy/linalg/linalg.py", line 1617, in pinv
    u, s, vt = svd(a, 0)
  File "/usr/local/lib/python2.7/dist-packages/numpy/linalg/linalg.py", line 1359, in svd
    u, s, vt = gufunc(a, signature=signature, extobj=extobj)

josef...@gmail.com

Jan 3, 2018, 9:10:22 PM
to pystatsmodels
As far as I remember, a case like this hasn't shown up yet, so there are no good guesses for what the problem might be.

Do you have nans in your data? 
If it works for some lag orders without problems, then the nan or inf values are more likely generated by some corner case of the data. The larger lag order might make the model overparameterized so that it runs into corner problems; for example, one possibility would be perfect prediction with a residual variance of zero, which could result in a division by zero.

That's just a rough guess; finding out what's going on would require using a debugger to inspect which values cause the problems when evaluating the log-likelihood function.
You can also check SARIMAX to see if it behaves the same way.

We currently have no set of examples, other than those provided by users, that show when we run into nans or infs during the computation. We need those test and corner cases for the tsa models to get a better idea of what can go wrong.
If your example looks like one of these cases, then a reproducible example in a github issue would be helpful for investigating them.

Josef


Chad Fulton

Jan 3, 2018, 11:43:28 PM
to Statsmodels Mailing List
It looks like it might be an issue with a bad parameter being passed - I think the error is occurring during Kalman filter initialization in `pinv(identity(r**2) - kron(T_mat, T_mat))`, and T_mat is a matrix that only contains the autoregressive parameters.

For example, if I do the following:

import numpy as np
import statsmodels.api as sm

np.random.seed(1234)
endog = np.random.normal(size=100)
mod = sm.tsa.ARIMA(endog, order=(2, 0, 0))
mod.fit(trend='nc')
mod.loglike_kalman(np.r_[[np.nan, np.nan]])

The last call here gives me a LinAlgError that the SVD did not converge. Judging by the line numbers in your traceback, though, I have a newer NumPy, so maybe your NumPy isn't as safe with the NaNs, or maybe my LAPACK (Mac OS X) handles the error differently.

Can you run this code and see what error you get?

I would guess that when you use the (11, 0, 5) model, the parameters are probably not well identified and so either the starting parameters have a NaN in them or else the optimizer or parameter transformation function run into a problem somewhere.

Chad Fulton

Jan 3, 2018, 11:52:14 PM
to Statsmodels Mailing List


When I try a similar thing with SARIMAX, I get some warnings, and all the output is NaN. I can't remember if this is what we want to have happen in this case, or if we would rather raise an exception for NaN parameters?

josef...@gmail.com

Jan 5, 2018, 10:27:34 PM
to pystatsmodels
IMO, parameters should not have nans (except maybe intentionally in the final result or if some auxiliary parameters that are not used in the computation are nan). 
The best would be to find cases where this happens and then try to remove the source of the nans or infs, e.g. protect against zero division errors.
Otherwise we should raise: checking params for nans shouldn't be expensive, and (I would guess) almost all scipy optimizers are unable to recover from nan parameters, although several of them can recover from a nan value of the objective function.
If we raise an exception, then it might also make it easier to debug those cases.
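A cheap guard of the kind described could be a thin wrapper around the log-likelihood (a hypothetical sketch, not statsmodels API):

```python
import numpy as np

def checked_loglike(loglike):
    """Wrap a log-likelihood so non-finite parameters raise immediately."""
    def wrapper(params, *args):
        params = np.asarray(params)
        if not np.isfinite(params).all():
            raise ValueError("non-finite parameters passed to loglike: %s" % params)
        return loglike(params, *args)
    return wrapper

# usage with a toy objective standing in for a model loglike
ll = checked_loglike(lambda p: -np.sum(p ** 2))
print(ll(np.array([0.5, -0.5])))  # -0.5
```

Raising at the wrapper level would surface the bad parameter vector in the traceback instead of letting it propagate into LAPACK.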

Josef


