LinAlgError: Singular matrix using X_train and y_train in sm.Logit


Krista Mar

Jun 3, 2019, 2:19:12 PM
to pystatsmodels
For some reason, I get a singular matrix error only when I fit the model with X_train and y_train. If I fit with X and y, I do not get the error. Does anyone know what might be causing this?


import pandas as pd
import statsmodels.api as sm
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# df, y and x1, x2, x3, ... stand for the actual DataFrame and column names
y = df[y]
X = df[[x1, x2, x3]]  # plus further predictor columns
X_scaled = StandardScaler().fit_transform(X.astype(float))
X_scaled_df = pd.DataFrame(X_scaled, index=X.index, columns=X.columns)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled_df, y, test_size=0.20, random_state=2331)
# note: the documented iteration keyword of fit_regularized is maxiter
model = sm.Logit(y_train, X_train).fit_regularized(max_iterations=10000000)

LinAlgError: Singular matrix


Optimization terminated successfully.    (Exit mode 0)
            Current function value: 0.48740005813165277
            Iterations: 52
            Function evaluations: 53
            Gradient evaluations: 52
---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
<ipython-input-84-b4f313add7d3> in <module>
----> 1 model = sm.Logit(y_train, X_train).fit_regularized( max_iterations= 10000000)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py in fit_regularized(self, start_params, method, maxiter, full_output, disp, callback, alpha, trim_mode, auto_trim_tol, size_trim_tol, qc_tol, **kwargs)
    457                 full_output=full_output, disp=disp, callback=callback,
    458                 alpha=alpha, trim_mode=trim_mode, auto_trim_tol=auto_trim_tol,
--> 459                 size_trim_tol=size_trim_tol, qc_tol=qc_tol, **kwargs)
    460         if method in ['l1', 'l1_cvxopt_cp']:
    461             discretefit = L1BinaryResults(self, bnryfit)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py in fit_regularized(self, start_params, method, maxiter, full_output, disp, callback, alpha, trim_mode, auto_trim_tol, size_trim_tol, qc_tol, qc_verbose, **kwargs)
    371                 method=method, maxiter=maxiter, full_output=full_output,
    372                 disp=disp, callback=callback, extra_fit_funcs=extra_fit_funcs,
--> 373                 cov_params_func=cov_params_func, **kwargs)
    374 
    375         return mlefit # up to subclasses to wrap results

~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\base\model.py in fit(self, start_params, method, maxiter, full_output, disp, fargs, callback, retall, skip_hessian, **kwargs)
    469         cov_params_func = kwargs.setdefault('cov_params_func', None)
    470         if cov_params_func:
--> 471             Hinv = cov_params_func(self, xopt, retvals)
    472         elif method == 'newton' and full_output:
    473             Hinv = np.linalg.inv(-retvals['Hessian']) / nobs

~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py in cov_params_func_l1(self, likelihood_model, xopt, retvals)
    391             H_restricted = H[nz_idx[:, None], nz_idx]
    392             # Covariance estimate for the nonzero params
--> 393             H_restricted_inv = np.linalg.inv(-H_restricted)
    394         else:
    395             H_restricted_inv = np.zeros(0)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\linalg\linalg.py in inv(a)
    530     signature = 'D->D' if isComplexType(t) else 'd->d'
    531     extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 532     ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
    533     return wrap(ainv.astype(result_t, copy=False))
    534 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\linalg\linalg.py in _raise_linalgerror_singular(err, flag)
     87 
     88 def _raise_linalgerror_singular(err, flag):
---> 89     raise LinAlgError("Singular matrix")
     90 
     91 def _raise_linalgerror_nonposdef(err, flag):

LinAlgError: Singular matrix
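
The last frames of the traceback show that the optimizer itself finished ("Optimization terminated successfully"), and the error is raised afterwards in cov_params_func_l1, where np.linalg.inv is applied to the Hessian restricted to the nonzero parameters. A minimal NumPy sketch of why a rank-deficient design matrix makes that kind of inversion fail, assuming the textbook logit Hessian -X' diag(p(1 - p)) X. This is not the statsmodels source; the helper logit_hessian and the toy design are hypothetical.

```python
import numpy as np


def logit_hessian(X: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Hessian of the logit log-likelihood: -X' diag(p * (1 - p)) X."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    # X.T * w scales column k of X.T by w[k], i.e. X' diag(w)
    return -(X.T * w) @ X


# Hypothetical design: column 3 is exactly 2 * the intercept column,
# mimicking a predictor that is constant within the training rows.
X = np.column_stack([
    np.ones(5),
    np.arange(5, dtype=float),
    np.full(5, 2.0),
])
# The rank deficiency holds for any beta; zeros keep the arithmetic exact here.
beta = np.zeros(3)
H = logit_hessian(X, beta)

print("rank of X:", np.linalg.matrix_rank(X))  # 2 of 3 columns
print("rank of H:", np.linalg.matrix_rank(H))  # 2 of 3 parameters

try:
    np.linalg.inv(-H)  # same operation as the failing statsmodels frame
    raised = False
except np.linalg.LinAlgError as err:
    raised = True
    print("LinAlgError:", err)
```

Because every weight p(1 - p) is positive, the Hessian has the same rank as X, so any exact collinearity among the columns carries over to the matrix being inverted.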

josef...@gmail.com

Jun 3, 2019, 5:32:12 PM
to pystatsmodels
On Mon, Jun 3, 2019 at 2:19 PM Krista Mar <krist...@gmail.com> wrote:
> For some reason, only when I am trying to fit the model with y_train and X_train, am I getting the singular matrix error message. If I fit with X and y, I am not getting the error. Does anyone know what the cause of this might be?

Check the matrix rank of X_train.
Maybe when you split the data you lose full rank of X; for example, if one column has only identical values left in the training sample, then it becomes collinear with the constant.
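
A minimal sketch of the rank check suggested above, assuming pandas and NumPy. The helper diagnose_rank is hypothetical: it reports the matrix rank of a design matrix and lists the columns that take a single value. The toy frame is made up and includes an explicit intercept column const, like the one sm.add_constant would add (the posted code does not add one); the column rare differs only in the last row, so a split that leaves that row out makes it a multiple of the constant.

```python
import numpy as np
import pandas as pd


def diagnose_rank(X: pd.DataFrame) -> dict:
    """Report the matrix rank of X and the columns that take a single value."""
    values = X.to_numpy(dtype=float)
    rank = int(np.linalg.matrix_rank(values))
    constant_cols = [c for c in X.columns if X[c].nunique(dropna=False) <= 1]
    return {"n_columns": values.shape[1], "rank": rank,
            "constant_columns": constant_cols}


# Hypothetical toy data: "rare" differs from 2.0 only in the last row.
X_full = pd.DataFrame({
    "const": 1.0,
    "x1": [0.5, -1.2, 0.3, 2.0, -0.7, 1.1],
    "rare": [2.0, 2.0, 2.0, 2.0, 2.0, 5.0],
})
# Stand-in for a training split that happens to leave out the last row.
X_train = X_full.iloc[:5]

print(diagnose_rank(X_full))   # rank 3 of 3: full rank
print(diagnose_rank(X_train))  # rank 2 of 3: "rare" == 2 * "const"
```

With the posted code, comparing diagnose_rank(X_scaled_df) against diagnose_rank(X_train) would show whether the split is what removes full rank. Without an intercept column, a column that is constant in X_train only causes trouble if some other column is collinear with it, for example a second column that is also constant there.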

Josef

