Singular matrix error using MixedLM with categorical data


fortozs

Apr 9, 2015, 2:19:17 PM
to pystat...@googlegroups.com
I don't think it's related to using categorical data, but I am getting a singular matrix error when I try to fit my model. I am trying the following, with 'X3' being categorical (string):

data = data[['y', 'X1', 'X2', 'X3', 'X4', 'X5']].dropna()

model = sm.MixedLM.from_formula("np.log(y) ~ X1 + X2 + X3",
                                data,
                                re_formula='X4',
                                groups=data['X5'])

model.fit()


When I run this I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/statsmodels/regression/mixed_linear_model.py", line 1732, in fit
    pcov = np.linalg.inv(-hess)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 520, in inv
    ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 90, in _raise_linalgerror_singular
    raise LinAlgError("Singular matrix")
numpy.linalg.linalg.LinAlgError: Singular matrix



Does anyone know what I might be doing wrong? Thanks.

fortozs

Apr 10, 2015, 10:09:19 AM
to pystat...@googlegroups.com
Sorry about the terrible formatting of the original message. I have the model working now. I think I misunderstood the syntax compared to R. I was trying to do nested random effects of the form ~1 | X4 / X5, where X4 is groups and X5 is subgroups. I can get the model to run by using:

data = data[['y', 'X1', 'X2', 'X3', 'X4', 'X5']].dropna()

model = sm.MixedLM.from_formula("np.log(y) ~ X1 + X2 + X3",
                                data,
                                re_formula='1',  # not really necessary
                                groups=data['X4'] | data['X5'])

Is this the correct syntax for nested random effects? The above gives me the same AIC as setting groups to data.index, but the standard errors are different. Thanks.

fortozs

Apr 10, 2015, 11:11:40 AM
to pystat...@googlegroups.com
So the pipe character is apparently a Python operator, so that doesn't solve my problem. Does anyone know if you can have a nested random effects structure using statsmodels?
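
For what it's worth, here is a minimal sketch of why the pipe does not do what I wanted, assuming integer group codes in a pandas DataFrame (the toy values are made up for illustration). On integer Series, `|` is applied element-wise as a bitwise OR, so distinct (group, subgroup) pairs can collapse into the same label. Joining the two columns into a single string label gives each subgroup a unique identifier, but that only flattens the nesting; it does not fit a nested model.

```python
import pandas as pd

# Toy data (hypothetical values): X4 is the group, X5 the subgroup within it.
toy = pd.DataFrame({"X4": [1, 1, 2, 2], "X5": [1, 3, 1, 3]})

# On integer Series, `|` is an element-wise bitwise OR, not R's nesting operator.
piped = toy["X4"] | toy["X5"]
print(piped.tolist())  # [1, 3, 3, 3]: three different pairs share the label 3

# One unique label per (group, subgroup) pair, built by string concatenation.
nested_label = toy["X4"].astype(str) + "-" + toy["X5"].astype(str)
print(nested_label.tolist())  # ['1-1', '1-3', '2-1', '2-3']
```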

Nathaniel Smith

Apr 10, 2015, 11:13:51 AM
to pystatsmodels

Can you post what your data looks like, e.g. the first ten entries in X1, X2, etc?

fortozs

Apr 10, 2015, 11:55:07 AM
to pystat...@googlegroups.com
Sure, no problem. I'm calling my grouping variables group and subgroup for clarity.

           X1    X2    X3   group   subgroup
0    0.036021  30.0  32.9       1        1-1
1    0.034346  30.0  32.9       1        1-1
2    0.036859  30.0  32.9       1        1-1
3    0.054450  30.0  32.9       1        1-1
4    0.070366  30.0  32.9       1        1-1
5    0.066178  30.0  32.9       1        1-1
6    0.050262  30.0  32.9       1        1-1
7    0.061152  30.0  32.9       1        1-1
8    0.078743  30.0  32.9       1        1-1
9    0.038534  30.3  32.8       1        1-3
10        NaN  30.3  32.8       1        1-3
11   0.041047  30.3  32.8       1        1-3
12   0.036859  30.3  32.8       1        1-3
13   0.058639  30.3  32.8       1        1-3
14        NaN  30.3  32.8       1        1-3
15   0.074555  30.3  32.8       1        1-3
..
115  0.133333  30.7  32.9       6        6-4
116  0.135556  30.7  32.9       6        6-4
117  0.131111  30.7  32.9       6        6-4
118  0.146667  30.7  32.9       6        6-4
119  0.184444  30.7  32.9       6        6-4
120  0.168889  30.7  32.9       6        6-4
121  0.157778  30.7  32.9       6        6-4
122  0.173333  30.7  32.9       6        6-4
123  0.170000  30.7  32.9       6        6-4
124  0.172222  30.7  32.9       6        6-4
125  0.170000  30.7  32.9       6        6-4
126  0.207778  30.7  32.9       6        6-4
127  0.201111  30.7  32.9       6        6-4
128  0.217778  30.7  32.9       6        6-4
129  0.201111  30.7  32.9       6        6-4

Kerby Shedden

Apr 10, 2015, 11:06:42 PM
to pystat...@googlegroups.com
You can't do nested random effects in MixedLM.  Hopefully this will be addressed in the next few months.  You can do nested random effects in GEE, using the Nested covariance structure.

The closest you can get in MixedLM is to pass your 'group' variable as the 'groups' argument to MixedLM, and then in re_formula pass a matrix that contains indicators for each distinct value in your 'subgroup' variable.  Then use the 'free' argument to make the random effects independent.  The missing piece is that you cannot force the random effects (for the subgroups) to have the same variance parameter.
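
A rough sketch of the indicator-matrix step described above, assuming a pandas DataFrame with 'group' and 'subgroup' columns like the one posted earlier. The toy rows and the helper name subgroup_indicators are made up for illustration. It builds one 0/1 column per distinct subgroup, plus a diagonal 0/1 pattern marking which entries of the random-effects covariance would be left free (variances only, no covariances). Passing these on to MixedLM is not shown here.

```python
import numpy as np
import pandas as pd


def subgroup_indicators(df, subgroup_col="subgroup"):
    """Return a float 0/1 matrix with one column per distinct subgroup.

    Hypothetical helper for illustration; not part of statsmodels.
    """
    return pd.get_dummies(df[subgroup_col]).astype(float)


# Toy rows (made up) mimicking the layout of the posted data.
toy = pd.DataFrame({
    "group":    [1, 1, 1, 6, 6],
    "subgroup": ["1-1", "1-1", "1-3", "6-4", "6-4"],
})

Z = subgroup_indicators(toy)

# Diagonal pattern: 1 = estimate this variance, 0 = hold this covariance at zero.
free_pattern = np.eye(Z.shape[1])

print(Z.columns.tolist())
print(Z.sum(axis=1).tolist())
```

Each row of Z has exactly one nonzero entry, so every observation loads only on its own subgroup's random effect. This still leaves one separate variance parameter per subgroup, which is the limitation mentioned above.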

fortozs

Apr 11, 2015, 12:28:20 PM
to pystat...@googlegroups.com
Thanks for the information. I can use R until this is implemented, but I look forward to being able to do this in Python. It might be time for me to learn how to use RPy. That's very good info about how to force statsmodels to make the random effects independent. I appreciate everyone's help.