Singular matrix error using MixedLM with categorical data

fortozs

Apr 9, 2015, 2:19:17 PM

to pystat...@googlegroups.com

I don't think it's related to using categorical data, but I am getting a singular matrix error when I try to fit my model. I am trying the following, with 'X3' being categorical (string):

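import numpy as np
import statsmodels.api as sm

# data is a pandas DataFrame; drop rows with missing values first
data = data[['y', 'X1', 'X2', 'X3', 'X4', 'X5']].dropna()

model = sm.MixedLM.from_formula("np.log(y) ~ X1 + X2 + X3",
                                data,
                                re_formula='X4',
                                groups=data['X5'])

model.fit()

When I run this, I get: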

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/statsmodels/regression/mixed_linear_model.py", line 1732, in fit
    pcov = np.linalg.inv(-hess)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 520, in inv
    ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 90, in _raise_linalgerror_singular
    raise LinAlgError("Singular matrix")
numpy.linalg.linalg.LinAlgError: Singular matrix

Does anyone know what I might be doing wrong? Thanks.
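The traceback above fails at `np.linalg.inv(-hess)`, which means the negative Hessian at the fitted parameters could not be inverted. A minimal sketch of that failure mode, using a made-up 3x3 matrix as a stand-in for the Hessian (the matrix and the variable names are assumptions for illustration, not values from this model): a parameter the data cannot identify shows up as a direction with no curvature, and the matrix rank drops below its dimension.

```python
import numpy as np

# Hypothetical stand-in for the negative Hessian that fit() tries to invert.
# The third row and column are all zeros, so the matrix is exactly singular.
neg_hess = np.array([[4.0, 1.0, 0.0],
                     [1.0, 3.0, 0.0],
                     [0.0, 0.0, 0.0]])

# np.linalg.inv raises LinAlgError on an exactly singular matrix.
try:
    np.linalg.inv(neg_hess)
    inverted = True
except np.linalg.LinAlgError:
    inverted = False

# A rank below the matrix dimension confirms the singularity.
rank = np.linalg.matrix_rank(neg_hess)
print(inverted, rank, neg_hess.shape[0])
```

Checking the rank of the design matrices, or simplifying the random-effects structure, is one way to look for the parameter that is not identified.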

fortozs

Apr 10, 2015, 10:09:19 AM

to pystat...@googlegroups.com

Sorry about the terrible formatting of the original message. I have the model working now. I think I misunderstood the syntax compared to R. I was trying to do nested random effects of the form ~1 | X4 / X5, where X4 is groups and X5 is subgroups. I can get the model to run by using:

data = data[['y', 'X1', 'X2', 'X3', 'X4', 'X5']].dropna()

model = sm.MixedLM.from_formula("np.log(y) ~ X1 + X2 + X3",
                                data,
                                re_formula='1',  # not really necessary
                                groups=data['X4'] | data['X5'])

Is this the correct syntax for nested random effects? The above gives me the same AIC as setting groups to data.index, but the standard errors are different. Thanks.

fortozs

Apr 10, 2015, 11:11:40 AM

to pystat...@googlegroups.com

It turns out the pipe character is a Python operator (bitwise OR), so that doesn't solve my problem. Does anyone know if you can have a nested random effects structure using statsmodels?
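To illustrate the point about the pipe character, here is a small sketch with made-up integer codes (the arrays and names are hypothetical, not the poster's data). It shows that `|` combines the two arrays element by element with bitwise OR, and one way to build an explicit label for each (group, subgroup) pair instead.

```python
import numpy as np

# Hypothetical integer codes for illustration only.
group = np.array([1, 1, 2])
subgroup_id = np.array([1, 3, 4])

# In Python, `|` is element-wise bitwise OR, not R's nesting syntax.
ored = group | subgroup_id

# An explicit label per (group, subgroup) pair.
nested_labels = np.array([f"{g}-{s}" for g, s in zip(group, subgroup_id)])

print(ored.tolist())
print(nested_labels.tolist())
```

Passing such combined labels as `groups` would give a single level of grouping by subgroup, which is not the same as a nested structure with a variance component at each level.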

Nathaniel Smith

Apr 10, 2015, 11:13:51 AM

to pystatsmodels

Can you post what your data looks like, e.g. the first ten entries in X1, X2, etc?

fortozs

Apr 10, 2015, 11:55:07 AM

to pystat...@googlegroups.com

Sure, no problem. I'm calling my grouping variables group and subgroup for clarity.

     X1        X2    X3    group  subgroup
0    0.036021  30.0  32.9  1      1-1
1    0.034346  30.0  32.9  1      1-1
2    0.036859  30.0  32.9  1      1-1
3    0.054450  30.0  32.9  1      1-1
4    0.070366  30.0  32.9  1      1-1
5    0.066178  30.0  32.9  1      1-1
6    0.050262  30.0  32.9  1      1-1
7    0.061152  30.0  32.9  1      1-1
8    0.078743  30.0  32.9  1      1-1
9    0.038534  30.3  32.8  1      1-3
10   NaN       30.3  32.8  1      1-3
11   0.041047  30.3  32.8  1      1-3
12   0.036859  30.3  32.8  1      1-3
13   0.058639  30.3  32.8  1      1-3
14   NaN       30.3  32.8  1      1-3
15   0.074555  30.3  32.8  1      1-3
..
115  0.133333  30.7  32.9  6      6-4
116  0.135556  30.7  32.9  6      6-4
117  0.131111  30.7  32.9  6      6-4
118  0.146667  30.7  32.9  6      6-4
119  0.184444  30.7  32.9  6      6-4
120  0.168889  30.7  32.9  6      6-4
121  0.157778  30.7  32.9  6      6-4
122  0.173333  30.7  32.9  6      6-4
123  0.170000  30.7  32.9  6      6-4
124  0.172222  30.7  32.9  6      6-4
125  0.170000  30.7  32.9  6      6-4
126  0.207778  30.7  32.9  6      6-4
127  0.201111  30.7  32.9  6      6-4
128  0.217778  30.7  32.9  6      6-4
129  0.201111  30.7  32.9  6      6-4

Kerby Shedden

Apr 10, 2015, 11:06:42 PM

to pystat...@googlegroups.com

You can't do nested random effects in MixedLM. Hopefully this will be addressed in the next few months. You can do nested random effects in GEE, using the Nested covariance structure.

The closest you can get in MixedLM is to pass your 'group' variable as the 'groups' argument to MixedLM, and then in re_formula pass a matrix that contains indicators for each distinct value in your 'subgroup' variable. Then use the 'free' argument to make the random effects independent. The missing piece is that you cannot force the random effects (for the subgroups) to have the same variance parameter.
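The indicator matrix described above can be sketched with NumPy alone. This is only an illustration of the "one indicator column per distinct subgroup" step, using a few made-up labels in the style of the posted data. The variable names are hypothetical, and how the matrix is then handed to MixedLM (and how 'free' is specified) is left out because it depends on the statsmodels version.

```python
import numpy as np

# Hypothetical subgroup labels for illustration only.
subgroup = np.array(["1-1", "1-1", "1-3", "1-3", "1-3", "6-4"])

# levels: sorted distinct labels; codes: position of each row's label in levels.
levels, codes = np.unique(subgroup, return_inverse=True)

# One column per distinct subgroup, with a 1 in the column of the row's subgroup.
indicators = np.zeros((len(subgroup), len(levels)))
indicators[np.arange(len(subgroup)), codes] = 1.0

print(levels.tolist())
print(indicators)
```

Each row has exactly one nonzero entry, so each random effect applies only to observations in its own subgroup. More recent statsmodels releases document a `vc_formula` argument to `MixedLM.from_formula` for variance components, which may cover the nested case directly; check the documentation for your installed version.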

fortozs

Apr 11, 2015, 12:28:20 PM

to pystat...@googlegroups.com

Thanks for the information. I can use R until this is implemented, but I look forward to being able to do this in Python. It might be time for me to learn how to use RPy. That's very useful information about how to force statsmodels to make the random effects independent. I appreciate everyone's help.
