Possible bug with MNLogit from_formula when trying to return conf_int()


Dylan Contris

Jun 16, 2017, 1:26:08 PM
to pystatsmodels


The following code to retrieve confidence intervals for an MNLogit model works just fine... 
import statsmodels.api as sm

spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog)
logit_mod = sm.MNLogit(spector_data.endog, spector_data.exog)
logit_res = logit_mod.fit()

logit_res.conf_int()

Optimization terminated successfully.
         Current function value: 0.402801
         Iterations 7
Out[269]:
array([[[-22.68656471,  -3.356129  ],
        [  0.35079357,   5.30143162],
        [ -0.18228348,   0.37259881],
        [  0.29218006,   4.46519525]]])

However, if you rework this code to build the model from a formula, you get a ValueError. It's exactly the same model and data, yet it raises an error, so I assume it's a bug.

import pandas as pd
import statsmodels.api as sm

spector_data = sm.datasets.spector.load()
tmp = pd.DataFrame(spector_data.exog)
tmp['endog'] = spector_data.endog
tmp = tmp.rename(columns={0:'a',1:'b',2:'c'})
logit_mod = sm.MNLogit.from_formula('endog ~ a+b+c', tmp)
logit_res = logit_mod.fit()

logit_res.conf_int()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-271-611c3c50878f> in <module>()
      8 logit_res = logit_mod.fit()
      9 
---> 10 logit_res.conf_int()

~/anaconda/lib/python3.6/site-packages/statsmodels/base/wrapper.py in wrapper(self, *args, **kwargs)
     93             obj = data.wrap_output(func(results, *args, **kwargs), how[0], how[1:])
     94         elif how:
---> 95             obj = data.wrap_output(func(results, *args, **kwargs), how)
     96         return obj
     97 

~/anaconda/lib/python3.6/site-packages/statsmodels/base/data.py in wrap_output(self, obj, how, names)
    405     def wrap_output(self, obj, how='columns', names=None):
    406         if how == 'columns':
--> 407             return self.attach_columns(obj)
    408         elif how == 'rows':
    409             return self.attach_rows(obj)

~/anaconda/lib/python3.6/site-packages/statsmodels/base/data.py in attach_columns(self, result)
    522             return Series(result, index=self.param_names)
    523         else:  # for e.g., confidence intervals
--> 524             return DataFrame(result, index=self.param_names)
    525 
    526     def attach_columns_eq(self, result):

~/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    304             else:
    305                 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 306                                          copy=copy)
    307         elif isinstance(data, (list, types.GeneratorType)):
    308             if isinstance(data, types.GeneratorType):

~/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in _init_ndarray(self, values, index, columns, dtype, copy)
    461         # by definition an array here
    462         # the dtypes will be coerced to a single dtype
--> 463         values = _prep_ndarray(values, copy=copy)
    464 
    465         if dtype is not None:

~/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in _prep_ndarray(values, copy)
   5686     return arrays, arr_columns
   5687 
-> 5688 
   5689 def _list_to_arrays(data, columns, coerce_float=False, dtype=None):
   5690     if len(data) > 0 and isinstance(data[0], tuple):

ValueError: Must pass 2-d input







josef...@gmail.com

Jun 16, 2017, 1:35:33 PM
to pystatsmodels
Definitely a bug; I get this too. Thanks for reporting. Can you open
an issue on GitHub?
It's the use of pandas, not the use of formulas, that causes the error.

The internal NumPy array version of conf_int (in `._results`) is fine,
which is also why summary() works,
but the pandas wrapper cannot handle 3-D output: there is one dimension
too many in the conf_int result.

>>> logit_res._results.conf_int()
array([[[-22.68656471,  -3.356129  ],
        [  0.35079357,   5.30143162],
        [ -0.18228348,   0.37259881],
        [  0.29218006,   4.46519525]]])

The example is only binomial, so there might still be issues with the
multinomial case with 3 or more levels.
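To see why the wrapper fails, here is a minimal sketch that uses only NumPy and pandas plus the interval values printed above; it does not call statsmodels and is not the fix adopted there. Passing the 3-D array straight to `pd.DataFrame` with a parameter index, as the wrapper's `attach_columns` does in the traceback, raises the same `ValueError`. Folding the equation axis into a `MultiIndex` yields a 2-D table instead. The `param_names` list (the names patsy would give `endog ~ a+b+c`) and the integer `equation` labels are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# 3-D array as returned by logit_res._results.conf_int() in this thread:
# shape (n_equations, n_params, 2) == (1, 4, 2) for a binary endog.
ci = np.array([[[-22.68656471, -3.356129],
                [0.35079357, 5.30143162],
                [-0.18228348, 0.37259881],
                [0.29218006, 4.46519525]]])
param_names = ["Intercept", "a", "b", "c"]  # assumed formula-generated names

# Mimic the wrapper: a DataFrame cannot be built from a 3-D ndarray.
try:
    pd.DataFrame(ci, index=param_names)
except ValueError as exc:
    print("wrapper-style call fails:", exc)

# Workaround sketch: fold the equation axis into the row index.
n_eq, n_params, _ = ci.shape
index = pd.MultiIndex.from_product(
    [range(1, n_eq + 1), param_names],  # hypothetical equation labels 1..n_eq
    names=["equation", "param"],
)
table = pd.DataFrame(
    ci.reshape(n_eq * n_params, 2), index=index, columns=["lower", "upper"]
)
print(table)
```

With 3 or more outcome levels the first axis would have more than one entry; the same reshape would apply, assuming the (equations, params, 2) layout holds there as well.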

Josef

josef...@gmail.com

Jun 16, 2017, 1:42:44 PM
to pystatsmodels
Actually, this is most likely related to a group of bugs: MNLogit uses
a multivariate representation, i.e. params is 2-D, while the generic,
inherited methods are designed for 1-D params.
So we need to special-case MNLogit results in many places, which hasn't
been done everywhere yet.
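A hypothetical generic helper illustrates the 1-D vs 2-D params point. `wald_conf_int` below is not statsmodels code; it is a sketch of a Wald interval (params ± z·bse) written with 1-D params in mind. Fed 2-D, MNLogit-style params (assumed layout: one row per regressor, one column per non-reference outcome, made-up numbers), the same code silently returns a 3-D array, which is the kind of output a 1-D-oriented wrapper cannot label.

```python
import numpy as np

Z_975 = 1.959963984540054  # standard normal 0.975 quantile (95% two-sided)


def wald_conf_int(params, bse, z=Z_975):
    """Hypothetical generic helper: stack lower/upper bounds on a new last axis."""
    params = np.asarray(params, dtype=float)
    bse = np.asarray(bse, dtype=float)
    return np.stack([params - z * bse, params + z * bse], axis=-1)


# 1-D params (e.g. a binary Logit): 2-D result, one row per parameter.
ci_1d = wald_conf_int([1.0, 2.0, 3.0], [0.1, 0.2, 0.3])
print(ci_1d.shape)  # (3, 2)

# 2-D params (MNLogit-style, 3 regressors x 2 equations): 3-D result.
params_2d = np.ones((3, 2))
bse_2d = np.full((3, 2), 0.5)
ci_2d = wald_conf_int(params_2d, bse_2d)
print(ci_2d.shape)  # (3, 2, 2)
```

Nothing in the helper fails on 2-D input; the extra dimension only surfaces later, in code that assumes one row per parameter.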

But it's the first time I've seen this case with the pandas wrapper, AFAIR.

Josef



Brock Mendel

Jun 16, 2017, 8:34:01 PM
to pystatsmodels
See #3651

josef...@gmail.com

Jun 16, 2017, 9:08:46 PM
to pystatsmodels
On Fri, Jun 16, 2017 at 8:34 PM, Brock Mendel <jbrock...@gmail.com> wrote:
> See #3651

Thanks. That was just a bit more than a month ago, and my memory is already vague.
(AFAIRGMLSTM Given My Limited Short Term Memory)

I think we should reserve a week in late summer or in fall to fix the
multivariate case properly.

(Right now I'm under too much pre-vacation stress. Imagine going
without a notebook and without the right 100 PDF files.) :(

Josef