Remove fixed effects from summary_col?

Brian Keegan

Apr 24, 2014, 5:30:04 PM
to pystat...@googlegroups.com
I'm estimating some simple OLS models that have dozens or hundreds of fixed effects terms, but I want to omit these estimates from the summary_col output. Looking under the hood, it appears that the Summary object is just a DataFrame, which means it should be possible to do some index slicing here to return the appropriate rows, but the Summary objects don't support the basic DataFrame attributes and methods.

More formally: 

import pandas as pd
import numpy as np
import string
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

df = pd.DataFrame({'A' : list(string.ascii_uppercase)*10,
                   'B' : list(string.ascii_lowercase)*10,
                   'C' : np.random.randn(260),
                   'D' : np.random.normal(size=260),
                   'E' : np.random.randint(0, 11, 260)})  # integers 0..10 inclusive; random_integers is deprecated

m1 = smf.ols('E ~ D',data=df).fit()
m2 = smf.ols('E ~ D + C',data=df).fit()
m3 = smf.ols('E ~ D + C + B',data=df).fit()
m4 = smf.ols('E ~ D + C + B + A',data=df).fit()

print(summary_col([m1,m2,m3,m4]))

This returns a Summary object that has 55 rows (52 for the two fixed effects + the intercept + exogenous D and E terms). I would like a summary object that excludes the 52 fixed effects estimates and only includes the estimates for D, E, and the intercept for all four models. What's the best way to remove fixed effects from the summary_col? Alternatively, how can I create a Summary object that only includes specific regressors and excludes the rest?

Thanks!

Brian Keegan

Apr 24, 2014, 5:39:45 PM
to pystat...@googlegroups.com
I screwed up those variable names in the final paragraph -- it should read: 

This returns a Summary object that has 55 rows (52 for the two fixed effects + the intercept + exogenous C and D terms). I would like a summary object that excludes the 52 fixed effects estimates and only includes the estimates for C, D, and the intercept for all four models. What's the best way to remove fixed effects from the summary_col? Alternatively, how can I create a Summary object that only includes specific regressors and excludes the rest?

josef...@gmail.com

Apr 24, 2014, 5:59:17 PM
to pystatsmodels
Great question (actually, I misread it initially).

This calls for more options for summary tables. Would you please open an issue?

Playing with a bit of Python introspection:

>>> tble = summary_col([m1,m2,m3,m4])
>>> tble.tables[0].index
Index([u'A[T.B]', u'', u'A[T.C]', u'', u'A[T.D]', u'', u'A[T.E]', u'', u'A[T.F]', u'', u'A[T.G]', u'', u'A[T.H]', u'', u'A[T.I]', u'', u'A[T.J]', u'', u'A[T.K]', u'', u'A[T.L]', u'', u'A[T.M]', u'', u'A[T.N]', u'', u'A[T.O]', u'', u'A[T.P]', u'', u'A[T.Q]', u'', u'A[T.R]', u'', u'A[T.S]', u'', u'A[T.T]', u'', u'A[T.U]', u'', u'A[T.V]', u'', u'A[T.W]', u'', u'A[T.X]', u'', u'A[T.Y]', u'', u'A[T.Z]', u'', u'B[T.b]', u'', u'B[T.c]', u'', u'B[T.d]', u'', u'B[T.e]', u'', u'B[T.f]', u'', u'B[T.g]', u'', u'B[T.h]', u'', u'B[T.i]', u'', u'B[T.j]', u'', u'B[T.k]', u'', u'B[T.l]', u'', u'B[T.m]', u'', u'B[T.n]', u'', u'B[T.o]', u'', u'B[T.p]', u'', u'B[T.q]', u'', u'B[T.r]', u'', u'B[T.s]', u'', u'B[T.t]', u'', u'B[T.u]', u'', u'B[T.v]', u'', u'B[T.w]', u'', u'B[T.x]', u'', u'B[T.y]', u'', u'B[T.z]', u'', u'C', u'', u'D', u'', u'Intercept', u''], dtype=object)
>>> tble.tables[0] = tble.tables[0][-6:]
>>> print tble

=============================================
            E I      E II    E III    E IIII
---------------------------------------------
C                  -0.1977  -0.1903  -0.1903
                   (0.1864) (0.1961) (0.1961)
D         0.1967   0.1741   0.2714   0.2714  
          (0.2021) (0.2031) (0.2180) (0.2180)
Intercept 4.9938   4.9907   5.0803   5.0803  
          (0.1980) (0.1980) (1.0168) (1.0168)
=============================================
Standard errors in parentheses.


This seems to work, so you should be able to slice or index the underlying DataFrame.
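
The `[-6:]` slice above depends on where the rows happen to sit. A minimal sketch of picking the rows by regressor name instead, assuming the index layout shown above (each coefficient row carries the regressor name and the row after it, labelled '', holds its standard error). `rows_to_keep` is a hypothetical helper for illustration, not part of statsmodels:

```python
def rows_to_keep(labels, wanted):
    """Return the positions of the rows that belong to `wanted` regressors.

    Assumes each coefficient row is labelled with the regressor name and
    is followed by a row labelled '' that holds its standard error.
    """
    wanted = set(wanted)
    keep = []
    current = None  # regressor that the row being visited belongs to
    for pos, label in enumerate(labels):
        if label != '':
            current = label  # a named row starts a new regressor
        if current in wanted:
            keep.append(pos)
    return keep


# Toy index in the same shape as tble.tables[0].index
labels = ['A[T.B]', '', 'B[T.b]', '', 'C', '', 'D', '', 'Intercept', '']
rows = rows_to_keep(labels, ['C', 'D', 'Intercept'])
print(rows)  # [4, 5, 6, 7, 8, 9]
```

With the table from the session above, the positions could then be applied with something like `tble.tables[0] = tble.tables[0].iloc[rows_to_keep(list(tble.tables[0].index), ['C', 'D', 'Intercept'])]`. That last step is not exercised here and assumes the index layout is the same in your statsmodels version.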


My initial thought was changing summary() or summary2(), which would also be nice if we want to add additional effects.

Josef

josef...@gmail.com

Apr 24, 2014, 6:14:23 PM
to pystatsmodels
I'm looking at this for the first time; it's Vincent's work, with the help of Jan Schulz.

All the conversion/rendering is done at each call of `as_xxx`, so changing the underlying DataFrame should be pretty safe (as long as you don't make up numbers).

>>> tble.tables[0].info()
<class 'pandas.core.frame.DataFrame'>
Index: 6 entries, C to 
Data columns (total 4 columns):
E I       6  non-null values
E II      6  non-null values
E III     6  non-null values
E IIII    6  non-null values
dtypes: object(4)


Josef