Remove fixed effects from summary_col?

Brian Keegan

Apr 24, 2014, 5:30:04 PM

to pystat...@googlegroups.com

I'm estimating some simple OLS models that have dozens or hundreds of fixed effects terms, but I want to omit these estimates from the summary_col. Looking under the hood, it appears that the Summary object is just a DataFrame which means it should be possible to do some index slicing here to return the appropriate rows, but the Summary objects don't support the basic DataFrame attributes and methods.

More formally:

import string

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

# 260 rows: two 26-level categorical fixed effects (A, B) plus numeric columns
df = pd.DataFrame({'A': list(string.ascii_uppercase) * 10,
                   'B': list(string.ascii_lowercase) * 10,
                   'C': np.random.randn(260),
                   'D': np.random.normal(size=260),
                   # integers from 0 to 10 inclusive
                   'E': np.random.randint(0, 11, 260)})

# Nested models: each one adds a regressor or a fixed effect
m1 = smf.ols('E ~ D', data=df).fit()
m2 = smf.ols('E ~ D + C', data=df).fit()
m3 = smf.ols('E ~ D + C + B', data=df).fit()
m4 = smf.ols('E ~ D + C + B + A', data=df).fit()

print(summary_col([m1, m2, m3, m4]))

This returns a Summary object that has 55 rows (52 for the two fixed effects + the intercept + exogenous D and E terms). I would like a summary object that excludes the 52 fixed effects estimates and only includes the estimates for D, E, and the intercept for all four models. What's the best way to remove fixed effects from the summary_col? Alternatively, how can I create a Summary object that only includes specific regressors and excludes the rest?

Thanks!

Brian Keegan

Apr 24, 2014, 5:39:45 PM

to pystat...@googlegroups.com

I screwed up those variable names in the final paragraph -- it should read:

This returns a Summary object that has 55 rows (52 for the two fixed effects + the intercept + exogenous C and D terms). I would like a summary object that excludes the 52 fixed effects estimates and only includes the estimates for C, D, and the intercept for all four models. What's the best way to remove fixed effects from the summary_col? Alternatively, how can I create a Summary object that only includes specific regressors and excludes the rest?

josef...@gmail.com

Apr 24, 2014, 5:59:17 PM

to pystatsmodels

Great question (actually, I misread it initially).

This calls for more options for the summary tables.
Would you please open an issue?

Playing with a bit of python introspection

>>> tble = summary_col([m1,m2,m3,m4])
>>> tble.tables[0].index
Index([u'A[T.B]', u'', u'A[T.C]', u'', u'A[T.D]', u'', u'A[T.E]', u'', u'A[T.F]', u'', u'A[T.G]', u'', u'A[T.H]', u'', u'A[T.I]', u'', u'A[T.J]', u'', u'A[T.K]', u'', u'A[T.L]', u'', u'A[T.M]', u'', u'A[T.N]', u'', u'A[T.O]', u'', u'A[T.P]', u'', u'A[T.Q]', u'', u'A[T.R]', u'', u'A[T.S]', u'', u'A[T.T]', u'', u'A[T.U]', u'', u'A[T.V]', u'', u'A[T.W]', u'', u'A[T.X]', u'', u'A[T.Y]', u'', u'A[T.Z]', u'', u'B[T.b]', u'', u'B[T.c]', u'', u'B[T.d]', u'', u'B[T.e]', u'', u'B[T.f]', u'', u'B[T.g]', u'', u'B[T.h]', u'', u'B[T.i]', u'', u'B[T.j]', u'', u'B[T.k]', u'', u'B[T.l]', u'', u'B[T.m]', u'', u'B[T.n]', u'', u'B[T.o]', u'', u'B[T.p]', u'', u'B[T.q]', u'', u'B[T.r]', u'', u'B[T.s]', u'', u'B[T.t]', u'', u'B[T.u]', u'', u'B[T.v]', u'', u'B[T.w]', u'', u'B[T.x]', u'', u'B[T.y]', u'', u'B[T.z]', u'', u'C', u'', u'D', u'', u'Intercept', u''], dtype=object)
>>> tble.tables[0] = tble.tables[0][-6:]
>>> print tble

=============================================
          E I      E II     E III    E IIII
---------------------------------------------
C                  -0.1977  -0.1903  -0.1903
                   (0.1864) (0.1961) (0.1961)
D         0.1967   0.1741   0.2714   0.2714
          (0.2021) (0.2031) (0.2180) (0.2180)
Intercept 4.9938   4.9907   5.0803   5.0803
          (0.1980) (0.1980) (1.0168) (1.0168)
=============================================
Standard errors in parentheses.

This seems to work.

So you should be able to slice or index the underlying DataFrame in tble.tables[0].
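The positional slice `[-6:]` only works because C, D and Intercept happen to sort last. A minimal sketch of selecting rows by name instead is below. `rows_to_keep` is a hypothetical helper, not part of statsmodels, and it assumes the layout shown in the index above: every coefficient row is immediately followed by one empty-labelled row holding its standard error.

```python
# Sketch only: `rows_to_keep` is a hypothetical helper, not a statsmodels API.
# Assumption: each coefficient row is followed by exactly one row with an
# empty label ('') that holds the standard error, as in the index shown above.

def rows_to_keep(labels, wanted):
    """Return positions of the wanted coefficient rows and their SE rows."""
    keep = []
    for pos, label in enumerate(labels):
        if label in wanted:
            keep.append(pos)
            # The standard-error row directly below has an empty label.
            if pos + 1 < len(labels) and labels[pos + 1] == '':
                keep.append(pos + 1)
    return keep


# Usage against the table from the session above (relies on pandas .iloc):
#   table = tble.tables[0]
#   keep = rows_to_keep(list(table.index), {'C', 'D', 'Intercept'})
#   tble.tables[0] = table.iloc[keep]
```

Any other rows a given statsmodels version appends to the table would be dropped too unless their labels are added to `wanted`.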

My initial thought was to change summary() or summary2(), which would also be nice if we want to add additional effects.

Josef

josef...@gmail.com

Apr 24, 2014, 6:14:23 PM

to pystatsmodels

I'm looking at this code for the first time; it's Vincent's work, with the help of Jan Schulz.

All the conversion/rendering is done at each call of `as_xxx`, so changing the underlying DataFrame should be pretty safe (as long as you don't make up numbers).

>>> tble.tables[0].info()
Index: 6 entries, C to
Data columns (total 4 columns):
E I       6 non-null values
E II      6 non-null values
E III     6 non-null values
E IIII    6 non-null values
dtypes: object(4)

Josef
