OLS w/ Fixed effects

Damien Moore

Jun 25, 2017, 1:41:04 PM

to pystatsmodels

I've noticed a lot of discussions about panel data stuff here and elsewhere, but wasn't able to figure out if there is a currently implemented basic fixed effects regression anywhere. I just want to run a completely trivial FE OLS regression on several hundred groups of time series observations in an unbalanced panel. Is the best current way just to manually add the fixed effects dummies to the exogenous variable set, or is there something that will automate the generation of the dummies and report "correct" statistics?

josef...@gmail.com

Jun 25, 2017, 1:59:38 PM

to pystatsmodels

If you don't have memory problems, then the easiest way is to create the fixed effects dummies through the formula interface (`from_formula`, which uses patsy). This automatically does the right thing: it drops one reference level, and it works the same way for multiple fixed effects.

I.e. fixed effects are just categorical variables given by group names, group indicators or time indices.
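A minimal sketch of the formula approach described above, assuming a hypothetical data frame with columns `y`, `x` and `group` (all names and the simulated data are made up for illustration). `C(group)` asks patsy to treat the column as categorical and expand it into dummies:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical unbalanced panel: 5 groups with different numbers of observations.
sizes = [4, 6, 8, 5, 7]
group = np.repeat(["a", "b", "c", "d", "e"], sizes)
alpha = np.repeat(rng.normal(size=5), sizes)   # group-specific intercepts
x = rng.normal(size=group.size) + alpha        # regressor correlated with the effects
y = 1.5 * x + alpha + 0.1 * rng.normal(size=group.size)
df = pd.DataFrame({"y": y, "x": x, "group": group})

# C(group) expands to dummy columns. Because the formula already contains an
# intercept, one reference level is dropped automatically.
res = smf.ols("y ~ x + C(group)", data=df).fit()
print(res.params)
```

A second fixed effect would be added the same way, e.g. `y ~ x + C(group) + C(year)` for a hypothetical `year` column. The dummy matrix is dense, so this is only practical while it fits in memory.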

I have three things waiting in PR:

- General sparse regression, independent of any interpretation as FE. A question with an example involving a large number of fixed effects dummies was the reason for writing a recipe for that.

and two versions for absorbing FE or general categorical variables:

- Absorbing outside the model: the projection is done before fitting, and the main part we need is a `ddof` argument to take the reduced degrees of freedom into account.

- Absorbing inside the model: this has two methods at the prototype stage. The first uses sparse regression to project out the FE, and the second uses iterative demeaning (the latter doesn't work for the general case yet).
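To illustrate the sparse-regression idea in the list above, here is a rough sketch using `scipy.sparse` directly. It is not the code from the pending PRs; the data, the variable names and the choice of `lsqr` as solver are assumptions made for the example:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(1)
n_groups, n_obs = 200, 3000
codes = rng.integers(0, n_groups, size=n_obs)   # integer group label per observation
alpha = rng.normal(size=n_groups)
x = rng.normal(size=n_obs) + alpha[codes]
y = 2.0 * x + alpha[codes] + 0.1 * rng.normal(size=n_obs)

# Sparse dummy matrix: exactly one nonzero per row, so storage grows with
# n_obs rather than with n_obs * n_groups.
dummies = sparse.csr_matrix(
    (np.ones(n_obs), (np.arange(n_obs), codes)), shape=(n_obs, n_groups)
)
# No separate intercept, so the full set of group dummies can be kept.
design = sparse.hstack([sparse.csr_matrix(x[:, None]), dummies]).tocsr()

# Iterative sparse least squares; the first coefficient belongs to x.
solution = lsqr(design, y, atol=1e-12, btol=1e-12)
beta_x = solution[0][0]
print(beta_x)
```

This only recovers point estimates. Standard errors and the other regression statistics would still need the degrees-of-freedom handling discussed in the list.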

A few of the extra results differ by method, e.g. the null model for rsquared or fvalue is the absorbed-fixed-effects-only model and not a constant-only model, and maybe a few other results that I don't remember right now.

What is still mostly missing is recovering the properties of the absorbed FE, i.e. it's currently assumed that they are pure nuisance parameters for which we don't need estimates or inference.

(I don't remember how it affects the interpretation or assumptions for robust sandwich standard errors.)

Note: In the meantime, Kevin started a separate linear models package that has some extra, panel-specific features.

Josef

josef...@gmail.com

Jun 25, 2017, 2:23:37 PM

to pystatsmodels

To add a bit of motivation (which I just remembered):

If there is a single categorical variable/FE, or if the panel is balanced with two-way crossed effects (e.g. individuals and time), then simple groupby demeaning by one or two groups absorbs the fixed effects correctly. This is easy to do with pandas. So all we need in that case is `ddof` in order to use OLS directly.
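A minimal sketch of the single-FE case, assuming hypothetical columns `y`, `x` and a group column `g`. Since the `ddof` option mentioned here is not part of this example, the degrees-of-freedom correction is done by hand:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
sizes = rng.integers(3, 10, size=50)             # unbalanced: 3 to 9 obs per group
g = np.repeat(np.arange(50), sizes)
alpha = rng.normal(size=50)[g]
x = rng.normal(size=g.size) + alpha
y = 0.5 * x + alpha + rng.normal(size=g.size)
df = pd.DataFrame({"y": y, "x": x, "g": g})

# Within transformation: subtract the group mean from each column.
dm = df[["y", "x"]] - df.groupby("g")[["y", "x"]].transform("mean")

# OLS on the demeaned data (one regressor, no constant needed after demeaning).
beta = (dm["x"] @ dm["y"]) / (dm["x"] @ dm["x"])
resid = dm["y"] - beta * dm["x"]

n, k, n_groups = len(df), 1, df["g"].nunique()
df_naive = n - k                  # what OLS on demeaned data would assume
df_correct = n - k - n_groups     # also counts the absorbed group means
se_naive = np.sqrt(resid @ resid / df_naive / (dm["x"] @ dm["x"]))
se_correct = se_naive * np.sqrt(df_naive / df_correct)
print(beta, se_naive, se_correct)
```

The slope is identical to the one from the dummy-variable regression. Only the residual degrees of freedom, and therefore the standard errors, need adjusting.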

With two-way or multi-way FE without balanced panel data, the FE are not orthogonal to each other and we need to use an iterative solver to project out the FE/categorical variables, and this is more conveniently supported within a model. This iterative solution never needs to construct a dense dummy matrix and requires only the amount of space corresponding to the group label/indicator arrays (1-d per grouping/FE variable).
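One way to picture the iterative approach is alternating demeaning over the two label arrays. The sketch below is an illustration under that assumption, not the prototype from the PR; the function name `demean_two_way`, the tolerance and the simulated panel are all made up:

```python
import numpy as np

def demean_two_way(v, g1, g2, tol=1e-10, max_iter=1000):
    """Alternately subtract group means by g1 and g2 until updates are negligible."""
    v = np.asarray(v, dtype=float).copy()
    # Recode labels to contiguous integers 0..n_levels-1.
    _, g1 = np.unique(g1, return_inverse=True)
    _, g2 = np.unique(g2, return_inverse=True)
    n1, n2 = np.bincount(g1), np.bincount(g2)
    for _ in range(max_iter):
        prev = v.copy()
        v -= (np.bincount(g1, weights=v) / n1)[g1]
        v -= (np.bincount(g2, weights=v) / n2)[g2]
        if np.max(np.abs(v - prev)) < tol:
            break
    return v

# Hypothetical unbalanced panel: 30 entities x 10 periods with ~30% of cells dropped.
rng = np.random.default_rng(3)
ent, time = np.meshgrid(np.arange(30), np.arange(10), indexing="ij")
keep = rng.random(ent.size) < 0.7
ent, time = ent.ravel()[keep], time.ravel()[keep]
a, t = rng.normal(size=30), rng.normal(size=10)
x = rng.normal(size=ent.size) + a[ent] + t[time]
y = 1.0 * x + a[ent] + t[time] + 0.1 * rng.normal(size=ent.size)

x_dm = demean_two_way(x, ent, time)
y_dm = demean_two_way(y, ent, time)
beta = (x_dm @ y_dm) / (x_dm @ x_dm)
print(beta)
```

Only the two 1-d label arrays and a few vectors of group sums are kept in memory, which matches the space argument made above. Convergence speed depends on how unbalanced the panel is.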

Josef

Kevin Sheppard

Jun 25, 2017, 9:23:57 PM

to pystatsmodels

See the package linearmodels, pip installable using

pip install linearmodels

It contains a fully tested two-way FE model that can easily handle hundreds of thousands of entities.

Git at

https://github.com/bashtage/linearmodels

Damien Moore

Jun 26, 2017, 10:29:56 AM

to pystatsmodels

Thanks, I'll give it a spin.

One minor gripe (which is more about the state of econometrics than the library per se) is that there are lots of things called something similar to "linear models" making it hard for a new user to know where to find things: For example, from the statsmodels start page:

josef...@gmail.com

Jun 26, 2017, 10:52:15 AM

to pystatsmodels

On Mon, Jun 26, 2017 at 10:29 AM, Damien Moore <damien...@gmail.com> wrote:

Thanks, I'll give it a spin.

One minor gripe (which is more about the state of econometrics than the library per se) is that there are lots of things called something similar to "linear models" making it hard for a new user to know where to find things: For example, from the statsmodels start page:

  Linear Regression
  Generalized Linear Models
  Generalized Estimating Equations
  Robust Linear Models
  Linear Mixed Effects Models

It's worse because it is "statistics and econometrics": GLM and GEE are not standard fare for econometricians, and linear mixed effects models only partially. RLM and other outlier-robust models are also mostly not standard textbook material.

Our time series models are also almost all linear/Gaussian models.

(As a consequence, the OLS/normal linear model is a special case of many other models.)

Mostly we assume users are familiar with some models by name from their textbooks and find them that way in statsmodels. Every once in a while, I try to start an overview of models by topic, but we don't have many contributors writing good overview notebooks or examples.

https://github.com/statsmodels/statsmodels/issues/2642

https://github.com/statsmodels/statsmodels/issues/3693

https://github.com/statsmodels/statsmodels/issues/3201 (essentially empty)

Plus, many GLM cases have a discrete "econometrics" equivalent (Logit, Probit, Poisson, ...).

(As somebody who didn't grow up with R, I still find different model names easier to understand than putting everything in some cryptic formula in R's `lm`.)

Josef
