Absorbing fixed effects


stoffprof

Aug 1, 2015, 1:07:49 AM
to pystatsmodels
Unless I'm wrong (which happens quite a bit), the absorption technique for estimating models with fixed effects that is available in Stata and SAS is not currently part of statsmodels. Does anyone know if there has been any discussion about adding this? It is quite useful when one wants to include many fixed effects but is not interested in their individual estimates.

The Stata documentation on this technique ("areg") is here. Discussion of the SAS implementation is here.
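For readers unfamiliar with the technique, here is a minimal numpy-only sketch (not areg's or SAS's actual implementation) of what "absorbing" a fixed effect means: by the Frisch-Waugh-Lovell theorem, regressing on the full set of group dummies gives the same slope estimate as demeaning y and x within each group first, without ever estimating the individual group effects.

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, n_per = 50, 20
g = np.repeat(np.arange(n_groups), n_per)        # group labels
x = rng.normal(size=g.size)
alpha = rng.normal(size=n_groups)                # group fixed effects
y = 2.0 * x + alpha[g] + rng.normal(size=g.size)

# (1) Full dummy-variable regression on [x, group dummies]
D = (g[:, None] == np.arange(n_groups)).astype(float)
beta_full = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0][0]

# (2) Within ("absorbed") regression: subtract group means, then OLS
def demean(v, g):
    means = np.bincount(g, weights=v) / np.bincount(g)
    return v - means[g]

xd, yd = demean(x, g), demean(y, g)
beta_within = (xd @ yd) / (xd @ xd)

assert np.allclose(beta_full, beta_within)  # identical slope estimates
```

The payoff is that (2) never builds the n-by-G dummy matrix, which is what makes absorption practical with many fixed effects.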

josef...@gmail.com

Aug 1, 2015, 1:49:11 AM
to pystatsmodels
On Fri, Jul 31, 2015 at 9:59 PM, stoffprof <stof...@gmail.com> wrote:
Unless I'm wrong (which happens quite a bit), the absorption technique for estimating models with fixed effects that is available in Stata and SAS is not currently part of statsmodels. Does anyone know if there has been any discussion about adding this? It is quite useful when one wants to include many fixed effects but is not interested in their individual estimates.

The Stata documentation on this technique ("areg") is here. Discussion of the SAS implementation is here.


Please open an issue on github.  I'm on the road today but can look at it in the next week.

We don't currently have a model for it, but it has been mentioned in a few comments.

Roughly based on what I remember:

We have helper functions for groupwise or panelwise demeaning (both numpy and pandas versions; there was a discussion several years ago about which is faster and more convenient).
The main issue for using OLS with the demeaned data is that we need to adjust the degrees of freedom for the inference to take the number of group mean parameters into account.
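A small numpy sketch of the degrees-of-freedom issue described above (the constant `df_correct` convention is an assumption based on areg's documented behavior, not existing statsmodels code): naive OLS on demeaned data divides the residual sum of squares by n - k, but the G group means were also estimated, so inference should use n - k - G.

```python
import numpy as np

rng = np.random.default_rng(1)
n_groups, n_per = 30, 10
g = np.repeat(np.arange(n_groups), n_per)
n = g.size
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n_groups)[g] + rng.normal(size=n)

def demean(v):
    # subtract each observation's group mean
    return v - (np.bincount(g, weights=v) / np.bincount(g))[g]

xd, yd = demean(x), demean(y)
beta = (xd @ yd) / (xd @ xd)
resid = yd - beta * xd

k = 1                              # slope parameters
df_naive = n - k                   # what plain OLS on demeaned data would use
df_correct = n - k - n_groups      # accounts for the G absorbed group means
se_naive = np.sqrt(resid @ resid / df_naive / (xd @ xd))
se_correct = np.sqrt(resid @ resid / df_correct / (xd @ xd))

assert se_correct > se_naive       # the correction widens the standard error
```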

The calculations might also be available in the Panel PR, as the `within` estimator, IIRC, but it might not be a full model.

We also have an open issue to just remove a large number of fixed effects from the summary2 tables.

cluster and panel robust standard errors should work without problems except that the degrees of freedom need to be verified.
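As an illustration of the cluster-robust piece, here is an assumed-form, one-regressor sketch of a sandwich variance clustered on the absorbing group; the G/(G-1) * (n-1)/(n-k) small-sample scaling is a common convention, not necessarily what statsmodels uses in every model.

```python
import numpy as np

rng = np.random.default_rng(2)
n_groups, n_per = 40, 15
g = np.repeat(np.arange(n_groups), n_per)
n = g.size
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n_groups)[g] + rng.normal(size=n)

def demean(v):
    return v - (np.bincount(g, weights=v) / np.bincount(g))[g]

xd, yd = demean(x), demean(y)
beta = (xd @ yd) / (xd @ xd)
u = yd - beta * xd                           # within-regression residuals

# "meat": sum over clusters of the squared within-cluster score sums
scores = np.bincount(g, weights=xd * u)
meat = (scores ** 2).sum()
bread = 1.0 / (xd @ xd)

k = 1
adj = n_groups / (n_groups - 1) * (n - 1) / (n - k)   # small-sample scaling
se_cluster = np.sqrt(adj * bread * meat * bread)
```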

So, my guess is that it will be mainly collecting the pieces and wiring them up in a standalone model.

separate issue:

The above works under OLS assumptions; I don't know (or remember) what a WLS version would look like.

With the interpretation of removing fixed effects as conditioning out random effects (that might be correlated with the explanatory variables), we should also get a GMM version. There were plans to extend it to panel system GMM, but that's still a way off because nobody is working on it right now. Those would need more elaborate differencing out of panel specific constants.

Demeaning the data doesn't generalize to models other than linear, but it would be possible for Logit (ConditionalLogit) and for Poisson. Those two are in draft versions.

Josef

stoffprof

Aug 1, 2015, 2:15:16 PM
to pystatsmodels
Issue #2568 opened on Github.

josef...@gmail.com

Aug 1, 2015, 3:44:56 PM
to pystatsmodels
On Sat, Aug 1, 2015 at 2:15 PM, stoffprof <stof...@gmail.com> wrote:
Issue #2568 opened on Github.


Thanks, 

A brief use-case question: do you have a balanced or an unbalanced panel?

Most existing code or code in PRs focuses on unbalanced panels because it's more general. However, special code for balanced panels would be easier to implement and faster.

A question out of curiosity which won't affect the implementation but might/will affect the interpretation:
Do you have large time series per panel unit (long panels) or wide panel data?

Stata's areg uses large T, fixed N asymptotics, but I'm surprised they use cluster robust instead of a panel HAC robust covariance. (AFAIK, Stata itself doesn't have any panel HAC covariance options, although there are user packages.)


Josef

stoffprof

Aug 1, 2015, 7:54:35 PM
to pystatsmodels
My current setting is an unbalanced panel, and that's usually what I encounter. I work in financial economics, so it tends to be firm-level data measured repeatedly over time: typically several thousand firms with anywhere from 100 to 1000+ time-series observations each. Since firms don't survive for the whole period, the panel is unbalanced. I might be interested in including fixed effects for firms, for time, or both (although I suppose only one can be absorbed).