Driscoll-Kraay standard errors in RegressionResults


Firdaus Janoos

unread,
Aug 7, 2015, 2:10:20 PM8/7/15
to pystatsmodels
Hello,

I have a panel dataset as pandas dataframe organized as:

time | individual |  response | predictor_1 | predictor_2 |
---------------------------------------------------------------------------------
t      |     i        |   y_t,i         |  x1_t,i       |   x2_t,i        |
...
and I've been using statsmodels.regression.linear_model.WLS to estimate $y_{i,t} \sim x1_{i,t} + x2_{i,t}$.

I was interested in using the Driscoll-Kraay method for computing standard errors in panel data with time-series autocorrelation (i.e. the hac-groupsum option of RegressionResults.get_robustcov_results).

However, I am not sure how to set up statsmodels.regression.linear_model.WLS so that the RegressionResults object is aware of the panel structure (i.e. let it know the cross-sectional index and time stamp of each row).

If you can point me to some example code on how to set up this problem, that would be greatly appreciated.

Thanks !







josef...@gmail.com

unread,
Aug 7, 2015, 2:26:18 PM8/7/15
to pystatsmodels
the standard pattern is now to specify it in the `fit` call:

res = model.fit(cov_type='hac-groupsum', cov_kwds={'time': mytime_array, 'groups': mygroup_array})

with the extra arguments passed in the `cov_kwds` dictionary



except, looking for the code, the documentation and the code don't agree: in the code the `if` branch checks for `nw-groupsum`.

I got briefly worried because I didn't find `hac-groupsum` in any test module. It looks like get_robustcov_results has unit tests, but I didn't add them for the newer `fit` interface.

res = model.fit(cov_type='nw-groupsum', cov_kwds={'time': mytime_array, 'groups': mygroup_array})

is supposed to work.

Josef


 









Charles Martineau

unread,
Aug 12, 2015, 4:45:11 PM8/12/15
to pystatsmodels
Dear Josef,

I am also trying to compute the Driscoll-Kraay standard errors, but I always get a MemoryError.

For instance:

index                        Y          X1         X2        ... X17   GroupID
2012-01-25 12:30:00   -1.809030   2.126177   0.522877   ...         1
2012-01-25 12:31:00   -0.434571  -1.809030   2.126177   ...         1
2012-01-25 12:32:00    0.500806  -0.434571  -1.809030   ...         1
2012-01-25 12:33:00   -0.877922   0.500806  -0.434571   ...         1
2012-01-25 12:34:00    0.427819  -0.877922   0.500806   ...         1

The data is 1410 rows by 17 columns. I have four groups, so GroupID goes from 1 to 4.

Now if I try the following:

time = [(t-datetime.datetime(1970,1,1)).total_seconds() for t in df.index]  # convert my time index to number of seconds 
res = sm.OLS(df.Y,  df.X).fit(cov_type='nw-groupsum',  cov_kwds={'time': time, 'groups': np.array(df.GroupID), 'maxlags': 5})

I get this error:

Traceback (most recent call last):

  File "<ipython-input-81-be983d62f538>", line 4, in <module>
    'groups': np.array(dec_all.Pid), 'maxlags':1})

  File "C:\Users\chamar.stu\AppData\Local\Continuum\Anaconda\lib\site-packages\statsmodels\regression\linear_model.py", line 211, in fit
    cov_type=cov_type, cov_kwds=cov_kwds, use_t=use_t)

  File "C:\Users\chamar.stu\AppData\Local\Continuum\Anaconda\lib\site-packages\statsmodels\regression\linear_model.py", line 1099, in __init__
    use_t=use_t, **cov_kwds)

  File "C:\Users\chamar.stu\AppData\Local\Continuum\Anaconda\lib\site-packages\statsmodels\regression\linear_model.py", line 1873, in get_robustcov_results
    use_correction=use_correction)

  File "C:\Users\chamar.stu\AppData\Local\Continuum\Anaconda\lib\site-packages\statsmodels\stats\sandwich_covariance.py", line 871, in cov_nw_groupsum
    S_hac = S_hac_groupsum(xu, time, nlags=nlags, weights_func=weights_func)

  File "C:\Users\chamar.stu\AppData\Local\Continuum\Anaconda\lib\site-packages\statsmodels\stats\sandwich_covariance.py", line 477, in S_hac_groupsum
    x_group_sums = group_sums(x, time).T #TODO: transpose return in grou_sum

  File "C:\Users\chamar.stu\AppData\Local\Continuum\Anaconda\lib\site-packages\statsmodels\stats\sandwich_covariance.py", line 437, in group_sums
    for col in range(x.shape[1])])

MemoryError


What am I doing wrong? Thanks Josef

Charles Martineau

unread,
Aug 12, 2015, 4:45:58 PM8/12/15
to pystatsmodels
Oh, I must add that I regress Y on 16 X variables.

josef...@gmail.com

unread,
Aug 12, 2015, 4:54:39 PM8/12/15
to pystatsmodels
On Wed, Aug 12, 2015 at 4:45 PM, Charles Martineau <martinea...@gmail.com> wrote:
> time = [(t-datetime.datetime(1970,1,1)).total_seconds() for t in df.index]  # convert my time index to number of seconds

What's `time.max()`?
Can you try to convert time to a consecutive integer index corresponding to np.arange(n_time_points)?

A memory error sounds bad. I can look at the code later (tonight or tomorrow).
One guess is that I use np.bincount which creates an array of length time.max(). (We fixed a similar problem in an unrelated part of the code.)

Overall there are a lot of assumptions on the structure of the data and arrays in these parts and not enough checking.

Josef
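Josef's np.bincount guess can be illustrated with a small sketch (the timestamps are the ones from the sample data above; using np.unique with return_inverse=True is one possible way to build the consecutive codes, not necessarily what the thread used):

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2012-01-25 12:30', periods=5, freq='min')

# seconds since epoch: values around 1.3 billion, so anything that
# allocates an array of length time.max() (as np.bincount does) blows up
seconds = np.asarray((idx - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s'))
print(seconds.max())   # 1327494840

# map the timestamps to consecutive integer codes instead
time = np.unique(seconds, return_inverse=True)[1]
print(time)            # [0 1 2 3 4]
```

With repeated timestamps across groups, return_inverse assigns the same code to equal timestamps, which is exactly the consecutive time index the estimator expects.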

Charles Martineau

unread,
Aug 12, 2015, 5:30:01 PM8/12/15
to pystatsmodels
Dear Josef,

Yes, you are right: a simple np.arange(n_time_points) fixed the issue.

Thank you