On Tue, Sep 8, 2015 at 3:58 PM, <josef...@gmail.com> wrote:

> On Tue, Sep 8, 2015 at 3:33 PM, <esben.h...@me.com> wrote:
>
>> How can I use statsmodels OLS to calculate Hansen-Hodrick standard
>> errors? Suppose I use overlapping data on the left-hand side, e.g.
>> yearly returns at a monthly frequency. I want to change the kernel
>> from the default Bartlett used in Newey-West to a constant weight of
>> 1, so I want to write something like
>>
>>     OLS(y, x).fit(cov_type='HAC', cov_kwds={'maxlags': 11, 'kernel': xxx})
>>
>> On http://statsmodels.sourceforge.net/stable/generated/statsmodels.regression.linear_model.OLSResults.get_robustcov_results.html?highlight=get_robustcov_results,
>> I can see the kernel argument, but not the options.
>
> I don't know what Hansen-Hodrick standard errors are.
>
> If you just need a uniform truncated (flat-top) kernel, then that is
> available, but I don't remember how this is wired up. In the underlying
> functions, kernels are just callables or arrays. I can try to find an
> example tonight.
>
> kernel is not wired up for HAC, and internally it's called weights_func
> (which is available in nw-panel, but incorrectly documented). Something
> like this **might** work correctly:
>
>     >>> import statsmodels.stats.sandwich_covariance as sw
>     >>> res_flatb = mod.fit(cov_type='nw-panel',
>     ...                     cov_kwds=dict(time=np.arange(mod.endog.shape[0]),
>     ...                                   maxlags=4,
>     ...                                   weights_func=sw.weights_bartlett))
>     >>> res_flatb.bse
>     array([ 0.33125645,  0.29582348,  1.18593381])
>     >>> res_flat = mod.fit(cov_type='nw-panel',
>     ...                    cov_kwds=dict(time=np.arange(mod.endog.shape[0]),
>     ...                                  maxlags=4,
>     ...                                  weights_func=sw.weights_uniform))
>     >>> res_flat.bse
>     array([ 0.33784466,  0.28640521,  1.15945044])
>
> There won't be any unit tests for this yet. In my last round I wrote
> unit tests and options mostly against Stata, but Stata's `newey`
> doesn't have a kernel option (and I focused more on panel and
> cluster-robust standard errors). It looks like ivreg2 supports it:
> http://www.stata.com/statalist/archive/2011-03/msg00362.html
Hi Josef,

Thanks a lot, I'll try your suggestion.

The canonical reference is Hansen and Hodrick (1980), "Forward Exchange Rates as Optimal Predictors of Future Spot Rates: An Econometric Analysis," Journal of Political Economy, Volume 88, Number 5.

The need for Hansen-Hodrick standard errors shows up a lot when working with overlapping data in finance. Suppose you want to predict annual returns on the stock market, but you sample the data monthly. On the left-hand side you now have annual returns, and adjacent observations have 11 months of data in common.

The spectral density matrix S is the sum (as usual) over all the cross-second moments of g_t = x_t u_t:

    S = \sum_{j=-\infty}^{\infty} E(g_t g_{t+j}')

Newey-West standard errors are good when we don't know the correlation structure. They essentially down-weight the estimates of E(g_t g_{t+j}') as j grows. In the above example, we know the structure of the overlap, and we need to include exactly 11 lags (with equal weight). There's no guarantee that the resulting matrix will be positive definite, but it's the right thing to do.
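To make the construction concrete, here is a minimal numpy sketch of that equal-weight (truncated-kernel) sandwich estimator for OLS. The function name hac_uniform and the simulated MA(2) example are mine, not statsmodels API; a series with 3-period overlapping sums has MA(2) errors, so exactly 2 lags enter with weight 1:

```python
import numpy as np

def hac_uniform(X, u, maxlags):
    # Equal-weight ("Hansen-Hodrick" / truncated-kernel) sandwich
    # covariance for OLS: weight 1 on lags 0..maxlags, 0 beyond.
    n, k = X.shape
    g = X * u[:, None]                  # moment conditions g_t = x_t * u_t
    S = g.T @ g / n                     # lag-0 term
    for j in range(1, maxlags + 1):
        gamma = g[j:].T @ g[:-j] / n    # lag-j cross-second moment
        S += gamma + gamma.T            # include +j and -j with weight 1
    bread = np.linalg.inv(X.T @ X / n)
    return bread @ S @ bread / n        # cov(beta_hat)

# simulated example: MA(2) errors, like 3-period overlapping sums
rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal(n)
e = rng.standard_normal(n + 2)
y = 1.0 + 2.0 * x + e[:-2] + e[1:-1] + e[2:]
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ beta
se = np.sqrt(np.diag(hac_uniform(X, u, maxlags=2)))
```

This is the homoskedasticity-free GMM version; the weights on the lagged terms are the only thing that distinguishes it from Newey-West.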
The comment

    "If I'm not mistaken, Hansen-Hodrick SEs are the same as using the
    truncated kernel and assuming homoskedasticity."

is correct. Hansen and Hodrick wrote their paper before GMM was developed, so they focused on the homoskedastic case. It seems the name stuck, so now people say "Hansen-Hodrick standard errors" when they use GMM standard errors with a truncated, equal-weight kernel.
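For concreteness, the two schemes differ only in the weight put on each lagged autocovariance. A minimal sketch of the two weight functions (written out in numpy here; to my understanding this mirrors what statsmodels' sw.weights_bartlett and sw.weights_uniform return, but treat that as an assumption):

```python
import numpy as np

def weights_bartlett(nlags):
    # Newey-West: linearly declining weights 1, 1-1/(L+1), ..., 1/(L+1)
    return 1 - np.arange(nlags + 1) / (nlags + 1)

def weights_uniform(nlags):
    # Hansen-Hodrick / truncated kernel: equal weight 1 on lags 0..L
    return np.ones(nlags + 1)

print(weights_bartlett(3))   # [1.   0.75 0.5  0.25]
print(weights_uniform(3))    # [1. 1. 1. 1.]
```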
Cochrane is a good source for this stuff.

Your suggestion gives the correct answer:

    model.fit(cov_type='nw-panel',
              cov_kwds={'time': np.arange(model.endog.shape[0]),
                        'maxlags': n,
                        'weights_func': sw.weights_uniform})

Of course, longer term it would be nice to be able to do this for a time series without calling the panel-data functionality.
I don't have an explicit application for the autocorrelation example you mention. In financial applications we usually know the overlap, so there's no need to test anything.
Completely unrelated to this: is there any integration with pandas for panel data? Suppose you have two pandas DataFrames for x and y in which the index is time and the columns are entities, and you want to run a panel regression with various options (cluster by time or entity, time fixed effects, and so on). Is there an easy way to do this that avoids stacking the data and setting up the time index yourself? (The stacked DataFrame has a multi-level index, which is annoying in this application.) Something like writing

    OLS(y, x).fit('cluster', groups=y.index, ...)

or

    OLS(y, x).fit('cluster', groups=y.columns, fixed_effects=y.time, ...)

and then the code figures out the stacking and degrees-of-freedom corrections?
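As far as I can tell there is no built-in shortcut, but the manual stacking is short. A sketch with pandas, assuming wide DataFrames y and x with a time index and entity columns; the variable names and the commented-out statsmodels call are illustrative, not an existing API:

```python
import numpy as np
import pandas as pd

# hypothetical wide-format panel: index = time, columns = entities
rng = np.random.default_rng(0)
times = pd.Index(range(24), name="time")
entities = pd.Index(["A", "B", "C"], name="entity")
x = pd.DataFrame(rng.standard_normal((24, 3)), index=times, columns=entities)
y = 2.0 * x + pd.DataFrame(rng.standard_normal((24, 3)),
                           index=times, columns=entities)

# stack wide -> long; the resulting MultiIndex (time, entity) supplies
# the group labels needed for cluster-robust covariances
long_df = pd.DataFrame({"y": y.stack(), "x": x.stack()})
time_groups = long_df.index.get_level_values("time")      # cluster by time
entity_groups = long_df.index.get_level_values("entity")  # cluster by entity

# the fit itself would then be along these lines (not run here):
# import statsmodels.api as sm
# res = sm.OLS(long_df["y"], sm.add_constant(long_df["x"])).fit(
#     cov_type="cluster",
#     cov_kwds={"groups": pd.factorize(entity_groups)[0]})
```

The MultiIndex is exactly the annoyance mentioned above, but get_level_values at least makes pulling the group labels back out a one-liner.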
Esben