FAQ: How good are standard errors?

josef...@gmail.com

Nov 17, 2016, 12:45:00 PM

to pystatsmodels

As new methods get into statsmodels, we have to start worrying about the quality or validity of the standard errors of the parameter estimates. (Machine learning just ignores them; we try not to. However, what counts as good standard errors is still an open question for some newish methods. Also, our implementation might initially not always be "state-of-the-art".)

I tried to summarize it a bit in

https://github.com/statsmodels/statsmodels/issues/3270

Note that most of this is not new. There are many problems for inference when users try many different specifications, use stepwise regression, or make data-dependent decisions in the "forking paths". Econometrics had a credibility discussion in the early 1980s; psychology and other fields are suffering now (see e.g. Gelman's blog).

The difference is that before, it was a problem for users (and publishers) how to get the software to spit out a p-value of 0.049. Now we are starting to include some of these algorithms directly in the software, and built-in specification searches like LASSO or regression trees might have unclear stochastic properties for inference.
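
To make the specification-search problem concrete, here is a small simulation sketch (numpy only, not statsmodels code). It assumes a deliberately simple selection rule that is not from the issue above: pick the one regressor most correlated with y, then report the usual OLS t-test as if no selection had happened. The function name and all parameter values are made up for illustration.

```python
import numpy as np


def naive_rejection_rate(n_obs=100, n_candidates=20, n_rep=500, seed=0):
    """Share of replications where the selected regressor looks significant.

    y is pure noise, so every rejection is a false positive.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_rep):
        x = rng.standard_normal((n_obs, n_candidates))
        y = rng.standard_normal(n_obs)  # unrelated to every column of x

        # data-dependent selection: keep the column most correlated with y
        xc = x - x.mean(axis=0)
        yc = y - y.mean()
        corr = xc.T @ yc / np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
        j = np.argmax(np.abs(corr))

        # OLS of y on a constant and the selected column, classical std. errors
        exog = np.column_stack([np.ones(n_obs), x[:, j]])
        beta, *_ = np.linalg.lstsq(exog, y, rcond=None)
        resid = y - exog @ beta
        sigma2 = resid @ resid / (n_obs - 2)
        cov = sigma2 * np.linalg.inv(exog.T @ exog)
        t_stat = beta[1] / np.sqrt(cov[1, 1])

        # naive two-sided test at nominal 5% (normal critical value)
        rejections += abs(t_stat) > 1.96
    return rejections / n_rep


rate = naive_rejection_rate()
print(f"nominal size 0.05, simulated rejection rate {rate:.2f}")
```

The simulated rejection rate should come out far above the nominal 5%, because the reported standard error does not account for the search over 20 candidates. LASSO or tree-based selection is more sophisticated than this rule, but the same concern applies.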

Aside: Bootstrap does not make a fundamental difference, in my opinion. The question just changes from "which asymptotic standard errors to use" to "which bootstrap to use" (and it still needs theorists to prove what's appropriate).

(Bootstrap can make a quantitative difference in improving, for example, small sample inference.)
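
As a rough illustration of the "which bootstrap" question, the sketch below compares two common schemes, a pairs bootstrap and a residual bootstrap, for the slope of a simple regression. The data-generating process (error scale proportional to |x|), the sample size, and the helper names are assumptions chosen for the example, not anything from statsmodels.

```python
import numpy as np


def ols_params(x, y):
    """Intercept and slope from OLS of y on a constant and x."""
    exog = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(exog, y, rcond=None)
    return beta


def bootstrap_slope_se(x, y, n_boot=500, seed=0):
    """Bootstrap standard error of the slope under two resampling schemes."""
    rng = np.random.default_rng(seed)
    n = len(y)
    beta = ols_params(x, y)
    fitted = beta[0] + beta[1] * x
    resid = y - fitted

    slopes_pairs = np.empty(n_boot)
    slopes_resid = np.empty(n_boot)
    for b in range(n_boot):
        # pairs bootstrap: resample (x_i, y_i) rows together
        idx = rng.integers(0, n, size=n)
        slopes_pairs[b] = ols_params(x[idx], y[idx])[1]

        # residual bootstrap: keep x fixed, resample residuals
        # (implicitly assumes errors are exchangeable, i.e. homoskedastic)
        y_star = fitted + resid[rng.integers(0, n, size=n)]
        slopes_resid[b] = ols_params(x, y_star)[1]

    return slopes_pairs.std(ddof=1), slopes_resid.std(ddof=1)


rng = np.random.default_rng(12345)
x = rng.standard_normal(50)
# hypothetical heteroskedastic errors: scale grows with |x|
y = 1.0 + 0.5 * x + np.abs(x) * rng.standard_normal(50)

se_pairs, se_resid = bootstrap_slope_se(x, y)
print(f"pairs bootstrap se:    {se_pairs:.3f}")
print(f"residual bootstrap se: {se_resid:.3f}")
```

The two schemes embed different assumptions about the errors, so they need not agree. Choosing between them is the same kind of decision as choosing between classical and heteroskedasticity-robust asymptotic standard errors.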