GMM.IV2SLS regression with logit in the first stage


Charles Martineau

Dec 20, 2016, 3:22:51 AM
to pystatsmodels
Hi guys,

Question: can the first-stage regression in GMM.IV2SLS be a logit regression?

I am currently running a simple OLS regression where one of the variables is constructed from the predicted values of a logit regression. The independent variables of the logit do not appear in the OLS regression. However, since I am using predicted values as a regressor, I am worried about the standard errors attached to them. Could GMM.IV2SLS be a solution? Can we specify that the first stage is a logit?

Or would you suggest simply keeping what I am doing now and adjusting the standard errors with another procedure? (Sorry, this question is more econometrics than stats/coding.)
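
For concreteness, here is a minimal sketch of the naive two-step setup described above (simulated data; all variable names are made up). The second-stage standard errors reported here ignore that p_hat is itself estimated, which is exactly my concern:

import numpy as np
import statsmodels.api as sm

np.random.seed(123)
n = 1000
z = sm.add_constant(np.random.normal(size=(n, 2)))    # first-stage regressors
d = (z.dot([0.2, 0.5, -0.3]) + np.random.logistic(size=n) > 0).astype(float)

# First stage: logit, keep the fitted probabilities
p_hat = sm.Logit(d, z).fit(disp=0).predict()

# Second stage: OLS with the generated regressor p_hat
x = sm.add_constant(np.random.normal(size=n))
y = x.dot([1.0, 0.5]) + 0.8 * p_hat + np.random.normal(size=n)
res = sm.OLS(y, np.column_stack([x, p_hat])).fit()
print(res.bse)    # naive standard errors that ignore the first stage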

Thank you! 

josef...@gmail.com

Dec 20, 2016, 9:07:02 AM
to pystatsmodels
There are too many topics in statistics and econometrics. I haven't looked at anything related to this in more than 1.5 years, and I either don't remember the econometrics for this case or never looked it up specifically.

IV2SLS assumes we have two linear models, estimated essentially by OLS in two stages with corrected standard errors.
(Just a thought: we might be able to reuse this as a local approximation, similar to iteratively reweighted least squares, if we had weights, but IV2SLS doesn't support them.)
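
For reference, a minimal sketch of the linear case that IV2SLS handles (simulated data; all variable names are made up):

import numpy as np
from statsmodels.sandbox.regression.gmm import IV2SLS

np.random.seed(0)
n = 500
z = np.random.normal(size=n)                  # instrument
u = np.random.normal(size=n)                  # shared error makes x endogenous
x = 0.8 * z + u + np.random.normal(size=n)
y = 1.0 + 2.0 * x + u + np.random.normal(size=n)

exog = np.column_stack([np.ones(n), x])
instr = np.column_stack([np.ones(n), z])
res = IV2SLS(y, exog, instrument=instr).fit()
print(res.params)                             # 2SLS estimates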

The choices are either to calculate the corrected two-stage standard errors explicitly, e.g. Murphy and Topel as in Greene's textbook, or to set up the equivalent or corresponding general GMM problem.

Using GMM for this type of problem is what I would like to support, and there are several open issues for it. However, nothing is available out of the box and verified/unit-tested.
example:
https://github.com/statsmodels/statsmodels/pull/2455/files#diff-1b3411f7d5ad604e44f690f8d7790fe0R169
To get the correct standard errors for treatment effect estimation under missing at random (no hidden heterogeneity conditional on observed variables), both Stata and I use GMM.
Stata also has IV versions for endogenous binary treatment, but I never looked at the details of those.
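
A rough, untested sketch of what such a stacked-moment GMM setup could look like with the sandbox GMM class (the subclass and all variable names are illustrative, not an existing statsmodels API): stack the logit score moments and the second-stage orthogonality conditions, so the exactly identified GMM covariance accounts for the estimated first stage.

import numpy as np
from scipy.special import expit
from statsmodels.sandbox.regression.gmm import GMM

class TwoStepLogitOLS(GMM):
    # endog = y, exog = x (second-stage regressors incl. constant),
    # instrument = z (first-stage regressors incl. constant),
    # d = binary first-stage outcome
    def __init__(self, endog, exog, instrument, d, **kwds):
        self.d = np.asarray(d)
        k = exog.shape[1] + 1 + instrument.shape[1]
        kwds.setdefault('k_moms', k)
        kwds.setdefault('k_params', k)
        super(TwoStepLogitOLS, self).__init__(endog, exog, instrument, **kwds)

    def momcond(self, params):
        kx = self.exog.shape[1]
        beta = params[:kx]            # second-stage coefficients on x
        delta = params[kx]            # coefficient on the fitted probability
        gamma = params[kx + 1:]       # first-stage logit parameters
        prob = expit(self.instrument.dot(gamma))
        g_logit = self.instrument * (self.d - prob)[:, None]     # logit score moments
        resid = self.endog - self.exog.dot(beta) - delta * prob
        g_ols = np.column_stack([self.exog, prob]) * resid[:, None]  # second-stage moments
        return np.column_stack([g_logit, g_ols])

# Usage sketch: start from the naive two-step estimates; the model is exactly
# identified, so one iteration reproduces them with corrected standard errors.
# res = TwoStepLogitOLS(y, x, z, d).fit(start_params=start, maxiter=1)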


Josef

Charles Martineau

Dec 20, 2016, 5:15:19 PM
to pystatsmodels
I see! Thanks!
With more reading, I found that the best approach is to bootstrap the standard errors, as suggested in Murphy and Topel.

There is no built-in "bstrap" function like Stata's in statsmodels, right?

Thanks Josef,

josef...@gmail.com

Dec 20, 2016, 5:21:20 PM
to pystatsmodels
On Tue, Dec 20, 2016 at 5:15 PM, Charles Martineau <martinea...@gmail.com> wrote:
I see! Thanks!
With more reading, I found that the best approach is to bootstrap the standard errors, as suggested in Murphy and Topel.

There is no built-in "bstrap" function like Stata's in statsmodels, right?

Not in statsmodels (there is an ancient method in GenericLikelihoodModel for cov_params based on basic resampling of observations).

Kevin wrote large parts of the missing bootstrap functionality, which lives in his arch package:
https://github.com/bashtage/arch
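
A short, hedged example of the kind of resampling arch provides (the data and statistic are made up; IIDBootstrap and conf_int are the names from the arch docs):

import numpy as np
from arch.bootstrap import IIDBootstrap

np.random.seed(0)
data = np.random.standard_normal(500)

bs = IIDBootstrap(data)
ci = bs.conf_int(np.mean, reps=1000, method='percentile')   # 95% CI for the mean
print(ci)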

Josef

Paul Hobson

Dec 20, 2016, 6:03:57 PM
to pystat...@googlegroups.com
On Tue, Dec 20, 2016 at 2:21 PM, <josef...@gmail.com> wrote:
On Tue, Dec 20, 2016 at 5:15 PM, Charles Martineau <martinea...@gmail.com> wrote:
There is no built-in "bstrap" function like Stata's in statsmodels, right?

Not in statsmodels (there is an ancient method in GenericLikelihoodModel for cov_params based on basic resampling of observations).

Kevin wrote large parts of the missing bootstrap functionality, which lives in his arch package:
https://github.com/bashtage/arch

Josef


Kevin's ARCH package is great.
 
Also, it's probably worth mentioning that a simple percentile bootstrap is alarmingly brief in numpy:

import numpy


def percentile(data, statfxn, niter=10000, alpha=0.05):
    # Resample the rows of 1-D `data` with replacement, niter times at once.
    elements = data.shape[0]
    index = numpy.random.randint(low=0, high=elements, size=(niter, elements))
    # Statistic on each bootstrap sample, then the percentile interval
    # (2.5% and 97.5% for alpha = 0.05).
    boot_stats = statfxn(data[index], axis=-1)
    CI = numpy.percentile(boot_stats, [alpha * 50, 100 - (alpha * 50)], axis=0)
    return CI

 
Here, statfxn is any function that reduces a numpy array along an axis (e.g., np.median).
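
For example, with made-up 1-D data:

data = numpy.random.normal(size=200)
ci_low, ci_high = percentile(data, numpy.median)    # 95% percentile CI for the median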
-p

Charles Martineau

Dec 20, 2016, 6:52:02 PM
to pystatsmodels
Paul, thanks for the example!