Missing values in prediction after logit command

J

Nov 12, 2013, 1:15:02 PM
to stata-us...@googlegroups.com

I am running a logit model with many explanatory variables. I am concerned purely with prediction; I will not be interpreting the coefficients. The application is estimating conditional choice probabilities for a dynamic discrete choice model. The binary choice is represented as a function of the state space, and I want to estimate this function as flexibly as possible.

My explanatory variables are a polynomial expansion of a continuous variable, two dummy variables, and the interactions of every variable with every other. That gives 100-200 RHS variables, which together form a flexible function of the state space of the dynamic model. I have more than 2900 observations for estimation.

Some of the explanatory variables are collinear, and Stata drops those at the start. Others perfectly predict outcomes, so those variables and the observations they perfectly predict are eliminated. About 1500 observations are eliminated in this way.
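To make the perfect-prediction case concrete, here is a minimal sketch (in Python rather than Stata, with made-up data) of how one might flag dummy variables whose level always coincides with a single outcome before estimating. The function name `perfect_predictors` and the toy variables `d1` and `d2` are hypothetical, and the check only covers 0/1 dummies, not continuous regressors.

```python
def perfect_predictors(y, dummies):
    """For each 0/1 dummy, list the (level, outcome) pairs where every
    observation at that level has the same outcome."""
    found = {}
    for name, x in dummies.items():
        for level in (0, 1):
            # Set of distinct outcomes observed at this level of the dummy.
            outcomes = {yi for yi, xi in zip(y, x) if xi == level}
            if len(outcomes) == 1:
                found.setdefault(name, []).append((level, outcomes.pop()))
    return found


# Toy data (assumed for illustration only).
y = [0, 1, 1, 0, 1, 1]
d1 = [0, 0, 1, 0, 1, 1]  # every observation with d1 == 1 has y == 1
d2 = [0, 1, 0, 1, 0, 1]  # both levels of d2 see mixed outcomes

flags = perfect_predictors(y, {"d1": d1, "d2": d2})
print(flags)
```

For observations flagged this way, the outcome is the same for every sample member at that level, which is why the logit likelihood has no finite maximum for that coefficient.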

I have two issues:

1) I run into convergence problems when estimating unless I manually eliminate several more variables. I am assuming that this is an issue of almost perfect collinearity. Is this correct? What is a good rule for eliminating variables automatically?

2) Because of the dropped observations, I don’t get predictions for 1500 observations. What do I use as a prediction here? I need a prediction for each possible state.
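On issue 1, one possible automatic rule (a heuristic sketch, not an established Stata procedure) is to screen out any variable whose pairwise correlation with an already-kept variable exceeds a threshold. The sketch below is in Python with made-up data; the names `pearson` and `screen` and the threshold of 0.999 are assumptions. It only catches pairwise near-collinearity, not a variable that is nearly a combination of several others.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    if va == 0 or vb == 0:
        return 0.0  # constant column: correlation undefined, treated as 0 here
    return cov / math.sqrt(va * vb)


def screen(columns, threshold=0.999):
    """Keep variables in order, skipping any that is almost perfectly
    correlated with a variable that has already been kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data (assumed for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
cols = {
    "x": x,
    "x_near_copy": [2.0, 4.0001, 6.0, 7.9999, 10.0],  # almost exactly 2 * x
    "x_sq": [v ** 2 for v in x],                       # correlated, but not nearly perfectly
}

kept = screen(cols)
print(kept)
```

The order of the columns matters here: the first variable of a near-collinear pair is the one that survives.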


I will be repeating the estimation process hundreds of times for different individuals, so I can’t manually finesse the variables to make it work each time.


So far I have seen two options:

1) Limit the model to less flexible functional forms of the state variables, i.e. get rid of some of the interactions. This isn’t ideal since I want as much flexibility as possible.

2) Ditch the logit and use a linear probability model, then restrict the predictions to lie between 0 and 1 after the estimation. Again, not ideal, but it works every time.
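Option 2 can be sketched as follows. This is a simplified illustration in Python with a single regressor and made-up data, not the full model with 100-200 variables; the names `fit_lpm` and `predict_clipped` are hypothetical.

```python
def fit_lpm(x, y):
    """Ordinary least squares of a 0/1 outcome on one regressor plus a constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def predict_clipped(intercept, slope, x):
    """Linear predictions truncated to the unit interval."""
    return [min(1.0, max(0.0, intercept + slope * xi)) for xi in x]


# Toy data (assumed for illustration only).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 0, 1, 1, 1]

intercept, slope = fit_lpm(x, y)

# Predict on a grid of states, including points outside the estimation sample.
grid = [-2.0, 2.5, 8.0]
p = predict_clipped(intercept, slope, grid)
print(p)
```

Because least squares never drops observations for perfect prediction, every state in the grid gets a value, but the clipped predictions at the edges are exactly 0 or 1 rather than probabilities strictly inside the unit interval.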
