I am able to create a comparable model, but only at the expense of too many variables in the model (e.g. 600). I have 3,000 attributes, all of them factors (enums), leading to 18,000 weights to estimate in total.
My goal is to find a parsimonious model that includes fewer than a few tens of attributes (< 20). I tried a grid search over alpha with lambda search enabled, and manually choosing lambdas/alphas, but I do not see any way to get a sufficiently sparse model (in terms of used attributes, not estimated weights) that would still be powerful enough.
What would be a good approach to this?
That is surprising to me as well, although I think your case is specific in that you want to select only a very small number of predictors out of the total. It might be that a step-wise approach works better here, since the regularization strength needed to filter out all the other coefficients might be too strong.
The approach I would recommend is to run lambda search with alpha = 1 and set max_active_predictors slightly higher than what you want, e.g. ~30 if you want about ~20 in your model. That number specifies the number of active predictors after applying strong-rules screening and is generally a little higher than the actual number of nonzero coefficients in the final model. If the model is not good enough, you can retrain the GLM with only the coefficients present in the model.
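To make the idea concrete, here is a small sketch using scikit-learn's Lasso as a stand-in for an L1-penalized GLM (alpha = 1): walk down a decreasing lambda path until the fit has at most the desired number of active predictors, then refit an unpenalized model on just those. The data, dimensions, and variable names here are illustrative, not from the original setup.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p, max_active = 200, 100, 20              # toy stand-ins for the real sizes
X = rng.normal(size=(n, p))
true_coef = np.zeros(p)
true_coef[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]  # only 5 truly active predictors
y = X @ true_coef + rng.normal(scale=0.1, size=n)

# Walk a decreasing lambda path; stop at the first fit sparse enough.
for lam in np.logspace(0, -3, 30):
    lasso = Lasso(alpha=lam).fit(X, y)
    active = np.flatnonzero(lasso.coef_)
    if 0 < len(active) <= max_active:
        break

# Refit without penalty on the selected predictors only.
final = LinearRegression().fit(X[:, active], y)
print("selected predictors:", active)
```

In H2O terms, the lambda path and the early stop correspond roughly to lambda_search with a cap on active predictors, and the final unpenalized refit is the "retrain with only the coefficients present in the model" step.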
Alternatively, you may run a full lambda search, take the best result, keep the N coefficients with the highest absolute values, and retrain your model with those.
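A sketch of this alternative, again using scikit-learn's Lasso as a stand-in for the H2O GLM (the data and names are illustrative): fit one mildly regularized model, rank coefficients by absolute value, keep the top N, and refit on those predictors only.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(1)
n, p, top_n = 200, 100, 10
X = rng.normal(size=(n, p))
true_coef = np.zeros(p)
true_coef[:3] = [4.0, -3.0, 2.0]             # 3 truly active predictors
y = X @ true_coef + rng.normal(scale=0.1, size=n)

base = Lasso(alpha=0.05).fit(X, y)             # mild penalty: many nonzeros
top = np.argsort(np.abs(base.coef_))[-top_n:]  # indices of the N largest |coef|
final = LinearRegression().fit(X[:, top], y)   # unpenalized refit on top N
```

One caveat with ranking by raw coefficient magnitude: it is only meaningful if the predictors are on comparable scales (e.g. standardized, as H2O does by default for regularized GLMs), otherwise a large coefficient may just reflect a small-scale feature.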
Best,
Tomas