Standard error of coefficients in Lasso regression


ronen

Jan 30, 2013, 4:05:47 AM
to israel-r-...@googlegroups.com
Shalom,
I use Lasso and glmnet regressions because I have many potential covariates, some of which are correlated with each other.
The advantage over ridge regression is that only a few covariates are selected into the model; the coefficients of all the rest are set to zero.
However, the simple packages do not give standard errors for the coefficients or for the predicted values.
 
One paper on penalized regression explains that, because the penalty introduces substantial bias, a standard error is not meaningful.
This applies at least to the coefficients, not to the predicted values.
Others use Bayesian methods, such as the package "blasso".
One possibility is to use the bootstrap.
Another suggestion I saw was to run an ordinary regression on the covariates selected by the lasso. This also sounds biased to me.
 
 
I would be grateful for any input on this problem.
 
Thank you
Ronen

david golan

Jan 30, 2013, 12:13:14 PM
to israel-r-...@googlegroups.com
Hello all,

You are touching on a very delicate subject. Post-selection inference is problematic, especially in the case of variable selection in (linear) regression. There are some very recent attempts to solve this problem in general. For example, the post-selection inference (PoSI) approach tries to protect against all possible variable selection methods and routines (even those you are unaware of), and is described (among other places) here. However, I am not sure it is very practical yet.

A second approach, as you suggested, is the bootstrap. There are some subtleties here too: since the variable selection procedure is part of your overall procedure, it must be part of your "bootstrap world" as well. One suggestion for this case is a "split bootstrap": split your sample in two, bootstrap the first part and apply your model selection procedure to it, and then use the second part to do bootstrap inference on the selected coefficients. Combine both parts to produce correct CIs for the coefficients. Intuitively, you must keep in mind that in some scenarios (for example, other realizations of the data) the variables you selected in real life would not be selected. Therefore the distribution of the estimators should have an atom at 0. This also explains why your intuition is correct: applying standard regression inference after selection would give bad CIs, because it ignores these atoms (and also neglects the effect of using different sets of variables in different instances). This approach (like the previous one) is described in the last slides of this presentation.
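
To make the "atom at 0" concrete, here is a minimal sketch in Python (standard library only). It is not the split bootstrap described above and not any package's implementation: it is a plain pairs bootstrap in which the lasso is re-fitted, and so the selection is re-run, in every resample. The simulated data, the fixed penalty lam=0.2, the absence of an intercept and all function names are assumptions made for illustration.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward 0 by t; values with |z| <= t become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=50, tol=1e-10):
    """Cyclic coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1.
    No intercept is fitted (assumes roughly centred data)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X beta, with beta = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # covariance of column j with the residual that excludes column j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta

def bootstrap_lasso(X, y, lam, n_boot=100, seed=0):
    """Pairs bootstrap: resample rows with replacement and re-fit the lasso each time."""
    rng = random.Random(seed)
    n = len(X)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam))
    return draws

def percentile_ci(values, level=0.95):
    """Simple percentile interval from the sorted bootstrap draws."""
    s = sorted(values)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi

# Hypothetical data: one strong effect, one weak effect, two pure-noise covariates.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
y = [2.0 * row[0] + 0.15 * row[1] + rng.gauss(0, 1) for row in X]

draws = bootstrap_lasso(X, y, lam=0.2)
for j in range(4):
    coefs = [d[j] for d in draws]
    zero_share = sum(1 for c in coefs if c == 0.0) / len(coefs)
    print(j, "share of exact zeros:", zero_share, "percentile CI:", percentile_ci(coefs))
```

The share of exact zeros printed for each coefficient is the bootstrap estimate of the atom at 0. An ordinary regression on the selected covariates has no such atom, which is one way to see why its CIs can mislead. In practice the penalty would usually be re-chosen (for example by cross-validation) inside each resample as well, which this sketch skips.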

Lastly, there's the Bayesian approach. I am no expert on Bayesian statistics (or any other type of statistics, for that matter), but I think that in your case I would choose it over the other options for two reasons. First, there is an existing implementation. Second, since you're doing LASSO, your model has a very clear Bayesian interpretation (a Laplace, i.e. double-exponential, prior on the effect sizes), so it makes sense even if you are not a fan of Bayesian inference.
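
As a sketch of that interpretation: the prior usually associated with the lasso is the Laplace (double-exponential) density. The symbols below (noise variance \sigma^2, prior scale b) are assumptions introduced for illustration. Assume Gaussian errors and independent Laplace priors on the coefficients:

$$
y \mid \beta \sim N(X\beta, \sigma^2 I), \qquad p(\beta_j) = \frac{1}{2b}\exp\left(-\frac{|\beta_j|}{b}\right), \quad j = 1, \dots, p .
$$

The negative log posterior is then, up to a constant that does not depend on \beta,

$$
-\log p(\beta \mid y) = \frac{1}{2\sigma^2}\,\|y - X\beta\|_2^2 + \frac{1}{b}\,\|\beta\|_1 + \text{const},
$$

so the posterior mode solves the lasso problem

$$
\hat\beta = \arg\min_\beta \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1, \qquad \lambda = \frac{2\sigma^2}{b} .
$$

Only the mode coincides with the lasso estimate. Posterior intervals come from the full posterior distribution, which is what a Bayesian implementation would sample from.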

Hope this helps a bit,
David.





--
You received this message because you are subscribed to the Google Groups "Israel R User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to israel-r-user-g...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

ronen

Jan 31, 2013, 3:14:13 AM
to israel-r-...@googlegroups.com
Thank you David,
 
I will read that presentation about the bootstrap.
 
Is there a Bayesian elastic net in R?
 
thanks
ronen 

Tal Galili

Jan 31, 2013, 3:18:10 AM
to israel-r-...@googlegroups.com
Hi Ronen,

Also, take a look at this package:
It gives a lm-like output for elastic net models.
It might prove more useful for your purposes.

Best,
Tal


----------------Contact Details:-------------------------------------------------------
Contact me: Tal.G...@gmail.com
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
----------------------------------------------------------------------------------------------




ronen

Jan 31, 2013, 4:30:43 AM
to israel-r-...@googlegroups.com
Thank you Tal!
It looks like an implementation of the two options David suggested (two-stage bootstrap and Bayesian).
There is not enough explanation of what the function actually does.
Also, the elastic net needs two tuning parameters, alpha and lambda.
There is no input to the function that refers to lambda... how come?
I will look in the reference.
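
For reference, a sketch of where the two tuning parameters enter, using the elastic-net objective as parametrized in the glmnet documentation for a Gaussian response (n observations, intercept \beta_0):

$$
\min_{\beta_0,\,\beta} \; \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \beta_0 - x_i^{\top}\beta\right)^2 + \lambda\left[\frac{1-\alpha}{2}\,\|\beta\|_2^2 + \alpha\,\|\beta\|_1\right]
$$

Here \alpha = 1 gives the lasso and \alpha = 0 gives ridge, while \lambda sets the overall strength of the penalty. One possible reason for a function to expose only alpha is that lambda is chosen internally, for example by cross-validation over a path of values. That is only a guess and is not confirmed for the package discussed here.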
 
Thanks again
Ronen

Tal Galili

Jan 31, 2013, 4:48:54 AM
to israel-r-...@googlegroups.com
You're welcome Ronen.

If you find interesting insights while diving into the documentation, please come back and share them with the group.

Best,
Tal



ronen

Jan 31, 2013, 4:50:54 AM
to israel-r-...@googlegroups.com
I will, unless I drown...