Standard error of coefficients in Lasso regression

ronen

Jan 30, 2013, 4:05:47 AM

to israel-r-...@googlegroups.com

Shalom,

I use Lasso and glmnet regressions because I have many potential covariates, some of which are correlated with each other.

The advantage over ridge regression is that only a few covariates are selected into the regression; all the rest are set to zero.

The simple packages do not give standard errors for the coefficients or the predicted values.

In one paper on penalized regression, the authors explain that because the penalization introduces substantial bias, the standard error is not meaningful.

This applies at least to the coefficients, not to the predicted values.

Others use Bayesian methods, such as "blasso" (from the monomvn package).

One possibility is to use the bootstrap.

Another suggestion I saw was to run a simple regression with the covariates selected by the lasso. This also sounds biased to me.

I would be grateful for any contribution to this problem.

Thank you

Ronen

david golan

Jan 30, 2013, 12:13:14 PM

to israel-r-...@googlegroups.com

Hello all,

You are touching on a very delicate subject. Post-selection inference is problematic, especially in the case of variable selection in (linear) regression. There are some very recent attempts to solve this problem in general. For example, the post-selection inference (PoSI) approach tries to protect against all possible variable selection methods and routines (even those you are unaware of), and is described (among other places) here. However, I am not sure it is very practical yet.

A second approach, as you suggested, is the bootstrap. Here there are also some subtleties: since the variable selection procedure is part of your general procedure, it must be part of your "bootstrap world" as well. It is suggested to do a "split bootstrap" in this case: split your sample into two parts, bootstrap the first part and apply your model selection procedure to it, and then use the second part to do bootstrap inference on the selected coefficients. Combine both parts to produce correct CIs for the coefficients.

Intuitively, you must keep in mind that in some scenarios the variables you chose in real life are not chosen. Therefore the distribution of the estimators should have an atom at 0. This also explains why your intuition is correct: applying standard regression inference after selection would result in bad CIs, because it ignores these atoms (and it also neglects the effect of using different sets of variables in different instances). This approach (and also the previous one) is described in the last slides of this presentation.
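To make the "atom at 0" concrete, here is a minimal sketch of a plain pairs bootstrap in which the lasso fit is repeated inside every resample. It is not the split bootstrap described above, only an illustration of why selection has to live inside the bootstrap loop. Everything in it is an assumption made for illustration: the simulated data, the fixed penalty `lam`, the absence of an intercept, and the small hand-written coordinate-descent solver `lasso_cd` (used instead of glmnet so that the sketch has no dependencies, which is also why it is in Python rather than R). In a real analysis the choice of the penalty (for example by cross-validation) would also need to be repeated inside each resample.

```python
import random

def soft_threshold(z, g):
    """Shrink z toward 0 by g; return exactly 0 when |z| <= g."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (coefficient j removed)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

# Simulated data (hypothetical): 4 covariates, only the first has a real effect.
rng = random.Random(0)
n, true_beta, lam, B = 60, [2.0, 0.0, 0.0, 0.0], 0.2, 100
X = [[rng.gauss(0, 1) for _ in true_beta] for _ in range(n)]
y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 1) for row in X]

# Bootstrap: resample rows with replacement, then redo the whole lasso fit.
boot = []
for _ in range(B):
    idx = [rng.randrange(n) for _ in range(n)]
    boot.append(lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam))

summary = []
for j in range(len(true_beta)):
    draws = sorted(b[j] for b in boot)
    frac_zero = sum(1 for d in draws if d == 0.0) / B  # mass of the atom at 0
    lo, hi = draws[int(0.025 * B)], draws[int(0.975 * B) - 1]
    summary.append((frac_zero, lo, hi))
    print(f"beta[{j}]: share of exact zeros = {frac_zero:.2f}, "
          f"95% percentile interval = ({lo:.2f}, {hi:.2f})")
```

The share of exact zeros per coefficient is the part that standard regression inference on the selected variables cannot see: a covariate that is selected in the original sample may be dropped in a sizeable share of the resamples.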

Lastly, there's the Bayesian approach. I am no expert on Bayesian statistics (or any other type of statistics for that matter), but I think that in your case I would choose it over the other options for two reasons. First, there is an existing implementation. Second, since you're doing LASSO, there's a very clear Bayesian interpretation of your model (a double-exponential, i.e. Laplace, prior on effect sizes), so it makes sense even if you are not a fan of Bayesian inference.
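The Bayesian interpretation can be written out as a short derivation. This is the standard textbook argument, under the assumptions of Gaussian noise with known variance \sigma^2 and independent Laplace (double-exponential) priors with rate \tau on the coefficients; the symbols \sigma, \tau and \lambda are notation introduced here, not taken from the thread.

```latex
y \mid \beta \sim N(X\beta,\ \sigma^2 I), \qquad
p(\beta_j) = \frac{\tau}{2}\, e^{-\tau |\beta_j|}, \quad j = 1, \dots, p

-\log p(\beta \mid y)
  = \frac{1}{2\sigma^2}\, \lVert y - X\beta \rVert_2^2
  + \tau \sum_{j=1}^{p} |\beta_j| + \text{const}

\hat\beta_{\text{MAP}}
  = \arg\min_{\beta}\ \tfrac{1}{2}\, \lVert y - X\beta \rVert_2^2
  + \lambda \lVert \beta \rVert_1,
  \qquad \lambda = \sigma^2 \tau
```

So the lasso estimate is the posterior mode under this prior. The Bayesian approach goes one step further and reports the whole posterior, which is where the uncertainty statements for the coefficients come from.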

Hope this helps a bit,

David.

--
You received this message because you are subscribed to the Google Groups "Israel R User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to israel-r-user-g...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

ronen

Jan 31, 2013, 3:14:13 AM

to israel-r-...@googlegroups.com

Thank you David,

I will read that presentation about the bootstrap.

Is there a Bayesian elastic net in R?

thanks

ronen

Tal Galili

Jan 31, 2013, 3:18:10 AM

to israel-r-...@googlegroups.com

Hi Ronen,

Also, take a look at this package:

http://cran.r-project.org/web/packages/hdlm/index.html

It gives lm-like output for elastic net models.

It might prove more useful for your purposes.

Best,

Tal

----------------Contact Details:-------------------------------------------------------
Contact me: Tal.G...@gmail.com |
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
----------------------------------------------------------------------------------------------

ronen

Jan 31, 2013, 4:30:43 AM

to israel-r-...@googlegroups.com

Thank you Tal!

It looks like an implementation of the two options David suggested (two-stage bootstrap and Bayesian).

There is not enough explanation of what the function actually does.

Also, the elastic net needs two tuning parameters, alpha and lambda.
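For reference, this is how the two tuning parameters enter the elastic net objective in glmnet's parametrization for a Gaussian response (as given in the glmnet documentation; whether hdlm uses the same parametrization is not stated in this thread):

```latex
\min_{\beta_0,\, \beta}\ \frac{1}{2n} \sum_{i=1}^{n}
  \left( y_i - \beta_0 - x_i^{\top} \beta \right)^2
  + \lambda \left[ \frac{1 - \alpha}{2}\, \lVert \beta \rVert_2^2
  + \alpha\, \lVert \beta \rVert_1 \right]
```

Here lambda sets the overall strength of the penalty and alpha mixes the two penalties: alpha = 1 gives the lasso and alpha = 0 gives ridge regression.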

There is no input in the function that refers to lambda... how come?

I will look in the reference.

Thanks again

Ronen

Tal Galili

Jan 31, 2013, 4:48:54 AM

to israel-r-...@googlegroups.com

You're welcome Ronen.

If you find any interesting insights while diving into the documentation, please come back and enlighten us in the group.

Best,

Tal

ronen

Jan 31, 2013, 4:50:54 AM

to israel-r-...@googlegroups.com

I will, unless I drown...
