Feature Selection for High-Dimensional Data


Sarthak Kala

Sep 21, 2022, 9:54:52 AM
to TensorFlow Probability
I am using PyMC to compute a MAP estimate with a Laplace prior on the weights (Lasso regression). It shrinks the coefficients close to zero, but the coefficients are not exactly sparse. I am working with high-dimensional data (>100 columns), and even with the Laplace prior I am left with 28 columns. Is there any other way to reduce the number of features?

Colin Carroll

Sep 21, 2022, 10:26:24 AM
to TensorFlow Probability, sarthak...@gmail.com
The PyMC MAP estimate is a thin wrapper around `scipy.optimize.minimize` that optimizes your log probability (using L-BFGS-B, unless you have discrete variables, in which case it uses Powell).

Do you have a reason to believe 28 non-zero columns is too many? 

If you are using version 4 of PyMC, you might use

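import pymc as pm
from pymc import sampling_jax

with pm.Model() as model:
    ... define your model...
    # Pass the model explicitly; the result is a JAX function of the model's value variables.
    jax_logp = sampling_jax.get_jaxified_logp(model)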
Then look at a library like JAXopt to have more control over the optimization of the log probability.
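
As a rough illustration of what that extra control can buy: a smooth quasi-Newton method such as L-BFGS-B tends to leave Laplace-penalized weights small but nonzero, whereas a proximal-gradient method (ISTA) applies a soft-thresholding step that sets weights to exactly zero. The sketch below uses plain NumPy rather than PyMC or JAXopt, and the data, the penalty strength `lam`, and the function names are all made up for illustration.

```python
import numpy as np

def soft_threshold(w, t):
    # Proximal operator of t * ||w||_1: shrinks toward zero and clips to exactly zero.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # Minimizes (1 / (2n)) * ||X w - y||^2 + lam * ||w||_1 by proximal gradient descent.
    n, d = X.shape
    # Step size 1/L, where L is the largest eigenvalue of X^T X / n.
    step = n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic data (an assumption): 100 columns, only the first 5 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 100))
true_w = np.zeros(100)
true_w[:5] = [3.0, -2.0, 1.5, -1.0, 2.5]
y = X @ true_w + 0.1 * rng.standard_normal(200)

w_hat = lasso_ista(X, y, lam=0.2)
print("non-zero coefficients:", np.count_nonzero(w_hat))
```

Raising `lam` removes more columns, so the penalty strength (equivalently, the scale of the Laplace prior) is the knob to turn if 28 columns is too many.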

Sarthak Kala

Sep 22, 2022, 4:35:40 AM
to TensorFlow Probability, colca...@google.com, Sarthak Kala
Yes, I have reason to believe that 28 features is far too many. It should be only around 5-7 features.

Mike Lawrence

Sep 22, 2022, 8:04:52 AM
to Sarthak Kala, TensorFlow Probability, colca...@google.com
Check out the Finnish Horseshoe prior?
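
For reference, a minimal sketch of what that prior looks like, following the regularized ("Finnish") horseshoe of Piironen and Vehtari. It only draws coefficients from the prior with NumPy; it is not a PyMC model and does no inference. The function name and the default slab settings (`slab_scale`, `slab_df`) are assumptions chosen for illustration. The useful part is that the global scale is set from a prior guess of how many features matter, e.g. about 6 out of 100.

```python
import numpy as np

def sample_regularized_horseshoe(n_features, n_obs, expected_nonzero, sigma=1.0,
                                 slab_scale=2.0, slab_df=4.0, n_draws=1000, seed=0):
    rng = np.random.default_rng(seed)
    # Prior guess for the global scale: tau0 = p0 / (D - p0) * sigma / sqrt(n).
    tau0 = expected_nonzero / (n_features - expected_nonzero) * sigma / np.sqrt(n_obs)
    # Half-Cauchy draws are absolute values of Cauchy draws.
    tau = np.abs(tau0 * rng.standard_cauchy(size=(n_draws, 1)))   # global scale
    lam = np.abs(rng.standard_cauchy(size=(n_draws, n_features)))  # local scales
    # c^2 ~ InvGamma(df / 2, df * s^2 / 2), drawn as the reciprocal of a Gamma variate.
    c2 = 1.0 / rng.gamma(shape=slab_df / 2,
                         scale=2.0 / (slab_df * slab_scale ** 2),
                         size=(n_draws, 1))
    # Regularized local scales: large lam is capped by the slab, small lam is unchanged.
    lam_tilde = np.sqrt(c2 * lam ** 2 / (c2 + tau ** 2 * lam ** 2))
    beta = rng.standard_normal((n_draws, n_features)) * tau * lam_tilde
    return tau0, beta

tau0, beta = sample_regularized_horseshoe(n_features=100, n_obs=200, expected_nonzero=6)
print("tau0:", tau0)
print("median |beta| under the prior:", np.median(np.abs(beta)))
```

Note that this prior shrinks most coefficients very strongly but, like the Laplace prior, does not by itself produce exact zeros; a thresholding or projection step would still be needed to pick a final subset.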


Mike Lawrence, PhD
Co-founder & Research Scientist
Axem Neurotechnology

~ Certainty is (usually) folly ~