Variable Selection Through GLM or Variable Importance Through RF?

Mayank Aggarwal

Feb 12, 2014, 4:10:25 AM
to h2os...@googlegroups.com
Hi H2O team,

Wonderful platform, and thanks for all the hard work.

I have been exploring the platform and trying it out from R.

However, as you know, predictive analytics is not only about prediction; it is also about inference.

With that context, can you tell us whether you have developed algorithms that identify the significant variables (based on p-values, as in forward/stepwise selection methods),

or variable importance features for RF and GBM?


Thanks
Mayank

ccl...@gmail.com

Feb 12, 2014, 12:28:19 PM
to h2os...@googlegroups.com

RF has variable importance. For GBM it is a work in progress.
For logistic regression, we sort the normalized coefficients; coefficients with larger magnitude have more importance.

Cliff
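
Editor's note: the ranking Cliff describes can be sketched in a few lines of Python. This is not H2O code. It assumes that "normalized" means the coefficient a predictor would have after being scaled to unit standard deviation, which for a linear predictor equals the raw coefficient times the predictor's standard deviation. The function name, the coefficients, and the data are all hypothetical.

```python
from statistics import pstdev


def rank_by_normalized_coef(raw_coefs, columns):
    """Rank predictors by |raw coefficient * predictor standard deviation|.

    Assumption: this product stands in for the coefficient obtained when the
    predictor is standardized before fitting. It is an illustration only.
    """
    normalized = {
        name: raw_coefs[name] * pstdev(columns[name])
        for name in raw_coefs
    }
    # Largest magnitude first: sign shows direction, magnitude shows importance.
    return sorted(normalized.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical fitted logistic-regression coefficients (raw scale).
raw_coefs = {"age": 0.04, "income": -0.00003, "smoker": 1.2}

# Hypothetical training columns, used only to obtain standard deviations.
columns = {
    "age": [25, 35, 45, 55, 65],
    "income": [20000, 40000, 60000, 80000, 100000],
    "smoker": [0, 1, 0, 1, 1],
}

ranking = rank_by_normalized_coef(raw_coefs, columns)
for name, value in ranking:
    print(f"{name}: {value:+.3f}")
```

In this toy data, "income" has by far the smallest raw coefficient but ranks first after normalization, which is why the raw coefficients should not be compared directly.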

mic...@0xdata.com

Feb 12, 2014, 2:24:39 PM
to h2os...@googlegroups.com
Mayank,

Thanks for trying H2O!

A few details regarding variable importance in RF:

We support variable importance for DRF (menu "Model > Distributed RF (beta)"; you have to explicitly check the "compute variable importance" option before launching DRF).
The variable importance is computed as the mean decrease in accuracy (as proposed in Breiman's original RF paper, see http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#varimp), and we report unscaled numbers.

As Cliff mentioned, we are working on different strategies for measuring varimp and on varimp for GBM.

Michal
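
Editor's note: the mean-decrease-in-accuracy idea Michal mentions can be sketched in plain Python. This is a simplified illustration, not H2O's DRF implementation. Breiman's method permutes a variable within each tree's out-of-bag samples and averages the accuracy drop over trees, whereas this sketch permutes each column of a single hold-out set and averages over repeats. The classifier, the data, and all function names are hypothetical. The result is the raw average drop, that is, unscaled.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the true label."""
    correct = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return correct / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Unscaled mean decrease in accuracy for each feature column.

    Assumption: a single hold-out set replaces per-tree out-of-bag samples.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the labels.
            shuffled = [row[j] for row in rows]
            rng.shuffle(shuffled)
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for a trained model: it only looks at feature 0.
def predict(row):
    return 1 if row[0] > 0.5 else 0


data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [1 if row[0] > 0.5 else 0 for row in rows]

importances = permutation_importance(predict, rows, labels)
print(importances)
```

Feature 1 is never used by the stand-in classifier, so shuffling it leaves accuracy unchanged and its importance is zero, while shuffling feature 0 costs a large share of the accuracy.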

Mayank Aggarwal

Feb 14, 2014, 8:43:46 AM
to mic...@0xdata.com, h2os...@googlegroups.com
Hi Michal,

Thanks for your response.

Do you have an R function for this (variable selection through RF)?

If you can share a code snippet, that would be great.


Thanks
Mayank



--
You received this message because you are subscribed to a topic in the Google Groups "H2O Users - h2ostream" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/h2ostream/NIEBbW-hUo0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to h2ostream+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Michal Malohlava

Feb 16, 2014, 11:37:01 PM
to Mayank Aggarwal, h2os...@googlegroups.com
Hi Mayank,

Our R API for RF does not expose the variable importance option right now.
I have proposed a change and will let you know as soon as it is available (perhaps this week).

Thank you!
Michal

On 2/14/14 5:43 AM, Mayank Aggarwal wrote:

Mayank Aggarwal

Feb 17, 2014, 12:09:51 AM
to Michal Malohlava, h2os...@googlegroups.com
Hey, thanks, that's wonderful. I tested H2O and it is really cool.

I hope you add more functions to the R API to engage the R community in a bigger way, because R connects to so many other interfaces.

I will wait for your notice on the variable importance option.

Sri

Feb 17, 2014, 12:57:13 AM
to Mayank Aggarwal, Michal Malohlava, h2os...@googlegroups.com
Mayank,
Thanks for the comment.

R is core to our mission: we started out largely to make the R community's experience faster and ready for larger datasets.

Plyr on H2O is a focus for us this season, and more is coming, thanks to the encouragement of our early adopters.

Perhaps you could send a pull request with an RUnit test or two yourself to express your use case in code form?
https://github.com/0xdata/h2o/tree/master/R

Thanks again,
Sri

Mayank Aggarwal

Feb 21, 2014, 5:40:24 AM
to Sri, Michal Malohlava, h2os...@googlegroups.com
Hi Sri,

I will surely do that in my spare time. I also have some thoughts on how this could be integrated and used in enterprise practice. I am myself trying to develop a prototype using H2O.

I will write you a detailed mail on where I think this could be useful.

Mike - it would be great if you could share the progress on the variable importance feature for RF. I am looking forward to it.


Thanks


