h2o+R h2o.predict function and confidence intervals

georgina

Jan 12, 2015, 12:44:33 AM
to h2os...@googlegroups.com
Hello,

I was wondering whether H2O has a function that provides 95% confidence intervals for the values returned by the h2o.predict function.

Thank you very much

Erin LeDell

Jan 20, 2015, 3:09:48 PM
to h2os...@googlegroups.com, georgi...@gmail.com
Georgina,
I don't think there is anything in H2O that provides CIs for the individual predicted values.  However, you might find it useful to quantify the uncertainty of the performance of your predictive model in general. 

If your interest is assessing model performance for binary classification, and you'd like confidence intervals for AUC, then you can use this package in R: https://github.com/ledell/cvAUC
If you are interested in quantifying uncertainty for some other estimator, then you can consider bootstrapping.
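A minimal sketch of the bootstrap idea, written in plain Python rather than against any H2O API. It assumes you already have a per-row quantity from a holdout set (the `errors` list below is made-up illustration data) and an estimator such as the mean; `bootstrap_ci` is a hypothetical helper name, and the percentile method is only one of several bootstrap interval constructions.

```python
import random
import statistics


def bootstrap_ci(values, estimator, n_boot=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for estimator(values)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    stats = []
    for _ in range(n_boot):
        # Resample n observations with replacement and re-estimate.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        stats.append(estimator(resample))
    stats.sort()
    alpha = 1.0 - confidence
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return stats[lo_idx], stats[hi_idx]


# Hypothetical per-row absolute errors of a model on a holdout set.
errors = [0.8, 1.1, 0.3, 2.4, 0.9, 1.7, 0.5, 1.2, 0.7, 1.9]
lower, upper = bootstrap_ci(errors, statistics.mean)
print(f"mean abs error: {statistics.mean(errors):.3f}, 95% CI: [{lower:.3f}, {upper:.3f}]")
```

Any other estimator (median, RMSE, a custom metric) can be passed in place of `statistics.mean`. The cost is `n_boot` re-evaluations of the estimator, which is why the approach gets slow on large data.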

-Erin

SriSatish Ambati

Jan 29, 2015, 1:42:29 AM
to Erin LeDell, Patrick Aboyoun, h2os...@googlegroups.com, georgi...@gmail.com
Erin & Patrick,
Does this seem like a useful feature to add to H2O? If so, let's add a JIRA.
thanks, Sri

--
ceo & co-founder, 0xdata Inc

Patrick Aboyoun

Jan 29, 2015, 2:51:46 AM
to SriSatish Ambati, Erin LeDell, h2os...@googlegroups.com, georgi...@gmail.com
Sri and Erin,
Here is the JIRA we can use to track this feature:


The first step down this road is working with Cliff on what the shape of the output should be. Once we have the infrastructure in place, the next step would be to add confidence interval support in GLM for the case where there is no L1 penalty, as a representative example of how algorithms can use this infrastructure.
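For context, one common way to form such an interval in an unpenalized GLM is a Wald interval on the linear-predictor scale, mapped back through the inverse link. This is a textbook construction offered only as an illustration; the thread does not say which construction H2O would adopt. The notation is assumed here: x is a new feature vector, X the design matrix, \hat\beta the fitted coefficients, \hat W the fitted IRLS weight matrix, \hat\phi the dispersion estimate, and g a monotone increasing link.

```latex
\hat\eta(x) = x^\top \hat\beta,
\qquad
\widehat{\operatorname{Var}}\bigl(\hat\eta(x)\bigr)
  = \hat\phi \; x^\top \bigl(X^\top \hat W X\bigr)^{-1} x

\mathrm{CI}_{1-\alpha}\bigl(\mu(x)\bigr)
  = \Bigl[\, g^{-1}\!\bigl(\hat\eta(x) - z_{1-\alpha/2}\,\widehat{\operatorname{se}}\bigr),\;
            g^{-1}\!\bigl(\hat\eta(x) + z_{1-\alpha/2}\,\widehat{\operatorname{se}}\bigr) \Bigr],
\qquad
\widehat{\operatorname{se}} = \sqrt{\widehat{\operatorname{Var}}\bigl(\hat\eta(x)\bigr)}
```

This is an interval for the mean response \mu(x), not a prediction interval for a new observation. It relies on the usual asymptotic normality of the unpenalized maximum likelihood estimate, which does not carry over directly once an L1 penalty is applied; that is presumably why the unpenalized case is proposed as the starting point.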


Patrick

Erin LeDell

Jan 29, 2015, 3:42:58 AM
to Patrick Aboyoun, SriSatish Ambati, h2os...@googlegroups.com, georgi...@gmail.com
Sri and Patrick,

If you want fast confidence intervals (no bootstrapping), you can use influence-curve-based variance estimation like I do in this R package, cvAUC. Influence curve (IC) based CIs are specific to a particular estimator (e.g., AUC), so you have to derive the IC for each estimator you want to provide CIs for.

It's already relatively fast in R (because it uses data.table sorting), but I'm sure you could find a place to put this inside h2o (and speed it up even more). 

Runtime (v 1.1.0):
  • 100,000 observations: ~0.5 seconds
  • 1 million observations: ~13.0 seconds
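The influence-curve idea for AUC can be sketched as follows. This is a plain-Python illustration for a single sample without cross-validation folds; it is not the cvAUC source code and not an H2O API. The function name `auc_with_ic_ci` and the toy data are hypothetical. It assumes 0/1 labels, scores where larger means "more likely positive", and half credit for ties between a positive and a negative score.

```python
import math
from bisect import bisect_left, bisect_right
from statistics import NormalDist


def auc_with_ic_ci(scores, labels, confidence=0.95):
    """Return (auc, lower, upper) using influence-curve-based variance."""
    n = len(scores)
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    n_pos, n_neg = len(pos), len(neg)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    def frac_below(sorted_vals, s):
        # Fraction of sorted_vals below s, counting ties as one half.
        below_plus_half_ties = bisect_left(sorted_vals, s) + bisect_right(sorted_vals, s)
        return below_plus_half_ties / (2 * len(sorted_vals))

    # Positive row: fraction of negatives scored lower.
    # Negative row: fraction of positives scored higher.
    pos_terms = [frac_below(neg, s) for s in pos]
    neg_terms = [1.0 - frac_below(pos, s) for s in neg]
    auc = sum(pos_terms) / n_pos

    # Influence-curve value per observation, weighted by inverse class frequency.
    w1, w0 = n / n_pos, n / n_neg
    ic = [w1 * (t - auc) for t in pos_terms] + [w0 * (t - auc) for t in neg_terms]

    # Var(AUC) is approximated by mean(IC^2) / n.
    se = math.sqrt(sum(v * v for v in ic) / n / n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)


# Toy data: predicted scores and true 0/1 labels.
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6]
labels = [0, 0, 1, 1, 1, 0, 1, 0]
auc, lower, upper = auc_with_ic_ci(scores, labels)
print(f"AUC = {auc:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The work is dominated by two sorts, so the cost is O(n log n) with no resampling, which is where the speed advantage over the bootstrap comes from. The interval is a normal approximation truncated to [0, 1], so it is best trusted on reasonably large samples.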

-Erin


Patrick Aboyoun

Jan 29, 2015, 10:19:10 AM
to Erin LeDell, SriSatish Ambati, h2os...@googlegroups.com, Georgina Zu
Thanks Erin!
I was looking at your code based upon your previous response. We'll make sure we add influence curve based confidence intervals (ICBCI, a palindrome initialism ;-) ) to the mix. Cool stuff.


Patrick