XGBoost in H2O


Aakash Gupta

Sep 8, 2015, 8:41:41 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Hi

Is it possible to implement the xgboost algorithm in H2O, or does the development team have any plans to implement it in the future?



Regards
Aakash 

ccl...@gmail.com

Sep 8, 2015, 12:46:21 PM
to H2O Open Source Scalable Machine Learning - h2ostream
We've talked about this at some length.
It's not on our immediate road map, but it's definitely in our longer road map.
Of course, we'd be happy to take a pull request. :-)

Cliff

Sri Ambati

Sep 8, 2015, 2:52:13 PM
to ccl...@gmail.com, Mark Landry, Arno Candel, Nidhi Mehta, H2O Open Source Scalable Machine Learning - h2ostream
Mark, Nidhi, and Arno have a project in the works to compare xgboost with H2O GBM; we are considering adding it shortly, based on the results of their findings.

Sri

Aakash Gupta

Sep 10, 2015, 12:21:53 AM
to H2O Open Source Scalable Machine Learning - h2ostream, ccl...@gmail.com, ma...@0xdata.com, ar...@h2o.ai, ni...@0xdata.com
Hi Sri 

Thanks for the quick response. Great to know that you are considering adding some version of xgboost in future releases. It would be a great addition to the h2o engine.

Aakash

kris

Jan 11, 2016, 11:51:44 AM
to H2O Open Source Scalable Machine Learning - h2ostream, ccl...@gmail.com, ma...@0xdata.com, ar...@h2o.ai, ni...@0xdata.com
I'm wondering if there has been any progress on comparing xgboost to h2o GBM, and whether any benchmarks are available?

Thanks!
Kristina

SriSatish Ambati

Jan 11, 2016, 12:45:53 PM
to kris, H2O Open Source Scalable Machine Learning - h2ostream, Cliff Click, Mark Landry, Arno Candel, Nidhi Mehta
Kris,
 Our latest version of GBM has xgboost and stochastic. 
 Accuracy should already be better than basic gbm (from empirical testing).
 Let us know what your experiences are.
Thanks, Sri
--
culture . code . customer . community | h2o.ai

kris

Jan 11, 2016, 1:24:50 PM
to H2O Open Source Scalable Machine Learning - h2ostream, kpl...@gmail.com, ccl...@gmail.com, ma...@0xdata.com, ar...@h2o.ai, ni...@0xdata.com
Hi, 

I'm looking forward to trying xgboost on h2o! Is there any resource where I can learn more about h2o's implementation of xgboost? I have read all about GBM at http://h2o-release.s3.amazonaws.com/h2o/rel-tibshirani/3/docs-website/h2o-docs/index.html#Data%20Science%20Algorithms-GBM  and the GBM booklet. 

My understanding is that xgboost's implementation/variation of Friedman's GBM has the following features: 
- more regularization - there are parameters gamma (a penalty on the number of leaves), lambda (L2 regularization of leaf weights) and alpha (L1 regularization of leaf weights)
- special handling of missing values (a default split direction is learned for them, rather than imputing)
- subsample and colsample_bytree - which I know are already there in h2o GBM
- custom objective functions
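(For reference, the regularization parameters listed above enter xgboost's per-tree penalty as Omega(f) = gamma*T + (1/2)*lambda*sum(w_j^2) + alpha*sum(|w_j|), where T is the number of leaves and w_j the leaf weights. A minimal pure-Python sketch of that penalty; the function name and example weights are hypothetical, for illustration only:)

```python
def xgb_regularization(leaf_weights, gamma=0.0, reg_lambda=1.0, alpha=0.0):
    """Omega(f) = gamma*T + 0.5*lambda*sum(w^2) + alpha*sum(|w|),
    the per-tree complexity penalty in xgboost's objective
    (T = number of leaves, w = leaf weights)."""
    T = len(leaf_weights)
    l2 = 0.5 * reg_lambda * sum(w * w for w in leaf_weights)
    l1 = alpha * sum(abs(w) for w in leaf_weights)
    return gamma * T + l2 + l1

# A tree with 3 leaves; raising gamma/lambda/alpha shrinks or prunes
# the tree's contribution to the ensemble.
print(xgb_regularization([0.5, -0.2, 0.1], gamma=1.0, reg_lambda=1.0, alpha=0.1))
# → 3.23  (= 1.0*3 + 0.5*1.0*0.30 + 0.1*0.8)
```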

Is there any resource I can consult on h2o's implementation? (Including source code, if necessary...)

Many thanks!
Kristina

Nidhi Mehta

Jan 11, 2016, 2:11:03 PM
to kris, H2O Open Source Scalable Machine Learning - h2ostream, Cliff Click, Mark Landry, Arno Candel
Hi Kristina,

H2O's gbm implementation is similar to xgboost's tree booster in functionality.

In H2O:
- For controlling model complexity, you can specify max_depth and min_rows (minimum observations per leaf)
- Regularization can be adjusted by specifying learn_rate (eta)
- For robustness to noise, you can specify sample_rate and col_sample_rate_per_tree
- For early stopping, you can specify stopping_rounds, stopping_metric, and stopping_tolerance
- H2O facilitates cross-validation, and you can use the balance_classes flag to deal with imbalanced data
- Presently, H2O does not support custom objective functions.
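(The correspondence above can be summarized as plain data. This is a rough mapping sketch only: the right-hand names are H2O GBM argument names as of this thread's era, the left-hand names are xgboost tree-booster options, and some pairs are only approximate equivalents; consult the docs for your version.)

```python
# Rough correspondence between xgboost tree-booster options and
# H2O GBM arguments, per the list above (sketch only; some pairs are
# approximate, e.g. min_child_weight sums Hessians while min_rows
# counts observations).
XGB_TO_H2O_GBM = {
    "max_depth":             "max_depth",                 # model complexity
    "min_child_weight":      "min_rows",                  # ~min observations per leaf
    "eta":                   "learn_rate",                # shrinkage / regularization
    "subsample":             "sample_rate",               # row sampling per tree
    "colsample_bytree":      "col_sample_rate_per_tree",  # column sampling per tree
    "early_stopping_rounds": "stopping_rounds",           # + stopping_metric / stopping_tolerance
}

# Custom objective functions have no H2O GBM counterpart in this release.
print(XGB_TO_H2O_GBM["eta"])  # → learn_rate
```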


 Hope this helps. Please let us know if you have additional questions.

Thanks,
Nidhi

Arno Candel

Jan 11, 2016, 4:08:20 PM
to Nidhi Mehta, kris, H2O Open Source Scalable Machine Learning - h2ostream, Cliff Click, Mark Landry, Arno Candel
Hi Kristina,

For categorical predictors, nbins_cats is another option to control regularization in H2O.


Best,
Arno

Jose A. Guerrero

Feb 19, 2016, 7:59:32 AM
to H2O Open Source Scalable Machine Learning - h2ostream, ni...@0xdata.com, kpl...@gmail.com, ccl...@gmail.com, ma...@0xdata.com, ar...@h2o.ai
Hi Arno,

I think the main difference between xgboost and gbm is in the use of the Hessian:


So I suppose the difference is still there.
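(The Hessian shows up concretely in xgboost's second-order update: the optimal leaf weight is w* = -G/(H + lambda) and the split gain compares G^2/(H + lambda) scores, where G and H are sums of per-row gradients and Hessians. A small pure-Python sketch following the formulas in the xgboost paper; the numeric inputs below are made up:)

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0):
    """Optimal leaf weight w* = -G / (H + lambda), from xgboost's
    second-order Taylor expansion of the loss. A first-order GBM
    would use only G; the Hessian H rescales each leaf's step."""
    return -grad_sum / (hess_sum + reg_lambda)

def split_gain(GL, HL, GR, HR, reg_lambda=1.0, gamma=0.0):
    """Gain of splitting a node into left/right children:
    0.5*[GL^2/(HL+l) + GR^2/(HR+l) - (GL+GR)^2/(HL+HR+l)] - gamma."""
    def score(G, H):
        return G * G / (H + reg_lambda)
    return 0.5 * (score(GL, HL) + score(GR, HR)
                  - score(GL + GR, HL + HR)) - gamma

# Hypothetical gradient/Hessian sums for one node and one candidate split.
print(leaf_weight(-4.0, 7.0))            # → 0.5
print(split_gain(-3.0, 4.0, -1.0, 3.0))  # → 0.025
```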

Jiri Materna

Jul 26, 2016, 2:45:25 AM
to H2O Open Source Scalable Machine Learning - h2ostream, ni...@0xdata.com, kpl...@gmail.com, ccl...@gmail.com, ma...@0xdata.com, ar...@h2o.ai
Hello, 

Very interesting thread. Is there any update on the comparison of xgboost vs. H2O gbm, please?

If I understand correctly, there are plans/activities to integrate H2O with different deep learning frameworks (Tensorflow, mxnet, Theano, Caffe). Would it not be possible to also consider integrating H2O with the core xgboost project?

Thanks a lot,

Jiri


aw3...@gmail.com

Dec 18, 2016, 6:30:24 AM
to H2O Open Source Scalable Machine Learning - h2ostream, kpl...@gmail.com, ccl...@gmail.com, ma...@0xdata.com, ar...@h2o.ai, ni...@0xdata.com
Hi Sri

Apologies if I'm just being stupid, but from your reply it appears that xgboost and stochastic GBM are in the most recent H2O GBM version. However, I can't see any reference to them in the R docs on CRAN. Is this functionality available via R?

Many thanks

A

Erin LeDell

Dec 20, 2016, 6:55:19 PM
to aw3...@gmail.com, H2O Open Source Scalable Machine Learning - h2ostream, kpl...@gmail.com, ccl...@gmail.com, ma...@0xdata.com, ar...@h2o.ai, ni...@0xdata.com
Hi,

Sorry for the confusion, but xgboost is not "in" H2O. However, both
xgboost and H2O implement a stochastic GBM. In other words, they
implement the same algorithm, with near-identical functionality. There are
some implementation differences, however, that can lead to different results
depending on your data.

-Erin
--
Erin LeDell Ph.D.
Statistician & Machine Learning Scientist | H2O.ai

SriSatish

Dec 21, 2016, 12:28:44 AM
to Erin LeDell, aw3...@gmail.com, H2O Open Source Scalable Machine Learning - h2ostream, kpl...@gmail.com, ccl...@gmail.com, ma...@0xdata.com, ar...@h2o.ai, ni...@0xdata.com
Thanks, Erin.
What I meant to say is that we have the same algorithm implemented. I must have written the response in a hurry; sorry for the confusion.

That said, we have more recently considered integrating the original xgboost itself into newer distributions of h2o, especially since we have been doing something similar with other packages, such as mxnet in Deep Water.

Sri

Peng Lee

Jan 6, 2017, 12:00:15 PM
to H2O Open Source Scalable Machine Learning - h2ostream, kpl...@gmail.com, ccl...@gmail.com, ma...@0xdata.com, ar...@h2o.ai, ni...@0xdata.com
Based on my experience, H2O's GBM has comparable predictive performance to that of xgboost. 

Peng 

Arun Kiran Aryasomayajula

Apr 11, 2017, 11:38:17 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Hi Arno, Sri,

Can you comment on speed benchmarks of H2O GBM vs. xgboost (original), say on a standard classification task? Are they comparable? I remember h2o gbm being considerably slower than xgboost (I tried both from R on an Intel i7 6th-gen with 8 cores total and 16 GB RAM).

-Arun

Tom Kraljevic

Apr 11, 2017, 3:31:54 PM
to Arun Kiran Aryasomayajula, H2O Open Source Scalable Machine Learning - h2ostream

Hi,

Note that a speed improvement was made to H2O's gbm within the last few months, addressing the ping-ponging of CAS instructions (atomic updates in Java) when updating the shared histograms
(as in: removing the ping-ponging and the corresponding false sharing of cache lines).

This improved gbm speeds on some datasets by as much as 4x. The effect is more pronounced on hosts with many cores,
so it's worth trying again if you haven't in a while.
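(The pattern behind that fix, replacing contended atomic updates to one shared histogram with per-thread private histograms merged once at the end, can be sketched in Python. Conceptual only: H2O's actual code is Java, and Python's GIL hides the cache-line effects, but the accumulate-then-merge structure is the same.)

```python
import threading

def build_histogram(values, nbins, nthreads=4):
    """Each thread accumulates into its own private histogram, then the
    partials are merged in a single pass -- avoiding the contended
    (CAS/atomic) updates to one shared array that caused the slowdown."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins or 1.0          # guard against zero range
    partials = [[0] * nbins for _ in range(nthreads)]

    def work(tid):
        for v in values[tid::nthreads]:       # strided slice per thread
            b = min(int((v - lo) / width), nbins - 1)
            partials[tid][b] += 1             # private: no synchronization

    threads = [threading.Thread(target=work, args=(t,)) for t in range(nthreads)]
    for t in threads: t.start()
    for t in threads: t.join()

    # One merge pass instead of per-update synchronization.
    return [sum(col) for col in zip(*partials)]

print(build_histogram([0.1, 0.2, 0.4, 0.9, 0.95], nbins=2))  # → [3, 2]
```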

Thanks,
Tom


g.carmic...@gmail.com

Apr 13, 2017, 8:49:12 AM
to H2O Open Source Scalable Machine Learning - h2ostream
It looks like XGBoost is coming to H2O, if I have read this correctly:
https://www.slideshare.net/0xdata/arno-candel-aibythebay-030617

See slide 15. What do you all think?

Erin LeDell

Apr 13, 2017, 2:56:09 PM
to g.carmic...@gmail.com, H2O Open Source Scalable Machine Learning - h2ostream

Yes, that's correct. 

-Erin

George Carmichael

Apr 13, 2017, 4:08:47 PM
to Erin LeDell, H2O Open Source Scalable Machine Learning - h2ostream
Great, thanks!
--
Thanks,

George 

connek...@googlemail.com

Apr 20, 2017, 10:52:47 AM
to H2O Open Source Scalable Machine Learning - h2ostream, g.carmic...@gmail.com
Hi Erin,

Is there any information about the release date of the new h2o version (including the single-node xgboost integration)?

Thanks in advance,
Constantin

Peng Lee

Apr 20, 2017, 11:37:11 AM
to connek...@googlemail.com, H2O Open Source Scalable Machine Learning - h2ostream, g.carmic...@gmail.com
Hi Erin:

When you say single-node, will this be the GPU version?

Peng



Darren Cook

Apr 20, 2017, 12:09:59 PM
to h2os...@googlegroups.com
> When you say single-node, will this be the GPU version?

A poke around the commits on GitHub suggests it will use a GPU if available:

https://github.com/h2oai/h2o-3/pull/699/commits/e9f3343c3414fa8bd95927f2635176bee0d63d00


> Single-node XGBoost will be released in H2O 3.12.0 (our next
> major release). https://github.com/h2oai/h2o-3/pull/699


Darren




--
Darren Cook, Software Researcher/Developer
My New Book: Practical Machine Learning with H2O:
http://shop.oreilly.com/product/0636920053170.do

Erin LeDell

Apr 20, 2017, 4:05:24 PM
to Darren Cook, h2os...@googlegroups.com

The next major release of H2O will be 3.12 (including XGBoost and AutoML), and my best guess is that it ships sometime in May, but don't hold me to that...

Single node GPU/CPU is part of the first release.  Multinode is a future release.

If you are interested in playing around with it for now, you can check out the arno-xgboost branch of h2o-3 and build it yourself. I will see if we can post the jar files to S3 so you can all try it out and provide feedback without having to build it from scratch. We have some jars floating around internally that I can track down...

-Erin



Erin LeDell

Apr 20, 2017, 4:06:51 PM
to Darren Cook, h2os...@googlegroups.com

> Single node GPU/CPU is part of the first release.  Multinode is a future release.

^^ I am referring to single node GPU/CPU support for XGBoost here (rather than H2O), if that was not clear.