GBM slows down when adding more cores?

Shaun Williams

Aug 2, 2021, 6:28:38 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Training the following GBM model on 2 cores vs. 96 cores (EC2 c5.large vs. c5.metal) results in *faster* training times with *fewer* cores.  I checked the Water Meter (H2O's CPU monitor) to verify all cores were running.

Training times:
  • c5.large (2 cores): ~1min
  • c5.metal (96 cores): ~2min
Training details:

    training set size     6840 rows x 95 cols
    
    seed                  1
    ntrees                1000
    max_depth             50
    min_rows              10
    learn_rate            0.005
    sample_rate           0.5
    col_sample_rate       0.5
    stopping_rounds       2
    stopping_metric       "MSE"
    stopping_tolerance    1.0E-5
    score_tree_interval   500
    histogram_type        "UniformAdaptive"
    nbins                 800
    nbins_top_level       1024

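For reference, the settings above map onto H2O's Python API roughly as follows. This is a sketch; the frame name and response column are placeholders, not from the post.

```python
# The GBM settings from the table above, as keyword arguments for
# H2OGradientBoostingEstimator.
gbm_params = {
    "seed": 1,
    "ntrees": 1000,
    "max_depth": 50,
    "min_rows": 10,
    "learn_rate": 0.005,
    "sample_rate": 0.5,
    "col_sample_rate": 0.5,
    "stopping_rounds": 2,
    "stopping_metric": "MSE",
    "stopping_tolerance": 1.0e-5,
    "score_tree_interval": 500,
    "histogram_type": "UniformAdaptive",
    "nbins": 800,
    "nbins_top_level": 1024,
}

# With a running H2O cluster, training would look like
# ("train" and "response" are hypothetical names):
#
#   import h2o
#   from h2o.estimators import H2OGradientBoostingEstimator
#   h2o.init()
#   model = H2OGradientBoostingEstimator(**gbm_params)
#   model.train(y="response", training_frame=train)
```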
Any thoughts on why this is happening and what to watch out for?

Thanks.

Erin LeDell

Jan 25, 2022, 3:51:01 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Your data is too small to require extra nodes or cores.  It will probably train fastest on a single small node.
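One way to test this without switching machines (a sketch; assumes the `h2o` Python package is installed) is to cap the thread count when starting the local cluster, rather than letting H2O claim all 96 cores:

```python
# Cap H2O's worker threads at cluster startup and compare training times.
# Try 2, 4, 8, ... -- on a 6840-row dataset, more is unlikely to help.
NTHREADS = 2

# With h2o installed, this would be:
#
#   import h2o
#   h2o.init(nthreads=NTHREADS)
```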

-Erin

Michal Kurka

Jan 27, 2022, 10:44:23 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Hello Shaun,

One of the important factors for good performance is data distribution in H2O. When H2O imports a file, it looks at the number of available cores and the size of the file, and tries to find an optimal number of "chunks" to split the file into and keep in memory.
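To see why this can backfire on tiny data, here is a deliberately simplified toy heuristic (NOT H2O's actual parser code): if the chunk count scales with the core count, a small file on many cores gets sliced into tiny chunks, and per-chunk overhead starts to dominate the actual tree-building work.

```python
def toy_chunk_plan(n_rows, n_cores, chunks_per_core=4):
    """Toy 'a few chunks per core' splitting heuristic.

    This is an illustration only -- H2O's real heuristic also weighs
    file size, chunk byte size, and other factors.
    """
    n_chunks = n_cores * chunks_per_core
    rows_per_chunk = max(1, n_rows // n_chunks)
    return n_chunks, rows_per_chunk

# The 6840-row dataset from the original post:
print(toy_chunk_plan(6840, 2))   # 2 cores  -> (8, 855)   chunky, low overhead
print(toy_chunk_plan(6840, 96))  # 96 cores -> (384, 17)  tiny chunks, high overhead
```

With 96 cores each chunk holds only a handful of rows, so coordination costs per chunk can easily outweigh the parallel speedup.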

Your dataset is too small for this number of cores, so your findings won't translate to a much larger dataset. That said, it is still something we would like to look into: it means that for small data like yours, the heuristic that splits the data into chunks didn't work all that well.

If you could share details about the data and/or the log of your experiment, we would be happy to look into it.

Thank you,
MK