GBM slows down when adding more cores?

Shaun Williams

Aug 2, 2021, 6:28:38 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Training the following GBM model on 2 cores vs. 96 cores (EC2 c5.large vs. c5.metal) results in *faster* training times with *fewer* cores.  I checked the Water Meter (H2O's CPU monitor) to verify all cores were running.

Training times:
  • c5.large (2 cores): ~1min
  • c5.metal (96 cores): ~2min
Training details:

    training set size     6840 rows x 95 cols
    
    seed                  1
    ntrees                1000
    max_depth             50
    min_rows              10
    learn_rate            0.005
    sample_rate           0.5
    col_sample_rate       0.5
    stopping_rounds       2
    stopping_metric       "MSE"
    stopping_tolerance    1.0E-5
    score_tree_interval   500
    histogram_type        "UniformAdaptive"
    nbins                 800
    nbins_top_level       1024

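For reference, the settings above map onto H2O's Python API roughly as follows. This is a sketch; the frame name and response column are placeholders, not from the post.

```python
# The GBM settings from the table above, as keyword arguments for
# H2OGradientBoostingEstimator.
gbm_params = {
    "seed": 1,
    "ntrees": 1000,
    "max_depth": 50,
    "min_rows": 10,
    "learn_rate": 0.005,
    "sample_rate": 0.5,
    "col_sample_rate": 0.5,
    "stopping_rounds": 2,
    "stopping_metric": "MSE",
    "stopping_tolerance": 1.0e-5,
    "score_tree_interval": 500,
    "histogram_type": "UniformAdaptive",
    "nbins": 800,
    "nbins_top_level": 1024,
}

# With a running H2O cluster, training would look like
# ("train" and "response" are hypothetical names):
#
#   import h2o
#   from h2o.estimators import H2OGradientBoostingEstimator
#   h2o.init()
#   model = H2OGradientBoostingEstimator(**gbm_params)
#   model.train(y="response", training_frame=train)
```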
Any thoughts on why this is happening and what to watch out for?

Thanks.

Erin LeDell

Jan 25, 2022, 3:51:01 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Your data is too small to require extra nodes or cores.  It will probably train fastest on a single small node.
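One way to test this without switching machines (a sketch; assumes the `h2o` Python package is installed) is to cap the thread count when starting the local cluster, rather than letting H2O claim all 96 cores:

```python
# Cap H2O's worker threads at cluster startup and compare training times.
# Try 2, 4, 8, ... -- on a 6840-row dataset, more is unlikely to help.
NTHREADS = 2

# With h2o installed, this would be:
#
#   import h2o
#   h2o.init(nthreads=NTHREADS)
```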

-Erin

Michal Kurka

Jan 27, 2022, 10:44:23 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Hello Shaun,

One of the important factors for good performance is data distribution in H2O. When H2O imports a file, it looks at the number of available cores and the size of the file, and tries to find an optimal number of "chunks" to split the file into and keep in memory.
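To see why this can backfire on tiny data, here is a deliberately simplified toy heuristic (NOT H2O's actual parser code): if the chunk count scales with the core count, a small file on many cores gets sliced into tiny chunks, and per-chunk overhead starts to dominate the actual tree-building work.

```python
def toy_chunk_plan(n_rows, n_cores, chunks_per_core=4):
    """Toy 'a few chunks per core' splitting heuristic.

    This is an illustration only -- H2O's real heuristic also weighs
    file size, chunk byte size, and other factors.
    """
    n_chunks = n_cores * chunks_per_core
    rows_per_chunk = max(1, n_rows // n_chunks)
    return n_chunks, rows_per_chunk

# The 6840-row dataset from the original post:
print(toy_chunk_plan(6840, 2))   # 2 cores  -> (8, 855)   chunky, low overhead
print(toy_chunk_plan(6840, 96))  # 96 cores -> (384, 17)  tiny chunks, high overhead
```

With 96 cores each chunk holds only a handful of rows, so coordination costs per chunk can easily outweigh the parallel speedup.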

Your dataset is too small for this number of cores, so your findings won't translate to a much larger dataset. That said, it is still something we would like to look into: it means that for small data like yours, the heuristic that splits the data into chunks didn't work all that well.

If you could share details about the data and/or the log of your experiment, we would be happy to look into it.

Thank you,
MK