I am having trouble running H2O on a Windows 10 machine, so I thought I would try running on a new MacBook Air. This machine is running Mac OS 10.11.5, and it has a 2.5 GHz i7 processor with 16 GB of memory.
I started H2O with the command h2o.init(nthreads = -1). After a while, Activity Monitor showed:
Idle: 86.89%
User: 12.82%
System: 6.47%
I tried to renice (sudo renice 20 -p 738), but that did not change the numbers. At this rate it will take months to run my script.
Any suggestions will be greatly appreciated.
Charles
By the way, I also tried "sudo renice -20 -p 738", but that did not change the CPU usage numbers much.
First, it would be great if you could tell us which H2O version you are running and what exactly you are doing. Also, how big is your data?
Could you check your memory consumption (either in Flow under Cluster Status, or from the command line with the top command)? Maybe that's your bottleneck. By default H2O should use a quarter of your memory, but you can try bumping it by starting with h2o.init(nthreads = -1, max_mem_size = "8g").
If you load more data than the memory you assigned, H2O can start swapping to disk, which will slow everything down dramatically.
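For example, restarting with an explicit memory cap would look like this (a sketch; "8g" is a placeholder you should adjust to your machine, and this assumes a local Java runtime for the H2O backend):

```r
library(h2o)

# Stop any running local instance so the new memory cap takes effect;
# try() keeps this from erroring if no cluster is up yet
try(h2o.shutdown(prompt = FALSE), silent = TRUE)

# Restart with all cores and an explicit 8 GB heap
h2o.init(nthreads = -1, max_mem_size = "8g")
```

The startup banner should then report an "H2O cluster total memory" close to the value you requested.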
Mateusz
Hi Mateusz,
Here is my startup message:
Starting H2O JVM and connecting: .. Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 2 seconds 501 milliseconds
H2O cluster version: 3.8.3.2
H2O cluster name: H2O_started_from_R_CBrauer_zkd027
H2O cluster total nodes: 1
H2O cluster total memory: 0.08 GB
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
R Version: R version 3.3.1 (2016-06-21)
My dataset is rather small (4 columns x 1000 rows), and I'm doing regression on it.
In particular, I'm running:
# Hyper-parameter tuning with random search
min_mse <- 10000
models <- c()
for (i in 1:100) {
  rand_activation <- sample(c("TanhWithDropout",
                              "RectifierWithDropout",
                              "MaxoutWithDropout"), 1)
  rand_numlayers <- sample(1:5, 1)
  rand_hidden <- sample(seq(10, 500, 5), rand_numlayers, replace = TRUE)
  rand_l1 <- runif(1, 0, 0.001)
  rand_l2 <- runif(1, 0, 0.001)
  rand_dropout <- runif(rand_numlayers, 0, 0.6)
  rand_input_dropout <- runif(1, 0, 0.5)
  # note: this is drawn but never passed to h2o.deeplearning below
  rand_stopping_metric <- sample(c("AUTO", "deviance", "MSE", "r2"), 1)
  dlmodel <- h2o.deeplearning(x = 1:3,  # column numbers of the predictors
                              y = 4,    # column number of the label
                              training_frame = train.h2o,
                              validation_frame = test.h2o,
                              force_load_balance = TRUE,
                              hidden = rand_hidden,
                              activation = rand_activation,
                              l1 = rand_l1,
                              l2 = rand_l2,
                              input_dropout_ratio = rand_input_dropout,
                              hidden_dropout_ratios = rand_dropout,
                              nfolds = 5,
                              variable_importances = TRUE,
                              epochs = 10000,
                              seed = 7)
  mse <- dlmodel@model$validation_metrics@metrics$MSE
  hidden <- dlmodel@parameters$hidden
  cat("mse: ", mse, ", hidden: ", hidden, "\n")
  if (mse < min_mse) {
    min_mse <- mse
    best_model <- dlmodel
    results("random search", best_model)  # results() is a helper defined elsewhere in my script
  }
}
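As a side note, the random draw at the top of that loop can be sanity-checked in plain R without starting H2O. This sketch (not part of the original script) just mirrors the sampling logic above and checks that the layer-wise vectors stay consistent:

```r
set.seed(7)

# Draw one random hyper-parameter configuration, mirroring the loop above
draw_params <- function() {
  numlayers <- sample(1:5, 1)
  list(activation     = sample(c("TanhWithDropout", "RectifierWithDropout",
                                 "MaxoutWithDropout"), 1),
       hidden         = sample(seq(10, 500, 5), numlayers, replace = TRUE),
       l1             = runif(1, 0, 0.001),
       l2             = runif(1, 0, 0.001),
       hidden_dropout = runif(numlayers, 0, 0.6),
       input_dropout  = runif(1, 0, 0.5))
}

p <- draw_params()
# hidden and hidden_dropout_ratios must have one entry per layer,
# otherwise h2o.deeplearning rejects the configuration
stopifnot(length(p$hidden) == length(p$hidden_dropout))
stopifnot(p$l1 >= 0, p$l1 <= 0.001)
```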
After running for several hours, I also get the messages:
Warning in .h2o.__checkConnectionHealth() :
H2O cluster node 127.0.0.1:54321 is behaving slowly and should be inspected manually
Warning in .h2o.__checkConnectionHealth() :
Check H2O cluster status here: http://localhost:54321/3/Cloud?skip_ticks=true
It looks like the H2O server has shut down.
Charles
Note this line in your startup message:
H2O cluster total memory: 0.08 GB
The JVM got only 0.08 GB of heap, which would explain the slowdown and the eventual shutdown; restart H2O with a larger max_mem_size as suggested above.