I am having trouble running H2O on a Windows 10 machine, so I thought I would try running on a new MacBook Air. This machine is running Mac OS 10.11.5, and it has a 2.5 GHz i7 processor with 16 GB of memory.
I started H2O with the command h2o.init(nthreads = -1). After a while, Activity Monitor showed:
Idle: 86.89%
User: 12.82%
System: 6.47%
I tried to renice (sudo renice 20 -p 738), but that did not change the numbers. At this rate it will take months to run my script.
Any suggestions will be greatly appreciated.
Charles
By the way, I also tried "sudo renice -20 -p 738", but that did not change the CPU usage numbers much.
First, it would be great if you could tell us which H2O version you are running and what exactly you are doing. Also, how big is your data?
Could you check your memory consumption (either in Flow under Cluster Status, or from the command line with the top command)? Maybe that's your bottleneck. By default H2O should use a quarter of your memory, but you can try bumping it by starting with h2o.init(nthreads = -1, max_mem_size = "8g").
If you load more data than the memory you assigned, H2O can start swapping to disk, which will slow everything down dramatically.
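For example, restarting with an explicit memory cap would look like this (a sketch; "8g" is a placeholder you should adjust to your machine, and this assumes a local Java runtime for the H2O backend):

```r
library(h2o)

# Stop any running local instance so the new memory cap takes effect;
# try() keeps this from erroring if no cluster is up yet
try(h2o.shutdown(prompt = FALSE), silent = TRUE)

# Restart with all cores and an explicit 8 GB heap
h2o.init(nthreads = -1, max_mem_size = "8g")
```

The startup banner should then report an "H2O cluster total memory" close to the value you requested.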
Mateusz
Hi Mateusz,
Here is my startup message:
Starting H2O JVM and connecting: .. Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 2 seconds 501 milliseconds
H2O cluster version: 3.8.3.2
H2O cluster name: H2O_started_from_R_CBrauer_zkd027
H2O cluster total nodes: 1
H2O cluster total memory: 0.08 GB
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
R Version: R version 3.3.1 (2016-06-21)
My dataset is rather small (4 columns x 1000 rows), and I'm doing regression on it.
In particular, I'm running:
# Hyper-parameter tuning with random search
min_mse <- 10000
models <- c()
for (i in 1:100) {
  rand_activation <- sample(c("TanhWithDropout",
                              "RectifierWithDropout",
                              "MaxoutWithDropout"), 1)
  rand_numlayers <- sample(1:5, 1)
  rand_hidden <- sample(seq(10, 500, 5), rand_numlayers, replace = TRUE)
  rand_l1 <- runif(1, 0, 0.001)
  rand_l2 <- runif(1, 0, 0.001)
  rand_dropout <- runif(rand_numlayers, 0, 0.6)
  rand_input_dropout <- runif(1, 0, 0.5)
  # note: this is drawn but never passed to h2o.deeplearning below
  rand_stopping_metric <- sample(c("AUTO", "deviance", "MSE", "r2"), 1)
  dlmodel <- h2o.deeplearning(x = 1:3,  # column numbers of the predictors
                              y = 4,    # column number of the label
                              training_frame = train.h2o,
                              validation_frame = test.h2o,
                              force_load_balance = TRUE,
                              hidden = rand_hidden,
                              activation = rand_activation,
                              l1 = rand_l1,
                              l2 = rand_l2,
                              input_dropout_ratio = rand_input_dropout,
                              hidden_dropout_ratios = rand_dropout,
                              nfolds = 5,
                              variable_importances = TRUE,
                              epochs = 10000,
                              seed = 7)
  mse <- dlmodel@model$validation_metrics@metrics$MSE
  hidden <- dlmodel@parameters$hidden
  cat("mse: ", mse, ", hidden: ", hidden, "\n")
  if (mse < min_mse) {
    min_mse <- mse
    best_model <- dlmodel
    results("random search", best_model)  # results() is a helper defined elsewhere in my script
  }
}
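As a side note, the random draw at the top of that loop can be sanity-checked in plain R without starting H2O. This sketch (not part of the original script) just mirrors the sampling logic above and checks that the layer-wise vectors stay consistent:

```r
set.seed(7)

# Draw one random hyper-parameter configuration, mirroring the loop above
draw_params <- function() {
  numlayers <- sample(1:5, 1)
  list(activation     = sample(c("TanhWithDropout", "RectifierWithDropout",
                                 "MaxoutWithDropout"), 1),
       hidden         = sample(seq(10, 500, 5), numlayers, replace = TRUE),
       l1             = runif(1, 0, 0.001),
       l2             = runif(1, 0, 0.001),
       hidden_dropout = runif(numlayers, 0, 0.6),
       input_dropout  = runif(1, 0, 0.5))
}

p <- draw_params()
# hidden and hidden_dropout_ratios must have one entry per layer,
# otherwise h2o.deeplearning rejects the configuration
stopifnot(length(p$hidden) == length(p$hidden_dropout))
stopifnot(p$l1 >= 0, p$l1 <= 0.001)
```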
After running for several hours, I also get the messages:
Warning in .h2o.__checkConnectionHealth() :
H2O cluster node 127.0.0.1:54321 is behaving slowly and should be inspected manually
Warning in .h2o.__checkConnectionHealth() :
Check H2O cluster status here: http://localhost:54321/3/Cloud?skip_ticks=true
It looks like the H2O server has shut down.
Charles
Note this line in your startup message:
H2O cluster total memory: 0.08 GB
The JVM got only 0.08 GB of heap, which would explain the slowdown and the eventual shutdown; restart H2O with a larger max_mem_size as suggested above.