Ok, third and final answer (my previous email had a mistake in the code).
I realized that you don't actually need to explicitly set
`h2o.init(nthreads = 1)`. Setting `reproducible = TRUE` in
`h2o.deeplearning` will force H2O to run single-threaded automatically.
However, what you are probably missing is setting the seed in the
`h2o.deeplearning` function itself (rather than using `set.seed(1)`).
This code shows the correct way to enforce reproducibility:
library(h2o)
h2o.init(nthreads = -1)
# Import a sample binary outcome train/test set into R
train <- read.table("http://www.stat.berkeley.edu/~ledell/data/higgs_10k.csv", sep = ",")
test <- read.table("http://www.stat.berkeley.edu/~ledell/data/higgs_test_5k.csv", sep = ",")
# Convert R data.frames into H2O parsed data objects
training_frame <- as.h2o(train)
validation_frame <- as.h2o(test)
y <- "V1"
x <- setdiff(names(training_frame), y)
family <- "binomial"
training_frame[,c(y)] <- as.factor(training_frame[,c(y)])  # Force binary classification
validation_frame[,c(y)] <- as.factor(validation_frame[,c(y)])
fit <- h2o.deeplearning(x = x, y = y, training_frame = training_frame,
reproducible = TRUE, seed = 1)
h2o.auc(fit)
#[1] 0.873428
fit2 <- h2o.deeplearning(x = x, y = y, training_frame = training_frame,
reproducible = TRUE, seed = 1)
h2o.auc(fit2)
#[1] 0.873428
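
If you would rather check the match programmatically than compare the printed values by eye, something like the following sketch should work. It assumes `fit`, `fit2`, and `validation_frame` from the code above are still in your session; the names `perf1` and `perf2` are just ones I made up for illustration. It also scores both models on the held-out frame, which the code above creates but does not otherwise use.

```r
# Assumes fit, fit2 and validation_frame already exist (see the code above).

# Training AUCs should agree when reproducible = TRUE and the seeds match
stopifnot(isTRUE(all.equal(h2o.auc(fit), h2o.auc(fit2))))

# Score both models on the held-out frame and compare those AUCs too
perf1 <- h2o.performance(fit, newdata = validation_frame)   # hypothetical name
perf2 <- h2o.performance(fit2, newdata = validation_frame)  # hypothetical name
stopifnot(isTRUE(all.equal(h2o.auc(perf1), h2o.auc(perf2))))
```

If either `stopifnot` fails, the first thing I would check is that both calls pass the same `seed` and `reproducible = TRUE`.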