I created a deep learning model with h2o in R, but it still performs very poorly. How can I optimise the outcome, and which parameters should I change? Is it just a matter of randomly changing parameters and seeing what happens?
I have already integrated grid search and automatic parameter tuning. How should I choose the hidden layers? Is there a concept for further optimising the number of layers?
This is my code so far:
res.dl <- h2o.deeplearning(x = 2:23, y = 1, data = trData, classification = TRUE,
                           activation = "Tanh",
                           hidden = c(160, 150, 200),  # note: rep(160,150,200) builds many 160-unit layers, not a 160-150-200 net
                           epochs = 20)
pred_labels <- h2o.predict(res.dl, tsData)[,1]
actual_labels <- tsData[,1]
cm <- h2o.confusionMatrix(pred_labels,actual_labels)
res.dl@model$confusion
#grid search
grid_search <- h2o.deeplearning(x = 2:23, y = 1, data = trData, classification = TRUE,
                                validation = tsData,
                                hidden = list(c(10,10), c(20,20)), epochs = 0.1,
                                activation = c("Tanh", "Rectifier"), l1 = c(0, 1e-5))
best_model <- grid_search@model[[1]]
best_model
best_params <- best_model@model$params
best_params$activation
best_params$hidden
best_params$l1
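To get at the hidden-layer question, I am considering feeding the same grid call a larger `hidden` list that contrasts shallow-wide and deep-narrow topologies (the values below are only illustrative, and `epochs` is raised above 0.1 so the ranking is less noisy):

```r
# illustrative topology grid: shallow-wide vs. deep-narrow networks
grid_search2 <- h2o.deeplearning(x = 2:23, y = 1, data = trData, classification = TRUE,
                                 validation = tsData,
                                 hidden = list(c(200), c(200, 200),
                                               c(100, 100, 100), c(50, 50, 50, 50)),
                                 epochs = 1, activation = "Rectifier", l1 = c(0, 1e-5))
grid_search2@model[[1]]@model$params$hidden  # topology of the best-ranked model
```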
#Hyper-parameter Tuning with Random Search
models <- list()
for (i in 1:10) {
  rand_activation <- c("TanhWithDropout", "RectifierWithDropout")[sample(1:2, 1)]
  rand_numlayers <- sample(2:5, 1)
  rand_hidden <- sample(10:50, rand_numlayers, replace = TRUE)
  rand_l1 <- runif(1, 0, 1e-3)
  rand_l2 <- runif(1, 0, 1e-3)
  rand_dropout <- runif(rand_numlayers, 0, 0.6)
  rand_input_dropout <- runif(1, 0, 0.5)
  dlmodel <- h2o.deeplearning(x = 2:23, y = 1, data = trData, validation = tsData,
                              classification = TRUE, epochs = 0.1,
                              activation = rand_activation, hidden = rand_hidden,
                              l1 = rand_l1, l2 = rand_l2,
                              input_dropout_ratio = rand_input_dropout,
                              hidden_dropout_ratios = rand_dropout)
  models[[i]] <- dlmodel  # store in a list; c() would mangle the S4 model objects
}
best_err <- best_model@model$valid_class_error #best model from grid search above
for (i in 1:length(models)) {
  err <- models[[i]]@model$valid_class_error
  if (err < best_err) {
    best_err <- err
    best_model <- models[[i]]
  }
}
best_model
best_params <- best_model@model$params
best_params$activation
best_params$hidden
best_params$l1
best_params$l2
best_params$input_dropout_ratio
best_params$hidden_dropout_ratios
dlmodel_continued <- h2o.deeplearning(x = 2:23, y = 1, data = trData, validation = tsData,
                                      classification = TRUE, checkpoint = best_model,
                                      l1 = best_params$l1, l2 = best_params$l2, epochs = 0.5)
dlmodel_continued@model$valid_class_error
#train further
dlmodel_continued_again <- h2o.deeplearning(x = 2:23, y = 1, data = trData, validation = tsData,
                                            classification = TRUE, checkpoint = dlmodel_continued,
                                            l1 = best_params$l1, epochs = 0.5)
dlmodel_continued_again@model$valid_class_error
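Since epochs = 0.1 in the search only gives a rough ranking, my plan is also to retrain the winning configuration from scratch with a realistic epoch budget (a sketch reusing `best_params` from above):

```r
# retrain the best random-search configuration with a full epoch budget
final_model <- h2o.deeplearning(x = 2:23, y = 1, data = trData, validation = tsData,
                                classification = TRUE,
                                activation = best_params$activation,
                                hidden = best_params$hidden,
                                l1 = best_params$l1, l2 = best_params$l2,
                                epochs = 20)
final_model@model$valid_class_error
```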
Thank you in advance.
Regards
--
You received this message because you are subscribed to the Google Groups "H2O & Open Source Scalable Machine Learning - h2ostream" group.
#ensemble via h2oEnsemble; 'learner' was never defined above -- example base-learner wrappers:
learner <- c("h2o.glm.wrapper", "h2o.randomForest.wrapper", "h2o.deeplearning.wrapper")
metalearner <- c("SL.nnls")
family <- "binomial"
fit <- h2o.ensemble(x = 2:23, y = 1, data = trData, family = family,
                    learner = learner, metalearner = metalearner, cvControl = list(V = 20))
#Performance
pred <- predict(fit, tsData)
labels <- as.data.frame(tsData[,1])[,1]
# Ensemble test AUC (AUC() comes from the cvAUC package)
AUC(predictions = as.data.frame(pred$pred)[,1], labels = labels)
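To see whether the ensemble actually beats its parts, the predict object from h2oEnsemble also carries the base-learner predictions (in `$basepred`, if I read the package correctly), so each learner's test AUC can be computed the same way:

```r
# per-base-learner test AUC; basepred has one column per learner
basepred <- as.data.frame(pred$basepred)
sapply(seq_along(basepred), function(j) AUC(predictions = basepred[, j], labels = labels))
```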