Loading Ensemble models (saved using save{base})


Pedro Henrique Veronezi e Sá

Dec 14, 2015, 3:39:27 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Hello guys,

I built an ensemble model with h2o.ensemble and saved it with the base R function save {base}. Since then I have closed and reopened R, and I think the cluster IP has changed, so I can't make it work.

I can load the model and I can see its attributes as an object of an h2o class. I did some research and I found this in the documentation:

"Currently, the h2o.ensemble function outputs a list object which makes up the ensemble "model". This R object can be serialized to disk using the R base save function. However, if you save the ensemble model to disk, then use it in the future to generate predictions on a test set using a new H2O cluster instance (with a different cluster IP address), this will not work. This can be fixed by updating the cluster IP address in the saved object with the new one. The model saving process will probably be modified in the future to serialize each of the individual H2O base models using the h2o::saveModel function. Therefore, the saved H2O base models will be accessible individually. Currently, the ensemble fit is stored as a single R list object which contains all the base learner fits, the metalearner fit, and a few other pieces of data."

Especially this part:
"This can be fixed by updating the cluster IP address in the saved object with the new one."

But I don't know how to do that (change the IP in the saved model). Can anyone help me with that?

Thanks

Pedro Veronezi

Erin LeDell

Dec 14, 2015, 6:33:57 PM
to Pedro Henrique Veronezi e Sá, H2O Open Source Scalable Machine Learning - h2ostream
Hi Pedro,
What version of h2oEnsemble are you using?  Did you save these ensembles recently or with an old version of h2oEnsemble?  The text below is from an out-of-date version of the README.md file in the repo.  There are now functions to save ensembles that you should use:  `h2o.save_ensemble` and `h2o.load_ensemble`, rather than R's `base::save()` function.

More information in the current README file: https://github.com/h2oai/h2o-3/tree/master/h2o-r/ensemble

Since you used `base::save()`, to fix, you can try updating the metadata that specifies the H2O cluster IP address to reflect your current IP.  When you `base::load()` the "h2o.ensemble" model, the model will try to connect to whatever IP address was saved in the model object (which is now out of date).  This issue is resolved by using the specialized `h2oEnsemble::h2o.save_ensemble` and `h2oEnsemble::h2o.load_ensemble` functions.

Best,
Erin

-- 
Erin LeDell Ph.D.
Statistician & Machine Learning Scientist | H2O.ai
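
For later readers, here is a minimal sketch of the save/load round trip described above. It assumes `fit` is an ensemble trained with an H2O-based metalearner and that an H2O cluster is running. The directory name is a placeholder, and passing `path` to `h2o.load_ensemble` is an assumption that should be checked against the current README.

```r
library(h2o)
library(h2oEnsemble)
h2o.init()  # start or connect to a local H2O cluster

# Hypothetical directory; use a plain filesystem path
ensemble_dir <- "/tmp/my_ensemble"

# `fit` is assumed to exist already (the output of h2o.ensemble)
h2o.save_ensemble(fit, path = ensemble_dir, force = TRUE)

# Later, in a fresh R session connected to a new cluster:
fit_reloaded <- h2o.load_ensemble(path = ensemble_dir)
```

Because the models are reloaded into whichever cluster is currently running, this route should avoid the stale IP problem that comes with `base::save()`.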

Pedro Henrique Veronezi e Sá

Dec 21, 2015, 4:29:22 PM
to H2O Open Source Scalable Machine Learning - h2ostream, veronez...@gmail.com
Hello Erin, 

First of all, thanks for your prompt answer. Unfortunately, I'm still having problems saving models from the h2oEnsemble package in R.

The versions are:

> packageDescription("h2o")$Version
[1] "3.6.0.8"
> packageDescription("h2oEnsemble")$Version
[1] "0.1.5"


The code that I'm running is:


# Specify the base learner library & the metalearner
learner <- c("h2o.glm.wrapper", "h2o.randomForest.wrapper", 
             "h2o.gbm.wrapper", "h2o.deeplearning.wrapper")
metalearner <- "SL.glm"


#creates the ensemble
fit <- h2o.ensemble( x = predictors,y = response, training_frame = train, 
                    family = "binomial", 
                    learner = learner, 
                    metalearner = metalearner,
                    cvControl = list(V = 5, shuffle = TRUE))

#print(paste0("Time spent:",as.integer(as.integer(Sys.time())-timestamp)," min"))
#evaluate the models:
pred <- predict(fit, test_original)
labels <- as.data.frame(test_original[,c(response)])[,1]

AUC(predictions=as.data.frame(pred$pred)[,1], labels=labels)

#for each single model
L <- length(learner)
sapply(seq(L), function(l) AUC(predictions = as.data.frame(pred$basepred)[,l], labels = labels))



It works fine until I try to save the model with:

setwd("~/FannieMae")
h2oEnsemble::h2o.save_ensemble(fit,path="ensemble",force=TRUE)

or either of these:

h2oEnsemble::h2o.save_ensemble(fit,filename="file:/home/hanlon01/FannieMae/ensemble",force=TRUE)
h2oEnsemble::h2o.save_ensemble(fit,path="file:/home/hanlon01/FannieMae/ensemble",force=TRUE)

or this, which I read somewhere would work:
h2oEnsemble::h2o.save_ensemble(fit,path="file:///home/hanlon01/FannieMae/ensemble",force=TRUE)

None of those works. The error is:

Error in h2o.saveModel(object = object$metafit, path = path, force = force) : 
  `object` must be an H2OModel object

Can you help me with that?

Also, if I need to use base::save(), how can I modify the metadata of the saved model?

Thanks in advance.

Pedro Veronezi

Erin LeDell

Dec 21, 2015, 8:30:12 PM
to Pedro Henrique Veronezi e Sá, H2O Open Source Scalable Machine Learning - h2ostream
Hi,
Looks like you are running Linux.  If so, this should work (remove the file:// stuff):

h2oEnsemble::h2o.save_ensemble(fit,path="/home/hanlon01/FannieMae/ensemble",force=TRUE)

Let me know if that works,
Erin

ianw

Jan 26, 2016, 4:08:23 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Erin, I'm having a similar issue on Windows.

I've updated my h2o and ensemble packages.

I think you are suggesting that this is a path syntax problem.

h2o.save_ensemble(fit, path = "Project/hens10models", force = TRUE, export_levelone = FALSE)

It saves the individual models to the path but then fails on the metalearner model:

Error ... `object` must be an H2OModel object

Thanks

Ian

Erin LeDell

Jan 26, 2016, 4:29:30 PM
to ianw, H2O Open Source Scalable Machine Learning - h2ostream
Hi Ian,
Did you use an H2O metalearner or a SuperLearner metalearner function
(like SL.glm)? That's my guess based on the error. The support for
SuperLearner metalearners is not really meant to be used (noted in
README), it's more of a debugging tool for me. :-)

For now, you can use the h2o glm for a metalearner to get around the
error. Is there some reason you are using SL metalearner functions? I
would be curious to know if you had better results than h2o glm and that
was the reason.

I'll look into whether it makes sense to support model saving when using
a SL-based metalearner.

Thanks,
Erin
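
For later readers: the error earlier in the thread is raised by `h2o.saveModel(object = object$metafit, ...)`, so a quick check before saving is whether the metalearner fit is an H2O model at all. The sketch below assumes the ensemble keeps that fit in `$metafit`, as the error message suggests. `has_h2o_metafit` is a hypothetical helper, not part of h2oEnsemble, and the two list objects are stand-ins for real ensembles.

```r
# Returns TRUE when the ensemble's metalearner fit is an H2O model.
# `has_h2o_metafit` is a hypothetical helper, not part of h2oEnsemble.
has_h2o_metafit <- function(fit) {
  inherits(fit$metafit, "H2OModel")
}

# Stand-ins for real ensemble objects, for illustration only
h2o_like <- list(metafit = structure(list(), class = "H2OModel"))
sl_like  <- list(metafit = structure(list(), class = "SL.glm"))

has_h2o_metafit(h2o_like)  # TRUE
has_h2o_metafit(sl_like)   # FALSE
```

If the check returns FALSE for a real ensemble, retraining with an H2O-based metalearner, as suggested above, is the way to make `h2o.save_ensemble` work.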

Ianw

Jan 26, 2016, 4:58:20 PM
to H2O Open Source Scalable Machine Learning - h2ostream, ian.c...@googlemail.com
Thank you for such a quick reply.

metalearner <- "SL.glm"

I will try changing it over.

I'm afraid I probably got the choice from an example and it was not really well thought through.

I'll try some other models now I can see the error of my ways!

Regards

Ian

Erin LeDell

Jan 26, 2016, 5:01:34 PM
to Ianw, H2O Open Source Scalable Machine Learning - h2ostream
Yeah,
My examples used to use SL.glm for the metalearner because there was a
time when I was not able to get proper results with h2o.glm, but I fixed
that a while ago. It's probably where you found your example. :-) Look
here for more recent h2oEnsemble examples:
http://learn.h2o.ai/content/tutorials/ensembles-stacking/index.html

Best,
Erin

ianw

Jan 26, 2016, 8:58:37 PM
to H2O Open Source Scalable Machine Learning - h2ostream, ian.c...@googlemail.com
Erin,

This did indeed let me save the complete ensemble but broke most of the code that tried to process a prediction from the model.

metalearner <- "h2o.glm"

I would say the data structure returned from predict.h2o.ensemble is not consistent when you change the metalearner. This is not the most friendly behaviour. If I wasn't really supposed to use SL.glm, then it may be that model that is the issue. However, I did previously have my AUC and prediction file save working!

Regards

Ian

> pred <- predict.h2o.ensemble(fit, validation_frame)
> labels <- as.data.frame(validation_frame[,c(y)])[,1]
> cat("AUC",AUC(predictions=as.data.frame(pred$pred)[,1], labels=labels),"\n")
Error in prediction(predictions = predictions, labels = labels, label.ordering = label.ordering) :
Format of predictions is invalid.



Erin LeDell

Jan 26, 2016, 9:22:12 PM
to ianw, H2O Open Source Scalable Machine Learning - h2ostream
Hi,
Yep, that's one reason that it's not advised to use the SL metalearner
functions. It should be a relatively simple change to your code to
process the prediction output though. The output from an H2O model and
the output from an SL model are different, which is why it's not really
ideal to support both types of metalearners and why I warn against it on
the README. I should probably put some warnings in the actual code (if
someone uses an SL metalearner) to let users know that they somehow
stumbled upon an unsupported feature.

The example code in h2o.ensemble shows the correct way to process the
output. Are you doing a regression or binary classification? Looks like
regression, since you have this:

predictions=as.data.frame(pred$pred)[,1]

The cvAUC::AUC function requires a vector... Can you let me know the
output of this:

predictions <- as.data.frame(pred$pred)[,1]
str(predictions)


Best,
Erin
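
For later readers, a minimal sketch of the likely cause if this is binary classification. It assumes that, with an H2O metalearner, `as.data.frame(pred$pred)` has three columns: the predicted label, P(Y == 0), and P(Y == 1). The data frame below is a made-up stand-in for that output, so no H2O cluster is needed to run it.

```r
# Mock of as.data.frame(pred$pred) for a binomial ensemble with an H2O
# metalearner. Assumption: columns are predicted label, P(Y == 0), P(Y == 1).
pred_df <- data.frame(
  predict = factor(c("1", "0", "1", "0")),
  p0 = c(0.2, 0.7, 0.4, 0.9),
  p1 = c(0.8, 0.3, 0.6, 0.1)
)
labels <- c(1, 0, 1, 0)

label_col   <- pred_df[, 1]  # factor of predicted classes, not a score
predictions <- pred_df[, 3]  # numeric P(Y == 1), suitable for an AUC function

str(label_col)
str(predictions)

# With cvAUC installed:
# cvAUC::AUC(predictions = predictions, labels = labels)
```

Under that assumption, taking column 1 passes a factor of class labels to the AUC function, which would explain the "Format of predictions is invalid" error, while column 3 gives the numeric vector it expects.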