Saving Output of H2O Python Code

katherin...@gmail.com

Jun 7, 2016, 4:32:53 PM
to H2O Open Source Scalable Machine Learning - h2ostream, katherin...@macys.com
Hello,

I'm running distributed random forests on data in Python 3.5, using H2O 3.8.2.8. I can print and inspect most of my output, but I'm having trouble saving it. For example, I'd like to export my confusion matrix to a CSV file. When I check the type of the confusion matrix, I see that it is "h2o.model.confusion_matrix.ConfusionMatrix". Does anyone know how to convert this to a dataframe or another Python object that I can export and save?

Thanks for any help!

brendan...@capitalone.com

Jun 7, 2016, 7:46:08 PM
to H2O Open Source Scalable Machine Learning - h2ostream, katherin...@macys.com, katherin...@gmail.com
How are you accessing the confusion matrix? If it's returning an H2OFrame, you should be able to use something like:

h2o.H2OFrame().as_data_frame(use_pandas=False)

katherin...@gmail.com

Jun 8, 2016, 9:11:23 AM
to H2O Open Source Scalable Machine Learning - h2ostream, katherin...@macys.com, katherin...@gmail.com
Thanks so much for responding.

When I use the following command:

confusion=rf_model.confusion_matrix(test)
type(confusion)

I get: h2o.model.confusion_matrix.ConfusionMatrix

I've tried to turn it into a dataframe, but without success.

Lauren DiPerna

Jun 9, 2016, 12:28:28 AM
to katherin...@gmail.com, H2O Open Source Scalable Machine Learning - h2ostream, katherin...@macys.com
Hi Katherine,

you can do:

print(your_model.confusion_matrix(test_set).as_data_frame())

here is example code:

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

# start (or connect to) a local H2O cluster
h2o.init()

cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
# specify the predictors within the dataset
predictors = ["displacement","power","weight","acceleration","year"]
# specify the response column
response_col = "cylinders"
# convert the response column to a factor (categorical)
cars[response_col] = cars[response_col].asfactor()
# build a small GBM; add nfolds=3 to enable cross-validation
cars_model = H2OGradientBoostingEstimator(ntrees=5)
cars_model.train(x=predictors, y=response_col, training_frame=cars)

# convert the confusion matrix to a data frame and check its type
cm_df = cars_model.confusion_matrix(cars).as_data_frame()
print(cm_df)
print(type(cm_df))
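
Since the original question was about saving the matrix to CSV, here is a minimal sketch of that last step. The helper name save_confusion_matrix is hypothetical and not part of H2O. It assumes that a ConfusionMatrix object keeps its table in a .table attribute and that the table offers as_data_frame() as in the example above; neither assumption has been checked against h2o-3.8.2.8. If the result is a pandas DataFrame it is written with to_csv, otherwise it is treated as a plain list of rows.

```python
import csv


def save_confusion_matrix(cm, path):
    """Write a confusion matrix to a CSV file (hypothetical helper, not an H2O API)."""
    # Assumption: a ConfusionMatrix object stores its table in `.table`;
    # any other object is used as-is.
    table = getattr(cm, "table", cm)
    # Assumption: H2O tables provide as_data_frame(), as in the example above.
    if hasattr(table, "as_data_frame"):
        table = table.as_data_frame()
    if hasattr(table, "to_csv"):
        # pandas DataFrame: let pandas write the file
        table.to_csv(path, index=False)
    else:
        # plain list of rows: fall back to the standard csv module
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(table)
```

With the model from the example, this would be called as save_confusion_matrix(cars_model.confusion_matrix(cars), "confusion_matrix.csv"); the file name is only an illustration.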

cheers,

Lauren
