Best parameters from grid search h2o.randomforest


Kunal Jain

May 27, 2016, 1:59:50 AM
to H2O Open Source Scalable Machine Learning - h2ostream, manisa...@gmail.com
Hello,

I am using this grid search random forest model to find the best tuning parameters.

#tune random forest #15minutes
h2o.random <- h2o.grid("randomForest", x = x.indep, y = y.dep, training_frame = h.train,
                       hyper_params = list(ntrees = c(100,300,500), mtries = c(2,3,4), max_depth = c(3,4,5)), seed = 1122)

Results:
[[1]]
Model Details:
==============

H2ORegressionModel: drf
Model ID:  Grid_DRF_c.train_model_R_1464324826772_2_model_24 
Model Summary: 
  number_of_trees model_size_in_bytes min_depth max_depth
1             100               14108         5         5                5
   mean_depth min_leaves max_leaves mean_leaves
1     5.00000          6               9             6.56000


H2ORegressionMetrics: drf
** Reported on training data. **
Description: Metrics reported on Out-Of-Bag training samples

MSE:  36.29314
R2 :  0.0006315823
Mean Residual Deviance :  36.29314


This is the best model I could get.

Question: Why is the mtries parameter missing from the model summary? I tuned it during the grid search as well. Since mtries is an essential parameter, its best value should be shown in the model results.

Please help!

Lauren DiPerna

May 27, 2016, 2:32:14 PM
to Kunal Jain, H2O Open Source Scalable Machine Learning - h2ostream, manisa...@gmail.com
Hi Kunal,

You can get a table of the mtries values used, with their corresponding model_ids and logloss, by running `h2o.random` by itself. (This returns the H2O Grid Details.)

For the output you posted above, did you run the following (this always prints the same set of summary fields)?

> model_ids <- h2o.random@model_ids
> models <- lapply(model_ids, function(id) { h2o.getModel(id)})
> models
(this would output all of the Model Details)
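
If the goal is just to read off the tuned mtries of the best model, something along these lines might work. This is a minimal, self-contained sketch rather than the setup from the original post: it assumes a reasonably recent h2o R package, uses the built-in iris data with Sepal.Length as a stand-in regression target, and the grid ID "iris_drf_grid" is made up for illustration. It relies on h2o.getGrid() to sort the grid by MSE and on the model's @allparameters slot to list every parameter, including mtries.

```r
library(h2o)
h2o.init()

# Assumption: iris stands in for the original training frame; Sepal.Length
# is used as a numeric (regression) response.
train <- as.h2o(iris)
predictors <- c("Sepal.Width", "Petal.Length", "Petal.Width", "Species")

# Small illustrative grid; "iris_drf_grid" is a hypothetical grid ID.
grid <- h2o.grid("randomForest", grid_id = "iris_drf_grid",
                 x = predictors, y = "Sepal.Length",
                 training_frame = train,
                 hyper_params = list(ntrees = c(5, 10), mtries = c(1, 2, 3)),
                 seed = 1122)

# Re-fetch the grid sorted by MSE, lowest (best) first.
sorted_grid <- h2o.getGrid(grid_id = "iris_drf_grid",
                           sort_by = "mse", decreasing = FALSE)

# The first model ID in the sorted grid is the best model by MSE.
best_model <- h2o.getModel(sorted_grid@model_ids[[1]])

# @allparameters holds defaults plus grid-set values, so mtries is in there.
print(best_model@allparameters$mtries)
print(best_model@allparameters$ntrees)
```

The printed model summary only reports tree statistics, which is why mtries does not appear there; the parameter slots on the model object are where the tuned values live.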

Here is an example you can follow along:
> library(h2o)
> h2o.init()
> cars <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
  |================================================================================| 100%
> seed <- 691895
> r <- h2o.runif(cars,seed=seed)
> train <- cars[r > 0.2,]
> predictors <- c("displacement","power","weight","acceleration","year")
> response_col <- "economy_20mpg"
> grid_space <- list()
> grid_space$ntrees <- c(5, 2, 3)
> grid_space$max_depth <- c(4, 1, 5)
> grid_space$nbins <- c(6, 4, 3)
> grid_space$nbins_cats <- c(370, 449)
> grid_space$mtries <- c(2, 4, 3)
> grid_space$sample_rate <- c(0.327667, 0.735594, 0.415836)
> train[,response_col] <- as.factor(train[,response_col])
> cars_drf_grid <- h2o.grid("randomForest", grid_id="drf_grid_cars_test", x=predictors, y=response_col,
+                           training_frame=train, hyper_params=grid_space)
  |================================================================================| 100%
> cars_drf_grid


outputs:


H2O Grid Details
================

Grid ID: drf_grid_cars_test
Used hyper parameters:
  -  nbins_cats
  -  sample_rate
  -  nbins
  -  ntrees
  -  max_depth
  -  mtries
Number of models: 486
Number of failed models: 0

Hyper-Parameter Search Summary: ordered by increasing logloss
  nbins_cats sample_rate nbins ntrees max_depth mtries                    model_ids
1        370    0.735594     3      3         4      2  drf_grid_cars_test_model_50
2        370    0.735594     4      3         1      3 drf_grid_cars_test_model_422
3        370    0.735594     3      5         1      4 drf_grid_cars_test_model_230
4        449    0.735594     3      3         1      4 drf_grid_cars_test_model_267
5        449    0.415836     4      2         1      2  drf_grid_cars_test_model_83
            logloss
1 0.169650906870245
2 0.270960246663464
3 0.273574489129419
4 0.280455064239284
5 0.281979603934791
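
The "Number of models: 486" line follows from the grid being a full Cartesian product of the hyper-parameter values. As a quick sanity check before launching a grid, the count can be computed in plain R without a running H2O cluster. This is only a sketch; the variable names n_models and combos are illustrative and not part of the h2o API.

```r
# Same hyper-parameter space as in the cars example above.
grid_space <- list(
  ntrees      = c(5, 2, 3),
  max_depth   = c(4, 1, 5),
  nbins       = c(6, 4, 3),
  nbins_cats  = c(370, 449),
  mtries      = c(2, 4, 3),
  sample_rate = c(0.327667, 0.735594, 0.415836)
)

# A full grid trains one model per combination, so the model count is the
# product of the number of candidate values for each hyper-parameter.
n_models <- prod(lengths(grid_space))
print(n_models)

# expand.grid() enumerates the combinations explicitly, one row per model.
combos <- expand.grid(grid_space)
print(nrow(combos))
print(head(combos))
```

By the same arithmetic, the grid in the original question (three values each for ntrees, mtries and max_depth) trains 27 models.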

--
You received this message because you are subscribed to the Google Groups "H2O Open Source Scalable Machine Learning - h2ostream" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h2ostream+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Erin LeDell

May 27, 2016, 3:00:15 PM
to Kunal Jain, H2O Open Source Scalable Machine Learning - h2ostream, manisa...@gmail.com

typo: Lauren meant "h2o.grid" not "h2o.random"


-- 
Erin LeDell Ph.D.
Statistician & Machine Learning Scientist | H2O.ai

Manish Saraswat

May 28, 2016, 2:27:07 AM
to Erin LeDell, Kunal Jain, H2O Open Source Scalable Machine Learning - h2ostream
Thank you!
Let me see if this works.