I've trained a model using h2o.deeplearning() and ran h2o.performance() on a new H2OFrame to look at the confusionMatrix() results. I expected these numbers to match the output of h2o.predict(), but they do not. For example, when predicting a binary class (1/0), h2o.performance() tells me that the model classified everything as 1 and nothing as 0:
h2o.performance() / h2o.confusionMatrix() results:
Confusion Matrix for F1-optimal threshold:
        0   1     Error    Rate
0       0  11  1.000000  =11/11
1       0  18  0.000000   =0/18
Totals  0  29  0.379310  =11/29
This surprised me so I looked at the actual predictions:
table(as.data.frame(h2o.predict(model, data))[["predict"]])
 0  1
 9 20
This shows that the model did predict 0 for several rows (9 of 29).
Is there a way to fix this?
Thank you,
Jose
h2o.performance(model, data , measure = "")
with the same results.
Thank you
Jose
> perf <- h2o.performance(m,"f1")
> h2o.find_threshold_by_max_metric(perf,"f1")
[1] 0.4549497
The default printout of h2o.performance() also shows the values for each criterion.
Unfortunately, h2o.find_threshold_by_max_metric() appears to be deprecated. However, I found the same information through h2o.performance() itself (duh), more specifically at perf@metrics$max_criteria_and_metric_scores.
With regard to h2o.performance(m, fr), the "measure" and "thresholds" parameters have been removed, as documented at:
http://rpackages.ianhowson.com/cran/h2o/man/h2o.performance.html
so I'm not sure exactly how to input manual thresholds as before (http://www.rdocumentation.org/packages/h2o/functions/h2o.performance) or how to choose the metric to maximize.
Jose
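One way to work around the missing "thresholds" parameter is to apply a cutoff to the predicted probabilities directly. What follows is a minimal base R sketch, assuming the predictions have been pulled into a data.frame with a p1 column, as as.data.frame(h2o.predict(model, data)) produces for a binomial model. The scores, the labels, and the helper name confusion_at() are invented for illustration and are not part of h2o.

```r
# Hypothetical stand-in for as.data.frame(h2o.predict(model, data));
# the values are invented for illustration, not taken from the thread.
preds  <- data.frame(p1 = c(0.20, 0.48, 0.52, 0.70, 0.91, 0.33))
actual <- c(0, 0, 1, 1, 1, 0)

# Confusion matrix at a manually chosen threshold
# (rows: actual class, columns: predicted class).
confusion_at <- function(p1, actual, threshold) {
  predicted <- as.integer(p1 >= threshold)
  table(actual    = factor(actual,    levels = c(0, 1)),
        predicted = factor(predicted, levels = c(0, 1)))
}

cm_low  <- confusion_at(preds$p1, actual, threshold = 0.30)
cm_high <- confusion_at(preds$p1, actual, threshold = 0.50)
print(cm_low)
print(cm_high)
```

The same model scores give different confusion matrices depending only on the cutoff, so it helps to make the cutoff explicit when comparing results.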
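h2o.performance() calculates a new threshold from the new data: the threshold that maximizes the F1 score on that data. h2o.predict(), by contrast, labels rows using the threshold stored with the model at training time, which is why the two can disagree.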
Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
         0   1     Error    Rate
0       55   0  0.000000   =0/55
1        2  49  0.039216   =2/51
Totals  57  49  0.018868  =2/106
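To make the threshold effect concrete, here is a rough base R sketch of picking an F1-optimal threshold on a data set and comparing the resulting labels with those from a different, fixed threshold. The data, the helper names f1_at() and best_threshold(), and the value of train_thr are all assumptions for illustration. The sketch tries each observed score as a candidate cutoff, which is not necessarily how h2o chooses its candidate thresholds internally.

```r
# Hypothetical labels and class-1 probabilities for a "new" data set.
actual <- c(0, 0, 0, 1, 1, 1, 1, 0)
p1     <- c(0.10, 0.35, 0.55, 0.40, 0.60, 0.80, 0.90, 0.45)

# F1 score when predicting 1 for every row with p1 >= threshold.
f1_at <- function(threshold, actual, p1) {
  pred <- as.integer(p1 >= threshold)
  tp <- sum(pred == 1 & actual == 1)
  fp <- sum(pred == 1 & actual == 0)
  fn <- sum(pred == 0 & actual == 1)
  if (tp == 0) return(0)
  2 * tp / (2 * tp + fp + fn)
}

# Try every observed score as a candidate and keep the one with the highest F1.
best_threshold <- function(actual, p1) {
  candidates <- sort(unique(p1))
  scores <- vapply(candidates, f1_at, numeric(1), actual = actual, p1 = p1)
  candidates[which.max(scores)]
}

new_thr   <- best_threshold(actual, p1)  # threshold recomputed on the new data
train_thr <- 0.45                        # assumed threshold carried over from training

# Predicted label counts under each threshold.
print(table(as.integer(p1 >= train_thr)))
print(table(as.integer(p1 >= new_thr)))
```

With these made-up numbers the two thresholds differ, so the counts of predicted 0s and 1s differ as well, which mirrors the mismatch between the h2o.predict() table and the h2o.performance() confusion matrix described in the question.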
When we release a new major release (3.20), we sometimes make
fixes to a previous release as well.
However, it looks like the release version for this ticket has
been changed to 3.20.0.3.
-Erin