Hi,
I am trying to run Random Forest in H2O on Python 2.7 for a classification task. How do I specify that this is a classification task rather than a regression? I do not see an appropriate parameter.
classification = TRUE,
However, there is no such parameter here:
Also, the H2O documentation for Python appears to contain blocks of R code.
I want to be able to output the AUC and ROC, but the model seems to be running as a regression.
> rf = H2ORandomForestEstimator(seed=12, ntrees=10, max_depth=20, balance_classes=False)
> rf.train(x=X, y=Y, training_frame=df_h2o_train_hex, validation_frame=df_h2o_valid_hex)
Model Details
=============
H2ORandomForestEstimator : Distributed Random Forest
Model Key: DRF_model_python_1455741432772_1
Model Summary:
| number_of_trees | model_size_in_bytes | min_depth | max_depth | mean_depth | min_leaves | max_leaves | mean_leaves |
| 10.0 | 624029.0 | 20.0 | 20.0 | 20.0 | 4502.0 | 5701.0 | 5365.2 |
ModelMetricsRegression: drf
** Reported on train data. **
MSE: 0.203627622723
R^2: 0.083251815499
Mean Residual Deviance: 0.203627622723
ModelMetricsRegression: drf
** Reported on validation data. **
MSE: 0.190313952689
R^2: 0.151951524407
Mean Residual Deviance: 0.190313952689
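
For reference, a minimal sketch of the usual way around this, assuming the response column Y holds numeric class labels such as 0/1. H2O picks regression or classification from the type of the response column, so converting that column to a factor with asfactor() before training should switch the model to classification. The helper names as_classification_frame and train_drf_classifier are made up for illustration; the frame and column names are the ones from the calls above.

```python
def as_classification_frame(frame, response):
    """Convert the response column of an H2OFrame to a categorical (factor)."""
    # A numeric response column makes H2O train a regression model;
    # a factor (enum) response column makes it train a classifier.
    frame[response] = frame[response].asfactor()
    return frame


def train_drf_classifier(train, valid, predictors, response):
    """Train a Distributed Random Forest on a factor response (hypothetical helper)."""
    # Imported lazily so the helper above can be used without h2o installed.
    from h2o.estimators.random_forest import H2ORandomForestEstimator

    # Both frames need the response as a factor, otherwise the
    # validation metrics would not match the training task.
    train = as_classification_frame(train, response)
    valid = as_classification_frame(valid, response)

    # Same settings as in the original call.
    rf = H2ORandomForestEstimator(seed=12, ntrees=10, max_depth=20,
                                  balance_classes=False)
    rf.train(x=predictors, y=response,
             training_frame=train, validation_frame=valid)
    return rf
```

With a running cluster this would be called as rf = train_drf_classifier(df_h2o_train_hex, df_h2o_valid_hex, X, Y). For a two-class response the model summary should then report binomial metrics instead of ModelMetricsRegression, and rf.auc(valid=True) should return the validation AUC.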