Hi all,
I am trying to fit first a decision tree and then a random forest to a dataset. I know the indicator variables are associated with the target variable (Smoker). However, when I ran the tree model, the output showed only a root node, with no splits at all.
Summary of the Decision Tree model for Classification (built using 'rpart'):
n= 16439
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 16439 713 0 (0.95662753 0.04337247) *
Classification tree:
rpart(formula = Smoker ~ ., data = crs$dataset[crs$train, c(crs$input,
crs$target)], method = "class", parms = list(split = "information"),
control = rpart.control(usesurrogate = 0, maxsurrogate = 0))
Variables actually used in tree construction:
character(0)
Root node error: 713/16439 = 0.043372
n= 16439
  CP nsplit rel error xerror xstd
1  0      0         1      0    0
Time taken: 0.28 secs
Rattle timestamp: 2014-04-10 12:57:53 choitk
======================================================================
When I then estimated a random forest model, I got an error message:
Summary of the Random Forest Model
==================================
Number of observations used to build the model: 16439
Missing value imputation is active.
Call:
randomForest(formula = as.factor(Smoker) ~ .,
data = crs$dataset[crs$sample, c(crs$input, crs$target)],
ntree = 500, mtry = 3, importance = TRUE, replace = FALSE, na.action = na.roughfix)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 3
OOB estimate of error rate: 4.34%
Confusion matrix:
      0 1 class.error
0 15726 0           0
1   713 0           1
Analysis of the Area Under the Curve (AUC)
==========================================
No output generated.
No output generated.
Variable Importance
===================
            0     1 MeanDecreaseAccuracy MeanDecreaseGini
Asian   13.78  3.46                14.50             4.15
White   13.01 -5.08                13.74             6.35
Black   11.37  6.46                12.40             4.87
Grade    9.31 10.89                12.35            35.68
Latino   6.22 12.02                10.07             4.15
Mexican  7.59  3.29                 8.11             3.28
gender   4.70  5.80                 6.49             6.64
HI       5.39 -6.84                 4.21             2.91
AI      -3.80  2.58                -2.67             4.06
Time taken: 13.18 secs
Rattle timestamp: 2014-04-10 12:59:10 choitk
======================================================================
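As a sanity check on the two outputs above, the reported figures are at least consistent with each other. The small sketch below uses only the counts copied from the output (the variable names are mine, chosen for illustration) and recomputes the error rate of a model that predicts class 0 for every row.

```r
# Counts copied from the rpart and randomForest output above.
n_total  <- 16439  # rows used to build both models
n_smoker <- 713    # rows with Smoker == 1

# Error rate of a model that predicts class 0 for every row.
baseline_error <- n_smoker / n_total

# Class proportions, comparable to the (yprob) column of the root node.
class_probs <- c(`0` = 1 - baseline_error, `1` = baseline_error)

print(round(baseline_error, 6))        # compare with the rpart root node error
print(round(100 * baseline_error, 2))  # compare with the OOB error rate, in percent
print(n_total - n_smoker)              # compare with the class-0 row of the confusion matrix
print(class_probs)
```

These match the root node error (0.043372), the OOB error rate (4.34%) and the 15726 class-0 rows, so both models appear to do no better than always predicting non-smoker.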
randomForest 4.6-7
Type rfNews() to see new features/changes/bug fixes.
Type 'citation("pROC")' for a citation.
Attaching package: ‘pROC’
The following object is masked from ‘package:colorspace’:
coords
The following objects are masked from ‘package:stats’:
cov, smooth, var
Error in roc.default(crs$rf$y, crs$rf$votes) :
Response and predictor must be vectors of the same length.
In addition: Warning messages:
1: package ‘pmml’ was built under R version 3.0.3
2: package ‘randomForest’ was built under R version 3.0.3
3: package ‘pROC’ was built under R version 3.0.3
Error in roc.default(response, predictor, ci = FALSE, ...) :
Response and predictor must be vectors of the same length.
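My unconfirmed guess about the ROC error: for a classification forest, `crs$rf$votes` is a matrix with one row per observation and one column per class, so its length is the number of rows times the number of classes, whereas `crs$rf$y` has one element per row. The sketch below uses a small made-up votes matrix (hypothetical values, not my data) and base R only, just to show the length mismatch. Whether picking a single column, such as the one for class "1", is what `roc()` needs here is an assumption I have not verified.

```r
# Hypothetical stand-ins for crs$rf$y and crs$rf$votes (made-up values).
y <- factor(c(0, 0, 1, 0))
votes <- matrix(c(0.9, 0.8, 0.6, 0.7,   # first column: vote share for class "0"
                  0.1, 0.2, 0.4, 0.3),  # second column: vote share for class "1"
                ncol = 2, dimnames = list(NULL, c("0", "1")))

print(length(y))      # one element per observation
print(length(votes))  # rows * columns, so twice as long as y

# Assumption: the column for class "1" is the score to compare against y.
score <- votes[, "1"]
print(length(score) == length(y))
```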
>
I would greatly appreciate it if someone could help me with these dumb questions...
Thanks,
Kelvin