I am new to Rattle and need some help


Kelvin Choi

Apr 10, 2014, 1:00:38 PM
to rattle...@googlegroups.com
Hi all,

I am trying to fit first a decision tree model and then a random forest model to a dataset. I know the indicator variables are associated with the target variable. However, when I ran the tree model, the output showed only a root node.

Summary of the Decision Tree model for Classification (built using 'rpart'):

n= 16439

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 16439 713 0 (0.95662753 0.04337247) *

Classification tree:
rpart(formula = Smoker ~ ., data = crs$dataset[crs$train, c(crs$input,
    crs$target)], method = "class", parms = list(split = "information"),
    control = rpart.control(usesurrogate = 0, maxsurrogate = 0))

Variables actually used in tree construction:
character(0)

Root node error: 713/16439 = 0.043372

n= 16439

  CP nsplit rel error xerror xstd
1  0      0         1      0    0

Time taken: 0.28 secs

Rattle timestamp: 2014-04-10 12:57:53 choitk
======================================================================

And when I fit a random forest model, I get an error message:

Summary of the Random Forest Model
==================================

Number of observations used to build the model: 16439
Missing value imputation is active.

Call:
 randomForest(formula = as.factor(Smoker) ~ .,
              data = crs$dataset[crs$sample, c(crs$input, crs$target)],
              ntree = 500, mtry = 3, importance = TRUE, replace = FALSE, na.action = na.roughfix)

               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 3

        OOB estimate of  error rate: 4.34%
Confusion matrix:
      0 1 class.error
0 15726 0           0
1   713 0           1

Analysis of the Area Under the Curve (AUC) 
==========================================
No output generated.

No output generated.

Variable Importance
===================

            0     1 MeanDecreaseAccuracy MeanDecreaseGini
Asian   13.78  3.46                14.50             4.15
White   13.01 -5.08                13.74             6.35
Black   11.37  6.46                12.40             4.87
Grade    9.31 10.89                12.35            35.68
Latino   6.22 12.02                10.07             4.15
Mexican  7.59  3.29                 8.11             3.28
gender   4.70  5.80                 6.49             6.64
HI       5.39 -6.84                 4.21             2.91
AI      -3.80  2.58                -2.67             4.06

Time taken: 13.18 secs

Rattle timestamp: 2014-04-10 12:59:10 choitk
======================================================================

randomForest 4.6-7
Type rfNews() to see new features/changes/bug fixes.
Type 'citation("pROC")' for a citation.

Attaching package: ‘pROC’

The following object is masked from ‘package:colorspace’:

    coords

The following objects are masked from ‘package:stats’:

    cov, smooth, var

Error in roc.default(crs$rf$y, crs$rf$votes) :
  Response and predictor must be vectors of the same length.
In addition: Warning messages:
1: package ‘pmml’ was built under R version 3.0.3
2: package ‘randomForest’ was built under R version 3.0.3
3: package ‘pROC’ was built under R version 3.0.3
Error in roc.default(response, predictor, ci = FALSE, ...) :
  Response and predictor must be vectors of the same length.
>
I would greatly appreciate it if someone could help me with these questions.

Thanks,
Kelvin
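
A note on the root-only tree in the post above: only 713 of 16439 training rows (about 4.3%) are in class 1, and the rpart call uses the default complexity parameter (cp = 0.01). With a class that rare, predicting 0 everywhere can already look hard to beat, so no split may clear the threshold even when the predictors are related to the target. The sketch below is only an illustration on made-up data: the data frame `toy`, its coefficients, and the chosen cp and prior values are assumptions, not anything taken from the original dataset.

```r
library(rpart)

set.seed(42)
n <- 5000

# Hypothetical stand-in data: a rare positive class that is weakly
# related to two predictors named after columns in the post.
toy <- data.frame(
  Grade  = sample(6:12, n, replace = TRUE),
  gender = sample(0:1, n, replace = TRUE)
)
p <- plogis(-6 + 0.3 * toy$Grade + 0.3 * toy$gender)
toy$Smoker <- factor(rbinom(n, 1, p), levels = c(0, 1))

# Same style of call as in the post: default cp = 0.01.
fit_default <- rpart(Smoker ~ ., data = toy, method = "class",
                     parms = list(split = "information"))

# Two adjustments that are often tried with a rare class:
# equal class priors and a smaller complexity parameter.
fit_prior <- rpart(Smoker ~ ., data = toy, method = "class",
                   parms = list(split = "information",
                                prior = c(0.5, 0.5)),
                   control = rpart.control(cp = 0.001))

print(fit_default)
print(fit_prior)
```

Comparing the two printed trees shows whether the priors and the lower cp allow any splits. Changing the priors also changes what the tree optimizes, so the resulting error rates are not directly comparable with the 4.3% baseline.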

Rick Gordon

Apr 11, 2014, 11:55:25 AM
to rattle...@googlegroups.com
This appears to be a problem with your data.
The message:

     Response and predictor must be vectors of the same length.

means you need to look at the two vectors in question. Maybe one has some NAs in it?
Make sure they line up (same number of elements, and so on).
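
The length check suggested above can be sketched as follows. Judging only from the call shown in the error message, roc.default(crs$rf$y, crs$rf$votes), one plausible cause is that the votes component of a randomForest fit is a matrix with one column per class, so it holds twice as many values as the response has elements. The objects `y`, `votes`, and `p1` below are made-up stand-ins for crs$rf$y and crs$rf$votes, and the pROC call runs only if that package is installed.

```r
# Hypothetical stand-ins: 'y' plays the role of crs$rf$y and 'votes'
# the role of crs$rf$votes (one column of vote fractions per class).
y <- factor(c(0, 0, 1, 0, 1, 0), levels = c(0, 1))
votes <- cbind("0" = c(0.90, 0.80, 0.30, 0.70, 0.40, 0.95),
               "1" = c(0.10, 0.20, 0.70, 0.30, 0.60, 0.05))

length(y)      # 6
length(votes)  # 12, because length() of a matrix counts every cell
dim(votes)     # 6 rows, 2 columns

# Keep only the column for the positive class so the lengths agree.
p1 <- votes[, "1"]
stopifnot(length(p1) == length(y))

# Also look for missing values in either vector.
sum(is.na(y))
sum(is.na(p1))

# With equal-length vectors the ROC call has a valid shape.
if (requireNamespace("pROC", quietly = TRUE)) {
  roc_obj <- pROC::roc(response = y, predictor = p1)
  print(pROC::auc(roc_obj))
}
```

If the lengths already match on the real objects, the cause lies elsewhere and the NA counts are the next thing to inspect.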

Kelvin Choi

Apr 14, 2014, 4:38:56 PM
to rattle...@googlegroups.com
Thanks! I have looked into the dataset, and the variables are coded either 1 vs. 0 or 1 vs. 2. Do I have to change them all to 1 vs. 0?
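
For reference, recoding a 1 vs. 2 variable is a one-line operation. A tree split on a two-valued numeric column separates the same rows whether the values are 0/1 or 1/2, so the recoding is mostly a matter of consistency; what matters more is that the target is stored as a factor. This is a minimal sketch on a made-up data frame: `df`, `gender01`, `gender_f`, and the level labels are hypothetical names used only for illustration.

```r
# Hypothetical data: 'gender' coded 1/2 with one missing value,
# 'Smoker' coded 0/1.
df <- data.frame(gender = c(1, 2, 2, 1, NA),
                 Smoker = c(0, 1, 0, 0, 1))

# Recode 1/2 to 0/1; missing values stay missing.
df$gender01 <- df$gender - 1

# Alternatively, make the variable explicitly categorical.
df$gender_f <- factor(df$gender, levels = c(1, 2),
                      labels = c("level1", "level2"))

# Store the target as a factor so models treat it as a class label.
df$Smoker <- factor(df$Smoker, levels = c(0, 1))

# Quick checks: cross-tabulate old and new codes, count NAs per column.
print(table(df$gender, df$gender01, useNA = "ifany"))
print(colSums(is.na(df)))
```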