How to get the training error from cross-validation?


2T

Sep 16, 2016, 6:07:50 PM9/16/16
to python-weka-wrapper
Hi,

I'm trying to get the training error of the cross-validation model and happened to try the code below. I believe the first summary is the cross-validation error and the last one is the test error.

May I ask what it's actually doing when I call crossvalidate_model to train the model and then test it over the training set? Since this tests the cross-validated model over the training set, I wonder if it could represent the training error of the cross-validation?


==== Test code ====

    cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3", "-M", "2"])

    cls.build_classifier(train)

    evl = Evaluation(train)
    evl.crossvalidate_model(cls, train, 10, Random(1))
    print(evl.summary())

    evltrain = Evaluation(train)
    evltrain.crossvalidate_model(cls, train, 10, Random(1))
    evltrain.test_model(cls, train)
    print(evltrain.summary())

    evl2 = Evaluation(test)
    evl2.test_model(cls, test)
    print(evl2.summary())


==== Output =====

Correctly Classified Instances        1972               57.5263 %
Incorrectly Classified Instances      1456               42.4737 %
Kappa statistic                          0.3674
Mean absolute error                      0.0816
Root mean squared error                  0.2592
Relative absolute error                 66.397  %
Root relative squared error            104.6286 %
Total Number of Instances             3428

Correctly Classified Instances        5075               74.0228 %
Incorrectly Classified Instances      1781               25.9772 %
Kappa statistic                          0.6124
Mean absolute error                      0.054
Root mean squared error                  0.2006
Relative absolute error                 43.9602 %
Root relative squared error             80.9387 %
Total Number of Instances             6856

Correctly Classified Instances         830               56.4626 %
Incorrectly Classified Instances       640               43.5374 %
Kappa statistic                          0.3529
Mean absolute error                      0.0829
Root mean squared error                  0.2631
Relative absolute error                 67.3634 %
Root relative squared error            106.1937 %
Total Number of Instances             1470

Peter Reutemann

Sep 16, 2016, 7:33:13 PM9/16/16
to python-weka-wrapper
> I'm trying to get training error of the cross-validation model and happen to
> try the below codes. In the first summary, I believe this is cross
> validation error and the last one is test error.

Correct with regard to the output:
1st - cross-validation
2nd - train
3rd - test

However, the code is not quite correct, which is probably what confuses you.

> May I know what it's doing when I call crossvalidate_model to train the
> model and test it over training set?

X-fold cross-validation creates X copies of the classifier template
(do not provide a built model!), which it trains and evaluates on the
train/test set generated for each of the X folds. The statistics
obtained from the X model evaluations get aggregated, and the models
get discarded.
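To make that concrete, here is a pure-Python sketch of the mechanics (no Weka involved; the class, the toy majority-vote classifier, and all names are illustrative, merely mirroring the Weka method names):

```python
import copy
import random

class MajorityClassifier:
    """Toy stand-in for a classifier template; always predicts
    the majority label seen during training."""
    def build_classifier(self, instances):
        labels = [y for _, y in instances]
        self.label = max(set(labels), key=labels.count)

    def classify_instance(self, x):
        return self.label

def crossvalidate_model(template, data, folds, seed):
    """Mimic X-fold cross-validation: copy the template for each
    fold, train the copy on the other folds, evaluate it on the
    held-out fold, aggregate the stats, discard the model."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    correct = total = 0
    for i in range(folds):
        test = [inst for j, inst in enumerate(shuffled) if j % folds == i]
        train = [inst for j, inst in enumerate(shuffled) if j % folds != i]
        cls = copy.deepcopy(template)  # fresh copy, never the built model
        cls.build_classifier(train)
        correct += sum(1 for x, y in test if cls.classify_instance(x) == y)
        total += len(test)
        # the per-fold model goes out of scope here and is discarded
    return correct / total  # aggregated accuracy over all folds
```

The key point the sketch illustrates: only the aggregated statistics survive; none of the X fold models is the model you get from build_classifier on the full training set.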

> You are testing the averaged of
> cross_validated model over training sets so I wonder if it could represent
> training error of the cross validation?

Not quite sure what you mean...

Here's the correct code:

cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3", "-M", "2"])

# cross-validation
evlCV = Evaluation(train)
evlCV.crossvalidate_model(cls, train, 10, Random(1))
print(evlCV.summary(title="cross-validation"))

# now we build the classifier
cls.build_classifier(train)

# evaluate the built model on the training set
evlTrain = Evaluation(train)
evlTrain.test_model(cls, train)
print(evlTrain.summary(title="train"))

# evaluate the built model on the test set
evlTest = Evaluation(test)
evlTest.test_model(cls, test)
print(evlTest.summary(title="test"))

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

2T

Sep 16, 2016, 9:00:05 PM9/16/16
to python-weka-wrapper, frac...@waikato.ac.nz
Hi Peter,

Thank you very much for the quick response. As for the #2 "train" error, what I was trying to calculate was the average training error within the 10-fold cross-validation. I understand Weka outputs the final result after running the 10 folds (90% train / 10% test within each fold) plus one iteration over the full training set. I was trying to calculate the average training error within the cross-validation folds (i.e. train vs. test performance within the cross-validation).

Sorry if this sounds confusing; I'm still learning and the terminology is a bit blurry. I was curious what the program actually calculates when I lay out the code below (i.e. the "wrong" way). My hope was that it calculated the training error within the cross-validation I evaluated earlier.

    cls.build_classifier(train)

    evltrain = Evaluation(train)
    evltrain.crossvalidate_model(cls, train, 10, Random(1))
    evltrain.test_model(cls, train)


Regards,
Tsuji

On Saturday, September 17, 2016 at 8:33:13 AM UTC+9, Peter Reutemann wrote:

Peter Reutemann

Sep 16, 2016, 9:21:21 PM9/16/16
to python-weka-wrapper
> Thank you very much for a quick response. As for the #2 "Train" error, what
> I was trying to calculate was average training error within 10 folds cross
> validation. I understand weka outputs the final result after running 10
> folds (90 train/10 test within a fold) + 1 iteration over a full train data
> set. I was trying to calculate the average of the training error within the
> cross-validation folds. (i.e. train vs test within the performance of the
> cross validation)
>
> Sorry it sounds very confusing as I'm learning and terminology is a bit
> blur. I was curious to know what the program is actually calculating when I
> layout the below (i.e. the "wrong" way). My hope was it calculated training
> error within the cross-validation I evaluated earlier.
>
> cls.build_classifier(train)
>
> evltrain = Evaluation(train)
>
> evltrain.crossvalidate_model(cls, train, 10, Random(1))
>
> evltrain.test_model(cls, train)

I'm actually not sure what you'll end up with when you run the above
code, as you're mixing the statistics from CV with the ones obtained
from the train set.
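One plausible reading (an assumption on my part, though consistent with the second summary's instance count of 6856 = 2 x 3428): the evaluation object accumulates statistics across calls, so calling test_model on the same object after crossvalidate_model mixes the CV predictions with the train-set predictions. A toy sketch of such an accumulator (all names illustrative):

```python
class ToyEvaluation:
    """Illustrative accumulator: each evaluation call adds to the
    same running counts instead of resetting them."""
    def __init__(self):
        self.correct = 0
        self.total = 0

    def record(self, predictions, labels):
        self.correct += sum(p == y for p, y in zip(predictions, labels))
        self.total += len(labels)

    def accuracy(self):
        return self.correct / self.total

evl = ToyEvaluation()
evl.record(["a", "b", "a"], ["a", "a", "a"])  # e.g. a cross-validation pass
evl.record(["a", "a", "a"], ["a", "a", "a"])  # then a train-set pass
# the summary now covers 6 instances, mixing both evaluations
```

That would explain why the second summary's numbers are neither the CV error nor the train error, but a blend of both.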

From what it sounds like, you would like to investigate the results
for each of the train/test fold pairs, right? In that case, you might
want to have a look at the following example, which simulates
cross-validation by generating the train/test pairs:
https://github.com/fracpete/python-weka-wrapper-examples/blob/master/src/wekaexamples/classifiers/crossvalidation_addprediction.py

However, you will need to instantiate a new Evaluation object inside
the for-loop, since you don't want to aggregate the statistics (as the
example does), but rather have them for each pair.
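The per-fold pattern can be sketched in pure Python as follows (the toy classifier and all names are illustrative, not the python-weka-wrapper API): a fresh evaluation per fold, recording both the training error and the test error of that fold's model.

```python
import copy
import random

class ConstantClassifier:
    """Toy classifier: predicts whatever label it saw first."""
    def build_classifier(self, instances):
        self.label = instances[0][1]

    def classify_instance(self, x):
        return self.label

def per_fold_errors(template, data, folds, seed):
    """Simulated cross-validation that keeps a separate result per
    fold: (training error, test error) of each fold's model."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    results = []
    for i in range(folds):
        test = [inst for j, inst in enumerate(shuffled) if j % folds == i]
        train = [inst for j, inst in enumerate(shuffled) if j % folds != i]
        cls = copy.deepcopy(template)
        cls.build_classifier(train)
        # a fresh "evaluation" per fold instead of aggregating
        train_err = sum(cls.classify_instance(x) != y for x, y in train) / len(train)
        test_err = sum(cls.classify_instance(x) != y for x, y in test) / len(test)
        results.append((train_err, test_err))
    return results
```

Averaging the first element of each pair would then give the "average training error within the cross-validation folds" asked about above.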

HTH

2T

Sep 16, 2016, 10:17:23 PM9/16/16
to python-weka-wrapper, frac...@waikato.ac.nz
Hi Peter,

Yes! That's what I'm trying to do. Somehow, the code I wrote was giving me the pretty attractive result I was looking for, but it's good to know that's not correct and just a coincidence.
Let me spend some time reviewing the link and I'll see how it goes.

Again, thank you very much for the great support.

Regards,
Tsuji

On Saturday, September 17, 2016 at 10:21:21 AM UTC+9, Peter Reutemann wrote: