Re: How to calculate the performance of a decision tree?

Mauricio Cirelli

May 10, 2014, 6:16:47 PM

to accor...@googlegroups.com

Well, you can use CrossValidation to evaluate the performance of any machine learning algorithm.

Regarding your main question: cross-validation gives a more reliable estimate of your model's performance the larger your data set is. On small data sets, it is very likely that your decision tree will overfit the data, so it will answer the samples it was trained on correctly without that telling you much about new data.
I could see from your code that you are using 14 data samples (too few) and 7 folds. In your case, 7-fold cross-validation will create 7 subsets of 2 samples each. It will train on 12 samples and test on the remaining 2, repeating that 7 times so that each subset is used for testing once.
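
To make the fold arithmetic concrete, here is a small Python sketch (plain Python, not Accord.NET code; the helper name `kfold_indices` is made up for this illustration, and I am assuming contiguous folds with no shuffling). It splits 14 sample indices into 7 folds and prints the train/test sizes for each round:

```python
def kfold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds, no shuffling."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 14 samples and 7 folds, as in the question.
splits = list(kfold_indices(14, 7))
for round_no, (train, test) in enumerate(splits, start=1):
    print(f"round {round_no}: train on {len(train)}, test on {len(test)}")
```

With only 2 test samples per round, each round's error can only be 0%, 50% or 100%, which is part of why the estimate is so coarse on a data set this small.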

If you cannot get more data samples, then I would recommend 2-fold cross-validation. It splits your data into two sets of 7 samples, training on one and testing on the other, then swapping them. That would give a better estimate of your model error. In general, 2-fold and 10-fold cross-validation are the more common choices, but you definitely need more samples. Try to get at least 100 and run 10-fold cross-validation.
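
As a rough sketch of what a 10-fold run over 100 samples does, here is a plain Python version of the loop. Everything in it is an assumption for illustration: `cross_val_error`, `fit_majority` and `predict_majority` are hypothetical names, the data is synthetic, and the "model" is a trivial majority-class predictor standing in for your decision tree. It is not how Accord.NET implements it.

```python
from collections import Counter


def cross_val_error(samples, labels, k, fit, predict):
    """Average misclassification rate over k folds."""
    n = len(samples)
    fold_errors = []
    for fold in range(k):
        # Hold out every k-th sample, starting at index `fold`, for testing.
        test_idx = set(range(fold, n, k))
        train_x = [samples[i] for i in range(n) if i not in test_idx]
        train_y = [labels[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        wrong = sum(predict(model, samples[i]) != labels[i] for i in test_idx)
        fold_errors.append(wrong / len(test_idx))
    return sum(fold_errors) / k


# Stand-in model (assumption): always predicts the most frequent training label.
def fit_majority(train_x, train_y):
    return Counter(train_y).most_common(1)[0][0]


def predict_majority(model, sample):
    return model


# Synthetic data (assumption): 100 samples, 25 labelled "no" and 75 "yes".
samples = list(range(100))
labels = ["no" if i % 4 == 0 else "yes" for i in samples]

error = cross_val_error(samples, labels, 10, fit_majority, predict_majority)
print(f"10-fold cross-validation error: {error:.2f}")
```

To try it with a real learner, you would swap `fit_majority` and `predict_majority` for functions that train your tree and classify one sample.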

I hope this helps you.