What is "Cross-validation" in Test options under "Classify" panel ?

Kumar Chinnakali

Jul 7, 2014, 12:00:43 PM7/7/14
to wekamooc...@googlegroups.com
Team, tons of thanks for your time!

What is "Cross-validation" in Test options under "Classify" panel ? Also like to get it clarified of my understandings.
  • Use training set - Uses 100% of the data for training, and the same 100% is used for testing (a single dataset is needed)
  • Supplied test set - The training dataset is different from the test dataset (two datasets are needed)
  • Cross-validation - Could you please help me understand this one?
  • Percentage split - From the single dataset provided, Weka takes care of splitting it according to the percentage value we specify (a single dataset is needed)
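
Below is a rough sketch of how I understand the first, second, and fourth options, assuming the Weka Java API; the file names (data.arff, test.arff) and the J48 classifier are just placeholders I picked for the example, not anything from the course:

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class TestOptionsSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder dataset; the last attribute is assumed to be the class.
            Instances data = DataSource.read("data.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // Use training set: train on 100% of the data, test on the same 100%.
            J48 classifier = new J48();
            classifier.buildClassifier(data);
            Evaluation evalTrain = new Evaluation(data);
            evalTrain.evaluateModel(classifier, data);

            // Supplied test set: a second, separate dataset is used for testing.
            Instances test = DataSource.read("test.arff");
            test.setClassIndex(test.numAttributes() - 1);
            Evaluation evalSupplied = new Evaluation(data);
            evalSupplied.evaluateModel(classifier, test);

            // Percentage split: one dataset, shuffled and split (e.g. 66% / 34%).
            Instances copy = new Instances(data);
            copy.randomize(new Random(1));
            int trainSize = (int) Math.round(copy.numInstances() * 0.66);
            Instances train = new Instances(copy, 0, trainSize);
            Instances holdout = new Instances(copy, trainSize, copy.numInstances() - trainSize);
            J48 splitClassifier = new J48();
            splitClassifier.buildClassifier(train);
            Evaluation evalSplit = new Evaluation(train);
            evalSplit.evaluateModel(splitClassifier, holdout);

            System.out.println(evalTrain.toSummaryString("=== Use training set ===", false));
            System.out.println(evalSupplied.toSummaryString("=== Supplied test set ===", false));
            System.out.println(evalSplit.toSummaryString("=== Percentage split ===", false));
        }
    }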
Yours in learning,

Kumar Chinnakali

CRistobal Bonillo

Jul 7, 2014, 12:33:47 PM7/7/14
to wekamooc...@googlegroups.com
Hi Kumar,

    I'll try to explain briefly:

    In k-fold cross-validation, the original data is randomly partitioned into k subsamples (folds). The algorithm uses a single fold for testing and the remaining k-1 folds for training. The process is repeated k times (once for each fold), so each fold is used exactly once for testing and k-1 times for training.
    Finally, the k results from the folds can be averaged to produce a single estimate.
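
    If it helps, here is a minimal sketch of the same idea with the Weka Java API (the data.arff file name and the J48 classifier are just placeholders I am assuming for the example):

        import java.util.Random;
        import weka.classifiers.Evaluation;
        import weka.classifiers.trees.J48;
        import weka.core.Instances;
        import weka.core.converters.ConverterUtils.DataSource;

        public class CrossValidationSketch {
            public static void main(String[] args) throws Exception {
                Instances data = DataSource.read("data.arff"); // placeholder dataset
                data.setClassIndex(data.numAttributes() - 1);

                // 10-fold cross-validation: the data is split into 10 folds,
                // each fold is used once for testing and 9 times for training,
                // and the 10 results are combined into a single estimate.
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(new J48(), data, 10, new Random(1));

                System.out.println(eval.toSummaryString("=== 10-fold cross-validation ===", false));
            }
        }

    This is the same kind of summary report the Explorer shows in the Classify panel when you select the Cross-validation test option.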

    You can read more in Sections 5.3 and 5.4 of the textbook.

Cristobal Bonillo
Community TA
Spain

CRistobal Bonillo

Jul 7, 2014, 12:41:49 PM7/7/14
to wekamooc...@googlegroups.com
Hello again,

I forgot to answer the other questions: your understanding of the other options is correct.

Well done!


Cristobal Bonillo
Community TA
Spain


Thales Maciel

Jul 7, 2014, 12:42:03 PM7/7/14
to wekamooc...@googlegroups.com
Hello, Kumar!

Are you from India? I hear Kumar is a very popular name there. =)

Cross-validation means extracting a certain number of equal slices from the entire training dataset and performing the classification the same number of times, each time using one of these slices for testing and the remaining slices together for training. These slices are called "folds", and this is why you have, for example, 10-fold cross-validation.
After performing that, WEKA will run the algorithm one more time, on all the data, to build the model to be used in practice, but this is just a detail, to give you a complete answer.
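
As a rough illustration (again assuming the Weka Java API, with a placeholder data.arff file and J48 chosen just as an example):

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class FinalModelSketch {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data.arff"); // placeholder dataset
            data.setClassIndex(data.numAttributes() - 1);

            // The performance figures come from the 10 folds...
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new J48(), data, 10, new Random(1));
            System.out.println(eval.toSummaryString());

            // ...but the model reported at the end is built one more time on 100% of the data.
            J48 finalModel = new J48();
            finalModel.buildClassifier(data);
            System.out.println(finalModel);
        }
    }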

I hope it helps.

Cheers,

Thales Vaz Maciel
Bachelor's degree in Information Systems - Universidade da Região da Campanha, Bagé, RS, Brazil
Specialist in Distributed Systems - Universidade Federal do Pampa, Bagé, RS, Brazil
Master's student in Computer Engineering - Universidade Federal do Rio Grande, Rio Grande, RS, Brazil
IT Professional - Prefeitura Municipal de Bagé, Bagé, RS, Brazil

http://lattes.cnpq.br/7166030596636868
F: +55 (53) 99419258 (new)



Kumar Chinnakali

Jul 7, 2014, 6:49:44 PM7/7/14
to wekamooc...@googlegroups.com
Thank you, got it!

Gabriel Santos

Jul 7, 2014, 10:45:49 PM7/7/14
to wekamooc...@googlegroups.com
Hi Kumar,

I just want to make sure you understood the concept of k-fold cross-validation well, especially the last step.

The last run (e.g. if you choose 10-fold cross-validation, the last one is the 11th) uses 100% of the data to build the final model.

BR,
Gabriel Santos
Community TA