Re: [computationalstylistics] Cross-Validation

Maciej Eder

Dec 13, 2017, 10:29:28 AM
to computationalstylistics, Alvaro Cuellar
Dear Alvaro,

Performing cross-validation is relatively straightforward with the function classify(); no manual swapping between the two sets is needed. You prepare your "primary_set" and "secondary_set", and then you type, e.g.:

classify(cv.folds = 10)

or, if you want access to the results for particular CV folds:

# perform the classification:
results = classify(cv.folds = 10)
# get the classification accuracy:
results$cross.validation.summary

This will give you stratified cross-validation, i.e. the variant that reproduces the representation of classes from your training set (the "primary_set") in N random iterations.
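To make the idea of "reproducing the representation of classes" concrete, here is a minimal base-R sketch. It is only an illustration of the principle as described above, not the code that classify() runs internally; the sample names and the helper functions class.of() and stratified.split() are hypothetical.

```r
# Hypothetical sample names following the "class_title" naming convention
primary = c("authorA_text1", "authorA_text2",
            "authorB_text1", "authorB_text2", "authorB_text3")
secondary = c("authorA_text3", "authorA_text4",
              "authorB_text4", "authorB_text5")

# the class label is the part of the name before the first underscore
class.of = function(sample.names) sub("_.*", "", sample.names)

# draw one random split that keeps the original number of
# training texts per class; everything else becomes the test set
stratified.split = function(primary, secondary) {
  pool = c(primary, secondary)
  quota = lengths(split(primary, class.of(primary)))
  new.training = unlist(lapply(names(quota), function(cl) {
    sample(pool[class.of(pool) == cl], quota[[cl]])
  }))
  list(training = new.training, test = setdiff(pool, new.training))
}

# ten random iterations, analogous in spirit to cv.folds = 10
set.seed(1)
folds = replicate(10, stratified.split(primary, secondary), simplify = FALSE)

# class counts in the first reshuffled training set
table(class.of(folds[[1]]$training))
```

In every iteration the reshuffled training set contains two texts of the first class and three of the second, exactly as in the original "primary_set"; only the choice of individual texts changes.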

Now, there is a function crossv() that is meant to replace some core fragments of classify() in the future. I am not there yet, though (as always, for lack of time). So far, it is a beta-version function with some basic functionality. To perform leave-one-out cross-validation, you prepare a single training set only and put all your texts there. Then you have to load the corpus and prepare a document-term matrix. Let's assume you've already got one:

library(stylo)
data(galbraith)

Type help(galbraith) to see what the matrix contains. Then you type:

crossv(training.set = galbraith, cv.mode = "leaveoneout", classification.method = "svm")
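For readers who want to see what the leave-one-out loop amounts to, here is a small base-R sketch of the general technique. It does not reproduce what crossv() does internally: the data set (R's built-in iris table) and the nearest-centroid classifier are stand-ins chosen only to keep the example self-contained, and nearest.centroid() is a hypothetical helper.

```r
# Toy data: one row per sample, plus one class label per row
features = as.matrix(iris[, 1:4])
labels = as.character(iris$Species)

# stand-in classifier (an assumption, not SVM): pick the class
# whose mean vector is closest to the held-out sample
nearest.centroid = function(train.x, train.y, new.x) {
  centroids = t(sapply(split(as.data.frame(train.x), train.y), colMeans))
  distances = apply(centroids, 1, function(centre) sqrt(sum((centre - new.x)^2)))
  names(which.min(distances))
}

# leave-one-out: each sample in turn is removed from the training
# data, classified, and put back before the next one is taken out
predicted = vapply(seq_len(nrow(features)), function(i) {
  nearest.centroid(features[-i, , drop = FALSE], labels[-i], features[i, ])
}, character(1))

# share of samples that were attributed to their true class
accuracy = mean(predicted == labels)
accuracy
```

With N samples this trains N classifiers, so on a corpus of a few hundred texts it is slower than 10-fold cross-validation but uses every text as a test case exactly once.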

To build the document-term matrix yourself, a few more steps are needed beforehand:

library(stylo)
# loading the corpus
texts = load.corpus.and.parse(files = "all", corpus.dir = "corpus")
# getting a general frequency list
freq.list = make.frequency.list(texts, head = 1000)
# preparing the document-term matrix:
word.frequencies = make.table.of.frequencies(corpus = texts, features = freq.list)
# now the main procedure takes place:
crossv(training.set = word.frequencies, cv.mode = "leaveoneout", classification.method = "svm")
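The original question also asked for 300 MFW and 30% culling, which the steps above do not apply. One possible way to add them, sketched here under some assumptions: perform.culling() is the culling helper shipped with stylo, the columns of the table are ordered from the most to the least frequent word, and culling is applied before the MFW cut-off (the opposite order is equally defensible). The galbraith table stands in for your own word.frequencies so that the example runs as is.

```r
library(stylo)

# example table shipped with stylo; replace it with your own
# word.frequencies built in the steps above
data(galbraith)
word.frequencies = galbraith

# culling at 30%: keep only the words that occur in at least
# 30% of the texts
culled = perform.culling(word.frequencies, culling.level = 30)

# keep the 300 most frequent of the remaining words
# (or fewer, if culling left fewer than 300 columns)
mfw = min(300, ncol(culled))
culled = culled[, 1:mfw]

# leave-one-out cross-validation on the reduced table
crossv(training.set = culled, cv.mode = "leaveoneout", classification.method = "svm")
```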


I hope this helps. 

All the best,
Maciej 

2017-12-01 11:40 GMT+01:00 Alvaro Cuellar <alvarocu...@hotmail.com>:
Dear colleagues,

I am working with 300 texts and I want to perform cross-validation on them. Until now, I have been using classify(): I use a primary_set and a secondary_set, move texts from the primary set to the secondary one, run the procedure, and record the results manually in Excel. However, I have recently seen that there is a crossv() function that can perform a "leave-one-out cross-validation, which moves one sample from the train set to the test set, performs a classification, and then repeats the same procedure until the available samples are exhausted."

That is exactly what I want!

Unfortunately, my knowledge of computing is quite poor. When I try to use it with my corpus, it says that "training.set" is missing, and I don't know how to create it.  :(

In short, I have 300 texts and I want to run a leave-one-out cross-validation with, for example, SVM, 300 MFW, and 30% culling. Could you help me do it?

Thank you very much!

Álvaro Cuéllar
University of Valladolid

--
You received this message because you are subscribed to the Google Groups "computationalstylistics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to computationalstylistics+unsub...@googlegroups.com.
Visit this group at https://groups.google.com/group/computationalstylistics.
For more options, visit https://groups.google.com/d/optout.

Jan Rybicki

Dec 14, 2017, 6:54:49 AM
to Alvaro Cuellar, computationalstylistics
Well done, Maciej!
Best
Jan

On Thu, 14 Dec 2017 at 12:34, Alvaro Cuellar <alvarocu...@hotmail.com> wrote:
Dear Maciej,

Now I understand everything! It works!

What a nice answer, I really appreciate it. 

Best regards.
Álvaro Cuéllar
