Re: [computationalstylistics] Cross-Validation

Maciej Eder

Dec 13, 2017, 10:29:28 AM

to computationalstylistics, Alvaro Cuellar

Dear Alvaro,

Performing cross-validation is relatively straightforward with the function classify(); no manual swapping of texts between the two sets is needed. You prepare your "primary_set" and "secondary_set", and then you type, e.g.:

classify(cv.folds = 10)

or, if you want access to the results for particular cv folds:

# perform the classification:
results = classify(cv.folds = 10)

# get the classification accuracy:
results$cross.validation.summary

This gives you stratified cross-validation, i.e. the variant that reproduces the proportions of classes found in your training set ("primary_set") in each of N random iterations.
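For the settings asked about in the original question (SVM, 300 MFW, 30% culling), the same call can be run without the GUI. What follows is a minimal sketch, assuming the default "primary_set" and "secondary_set" folders exist in the working directory. The argument names are the usual stylo settings passed on to classify(); please check help(classify) and help(stylo.default.settings) against your installed version before relying on them.

```r
library(stylo)

# Assumption: "primary_set" and "secondary_set" sit in the working directory.
# A single MFW value and a single culling value are requested by setting
# the min and max of each range to the same number.
results = classify(gui = FALSE,
                   classification.method = "svm",
                   mfw.min = 300, mfw.max = 300,
                   culling.min = 30, culling.max = 30,
                   cv.folds = 10)

# accuracy scores obtained in the cross-validation folds:
results$cross.validation.summary
```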

Now, there is a function crossv() that is meant to replace some core fragments of classify() in the future. I am not there yet, though (as always: lack-of-time-related issues). So far, it's a beta-version function with some basic functionality. To perform leave-one-out, you prepare a training set only, and put all your texts there. Then you have to load the corpus and prepare a document-term matrix. Let's assume you've already got it:

library(stylo)
data(galbraith)

Type help(galbraith) to see what the matrix contains. Then you type:

crossv(training.set = galbraith, cv.mode = "leaveoneout", classification.method = "svm")
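Before running the call above, it can help to look at how the matrix is laid out. The sketch below assumes the usual stylo naming convention, in which the class (e.g. the author) is the part of each row name before the first underscore; the variable name classes is introduced here for illustration only.

```r
library(stylo)
data(galbraith)

# rows are texts, columns are words (features)
dim(galbraith)
head(rownames(galbraith))

# Assumption: class labels are the part of the row name before the first "_"
classes = gsub("_.*", "", rownames(galbraith))

# number of texts per class
table(classes)
```

If a class is represented by a single text, leave-one-out has nothing left to train on for that class once the text is moved to the test set, so the table is worth a glance before interpreting the scores.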

To build the document-term matrix, some more steps have to be undertaken beforehand:

library(stylo)

# loading the corpus
texts = load.corpus.and.parse(files = "all", corpus.dir = "corpus")

# getting a general frequency list
freq.list = make.frequency.list(texts, head = 1000)

# preparing the document-term matrix:
word.frequencies = make.table.of.frequencies(corpus = texts, features = freq.list)

# now the main procedure takes place:
crossv(training.set = word.frequencies, cv.mode = "leaveoneout", classification.method = "svm")
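The question also mentioned 300 MFW and 30% culling, which crossv() does not apply by itself as far as the steps above show. A minimal sketch of doing it by hand follows, using the galbraith matrix as a stand-in so that it runs without a corpus; substitute your own word.frequencies. The order (culling first, then taking the top words) and the names culled, n.mfw, culled.mfw and cv.results are assumptions made for illustration.

```r
library(stylo)
data(galbraith)   # stand-in for your own word.frequencies matrix

# keep only the words that occur in at least 30% of the texts
culled = perform.culling(galbraith, culling.level = 30)

# Assumption: columns are ordered by descending frequency, so the first
# 300 columns that survive culling are the 300 most frequent words
n.mfw = min(300, ncol(culled))
culled.mfw = culled[, 1:n.mfw]

# leave-one-out cross-validation with SVM on the reduced matrix
cv.results = crossv(training.set = culled.mfw, cv.mode = "leaveoneout",
                    classification.method = "svm")
cv.results
```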

I hope this helps.

All the best,

Maciej

2017-12-01 11:40 GMT+01:00 Alvaro Cuellar <alvarocu...@hotmail.com>:

Dear colleagues,

I am working with 300 texts and I want to perform a cross-validation with them. Until now, I have been using classify(): I use a primary_set and a secondary_set, I take texts out of the primary set and put them in the secondary one. I run the procedure and record the results manually in Excel. However, I have recently seen that there is a crossv() function and it can perform a "leave-one-out cross-validation, which moves one sample from the train set to the test set, performs a classification, and then repeats the same procedure until the available samples are exhausted."

That is exactly what I want!

Unfortunately, my knowledge of computing is very limited. When I try to use it with my corpus, it says: "training.set" is missing, and I don't know how to create it. :(

In short, I have 300 texts and I want to run a leave-one-out cross-validation with, for example, SVM, 300 MFW, 30% culling. Could you help me do it?

Thank you very much!

Álvaro Cuéllar
University of Valladolid

--
You received this message because you are subscribed to the Google Groups "computationalstylistics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to computationalstylistics+unsub...@googlegroups.com.
Visit this group at https://groups.google.com/group/computationalstylistics.
For more options, visit https://groups.google.com/d/optout.

Jan Rybicki

Dec 14, 2017, 6:54:49 AM

to Alvaro Cuellar, computationalstylistics

Well done, Maciej!

Best

Jan

Thu, 14 Dec 2017 at 12:34, Alvaro Cuellar <alvarocu...@hotmail.com> wrote:

Dear Maciej,

Now I understand everything! It works!

What a nice answer, I really appreciate it.

Best regards.
Álvaro Cuéllar
