Exporting classification results


Corin Moss

Oct 31, 2011, 10:05:58 AM
to rtexttools-help
Hi,

I wondered if there's an easy way to export a list of the "testing"
portion of the classified corpus, including the various probabilities,
expected classification etc?

I've taken a look at corpus@classification_matrix@ia and that would
seem to be the "testing" portion of the data, however I've not seen a
good way to link that up with the results of (for example)
classify_model(corpus,maxent_model).

Is there anything that allows this pre-baked into the package that
I've missed?

Many thanks,

Corin

Tim Jurka

Nov 1, 2011, 12:17:08 AM
to rtextto...@googlegroups.com
Hi Corin,

Are you referring to the results as reported in the create_analytics function? You can write this data to a CSV file using the write.csv() command. The document_summary slot contains the raw results data, and you can combine this with your original data by using cbind(original_data,analytics@document_summary).

write.csv(analytics@algorithm_summary,"SampleData_AlgorithmSummary.csv")
write.csv(analytics@label_summary,"SampleData_LabelSummary.csv")
write.csv(analytics@document_summary,"SampleData_DocumentSummary.csv")
write.csv(analytics@ensemble_summary,"SampleData_EnsembleSummary.csv")

Hopefully this is of some help!

Best,
Tim


--
Timothy P. Jurka
Ph.D. Student
Department of Political Science
University of California, Davis
www.timjurka.com

Corin Moss

Nov 1, 2011, 4:11:22 AM
to rtextto...@googlegroups.com
Hi Tim,

Sorry, I explained my question terribly ;)  I'll explain it better, and then show the solution (based on what you mentioned in your email) just in case someone in the future wants something similar.

I've been testing various algorithms on a 1000-document data set. I've been seeing accuracy between 80% and 95%, which is a large variation, so I was keen to see which entries were being misclassified so that I could tune accordingly. To do this, I wanted output that included the original text of each correctly or incorrectly classified document: something like the document_summary slot, but with the original text.

Based on your suggestion, I've found that the following works well for this:

library(RTextTools)

# CSV file with simple Code and Text columns
data <- read_data("limited.csv", type="csv")

# Shuffle the rows so the train/test split below is random (keeps 1000 documents)
data <- data[sample(nrow(data), size=1000), ]
matrix <- create_matrix(cbind(data$Text), language="english", removeNumbers=FALSE, stemWords=TRUE, removePunctuation=TRUE, stripWhitespace=TRUE, weighting=weightTfIdf)

# Train on the first 900 documents, test on the remaining 100
corpus <- create_corpus(matrix,data$Code,trainSize=1:900, testSize=901:1000,virgin=FALSE)

maxent_model <- train_model(corpus,"MAXENT")
maxent_results <- classify_model(corpus,maxent_model)
analytics <- create_analytics(corpus,cbind(maxent_results))

# Combine the Text column of the original data (limited to the test range, 901:1000) with the document_summary slot
write.csv(cbind(Text=data$Text[901:1000], analytics@document_summary), "SampleData_DocumentSummary.csv")
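Since the aim was to see which entries were misclassified, one possible next step is to filter the combined table down to the mistakes. Below is a minimal base-R sketch of that. It uses a small made-up data frame in place of real output, and the column names MAXENTROPY_LABEL, MAXENTROPY_PROB and MANUAL_CODE are assumptions about what document_summary contains, so check names(analytics@document_summary) on your own version of RTextTools and substitute the real names.

```r
# Minimal sketch: list the test documents the classifier got wrong.
# ASSUMPTION: MAXENTROPY_LABEL (predicted label), MAXENTROPY_PROB (its probability)
# and MANUAL_CODE (hand-coded label) are placeholder column names; check
# names(analytics@document_summary) and substitute the real ones.

# Hypothetical stand-in for cbind(Text=data$Text[901:1000], analytics@document_summary)
results <- data.frame(
  Text = c("first doc", "second doc", "third doc", "fourth doc"),
  MAXENTROPY_LABEL = c(1, 2, 2, 3),
  MAXENTROPY_PROB = c(0.91, 0.55, 0.88, 0.47),
  MANUAL_CODE = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE where the predicted label matches the hand-coded label
correct <- results$MAXENTROPY_LABEL == results$MANUAL_CODE

# Keep only the misclassified rows, lowest predicted probability first
misclassified <- results[!correct, ]
misclassified <- misclassified[order(misclassified$MAXENTROPY_PROB), ]

# Share of test documents classified correctly
accuracy <- mean(correct)

print(misclassified)
print(accuracy)

# Uncomment to save the mistakes alongside the other summaries
# write.csv(misclassified, "SampleData_Misclassified.csv", row.names = FALSE)
```

If the label columns come back as factors with different level sets, comparing them with == raises an error in R; wrapping both sides in as.character() first avoids that.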

Maybe that will help someone else some time. Thanks again for your suggestion!

Regards,

Corin 