Hi Tim,
Sorry, I explained my question terribly ;) Let me explain it better, and then show the solution (based on what you mentioned in your email) in case someone wants something similar in the future.
I've been testing various algorithms on a 1000-document data set. I've been seeing accuracy between 80% and 95% - obviously a large variation - so I was keen to see what sort of entries were being incorrectly classified, so that I could tune as required. To do this, I wanted output that included the original document that was correctly or incorrectly classified. Something like the document_summary slot, but with the original text alongside.
Based on your suggestion, I've found that the following works well for this:
#CSV File with simple Code and Text columns
data <- read_data("limited.csv",type="csv")
#Shuffle the rows (sampling all 1000 rows without replacement)
data <- data[sample(nrow(data)),]
matrix <- create_matrix(cbind(data$Text), language="english",
                        removeNumbers=FALSE, stemWords=TRUE,
                        removePunctuation=TRUE, stripWhitespace=TRUE,
                        weighting=weightTfIdf)
#Training on 900 entries, testing on the remaining 100
corpus <- create_corpus(matrix,data$Code,trainSize=1:900, testSize=901:1000,virgin=FALSE)
maxent_model <- train_model(corpus,"MAXENT")
maxent_results <- classify_model(corpus,maxent_model)
analytics <- create_analytics(corpus,cbind(maxent_results))
#Combine the Text column from the original data (restricted to the testing range) with the document_summary slot
write.csv(cbind(data$Text[901:1000],analytics@document_summary),"SampleData_DocumentSummary.csv")
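As a follow-up, once that CSV exists it's easy to pull out just the misclassified rows and recompute accuracy in base R. The snippet below is only a sketch on a toy stand-in data frame - the column names MANUAL_CODE and MAXENTROPY_LABEL are assumptions about what the document_summary slot contains, so check names(analytics@document_summary) in your own session first:

```r
# Toy stand-in for the combined output written above; the column names
# MANUAL_CODE and MAXENTROPY_LABEL are assumptions -- verify them with
# names(analytics@document_summary) before relying on this.
summary_df <- data.frame(
  Text             = c("doc one", "doc two", "doc three"),
  MANUAL_CODE      = c("A", "B", "A"),
  MAXENTROPY_LABEL = c("A", "A", "A"),
  stringsAsFactors = FALSE
)

# Rows the classifier got wrong, with their original text attached
misclassified <- summary_df[summary_df$MAXENTROPY_LABEL != summary_df$MANUAL_CODE, ]

# Overall accuracy on the test range
accuracy <- mean(summary_df$MAXENTROPY_LABEL == summary_df$MANUAL_CODE)

print(misclassified$Text)
print(accuracy)
```

With the real file you'd replace the toy frame with something like read.csv("SampleData_DocumentSummary.csv") and keep the same filtering logic.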
Maybe that will help someone else some time - thanks again for your suggestion!
Regards,
Corin