RStudio crashes on running classify_model()


Mayank Sharma

Aug 6, 2012, 3:09:42 AM
to rtextto...@googlegroups.com
Hi.
I'm an undergrad student in my final year. I've been learning R for the last month, and I recently tried RTextTools.
I'm stuck on a problem.
I'm using a set of 477 words that I tagged by part of speech with the openNLP package (I've also converted the labels to numeric values). I used 150 of those observations to train a MAXENT model and then used that model to classify the tagged set. The problem is that this makes RStudio crash. I tried it on another system too, with the same result.
The examples from the RTextTools_GettingStarted pdf ran perfectly even though they have more observations to process.
Could you help me out here please?
I've pasted the R code below.

P. S. Thanks a lot for this package. It seems like a very powerful tool.
 
 
 
 
# ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
# ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
library(openNLP)
library(RTextTools)
library(tm)
crude[[2L]]

# Tokenize the data and put it in a vector called tokens
tokens <- tokenize(crude[[2L]])

# Cleaning the tokenized data by removing all punctuation (since this will be our true set)
tokens <- tokens[-which(tokens[1:496]==","|tokens[1:496]=="."|tokens[1:496]=="\\"|tokens[1:496]=="("|tokens[1:496]==")"|tokens[1:496]=="\"")]

# Using openNLP to get a tagged set to be used as our TRUE SET
tagged <- tagPOS(tokens)

# Organising the data in a list holding both token and tag
testList <- strsplit(tagged,"/")

test <- data.frame(c(0),c(0))
names(test) <- c("token","PennTag")

# Converting the list to a data.frame
try <- as.data.frame(testList)
for(i in 1:447){
  test[i,1] <- try[1,i]
  test[i,2] <- try[2,i]
}

# Creating a document term matrix
docTermMat <- create_matrix(cbind(test$token),language="english", removeNumbers=TRUE,removeSparseTerms=.998)

# Assigning which part of the document term matrix is to be used for training the models
# and which one to test the accuracy of the trained model
corpus1 <- create_corpus(docTermMat,as.numeric(docTermMat$PennTag),trainSize=1:150, testSize=151:447,virgin=FALSE)

# Training model using the Maximum Entropy Algorithm
MAXENT1 <- train_model(corpus1,"MAXENT")

# Tagging using the Maximum Entropy trained model
MAXENT_CLASSIFY1 <- classify_model(corpus1, MAXENT1)
#RStudio crashes

 

Timothy P. Jurka

Aug 6, 2012, 11:41:39 AM
to rtextto...@googlegroups.com
Hi Mayank,

It is possible this is a problem with the algorithm RTextTools uses and not RTextTools itself. However, I cannot reproduce the problem unless I have access to the dataset you're using. If you'd like me to investigate the problem, please send me a zipped or gzipped file with the R code and the data.

Best,
Tim

--
Timothy P. Jurka
Ph.D. Student
Department of Political Science
University of California, Davis
www.timjurka.com

Mayank Sharma

Aug 7, 2012, 4:04:04 AM
to rtextto...@googlegroups.com
Hey Tim,
The dataset I used was an inbuilt one in the package 'tm' called 'crude'.
I used crude[[2L]]. POS tagged it using openNLP after some pre-processing. All those steps are given in the R code I have posted.
If you could just run that and check, I would be grateful.
If there is some way I could give you the processed data, for example by writing it to a CSV file, please let me know.
 
Regards,
Mayank

--
Mayank Sharma
Fourth Year, B. E. (Hons.)
Electrical and Electronics Engineering
BITS Pilani

Timothy P. Jurka

Aug 7, 2012, 12:31:43 PM
to rtextto...@googlegroups.com
Hello Mayank,

You did not provide a working example. First, you do not attach the crude dataset using data(crude). Second, when I run your script I get the following error.

> # Converting the list to a data.frame
> try <- as.data.frame(testList)
Error in data.frame(c("OPEC", "NNP"), character(0), c("may", "MD"), character(0),  : 
  arguments imply differing number of rows: 2, 0
> for(i in 1:447){
+   test[i,1] <- try[1,i]
+   test[i,2] <- try[2,i]
+ }
Error in try[1, i] : object of type 'closure' is not subsettable

Additionally, you have lots of missing values (e.g. character(0)) in your testList dataset. These need to be removed prior to using any tools in RTextTools. If you provide a working example script, I would be happy to spend a few minutes looking at what is wrong.

Best,
Tim

--
Timothy P. Jurka
Ph.D. Student
Department of Political Science
University of California, Davis
www.timjurka.com
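
For anyone hitting the same errors, here is a minimal sketch of the cleanup step described above, using base R only. The `tagged` vector is made-up sample data standing in for the output of tagPOS(), and the names `pairs`, `tagged_df`, and `label` are illustrative and not part of the original script. It drops the character(0) entries before building the data frame, and it avoids naming a variable `try`, which is why the failed assignment above fell through to the base function and produced the 'closure' error. The last line is one possible way to get numeric labels, assuming integer codes are acceptable; calling as.numeric() directly on tag strings such as "NNP" returns NA.

```r
# Hypothetical stand-in for the output of tagPOS(): "token/TAG" strings,
# with empty strings where nothing was tagged.
tagged <- c("OPEC/NNP", "", "may/MD", "", "be/VB", "forced/VBN")

# strsplit() returns character(0) for each empty string.
pairs <- strsplit(tagged, "/", fixed = TRUE)

# Keep only well-formed token/tag pairs, dropping the character(0) entries.
pairs <- pairs[lengths(pairs) == 2]

# Build the data frame in one step instead of filling it row by row.
tagged_df <- data.frame(
  token   = vapply(pairs, `[`, character(1), 1),
  PennTag = vapply(pairs, `[`, character(1), 2),
  stringsAsFactors = FALSE
)

# Encode the tags as integer codes via a factor (an assumption about how
# the labels should be represented; not taken from the original script).
tagged_df$label <- as.integer(factor(tagged_df$PennTag))

print(tagged_df)
```

With real data, the row count of the resulting data frame can then be used to pick the training and test ranges instead of hard-coding them.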