# Assumed setup: the crude corpus ships with tm; tokenize() and tagPOS() come
# from the old (pre-0.2) openNLP API; create_matrix() and friends are from RTextTools
library(tm)
library(openNLP)
library(RTextTools)
data(crude)

# Tokenize the data and put it in a vector called tokens
tokens <- tokenize(crude[[2L]])
# Cleaning the tokenized data by removing all punctuation (since this will be our true set)
punct <- c(",", ".", "\\", "(", ")", "\"")
tokens <- tokens[!(tokens %in% punct)]
# Using openNLP to get a tagged set to be used as our TRUE SET
tagged <- tagPOS(tokens)
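# Note: tagPOS() was removed from openNLP in version 0.2. A minimal sketch of
# the replacement annotation API (assuming the default English Maxent models
# from openNLPdata are available):
# library(NLP)
# s <- as.String(paste(tokens, collapse = " "))
# a <- NLP::annotate(s, list(Maxent_Sent_Token_Annotator(),
#                            Maxent_Word_Token_Annotator(),
#                            Maxent_POS_Tag_Annotator()))
# words  <- subset(a, type == "word")
# tags   <- sapply(words$features, `[[`, "POS")
# tagged <- paste(s[words], tags, sep = "/")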
# Organising the data in a data frame holding both token and tag
# (avoids the hard-coded 447 and the fragile column-by-column loop)
testList <- strsplit(tagged, "/", fixed = TRUE)
test <- data.frame(
  token   = vapply(testList, `[`, character(1), 1L),
  PennTag = vapply(testList, `[`, character(1), 2L),
  stringsAsFactors = FALSE
)
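# Sanity check (assumption: no token itself contains "/"; a token such as
# "1/2" would split into more than two pieces and corrupt the frame)
stopifnot(all(lengths(testList) == 2L))
head(test)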
# Creating a document term matrix
docTermMat <- create_matrix(test$token, language = "english",
                            removeNumbers = TRUE, removeSparseTerms = 0.998)
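# Quick dimension check: one row per token is expected here, so that the
# train/test indices below line up with the tagged tokens
dim(docTermMat)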
# Assigning which part of the document term matrix is to be used for training the model
# and which part to test the accuracy of the trained model
# (the RTextTools function is create_container(), not create_corpus(), and the
# labels must come from test$PennTag, not from the document term matrix)
corpus1 <- create_container(docTermMat, as.numeric(factor(test$PennTag)),
                            trainSize = 1:150, testSize = 151:nrow(test),
                            virgin = FALSE)
# Training model using the Maximum Entropy Algorithm
MAXENT1 <- train_model(corpus1,"MAXENT")
# Tagging using the Maximum Entropy trained model
MAXENT_CLASSIFY1 <- classify_model(corpus1, MAXENT1)
# RStudio crashes at this point
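# If the crash can be avoided (e.g. by running the script from a plain R
# console rather than RStudio), accuracy could be summarised with RTextTools'
# analytics. A minimal sketch, assuming MAXENT_CLASSIFY1 was produced:
# analytics <- create_analytics(corpus1, MAXENT_CLASSIFY1)
# summary(analytics)
# head(analytics@document_summary)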