Manipulating TDM then Training Models

Loren Collingwood

Jun 14, 2011, 2:35:38 PM
to rtextto...@googlegroups.com
So I have a project where, even after I create my TDM, I still have near-duplicate stems such as "appropri" and "appropriat". I turn the TDM into a matrix and collapse these words into one variable, but now, from what I gather, I cannot use the train_model/train_models functions because they take the corpus@ object. Essentially, I need to convert the matrix back to a document-term matrix, but from what I can tell (cursory search) the various DocumentTermMatrix functions only take text and turn it into matrices. So I have to run the models one by one. This isn't really a problem, but I thought I should point it out in case people have ideas on how to get around it. Probably for version 2, we could write a function for this sort of thing...

-Loren

Loren Collingwood
Ph.D. Candidate
Department of Political Science
University of Washington
http://staff.washington.edu/lorenc2
lor...@uw.edu

Tim Jurka

Jun 14, 2011, 2:57:04 PM
to rtextto...@googlegroups.com
Does this work?

full_matrix <- as.matrix(matrix)
# your data processing here
dtm <- as.DocumentTermMatrix(full_matrix, weighting=weightTf)

Tim

Loren Collingwood

Jun 14, 2011, 3:02:17 PM
to rtextto...@googlegroups.com
No, that doesn't work. Has that worked for you?
--
Loren Collingwood
loren.co...@gmail.com

Tim Jurka

Jun 14, 2011, 3:06:30 PM
to rtextto...@googlegroups.com
Yeah I was able to convert between them like that... although all I did was remove some columns. I don't know if you're doing more.

> matrix
A document-term matrix (1500 documents, 4314 terms)

Non-/sparse entries: 11966/6459034
Sparsity           : 100%
Maximal term length: 17 
Weighting          : term frequency (tf)
> full <- as.matrix(matrix)
> full <- full[,300:1000]
> dtm <- as.DocumentTermMatrix(full,weighting=weightTf)
> dtm
A document-term matrix (1500 documents, 701 terms)

Non-/sparse entries: 2189/1049311
Sparsity           : 100%
Maximal term length: 15 
Weighting          : term frequency (tf)

Tim

Loren Collingwood

Jun 14, 2011, 3:23:16 PM
to rtextto...@googlegroups.com
Yeah, R isn't recognizing as.DocumentTermMatrix on my end; wouldn't that be from the tm package?

matrix <- create_matrix(major_agree_list2$major_topic_1$vaccine.synopsis, language="english", removeNumbers=TRUE, removeSparseTerms=.99)
full <- as.matrix(matrix)
full <- full[,100:150]
dtm <- as.DocumentTermMatrix(full,weighting=weightTf)
Error: could not find function "as.DocumentTermMatrix"

-Loren

Tim Jurka

Jun 14, 2011, 3:26:35 PM
to rtextto...@googlegroups.com
Strange... perhaps tm was updated? Try pulling the latest version with install.packages('tm').

Tim

Loren Collingwood

Jun 14, 2011, 3:38:44 PM
to rtextto...@googlegroups.com
No dice, I'll just roll with it. But if anyone has the same problem let me know...
-Loren

Tim Jurka

Jun 14, 2011, 5:25:30 PM
to rtextto...@googlegroups.com
That's weird... it shows up in my documentation. My only idea is that somehow you have an older version that doesn't support it, but that doesn't seem to be the case.

Tim
 

Loren Collingwood

Jun 14, 2011, 5:29:50 PM
to rtextto...@googlegroups.com
That must be it. I have version 0.5-4.1. I'll have to uninstall it and reinstall it. Thanks Tim. I'll let you know if it still doesn't work!
-Loren

Loren Collingwood

Jun 14, 2011, 6:15:11 PM
to rtextto...@googlegroups.com
Alright, sweet, that worked. The problem was my R version: I had to install R-2.13.1 and then install the packages from there. During the RTextTools install, three packages did not install: SparseM, randomForest, and glmnet. I just installed them one at a time, but thought I should point that out for the record.
-Loren
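
For anyone who finds this thread later, here is a minimal sketch of the collapse-then-convert workflow discussed above. It assumes a version of tm that provides as.DocumentTermMatrix (the call Tim used). The toy matrix, its counts, and the `target` lookup vector are made up for illustration; in practice `full` would come from as.matrix() on the output of create_matrix. Whether the resulting object then behaves inside the RTextTools training functions is not something this sketch checks.

```r
library(tm)  # provides as.DocumentTermMatrix() and weightTf

# Toy dense matrix standing in for as.matrix(<your document-term matrix>).
# Documents, terms, and counts are hypothetical.
full <- matrix(
  c(1, 0, 2,
    0, 3, 0,
    1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1", "2", "3"),
                  c("appropri", "appropriat", "vaccin"))
)

# Map each column to the term it should be merged into
# (hypothetical lookup; extend it for other stem variants).
target <- colnames(full)
target[target == "appropriat"] <- "appropri"

# rowsum() adds up rows that share a group label, so transpose,
# sum by target term, and transpose back to documents x terms.
collapsed <- t(rowsum(t(full), group = target))

# Convert the dense matrix back to a tm document-term matrix,
# keeping plain term-frequency weighting as in the thread.
dtm <- as.DocumentTermMatrix(collapsed, weighting = weightTf)
dtm
```

Using rowsum() on the transposed matrix avoids hand-written loops over columns, and the merged counts stay as plain term frequencies, which matches the weightTf weighting passed to the conversion.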