Hi Philip,
To make the demo run quickly, it is trained on a very small dataset. As a result, GLMNET classifies a lot of documents as #19 and never predicts the other categories, and SVM doesn't classify any documents as #4. Precision for a category that is never predicted works out to 0/0, so those fields are filled with NaN, and the corresponding recall is 0.00. I agree that we should clarify this in the documentation.
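As a rough illustration of where the NaN values come from, here is a minimal R sketch. It assumes the usual count-based definitions of precision, recall, and F-score, and it may not match how RTextTools computes them internally. The labels and the class_scores() helper are made up for the example and are not part of the package or the demo data.

```r
# Hypothetical toy labels (NOT the NYTimes demo data).
actual    <- c(1, 1, 4, 4, 19, 19)
predicted <- c(1, 19, 1, 19, 19, 19)  # category 4 is never predicted

# Illustrative helper (not an RTextTools function): per-category
# precision, recall, and F-score from raw counts.
class_scores <- function(actual, predicted, label) {
  tp <- sum(predicted == label & actual == label)  # true positives
  fp <- sum(predicted == label & actual != label)  # false positives
  fn <- sum(predicted != label & actual == label)  # false negatives
  precision <- tp / (tp + fp)  # 0/0 is NaN in R when the label is never predicted
  recall    <- tp / (tp + fn)
  fscore    <- 2 * precision * recall / (precision + recall)  # NaN propagates
  c(precision = precision, recall = recall, fscore = fscore)
}

scores_4 <- class_scores(actual, predicted, 4)
print(scores_4)
```

For category 4 this gives NaN precision, 0 recall, and NaN F-score, the same pattern as row 4 of the SVM columns in the summary.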
On my Mac the data files live at /Library/Frameworks/R.framework/Versions/2.13/Resources/library/RTextTools/, in the data/ directory. This brings up another point: I should probably include the raw .tar.gz source file on the installation page.
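Since the exact path depends on the R version and platform, it may be easier to ask R where a package is installed. The sketch below uses base R's system.file(), which returns an empty string when nothing is found. The find_in_package() wrapper is a hypothetical helper for this example, and the sketch assumes RTextTools is installed and keeps its datasets under data/ as described above.

```r
# Illustrative wrapper around base R's system.file().
# Returns the full path, or NA if the package or sub-path is not found.
find_in_package <- function(pkg, ...) {
  path <- system.file(..., package = pkg)
  if (nzchar(path)) path else NA_character_
}

# Root of the installed package, and its data/ directory (assumed layout).
pkg_root <- find_in_package("RTextTools")
data_dir <- find_in_package("RTextTools", "data")

print(pkg_root)
if (!is.na(data_dir)) {
  print(list.files(data_dir))  # list the bundled data files
}
```

If either value comes back NA, the package is not installed in the library that this R session is using, or it does not have a data/ directory there.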
Thank you for your feedback!
Tim
--
Timothy P. Jurka
Graduate Student
On Aug 4, 2011, at 11:25 AM, P Resnik wrote:
Ah, I see! I read the example before executing it, and assumed that 'data' must be a missing subdirectory of the current directory. Your code was smarter than I was. :)
This does raise the question of where the installed package and particularly the data directory actually live on my Mac, if you happen to know (please forgive my ignorance!)?
The simple_demo.R did run to completion, though I'm not sure whether the NaNs below indicate that something's not right. Perhaps you might consider also including example output files in the package, so that it's possible to compare one's own output and feel confident that things are running correctly?
Thanks again for your time,
Philip
> head(results@algorithm_summary)
  SVM_PRECISION SVM_RECALL SVM_FSCORE GLMNET_PRECISION GLMNET_RECALL
1          0.50       0.45       0.47              NaN             0
2          0.43       0.19       0.26              NaN             0
3          0.50       0.56       0.53              NaN             0
4           NaN       0.00        NaN              NaN             0
5          0.69       0.64       0.66              NaN             0
6          0.71       0.77       0.74              NaN             0
  GLMNET_FSCORE MAXENTROPY_PRECISION MAXENTROPY_RECALL MAXENTROPY_FSCORE
1           NaN                 0.69              0.82              0.75
2           NaN                 0.71              0.31              0.43
3           NaN                 0.65              0.72              0.68
4           NaN                  NaN              0.00               NaN
5           NaN                 0.64              0.50              0.56
6           NaN                 0.65              0.85              0.74
  SVM_ACCURACY GLMNET_ACCURACY MAXENTROPY_ACCURACY
1     45.45455               0            81.81818
2     18.75000               0            31.25000
3     55.55556               0            72.22222
4      0.00000               0             0.00000
5     64.28571               0            50.00000
6     76.92308               0            84.61538
On Thu, Aug 4, 2011 at 2:17 PM, Timothy Jurka
<tpj...@ucdavis.edu> wrote:
I see... the comments said to change the file path. That's an artifact of the older version of R, but now the datasets are bundled. I'll correct the instructions ASAP.
The example should run as-is on your computer. If it doesn't, please let me know!
Best,
Tim
--
Timothy P. Jurka
Department of Political Science
On Aug 4, 2011, at 11:08 AM, P Resnik wrote:
Hi! Trying out RTextTools... But I can't find pointers to example data in the quick start guide, documentation, or example scripts. For example, simple_demo.R refers to data/NYTimes.csv.gz but ... Ah, ok, I did some Google searching, which led to http://dirk.eddelbuettel.com/cranberries/, which mentions that file, which led me to the .tgz file at http://cran.r-project.org/web/packages/maxent/index.html. So I've got the file and the demo ran to completion.
So, pulling back from that little stream-of-consciousness meandering... :) I guess my general suggestion is to provide pointers to any of the .csv files you refer to in your distribution (or better yet, include them with the distribution). Also, thanks for putting these tools together; I look forward to playing with them!
All best,
Philip
Philip Resnik, Professor
Dept of Linguistics and Institute for Advanced Computer Studies
University of Maryland
http://umiacs.umd.edu/~resnik/
res...@umd.edu