Hi team,
RTextTools v1.3.3 is currently being uploaded to CRAN. This update includes optimizations to the create_matrix() and create_analytics() functions.
Users have reported difficulties using saved models to classify new data. The issue affects several algorithms (e.g. svm, glmnet) because they require every new document-term matrix to contain the same terms as the original training matrix. I did not write these algorithms and cannot change that requirement, so RTextTools has to adjust each new matrix to match the training matrix instead. I will therefore be pushing out another update, v1.3.4, that adds an originalMatrix parameter to the create_matrix() function. Passing in the original training matrix will cause RTextTools to adjust the new document-term matrix so that it contains only the terms from the original matrix. This allows svm and glmnet to use saved models to classify new data without being re-trained.
Note that the maximum entropy algorithm is not affected by this issue.
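For anyone curious what "adjusting" a new matrix to the training terms means, here is a minimal sketch of the general idea in plain Python. It is not the RTextTools implementation; the function names (build_vocabulary, document_term_matrix) and the toy documents are made up purely for illustration. Terms that were not seen in training are dropped, and training terms missing from a new document are filled in with zero counts, so both matrices end up with identical columns.

```python
from collections import Counter


def build_vocabulary(documents):
    """Return the sorted list of distinct terms in the training documents.
    (Hypothetical helper; whitespace tokenization only.)"""
    return sorted({term for doc in documents for term in doc.lower().split()})


def document_term_matrix(documents, vocabulary):
    """Build one row of term counts per document, using only the given vocabulary.
    Terms outside the vocabulary are ignored; vocabulary terms that a
    document lacks get a count of zero."""
    rows = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        rows.append([counts[term] for term in vocabulary])
    return rows


# Toy data, invented for this example.
train_docs = ["the cat sat", "the dog barked"]
new_docs = ["the cat meowed loudly"]

# The vocabulary is fixed at training time and reused for new data.
vocabulary = build_vocabulary(train_docs)
train_matrix = document_term_matrix(train_docs, vocabulary)
new_matrix = document_term_matrix(new_docs, vocabulary)

print(vocabulary)
print(new_matrix)
```

In this sketch "meowed" and "loudly" are discarded because they never appeared in training, so the new matrix has exactly the same columns as the training matrix.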
Best,
Tim
--
Timothy P. Jurka
Ph.D. Student