The new build has been rolled out at install.rtexttools.com. I am also disabling the old repository to avoid any confusion over version numbers.
Changelog:
1. Rewrote the dtm_to_sparsem() function to run faster on larger matrices and to eliminate several bugs that were causing an 'invalid class "matrix.csr" object: ia has wrong number of elements' error.
2. Corrected code in create_corpus that was causing the error in (1).
3. Fixed the topic code truncation bug for boosting, bagging, random forests, glmnet, decision trees, and neural nets.
4. Added Wouter's wizard functions: train/classify_models, wizard_read_data, and wizard_train_test.
Errors:
1. cross_validate() is giving 0 accuracy all the time for maximum entropy. I'll have to look into what's causing this over the next few days.
Unfortunately, I did not get the time to create a demo file for this version. I plan to release that tomorrow ( 6/9 ), so perhaps we should hold off on sending this to the others until that's done. Is there a dataset we can use for demonstration purposes- Amber, perhaps your truncated NYT dataset?
Best,
Tim
Great news! I'm looking forward to playing with the new release, probably this
weekend.
> 1. cross_validate() is giving 0 accuracy all the time for maximum entropy.
> I'll have to look into what's causing this over the next few days.
Is this the "old" cross_validate or my partial code? If it is mine, I can have
another look at it and see what is going wrong.
> Unfortunately, I did not get the time to create a demo file for this
> version. I plan to release that tomorrow ( 6/9 ), so perhaps we should
> hold off on sending this to the others until that's done. Is there a
> dataset we can use for demonstration purposes- Amber, perhaps your
> truncated NYT dataset?
Also, I would like to update the documentation to reflect the new wizards, which
should make it much easier to get started. However, I cannot do that until this
weekend. Shall we aim for a release on Monday?
-- Wouter
I'll try to polish the documentation tomorrow because I think Amber was targeting a Friday release at the latest. Most of us will be leaving by 6/20, so I want to give the testers at least 10 days to send us comments about the software.
Tim