The quality of the model depends on the dataset, of course.
To improve accuracy, you can do several things:
- Extend your dataset (1300 samples is not enough for topic models and embeddings like doc2vec).
- Check the balance between classes (enough objects for each class and no skew between classes).
- Preprocess the texts more carefully (stemming, tokenization, filtering out tokens that are too rare or too frequent).
- Tune model parameters ONLY against a validation set (if you tune against the test set, you will overfit to it and your reported accuracy will be too optimistic).
- Use a more complex (nonlinear) model as the upper-level model.
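
As a rough illustration of the class-balance and validation-set points above, here is a minimal sketch in plain Python. The function names (class_balance, stratified_split), the split fractions, and the toy labels are all assumptions made up for this example, not anything from your code or from gensim.

```python
import random
from collections import Counter, defaultdict


def class_balance(labels):
    """Return each class's share of the dataset as a fraction of the total."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def stratified_split(labels, val_frac=0.15, test_frac=0.15, seed=0):
    """Split sample indices into train/validation/test, class by class.

    Splitting within each class keeps the class proportions roughly equal
    across the three subsets.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = int(round(len(indices) * val_frac))
        n_test = int(round(len(indices) * test_frac))
        val.extend(indices[:n_val])
        test.extend(indices[n_val:n_val + n_test])
        train.extend(indices[n_val + n_test:])
    return train, val, test


# Toy labels standing in for a real corpus (hypothetical data).
labels = ["sports"] * 60 + ["politics"] * 30 + ["tech"] * 10
print(class_balance(labels))

train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

The idea is to fit on train_idx, compare parameter settings by their score on val_idx only, and score on test_idx once at the very end. If class_balance shows one class dominating, that is a sign to collect more samples for the small classes before tuning anything.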
From my practical experience, the weak point of LDA and Doc2Vec is short texts, so keep this in mind.
On Friday, May 12, 2017 at 3:51:04 UTC+5, Satya Gunnam wrote: