The performance of fastText on Chinese document classification


Shi Jerry

Sep 22, 2016, 5:06:14 AM
to fastText library
Hi all,
Thanks to the fastText library team!

I am using fastText for Chinese document classification with the parameters from classification-example.sh. The data may consist of short texts. The test results are below:

Sample Size   Label Numbers   Test Sample Size   Recall   Precision   Time(s)
7w            184             21717              0.419    0.419       24.162
15.7237w      199             49524              0.603    0.603       60.115
80w           207             67861              0.632    0.632       88.171
93.045w       208             73737              0.646    0.646       100.818

(w = 万, i.e. 10,000 samples)


How can I improve the performance? Any ideas for this task would be appreciated. Thank you all!

Edouard G.

Sep 23, 2016, 11:18:23 AM
to fastText library
Hi,

If your training set size is small, you can improve performance by increasing the learning rate (for example using -lr 0.5) as well as the number of epochs (for example using -epoch 20). You can also try to increase the dimension of the word embeddings (for example using -dim 50).

Finally, you can probably speed up training and testing by using the hierarchical softmax instead of the full softmax (using -loss hs).

Best,
Edouard.
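
The flags suggested in the reply above can be combined into a single training command. The following is only a minimal sketch: it assembles the command line without running it, and it assumes the fastText binary is at ./fasttext; the file names train.txt and model are placeholders that do not come from the thread.

```python
import shlex


def build_supervised_command(train_path, model_prefix,
                             lr=0.5, epoch=20, dim=50, loss="hs",
                             binary="./fasttext"):
    """Assemble a fastText `supervised` command line as a list of arguments.

    The defaults mirror the example values from the reply
    (-lr 0.5, -epoch 20, -dim 50, -loss hs). The binary path is an assumption.
    """
    return [
        binary, "supervised",
        "-input", train_path,
        "-output", model_prefix,
        "-lr", str(lr),
        "-epoch", str(epoch),
        "-dim", str(dim),
        "-loss", loss,
    ]


# "train.txt" and "model" are placeholder names, not taken from the thread.
cmd = build_supervised_command("train.txt", "model")

# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list could be passed to subprocess.run(cmd, check=True) once the binary and training file exist. Whether these values actually help depends on the data, so each one is worth varying on its own.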

wuwenm...@gmail.com

Oct 31, 2017, 3:11:24 AM
to fastText library

Hi all,
   Thanks for sharing!
   How did you prepare the Chinese training set for the classification problem? Do the sentences need to be segmented first?
Thanks!

On Thursday, September 22, 2016 at 5:06:14 PM UTC+8, Shi Jerry wrote: