The performance of fastText on Chinese document classification

Shi Jerry

Sep 22, 2016, 5:06:14 AM

to fastText library

Hi all,

Thanks to the fastText library team!

I am using fastText for Chinese document classification with the parameters from classification-example.sh. The data may be short text. The test results are below:

Sample Size (w = 10,000)	Number of Labels	Test Sample Size	Recall	Precision	Time (s)
7w	184	21717	0.419	0.419	24.162
15.7237w	199	49524	0.603	0.603	60.115
80w	207	67861	0.632	0.632	88.171
93.045w	208	73737	0.646	0.646	100.818

How can I improve the performance? Any ideas for this task would be appreciated. Thank you all!

Edouard G.

Sep 23, 2016, 11:18:23 AM

to fastText library

Hi,

If your training set size is small, you can improve performance by increasing the learning rate (for example using -lr 0.5) as well as the number of epochs (for example using -epoch 20). You can also try to increase the dimension of the word embeddings (for example using -dim 50).

Finally, you can probably speed up training and testing by using the hierarchical softmax instead of the softmax (using -loss hs).
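As a rough sketch, the suggestions above could be combined into a single run like the one below. The binary location (./fasttext), the file names (train.txt, test.txt) and the model prefix (model) are placeholders, not taken from this thread, and the flag values are just the examples mentioned above, so they will likely need tuning on your data.

```shell
#!/bin/sh
# Placeholder paths (assumptions, adjust to your setup).
FASTTEXT_BIN=./fasttext
TRAIN_FILE=train.txt
TEST_FILE=test.txt
MODEL_PREFIX=model

# Example values from the reply: higher learning rate, more epochs,
# larger word embeddings, and hierarchical softmax for speed.
TRAIN_ARGS="supervised -input $TRAIN_FILE -output $MODEL_PREFIX -lr 0.5 -epoch 20 -dim 50 -loss hs"

# Show the command that would be run.
echo "$FASTTEXT_BIN $TRAIN_ARGS"

# Train and evaluate only if the binary and the training file exist.
if [ -x "$FASTTEXT_BIN" ] && [ -f "$TRAIN_FILE" ]; then
  # TRAIN_ARGS is left unquoted on purpose so it splits into separate arguments.
  "$FASTTEXT_BIN" $TRAIN_ARGS
  # The test command reports precision and recall at 1 on the held-out file.
  "$FASTTEXT_BIN" test "$MODEL_PREFIX.bin" "$TEST_FILE"
fi
```

Changing one flag at a time and comparing the test output makes it easier to see which setting actually helps.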

Best,

Edouard.

wuwenm...@gmail.com

Oct 31, 2017, 3:11:24 AM

to fastText library

Hi all,

Thanks for sharing!

How did you prepare the Chinese training set for the classification problem? Do the sentences need to be segmented first?