Significance of pre-trained vectors for supervised classification


ananda...@gmail.com

unread,
May 4, 2017, 9:32:46 AM5/4/17
to fastText library
I am new to fastText and I was wondering how pre-trained vectors would improve a supervised classification task. Say I have a decently sized labeled dataset for the supervised task. How, or in what ways, would the use of pre-trained vectors affect the classification in general? Any pointers or suggestions would be much appreciated. Thanks!

isneh...@gmail.com

unread,
May 4, 2017, 10:31:39 AM5/4/17
to fastText library, ananda...@gmail.com
Pre-trained vectors provide semantic and syntactic knowledge about language in the form of embeddings. Vector models trained with fastText generate embeddings that are significantly better than the original word2vec at encoding syntactic information. So, let's say you have an imbalanced dataset for supervised classification: initializing from pre-trained vectors can improve the accuracy of the classifier across all labels, including the underrepresented ones.
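As a concrete illustration, here is a minimal sketch of how to pass pre-trained vectors to fastText's supervised mode via its `-pretrainedVectors` option. The file names `train.txt` and `wiki.en.vec` are assumptions for the example; the only hard requirement is that `-dim` matches the dimension of the pre-trained vectors.

```shell
# Train a supervised classifier initialized from pre-trained vectors.
# Assumes train.txt is in fastText format (each line prefixed with
# __label__<class>) and wiki.en.vec holds 300-dimensional vectors;
# -dim must match the dimension of the pre-trained vectors.
fasttext supervised -input train.txt -output model_pretrained \
  -pretrainedVectors wiki.en.vec -dim 300
```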

Edward Dixon

unread,
May 9, 2017, 9:26:09 AM5/9/17
to fastText library, ananda...@gmail.com
"Decently sized" is the key criterion here.  With enough labelled data, you are better off not using pre-trained vectors, and instead training only on text from your problem domain.  In practice, however, you are likely to find that "quantity has a quality all its own", and that pre-trained vectors give you an edge by incorporating information about the structure of language gleaned from the larger corpus.  Of course, it is very easy to train models with and without pre-trained vectors, so you can empirically verify the claims of strange folk who talk to you on the internet!
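The with/without comparison suggested above can be sketched as follows. File names (`train.txt`, `test.txt`, `wiki.en.vec`) and output model names are assumptions for illustration; `fasttext test` prints precision and recall at 1 for each model, so the two runs can be compared directly.

```shell
# Train twice -- once from scratch, once from pre-trained vectors --
# then evaluate both on the same held-out set.
fasttext supervised -input train.txt -output model_scratch -dim 300
fasttext supervised -input train.txt -output model_pretrained \
  -pretrainedVectors wiki.en.vec -dim 300

# Print P@1 and R@1 for each model on test.txt and compare.
fasttext test model_scratch.bin test.txt
fasttext test model_pretrained.bin test.txt
```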