Training GloVe on new corpus - corpus size


Murat Aydogdu

Aug 5, 2021, 9:47:54 AM
to GloVe: Global Vectors for Word Representation
Hi all,

I am planning to train GloVe on a new corpus, but I haven't seen much guidance on how large the corpus should be. The test corpus is much smaller than the corpora the authors used in their paper (Common Crawl, Wikipedia, Twitter) to train the pretrained vectors.

So my question is: how large does a new corpus need to be for GloVe to capture a domain's characteristics?

Any insights / leads are much appreciated.
