Hi all,
I am planning to train GloVe on a new corpus, but I haven't seen much guidance on how large the corpus should be. The test corpus is much smaller than the corpora (Common Crawl, Wikipedia, Twitter) that the authors used to train the pretrained vectors in their paper.
So my question is: how large does a new corpus need to be for GloVe to capture a domain's characteristics?
Any insights / leads are much appreciated.