I'm trying to train a word2vec model on a corpus with approximately 14 million unique words and a vector size of 100. Gensim crashes after using 12 GB of my 16 GB of RAM, with the following traceback:
File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 783, in __init__
fast_version=FAST_VERSION)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 759, in __init__
self.build_vocab(sentences=sentences, corpus_file=corpus_file, trim_rule=trim_rule)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 943, in build_vocab
self.trainables.prepare_weights(self.hs, self.negative, self.wv, update=update, vocabulary=self.vocabulary)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 1876, in prepare_weights
self.reset_weights(hs, negative, wv)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 1897, in reset_weights
self.syn1neg = zeros((len(wv.vocab), self.layer1_size), dtype=REAL)
MemoryError
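For reference, this is roughly the call that produces the traceback (gensim 3.x API; the min_count and workers values are just illustrative, and sentences stands for my corpus iterable):

    from gensim.models import Word2Vec

    # sentences is an iterable of tokenized sentences;
    # the vocabulary ends up with roughly 14 million unique words.
    model = Word2Vec(sentences=sentences, size=100, min_count=5, workers=4)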
Does anybody know how much memory I would need to get past this error?
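My rough back-of-the-envelope estimate for the two float32 weight matrices alone (the input vectors plus the syn1neg array being allocated in the traceback) already accounts for most of the 12 GB it was using when it crashed, but I'm not sure how much the vocabulary dictionary and other bookkeeping add on top:

    # Rough size of the two (vocab_size x vector_size) float32 matrices
    # that word2vec with negative sampling allocates up front.
    vocab_size = 14_000_000      # ~14 million unique words
    vector_size = 100
    bytes_per_float = 4          # numpy float32 (REAL in the traceback)

    per_matrix = vocab_size * vector_size * bytes_per_float
    print(per_matrix / 1024**3)        # ~5.2 GiB per matrix
    print(2 * per_matrix / 1024**3)    # ~10.4 GiB for wv.vectors + syn1neg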
Is it possible to train in batches and then combine the results into a single model?
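To clarify what I mean by batches, I was imagining something like gensim's incremental vocabulary update (the file names here are just placeholders), though I don't know whether this actually lowers peak memory, since the full vocabulary still ends up in a single model:

    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence

    # First batch: build the initial vocabulary and train on part of the corpus.
    model = Word2Vec(size=100, min_count=5, workers=4)
    batch1 = LineSentence('corpus_part1.txt')   # placeholder path
    model.build_vocab(batch1)
    model.train(batch1, total_examples=model.corpus_count, epochs=model.epochs)

    # Later batches: extend the vocabulary and keep training the same model.
    batch2 = LineSentence('corpus_part2.txt')   # placeholder path
    model.build_vocab(batch2, update=True)
    model.train(batch2, total_examples=model.corpus_count, epochs=model.epochs)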