The streaming approach works if you have a process (like training) that needs to look at each item of the data briefly, in order, and then move on. The GoogleNews-vectors-negative300.bin.gz file is, on the other hand, the end result of a training process. Most uses will want the results completely in memory for common bulk or random-access operations (like finding the N most-similar vectors to a target value).
Note that the data in that file is around 3.5GB uncompressed – so you essentially need 4+ GB free to have any chance of completing a load and then doing other operations on the data.
And further, to do operations like `most_similar()`, the vectors need to be unit-normalized. By default, this is done non-destructively – so it winds up creating another 3.5+ GB structure in memory, alongside the 3.5+ GB raw vectors. (You can force this to happen in-place, saving memory, by manually calling `model.init_sims(replace=True)` after loading but before any similarity-operations. But that requires the initial load to have succeeded.)
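As a rough check on those figures, here is a back-of-the-envelope sketch. It assumes 3M tokens, 300 dimensions (as the filename suggests), and 4-byte floats, and it ignores the vocabulary dictionary and other overhead, so real usage will be somewhat higher:

```python
# Rough memory estimate for the raw vectors alone (assumed sizes, see above).
VOCAB_SIZE = 3_000_000   # approximate token count in the GoogleNews set
VECTOR_SIZE = 300        # dimensions per vector
BYTES_PER_FLOAT = 4      # assuming 32-bit floats

raw_bytes = VOCAB_SIZE * VECTOR_SIZE * BYTES_PER_FLOAT
# A non-destructive unit-normalization keeps a second array of the same shape.
with_normalized_copy = 2 * raw_bytes

print(f"raw vectors:          {raw_bytes / 1e9:.1f} GB")
print(f"plus normalized copy: {with_normalized_copy / 1e9:.1f} GB")
```

Under these assumptions the raw array alone is about 3.6 GB, and the normalized copy doubles that.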
I believe the GoogleNews-vectors-negative300.bin.gz vectors are front-loaded, with more-common tokens at the beginning, so you could consider uncompressing the data and truncating or splitting the file at reasonable points, then editing the first line of each resulting file so that it still states an accurate token count. That would let you work with subsets of the data containing fewer than the full 3M tokens. We don't have any code to do this; you'd have to use other tools to edit the file(s), based on reading what the loading code expects.
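For illustration only, a truncation along those lines might look like the sketch below. It is not part of gensim; `truncate_word2vec_bin` is a hypothetical helper. It assumes the uncompressed file uses the usual word2vec binary layout: a text header line `"<count> <dimensions>\n"`, then for each token its bytes, a single space, and `dimensions` 4-byte floats, possibly with a newline between entries. Check those assumptions against the loading code before relying on it.

```python
def truncate_word2vec_bin(src_path, dst_path, keep):
    """Copy the first `keep` entries of an uncompressed word2vec .bin file.

    Hypothetical helper; assumes the layout described above with float32 vectors.
    Returns the number of entries actually written.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Header line: "<vocab_size> <vector_size>\n"
        vocab_size, vector_size = (int(x) for x in src.readline().split())
        keep = min(keep, vocab_size)
        # Write a corrected header so the count matches the truncated file.
        dst.write(f"{keep} {vector_size}\n".encode("ascii"))

        vector_bytes = 4 * vector_size  # assumes 4-byte floats
        for _ in range(keep):
            # Copy token bytes up to and including the separating space.
            # Any newline left over from the previous entry is copied as-is.
            token = bytearray()
            while True:
                ch = src.read(1)
                if not ch:
                    raise ValueError("unexpected end of file in token")
                token += ch
                if ch == b" ":
                    break
            vector = src.read(vector_bytes)
            if len(vector) != vector_bytes:
                raise ValueError("unexpected end of file in vector")
            dst.write(token)
            dst.write(vector)
    return keep
```

Something like `truncate_word2vec_bin("GoogleNews-vectors-negative300.bin", "GoogleNews-500k.bin", 500_000)` (the output filename and count are arbitrary) would then produce a smaller file holding only the first 500K entries, which should be the most frequent tokens if the front-loading assumption holds.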
(With gensim's native save format, there are some memory-mapping tricks that might allow leaving most of the data in non-resident memory. But you'd have to succeed in loading the full data at least once before re-saving it to try those, and even if that works, the performance for common tasks would likely be very poor.)
A dataset of this size essentially requires more RAM to work with sensibly.
- Gordon