I implemented this, and the two dictionaries come out to ~150 MB together. After the initial load of the two pickled dictionaries, individual vector extractions are quite fast (at least not frustratingly slow). Sorry, I can't give you better metrics than that, because I don't have access to a machine that can load the entire model and compare the two speeds (though I could possibly test on a subset of the data).
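For concreteness, here is a minimal sketch of one way the two-dictionary setup could look: a word-to-index dictionary plus the vector data, each pickled separately. The names, the 300-dimensional size, and the exact split are my assumptions for illustration, not necessarily how the original implementation divides things.

```python
import pickle
import numpy as np

# Hypothetical split (illustrative names): a vocab dict mapping word -> row
# index, and a matrix holding the actual vectors.
vocab = {"apple": 0, "banana": 1}
vectors = np.random.rand(2, 300).astype(np.float32)

with open("vocab.pkl", "wb") as f:
    pickle.dump(vocab, f)
with open("vectors.pkl", "wb") as f:
    pickle.dump(vectors, f)

# Later: pay the load cost once, then each extraction is just a dict lookup
# plus an array slice.
with open("vocab.pkl", "rb") as f:
    vocab = pickle.load(f)
with open("vectors.pkl", "rb") as f:
    vectors = pickle.load(f)

def get_vector(word):
    return vectors[vocab[word]]
```

After the one-time load, `get_vector` is O(1) per word, which matches the "fast after initial load" behavior described above.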
This obviously doesn't answer the question of nearest-neighbour search, but I suspect some clever indexing could solve that too; I have no concrete thoughts on it right now.
As a plug for this method: on virtual instances it could save money by trading expensive RAM for large, cheap disk storage. It also makes larger models, like the ones trained on Wikipedia, usable if needed, given that we don't want to cut the model off at some point.
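To illustrate the RAM-for-disk trade, one option (my suggestion, not from the original implementation) is the standard-library `shelve` module, which keeps the pickled entries on disk and loads only the vectors you actually request:

```python
import shelve
import numpy as np

# Hypothetical example: write each word's vector into a disk-backed shelf
# once; lookups then read from disk instead of holding everything in RAM.
with shelve.open("vectors_shelf") as db:
    db["apple"] = np.random.rand(300).astype(np.float32)

# Read-only lookup: only the requested entry is unpickled into memory.
with shelve.open("vectors_shelf", flag="r") as db:
    vec = db["apple"]
```

Per-lookup latency is higher than an in-memory dict, but resident memory stays roughly constant regardless of model size.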
-Ved