There are many online examples showing how to pickle an object to a local file, for example at StackOverflow.
(There are probably HDFS-specific examples, too, in HDFS-specific forums - but I've not used HDFS in around a decade.)
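In the simplest local-file case, a rough sketch (assuming gensim 4.x, with a toy corpus and filename made up purely for illustration; any picklable Python object works the same way):

```python
import pickle
from gensim.models import Word2Vec

# Toy corpus purely for illustration; substitute your own tokenized texts.
sentences = [["hello", "world"], ["pickle", "the", "model"]] * 10
model = Word2Vec(sentences, vector_size=50, min_count=1)

# Pickle the whole model object to a local file...
with open("w2v_model.pkl", "wb") as f:
    pickle.dump(model, f)

# ...and later restore it.
with open("w2v_model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.wv.most_similar("hello", topn=3))
```

(For gensim models specifically, the built-in `.save()` / `.load()` methods are usually preferable to raw pickle, as they handle the large internal arrays more gracefully.)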
If you're continually updating an older model, keep in mind that a fresh training using *all* data/vocabulary may outperform constant incremental updates, at least with regard to repeatability, and often in other, harder-to-measure qualities.
Why?
Imagine you have three separate training corpora: A, B, & C. A model trained from scratch on the mixed combination of [A, B, C] will treat all examples equally, include every relevant word that (across the whole combined corpus) appears at least `min_count` times, and keep those words in the (usually most efficient) most-frequent-to-least-frequent storage order.
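As a rough, illustration-only sketch of that from-scratch approach (gensim 4.x assumed; the toy corpora and counts are invented to demonstrate the points below, including a word `rare` that appears only `min_count - 1` times per corpus):

```python
from gensim.models import Word2Vec

# Hypothetical toy corpora standing in for A, B & C; "rare" appears only
# twice per corpus (min_count - 1 with min_count=3), but six times overall.
corpus_a = [["alpha", "shared", "rare"]] * 2 + [["alpha", "shared"]] * 8
corpus_b = [["beta", "shared", "rare"]] * 2 + [["beta", "shared"]] * 8
corpus_c = [["gamma", "shared", "rare"]] * 2 + [["gamma", "shared"]] * 8

# One balanced, from-scratch training over everything: vocabulary survival
# is judged on *combined* counts, and all examples get equal, interleaved
# influence on the final vectors.
combined = corpus_a + corpus_b + corpus_c
full_model = Word2Vec(combined, vector_size=50, min_count=3, epochs=10)

print("rare" in full_model.wv)  # True: 6 combined occurrences >= min_count
```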
If you instead train on A, use the model a bit, then update-train on B, use it a bit more, then update-train on C, the model is always most influenced by the examples it has seen most recently. The B-session, then the C-session, will (depending on other parameters) tend to dilute/shift the words carried over from A, without any interleaved influence from A's usage examples. New words that never co-appear with earlier words, and never undergo interleaved training with them, may end up in positions that aren't fully compatible with the unadjusted earlier word-vectors. A word that appears only `min_count - 1` times in each individual corpus will never get a vector at all, even though across the combined corpus it appears a full `(3 * min_count) - 3` times. And because new words are always appended at the end, and incremental training never moves words out of their earlier-session slots, the incrementally-updated model no longer reliably keeps its words in the preferred most-frequent-to-least-frequent order.
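For contrast, a rough sketch of the incremental path, reusing the toy corpora and import from the sketch above (gensim 4.x's `build_vocab(..., update=True)` followed by `train()`):

```python
# Incrementally: train on A first, then fold in B and C one session at a time.
inc_model = Word2Vec(corpus_a, vector_size=50, min_count=3, epochs=10)

for later_corpus in (corpus_b, corpus_c):
    # update=True appends any newly-surviving words at the end of the
    # existing vocabulary, rather than re-sorting everything by frequency...
    inc_model.build_vocab(later_corpus, update=True)
    # ...and this pass only sees the new session's examples, pulling the
    # model toward them with no interleaved influence from earlier data.
    inc_model.train(later_corpus,
                    total_examples=inc_model.corpus_count,
                    epochs=inc_model.epochs)

print("rare" in inc_model.wv)  # False: never reaches min_count in any single session
```

Comparing the two models' vocabularies and word orderings is a quick way to see the differences described above.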
Some projects may manage to get benefits from such incremental updates, but many are doing it without even realizing (or testing for) the kinds of model weaknesses they may be creating, compared to a full, balanced training session over all the data.
- Gordon