(1) No.
In general you wouldn't want to start by loading some model from disk. If you're starting a new training (on 900GB of data!), just create a new model with explicit parameters.
Also, if at all possible you should only be calling `build_vocab()` once, without the `update` parameter. There are lots of gotchas with incremental updates that you'd only want to deal with as an advanced user. (And, as an advanced user, I personally don't think it's ever a good idea, though clearly some people find it useful.)
Similarly, a single call to `train()` that includes all the data will give the best results. (Also, ideally, the data that's *late* in the corpus shouldn't be wildly different in vocabulary/usage from the data that's early.)
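For example, a minimal sketch of that one-construct, one-`build_vocab()`, one-`train()` flow with the gensim 4.x API might look like the following. The file path and parameter values are just placeholders, not recommendations:

```python
from gensim.models import Word2Vec

# Create a fresh model with explicit parameters (not loaded from disk).
model = Word2Vec(
    vector_size=300,
    window=5,
    min_count=5,
    sample=1e-3,
    negative=5,
)

# One vocabulary scan over *all* the data (no `update=True` calls)...
model.build_vocab(corpus_file='all_data.txt')

# ...then one training call over the same data.
model.train(
    corpus_file='all_data.txt',
    total_words=model.corpus_total_words,
    epochs=model.epochs,
)
```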
So, if using the `corpus_file` method, all your data should be in one file. (Alternatively, an iterable sequence class like `PathLineSentences`, provided as a `corpus_iterable`, could process multiple pre-tokenized files in one directory, but may not be able to keep all cores as busy as the `corpus_file` method.)
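If you go the iterable route instead, a sketch might look like this, where `tokenized_data/` is a hypothetical directory of pre-tokenized text files (one sentence per line, whitespace-separated tokens):

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import PathLineSentences

# Streams every file under the directory as one corpus,
# in alphabetical filename order.
sentences = PathLineSentences('tokenized_data/')

# One-shot construction: build_vocab() and train() each happen once, internally.
model = Word2Vec(sentences=sentences, vector_size=300, workers=16)
```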
(2) & (3) With `corpus_file`, you're likely to get the highest training throughput with `workers` equal to the number of available cores. (With a `corpus_iterable`, the actual best throughput will vary based on your other parameters, needs to be experimentally discovered, and may max out with a `workers` value lower than the number of cores.)
Tweaking other parameters (`negative`, `window`, `sample`, `min_count`, etc.) will also affect total runtime, but whether their speed-ups are worth whatever other changes in vector quality you observe is something you'd have to test experimentally with regard to your project's goals.
With a corpus that large, especially-aggressive `min_count` (larger to shrink the surviving vocabulary) and `sample` (smaller to drop more highly-repeated words) could offer big reductions in runtime at no cost in final vector quality. (In fact, more-aggressively dropping rare words or undersampling frequent words often *improves* remaining vector quality for important tasks.)
Also, with a corpus that large, and if the later texts exhibit the same word-usage patterns as the earlier ones, you may not practically need as many training epochs: you've got what's functionally as good as a repeated epoch within the large corpus itself. (If I recall correctly, the famous circa-2013 'GoogleNews' vectors, trained from ~100B words of news stories, used only 3 epochs, rather than the typical default of 5 or the even larger epoch counts often used with smaller corpora.)
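Putting the last two points together, a hedged sketch of a 'big corpus' configuration might look like this; the specific numbers are illustrative only, and would need to be validated against your own downstream tasks:

```python
import os
from gensim.models import Word2Vec

model = Word2Vec(
    corpus_file='all_data.txt',  # placeholder for your single combined file
    vector_size=300,
    min_count=100,   # far larger than the default 5: discard rarer words
    sample=1e-5,     # far smaller than the default 1e-3: downsample frequent words harder
    epochs=3,        # fewer passes, in the spirit of the GoogleNews training
    workers=os.cpu_count(),
)
```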
If you can get all your data into a single `corpus_file`, then moving to a system with even more cores (32, 64, etc) should further accelerate training.
Good luck!
- Gordon