My first guess would be that the model's tally only covers words that survived the `min_count` cutoff, while your separate count might not apply that filter. If that's not it, perhaps you're not counting tokens in the *exact* same form of the corpus as was passed to the model (as part of the constructor or `.build_vocab()`).
If you take the *exact* same corpus you're counting manually, create a *new* `Word2Vec` instance with the same parameters, and pass *that* corpus to the new model's `.build_vocab()`, is there still a tally mismatch? If so, you could do a separate by-word survey, say using Python's `Counter` class to tally all the words in your reference corpus, then check whether the sum of all word counts in that new survey is closer to the earlier model's total or to your manual count.
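A minimal sketch of such a survey, assuming your corpus is an in-memory list of token lists (the model-side total is shown only as a comment, using gensim 4's `get_vecattr` accessor, since your exact model setup may differ):

```python
from collections import Counter

def survey_corpus(corpus):
    """Tally every token in a corpus given as an iterable of token lists."""
    counts = Counter()
    for sentence in corpus:
        counts.update(sentence)
    return counts

# Toy stand-in for your reference corpus:
corpus = [
    ["the", "quick", "brown", "fox"],
    ["the", "lazy", "dog"],
]
survey = survey_corpus(corpus)
total = sum(survey.values())  # 7 tokens in this toy corpus

# To get the comparable total from a gensim-4 model built on the same corpus
# (assumption: gensim >= 4, where per-word counts live in the KeyedVectors):
#   model_total = sum(model.wv.get_vecattr(w, "count")
#                     for w in model.wv.key_to_index)
```

If `model_total` and `total` still disagree, the gap itself (and which side is larger) is a useful clue.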
If this new count disagrees with the older model, you can drill into individual word counts to see which differ, which might hint at the source of the discrepancy. (Are the words with different counts rare? Exceptional in some other way?)
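That per-word drill-down can be sketched as a simple dict comparison, assuming you've collected both tallies into plain dicts (here `model_counts` is a hypothetical stand-in for whatever counts you extract from the older model):

```python
def diff_counts(model_counts, survey_counts):
    """Return {word: (model_count, survey_count)} for words whose tallies differ."""
    diffs = {}
    for word in set(model_counts) | set(survey_counts):
        a = model_counts.get(word, 0)
        b = survey_counts.get(word, 0)
        if a != b:
            diffs[word] = (a, b)
    return diffs

# Hypothetical tallies illustrating a mismatch on a rare word:
model_counts = {"the": 2, "fox": 1}
survey_counts = {"the": 2, "fox": 1, "dog": 1}
mismatches = diff_counts(model_counts, survey_counts)
# Here "dog" appears only in the survey - e.g. it fell below min_count.
```

Inspecting a handful of the mismatched words (rare words? words with unusual tokenization?) is usually enough to spot the pattern.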