The models.phrases.Phrases documentation says:
min_count (float, optional) – Ignore all words and bigrams with total collected count lower than this value
The bigram scorer original_scorer clearly does use min_count, but min_count does not appear to have any effect on unigrams, so I'm not sure what "words" refers to in "Ignore all words and bigrams".
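For reference, this is how I read the default bigram scoring. It is a standalone sketch of the formula as I understand it, not gensim's actual code; the function name original_scorer_sketch and the toy counts are my own.

```python
def original_scorer_sketch(worda_count, wordb_count, bigram_count, len_vocab, min_count):
    """Sketch of the default bigram score as I understand it (not gensim's code).

    min_count is subtracted from the bigram count, so a bigram seen
    min_count times or fewer scores <= 0 and cannot pass a positive threshold.
    """
    return (bigram_count - min_count) / worda_count / wordb_count * len_vocab


# Toy numbers, chosen only for illustration.
print(original_scorer_sketch(8, 8, 4, 64, 4))   # bigram count equal to min_count -> 0.0
print(original_scorer_sketch(8, 8, 12, 64, 4))  # bigram count above min_count -> 8.0
```

Nothing in this formula touches the counts of single words on their own, which is why I don't see min_count affecting unigrams.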
I've looked at the code for models.phrases._sentence2token, as called by Phrases.__getitem__. It only uses Phrases.analyze_sentence() to join unigrams into bigrams, then calls new_s.append(words) two lines from the end to build the returned list of all unigrams and bigrams. My reading of min_count as described above is that words below the min_count value would not be returned, i.e., the code would instead have:
if phrase_class.vocab[words] >= phrase_class.min_count: new_s.append(words)
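To make the behavior I expected concrete, here is a minimal standalone sketch. It does not use gensim: the Counter is a toy stand-in for the counts Phrases collects, and filter_tokens_by_min_count is a hypothetical helper name. How the real vocab stores its keys may differ by gensim version, so treat that part as an assumption.

```python
from collections import Counter


def filter_tokens_by_min_count(tokens, vocab, min_count):
    """Keep only tokens (unigrams or joined bigrams) counted at least min_count times."""
    return [tok for tok in tokens if vocab[tok] >= min_count]


# Toy counts standing in for the collected vocabulary (assumed layout).
vocab = Counter({"new": 5, "york": 5, "new_york": 4, "zebra": 1})
tokens = ["new_york", "zebra", "new"]

# "zebra" is dropped because its count (1) is below min_count (2).
print(filter_tokens_by_min_count(tokens, vocab, min_count=2))  # ['new_york', 'new']
```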
I do want that functionality and can of course just subclass and redefine _sentence2token, but I'm wondering what I'm misinterpreting in the documentation above. Thanks.