Ngram detection Phrases vs Phraser


Simon Lindgren

Mar 26, 2017, 8:05:54 AM
to gensim
Gensim is great!

I have this code for detecting bigrams:

# Find bigrams in the texts
from gensim.models import Phrases

bigram = Phrases(texts)  # find bigrams
texts2 = []
for t in texts:
    texts2.append(bigram[t])  # append the bigram version of the text


It works! But it throws this error:

/python/site-packages/gensim/models/phrases.py:274: UserWarning: For a faster implementation, use the gensim.models.phrases.Phraser class
  warnings.warn("For a faster implementation, use the gensim.models.phrases.Phraser class")


I find no Phraser to import in Gensim. Should I do things differently?

Shubh Vachher

Mar 26, 2017, 4:45:18 PM
to gensim

Gordon Mohr

Mar 26, 2017, 7:10:21 PM
to gensim
That's a warning rather than an error - so if everything else seems to be working, there's no need to change anything. 

The warning is kind of misleading, too. `Phraser` is an alternative that must be built from an initial `Phrases` instance, in a time-consuming step. It then works a little faster and uses much less memory, but that `Phraser` is stuck with the `min_count`/`threshold` parameters in effect when it was constructed. In contrast, if you retain the larger/slower `Phrases`, you can still tinker with those parameters. So `Phraser` is not really a drop-in 'faster implementation' everyone would want to consider. You'd need to review the doc-comments & code to evaluate if it's helpful for your needs.

- Gordon