Passing a custom tokenizer with parameters in TfidfVectorizer


David S. Batista

Aug 21, 2016, 3:39:44 PM
to nltk-users
I'm trying to build a TfidfVectorizer, passing a custom tokenizer that takes lang as a parameter.

def keyphrases(text, lang):
    # TODO: extract keyphrases for different languages
    ...

vectorizer = TfidfVectorizer(
    lowercase=True, min_df=2, norm='l2', smooth_idf=True,
    stop_words='english', tokenizer=keyphrases(lang),
    sublinear_tf=True)

How can this be achieved?
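
A minimal sketch of one possible approach (an assumption on my part, not something confirmed in this thread): bind the extra lang argument with functools.partial so that TfidfVectorizer receives the single-argument callable it expects for tokenizer. The 'en' value and the split() placeholder body are illustrative only.

from functools import partial
from sklearn.feature_extraction.text import TfidfVectorizer

def keyphrases(text, lang):
    # placeholder body: a real implementation would extract
    # language-specific keyphrases here
    return text.split()

vectorizer = TfidfVectorizer(
    lowercase=True, min_df=2, norm='l2', smooth_idf=True,
    stop_words='english', tokenizer=partial(keyphrases, lang='en'),
    sublinear_tf=True)

An equivalent lambda, tokenizer=lambda text: keyphrases(text, lang), would also give TfidfVectorizer a one-argument callable while keeping lang visible at the call site.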
