Meaning of 'cutoff' in NgramTaggers

tom.pr...@gmail.com

Jan 27, 2015, 3:44:14 PM

to nltk-...@googlegroups.com

I'm a bit new to all this.

I am having a problem with automatic tagging.

I have 4000+ corpora and each has a sentence like [(A,1) (B,2) (C,3) (D,4)].

I am using a 3-gram tagger backed off to 2-gram, 1-gram, and Default taggers. The problem is that the tags generated by the tagger don't correspond to the sequence 1,2,3,4. I am obviously doing something wrong here, so I wanted to ask about the cutoff setting. I'm not clear on the effect a non-zero setting would have. Could someone explain what cutoff=N actually means and does?

Alexis Dimitriadis

Jan 29, 2015, 4:36:00 AM

to nltk-...@googlegroups.com

The cutoff parameter is used while training a model. The following description appears in the documentation of the NgramTagger class:

   :param cutoff: If the most likely tag for a context occurs
        fewer than *cutoff* times, then exclude it from the
        context-to-tag table for the new tagger.

By excluding rare contexts, you allow them to be passed to the backoff tagger, which can presumably offer a more reliable guess. Otherwise, even a single instance of a context in the training data would cause the current tagger (e.g., the trigram tagger) to return a result.
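To make the effect concrete, here is a minimal pure-Python sketch of the idea. It is not NLTK's actual code: the function name build_table and the toy training data are made up for illustration, and it treats a context as the previous n-1 tags plus the current word. It follows the docstring wording literally (drop a context whose best tag occurs fewer than cutoff times); how your installed NLTK version treats a count exactly equal to cutoff may differ, so check the source if the boundary matters.

```python
from collections import Counter, defaultdict

def build_table(tagged_sents, n, cutoff=0):
    """Hypothetical helper: map each (previous n-1 tags, word) context
    to its most frequent tag, dropping contexts that are too rare."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        tags = [tag for _, tag in sent]
        for i, (word, tag) in enumerate(sent):
            history = tuple(tags[max(0, i - n + 1):i])
            counts[(history, word)][tag] += 1

    table = {}
    for context, tag_counts in counts.items():
        best_tag, hits = tag_counts.most_common(1)[0]
        # Per the docstring: exclude the context if its most likely tag
        # occurs fewer than `cutoff` times. Excluded contexts would be
        # left for the backoff tagger to handle.
        if hits >= cutoff:
            table[context] = best_tag
    return table

# Toy data (made up): "C" and "D" each occur only once.
train = [
    [("A", "1"), ("B", "2"), ("C", "3")],
    [("A", "1"), ("B", "2"), ("D", "4")],
]

table_all = build_table(train, n=2, cutoff=0)  # keeps every context
table_cut = build_table(train, n=2, cutoff=2)  # keeps contexts seen twice

print(sorted(table_all.items()))
print(sorted(table_cut.items()))
```

With cutoff=0 the bigram table keeps the one-off contexts for "C" and "D"; with cutoff=2 they are dropped, so in a real backoff chain those tokens would be handed to the next tagger. In NLTK itself you would pass the setting at training time, e.g. nltk.TrigramTagger(train, backoff=bigram_tagger, cutoff=2), where bigram_tagger is whatever backoff tagger you have already trained.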

Alexis
