BigramCollocationFinder from documents window size

gabor...@maximilianeum.de

Oct 31, 2017, 9:34:52 AM
to nltk-users
Dear All,

I am using the BigramCollocationFinder.from_documents() method, and I would really like to set a window size other than the default of 2. I have checked the source code: the internal function _build_new_documents, which from_documents() uses, does take a window_size parameter, but it is always called with the default of 2. I have tried a few workarounds, but none of them had any effect:

        finder=BigramCollocationFinder('','',window_size=3)
        finder.default_ws=3

        finder = finder.from_documents(documents)

Any suggestions?

Best,

Gabor

Dimitriadis, A. (Alexis)

Oct 31, 2017, 3:32:38 PM
to nltk-...@googlegroups.com
The `window_size` attribute is a parameter of the `collocate()` method, so use it after you create the finder:

    finder.collocate(window_size=3)

This means that you can use different window sizes with the same collocation finder. But I hope you realize that these are still bigram collocations (see the documentation). The module `nltk.collocations` has classes for trigrams and quadgrams as well.

Alexis

Dimitriadis, A. (Alexis)

Oct 31, 2017, 3:59:31 PM
to nltk-...@googlegroups.com
Sorry, I misremembered. `window_size` is an argument of `Text.collocations()`, which passes it on to `BigramCollocationFinder.from_words()`, like this:

    finder = BigramCollocationFinder.from_words(tokens, window_size)

If you’re really set on using this with `from_documents()`, you need to “guerrilla patch” the class itself:

    BigramCollocationFinder.default_ws = 3
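
A possible alternative that avoids patching the class is to pad each document yourself and pass the flattened stream to `from_words()` with an explicit `window_size`. This is only a sketch: the helper name `finder_from_documents` is made up for this example, and it assumes that `from_words()` skips `None` tokens, which as far as I can tell from the NLTK 3 source is how `from_documents()` itself keeps pairs from spanning document boundaries. It may also be worth checking whether changing `default_ws` alters the counts at all in your version, since it appears to be used only for that padding.

```python
from itertools import chain

from nltk.collocations import BigramCollocationFinder


def finder_from_documents(documents, window_size=2):
    """Hypothetical helper: windowed bigram finder that respects document boundaries."""
    # Assumption: from_words() skips None tokens, so (window_size - 1) of them
    # after each document stop any window from pairing words across documents.
    padding = (None,) * (window_size - 1)
    padded = chain.from_iterable(chain(doc, padding) for doc in documents)
    return BigramCollocationFinder.from_words(padded, window_size=window_size)


documents = [["a", "b", "c"], ["d", "e"]]
finder = finder_from_documents(documents, window_size=3)
pairs = sorted(finder.ngram_fd)  # the distinct (w1, w2) pairs that were counted
```

With these toy documents, `pairs` should contain ("a", "c") but no pair that joins "c" to "d", because the padding separates the two documents.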


Again, this will not give you longer n-grams, only non-contiguous bigrams (“skip-grams”).
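
To make the difference concrete, here is a small plain-Python sketch. It is not NLTK's code, and `windowed_bigrams` is a name invented for this illustration. It counts pairs the way I understand a window to work: every word is paired with each of the next `window_size - 1` words, so the result is always a set of two-word pairs, never a longer n-gram.

```python
from collections import Counter


def windowed_bigrams(tokens, window_size=2):
    """Count (w1, w2) pairs where w2 follows w1 within the window."""
    counts = Counter()
    for i, w1 in enumerate(tokens):
        # w2 ranges over the next (window_size - 1) tokens after w1
        for w2 in tokens[i + 1 : i + window_size]:
            counts[(w1, w2)] += 1
    return counts


tokens = ["the", "quick", "brown", "fox"]
adjacent = windowed_bigrams(tokens, window_size=2)
skipping = windowed_bigrams(tokens, window_size=3)
extra = sorted(set(skipping) - set(adjacent))  # pairs that skip one word
```

Here `extra` holds the pairs that only appear with the wider window, namely ("quick", "fox") and ("the", "brown"); the adjacent pairs are unchanged.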

Alexis