Write collocates to a file.

David Baines

Aug 10, 2009, 3:54:31 AM
to nltk-...@googlegroups.com
Hi All,

I was trying to write a short script that would read in a file, and write out to another file a list of the collocates that were found.

The Collocates examples show the following call which prints a list of the collocates:

text4.collocations()

Building collocations list
United States; fellow citizens; years ago; Federal Government; General
Government; American people; Vice President; Almighty God; Fellow
citizens; Chief Magistrate; Chief Justice; God bless; Indian tribes;
public debt; foreign nations; political parties; State governments;
National Government; United Nations; public money

However, the code snippet below doesn't work, because text.collocations() returns None.
The collocations method only prints the results and doesn't return anything.

f = open(options.infile, "r")
raw = f.read()
f.close()
tokens = nltk.word_tokenize(raw)
text = nltk.Text(tokens)
collocates = text.collocations()  # prints the list; collocates is None
text = string.join(collocates, '\n')  # fails here, nothing to join
f = open(options.outfile, "w")
f.write(text)
f.close()
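
For readers hitting the same error, here is a minimal sketch of what goes wrong in the snippet above: a function that only prints returns None, so there is nothing to join. It uses Python 3 and the standard library only. show_pairs and capture_printed are hypothetical names made up for illustration, with show_pairs standing in for a print-only method like text.collocations().

```python
import contextlib
import io


def show_pairs(pairs):
    """Stand-in for a print-only method such as Text.collocations()."""
    print("; ".join(pairs))  # prints, and implicitly returns None


def capture_printed(func, *args):
    """Run func and return whatever it printed to stdout, as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue()


pairs = ["United States", "fellow citizens", "years ago"]
result = show_pairs(pairs)  # result is None, so joining it would fail
printed = capture_printed(show_pairs, pairs)
lines = [p.strip() for p in printed.split(";")]
```

Scraping printed output is fragile (the real method also wraps long lines, as in the output shown earlier), so the finder classes described in the collocations HowTo are likely the better route.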


Steve pointed me to the collocations HowTo at http://nltk.googlecode.com/svn/trunk/doc/howto/collocations.html
Trying the first exercise yields the following result on my machine:

>>> import nltk
>>> from nltk.collocations import *
>>> finder = BigramCollocationFinder.from_words(nltk.corpus.genesis.words('english-web.txt'))
>>> finder.nbest(bigram_measures.pmi, 10)

Traceback (most recent call last):
 File "<pyshell#8>", line 1, in <module>
   finder.nbest(bigram_measures.pmi, 10)
NameError: name 'bigram_measures' is not defined
>>>

Are there modules that need to be imported? Do others have the same error, or is it just a problem with my installation?

I've installed Python 2.6.2, nltk, and the various supporting modules.
I'm running on 32-bit Vista.

Many thanks for any help.
David.


Steven Bird

Aug 10, 2009, 11:52:53 PM
to nltk-...@googlegroups.com
2009/8/10 David Baines <david_...@sil.org>:

> Hi All,
>
> I was trying to write a short script that would read in a file, and write
> out to another file a list of the collocates that were found.
>
> The Collocates examples show the following call which prints a list of the
> collocates:
>
> text4.collocations()
>
> Building collocations list
> United States; fellow citizens; years ago; Federal Government; General
> Government; American people; Vice President; Almighty God; Fellow
> citizens; Chief Magistrate; Chief Justice; God bless; Indian tribes;
> public debt; foreign nations; political parties; State governments;
> National Government; United Nations; public money
>
> However, the code snippet below doesn't work, because text.collocations()
> returns None.

Note that text.collocations() is just a convenience function to
demonstrate various operations on texts. For serious work with
collocations, you can read the implementation of that collocations()
method, or read the collocations howto:

http://code.google.com/p/nltk/source/browse/trunk/nltk/nltk/text.py
http://nltk.googlecode.com/svn/trunk/doc/howto/collocations.html

> Trying the first exercise yields the following result on my machine:
>
> >>> import nltk
> >>> from nltk.collocations import *
> >>> finder = BigramCollocationFinder.from_words(nltk.corpus.genesis.words('english-web.txt'))
> >>> finder.nbest(bigram_measures.pmi, 10)
>
> Traceback (most recent call last):
>  File "<pyshell#8>", line 1, in <module>
>    finder.nbest(bigram_measures.pmi, 10)
> NameError: name 'bigram_measures' is not defined

bigram_measures is defined at the top of the howto document:

bigram_measures = nltk.collocations.BigramAssocMeasures()
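
Putting the pieces together, here is a rough sketch of the original goal (writing collocates to a file) using the finder directly rather than text.collocations(). It assumes Python 3 and a current NLTK install. write_collocations and its parameters are hypothetical, chosen for illustration, and are not something from the howto.

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder


def write_collocations(tokens, out_path, n=20, min_freq=2):
    """Write the n best bigram collocations to out_path, one per line.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    bigram_measures = BigramAssocMeasures()
    finder = BigramCollocationFinder.from_words(tokens)
    # Drop bigrams seen fewer than min_freq times; rare pairs tend to
    # dominate PMI rankings otherwise.
    finder.apply_freq_filter(min_freq)
    # nbest() returns a list of (word1, word2) tuples instead of printing.
    best = finder.nbest(bigram_measures.pmi, n)
    lines = [" ".join(pair) for pair in best]
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("\n".join(lines))
    return lines
```

Called as write_collocations(nltk.word_tokenize(raw), options.outfile), this should slot into the original script. Note that word_tokenize needs NLTK's tokenizer data to be downloaded first.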

-Steven Bird

David Baines

Aug 11, 2009, 4:01:26 AM
to nltk-...@googlegroups.com
Thanks Steve,

I had tried to define bigram_measures myself, but I didn't find the correct way to do it.

I don't see this line:
bigram_measures = nltk.collocations.BigramAssocMeasures()
at the top of the Collocations HowTo at:
http://nltk.googlecode.com/svn/trunk/doc/howto/collocations.html

All the best,
David.

Steven Bird

Aug 14, 2009, 5:57:02 PM
to nltk-...@googlegroups.com
2009/8/11 David Baines <david_...@sil.org>:

> I don't see this line:
> bigram_measures = nltk.collocations.BigramAssocMeasures()
> at the top of the Collocations HowTo at:
> http://nltk.googlecode.com/svn/trunk/doc/howto/collocations.html

Fixed now, sorry. The published howto documents were slightly out of date.

-Steven Bird
