lexical diversity based on genre in brown corpus


typetoken

Jul 25, 2012, 7:42:18 AM
to nltk-...@googlegroups.com
Dear All, 
While doing exercise 9 on page 75, I ran into two problems:
1) I cannot find any example like 'monstrous' that has different meanings across different texts.

2) I wrote the following code to compute lexical diversity for different genres in the Brown Corpus. However, it always raises an error saying that the variable is not defined:

>>> def lexical_diveristy (genre):
        genre_text = brown.words(categories='genre')
        genre_text_lower = [w.lower() for w in genre_text]
        return len(genre_text_lower)/len(set(genre_text_lower))

>>> lexical_diversity(news)

Traceback (most recent call last):
  File "<pyshell#146>", line 1, in <module>
    lexical_diversity(news)
NameError: name 'news' is not defined

Thanks for your kind tips.

Sincerely
Typetoken

John H. Li

Jul 25, 2012, 9:54:40 AM
to nltk-...@googlegroups.com
I tried a new version of the code with a for loop added, but it still fails:

>>> def lexical_diveristy (genre):
        for genre in brown.categories():
            genre_text = brown.words(categories='genre')
            genre_text_lower = [w.lower() for w in genre_text]
            return len(genre_text_lower)/len(set(genre_text_lower))

>>> lexical_diversity(news)

Traceback (most recent call last):
  File "<pyshell#189>", line 1, in <module>
Typetoken


Kyle Marek-Spartz

Jul 25, 2012, 10:02:34 AM
to nltk-...@googlegroups.com
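It seems you are mixing up variables and strings.

news (without quotes) is a variable name, which Python tries to look up
'news' (with quotes) is a string

In your function, categories='genre' likewise passes the literal string 'genre' rather than the value of the genre argument.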


Try this:

from nltk.corpus import brown

def lexical_diversity(genre):
    # genre is a variable holding a string such as 'news'
    genre_text = brown.words(categories=genre)
    genre_text_lower = [w.lower() for w in genre_text]
    return len(genre_text_lower)/len(set(genre_text_lower))

lexical_diversity('news')

The function then binds the input string to the variable genre,
allowing it to be used in the function.
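
To see the difference without loading NLTK, here is a minimal sketch. It assumes Python 3 (so `/` is true division) and uses a made-up dictionary, `TOY_CORPUS`, as a stand-in for `brown.words(categories=genre)`; the token lists are invented for illustration and are not Brown Corpus data.

```python
# Hypothetical toy data: genre name -> list of tokens.
# With NLTK you would call brown.words(categories=genre) instead.
TOY_CORPUS = {
    "news": ["The", "jury", "said", "the", "election", "was", "fair"],
    "romance": ["She", "said", "she", "loved", "the", "sea"],
}


def lexical_diversity(genre):
    """Return tokens per distinct lowercased word for one genre."""
    genre_text = TOY_CORPUS[genre]  # genre is a variable holding a string
    genre_text_lower = [w.lower() for w in genre_text]
    return len(genre_text_lower) / len(set(genre_text_lower))


# Passing a string literal works.
print(lexical_diversity("news"))

# Passing a bare name fails, because no variable called news exists.
try:
    lexical_diversity(news)
except NameError as err:
    print("NameError:", err)

# Looping over genres belongs outside the function: the loop variable
# holds each genre string in turn and is passed in as the argument.
for genre in sorted(TOY_CORPUS):
    print(genre, lexical_diversity(genre))
```

With the real corpus, the same loop could iterate over `brown.categories()` instead of the toy dictionary's keys.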


Kyle Marek-Spartz

University of Minnesota – Twin Cities: Linguistics Undergraduate
Computer Science Teaching Assistant
Amateur Radio Callsign – KDØGTK
kyle.mar...@gmail.com
mare...@umn.edu

John H. Li

Jul 25, 2012, 11:19:56 AM
to nltk-...@googlegroups.com
Dear Kyle,

Thanks indeed for helping me out so promptly. Now it works. As a beginner, I have been really frustrated after spending my whole day on the exercises. Actually, my code is based on the following model from the book Natural Language Processing with Python (p. 9):

>>> def lexical_diversity(text):
...     return len(text) / len(set(text))

>>> lexical_diversity(text3)
16.050197203298673
>>> lexical_diversity(text5)
7.4200461589185629

I notice that text3 and text5 are written without single quotation marks in the following example. Isn't that strange?

lexical_diversity(text3)
lexical_diversity(text5)

Therefore, I typed lexical_diversity(news) instead of lexical_diversity('news'), which led to my error.


I am quite confused now. Any further suggestions?

Thanks indeed

Sincerely
Typetoken

Kyle Marek-Spartz

Jul 25, 2012, 11:56:04 AM
to nltk-...@googlegroups.com
At the beginning of the chapter you run:

from nltk.book import *

which creates the text variables (text1 through text9), so those names are already defined when you use them.


Kyle

Alexis Dimitriadis

Jul 25, 2012, 12:17:58 PM
to nltk-...@googlegroups.com
> I notice that text3 and text5 are written without single quotation marks in the following example. Isn't that strange?


text3 and text5 are the names of objects, defined (imported) when you do `from nltk.book import *`.
"news" is a string, a possible value for the argument categories (passed in as the value of the genre variable).
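
As a rough illustration of the "name of an object" side, here is a small sketch in Python 3. The token list assigned to `text3` below is a toy stand-in, not the real NLTK text; the point is only that an assignment (or an import that performs one) must happen before a bare name can be used.

```python
def lexical_diversity(text):
    """Return tokens per distinct token for a list of tokens."""
    return len(text) / len(set(text))


# An import such as `from nltk.book import *` performs assignments like
# this one for you. (Toy token list, assumed for illustration only.)
text3 = ["in", "the", "beginning", "god", "created",
         "the", "heaven", "and", "the", "earth"]

# text3 was assigned above, so it is used without quotes.
print(lexical_diversity(text3))

# Nothing has ever assigned a variable called news, so the bare name
# would raise NameError; the string "news" is just a value.
print("text3" in globals())
print("news" in globals())
```

Here the function receives the object itself, whereas in the genre version it receives a string that is then used to look the words up.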


> As a beginner, I have been really frustrated after spending my whole day on the exercises.
Despite the efforts of the authors, the NLTK book is not really sufficient as an introduction to Python, especially if you're studying it on your own. I recommend working through the Python tutorial (http://docs.python.org/tutorial/), especially sections 3-6 and 9, then returning to NLTK. I think you'll find it much more rewarding then.

Best,

Alexis

John H. Li

Jul 25, 2012, 9:52:28 PM
to nltk-...@googlegroups.com
Dear Alexis,

Thanks very much indeed for your kind advice and for sharing your valuable experience. It is helpful for my self-study.

Sincerely
Typetoken