pylab: ValueError: x and y must have same first dimension

typetoken

Aug 22, 2012, 12:10:07 AM

to nltk-...@googlegroups.com

This is about exercise 23 on page 76 of the NLTK book (also at http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html):

★ Zipf's Law: Let f(w) be the frequency of a word w in free text. Suppose that all the words of a text are ranked according to their frequency, with the most frequent word first. Zipf's law states that the frequency of a word type is inversely proportional to its rank (i.e. f × r = k, for some constant k). For example, the 50th most common word type should occur three times as frequently as the 150th most common word type.

Write a function to process a large text and plot word frequency against word rank using pylab.plot. Do you confirm Zipf's law? (Hint: it helps to use a logarithmic scale). What is going on at the extreme ends of the plotted line?

I wrote the following code to make the plot, but it raises ValueError: x and y must have same first dimension. Any tips? Thanks.

>>> def zif(text):
        fdist = nltk.FreqDist(text)
        import pylab
        word = fdist.keys()
        x = []
        y = []
        index = []
        for i in range(0,len(set(text)),1):
            x.extend(word[i:i+1])
            y.extend('fdist[word[i:i+1]]')
            index.extend('i+1')
        pylab.plot(index, y,'b')
        pylab.title('zipf law')
        pylab.xlabel('word rank')
        pylab.ylabel('word frequency')
        pylab.show()

>>> zif(text)

Traceback (most recent call last):
  File "<pyshell#35>", line 1, in <module>
    zif(text)
  File "<pyshell#34>", line 12, in zif
    pylab.plot(index, y,'b')
  File "C:\Python27\lib\site-packages\matplotlib\pyplot.py", line 2458, in plot
    ret = ax.plot(*args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 3848, in plot
    for line in self._get_lines(*args, **kwargs):
  File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 323, in _grab_next_args
    for seg in self._plot_args(remaining, kwargs):
  File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 300, in _plot_args
    x, y = self._xy_from_xy(x, y)
  File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 240, in _xy_from_xy
    raise ValueError("x and y must have same first dimension")
ValueError: x and y must have same first dimension

Thanks indeed.

Tarik Naeem

Sep 3, 2012, 1:23:34 AM

to nltk-...@googlegroups.com

That was helpful.


peter ljunglöf

Sep 3, 2012, 3:36:17 AM

to nltk-...@googlegroups.com

Why do you use string quotes in the calls to y.extend and index.extend?

/Peter
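
The quotes matter because list.extend iterates over its argument, and iterating over a string yields its individual characters. Here is a minimal sketch of the effect, using plain lists and no NLTK; the loop count of 5 is an arbitrary stand-in for len(set(text)):

```python
# Each call to extend() iterates over its argument and appends every item.
y, index = [], []
for i in range(5):                      # 5 is an arbitrary stand-in for len(set(text))
    y.extend('fdist[word[i:i+1]]')      # a string: appends its 18 characters
    index.extend('i+1')                 # a string: appends 'i', '+', '1'

print(len(index), len(y))               # 15 90, so pylab.plot(index, y) cannot pair them up

# The four variants side by side:
a = []; a.extend('90')                  # a == ['9', '0']
b = []; b.extend(['90'])                # b == ['90']
c = []; c.extend([90])                  # c == [90]
try:
    d = []; d.extend(90)                # an int is not iterable
except TypeError as err:
    extend_int_error = err              # extend(90) raises TypeError
```

For a single value, y.append(value) has the same effect as y.extend([value]).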

John H. Li

Sep 3, 2012, 3:44:06 AM

to nltk-...@googlegroups.com

Many thanks. I have found the solution:

In the previous code, I replaced y.extend('fdist[word[i:i+1]]') with y.extend([fdist[word[i]]]) and index.extend('i+1') with index.extend([i+1]). Now it works. I have come to understand the difference between y.extend('90'), y.extend(['90']), y.extend(90) and y.extend([90]). The corrected code is as follows:

>>> import nltk
>>> text = nltk.corpus.brown.words(categories = 'news')
>>> def zif(text):
        fdist = nltk.FreqDist(text)
        import pylab
        word = fdist.keys()
        x = []
        y = []
        index = []
        for i in range(0,len(set(text)),1):
            x.extend([word[i]])
            y.extend([fdist[word[i]]])
            index.extend([i+1])
        pylab.plot(index, y,'-bo')
        pylab.title('zipf law')
        pylab.xlabel('word rank')
        pylab.ylabel('word frequency')
        pylab.show()

>>> zif(text)
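
For readers on Python 3 and NLTK 3, where FreqDist.keys() is no longer an indexable, frequency-sorted list, the same plot can be built from most_common(). The sketch below is one possible version, not the code from this thread. It uses collections.Counter from the standard library (nltk.FreqDist is a Counter subclass in NLTK 3, so it should behave the same way here). The helper names rank_frequency and plot_zipf are made up for illustration, and the tiny word list only stands in for a real corpus. It also uses log-log axes, as the exercise hint suggests.

```python
from collections import Counter

def rank_frequency(words):
    """Return (ranks, freqs) with the most frequent word at rank 1."""
    counts = Counter(words)
    # most_common() yields (word, count) pairs in decreasing order of count
    freqs = [count for _, count in counts.most_common()]
    ranks = list(range(1, len(freqs) + 1))
    return ranks, freqs

def plot_zipf(words):
    """Plot frequency against rank on log-log axes (requires matplotlib)."""
    import matplotlib.pyplot as plt   # imported here so rank_frequency works without it
    ranks, freqs = rank_frequency(words)
    plt.loglog(ranks, freqs, '-bo')
    plt.title("Zipf's law")
    plt.xlabel('word rank')
    plt.ylabel('word frequency')
    plt.show()

# Toy input standing in for a real corpus
sample = 'the cat and the dog and the bird'.split()
ranks, freqs = rank_frequency(sample)
print(ranks, freqs)   # [1, 2, 3, 4, 5] [3, 2, 1, 1, 1]
```

With NLTK installed, plot_zipf(nltk.corpus.brown.words(categories='news')) would plot the corpus used above. Because both lists are derived from the same most_common() result, they always have the same length, so the ValueError cannot occur.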
