pylab: ValueError: x and y must have same first dimension

typetoken

Aug 22, 2012, 12:10:07 AM
to nltk-...@googlegroups.com
This is about exercise 23 on page 76 (also at http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html):

★ Zipf's Law: Let f(w) be the frequency of a word w in free text. Suppose that all the words of a text are ranked according to their frequency, with the most frequent word first. Zipf's law states that the frequency of a word type is inversely proportional to its rank (i.e. f × r = k, for some constant k). For example, the 50th most common word type should occur three times as frequently as the 150th most common word type.
  1. Write a function to process a large text and plot word frequency against word rank using pylab.plot. Do you confirm Zipf's law? (Hint: it helps to use a logarithmic scale). What is going on at the extreme ends of the plotted line?
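As a quick numeric check of the relation f × r = k stated in the exercise, here is a minimal sketch. The constant k = 3000 and the helper name zipf_frequency are made up for illustration. Taking logarithms gives log f = log k - log r, which is why an ideal Zipf curve is a straight line on log-log axes.

```python
def zipf_frequency(rank, k=3000):
    """Idealized Zipf frequency f = k / r (k is an arbitrary constant)."""
    return k / rank


f50 = zipf_frequency(50)     # frequency of the 50th most common word type
f150 = zipf_frequency(150)   # frequency of the 150th most common word type
ratio = f50 / f150           # the exercise says this should be 3

print(f50, f150, ratio)      # 60.0 20.0 3.0
```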
I wrote the following code to make the plot, but it raises ValueError: x and y must have same first dimension. Any tips on what is wrong with it? Thanks.

>>> def zif(text):
    fdist = nltk.FreqDist(text)
    import pylab
    word = fdist.keys()
    x = []
    y = []
    index = []
    for i in range(0,len(set(text)),1):
        x.extend(word[i:i+1])
        y.extend('fdist[word[i:i+1]]')
        index.extend('i+1')
    pylab.plot(index, y,'b')
    pylab.title('zipf law')
    pylab.xlabel('word rank')
    pylab.ylabel('word frequency')
    pylab.show()

>>> zif(text)

Traceback (most recent call last):
  File "<pyshell#35>", line 1, in <module>
    zif(text)
  File "<pyshell#34>", line 12, in zif
    pylab.plot(index, y,'b')
  File "C:\Python27\lib\site-packages\matplotlib\pyplot.py", line 2458, in plot
    ret = ax.plot(*args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 3848, in plot
    for line in self._get_lines(*args, **kwargs):
  File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 323, in _grab_next_args
    for seg in self._plot_args(remaining, kwargs):
  File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 300, in _plot_args
    x, y = self._xy_from_xy(x, y)
  File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 240, in _xy_from_xy
    raise ValueError("x and y must have same first dimension")
ValueError: x and y must have same first dimension 

Thanks indeed.

Tarik Naeem

Sep 3, 2012, 1:23:34 AM
to nltk-...@googlegroups.com
That was helpful.

peter ljunglöf

Sep 3, 2012, 3:36:17 AM
to nltk-...@googlegroups.com
Why do you use string quotes in the calls to y.extend and index.extend?

/Peter
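
The quotes matter because list.extend iterates over its argument: a quoted expression is a string literal, so its characters are added one at a time and the expression itself is never evaluated. A minimal sketch of the effect on the two lists passed to pylab.plot, using a three-iteration loop and made-up frequencies as stand-ins for the real text:

```python
index_quoted, y_quoted = [], []
index_fixed, y_fixed = [], []
freqs = [30, 20, 10]  # made-up values standing in for fdist[word[i]]

for i in range(3):
    # Quoted: each call adds the characters of the literal, not a value
    y_quoted.extend('fdist[word[i:i+1]]')   # 18 characters per call
    index_quoted.extend('i+1')              # 'i', '+', '1' per call
    # Unquoted and wrapped in a list: each call adds exactly one value
    y_fixed.extend([freqs[i]])
    index_fixed.extend([i + 1])

print(len(index_quoted), len(y_quoted))  # 9 54 -> the lengths differ
print(index_fixed, y_fixed)              # [1, 2, 3] [30, 20, 10]
```

The differing lengths of the quoted lists are the kind of mismatch the ValueError reports. For adding a single value, y.append(value) does the same as y.extend([value]).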

John H. Li

Sep 3, 2012, 3:44:06 AM
to nltk-...@googlegroups.com
Many thanks. The solution has been found:
In the previous code, I replaced y.extend('fdist[word[i:i+1]]') with y.extend([fdist[word[i]]]) and index.extend('i+1') with index.extend([i+1]). Then it works. I have come to realize the difference between y.extend('90'), y.extend(['90']), y.extend(90) and y.extend([90]). The corrected code is as follows:

>>> import nltk
>>> text = nltk.corpus.brown.words(categories = 'news')
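>>> def zif(text):
    fdist = nltk.FreqDist(text)
    import pylab
    # In NLTK 2 (Python 2.7), keys() is a list sorted by decreasing frequency,
    # so word[i] is the word of rank i+1
    word = fdist.keys()
    x = []
    y = []
    index = []
    for i in range(0,len(set(text)),1):
        x.extend([word[i]])           # the word itself (not plotted)
        y.extend([fdist[word[i]]])    # its frequency
        index.extend([i+1])           # its rank, starting at 1
    pylab.plot(index, y,'-bo')
    pylab.title('zipf law')
    pylab.xlabel('word rank')
    pylab.ylabel('word frequency')
    pylab.show()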

 
>>> zif(text)
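
On newer setups, fdist.keys() may not be an indexable, frequency-sorted list (in NLTK 3, FreqDist is based on collections.Counter), so word[i] can fail there. Below is a minimal sketch of the same plot that does not depend on key order, assuming Python 3 and matplotlib. The names rank_frequency and plot_zipf are hypothetical, and the log-log axes follow the hint in the exercise rather than the code above.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (ranks, freqs) with the most frequent token at rank 1."""
    counts = Counter(tokens)
    # most_common() sorts by decreasing count, so list position gives the rank
    freqs = [count for _, count in counts.most_common()]
    ranks = list(range(1, len(freqs) + 1))
    return ranks, freqs


def plot_zipf(tokens):
    """Plot frequency against rank on log-log axes (needs matplotlib)."""
    # Imported here so rank_frequency can be used without matplotlib
    import matplotlib.pyplot as plt

    ranks, freqs = rank_frequency(tokens)
    plt.loglog(ranks, freqs, '-bo')
    plt.title('zipf law')
    plt.xlabel('word rank')
    plt.ylabel('word frequency')
    plt.show()
```

With the Brown news text loaded as above, this would be called as plot_zipf(text). Because ranks and freqs are built from the same list, they always have the same length.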
