sim-shootout - error message


Sam Marer

Jan 20, 2015, 6:59:11 AM
to gen...@googlegroups.com
When executing run_all.sh from sim-shootout (https://github.com/piskvorky/sim-shootout) [python 2.7.6, gensim 0.10.3] I get the following error:

Traceback (most recent call last):
  File "./prepare_shootout.py", line 132, in <module>
    corpus = ShootoutCorpus(gensim.utils.smart_open(preprocessed_file))
  File "/usr/local/lib/python2.7/dist-packages/gensim/corpora/textcorpus.py", line 61, in __init__
    self.dictionary.add_documents(self.get_texts())
  File "/usr/local/lib/python2.7/dist-packages/gensim/corpora/dictionary.py", line 119, in add_documents
    for docno, document in enumerate(documents):
  File "./prepare_shootout.py", line 92, in get_texts
    lines = gensim.corpora.textcorpus.getstream(self.input)  # open file/reset stream to its start
AttributeError: 'module' object has no attribute 'getstream'


Radim Řehůřek

Jan 20, 2015, 9:32:59 AM
to gen...@googlegroups.com
Hi Sam,

sim-shootout used an older version of gensim (~current at the time I wrote the blog posts).

This particular `textcorpus.getstream` function was refactored into `utils.file_or_filename` at a later date:

So I think you can either go back to an older gensim (~0.9.1), or modify line 92 in prepare_shootout.py manually, to use the new, refactored name.

And then, if you feel like doing a good deed, open a pull request against sim-shootout with the fixed, updated version :)

Best,
Radim

huangz...@hotmail.com

Jan 20, 2015, 10:51:21 AM
to gen...@googlegroups.com
Dear Radim,
I ran into the same problem today when using the latest gensim.
Going back to an older version does not seem easy, so I decided to modify line 92 in prepare_shootout.py manually:


    def get_texts(self):
        lines = utils.file_or_filename(self.input)  # open file/reset stream to its start
        for lineno, line in enumerate(lines):
            yield line.split('\t')[1].split()  # return tokens (ignore the title before the tab)
        self.length = lineno + 1

The program then fails with this error:

Traceback (most recent call last):
  File "prepare_shootout.py", line 134, in <module>
    corpus = ShootoutCorpus(gensim.utils.smart_open(preprocessed_file))
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.10.3-py2.7-linux-x86_64.egg/gensim/corpora/textcorpus.py", line 61, in __init__
    self.dictionary.add_documents(self.get_texts())
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.10.3-py2.7-linux-x86_64.egg/gensim/corpora/dictionary.py", line 119, in add_documents

    for docno, document in enumerate(documents):
  File "prepare_shootout.py", line 95, in get_texts
    for lineno, line in enumerate(lines):
TypeError: 'GeneratorContextManager' object is not iterable

So, how should I modify the code now?

Thank you very much!
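
Note on the TypeError above: the message suggests that `utils.file_or_filename` in this gensim version is wrapped with `contextlib.contextmanager`, so its return value has to be entered with a `with` statement before it can be iterated. The sketch below shows the pattern with a hypothetical stand-in, `file_or_filename_like`, which is an assumption for illustration and not gensim's actual code. It is written for Python 3; under Python 2.7 the string check would use `basestring`.

```python
from contextlib import contextmanager
import io


@contextmanager
def file_or_filename_like(source):
    """Hypothetical stand-in for a helper decorated with @contextmanager."""
    if isinstance(source, str):
        # source is a filename: open it and close it again on exit
        with open(source) as handle:
            yield handle
    else:
        # source is already file-like: rewind it to the start
        source.seek(0)
        yield source


def get_tokens(source):
    """Yield the tokens after the tab on each line (the title before the tab is ignored)."""
    with file_or_filename_like(source) as lines:  # enter the context manager first
        for line in lines:
            yield line.split('\t')[1].split()


stream = io.StringIO("Title one\tfirst document text\nTitle two\tsecond one\n")

# Iterating the context manager object directly reproduces the kind of error in the traceback.
error_message = None
try:
    for _ in file_or_filename_like(stream):
        pass
except TypeError as err:
    error_message = str(err)

# Entering it with `with` works.
tokens = list(get_tokens(stream))
```

In `get_texts`, the equivalent change would be to wrap the loop in `with utils.file_or_filename(self.input) as lines:`, assuming the helper really is a context manager in the installed version.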




On Tuesday, January 20, 2015 at 10:32:59 PM UTC+8, Radim Řehůřek wrote:

Radim Řehůřek

Jan 20, 2015, 4:36:17 PM
to gen...@googlegroups.com
Hello,

sorry about that! Apparently that new replacement is not 100% backward compatible...

I think the easiest solution then is to just copy&paste the old `getstream` code into prepare_shootout.py. Link to the code is in my previous post.

Again, if you could make the sim-shootout repo work with the latest gensim, and open a pull request, that would be great!

Cheers,
Radim
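
Note: a minimal sketch of what a local `getstream` helper pasted into prepare_shootout.py could look like. It is written from the behaviour described in this thread (open a filename, or rewind a file-like object to its start) and is not a verbatim copy of the old gensim function. It targets Python 3; under Python 2.7 the string check would use `basestring` instead of `str`.

```python
import io


def getstream(source):
    """Return an iterable stream of lines, positioned at the start.

    A filename is opened as a text file; a file-like object is rewound
    and returned as-is.
    """
    if source is None:
        raise ValueError("no input given")
    if isinstance(source, str):
        # source is a filename: open it as a text file
        return open(source)
    # source is a file-like object: reset it to its beginning
    source.seek(0)
    return source


# Usage: the same stream can be read twice because it is rewound each time.
stream = io.StringIO("Title\tsome text here\n")
stream.read()  # exhaust the stream on purpose
first_pass = list(getstream(stream))
second_pass = list(getstream(stream))
```

With such a helper defined in the script, line 92 would call `getstream(self.input)` directly instead of `gensim.corpora.textcorpus.getstream(self.input)`.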

huangz...@hotmail.com

Jan 20, 2015, 9:12:49 PM
to gen...@googlegroups.com
Dear Radim:
I went back to an old version of gensim, but it reported errors about this package and Cython, so I gave up.
[...] are very helpful, but how to combine them to get Wikipedia word2vec word embeddings is not obvious. Are there other tutorials for getting Wikipedia word2vec word embeddings?
Thanks a lot.

On Wednesday, January 21, 2015 at 5:36:17 AM UTC+8, Radim Řehůřek wrote:

Sam Marer

Jan 21, 2015, 5:42:54 AM
to gen...@googlegroups.com
Hi Radim,

thank you for your help!

I tried rolling back to gensim 0.9.1. The "AttributeError: 'module' object has no attribute 'getstream'" error no longer appears, but
"AttributeError: 'closing' object has no attribute 'seek'" appears instead.

Traceback (most recent call last):
  File "./prepare_shootout.py", line 132, in <module>
    corpus = ShootoutCorpus(gensim.utils.smart_open(preprocessed_file))
  File "/usr/local/lib/python2.7/dist-packages/gensim/corpora/textcorpus.py", line 77, in __init__
    self.dictionary.add_documents(self.get_texts())
  File "/usr/local/lib/python2.7/dist-packages/gensim/corpora/dictionary.py", line 94, in add_documents

    for docno, document in enumerate(documents):
  File "./prepare_shootout.py", line 92, in get_texts
    lines = gensim.corpora.textcorpus.getstream(self.input)  # open file/reset stream to its start
  File "/usr/local/lib/python2.7/dist-packages/gensim/corpora/textcorpus.py", line 53, in getstream
    result.seek(0)
AttributeError: 'closing' object has no attribute 'seek'

Best,
Sam
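
Note on the 'closing' error: the traceback suggests that `smart_open` in this gensim version returned a `contextlib.closing` wrapper. Such a wrapper does not forward attributes like `seek` to the object it wraps, so `result.seek(0)` fails on it. A small sketch of that behaviour, using an in-memory `io.StringIO` as a stand-in for the real file (an assumption for illustration):

```python
from contextlib import closing
import io

raw = io.StringIO("Title\tsome text\n")
wrapped = closing(raw)

# The wrapper itself has no file methods, which matches the AttributeError above.
has_seek = hasattr(wrapped, "seek")

# Entering the wrapper hands back the underlying object, which can be rewound and read.
with wrapped as handle:
    handle.seek(0)
    lines = handle.readlines()

# On exit, closing() calls close() on the wrapped object.
is_closed = raw.closed
```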

Radim Řehůřek

Jan 21, 2015, 9:19:36 AM
to gen...@googlegroups.com
Hmm, then the shootout version of gensim must have been 0.8.9 (not 0.9.1).

Please let me know how that worked (or if you could update the sim-shootout repo to work with latest gensim),
Radim

Sam Marer

Jan 21, 2015, 10:55:37 AM
to gen...@googlegroups.com
with gensim 0.8.9 it looks different once again:


Traceback (most recent call last):
  File "./prepare_shootout.py", line 132, in <module>
    corpus = ShootoutCorpus(gensim.utils.smart_open(preprocessed_file))
  File "/usr/local/lib/python2.7/dist-packages/gensim/corpora/textcorpus.py", line 75, in __init__
    self.dictionary.add_documents(self.get_texts())
  File "/usr/local/lib/python2.7/dist-packages/gensim/corpora/dictionary.py", line 90, in add_documents

    for docno, document in enumerate(documents):
  File "./prepare_shootout.py", line 95, in get_texts

    self.length = lineno + 1
UnboundLocalError: local variable 'lineno' referenced before assignment

Cheers,
Sam

Radim Řehůřek

Jan 21, 2015, 5:09:55 PM
to gen...@googlegroups.com
That means you're giving it an empty corpus (no lines at all).

What does the log say?

-rr
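
Note: the UnboundLocalError comes from the `for lineno, line in enumerate(lines)` pattern. When the input has no lines, the loop body never runs and `lineno` is never assigned, so `lineno + 1` fails. A minimal sketch of the failure and of one possible guard (initialising `lineno` to -1); the function names here are hypothetical and not part of sim-shootout:

```python
def count_documents_unsafe(lines):
    """Mirror of the original pattern: fails when `lines` is empty."""
    for lineno, line in enumerate(lines):
        pass
    return lineno + 1  # lineno was never bound if the loop did not run


def count_documents(lines):
    """Same pattern with a guard, so an empty input gives a length of 0."""
    lineno = -1
    for lineno, line in enumerate(lines):
        pass
    return lineno + 1


try:
    count_documents_unsafe([])
    raised = False
except UnboundLocalError:
    raised = True

empty_length = count_documents([])
two_line_length = count_documents(["a\tb c\n", "d\te f\n"])
```

The guard only hides the symptom: an empty corpus usually means the input file path is wrong or the preprocessing step produced no output.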

Sam Marer

Jan 22, 2015, 10:55:28 AM
to gen...@googlegroups.com
Typo..  :p
solved. Thank you!
-sm 