Attempting to add a new model to gensim


Shivani

Aug 9, 2011, 5:25:36 PM
to gensim
Hello gensim fans and Randim,

I am stuck at a very strange place in my attempt to add
unigram model support to gensim.

I wrote unigram.py by looking at the logentropy model.
I then added it, along with an __init__.py, to a folder called gensim in
my home directory.

I then added the path to Python:

>import sys
>sys.path.insert(0,'/home/shivani/mycode/python/gensim/')
>uni = UnigramModel(corpus=myC, id2word=myCorpus.dictionary)

where myCorpus is the non-serialized version of myC.

I do not get any output when I do this.

Further, when I try to create a similarity index using

>index = similarities.Similarity(uni[myC])

I get the following error:

#is_corpus, bow = utils.is_corpus(bow)
AttributeError: 'module' object has no attribute 'is_corpus'

Any ideas?

Shivani

Radim

Aug 10, 2011, 9:53:18 PM
to gensim
Hello Shivani,

my name is Radim, not Randim.

> I am stuck at a very strange place in my attempt to add
> unigram model support to gensim.

It could be an importing problem. Avoid creating packages named
`gensim`, as that will clash with the existing gensim package. Just
import your module directly; do not modify sys.path.
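A minimal illustration of the kind of clash described above (Python 3, standard library only): two throwaway packages share the hypothetical name `clashpkg`, one playing the installed copy and one the home-directory copy. Whichever directory comes first on `sys.path` wins, and the other copy is silently hidden.

```python
import importlib
import os
import sys
import tempfile

def make_package(root, name, marker):
    """Create root/name/__init__.py that defines MARKER = marker."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("MARKER = %r\n" % marker)

# Two separate directories, each holding a package with the same name.
# "clashpkg" is a hypothetical stand-in for "gensim".
installed = tempfile.mkdtemp()
homegrown = tempfile.mkdtemp()
make_package(installed, "clashpkg", "installed copy")
make_package(homegrown, "clashpkg", "home-directory copy")

sys.path.append(installed)     # plays the role of site-packages
sys.path.insert(0, homegrown)  # a home-directory folder pushed to the front
importlib.invalidate_caches()

import clashpkg
print(clashpkg.MARKER)  # home-directory copy
```

Any module that expects the real package's contents (such as `utils.is_corpus`) then fails with an AttributeError, because it is talking to the wrong copy.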

If you want to copy your module inside your local gensim installation,
a good place would be gensim/models (where logentropy is).

> I get the following error
>
>     #is_corpus, bow = utils.is_corpus(bow)
> AttributeError: 'module' object has no attribute 'is_corpus'

There was a PEP 8 renaming of identifiers in 0.8: isCorpus became
is_corpus, etc. It looks like you're mixing two codebases, 0.8 and
0.7.8, which is not a good idea. Try installing the newer gensim with
`easy_install -U gensim`. Just make sure you know where your local
modifications reside, so you don't overwrite your code by mistake :)
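When two codebases might be mixed, a quick check is which file a module was actually loaded from and which version it reports. A minimal sketch, relying only on the standard `__file__` attribute and the conventional (not guaranteed) `__version__` attribute; the helper name `describe_module` is hypothetical, and the stdlib `json` package stands in for gensim so the example runs anywhere.

```python
import importlib

def describe_module(name):
    """Import a module by name and report where it came from and its version."""
    module = importlib.import_module(name)
    location = getattr(module, "__file__", "<built-in>")
    version = getattr(module, "__version__", "unknown")
    return location, version

# With gensim installed you would call describe_module("gensim") and
# describe_module("gensim.utils"); after the 0.8 rename described above,
# hasattr(gensim.utils, "is_corpus") also tells you which naming the loaded
# copy uses. The stdlib "json" package is used here only as a stand-in.
location, version = describe_module("json")
print(location)
print(version)
```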

HTH,
Radim

Shivani

Aug 16, 2011, 2:44:12 PM
to gensim
I reinstalled gensim and integrated my changes into a newer version
of my unigram model.

However, I am unable to install it using the procedure mentioned
above.

I get the following error:

uni = gensim.models.unigrammodel.UnigramModel(myC)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'unigrammodel'

Any leads?

Thanks a lot,
Shivani

Radim

Aug 16, 2011, 2:48:25 PM
to gensim
How do you import your module? It should be something like `import
gensim.models.unigrammodel`. Or you can edit the __init__.py file of
gensim/models, to import your module automatically when you do `import
gensim`.
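Importing a package does not automatically import its submodules, which is what produces `'module' object has no attribute 'unigrammodel'`. The sketch below (Python 3; the package names `plainpkg`/`eagerpkg` and the module `unigrammodel.py` are hypothetical stand-ins) shows both remedies: importing the submodule explicitly, or importing it from the package's `__init__.py`. In Python 3 that `__init__.py` line needs the explicit relative form `from .unigrammodel import ...`; Python 2 also accepted the implicit `from unigrammodel import ...`.

```python
import os
import sys
import tempfile

def build_package(root, pkg, init_body):
    """Create root/pkg/models/{__init__,unigrammodel}.py (hypothetical layout)."""
    models_dir = os.path.join(root, pkg, "models")
    os.makedirs(models_dir)
    open(os.path.join(root, pkg, "__init__.py"), "w").close()
    with open(os.path.join(models_dir, "__init__.py"), "w") as f:
        f.write(init_body)
    with open(os.path.join(models_dir, "unigrammodel.py"), "w") as f:
        f.write("class UnigramModel(object):\n    pass\n")

root = tempfile.mkdtemp()
build_package(root, "plainpkg", "")  # empty models/__init__.py
build_package(root, "eagerpkg", "from . import unigrammodel\n"
                                "from .unigrammodel import UnigramModel\n")
sys.path.insert(0, root)

import plainpkg.models
print(hasattr(plainpkg.models, "unigrammodel"))  # False: same cause as the AttributeError above

import plainpkg.models.unigrammodel              # remedy 1: import the submodule explicitly
print(hasattr(plainpkg.models, "unigrammodel"))  # True

import eagerpkg.models                           # remedy 2: __init__.py imports it for us
print(eagerpkg.models.UnigramModel is eagerpkg.models.unigrammodel.UnigramModel)  # True
```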

HTH,
Radim

Shivani

Aug 16, 2011, 4:05:56 PM
to gensim
Thanks Radim,

I did modify the __init__.py in the gensim/models folder as well. I
added the following line:

from unigrammodel import UnigramModel

I reinstalled gensim with this updated model and got the following
error when trying to create a unigram model:

uni = gensim.models.unigrammodel.UnigramModel(corpus=myC)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'gensim' is not defined


I still don't know what I should change. Maybe I should install gensim
with a "develop" option?

Shivani

Radim

Aug 18, 2011, 5:36:35 PM
to gensim
I'm afraid I don't follow. Why did you reinstall gensim? You'll have
to import modules before using them, or you get the NameError.
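A `NameError` means the name was never bound in the current session. A related trap is that `from package import module` binds only `module`, not `package`: after, say, `from gensim.models import unigram`, writing `gensim.models...` still fails unless `import gensim` was also run. A minimal sketch of this behaviour, using the standard `os` module and a scratch dictionary (the hypothetical name `session`) in place of a fresh interactive session:

```python
# A scratch namespace standing in for a fresh interactive session.
session = {}

exec("from os import path", session)
print("path" in session)  # True: only the imported name is bound
print("os" in session)    # False: the package name itself is not

try:
    exec("os.getcwd()", session)
except NameError as err:
    message = str(err)
print(message)  # name 'os' is not defined

exec("import os", session)  # importing the package binds its name
print("os" in session)      # True
```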

Can someone help here?

Radim

Shivani

Aug 19, 2011, 3:55:14 PM
to gensim
Thanks for your email, Radim.
I undid the changes to gensim and reinstalled it, along with
other tools, and I am back to square one.

Could you give me a brief idea of how to add a new model
to gensim and later contribute it?

This question applies not only to the unigram model, but also to
the new similarity metric based on KL divergence that I plan to add.


Shivani

Radim

Aug 19, 2011, 4:45:39 PM
to gensim
Hello Shivani,

To add a model `my_model.py` to gensim, copy that file to the
gensim/models folder. It is the same folder where lsimodel.py,
logentropy_model.py etc. reside. Then, when you want to use the model,
first do `from gensim.models import my_model`, and then you can
use `my_model`.
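Because Python locates modules on disk at import time, a new file dropped into a package folder can be imported without reinstalling anything. A minimal sketch under stated assumptions: `demopkg` is a hypothetical stand-in for an importable gensim checkout, and `my_model.py`/`MyModel` are illustrative names. `importlib.invalidate_caches()` (Python 3.3+) is only needed because the file is created while the interpreter is already running.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for an importable package: demopkg/models/__init__.py
root = tempfile.mkdtemp()
models_dir = os.path.join(root, "demopkg", "models")
os.makedirs(models_dir)
for init_file in (os.path.join(root, "demopkg", "__init__.py"),
                  os.path.join(models_dir, "__init__.py")):
    open(init_file, "w").close()
sys.path.insert(0, root)

import demopkg.models

# Drop a new module file into the models folder after the package was imported.
with open(os.path.join(models_dir, "my_model.py"), "w") as f:
    f.write("class MyModel(object):\n    name = 'my_model'\n")

importlib.invalidate_caches()  # let the import system notice the new file
from demopkg.models import my_model
print(my_model.MyModel.name)  # my_model
```

This assumes the folder you add the file to is the one Python actually imports from. After `python setup.py install`, that is the copy placed in site-packages, not the folder you downloaded and installed from.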

Hope that helps,
Radim

Shivani

Aug 20, 2011, 12:33:21 PM
to gensim
So I don't need to reinstall gensim after adding unigram_model.py
to the gensim/models folder?
What about the __init__.py file? Does it need any updating?

Shivani

Radim

Aug 20, 2011, 3:17:40 PM
to gensim
Nope :-) No need to reinstall gensim.

Python can load .py modules dynamically, so you can just import the
file you need. Gensim is pure Python and doesn't compile anything, so
just import the module and go.

Also see http://docs.python.org/tutorial/modules.html and
http://effbot.org/zone/import-confusion.htm for more info.

The only gotcha can happen if you modify an imported module while you
are inside a Python shell, "under Python's hands" so to speak. Then
you'd have to reload the changed module, or exit and re-enter the shell,
etc. But if you run scripts from the command line, you don't have to worry
about this at all.
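The "under Python's hands" gotcha can be reproduced in a few lines: a second `import` of an already-loaded module is a no-op, so edits only take effect after a reload. A sketch for Python 3, where reloading is `importlib.reload` (Python 2 had a built-in `reload()`); `scratch_model` is a hypothetical throwaway module, and bytecode writing is switched off only so stale `.pyc` files cannot interfere with this quick demo.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid stale .pyc files in this quick demo

workdir = tempfile.mkdtemp()
module_path = os.path.join(workdir, "scratch_model.py")
sys.path.insert(0, workdir)

with open(module_path, "w") as f:
    f.write("VERSION = 1\n")
import scratch_model
print(scratch_model.VERSION)  # 1

# Edit the file while this interpreter session is still running.
with open(module_path, "w") as f:
    f.write("VERSION = 2  # edited\n")

import scratch_model  # no effect: the module is already in sys.modules
print(scratch_model.VERSION)  # 1

importlib.reload(scratch_model)  # re-executes the edited source
print(scratch_model.VERSION)  # 2
```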

Best,
Radim

Shivani

Aug 20, 2011, 4:20:26 PM
to gensim
Hello Radim,

Thanks so much for being so patient with me as I learn how to do this.

The class is declared as follows, in a file called
unigram.py:

import logging
from gensim import interfaces

logger = logging.getLogger('gensim.models.unigram_model')
class UnigramModel(interfaces.TransformationABC):
    """class description...."""

I tried what you suggested and just added this file to the gensim/models
folder, which is located in "/home/shivani/researchtools".
This is the local folder where I downloaded gensim and ran "python
setup.py install" to initially install it.

I tried the following in my Python shell (not from the command line):

>>> from gensim.models import unigram
>>> myC
<gensim.corpora.mmcorpus.MmCorpus object at 0x9c4958c>
>>> uni = gensim.models.UnigramModel(corpus=myC, id2word=myCorpus.dictionary)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'UnigramModel'
>>> uni = gensim.models.unigram_model(corpus=myC, id2word=myCorpus.dictionary)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'unigram_model'

:(

Shivani

Radim

Aug 20, 2011, 5:55:25 PM
to gensim
Hia Shivani,


Try doing:

from gensim.models import unigram
uni = unigram.UnigramModel(corpus=myC, id2word=myCorpus.dictionary)

The `unigram` identifier is your module name; you need it to access
the objects you created in it.
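`from gensim.models import unigram` binds the module, and the class is reached through it; the package `gensim.models` itself does not gain a `UnigramModel` attribute unless its `__init__.py` imports one. A self-contained sketch (Python 3) that recreates the layout from this thread under the hypothetical package name `demo_gensim`; the class body is a placeholder, not Shivani's model.

```python
import os
import sys
import tempfile

# Hypothetical layout mirroring the thread: demo_gensim/models/unigram.py
root = tempfile.mkdtemp()
models_dir = os.path.join(root, "demo_gensim", "models")
os.makedirs(models_dir)
open(os.path.join(root, "demo_gensim", "__init__.py"), "w").close()
open(os.path.join(models_dir, "__init__.py"), "w").close()
with open(os.path.join(models_dir, "unigram.py"), "w") as f:
    f.write("class UnigramModel(object):\n"
            "    def __init__(self, corpus=None, id2word=None):\n"
            "        self.corpus = corpus\n")
sys.path.insert(0, root)

from demo_gensim.models import unigram  # binds the *module* as `unigram`

uni = unigram.UnigramModel(corpus=[[(0, 1)]])  # module name, then class name
print(type(uni).__name__)  # UnigramModel

import demo_gensim.models
print(hasattr(demo_gensim.models, "UnigramModel"))  # False: the class lives in the submodule
```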

HTH,
Radim

Shivani

Aug 23, 2011, 10:30:29 AM
to gensim
I was able to access it this way, but I had to reinstall gensim
with this file in the gensim/models folder.
Is there something I could do to avoid reinstalling
gensim?

Anyhow, I am now testing the unigram code, and I will soon be bugging you
about logging messages and debug mode.

Thanks a ton again,

Shivani

Stephan Gabler

Aug 23, 2011, 11:05:09 AM
to gen...@googlegroups.com

If you have a copy of gensim that you are continually modifying,
you should probably not install it, but just tell Python where to find it.

You can either add it to your PYTHONPATH environment variable or create a
symlink in your site-packages directory.
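The PYTHONPATH route can be checked without touching a real installation. The sketch below (Python 3) creates a throwaway "checkout" containing a hypothetical package `devpkg`, starts a child interpreter with that directory on `PYTHONPATH`, and asks it where `devpkg` was imported from.

```python
import os
import subprocess
import sys
import tempfile

checkout = tempfile.mkdtemp()  # stands in for a working copy you keep editing
pkg_dir = os.path.join(checkout, "devpkg")  # "devpkg" is a hypothetical package
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("WHERE = __file__\n")

# Prepend the checkout to PYTHONPATH for the child process only.
env = dict(os.environ)
existing = env.get("PYTHONPATH")
env["PYTHONPATH"] = checkout if not existing else checkout + os.pathsep + existing

output = subprocess.check_output(
    [sys.executable, "-c", "import devpkg; print(devpkg.WHERE)"],
    env=env, universal_newlines=True)
print(output.strip())  # a path to devpkg/__init__.py inside the checkout
```

For gensim, the directory to put on PYTHONPATH is the one that contains the gensim/ package folder, not the gensim/ folder itself. The symlink route amounts to linking that gensim/ folder into the directory reported by `site.getsitepackages()` (or `site.getusersitepackages()` for a per-user install), and setuptools' `python setup.py develop` automates a similar "use the checkout in place" setup.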

Hope this helps. Please just ask again if you have no idea what I am talking about ;-)


stephan
