How to save LDA module (LDA.save())


Yaniv Sheffer

Jun 9, 2017, 8:50:36 AM
to gensim
Hi all,

Pretty simple questions, but I'm having trouble saving the LDA model I just trained... :\
Here is the function I'm using:

    def train(self, wordids_txt, tfidf_mm, store_to_file):
        logging.info("LDA: Starting training LDA module:")
        logging.info("wordids.txt path: " + wordids_txt)
        logging.info("tfidf.mm path: " + tfidf_mm)
        logging.info("Storing module to: " + store_to_file)
        
        # load id->word mapping (the dictionary), one of the results of step 2 above
        id2word = gensim.corpora.Dictionary.load_from_text(wordids_txt)

        # load corpus iterator
        mm = gensim.corpora.MmCorpus(tfidf_mm)
        # mm = gensim.corpora.MmCorpus(bz2.BZ2File('wiki_en_tfidf.mm.bz2')) # use this if you compressed the TFIDF output
        print(mm)
        # MmCorpus(4265002 documents, 100000 features, 668306661 non-zero entries)

        # extract 100 LDA topics, using 1 pass and updating once every 1 chunk (10,000 documents)
        self.lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=100, update_every=1,
                                                       chunksize=10000, passes=1)
        # save model to file
        self.lda.save(store_to_file)


My questions are:
1) Do I need to create the file manually beforehand?
2) What is the file type? txt?
3) If I have a C:\lda_model.txt, do I call self.lda.save("C:\lda_model") or self.lda.save("C:\lda_model.txt")?
4) Is there a problem with saving it to the C:\ drive? (I have admin permissions)

Chinmaya Pancholi

Jun 9, 2017, 3:39:53 PM
to gensim
Hey Yaniv!

No, you don't need to create the file manually before saving your model, and no specific file type is required (your file could even be called "lda_model_yaniv"). You just need to call the `save` function with the destination path, like this: my_lda_model.save("my_destination_file"). This test (https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/test/test_ldamodel.py#L403) might help in understanding the save/load methods of Gensim's LDA model. Also, if you are getting a particular error, posting it here would really help in figuring out the problem in the code, if any.
If you still run into problems, don't hesitate to ask further questions. I'd be happy to help. :)

Ben Wolfley

Jun 2, 2020, 4:11:22 PM
to Gensim
Hi Chinmaya,

I am trying to save and load my gensim LDA model using a relative path, but I keep getting the following error:

PermissionError: [Errno 13] Permission denied: 'C:\\Users\\BenWo\\OneDrive\\Documents\\Python Scripts\\AnswersDocLookup\\LDAModel'

I tried running as an administrator and got the same result. Do you know what is causing this?

Here is the extra error info:



---------------------------------------------------------------------------
PermissionError                           Traceback (most recent call last)
<ipython-input-3-50be626ed50b> in <module>
----> 1 lda_reload = gensim.models.LdaModel.load(os.getcwd())
      2 NewDir = os.getcwd()
      3 lda_reload.save(NewDir)

~\anaconda3\lib\site-packages\gensim\models\ldamodel.py in load(cls, fname, *args, **kwargs)
   1636         """
   1637         kwargs['mmap'] = kwargs.get('mmap', None)
-> 1638         result = super(LdaModel, cls).load(fname, *args, **kwargs)
   1639 
   1640         # check if `random_state` attribute has been set after main pickle load

~\anaconda3\lib\site-packages\gensim\utils.py in load(cls, fname, mmap)
    424         compress, subname = SaveLoad._adapt_by_suffix(fname)
    425 
--> 426         obj = unpickle(fname)
    427         obj._load_specials(fname, mmap, compress, subname)
    428         logger.info("loaded %s", fname)

~\anaconda3\lib\site-packages\gensim\utils.py in unpickle(fname)
   1379 
   1380     """
-> 1381     with open(fname, 'rb') as f:
   1382         # Because of loading from S3 load can't be used (missing readline in smart_open)
   1383         if sys.version_info > (3, 0):

~\anaconda3\lib\site-packages\smart_open\smart_open_lib.py in open(uri, mode, buffering, encoding, errors, newline, closefd, opener, ignore_ext, transport_params)
    187         buffering=buffering,
    188         encoding=encoding,
--> 189         errors=errors,
    190     )
    191     if fobj is not None:

~\anaconda3\lib\site-packages\smart_open\smart_open_lib.py in _shortcut_open(uri, mode, ignore_ext, buffering, encoding, errors)
    360         open_kwargs['errors'] = errors
    361 
--> 362     return _builtin_open(local_path, mode, buffering=buffering, **open_kwargs)
    363 
    364 

PermissionError: [Errno 13] Permission denied: 'C:\\Users\\BenWo\\OneDrive\\Documents\\Python Scripts\\AnswersDocLookup\\LDAModel'

Ben Reaves

Jun 2, 2020, 5:00:11 PM
to gen...@googlegroups.com
Is that file open in another program? Maybe another instance of your program is using (or holding open) that file? 

This is the same error I get if I try to write an xlsx (Excel) file while another instance of Excel has it open, even just for viewing.

--
You received this message because you are subscribed to the Google Groups "Gensim" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gensim+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gensim/d45a1a01-0516-416b-b2a5-cd6ad099f0fb%40googlegroups.com.


--
_____________________________________________________________________
Ben Reaves

--

Gordon Mohr

Jun 2, 2020, 6:12:12 PM
to Gensim
This seems likely to be a Windows-specific error related to the Microsoft 'OneDrive' cloud-storage software. Try loading/saving somewhere that's not 'OneDrive'.

- Gordon

Ben Wolfley

Jun 2, 2020, 7:07:34 PM
to Gensim
Hi Ben,

I don't think I have this open in any other program. The path was just pointing to a folder. I even restarted my computer and was still getting the same error.

However, I just realized that I may be using the save and load methods incorrectly. I used lda_model.save(LDADir), where LDADir is my desired directory. Is this the right way to do this?

Ben Wolfley

Jun 2, 2020, 7:09:34 PM
to Gensim
Hi Gordon,

I tried saving it locally and got the same error.

I'm wondering if I am using the save method incorrectly. I used lda_model.save(LDADir), where LDADir is a directory where I want to save the model. Is this the right way to do this?

Here is the error from the local script:

TypeError                                 Traceback (most recent call last)
~\anaconda3\lib\site-packages\gensim\utils.py in save(self, fname_or_handle, separately, sep_limit, ignore, pickle_protocol)
    691         try:
--> 692             _pickle.dump(self, fname_or_handle, protocol=pickle_protocol)
    693             logger.info("saved %s object", self.__class__.__name__)

TypeError: file must have a 'write' attribute

During handling of the above exception, another exception occurred:

PermissionError                           Traceback (most recent call last)
<ipython-input-5-b130cc769f15> in <module>
      1 lda_reload = gensim.models.LdaModel.load(temp_file)
      2 # NewDir = os.getcwd()
----> 3 lda_reload.save(LDADir)

~\anaconda3\lib\site-packages\gensim\models\ldamodel.py in save(self, fname, ignore, separately, *args, **kwargs)
   1603         else:
   1604             separately = separately_explicit
-> 1605         super(LdaModel, self).save(fname, ignore=ignore, separately=separately, *args, **kwargs)
   1606 
   1607     @classmethod

~\anaconda3\lib\site-packages\gensim\utils.py in save(self, fname_or_handle, separately, sep_limit, ignore, pickle_protocol)
    693             logger.info("saved %s object", self.__class__.__name__)
    694         except TypeError:  # `fname_or_handle` does not have write attribute
--> 695             self._smart_save(fname_or_handle, separately, sep_limit, ignore, pickle_protocol=pickle_protocol)
    696 
    697 

~\anaconda3\lib\site-packages\gensim\utils.py in _smart_save(self, fname, separately, sep_limit, ignore, pickle_protocol)
    547                                        compress, subname)
    548         try:
--> 549             pickle(self, fname, protocol=pickle_protocol)
    550         finally:
    551             # restore attribs handled specially

~\anaconda3\lib\site-packages\gensim\utils.py in pickle(obj, fname, protocol)
   1361 
   1362     """
-> 1363     with open(fname, 'wb') as fout:  # 'b' for binary, needed on Windows
   1364         _pickle.dump(obj, fout, protocol=protocol)
   1365 

~\anaconda3\lib\site-packages\smart_open\smart_open_lib.py in open(uri, mode, buffering, encoding, errors, newline, closefd, opener, ignore_ext, transport_params)
    187         buffering=buffering,
    188         encoding=encoding,
--> 189         errors=errors,
    190     )
    191     if fobj is not None:

~\anaconda3\lib\site-packages\smart_open\smart_open_lib.py in _shortcut_open(uri, mode, ignore_ext, buffering, encoding, errors)
    360         open_kwargs['errors'] = errors
    361 
--> 362     return _builtin_open(local_path, mode, buffering=buffering, **open_kwargs)
    363 
    364 

PermissionError: [Errno 13] Permission denied: 'C:\\Users\\BenWo\\Documents\\Python Files\\AnswersDocLookup\\LDAModel'

Ben Reaves

Jun 2, 2020, 9:08:58 PM
to gen...@googlegroups.com
> lda_model.save(LDADir), where LDADir is my desired directory. Is this the right way to do this?

I use 
ldamodel.save('filename.gensim')

I think it should be a filename, not a directory name.
 
Sometimes I use ldamodel.save('dir/filename.gensim') but dir must exist before I save it.
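
As a rough sketch of that last point, a small helper can create the folder first and hand back a file path inside it. The helper name prepare_model_path and the "models/run1" folder are hypothetical, made up for this illustration; only the file name 'filename.gensim' comes from the message above.

```python
import tempfile
from pathlib import Path


def prepare_model_path(directory, filename="filename.gensim"):
    """Return a file path inside `directory`, creating the directory if needed."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)  # make sure the folder exists
    target = folder / filename
    if target.is_dir():
        # A folder with this name would make the later save fail.
        raise IsADirectoryError(f"{target} is a directory, not a file name")
    return str(target)


# Demo in a temporary location (assumed, so the sketch runs anywhere).
base = tempfile.mkdtemp()
model_path = prepare_model_path(Path(base) / "models" / "run1", "filename.gensim")
```

The returned string could then be passed to ldamodel.save(model_path).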




Gordon Mohr

Jun 2, 2020, 9:19:09 PM
to Gensim
Yes, it needs to be a writeable filename, in a directory that already exists – not a directory name. 

- Gordon
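
The tracebacks earlier in the thread end in a plain open(fname, 'wb') call, so the failure can probably be reproduced without gensim at all. The sketch below is an illustration under that assumption: it opens a folder for writing, with a temporary folder standing in for LDADir.

```python
import tempfile

folder = tempfile.mkdtemp()  # stands in for LDADir from the thread

try:
    # Opening a directory as if it were a file, as save() ends up doing
    # when it is handed a folder path.
    with open(folder, "wb") as fout:
        fout.write(b"model bytes")
    error_name = None
except OSError as exc:
    # Windows typically reports PermissionError (errno 13) here, while
    # Linux and macOS typically report IsADirectoryError.
    error_name = type(exc).__name__
```

This would explain why running as an administrator or moving off OneDrive made no difference: the error comes from the path being a folder, not from missing access rights.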

Radim Řehůřek

Jun 3, 2020, 6:26:02 AM
to Gensim
Yeah. That's why I'm -1 on obfuscating the usage examples with "testing logic": instead of a clear, unambiguous file path, our docs show some `from gensim.test.utils import datapath` shenanigans.

The variables are called temp_file and fname, so it's not that bad: it's pretty obvious that save() expects a file name. But it's still an extra cognitive step that can trip up inexperienced or inattentive developers.

-rr

Ben Wolfley

Jun 3, 2020, 10:46:25 AM
to gen...@googlegroups.com
That worked. Thank you all for your help.




--
Ben Wolfley 
simulation engineer
 
ben.w...@flexsim.com
tel:801‑224‑6914  ext 118
FlexSim - problem solved.
FlexSim Software Products, Inc. | 1577 N Technology Way | Building A, Suite 2300 | Orem, UT 84097 | fax: 801‑224‑6984
