Error on Save of Large Model: OSError: [Errno 22] Invalid Argument (OSX, Python 3)


Waylon Flinn

Aug 27, 2014, 3:31:47 PM
to gen...@googlegroups.com
When saving a large LDA model, I get the following error:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-52-0bf1242ddb5a> in <module>()
----> 1 get_ipython().run_cell_magic('time', '', "board_lda.save(data_dir + 'board_corpus_tokenized_tfidf_400.lda')")

/usr/local/lib/python3.4/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2160             magic_arg_s = self.var_expand(line, stack_depth)
   2161             with self.builtin_trap:
-> 2162                 result = fn(magic_arg_s, cell)
   2163             return result
   2164 

/usr/local/lib/python3.4/site-packages/IPython/core/magics/execution.py in time(self, line, cell, local_ns)

/usr/local/lib/python3.4/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    191     # but it's overkill for just that one bit of state.
    192     def magic_deco(arg):
--> 193         call = lambda f, *a, **k: f(*a, **k)
    194 
    195         if callable(arg):

/usr/local/lib/python3.4/site-packages/IPython/core/magics/execution.py in time(self, line, cell, local_ns)
   1123         if mode=='eval':
   1124             st = clock2()
-> 1125             out = eval(code, glob, local_ns)
   1126             end = clock2()
   1127         else:

<timed eval> in <module>()

/usr/local/lib/python3.4/site-packages/gensim/interfaces.py in save(self, *args, **kwargs)
     60         warnings.warn("corpus.save() stores only the (tiny) iteration object; "
     61             "to serialize the actual corpus content, use e.g. MmCorpus.serialize(corpus)")
---> 62         super(CorpusABC, self).save(*args, **kwargs)
     63 
     64     def __len__(self):

/usr/local/lib/python3.4/site-packages/gensim/utils.py in save(self, fname, separately, sep_limit, ignore)
    286             self.__dict__['__scipys'] = scipys
    287             self.__dict__['__ignoreds'] = ignoreds
--> 288             pickle(self, fname)
    289         finally:
    290             # restore the attributes

/usr/local/lib/python3.4/site-packages/gensim/utils.py in pickle(obj, fname, protocol)
    665     """Pickle object `obj` to file `fname`."""
    666     with smart_open(fname, 'wb') as fout: # 'b' for binary, needed on Windows
--> 667         _pickle.dump(obj, fout, protocol=protocol)
    668 
    669 

OSError: [Errno 22] Invalid argument



This appears to be related to this issue with numpy on OS X using Python 3: https://github.com/numpy/numpy/issues/3858

The GitHub discussion suggests that the problem is at the OS level and that upgrading to Mavericks (OS X 10.9) fixes it. Someone later says that this doesn't address the issue.

The problem appears to be that very large writes fail in Python 3 on OS X. Chunking the writes is suggested as a possible workaround. Does anyone have a quick fix at the gensim level in case upgrading the OS doesn't fix the issue?


OS: OS X 10.8.5
python: '3.4.1 (default, May 19 2014, 13:08:55) \n[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]'
numpy: '1.8.2'
gensim: '0.10.1'
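A minimal sketch of the chunking workaround mentioned in the question, applied around `pickle` rather than inside gensim. `ChunkedWriter` and `pickle_chunked` are hypothetical names, not gensim or standard-library APIs. The 1 GiB chunk size is an assumption: the single-write limit on affected OS X setups is believed to be around 2 GiB, so each call stays well below it.

```python
import pickle

# Assumption: on the affected OS X + Python 3 setups, a single write() of
# roughly 2 GiB or more fails with EINVAL. 1 GiB keeps each call well below that.
DEFAULT_CHUNK_SIZE = 1 << 30


class ChunkedWriter:
    """File-like wrapper that splits every write() into bounded chunks.

    Hypothetical helper, not part of gensim or the standard library.
    """

    def __init__(self, fileobj, chunk_size=DEFAULT_CHUNK_SIZE):
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self._fileobj = fileobj
        self._chunk_size = chunk_size

    def write(self, data):
        # A byte-level memoryview lets us slice a huge buffer without copying it.
        view = memoryview(data).cast("B")
        for start in range(0, view.nbytes, self._chunk_size):
            self._fileobj.write(view[start:start + self._chunk_size])
        return view.nbytes


def pickle_chunked(obj, fname, protocol=pickle.HIGHEST_PROTOCOL,
                   chunk_size=DEFAULT_CHUNK_SIZE):
    """Pickle `obj` to `fname`, passing at most `chunk_size` bytes per write."""
    with open(fname, "wb") as fout:
        pickle.dump(obj, ChunkedWriter(fout, chunk_size), protocol=protocol)
```

For example, `pickle_chunked(board_lda, data_dir + 'board_lda.pkl')` (the file name is illustrative), then reading it back with `pickle.load` on a file opened in `'rb'` mode. This bypasses gensim's own `save()` logic, so it is a stopgap rather than a fix. The pickler may also still build the complete pickle in memory before writing, so only the size of each write call is bounded, not peak memory use.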

Radim Řehůřek

Aug 28, 2014, 2:15:49 PM
to gen...@googlegroups.com
Hello Waylon,

oh wow, Python 3 failing to write large files (on OS X)... fun stuff!

And thanks for the report, I'm sure others who use py3k will hit this sooner or later too :-)

I don't think it's gensim's place to work around this, but locally, in your install, maybe you could change gensim.utils.pickle to use a different protocol? For example, protocol 0 (the original human-readable protocol) instead of the default -1 (the highest protocol available).

I have no idea if that'd help, or what the real reason for this OS/Python clusterfuck is, sorry.

Radim
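A minimal sketch of the protocol change suggested above, written as a stand-alone function rather than an edit to gensim itself. `pickle_with_protocol` and `unpickle` are hypothetical names; the `(obj, fname, protocol)` signature mirrors the `gensim.utils.pickle` frame shown in the traceback.

```python
import pickle


def pickle_with_protocol(obj, fname, protocol=0):
    """Pickle `obj` to `fname` with an explicit pickle protocol.

    Hypothetical stand-in mirroring the pickle(obj, fname, protocol) signature
    seen in the traceback; protocol 0 is the original human-readable format.
    """
    # 'b' for binary: in Python 3 every protocol, including 0, produces bytes
    with open(fname, "wb") as fout:
        pickle.dump(obj, fout, protocol=protocol)


def unpickle(fname):
    """Load an object written by pickle_with_protocol (protocol is auto-detected)."""
    with open(fname, "rb") as fin:
        return pickle.load(fin)
```

In gensim 0.10.1 the traceback shows `save()` calling the module-level `pickle(self, fname)` in gensim/utils.py. Editing that function locally, or (an untested assumption) assigning `gensim.utils.pickle = pickle_with_protocol` before calling `save()`, would route the save through this version. Protocol 0 output is usually larger than the binary protocols' output, so this may not shrink the large write that fails.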