2015-09-11 16:53:00,751 : INFO : training DTM with args --ntopics=100 --model=dtm --mode=fit --initialize_lda=false --corpus_prefix=/Users/jurica/Documents/workspace/eclipse/TopicModeling/dtmPrefix/train --outname=/Users/jurica/Documents/workspace/eclipse/TopicModeling/dtmPrefix/train_out --alpha=0.01 --lda_max_em_iter=10 --lda_sequence_min_iter=6 --lda_sequence_max_iter=20 --top_chain_var=0.005 --rng_seed=0
Traceback (most recent call last):
File "/Users/jurica/Documents/workspace/eclipse/TopicModeling/topicModelingExample.py", line 158, in <module>
dtm = DtmModel('binaryFiles/dtm-darwin64', corpus=mm, id2word=id2word, time_slices=time_slices, prefix='/Users/jurica/Documents/workspace/eclipse/TopicModeling/dtmPrefix/')
File "/Users/jurica/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/gensim/models/wrappers/dtmmodel.py", line 121, in __init__
self.train(corpus, time_slices, mode, model)
File "/Users/jurica/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/gensim/models/wrappers/dtmmodel.py", line 197, in train
self.em_steps = np.loadtxt(self.fem_steps())
File "/Users/jurica/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/numpy/lib/npyio.py", line 738, in loadtxt
fh = iter(open(fname, 'U'))
IOError: [Errno 2] No such file or directory: '/Users/jurica/Documents/workspace/eclipse/TopicModeling/dtmPrefix/train_out/em_log.dat'
So, basically, it says the file does not exist, which is true. I don't get why it doesn't exist... The dtmmodel.py file, which implements the DTM wrapper, doesn't contain any mechanism that would create that file and store it at the location defined via the prefix argument of the DtmModel class (btw, prefix works with absolute paths and not relative ones! That took two hours of my time).
So the line that causes the issue is 197 in dtmmodel.py:
self.em_steps = np.loadtxt(self.fem_steps())
and it tries to open the file whose path is returned by self.fem_steps() (in my case /Users/jurica/Documents/workspace/eclipse/TopicModeling/dtmPrefix/train_out/em_log.dat). The only other place self.fem_steps() is mentioned is where the path is defined (line 141 in dtmmodel.py), but it is not used in any of the file-saving actions implemented in dtmmodel.py...
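One way to make this kind of failure easier to read is to check for the expected file before loading it. The sketch below is only an illustration: require_output_file is a hypothetical helper, not part of gensim, and it assumes that em_log.dat is supposed to be written by the external dtm binary rather than by dtmmodel.py itself.

```python
import os


def require_output_file(path):
    """Return path if it is an existing file, else raise a descriptive IOError.

    Hypothetical helper, not part of gensim. It assumes the file should have
    been produced by an external program before Python tries to read it.
    """
    if not os.path.isfile(path):
        out_dir = os.path.dirname(path)
        if os.path.isdir(out_dir):
            # Listing the directory shows what the external program did write.
            hint = "output directory contains: %s" % sorted(os.listdir(out_dir))
        else:
            hint = "output directory %r does not exist" % out_dir
        raise IOError(
            "Expected %r to be written by the external program, "
            "but it is missing (%s)." % (path, hint)
        )
    return path
```

Calling require_output_file(...) on the em_log.dat path just before np.loadtxt would then report what is actually in train_out, which helps to tell "the binary never ran" apart from "the binary ran but failed part-way".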
The code I am using is very simple:
id2word = gensim.corpora.Dictionary.load_from_text('topcModeling/wikipages_tfidf.mm_wordids.txt')
mm = gensim.corpora.MmCorpus('topcModeling/wikipages_tfidf.mm_tfidf.mm')
time_slices = [20000,20000,20000,20000,20000,20000,20000,20000,20000,6557]
prefix = os.path.dirname(os.path.abspath(__file__)) + '/dtmPrefix/'
#print prefix
if not os.path.exists('topcModeling/DTM_WikiDump.model'):
    dtm = DtmModel('binaryFiles/dtm-darwin64', corpus=mm, id2word=id2word, time_slices=time_slices, prefix=prefix)
    dtm.save('topcModeling/DTM_WikiDump.model')
else:
    dtm = DtmModel.load('topcModeling/DTM_WikiDump.model')
doc_dtm = dtm[doc_bow]
doc_dtm.print_topics(20)
So, basically, it is trying to open a file that does not exist and is never created in the code. Any suggestions, ideas, other angles would be welcome.
Best,
Jurica
1) Thanks for pointing out the renaming error in the tutorial. It is now corrected.
2) Does your dtm_path point to the binary or to the folder? I had this error when pointing to the folder. The expected value is the path to the dtm binary itself.
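A quick check along these lines can be run before constructing the model. This is a minimal sketch: check_dtm_binary is a hypothetical helper, not a gensim function, and it only tests that the path is a regular, executable file.

```python
import os


def check_dtm_binary(dtm_path):
    """Return dtm_path if it looks like a runnable binary, else raise ValueError.

    Hypothetical helper for illustration; gensim does not provide it.
    """
    if os.path.isdir(dtm_path):
        raise ValueError("%r is a folder; pass the dtm binary itself" % dtm_path)
    if not os.path.isfile(dtm_path):
        raise ValueError("%r does not exist" % dtm_path)
    if not os.access(dtm_path, os.X_OK):
        raise ValueError("%r exists but is not executable" % dtm_path)
    return dtm_path
```

For example, check_dtm_binary('binaryFiles/dtm-darwin64') should pass only if that relative path resolves, from the current working directory, to an executable file.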
3) About the "init_from_lda=False" mode: when I tried running the binary directly with "init_from_lda=False", I got "Error opening file /tmp/a65419_train_out/initial-lda-ss.dat. Failing." This is because DTM is trying to load an existing LDA model from that file. When the file is provided, it runs correctly. The same error happens when the command is run from gensim.
In PR #476 I added error forwarding via (backported) subprocess.check_output to the DTM and Mallet wrappers, so this will be more obvious in the future.
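For anyone who wants to see the general idea, here is a rough sketch of error forwarding with subprocess.check_output. It is not the code from the PR: run_and_forward is a hypothetical name, and the approach only helps if the external program exits with a non-zero status when it fails, which is an assumption here.

```python
import subprocess


def run_and_forward(cmd):
    """Run cmd (a list of arguments) and return its combined stdout/stderr.

    Hypothetical helper for illustration. If the command exits with a
    non-zero status, its output is included in the raised RuntimeError,
    so the underlying message is not lost.
    """
    try:
        # stderr=subprocess.STDOUT merges error messages into the captured output.
        return subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as err:
        output = err.output.decode("utf-8", "replace")
        raise RuntimeError(
            "command %r exited with status %d:\n%s" % (cmd, err.returncode, output)
        )
```

With something like this, a message such as "Error opening file ... Failing." printed by the binary would show up in the Python exception, instead of a later IOError about a missing output file.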