I am working through a simple LSA tutorial and have run into a problem with gensim's CoherenceModel: when compute_coherence_values runs with multiprocessing, the program crashes with the output below.
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Couldn't close file
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Couldn't close file
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Couldn't close file
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Couldn't close file
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Couldn't close file
Traceback (most recent call last):
  File "/Users/J/Documents/codesource/python/projects/LSA/untitled0.py", line 158, in plot_graph
    model_list, coherence_values = compute_coherence_values(dictionary, doc_term_matrix, doc_clean, stop, start, step)
  File "/Users/J/Documents/codesource/python/projects/LSA/untitled0.py", line 149, in compute_coherence_values
    coherence_values.append(coherencemodel.get_coherence())
  File "/opt/anaconda3/lib/python3.7/site-packages/gensim/models/coherencemodel.py", line 609, in get_coherence
    confirmed_measures = self.get_coherence_per_topic()
  File "/opt/anaconda3/lib/python3.7/site-packages/gensim/models/coherencemodel.py", line 569, in get_coherence_per_topic
    self.estimate_probabilities(segmented_topics)
  File "/opt/anaconda3/lib/python3.7/site-packages/gensim/models/coherencemodel.py", line 541, in estimate_probabilities
    self._accumulator = self.measure.prob(**kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/gensim/topic_coherence/probability_estimation.py", line 156, in p_boolean_sliding_window
    return accumulator.accumulate(texts, window_size)
  File "/opt/anaconda3/lib/python3.7/site-packages/gensim/topic_coherence/text_analysis.py", line 452, in accumulate
    accumulators = self.terminate_workers(input_q, output_q, workers, interrupted)
  File "/opt/anaconda3/lib/python3.7/site-packages/gensim/topic_coherence/text_analysis.py", line 525, in terminate_workers
    input_q.put(None, block=True)
  File "/opt/anaconda3/lib/python3.7/multiprocessing/queues.py", line 82, in put
    if not self._sem.acquire(block, timeout):
My CoherenceModel is defined as:
coherencemodel = CoherenceModel(model=model, texts=doc_clean, dictionary=dictionary, coherence='c_v', processes=5)
Note that there are 5 uncaught exceptions above, presumably one for each worker process.
When I set processes=1 (no multiprocessing), the issue goes away.
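For anyone trying to reproduce this, here is a minimal sketch of the construction step with the worker count pulled out as a parameter, so the failing (processes=5) and working (processes=1) configurations differ by one argument. The helper name build_coherence_model and its coherence_cls parameter are hypothetical, added only for illustration; the CoherenceModel keyword arguments are the ones from the call above. This only encodes the workaround described in the post and does not explain the underlying crash.

```python
def build_coherence_model(model, texts, dictionary, processes=1, coherence_cls=None):
    """Construct a c_v coherence model with an explicit worker count.

    processes=1 is the setting under which the crash was not observed;
    processes=5 is the setting that produced the errors above.
    coherence_cls is a hypothetical hook that lets the class be swapped
    out (for example in a test); by default gensim's CoherenceModel is used.
    """
    if coherence_cls is None:
        # Imported lazily so that this helper can be defined without gensim installed.
        from gensim.models.coherencemodel import CoherenceModel
        coherence_cls = CoherenceModel
    return coherence_cls(
        model=model,
        texts=texts,
        dictionary=dictionary,
        coherence='c_v',
        processes=processes,
    )
```

Inside compute_coherence_values this would be called as build_coherence_model(model, doc_clean, dictionary, processes=1), followed by get_coherence() on the returned object, exactly as in the traceback.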