Multi-Threading Error about "main"


John Vaughan

Feb 8, 2020, 12:16:03 PM
to Gensim
I built a basic .py file with the command below, and it goes into this strange behavior where, instead of multi-threading the building of the model, it starts running the program from the top over and over again.

If I take the same program and use just LdaModel it works fine, but is single threaded.

If I run the same program in Jupyter Notebook the multi-threading works..

It is only when I run the program from the command line..

I do not have a "main" function in my program..  If I did,  what would I put in it?   

It is just confusing to see the multi-threader try to run the entire program from the beginning instead of just this operation?

Is there an example using a main() if that is what I need to do?

lda_model_bagOfWords = gensim.models.LdaMulticore(bow_corpus, num_topics=50, id2word=dictionary, passes=2, workers=12)


Then I get this error


RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:

if __name__ == '__main__':
    freeze_support()
    ...

The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.

Gordon Mohr

Feb 8, 2020, 5:23:46 PM
to Gensim
I'm unfamiliar with this error, but some Googling suggests it may be a Windows-specific issue with some ways of running Python code. Are you on Windows? What Python version?

Exactly how are you running your `.py` file? (Is there a chance it's triggering a process which "freezes" things into a single Windows executable separate from your `.py` file, even if only as a temporary interim exe file?) 

Can you craft & share a minimal `.py` file, & command-line to execute it, which is sufficient to trigger the error with just a few lines of code & no outside data/files? 

Or: Let's ignore for a moment the part about a 'main module' (which is essentially just the 'first scope' in which you've started Python code execution). And also let's ignore any concerns about a `main()` function (which isn't explicitly mentioned in the error, or really required by Python, but is sometimes used as a convention inside the 'main module'; see some SO answers at <https://stackoverflow.com/questions/419163/what-does-if-name-main-do> for more related discussion).

With those put aside, what if you just try to literally do what the error message text suggests: run `freeze_support()` (a function importable from Python `multiprocessing`) before any of your other code runs. Does that prevent the error?

- Gordon

John Vaughan

Feb 9, 2020, 9:20:10 AM
to Gensim
It is Windows 10, Python 3.6.

I did make a small program that just sets up the data to run this...   

I open up an anaconda command line and just do   python  myProgram.py  

I was also researching on the internet, but just don't quite understand how this error can happen in the first place.    


I assumed the error message "you are not using fork to start your child processes" had something to do with the inside of the multi-threaded function, gensim.models.LdaMulticore.

for now, I am just running in single thread mode until I find an answer.

John Vaughan

Feb 9, 2020, 9:43:40 AM
to Gensim
I just read through the post..   https://stackoverflow.com/questions/419163/what-does-if-name-main-do   from stack overflow..

It looks like the `if __name__ == '__main__':` check is a way to hide code from a Python instance that is essentially trying to import my file like it is a library of some sort. As I did not design my .py file as a library, I have my logic kind of strewn all about the file as it runs top to bottom.

Most likely the multi-threader works by importing my .py file like a library. Then, the way Python works, it just re-executes all the code again. This of course makes for some type of infinite loop, because my code includes a call to the multi-threading function, which again tries to run all my code.

I will try hiding everything within this if statement and see if that works....  

That way if the multi-threader imports my program, nothing will execute....

if __name__ == '__main__':
    functionA()
    functionB()

John Vaughan

Feb 9, 2020, 10:03:18 AM
to Gensim
It worked....

I put all my logic in an If Block...

if __name__ == '__main__':
    import bla bla bla
    data frame bla bla bla
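
For anyone landing here later, the pattern can be sketched roughly as below. This is only an illustration using the standard library's multiprocessing module rather than gensim; the names count_words and main, and the sample documents, are made up for the example. The assumption is that a call like gensim.models.LdaMulticore(...) would sit where the Pool is used here, since it also starts worker processes.

```python
import multiprocessing


def count_words(doc):
    """Worker function; defined at module level so worker processes can find it."""
    return len(doc.split())


def main():
    # Hypothetical sample data, standing in for a real corpus.
    docs = ["human machine interface", "graph of trees", "user response time survey"]
    # Worker processes are started here. On Windows each worker re-imports
    # this file, so this call must not be reachable at import time.
    # (Assumption: an LdaMulticore call would go in this position.)
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(count_words, docs)


if __name__ == "__main__":
    # Only the process started from the command line gets here;
    # workers that import this module skip the block entirely.
    multiprocessing.freeze_support()  # can be omitted unless freezing to an .exe
    print(main())
```

Imports and function definitions can stay at the top level of the file; it is only the code that actually starts the workers (and anything slow you do not want repeated in each worker) that needs to be under the `if` block.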