Distributed gensim: Facing problems

Physics

Mar 26, 2016, 7:10:05 PM
to gensim
Hi all,

I'm trying to use gensim in distributed mode.

I have completed all the necessary setup steps (Pyro4 name server, workers, etc.), but now I'm getting the following error message:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The program crashes when it reaches the following line:

lda = models.LdaModel(corpus, id2word=dictionary, num_topics=num_topics_var, chunksize=1, distributed=True)

I'm not sure what I'm doing wrong. It seems that someone else is facing this problem as well:

http://stackoverflow.com/questions/35900953/trouble-running-gensim-lda

Best regards,

Physics

jhop

Mar 31, 2016, 5:18:00 PM
to gensim
Are you also using Python 3, like the user on Stack Overflow?

Physics

Mar 31, 2016, 5:27:33 PM
to gensim
Hi,

Actually, the post on Stack Overflow is not mine. I just noticed that the problem described there seems to be the same as mine.

I am using Python 2.7.10 (64-bit), though.

Best regards,
Physics

Physics

Mar 31, 2016, 5:34:15 PM
to gensim
What I have found so far: when I print the value of distributed from **modelparams in LDA_dispatcher.py, it is False, although I passed True as the argument.

Setting distributed to True manually in the dispatcher code makes the error message go away, but then I run into other problems: only one worker actually starts. I can't tell what I am doing wrong.

Best regards,
Physics

jhop

Apr 1, 2016, 8:24:23 PM
to gensim
Please reply with all of your code, or a distilled version of it, leading up to this line:

lda = models.LdaModel(corpus, id2word=dictionary, num_topics=num_topics_var, chunksize=1, distributed=True)

Also, given what you just wrote, I suspect that your Pyro4 nameserver has not properly registered your dispatcher and workers. There are many possible reasons why that could have failed.

There is some guidance in the gensim tutorial documentation for querying the Pyro4 nameserver to get a list of all components that have registered. Please give that a try.
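
A minimal sketch of that check, in case it helps. It assumes the nameserver is reachable from the machine where you run it, and that the dispatcher and workers register under names starting with gensim.lda_dispatcher and gensim.lda_worker; those prefixes are my assumption, so please verify them against your gensim version. The function names are made up for illustration.

```python
# Assumed name prefixes; verify them against your installed gensim version.
DISPATCHER_PREFIX = "gensim.lda_dispatcher"
WORKER_PREFIX = "gensim.lda_worker"


def summarize_registrations(registered):
    """Split nameserver entries into dispatcher names and worker names.

    `registered` is a dict mapping registered names to URIs, which is the
    shape returned by the Pyro4 nameserver's list() call.
    """
    dispatchers = sorted(n for n in registered if n.startswith(DISPATCHER_PREFIX))
    workers = sorted(n for n in registered if n.startswith(WORKER_PREFIX))
    return dispatchers, workers


def query_nameserver():
    """Ask the running Pyro4 nameserver for everything registered with it."""
    import Pyro4  # imported here so the helper above also works without Pyro4

    # locateNS() returns a proxy for the nameserver; list() returns a dict.
    with Pyro4.locateNS() as ns:
        return ns.list()
```

Calling summarize_registrations(query_nameserver()) should give you one dispatcher and as many workers as you started. If the worker list is shorter than expected, that would be consistent with only one worker being picked up.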

jhop

Apr 1, 2016, 10:32:06 PM
to gensim
Also, please refer to this link regarding how python handles * and ** arguments:
http://stackoverflow.com/questions/36901/what-does-double-star-and-star-do-for-python-parameters

So, **model_params is not "True" or "False" in an abstract sense -- to consider it that way is a conceptual error. It's meant to represent a dictionary where the keys of the dictionary are parameter names, and the value associated with each key is the value of the parameter.
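
A small sketch to illustrate the point. The initialize function here is a hypothetical stand-in and not gensim's actual dispatcher code; it only shows what a ** parameter looks like from inside a function.

```python
def initialize(**model_params):
    """Hypothetical stand-in: return the keyword arguments it received."""
    # Inside the function, model_params is an ordinary dict.
    return model_params


params = initialize(num_topics=10, chunksize=1, distributed=False)

# The dict as a whole is truthy because it is non-empty ...
dict_is_truthy = bool(params)

# ... while the individual parameter has to be looked up by its key.
distributed_value = params["distributed"]

# A parameter that was never passed is simply absent from the dict.
has_alpha = "alpha" in params
```

So to see what the dispatcher received, print the whole dictionary, or look up model_params['distributed'] specifically, and check whether the key is present at all.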