to gen...@googlegroups.com
Does anyone have pointers on how to use gensim in distributed mode without relying on broadcast?
My current setup doesn't let me use broadcast, and I can only get 2 workers on my machine. Starting more workers on other machines doesn't seem to help.
Thanks
Arvind Kalyan
May 20, 2012, 2:29:35 AM
to gen...@googlegroups.com
On Sat, May 19, 2012 at 11:21 PM, Arvind Kalyan <bas...@gmail.com> wrote:
> Does anyone have pointers on how to use gensim in distributed mode without relying on broadcast?
> My current setup doesn't let me use broadcast, and I can only get 2 workers on my machine. Starting more workers on other machines doesn't seem to help.
As in, those workers are not recognized when I launch my job. It only identifies (at most) 2 workers on the local machine.
Radim Řehůřek
May 20, 2012, 5:14:53 PM
to gensim
Hello Arvind,
Hmm, that is not possible at the moment. The distributed code relies
on the Pyro nameserver, and the nameserver is located via broadcasting.
But changing that shouldn't be difficult -- in `gensim.utils.getNS()`,
replace `Pyro4.locateNS()` with a direct IP address lookup,
`Pyro4.locateNS(host, port)`. I think that should be enough (but I
haven't tested it).
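A minimal, untested sketch of that change, to show roughly what I mean. This is not the actual gensim code: the helper names `ns_address` and `get_ns` and the environment variables `GENSIM_NS_HOST` / `GENSIM_NS_PORT` are made up for illustration. The only real API used is `Pyro4.locateNS(host=..., port=...)`, which skips the broadcast lookup when a host is given.

```python
import os


def ns_address(env=None):
    """Return (host, port) of the name server.

    Reads two hypothetical environment variables; (None, None) means
    "not configured", in which case Pyro falls back to its default lookup.
    """
    env = os.environ if env is None else env
    host = env.get("GENSIM_NS_HOST") or None   # hypothetical variable name
    port = env.get("GENSIM_NS_PORT")           # hypothetical variable name
    return host, (int(port) if port else None)


def get_ns():
    """Locate the Pyro name server, by direct address if one is configured."""
    import Pyro4  # imported lazily so the module loads without Pyro installed
    from Pyro4.errors import NamingError

    host, port = ns_address()
    try:
        # With host=None Pyro4 uses its default lookup (including broadcast);
        # with an explicit host it connects to that address directly.
        return Pyro4.locateNS(host=host, port=port)
    except NamingError:
        raise RuntimeError("Pyro name server not found (host=%s, port=%s)" % (host, port))
```

The same address would have to be visible to every process that looks up the name server (workers, dispatcher and the job itself), which is why the sketch reads it from the environment rather than hardcoding it in one place.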
Let me know how that went,
Radim
Arvind Kalyan
May 20, 2012, 5:53:07 PM
to gen...@googlegroups.com
Hi Radim,
Thanks for your response. I did exactly that earlier today but I didn't see any difference in behavior even after hardcoding my host/port params.
I was happy with the results I obtained on the smaller sample datasets we have, running on 2 workers; great work there! But to benchmark against our current implementations, we need this to scale to a really large dataset and run on servers that are not necessarily on the same broadcast network, and so forth. I will probably revisit this later; hopefully gensim and/or Pyro will have evolved a bit more and be more deterministic by then.
Thanks and best regards,
Arvind
Radim Řehůřek
May 23, 2012, 1:12:05 PM
to gensim
Hello Arvind,
> I was happy with the results I obtained on smaller sample datasets we have
> running on 2 workers; great work there! But to benchmark against our
> current implementations we need this to scale for a really large dataset
> and run on servers that are not necessarily on the same broadcast network
> and so forth. I will probably revisit this later; hopefully gensim
> and/or Pyro will have evolved a bit more and be more deterministic by then.
Extending the cluster discovery beyond a broadcast domain should be
straightforward (though apparently not as straightforward as I thought
above!), so I might get to it soon.