--
You received this message because you are subscribed to the Google Groups "Swivel Embeddings" group.
To unsubscribe from this group and stop receiving emails from it, send an email to swivel-embeddings+unsubscribe@googlegroups.com.
To post to this group, send email to swivel-embeddings@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/swivel-embeddings/a2397971-6446-48f5-84f4-a404fc2af4ee%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hello, and sorry for the delayed response!

If I understand correctly, you're running Swivel from the models repo on a single machine with 8 GPUs? If so, that's impressive! There was recently a change to support multiple GPUs, and it's possible that the hogwild parameter updates are causing problems.

I've been running a version of Swivel that makes use of the distributed facilities of TensorFlow (e.g., tf.Supervisor and Supervisor.managed_session). This allows the gradient updates to be coordinated through the parameter server, and may ameliorate the problem.

I'll see about getting that version pushed into the repository.

chris
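[A toy illustration of the failure mode described above, not Swivel code: with lock-free ("hogwild"-style) updates, several workers can apply gradients computed from the same stale parameter value, so the effective step size scales with the number of workers; when a parameter server averages the workers' gradients into a single update, the step size stays fixed. The quadratic loss and the worker count are assumptions for the sketch.]

```python
def grad(w):
    # Gradient of the toy loss L(w) = w^2.
    return 2.0 * w

def hogwild_step(w, n_workers, lr):
    # All workers read the same stale value of w, then each applies
    # its own update to the shared parameter without coordination.
    g = grad(w)
    for _ in range(n_workers):
        w -= lr * g
    return w

def coordinated_step(w, n_workers, lr):
    # A parameter server averages the workers' gradients and applies
    # one update, so the effective step does not grow with n_workers.
    g = sum(grad(w) for _ in range(n_workers)) / n_workers
    return w - lr * g

w_hog, w_coord = 1.0, 1.0
for _ in range(20):
    w_hog = hogwild_step(w_hog, n_workers=8, lr=0.2)
    w_coord = coordinated_step(w_coord, n_workers=8, lr=0.2)

print(abs(w_hog))    # blows up: the effective step is 8 * lr
print(abs(w_coord))  # shrinks toward 0
```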
On Wed, Mar 15, 2017 at 11:03 PM, 이정규 <swea...@gmail.com> wrote:
Hello,

I'm running the Swivel algorithm on my data. With a single GPU, training works without any problem, but when Swivel runs on 8 GPUs, the loss and the weights grow larger during training and learning cannot proceed properly.

At first I opened an issue on the `tensorflow/models` repository, but I was directed here. This is the issue I posted, in which I described the symptoms in detail.
Thank you very much for your reply.

As you said, I'm working on a single machine with 8 GPUs. I also suspected that, at the beginning of training, the many parameter updates applied with a high learning rate were causing the problem. If we reduce the learning rate, the phenomenon disappears, and that is what we are working with now (though, of course, convergence is slower).

Does "gradient updates being coordinated" mean an alternating least squares update?

I am very much looking forward to the repository push you mentioned.
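[A toy sketch of the trade-off the poster describes, assuming plain gradient descent on the quadratic loss L(w) = w^2: a learning rate that is too large makes the weight grow without bound, while a much smaller one converges but needs more steps to reach the same point.]

```python
def run(lr, steps):
    # Plain gradient descent on L(w) = w^2, whose gradient is 2w.
    # Each step multiplies w by (1 - 2 * lr), so |1 - 2 * lr| > 1 diverges.
    w = 1.0
    for _ in range(steps):
        w -= lr * 2.0 * w
    return abs(w)

print(run(lr=1.2, steps=20))   # diverges: the weight grows every step
print(run(lr=0.4, steps=20))   # converges quickly
print(run(lr=0.05, steps=20))  # converges, but much more slowly
```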