Is it possible to make swivel.py use multiple GPUs?

va...@sourced.tech

Mar 2, 2017, 1:02:03 PM
to Swivel Embeddings
Hi!

I am running https://github.com/tensorflow/models/tree/master/swivel on a 2-GPU monster and am not quite satisfied with its performance. There are two major issues:

1. nvidia-smi reports that only one GPU is loaded.
2. The load value fluctuates between 9% and 25%.

From my previous experience with TF, there is not much that can be done about the poor resource load in (2). I wonder whether it is possible to make it use more than one GPU. Swivel is so parallel in nature that there surely must be some way, or at least two pinned sessions as a last resort. What do you think?

Chris Waterson

Mar 2, 2017, 1:29:29 PM
to va...@sourced.tech, Swivel Embeddings
Probably what needs to happen here is to switch to a distributed model that uses a parameter server process and launches two worker processes. This is much easier to do now than it was when Swivel was first implemented; specifically, we ought to be able to use tf.train.replica_device_setter and tf.train.Supervisor to get the model and training loop going; something like:

import tensorflow as tf  # TensorFlow 1.x API

with tf.Graph().as_default():
  # One parameter server task; variables go to it, other ops stay on the worker.
  with tf.device(tf.train.replica_device_setter(ps_tasks=1, merge_devices=True)):
    # Create swivel model; embeddings should be placed on CPU. Matmul, etc. on GPU.
    # ...
    sv = tf.train.Supervisor(
      logdir='/path/to/train/dir', is_chief=(FLAGS.task == 0))

    with sv.managed_session('') as session:
      while not sv.should_stop():
        session.run(train_op)

This will probably require a fairly heavy-handed refactor of swivel.py, so much so that it might be worth just starting from scratch ("swivel-par.py"?). Some additional machinery needs to be involved, too, to make sure that you start the parameter server and worker processes correctly.
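
As a rough illustration of that launch machinery, here is a minimal sketch in plain Python (no TensorFlow needed) that lays out one parameter server and two workers and builds the command line for each process. Everything specific in it is an assumption made for illustration: the script name swivel-par.py does not exist yet, the --ps_hosts, --worker_hosts and --job_name flags and the port numbers are hypothetical, and only --task mirrors the FLAGS.task used above. CUDA_VISIBLE_DEVICES is the standard CUDA environment variable for restricting which GPUs a process can see.

```python
import sys


def build_cluster(num_workers, ps_port=2222, worker_base_port=2223):
    """Lay out one parameter server and num_workers workers on localhost.

    The port numbers are arbitrary placeholders.
    """
    return {
        "ps": ["localhost:%d" % ps_port],
        "worker": ["localhost:%d" % (worker_base_port + i)
                   for i in range(num_workers)],
    }


def launch_plan(cluster, script="swivel-par.py"):
    """Return a list of (argv, env_overrides), one entry per process.

    The script name and all flags except --task are hypothetical.
    """
    ps_hosts = ",".join(cluster["ps"])
    worker_hosts = ",".join(cluster["worker"])
    plan = []
    for job in ("ps", "worker"):
        for task in range(len(cluster[job])):
            argv = [
                sys.executable, script,
                "--ps_hosts=" + ps_hosts,
                "--worker_hosts=" + worker_hosts,
                "--job_name=" + job,
                "--task=%d" % task,
            ]
            # The parameter server keeps the embeddings on CPU, so hide all
            # GPUs from it; worker i is pinned to GPU i.
            env = {"CUDA_VISIBLE_DEVICES": "" if job == "ps" else str(task)}
            plan.append((argv, env))
    return plan


if __name__ == "__main__":
    for argv, env in launch_plan(build_cluster(num_workers=2)):
        print(env, " ".join(argv))
```

To actually start the processes, each entry could be handed to subprocess.Popen(argv, env=dict(os.environ, **env)); the training script itself would still have to turn those flags into a cluster definition and a server on its side.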

I've been meaning to get to this, but just haven't made time, sigh...


gmar...@gmail.com

Mar 2, 2017, 1:50:43 PM
to Swivel Embeddings
Chris, thank you very much for the feedback!

I have the will, the time, and the resources to act on this suggestion :) I used to work on https://velesnet.ml, which contained a parameter server and friends, so I am not scared.

Just curious, what happened? Does this fall under an NDA?
