How to run parts of backprop on CPU?


vic...@victorzhong.com

unread,
Dec 27, 2016, 8:51:57 PM12/27/16
to Discuss
I'm attempting to run a very large model that does not fit into GPU memory. One very memory intensive part of this model is computing the softmax for each time step in a sequence. I've moved this computation to the CPU, however it seems that the backprop for this part of the graph is still on the GPU. Is there a way to force part of the backprop graph onto the CPU as well?

Yaroslav Bulatov

unread,
Dec 27, 2016, 8:57:23 PM12/27/16
to vic...@victorzhong.com, Discuss
If you call `tf.gradients` inside a `with tf.device("/cpu:0")` block, the gradient ops should get placed on the CPU.


--
You received this message because you are subscribed to the Google Groups "Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss+unsubscribe@tensorflow.org.
To post to this group, send email to dis...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/discuss/a3a6ddc3-6092-4c6a-b9f1-5e5b627585b9%40tensorflow.org.

vic...@victorzhong.com

unread,
Dec 27, 2016, 9:07:09 PM12/27/16
to Discuss, vic...@victorzhong.com
Thanks for the quick response Yaroslav!

Just to confirm:

Normally I don't explicitly compute gradients, but instead use a TensorFlow optimizer (e.g. Adam). However, because I'd like to force part of the gradient computation onto the CPU, I'll have to change how I use the optimizer. Looking at https://www.tensorflow.org/api_docs/python/train/optimizers, it seems that instead of

optimizer_op = optimizer.minimize(loss)

I'll have to do something like

with tf.device('/cpu:0'):
    grads = optimizer.compute_gradients(loss, intermediate_vars)
grads += optimizer.compute_gradients(intermediate_vars, remaining_vars)
optimize_op = optimizer.apply_gradients(grads)


Is this approach more or less what you had in mind?

Happy holidays!
Victor



Yaroslav Bulatov

unread,
Dec 27, 2016, 10:57:20 PM12/27/16
to vic...@victorzhong.com, Discuss
Or you can just put the whole part where you create the optimizer inside a `with tf.device` block.
But at that point it's not clear if the GPU is helping at all; you can disable it entirely by doing

import os
os.environ["CUDA_VISIBLE_DEVICES"]=""
import tensorflow



vic...@victorzhong.com

unread,
Dec 27, 2016, 11:38:43 PM12/27/16
to Discuss, vic...@victorzhong.com
Actually I figured it out - it's much simpler than I anticipated:

optimizer.minimize(loss, colocate_gradients_with_ops=True)

I just need to make sure that the forward pass for the part in question is on CPU and this takes care of the backward pass. The rest of the model remains on GPU.

Sanqiang Zhao

unread,
Dec 13, 2018, 11:46:50 PM12/13/18
to Discuss, vic...@victorzhong.com
Awesome, this makes it possible to fit BERT on my 12G GPU :P