How to run parts of backprop on CPU?


vic...@victorzhong.com

unread,
Dec 27, 2016, 8:51:57 PM12/27/16
to Discuss
I'm attempting to run a very large model that does not fit into GPU memory. One very memory intensive part of this model is computing the softmax for each time step in a sequence. I've moved this computation to the CPU, however it seems that the backprop for this part of the graph is still on the GPU. Is there a way to force part of the backprop graph onto the CPU as well?

Yaroslav Bulatov

unread,
Dec 27, 2016, 8:57:23 PM12/27/16
to vic...@victorzhong.com, Discuss
If you call `tf.gradients` inside a `with tf.device("/cpu:0")` block, the gradient ops should get placed on the CPU.


--
You received this message because you are subscribed to the Google Groups "Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss+unsubscribe@tensorflow.org.
To post to this group, send email to dis...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/discuss/a3a6ddc3-6092-4c6a-b9f1-5e5b627585b9%40tensorflow.org.

vic...@victorzhong.com

unread,
Dec 27, 2016, 9:07:09 PM12/27/16
to Discuss, vic...@victorzhong.com
Thanks for the quick response Yaroslav!

Just to confirm:

Normally I don't explicitly compute gradients, but instead use a TensorFlow optimizer (e.g. Adam). However, because I'd like to force part of the gradient computation onto the CPU, I'll have to change how I use the optimizer. Looking at https://www.tensorflow.org/api_docs/python/train/optimizers, it seems that instead of

optimizer_op = optimizer.minimize(loss)

I'll have to do something like

with tf.device('/cpu:0'):
    grads = optimizer.compute_gradients(loss, intermediate_vars)
grads += optimizer.compute_gradients(intermediate_vars, remaining_vars)
optimize_op = optimizer.apply_gradients(grads)


Is this approach more or less what you had in mind?

Happy holidays!
Victor



Yaroslav Bulatov

unread,
Dec 27, 2016, 10:57:20 PM12/27/16
to vic...@victorzhong.com, Discuss
Or you can just put the whole part where you create the optimizer inside a `with tf.device` block.
But at that point it's not clear if the GPU is helping at all; you can disable it entirely by doing

import os
os.environ["CUDA_VISIBLE_DEVICES"]=""
import tensorflow



vic...@victorzhong.com

unread,
Dec 27, 2016, 11:38:43 PM12/27/16
to Discuss, vic...@victorzhong.com
Actually I figured it out - it's much simpler than I anticipated:

optimizer.minimize(loss, colocate_gradients_with_ops=True)

I just need to make sure that the forward pass for the part in question is on CPU and this takes care of the backward pass. The rest of the model remains on GPU.

Sanqiang Zhao

unread,
Dec 13, 2018, 11:46:50 PM12/13/18
to Discuss, vic...@victorzhong.com
Awesome, this makes it possible to fit BERT on my 12G GPU :P