Swap out memory from GPU to CPU


Paul Voigtlaender

Jan 31, 2017, 4:51:11 AM
to Discuss
Hi,

I noticed that the RNNs in TensorFlow have an option to swap memory out from the GPU to the CPU.
Is it also possible to do this for a feed-forward network, e.g. swapping out the memory of the lower conv layers?

Or alternatively, maybe I can split the graph into multiple parts and evaluate it part by part? But how can I then backpropagate the gradients from one part into the next one?
(I'm talking about an approach like the one in https://arxiv.org/pdf/1611.08323.pdf: "we partition the computation graph into several subsequent blocks by manually placing cut points in the graph. We then compute the derivatives for each block individually. To this end, we perform one (partial) forward pass per block and only store the feature maps for the block whose derivatives [...]". This is done in Theano.)

I'd prefer the first option, since then the intermediate results don't need to be recomputed.

Yaroslav Bulatov

Jan 31, 2017, 12:00:45 PM
to Paul Voigtlaender, Discuss
You could swap them out, but you have to do it through the client API (there's no automatic solution like for dynamic_rnn). I.e., using persistent tensors, you can move things between GPU and CPU with a combination of GetSessionTensor/GetSessionHandle/DeleteSessionTensor.

You could also split the graph into multiple parts. You backpropagate the gradients into a new section of the graph by feeding in the backprop results from the previous session.run call. For instance, see the example in function_test.

It wraps the backprop computation for a subgraph in a single TensorFlow function using Defun and _symbolic_gradient. You can then feed this function the backprops from the upstream graph, and it will produce new backprops.
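A rough sketch of the split-graph route using plain tf.gradients with grad_ys instead of Defun (the "block1"/"block2" names and layer sizes are just illustrative):

import numpy as np
import tensorflow as tf

# Block 1: the lower part of the network.
x = tf.placeholder(tf.float32, [None, 4])
with tf.variable_scope("block1"):
    h = tf.layers.dense(x, 8, activation=tf.nn.relu)

# Block 2 reads its input from a placeholder so it can run as a separate step.
h_in = tf.placeholder(tf.float32, [None, 8])
with tf.variable_scope("block2"):
    loss = tf.reduce_mean(tf.square(tf.layers.dense(h_in, 1)))

block1_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="block1")
block2_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="block2")

# Backprop through block 2: gradients w.r.t. its own variables and its input.
grads2 = tf.gradients(loss, block2_vars + [h_in])
grads2_vars, grad_h_in = grads2[:-1], grads2[-1]

# Backprop through block 1, seeded with the gradient arriving from block 2.
grad_h_feed = tf.placeholder(tf.float32, [None, 8])
grads1_vars = tf.gradients(h, block1_vars, grad_ys=grad_h_feed)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x_val = np.random.rand(2, 4).astype(np.float32)
    h_val = sess.run(h, {x: x_val})                              # forward, block 1
    g_h, g2 = sess.run([grad_h_in, grads2_vars], {h_in: h_val})  # forward + backward, block 2
    g1 = sess.run(grads1_vars, {x: x_val, grad_h_feed: g_h})     # backward, block 1

Note that the last run call recomputes block 1's forward pass, which is exactly the recomputation cost you mentioned wanting to avoid.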


Paul

Feb 1, 2017, 4:22:35 AM
to Discuss
Thank you for your answer.

Is there any documentation available for the first method, i.e. GetSessionTensor/GetSessionHandle/DeleteSessionTensor?


Yaroslav Bulatov

Feb 1, 2017, 10:00:09 AM
to Paul, Yuan Yu, Discuss
cc: yuan

This super-useful feature is only sparsely documented at the moment. The only official source of documentation is the tests -- https://github.com/tensorflow/tensorflow/blob/64edd34ce69b4a8033af5d217cb8894105297d8a/tensorflow/python/kernel_tests/session_ops_test.py

Here's an example of saving a tensor as a persistent tensor:
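A minimal sketch following the pattern in session_ops_test (untested here, adjust dtypes as needed):

import tensorflow as tf

with tf.Session() as sess:
    a = tf.constant([1.0, 2.0, 3.0])

    # Running get_session_handle keeps the value alive in the session's state
    # and returns a TensorHandle instead of the actual data.
    handle = sess.run(tf.get_session_handle(a))

    # get_session_tensor gives you a placeholder for the handle string and a
    # tensor that reads the stored value back into the graph.
    holder, value = tf.get_session_tensor(handle.handle, tf.float32)
    print(sess.run(value, feed_dict={holder: handle.handle}))  # [ 1.  2.  3.]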

For moving persistent tensors from CPU to GPU, you'll need a sequence of ops pinned to devices: tf.get_session_tensor (CPU) -> tf.identity (GPU) -> tf.get_session_handle (GPU).
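Concretely, something along these lines (assumes a GPU is visible; the device strings may need adjusting for your setup):

import tensorflow as tf

with tf.Session() as sess:
    # A tensor stored persistently on the CPU.
    with tf.device("/cpu:0"):
        cpu_handle = sess.run(tf.get_session_handle(tf.constant([1.0, 2.0, 3.0])))

    # Read the handle on the CPU, copy to the GPU with tf.identity, and store
    # the copy as a new persistent tensor living in GPU memory.
    with tf.device("/cpu:0"):
        holder, cpu_value = tf.get_session_tensor(cpu_handle.handle, tf.float32)
    with tf.device("/gpu:0"):
        gpu_handle_op = tf.get_session_handle(tf.identity(cpu_value))

    gpu_handle = sess.run(gpu_handle_op, feed_dict={holder: cpu_handle.handle})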

If you "del" the session handle, the system will automatically call delete_session_tensor on it once there are 10 tensors waiting to be deleted. You may call it manually -- https://github.com/tensorflow/tensorflow/blob/27711108b5fce2e1692f9440631a183b3808fa01/tensorflow/python/ops/session_ops.py#L202

Note that "delete_session_tensor" is hidden (https://github.com/tensorflow/tensorflow/blob/a0d784bdd31b27e013a7eac58a86ba62e86db299/tensorflow/python/ops/hidden_ops.txt), so it's not available from tf. root namespace, and you have to access it from session_ops namespace

Also note that there's something like 200 usec of overhead per session.run call, so you don't want to issue tens of thousands of individual .run calls.



Paul

Feb 1, 2017, 4:03:20 PM
to Discuss
Thanks for the links, I will take a look at them.

I just had another idea: can I maybe just wrap the part of the network that I want to swap out in a call to tf.map_fn or tf.scan, with a sequence of only one element as input, so that it swaps out the memory of this part of the network for me?
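Something like this is what I have in mind (just a sketch, I haven't checked yet whether it actually reduces peak memory):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 64, 64, 3])
w = tf.get_variable("conv_w", [3, 3, 3, 32])  # weights created outside the wrapped part

def lower_layers(inp):
    # The part of the network whose activations should become swappable.
    return tf.nn.relu(tf.nn.conv2d(inp, w, strides=[1, 1, 1, 1], padding="SAME"))

# Wrap it in tf.map_fn over a length-1 "sequence"; the while_loop underneath
# exposes the same swap_memory option that dynamic_rnn uses.
features = tf.squeeze(
    tf.map_fn(lower_layers, tf.expand_dims(x, 0), swap_memory=True), 0)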