Why is this GpuFromHost call generated?


Haining Yu

Jul 31, 2017, 2:42:25 PM
to theano-users
Hi,

I am running an RNN/GRU model on a fairly large dataset with the goal of sequence prediction. When I profile my code, I find that a single GpuFromHost call takes ~30% of the computation time. See part of the profiling results below:

<% time> <sum %> <apply time> <time per call> <#call> <id> <Mflops> <Gflops/s> <Apply name>  
  30.2%    73.0%     462.776s       3.71e-01s   1248   221                     GpuFromHost(Subtensor{:int64:}.0)
    input 0: dtype=float32, shape=(512, 1024, 2048), strides=(-4096, 4, 2097152) 
    output 0: dtype=float32, shape=(512, 1024, 2048), strides=(2097152, 2048, 1) 
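
For reference, the profile above was collected roughly like this (train_fn and its arguments are placeholders for my actual compiled function):

import theano

train_fn = theano.function(inputs, cost, updates=updates, profile=True)
# ... training loop ...
train_fn.profile.summary()  # prints the per-op breakdown; THEANO_FLAGS=profile=True gives the same report at exit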

theano.printing.debugprint shows that the call is generated in the gradient calculation; see the snippet below. There is also a HostFromGpu a couple of levels further down.

 | | | | |GpuFromHost [id FN] ''   221
 | | | |   |Subtensor{:int64:} [id FO] ''   220
 | | | |     |Subtensor{::int64} [id FP] ''   219
 | | | |     | |InplaceDimShuffle{1,2,0} [id FQ] ''   218
 | | | |     | | |Reshape{3} [id FR] ''   217
 | | | |     | |   |CrossentropyCategorical1HotGrad [id FS] ''   216
 | | | |     | |   | |Elemwise{Second}[(0, 0)] [id FT] ''   215
 | | | |     | |   | | |CrossentropyCategorical1Hot [id FU] ''   209
 | | | |     | |   | | | |HostFromGpu [id FV] ''   206
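
The snippet above comes from a call roughly like this (train_fn again stands in for my compiled function):

import theano

theano.printing.debugprint(train_fn)  # dump the optimized graph of the compiled function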

I am aware of the cost of GpuFromHost (and its counterpart HostFromGpu) and have already moved almost all of my data to the GPU via shared variables, so I don't understand why the call is needed. In particular:

1. If all my data are on the GPU and Theano optimizes the graph for the GPU, why is the GpuFromHost generated at all?
2. Is the call generated because the tensor is too large? The call moves 512 x 1024 x 2048 x 4 bytes ≈ 4.3 GB of data, but my Tesla K80 should have 12 GB of memory, so memory pressure seems unlikely on the surface. Overall memory consumption looks fine in the profiling output.
3. Does the call have anything to do with CrossentropyCategorical1Hot? I assume CrossentropyCategorical1Hot has a GPU implementation, but the graph shows that a HostFromGpu is applied before CrossentropyCategorical1Hot. I am not sure whether CrossentropyCategorical1Hot has any memory-layout requirement (e.g., C-contiguity).
4. Should I try any GPU assertion to debug the root cause of the problem? (One possible check is sketched right after this list.)
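
As a first check I could list the transfer nodes in the compiled graph. This is only a sketch, and train_fn stands in for my actual compiled function:

# print every CPU<->GPU transfer node left in the optimized graph
for node in train_fn.maker.fgraph.toposort():
    if type(node.op).__name__ in ('GpuFromHost', 'HostFromGpu'):
        print(node)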

Any hint is appreciated.

Thank you,
Haining 

Frédéric Bastien

Aug 9, 2017, 2:37:20 PM
to theano-users

My guess is that you are using the old GPU backend. Can you confirm that you use the Theano flag device=gpu? I also suspect that you have float64 values in the graph; the old backend doesn't support them. I suggest that you install the just-released 0.10 beta and use the new backend with device=cuda.

Also, you can use the flag warn_float64=pdb to find where the float64 values are created and make sure they are float32. This will be faster.
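
For example, assuming you launch training from a Python script, something like this (the same flags can also go in THEANO_FLAGS on the command line or in .theanorc):

import os

# flags must be set before theano is imported;
# warn_float64=pdb drops into the debugger at the point where a float64 variable is created
os.environ['THEANO_FLAGS'] = 'device=cuda,floatX=float32,warn_float64=pdb'
import theano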

Fred



Haining Yu

Aug 9, 2017, 2:48:35 PM
to theano...@googlegroups.com
Thank you, Fred.

Yes, I am using device=gpu0. I will switch to the new backend and test again.

On float64, do you mean int64? If so, I am puzzled by that too. In my code I never explicitly cast to int64. Instead I use tensor.ivector() to index matrices and cast other index variables explicitly to int32. For example:

import theano.tensor as T

x = T.ivector()                 # int32 vector used for indexing
z = T.cast(y, dtype='int32')    # y is an existing symbolic index variable

Do you think these could be causing the problem?

Thank you,
Haining

Frédéric Bastien

Aug 9, 2017, 5:38:26 PM
to theano...@googlegroups.com
Hi,

Do you use floats? I meant float32. The old back-end only supports float32, so if you use float64 or int32, those computations will not run on the GPU.

The new back-end supports many dtypes, including float64 and the int* types, so it should work better.

Note that if you do an operation between float32 and int32, the result is float64. These are the normal C/NumPy casting rules; float32 and int16 return float32. So if you end up with float64, this is frequently the cause.
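
You can see the casting rule directly on small symbolic variables, for example:

import theano.tensor as T

f32 = T.fvector()            # float32
i32 = T.ivector()            # int32
i16 = T.wvector()            # int16
print((f32 + i32).dtype)     # float64 -> this part of the graph will not run on the old GPU backend
print((f32 + i16).dtype)     # float32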
Fred


Haining Yu

Aug 10, 2017, 9:00:52 AM
to theano-users
I don't see any float64 in the debugprint result.

Inspecting the code, I am just using floatX, e.g.
self.x = theano.shared(name='gx', value=x1.astype(theano.config.floatX))

I did cast various indices to int32, but in the profiling output they appear to be converted to int64.
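
To locate the upcasts I will probably just scan the graph for 64-bit variables, roughly like this (train_fn is a placeholder for my compiled function):

for node in train_fn.maker.fgraph.toposort():
    dtypes = [getattr(v, 'dtype', None) for v in node.inputs + node.outputs]
    if 'int64' in dtypes or 'float64' in dtypes:
        print(node, dtypes)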

I will make all the changes based on your suggestions and test one more time.

Thanks again.