Slow GPU performance with scan, slinalg, nlinalg


Paul Baggenstoss

Nov 18, 2019, 3:41:01 PM
to theano-dev
I am a big Theano fan, although somewhat new to it.
I am experiencing slow GPU performance with a Theano program. To be more specific,
it runs about as fast on the GPU as on the CPU. Because I have other programs
that get huge speed-ups relative to the CPU, I know it is not an installation issue.
If Theano were parallelizing the operations over the independent samples
in a batch, then I should see increasing speed as I increase the batch size.
Using nvidia-smi, I checked that the GPU is not overloaded.
I suspect it has to do with my use of the following functions:


and these are used inside 'scan' loops. Generally, I scan over the samples in a batch
(i.e. the first index of the 'sequences' tensors indexes the samples in a batch).
I also use tensor.jacobian inside a scan loop.
Without getting into the details of the program, is there any obvious reason why
Theano does not seem to be parallelizing my code on the GPU?
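
For anyone wanting to repeat the batch-size check described above, here is a minimal sketch of how it could be measured. It uses only the standard library; `fake_run_batch` is a hypothetical stand-in for a compiled Theano function called on a batch of n samples, and the batch sizes are arbitrary. If the work is being parallelized over the samples, the seconds-per-sample figure should fall as the batch grows; if it stays roughly flat, the samples are probably being processed one after another.

```python
import time


def seconds_per_sample(run_batch, batch_sizes, repeats=3):
    """Return {batch_size: best wall-clock seconds per sample}."""
    results = {}
    for n in batch_sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run_batch(n)
            best = min(best, time.perf_counter() - start)
        results[n] = best / n
    return results


def fake_run_batch(n):
    # Hypothetical placeholder: replace with a call to your own compiled
    # function on a batch of n samples. This version is purely sequential.
    total = 0
    for i in range(n * 1000):
        total += i * i
    return total


timings = seconds_per_sample(fake_run_batch, [1, 8, 64])
for n, t in timings.items():
    print(f"batch={n:3d}  {t:.2e} s/sample")
```

With the sequential placeholder the per-sample time stays roughly constant, which is the pattern I am seeing on the GPU.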

Many thanks in advance,


Paul Baggenstoss

Nov 19, 2019, 10:02:18 AM
to theano-dev

I found the problem: my installation of scikit-cuda had failed.
I needed to reinstall it using conda.
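
In case it helps someone else, a quick way to spot this kind of failure is to try importing the package directly. The sketch below is only an illustration: `skcuda` is the import name of scikit-cuda, while the choice to also probe `skcuda.cusolver` is my assumption about which part matters for the linear algebra ops. The helper `can_import` is a name made up for this example.

```python
import importlib


def can_import(module_name):
    """Try to import a module and report whether it succeeded."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or OSError if a CUDA library is missing
        print(f"{module_name}: FAILED ({type(exc).__name__}: {exc})")
        return False
    print(f"{module_name}: ok")
    return True


# Assumed module names; adjust to whatever your program actually relies on.
status = {name: can_import(name) for name in ("skcuda", "skcuda.cusolver")}
```

If either line reports FAILED, the GPU versions of the linear algebra ops may not be available, which would be consistent with the GPU run being no faster than the CPU one.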