Slow GPU performance with scan, slinalg, nlinalg

Paul Baggenstoss

Nov 18, 2019, 3:41:01 PM
to theano-dev
Hello,
   I am a big Theano fan, although somewhat new to it.
I am experiencing slow GPU performance for a Theano program. To be more specific,
it runs about as fast on the GPU as on the CPU. Because I have other programs
that get huge speed-ups relative to the CPU, I know it is not an installation issue.
If Theano were parallelizing the operations over the independent samples
in a batch, then I should see increasing speed as I increase the batch size.
Using nvidia-smi, I checked that the GPU is not overloaded.
I suspect it has to do with my use of the following functions:

   slinalg.Solve
   slinalg.Cholesky
   nlinalg.MatrixInverse
   nlinalg.Det

and these are used inside 'scan' loops. Generally, I scan over the samples in a batch
(i.e. the first index of the 'sequences' tensors runs over the samples in a batch).
I also use tensor.jacobian inside a scan loop.
Without getting into details of the program, is there any obvious reason that
Theano does not seem to be parallelizing my code over the GPU?

Many thanks in advance,

Paul

Paul Baggenstoss

Nov 19, 2019, 10:02:18 AM
to theano-dev

I found the problem: my installation of scikit-cuda had failed.
I needed to re-install it using conda.
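
For anyone hitting the same symptom, a quick way to rule this out is to check that the package actually imports in the environment Theano runs in. The snippet below is only a minimal sketch: `check_importable` is a hypothetical helper name, not part of Theano or scikit-cuda, and `skcuda` is the module name that scikit-cuda installs. It catches any exception rather than just ImportError, on the assumption that a broken install may fail in other ways, for example when a shared library cannot be loaded.

```python
import importlib


def check_importable(module_name):
    """Try to import a module and report whether it loaded cleanly.

    Hypothetical helper for illustration only.
    Returns (ok, detail): ok is a bool, detail is "ok" or the error text.
    """
    try:
        importlib.import_module(module_name)
    except Exception as exc:
        # A half-installed package can fail with errors other than
        # ImportError, so report whatever was raised.
        return False, "{}: {}".format(type(exc).__name__, exc)
    return True, "ok"


# "skcuda" is the import name of the scikit-cuda package.
ok, detail = check_importable("skcuda")
print("skcuda importable:", ok, "-", detail)
```

If this reports a failure, re-installing the package (in my case with conda) and re-running the check is a cheaper first step than profiling the scan loops.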