I am a big Theano fan, although somewhat new to it.
I am seeing slow GPU performance in one of my Theano programs. To be more specific,
it runs about as fast on the GPU as on the CPU. Because I have other programs
that get huge speed-ups on the GPU relative to the CPU, I know it is not an installation issue.
If Theano were parallelizing the operations over the independent samples
in a batch, then I should see the speed increase as I increase the batch size.
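To make that expectation concrete, this is roughly the check I have in mind. It is only a sketch: `throughput`, `run_batch` and `make_batch` are hypothetical names, and the stand-ins below are plain Python rather than my actual compiled Theano function. The per-sample loop in `run_batch` only mimics scanning over the samples of a batch.

```python
import time


def throughput(run_batch, make_batch, batch_sizes, repeats=3):
    """Return {batch_size: samples processed per second}."""
    results = {}
    for n in batch_sizes:
        batch = make_batch(n)
        run_batch(batch)  # warm-up call, excluded from the timing
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run_batch(batch)
            best = min(best, time.perf_counter() - start)
        # Guard against a zero elapsed time on very coarse clocks.
        results[n] = n / max(best, 1e-12)
    return results


# Hypothetical stand-ins, NOT the real program.
def make_batch(n):
    """Build n samples with 16 features each."""
    return [[float(i + j) for j in range(16)] for i in range(n)]


def run_batch(batch):
    """Process the samples one at a time, like a scan over the batch."""
    return [sum(x * x for x in sample) for sample in batch]


if __name__ == "__main__":
    rates = throughput(run_batch, make_batch, [8, 64, 512])
    for n, rate in rates.items():
        print(f"batch size {n:4d}: {rate:12.1f} samples/s")
```

If the samples were processed in parallel, I would expect samples per second to grow with the batch size. If they are processed one after another, it should stay roughly flat.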
Using nvidia-smi, I checked that the GPU is not over-loaded.
I suspect it has to do with my use of the following functions:
and these are used inside 'scan' loops. Generally, I scan over the samples in a batch
(i.e. the first index of the 'sequences' tensors runs over the samples in a batch).
I also use tensor.jacobian inside a scan loop.
Without getting into details of the program, is there any obvious reason that
Theano does not seem to be parallelizing my code on the GPU?
Many thanks in advance,