I noticed today that my (personal) unit tests were running 15% slower
than before. With some bisection, I narrowed it down to commit
c4a975e14f7c. As of that commit, my functions execute 15% slower --
although *building* the functions happens nearly twice as fast as
before, which is nice.
I did a before-and-after profile mode run. Looking at my topmost
time-consuming ops (according to the "Single Op-wise summary"), two
changes popped out:
theano.sandbox.cuda.basic_ops.GpuElemwise got a little faster, taking
about 90% of its prior time
theano.sandbox.cuda.basic_ops.GpuFromHost got WAY slower, taking over
580% of its prior time (i.e. 5.8x as long as before)
This increased my time spent transferring data between host and device
from 3.4% to 16.1%.
It's not yet obvious to me from the diff why this happened, so I'd
love it if anyone has any insight into it they could share...
-josh
Hum, so I'm the one responsible for that commit...
I suppose the checks newly added when assigning a value to a
Container are responsible for the slowdown. Maybe some checks are
duplicated, or executed when they aren't needed. I'll have a look and
try to reproduce your problem.
Thanks for reporting it,
--
Pascal
I just took another look, and it looks like the ops haven't slowed
down, but rather multiplied. I should have posted the full lines
before; my apologies. Here they are:
Before:
3.1% 91.7% 5.226s 154.162s 3.08e-04s 16982 1 <class 'theano.sandbox.cuda.basic_ops.GpuFromHost'>
After:
15.9% 72.4% 30.541s 139.177s 2.52e-04s 121252 1 <class 'theano.sandbox.cuda.basic_ops.GpuFromHost'>
Note that the number of calls skyrockets to about 7x the original,
while the time per call actually decreases slightly.
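For the record, those ratios can be checked directly from the two profile lines above (pure arithmetic, all numbers taken from the quoted output):

```python
# Sanity-check the profile numbers quoted above: op time, total calls.
before_calls, after_calls = 16982, 121252
before_total, after_total = 5.226, 30.541  # seconds spent in GpuFromHost

print(round(after_calls / before_calls, 1))   # -> 7.1 (7x more calls)
print(round(before_total / before_calls, 6))  # -> 0.000308 s/call before
print(round(after_total / after_calls, 6))    # -> 0.000252 s/call after
```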
I'm working on extracting/finding a reproducing test case for this now...
-josh
Ok, reproducing this is really easy. Grab the deep learning tutorials,
and run logistic_sgd.py. I get the following results...
BEFORE changeset c4a975e14f7c:
[snip]
The code for file logistic_sgd.py ran for 6.7s
[snip]
AFTER (using current hg tip):
[snip]
The code for file logistic_sgd.py ran for 249.5s
[snip]
Profile mode results...
BEFORE:
https://gist.github.com/708238
AFTER:
https://gist.github.com/708242
It looks like this commit might have broken theano.shared, causing
data to get re-copied to the gpu each time it gets used.
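The suspected behavior can be sketched in plain Python. This is a conceptual illustration only, not Theano's actual Container code; the `DeviceContainer` class and its `transfers` counter are hypothetical:

```python
class DeviceContainer:
    """Toy stand-in for a shared-variable container that caches its
    device copy and only re-transfers when the host value changes."""

    def __init__(self, value):
        self.host_value = value
        self._device_copy = None
        self.transfers = 0  # counts simulated host-to-device copies

    def set_value(self, value):
        self.host_value = value
        self._device_copy = None  # invalidate the cached device copy

    def get_device_value(self):
        if self._device_copy is None:  # transfer only on a cache miss
            self._device_copy = list(self.host_value)  # simulated copy
            self.transfers += 1
        return self._device_copy


# With caching intact, 100 reads of an unchanged value cost 1 transfer.
# If the cache were invalidated on every read (the behavior suspected
# above), the same loop would cost 100 transfers.
c = DeviceContainer([1.0, 2.0, 3.0])
for _ in range(100):
    c.get_device_value()
print(c.transfers)  # -> 1
```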
-josh
Oops. I haven't had the time until now, but I'm looking into it.
Thanks for your help,
--
Pascal
It should be fixed with
http://trac-hg.assembla.com/theano/changeset/4702%3Aa376fe18147d. The
logistic_sgd.py sample is working just as before, and I'll push a new
test to check if shared() breaks again.
Can you confirm that it's working again with your test too?
Thanks,
--
Pascal
Fixed for my test as well. Awesome. Thanks!
Josh