Performance regression in c4a975e14f7c (4634) due to GpuFromHost


Josh Bleecher Snyder

Nov 18, 2010, 6:29:51 PM
to thean...@googlegroups.com
Hi all,

I noticed today that my (personal) unit tests were running 15% slower
than before. With some bisection, I narrowed it down to commit
c4a975e14f7c. As of that commit, my functions execute 15% slower --
although *building* the functions happens nearly twice as fast as
before, which is nice.

I did a before-and-after profile mode run. Looking at my topmost
time-consuming ops (according to the "Single Op-wise summary"), two
changes popped out:

theano.sandbox.cuda.basic_ops.GpuElemwise got a little faster, taking
about 90% of the prior time
theano.sandbox.cuda.basic_ops.GpuFromHost got WAY slower, taking over
580% of the prior time (i.e. 5.8x as long as before)

This increased my time spent transferring data between host and device
from 3.4% to 16.1%.

It's not yet obvious to me from the diff why this happened, so I'd
love it if anyone has any insight into it they could share...

-josh

Pascal Lamblin

Nov 19, 2010, 9:24:36 AM
to thean...@googlegroups.com
On Thu, Nov 18, 2010, Josh Bleecher Snyder wrote:
> It's not yet obvious to me from the diff why this happened, so I'd
> love it if anyone has any insight into it they could share...

Hum, so I'm the one responsible for that commit...

I suppose the newly added checks when assigning a new value to a
Container are responsible for the slowdown. Maybe some checks are
duplicated, or executed but not needed. I'll have a look, and try to
reproduce your problem.

Thanks for reporting it,
--
Pascal

Josh Bleecher Snyder

Nov 20, 2010, 5:23:48 PM
to thean...@googlegroups.com
>> It's not yet obvious to me from the diff why this happened, so I'd
>> love it if anyone has any insight into it they could share...
>
> Hum, so I'm the one responsible for that commit...
>
> I suppose the newly added checks when assigning a new value to a
> Container are responsible for the slowdown. Maybe some checks are
> duplicated, or executed but not needed. I'll have a look, and try to
> reproduce your problem.

I just took another look, and it looks like the ops haven't slowed
down, but rather multiplied. I should have posted the full lines
before; my apologies. Here they are:

Before:
3.1% 91.7% 5.226s 154.162s 3.08e-04s 16982 1 <class 'theano.sandbox.cuda.basic_ops.GpuFromHost'>

After:
15.9% 72.4% 30.541s 139.177s 2.52e-04s 121252 1 <class 'theano.sandbox.cuda.basic_ops.GpuFromHost'>

Note the number of calls skyrockets by 7x, with time per call actually
decreasing.
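
As a quick sanity check on the two profile lines above, the ratios can be worked out directly. This is only a sketch: the dictionary keys are illustrative names, and reading the columns as total seconds, seconds per call, and number of calls is an inference from the values (total divided by calls matches the per-call figure), not something taken from the profiler's header.

```python
# Figures copied from the two GpuFromHost profile lines above.
# The key names are illustrative, not the profiler's own column names.
before = {"seconds": 5.226, "per_call": 3.08e-04, "calls": 16982}
after = {"seconds": 30.541, "per_call": 2.52e-04, "calls": 121252}

call_ratio = after["calls"] / before["calls"]
time_ratio = after["seconds"] / before["seconds"]
per_call_ratio = after["per_call"] / before["per_call"]

print(f"calls:         {call_ratio:.2f}x")      # 7.14x
print(f"total time:    {time_ratio:.2f}x")      # 5.84x
print(f"time per call: {per_call_ratio:.2f}x")  # 0.82x
```

The total comes out at roughly 5.8x, which matches the slowdown first reported, and it is accounted for by the call count rather than by the cost of each call.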

I'm working on extracting/finding a reproducing test case for this now...

-josh

Josh Bleecher Snyder

Nov 20, 2010, 5:45:09 PM
to thean...@googlegroups.com


Ok, reproducing this is really easy. Grab the deep learning tutorials,
and run logistic_sgd.py. I get the following results...


BEFORE changeset c4a975e14f7c:

[snip]
The code for file logistic_sgd.py ran for 6.7s
[snip]


AFTER (using current hg tip):

[snip]
The code for file logistic_sgd.py ran for 249.5s
[snip]


Profile mode results...


BEFORE:

https://gist.github.com/708238

AFTER:

https://gist.github.com/708242


It looks like this commit might have broken theano.shared, causing
data to get re-copied to the gpu each time it gets used.
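
To illustrate the suspected failure mode, here is a toy model in plain Python. It does not use Theano at all: `ToyShared`, `host_to_device`, `count_transfers`, and the `cache_on_device` flag are hypothetical names made up for this sketch. It only shows why re-copying on every use makes the number of transfers grow with the number of calls, while a value kept on the device is transferred once.

```python
# Toy model only: this is not Theano code. All names are hypothetical.
transfer_count = 0


def host_to_device(value):
    """Stand-in for a host-to-GPU copy; it just counts how often it runs."""
    global transfer_count
    transfer_count += 1
    return list(value)  # pretend this copy lives on the device


class ToyShared:
    """A value that is meant to stay resident on the device."""

    def __init__(self, value, cache_on_device=True):
        self.host_value = value
        self.cache_on_device = cache_on_device
        self.device_value = None

    def get_device_value(self):
        if self.cache_on_device:
            # Intended behaviour: copy once, then reuse the device copy.
            if self.device_value is None:
                self.device_value = host_to_device(self.host_value)
            return self.device_value
        # Suspected regression: a fresh copy on every use.
        return host_to_device(self.host_value)


def count_transfers(cache_on_device, n_calls=1000):
    """Use one shared value n_calls times and return the number of copies."""
    global transfer_count
    transfer_count = 0
    shared = ToyShared([0.0] * 10, cache_on_device=cache_on_device)
    for _ in range(n_calls):
        sum(shared.get_device_value())  # stand-in for one function call
    return transfer_count


print(count_transfers(cache_on_device=True))   # 1
print(count_transfers(cache_on_device=False))  # 1000
```

A counter like this is also the kind of thing a regression test could assert on, though how the real transfer count would be obtained from Theano is not shown here.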


-josh

Pascal Lamblin

Nov 20, 2010, 8:03:25 PM
to thean...@googlegroups.com
On Sat, Nov 20, 2010, Josh Bleecher Snyder wrote:
> Ok, reproducing this is really easy. Grab the deep learning tutorials,
> and run logistic_sgd.py. I get the following results...
>
> It looks like this commit might have broken theano.shared, causing
> data to get re-copied to the gpu each time it gets used.

Oops. I haven't had the time until now, but I'm looking into it.
Thanks for your help,
--
Pascal

Pascal Lamblin

Nov 20, 2010, 9:21:37 PM
to thean...@googlegroups.com
On Sun, Nov 21, 2010, Pascal Lamblin wrote:
> > It looks like this commit might have broken theano.shared, causing
> > data to get re-copied to the gpu each time it gets used.
>
> Oops. I haven't had the time until now, but I'm looking into it.
> Thanks for your help,

It should be fixed with
http://trac-hg.assembla.com/theano/changeset/4702%3Aa376fe18147d. The
logistic_sgd.py sample is working just as before, and I'll push a new
test to check if shared() breaks again.

Can you confirm that it's working again with your test too?

Thanks,
--
Pascal

Josh Bleecher Snyder

Nov 20, 2010, 11:54:27 PM
to thean...@googlegroups.com


Fixed for my test as well. Awesome. Thanks!

Josh

Frédéric Bastien

Nov 22, 2010, 12:49:43 PM
to thean...@googlegroups.com
As I said on the ticket Josh created, there is already code for that
in the Deep Learning Tutorial that prints this in the buildbot output.
We should reformat that to put it in the title.

Fred
