time to compute gradient in reverse mode


Lewis Fishgold

Mar 13, 2013, 1:27:29 PM3/13/13
to alg...@googlegroups.com
Hi Sebastian,

I am trying to compute the gradient of a real-valued cost function (for a small neural network) with respect to a 200-element vector using algopy, and it takes 162 times as long as evaluating the function, which is much slower than I expected (I was expecting a small constant factor, around 10). To spare you the details of the neural network code, I created a simplified benchmark problem, which is below. I've attached a graph of the slowdown factor as a function of the number of parameters. For this problem, the factor is around 40 for small problems (with tens of parameters) and drops to around 10 when the number of parameters is near 1000.

I am running the bleeding edge (as of March 6, 2013) version of algopy, scipy 0.12, and numpy 1.5.1.

Do these results sound about right, or am I using algopy incorrectly? Thanks for your help.

Lewis

import numpy as np
import algopy 
import time
from matplotlib import pyplot as plt

#original function
def fNP(X,W):
    return np.sum(np.sum(np.exp(np.dot(X,W))))

#function modified to use algopy primitives where possible
def f(X,W):
    return algopy.sum(algopy.sum(np.exp(algopy.dot(X,W))))

sizes = range(5,30,2)
gradFactors = []
for size in sizes:
    #make computation graph
    cg = algopy.CGraph()
    X = np.random.normal(0,1,(size,size))
    W = algopy.Function(np.random.normal(0,1,(size,size)))
    out = f(X,W)
    cg.trace_off()
    cg.independentFunctionList = [W]
    cg.dependentFunctionList = [out]
    
    X = np.random.normal(0,1,(size,size))
    W = np.random.normal(0,1,(size,size))
    
    #time computing gradient
    numReps = 100
    start = time.time()
    for i in xrange(numReps):
        g = cg.gradient([W])
    end = time.time()
    gradTime = end-start 
    
    #time original function eval
    start = time.time()
    for i in xrange(numReps):
        val = fNP(X,W)
    end = time.time()
    evalTime = end-start 
    
    gradFactor = gradTime / evalTime
    gradFactors.append(gradFactor)
    
numParams = [s*s for s in sizes]
plt.plot(numParams,gradFactors)
plt.xlabel("num params")
plt.ylabel("(time for gradient) / (time for eval)")
plt.show()
Attachment: grad-slowdown.png

Sebastian Walter

Mar 15, 2013, 5:30:40 AM3/15/13
to alg...@googlegroups.com
Hello Lewis,

For small arrays you are mostly measuring the overhead of algopy's reverse mode.

I have not yet profiled the code,
but I assume the main culprits for the overhead are:

1) All nodes of the graph have to be initialized to zero. This
operation already takes more time than a normal function evaluation.
2) During the reverse mode, the computational graph (see the attachment)
is walked in the reverse direction.

This is done in a Python for loop:

for i,f in enumerate(self.functionList[::-1]):
    try:
        f.__class__.pullback(f)
    except Exception, e:
        err_str = '\npullback of node %d failed\n\n'%(len(self.functionList) - i - 1)
        err_str +='tried to evaluate the pullback of %s(*args) with\n'%(f.func.__name__)

        for narg, arg in enumerate(f.args):
            if hasattr(arg, 'x'):
                err_str += 'type(arg[%d].x) = \n%s\n'%(narg, type(arg.x) )
            else:
                err_str += 'type(arg[%d]) = \n%s\n'%(narg, type(arg) )
            if isinstance(arg.x, algopy.UTPM):
                err_str += 'arg[%d].x.data.shape = \n%s\n'%(narg, arg.x.data.shape)
                err_str += 'arg[%d].xbar.data.shape = \n%s\n'%(narg, arg.xbar.data.shape)

        err_str += '\n%s'%traceback.format_exc()
        raise Exception(err_str)
# print self

As you can see, this is a Python for loop that accesses Python list elements.
Additionally, the code is evaluated inside a try-except statement.

If speed is an issue one could try to optimize this loop.
Most likely one would have to refactor AlgoPy a little,
perhaps also replacing some parts with compiled code.

If you are into neural networks you should also have a look at Theano or Casadi.
They are tuned for speed and may already provide what you need.
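
Untested sketch, but your benchmark cost function would look roughly like
this in Theano (the gradient is built symbolically with T.grad and compiled
once; the variable names are just illustrative):

import numpy as np
import theano
import theano.tensor as T

Xs = T.dmatrix('X')
Ws = T.dmatrix('W')
cost = T.sum(T.exp(T.dot(Xs, Ws)))

# compile the gradient once; calling grad_fn afterwards is cheap
grad_fn = theano.function([Xs, Ws], T.grad(cost, Ws))

X = np.random.normal(0, 1, (20, 20))
W = np.random.normal(0, 1, (20, 20))
g = grad_fn(X, W)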

cheers,
Sebastian
Attachment: fishgold.png

Sebastian Walter

Mar 15, 2013, 6:11:07 AM3/15/13
to alg...@googlegroups.com
Hi again,

I ran some profiling with Robert Kern's line_profiler:

sudo pip install line_profiler
kernprof.py -l fishgold.py
python -m line_profiler fishgold.py.lprof

It appears that my hunch was incorrect: it is not the for loop and the
list access that use most of the time.

For the profiling I set sizes = [1] in your test script.

CGraph.gradient calls CGraph.pullback, which takes 78.3% of the time.
CGraph.pullback calls Function.pullback, which takes 75.9% of the time.
Function.pullback spends 26.1% of its time on the lookup
    f = eval('__import__("algopy.utpm").utpm.'+F.x.__class__.__name__+'.pb_'+func_name)
and 37.6% on f(*args, **kwargs), the call to the actual pullback function.

The line f = eval('__import__("algopy.utpm").utpm.'+F.x.__class__.__name__+'.pb_'+func_name)
is probably easy to get rid of, which would give roughly a 15% speedup.
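
For example, one could resolve and memoize the pullback function once per
(class, function name) pair instead of calling eval for every node. A rough,
untested sketch (get_pullback is just an illustrative helper, not existing
AlgoPy code):

_pb_cache = {}

def get_pullback(klass, func_name):
    # look up klass.pb_<func_name> once and memoize it
    key = (klass, func_name)
    f = _pb_cache.get(key)
    if f is None:
        f = getattr(klass, 'pb_' + func_name)
        _pb_cache[key] = f
    return f

# inside Function.pullback one would then write something like
#     f = get_pullback(F.x.__class__, func_name)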


----------- start profiling output -----------------

Timer unit: 1e-06 s

File: /home/swalter/projects/algopy/algopy/tracer/tracer.py
Function: pullback at line 114
Total time: 0.101605 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   114                                           @profile
   115                                           def pullback(self, xbar_list):
   116                                               """
   117                                               Apply the pullback of the cotangent element,
   118
   119                                               e.g. for::
   120
   121                                                   y = y(x)
   122
   123                                               compute::
   124
   125                                                   ybar dy(x) = xbar dx
   126
   127                                               """
   128
   129       100          402      4.0      0.4      import time
   130
   131       100          140      1.4      0.1      if len(self.dependentFunctionList) == 0:
   132                                                   raise Exception('You forgot to specify which variables are dependent!\n'\
   133                                                                   ' e.g. with cg.dependentFunctionList = [F1,F2]')
   134
   135                                               # initial all xbar to zero
   136       700          871      1.2      0.9      for f in self.functionList:
   137                                                   # print 'f=',f.func.__name__
   138       600        17572     29.3     17.3          f.xbar_from_x()
   139
   140                                               # print 'before pullback',self
   141
   142       200          346      1.7      0.3      for nf,f in enumerate(self.dependentFunctionList):
   143       100           97      1.0      0.1          try:
   144       100         3369     33.7      3.3              f.xbar[...] = xbar_list[nf]
   145                                                   except Exception, e:
   146                                                       err_str = 'tried to initialize the bar value of cg.dependentFunctionList[%d], but some error occured:\n'%nf
   147                                                       err_str += 'the assignment: f.xbar[...] = xbar_list[%d]\n'%(nf)
   148                                                       err_str += 'where f.xbar[...].shape =%s and type(f.xbar)=%s\n'%(str(f.xbar[...].shape), str(type(f.xbar)))
   149                                                       err_str += 'and xbar_list[%d].shape =%s and type(xbar_list[%d])=%s\n'%(nf, str(xbar_list[nf].shape), nf, str(type(xbar_list[nf])))
   150                                                       err_str += 'results in the error\n%s\n'%e
   151                                                       err_str += 'traceback:\n%s'%traceback.format_exc()
   152
   153                                                       raise Exception(err_str)
   154
   155                                               # print self
   156       700         1041      1.5      1.0      for i,f in enumerate(self.functionList[::-1]):
   157       600          602      1.0      0.6          try:
   158       600        77165    128.6     75.9              f.__class__.pullback(f)
   159                                                   except Exception, e:
   160                                                       err_str = '\npullback of node %d failed\n\n'%(len(self.functionList) - i - 1)
   161                                                       err_str +='tried to evaluate the pullback of %s(*args) with\n'%(f.func.__name__)
   162
   163                                                       for narg, arg in enumerate(f.args):
   164                                                           if hasattr(arg, 'x'):
   165                                                               err_str += 'type(arg[%d].x) = \n%s\n'%(narg, type(arg.x) )
   166                                                           else:
   167                                                               err_str += 'type(arg[%d]) = \n%s\n'%(narg, type(arg) )
   168                                                           if isinstance(arg.x, algopy.UTPM):
   169                                                               err_str += 'arg[%d].x.data.shape = \n%s\n'%(narg, arg.x.data.shape)
   170                                                               err_str += 'arg[%d].xbar.data.shape = \n%s\n'%(narg, arg.xbar.data.shape)
   171
   172                                                       err_str += '\n%s'%traceback.format_exc()
   173                                                       raise Exception(err_str)

File: /home/swalter/projects/algopy/algopy/tracer/tracer.py
Function: gradient at line 188
Total time: 0.13721 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   188                                           @profile
   189                                           def gradient(self, x):
   190                                               """ computes the gradient of a function f: R^N --> R
   191
   192                                               g = gradient(self, x_list)
   193
   194                                               Parameters
   195                                               ----------
   196
   197                                               x: array_like or list of array_like
   198
   199
   200                                               Example 1
   201                                               ---------
   202
   203                                               import algopy
   204
   205                                               def f(x):
   206                                                   return x**2
   207
   208                                               cg = algopy.CGraph()
   209                                               x = algopy.Function(3.)
   210                                               y = f(x)
   211                                               cg.trace_off()
   212                                               cg.independentFunctionList = [x]
   213                                               cg.dependentFunctionList = [y]
   214                                               print cg.gradient(7.)
   215
   216
   217                                               Example 2
   218                                               ---------
   219
   220                                               import algopy
   221
   222                                               def f(x1, x2):
   223                                                   return x1*x2
   224
   225                                               cg = algopy.CGraph()
   226                                               x1 = algopy.Function(3.)
   227                                               x2 = algopy.Function(5.)
   228                                               y = f(x1, x2)
   229                                               cg.trace_off()
   230                                               cg.independentFunctionList = [x1, x2]
   231                                               cg.dependentFunctionList = [y]
   232                                               print cg.gradient([1.,2.])
   233                                               """
   234
   235       100          752      7.5      0.5      if self.dependentFunctionList[0].ndim != 0:
   236                                                   raise Exception('you are trying to compute the gradient of a non-scalar valued function')
   237
   238       100          139      1.4      0.1      if isinstance(x, list):
   239       100          117      1.2      0.1          if isinstance(x[0], numpy.ndarray) or isinstance(x[0], list):
   240       100           98      1.0      0.1              x_list = x
   241                                                   else:
   242                                                       x_list = [numpy.asarray(x)]
   243                                               else:
   244                                                   x_list = [x]
   245
   246       100          103      1.0      0.1      utpm_x_list = []
   247       200          223      1.1      0.2      for xi in x_list:
   248       100          782      7.8      0.6          element = numpy.asarray(xi).reshape((1,1) + numpy.shape(xi))
   249       100          698      7.0      0.5          utpm_x_list.append(algopy.UTPM(element))
   250
   251       100        24146    241.5     17.6      self.pushforward(utpm_x_list)
   252
   253       100         1552     15.5      1.1      ybar = self.dependentFunctionList[0].x.zeros_like()
   254       100          439      4.4      0.3      ybar.data[0,:] = 1.
   255       100       107504   1075.0     78.3      self.pullback([ybar])
   256
   257       100          127      1.3      0.1      if isinstance(x, list):
   258       200          530      2.6      0.4          return [x.xbar.data[0,0] for x in self.independentFunctionList]
   259                                               else:
   260                                                   return self.independentFunctionList[0].xbar.data[0,0]

File: /home/swalter/projects/algopy/algopy/tracer/tracer.py
Function: pullback at line 826
Total time: 0.052645 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   826                                           @classmethod
   827                                           @profile
   828                                           def pullback(cls, F):
   829                                               """
   830                                               compute the pullback of the Function F
   831
   832                                               e.g. if y = f(x)
   833                                               compute xbar as::
   834
   835                                                   ybar dy = ybar df(x) = ybar df/dx dx = xbar dx
   836
   837                                               More specifically:
   838
   839                                               (y1,y2) = f(x1,x2,const)
   840
   841                                               where v is a constant argument.
   842
   843                                               Examples:
   844
   845                                               1) (Q,R) = qr(A)
   846                                               2) Q = getitem(qr(A),0)
   847
   848                                               This function assumes that for each function f there is a corresponding function::
   849
   850                                                   pb_f(y1bar,y2bar,x1,x2,y1,y2,out=(x1bar, x2bar))
   851
   852                                               The Function F contains information about its arguments, F.y and F.ybar.
   853                                               Thus, pullback(F) computes F.args[i].xbar
   854                                               """
   855
   856       600          841      1.4      1.6      func_name = F.func.__name__
   857
   858                                               # STEP 1: extract arguments
   859       600          693      1.2      1.3      args = []
   860       600          654      1.1      1.2      argsbar = []
   861      1900         2235      1.2      4.2      for a in F.args:
   862      1300         1810      1.4      3.4          if isinstance(a, cls):
   863       700         1028      1.5      2.0              args.append(a.x)
   864       700          932      1.3      1.8              argsbar.append(a.xbar)
   865                                                   else:
   866       600          734      1.2      1.4              args.append(a)
   867       600          698      1.2      1.3              argsbar.append(None)
   868
   869       600         1056      1.8      2.0      if isinstance(F.x,tuple):
   870                                                   # case if the function F has several outputs, e.g. (y1,y2) = F(x)
   871                                                   args = list(F.xbar) + args + list(F.x)
   872                                                   f = eval('__import__("algopy.utpm").utpm.'+F.x[0].__class__.__name__+'.pb_'+func_name)
   873
   874       600          953      1.6      1.8      elif type(F.x) == type(None):
   875                                                   # case if the function F has no output, e.g. None = F(x)
   876                                                   f = eval('__import__("algopy.utpm").utpm.'+F.args[0].x.__class__.__name__+'.pb_'+func_name)
   877
   878       600         2436      4.1      4.6      elif numpy.isscalar(F.x) or isinstance(F.x, numpy.ndarray):
   879       100          129      1.3      0.2          return lambda x: None
   880
   881       500          718      1.4      1.4      elif isinstance(F.x, algopy.UTPM):
   882
   883                                                   # case if the function F has output, e.g. y1 = F(x)
   884       500         1028      2.1      2.0          args = [F.xbar] + args + [F.x]
   885
   886                                                   # print '-------'
   887                                                   # print func_name
   888                                                   # print F.x
   889                                                   # print F.xbar
   890
   891                                                   # get the pullback function
   892
   893       500        13735     27.5     26.1          f = eval('__import__("algopy.utpm").utpm.'+F.x.__class__.__name__+'.pb_'+func_name)
   894
   895                                               elif func_name == '__getitem__' or func_name == 'getitem':
   896                                                   return lambda x: None
   897                                                   # raise NotImplementedError('should implement that')
   898
   899                                               # elif func_name == '__getitem__':
   900                                               #     return lambda x: None
   901                                               #     raise NotImplementedError('should implement that')
   902
   903                                               # STEP 2: call the pullback function
   904       500         1122      2.2      2.1      kwargs = {'out': list(argsbar)}
   905
   906                                               # print 'func_name = ',func_name
   907                                               # print 'calling pullback function f=',f
   908                                               # print 'args = ',args
   909                                               # print 'kwargs = ',kwargs
   910
   911       500        19797     39.6     37.6      f(*args, **kwargs )
   912
   913                                               # STEP 3: restore buffer values (i.e. if this is the pullback of the setitem function)
   914
   915       500         1490      3.0      2.8      if is_set(F.setitem):
   916                                                   # print 'restoring value'
   917                                                   # print 'F.setitem=', F.setitem
   918                                                   F.args[0].x[F.setitem[0]] = F.setitem[1]
   919
   920                                               # print 'F.args=',F.args
   921
   922       500          556      1.1      1.1      return F

----------- end profiling output -----------------


I'm curious: For small problems the total time is small anyway.
Why is performance important to you? How many times do you want to
compute the gradient?


Sebastian

Sebastian Walter

Mar 15, 2013, 6:12:51 AM3/15/13
to alg...@googlegroups.com
I should have attached the profiling output as a text file instead.
Attachment: profiling.txt

Lewis Fishgold

Mar 15, 2013, 5:49:27 PM3/15/13
to alg...@googlegroups.com
Thanks for looking into this for me. I'm interested in using automatic differentiation for machine learning, which involves computing the gradient of a cost function thousands of times, and often there are only tens of parameters. It may turn out that algopy is still fast enough for what I want to do -- I'll have to try it out. I just wanted to make sure the slowdown wasn't due to incorrect usage. Do you think that pyadolc would be significantly faster for my use case? I'll also take a look at Theano. 

Lewis

Sebastian Walter

Mar 16, 2013, 7:38:21 AM3/16/13
to alg...@googlegroups.com
On Fri, Mar 15, 2013 at 10:49 PM, Lewis Fishgold <lew...@gmail.com> wrote:
> Thanks for looking into this for me. I'm interested in using automatic
> differentiation for machine learning, which involves computing the gradient
> of a cost function thousands of times, and often there are only tens of
> parameters.

You could try to use the Hessian information and an optimization
algorithm with super-linear convergence. If things go well, this could
significantly reduce the number of gradient evaluations, say from 200 to 10.
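
Untested, but with the traced graph something along these lines should work
(this assumes CGraph.hessian is available in your algopy version and
scipy >= 0.11 for scipy.optimize.minimize; cost and w0 are placeholders for
your cost function of a flat parameter vector and the starting point):

import numpy as np
import scipy.optimize
import algopy

# trace the scalar cost function once
cg = algopy.CGraph()
w = algopy.Function(w0.copy())
out = cost(w)
cg.trace_off()
cg.independentFunctionList = [w]
cg.dependentFunctionList = [out]

res = scipy.optimize.minimize(
    cost, w0,
    jac=lambda w: cg.gradient([w])[0],
    hess=lambda w: cg.hessian(w),   # assumption: check the exact hessian signature
    method='Newton-CG')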

> It may turn out that algopy is still fast enough for what I want
> to do -- I'll have to try it out. I just wanted to make sure the slowdown
> wasn't due to incorrect usage. Do you think that pyadolc would be
> significantly faster for my use case? I'll also take a look at Theano.

Pyadolc is faster than AlgoPy and can be even faster than a pure
Python function call if the function of interest is not vectorized.

E.g. the function

def f(x, y):
    for i in range(10000):
        y += x[i]*x[i+1]
    return y

would probably run faster in Pyadolc than in pure Python, since the for
loop and the getitem/setitem calls are rather expensive operations in Python.
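
For reference, taping and evaluating such a function with pyadolc looks
roughly like this (from memory and untested, so please check the pyadolc
examples for the exact API):

import numpy as np
import adolc

def f(x):
    y = 0.
    for i in range(10000):
        y = y + x[i]*x[i+1]
    return y

x0 = np.random.normal(0, 1, 10001)

adolc.trace_on(0)             # start taping on tape 0
ax = adolc.adouble(x0)
adolc.independent(ax)
ay = f(ax)
adolc.dependent(ay)
adolc.trace_off()

g = adolc.gradient(0, x0)     # evaluate the taped gradient at x0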

If you are really concerned about speed, then you should consider
implementing your cost function in Fortran and then using Tapenade
(http://www-sop.inria.fr/tropics/) to differentiate it.
There is little out there that can compete with compiled and optimized
Fortran/C code.