I haven't investigated issue #1 much. For issue #2, I know it's size
dependent. It doesn't happen if I shrink either dimension of W at all
(that's why the values in s are so weird). Issue #2 also goes away if
I set f to output updates.values() rather than updating grad.
Any ideas?
Thanks,
Ian
import numpy as np
from pylearn2.utils import sharedX
from theano import function
import theano
import gc
s = [399,219]
before = theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()
W = sharedX(np.zeros((s[0],s[1])))
gc.collect()
gc.collect()
gc.collect()
after = theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()
print before[0] - after[0]
print s[0]*s[1]*4
grad = sharedX(np.zeros(W.get_value().shape))
updates = { grad : W}
f = function([], updates = updates)
before = theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()
f()
gc.collect(); gc.collect(); gc.collect()
after = theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()
assert after[0] >= before[0]
On Wed, Feb 1, 2012 at 10:34 PM, Ian Goodfellow
#script to demonstrate that theano leaks memory on the gpu
import numpy as np
from pylearn2.utils import sharedX
from theano import function
import theano
import gc
s = [400,8000]
print 'first shared'
before = theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()
W = sharedX(np.zeros((s[0],s[1])))
gc.collect()
gc.collect()
gc.collect()
after = theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()
diff = before[0] - after[0]
expected_diff = s[0]*s[1]*4
if diff > expected_diff:
    print "W uses ",str(float(diff)/float(expected_diff))," times more memory than needed."
    print "(",str(float(diff-expected_diff)/(1024. ** 2))," megabytes)"
print 'second shared'
grad =sharedX(np.zeros(W.get_value().shape))
gc.collect()
gc.collect()
gc.collect()
after_after = theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()
diff = after[0] - after_after[0]
if diff > expected_diff:
    print "grad uses ",str(float(diff)/float(expected_diff))," times more memory than needed."
updates = { grad : W}
f = function([], updates = updates)
from theano.printing import debugprint
debugprint(f)
print 'call'
before = theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()
f()
gc.collect(); gc.collect(); gc.collect()
after = theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()
cuda_array = grad.get_value(borrow=True, return_internal_type = True)
addr = cuda_array.gpudata
print 'final storage address: %x' % addr
assert after[0] >= before[0]
On Wed, Feb 1, 2012 at 10:47 PM, Ian Goodfellow
A CudaNdarray which I will call A is allocated and used as the value
for grad. During the call to f, a second CudaNdarray which I will call
B is allocated and used as the updated value for grad.
If I make a reference to A in my script right after initializing grad,
I can then call sys.getrefcount and gc.get_referrers on it at the end
of the script. The expected result is that get_referrers should list
only locals() and getrefcount should return 1. The actual result is
that get_referrers returns only locals() as expected but getrefcount
returns 3.
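To make the mismatch concrete, here is a minimal sketch of how the two
measurements relate, assuming CPython 3 and using a plain Python object as
a stand-in for A (the class name Payload is hypothetical, not part of
Theano). sys.getrefcount counts every reference, including the temporary
one held by the call itself, while gc.get_referrers only reports
containers that the collector tracks. A count higher than the referrer
list explains can therefore come from a holder the collector cannot see,
or from a missing decref.

```python
import gc
import sys


class Payload:
    """Hypothetical stand-in for the array called A in the text."""


a = Payload()
# The result already includes the temporary reference held by the call.
baseline = sys.getrefcount(a)

# A list is a container tracked by the collector, so it is a visible referrer.
holder = [a]
with_list = sys.getrefcount(a)

referrers = gc.get_referrers(a)
list_found = any(r is holder for r in referrers)

print("extra references after adding the list:", with_list - baseline)
print("list reported by get_referrers:", list_found)
print("list tracked by the collector:", gc.is_tracked(holder))
print("int tracked by the collector:", gc.is_tracked(42))
```

Comparing counts before and after a suspect step, rather than reading the
absolute number, avoids depending on how many temporary references the
interpreter holds at that moment.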
I have some questions about how to proceed now:
-Is it possible that if I crawl through all the sub-fields of f or of
theano I will find some reference to A that was not found by
get_referrers? or does this necessarily indicate that we have failed
to call a python C api decref macro somewhere?
I don't really understand the gc documentation on the subject of what
get_referrers will find or not find: "This function will only locate
those containers which support garbage collection; extension types
which do refer to other objects but do not support garbage collection
will not be found." What does it mean exactly to "support garbage
collection"?
-If it is possible that I might find some referrers by crawling
through all the sub-fields of f and/or theano, how exactly do I do
this? Iterating through all the elements of dir(obj) and calling
getattr(obj, field) evaluates properties, so I get stuck following
infinite loops of .T called on .T.
-Is there any good way of instrumenting the python C api incref/decref
macros so I can trace where they are getting called?
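One way to crawl an object graph without triggering properties is to read
instance dictionaries with vars() and walk built-in containers directly,
never calling getattr. The sketch below assumes Python 3; the names
iter_children, find_paths_to, Node and stash are hypothetical and only
illustrate the idea. It cannot see references stored in C-level fields of
extension types, or in objects that use __slots__, so an empty result does
not rule out a hidden holder.

```python
from collections import deque


def iter_children(obj):
    """Yield (label, child) pairs without calling getattr on obj."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield "[%r]" % (key,), value
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for index, value in enumerate(obj):
            yield "[%d]" % index, value
    else:
        # vars() reads the instance __dict__; properties live on the class,
        # so they are never evaluated here.
        try:
            attributes = vars(obj)
        except TypeError:  # object has no __dict__
            return
        for name, value in attributes.items():
            yield "." + name, value


def find_paths_to(root, target, max_depth=5):
    """Breadth-first search for paths from root that reach target."""
    paths = []
    seen = {id(root)}  # guards against reference cycles
    queue = deque([(root, "root", 0)])
    while queue:
        obj, path, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for label, child in iter_children(obj):
            child_path = path + label
            if child is target:
                paths.append(child_path)
            elif id(child) not in seen:
                seen.add(id(child))
                queue.append((child, child_path, depth + 1))
    return paths


class Node:
    def __init__(self):
        self.cache = {}

    @property
    def T(self):
        # Mimics the ".T of .T" trap: every access builds a fresh object.
        return Node()


leaked = object()
f = Node()
f.cache["stash"] = [(False, False, leaked)]
print(find_paths_to(f, leaked))
```

The depth limit and the set of visited ids keep the walk finite even when
the graph contains cycles.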
What exactly is defaults for? In this case it is a list containing two
tuples. Each tuple contains (False, False, CudaNdarray). One of the
CudaNdarrays is the one that is getting leaked.
Such strolls down memory lane are ripe opportunities for writing comments...
*ahem* :P
David
The shared flag is necessary because, before this change, any input with
value=None was flagged as required to be passed in by the client.
Unfortunately, my test script now only executes successfully if I run
it in 'FAST_COMPILE' mode.
If I run it in any other mode, it has an error inside run_cthunk so I
can't tell where the error actually originates.
If I run it on CPU, I get the error:
ValueError: ('expected an ndarray, not None', <TensorType(float32, matrix)>)
which makes me think some piece of code somewhere is trying to read
the value out of defaults, or the In.value field.
If I run it on GPU, I get the more inexplicable:
TypeError: ('Argument not a CudaNdarray', <CudaNdarrayType(float32, matrix)>)
Can anyone give me a high-level explanation of what is going on?
It seems to me that shared variables should get initialized once, when
you initialize them, and that the call to the theano function
shouldn't be trying to read their initial value. Moreover, I can't
think of any reason why different modes should read the initial value
from different locations.
Also, if anyone can give me any tips for figuring out the actual code
location of the failure inside run_cthunk that would be very helpful.
The memory leak was caused by FunctionMaker always saving the initial
value of all shared variables in the function's defaults field. I need
to remove this redundant storage, but it appears that some linkers
depend on it, in ways and for reasons that are unclear.
It is true that there is an additional memory leak which I may be able
to find using your suggestion, but I want to fix this one before I
look for the second one.
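The mechanism can be illustrated with a toy model. This is not Theano's
code: Buffer, ToyShared and ToyFunction are hypothetical stand-ins, and the
sketch assumes CPython 3, where an object is freed as soon as its last
reference disappears. A function object that copies each shared variable's
initial value into a defaults list keeps that value alive even after an
update has replaced it.

```python
import gc
import weakref


class Buffer:
    """Hypothetical stand-in for a large device array."""

    def __init__(self, name):
        self.name = name


class ToyShared:
    """Hypothetical stand-in for a shared variable."""

    def __init__(self, value):
        self.value = value


class ToyFunction:
    """Hypothetical compiled function that replaces each shared value."""

    def __init__(self, shared_vars, keep_defaults):
        self.shared_vars = shared_vars
        # The redundant storage: a second reference to every initial value.
        self.defaults = [v.value for v in shared_vars] if keep_defaults else []

    def __call__(self):
        for v in self.shared_vars:
            v.value = Buffer("updated")


def initial_buffer_survives(keep_defaults):
    var = ToyShared(Buffer("initial"))
    probe = weakref.ref(var.value)  # observes the buffer without owning it
    f = ToyFunction([var], keep_defaults)
    f()  # the update swaps in a new buffer
    gc.collect()
    return probe() is not None


print("with defaults kept:", initial_buffer_survives(True))
print("without defaults:", initial_buffer_survives(False))
```

In this toy model the initial buffer outlives the update only when the
defaults list holds it, which matches the symptom of the old allocation
never being returned.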
On Thu, Feb 2, 2012 at 10:32 PM, Ian Goodfellow
from theano import shared
import numpy as np
import sys
shape = (400,8000)
x = shared(np.zeros(shape))
val = x.get_value(borrow=True,return_internal_type=True)
del x
print sys.getrefcount(val)
On Fri, Feb 3, 2012 at 10:34 AM, Ian Goodfellow