Hunting down memory leak


Valentin Churavy

Mar 15, 2014, 3:21:08 PM
to juli...@googlegroups.com
Hello there,

I recently encountered a memory leak while using the (rather fantastic) OpenCL.jl package.


Effectively what happens is that I lose ~4kB of memory per loop step doing the following. 


import OpenCL; const cl = OpenCL
function leaky()
          device, ctx, queue = cl.create_compute_context()
          # Allocate buffer
          a = cl.Buffer(Float64, ctx, :rw, 1024*1024)
          cl.flush(queue)
          cl.release!(queue)
end

mem_before =  Base.gc_bytes()

for i in 1:100000
   leaky()
end

gc()
gc() # Just to make sure

mem_after = Base.gc_bytes()

println("Memory leaked  $((mem_after-mem_before)/1024/100000) kB per step")

So I started to look into who is at fault here (options ranging from me, OpenCL, Julia, and cosmic radiation), and I would appreciate any advice.

This is what I have done so far:
I found the GC_FINAL_STATS, MEMPROFILE, and OBJPROFILE flags. After activating them in options.h and rebuilding Julia, it seems that the references to the created objects are properly garbage collected (I can see them pop up in the objprofile output and then disappear again after a while), but the memory usage continues to grow and Julia only frees the memory when it terminates.

While writing this up I tested something very simple and now I am astounded...

function leaky()
  a = Array(Float64, 1024, 1024)
  a = nothing
  return nothing
end

precompile(leaky, ())

mem_before = Base.gc_bytes()

leaky()

gc(); gc(); gc()

mem_after = Base.gc_bytes()

println("Memory leaked  $((mem_after-mem_before)/1024) kB")

Waiting for several minutes doesn't change a thing; the memory is never freed.

So any advice on what I am doing wrong, or on what else might be going wrong, is highly appreciated.

Best,
Valentin

Stefan Karpinski

Mar 15, 2014, 4:30:49 PM
to Julia Dev
The gc_bytes function doesn't indicate how much memory is currently allocated. It's a monotonically increasing counter, indicating the total number of bytes allocated since the beginning of your program execution, whether it has been reclaimed or not. In other words gc_bytes never goes down (although it will wrap around eventually). The rationale for gc_bytes is that you can subtract before and after values to figure out how much allocation happened while you executed something.
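
A minimal sketch of the "subtract before and after" idea, assuming a current Julia 1.x release rather than the version used in this thread: there, the `@allocated` macro reports the allocation delta for an expression and `GC.gc()` replaces the `gc()` call used above. The function name `churn` is made up for illustration.

```julia
# Allocate an 8 MiB array and return a value derived from it,
# so the allocation is actually needed by the call.
function churn()
    a = zeros(Float64, 1024, 1024)
    return sum(a)
end

churn()                               # warm up: compile before measuring

bytes_first = @allocated churn()      # bytes allocated during the call
GC.gc()                               # reclaim the garbage from that call
bytes_second = @allocated churn()     # the delta is reported again in full

println("First call allocated  $(bytes_first / 1024) kB")
println("Second call allocated $(bytes_second / 1024) kB")
```

Both measurements report roughly the size of the array even though the garbage was collected in between. A nonzero delta therefore says how much was allocated, not how much was leaked.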

Jake Bolewski

Mar 15, 2014, 4:49:32 PM
to juli...@googlegroups.com
This is almost certainly an error in OpenCL.jl.  I haven't had time to dig into this yet, but the first place I would look is making sure OpenCL.jl objects are not being retained incorrectly by the OpenCL runtime.  An OpenCL runtime maintains a per-object refcount, so calling release! only decrements the refcount.  The object is freed only when the refcount goes to zero.  What is probably happening is that the refcount of one of OpenCL.jl's many objects is not in line with Julia's "refcount" of the object, so the object appears to be gc'd by Julia but is never free'd by the runtime.
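
The refcount behaviour described here can be modelled with a small, purely illustrative sketch. It assumes Julia 1.x syntax, and `RuntimeObject`, `new_object`, `retain!`, and the `release!` defined below are hypothetical stand-ins, not the OpenCL.jl API.

```julia
# Hypothetical model of a runtime-side reference count; not the OpenCL.jl API.
mutable struct RuntimeObject
    refcount::Int
    freed::Bool
end

# Creation hands out one reference to the creator.
new_object() = RuntimeObject(1, false)

# Retaining adds one more holder.
retain!(obj::RuntimeObject) = (obj.refcount += 1; obj)

# Releasing only decrements; the object is freed when the count reaches zero.
function release!(obj::RuntimeObject)
    obj.refcount -= 1
    obj.freed = obj.refcount == 0
    return obj
end

obj = new_object()
retain!(obj)                              # a second holder takes a reference
release!(obj)                             # first holder is done, object stays alive
after_first = (obj.refcount, obj.freed)
release!(obj)                             # second holder is done, object is freed
after_second = (obj.refcount, obj.freed)

println("after first release:  ", after_first)
println("after second release: ", after_second)
```

If one of the two releases never happens, the model object stays alive forever, which is the kind of mismatch suspected above.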

Jake

Valentin Churavy

Mar 15, 2014, 4:52:45 PM
to juli...@googlegroups.com
Great to know. That at least explains the second case.

For my original problem though...  There I can reliably observe unlimited memory growth with an external tool that measures the space used by Julia. But it at least hints more at a problem with either OpenCL or OpenCL.jl.

Jake Bolewski

Mar 15, 2014, 5:04:22 PM
to juli...@googlegroups.com
I know AMD's CodeXL tool allows you to look at the refcounts of OpenCL objects; I don't know if something similar exists for other platforms.  You should be able to get at the refcount of an object by looking at its info flag, but I don't know if I exported this for all objects (you could use the low-level API in any case).  See http://dhruba.name/2012/08/14/opencl-cookbook-creating-contexts-and-reference-counting/ for an example.

I should be able to track this down sometime soon, but if you figure it out before me all the better :-)

Valentin Churavy

Mar 16, 2014, 12:00:56 AM
to juli...@googlegroups.com
Hej Jake,

I reduced it to the minimal leaking case.

function leaky()
    ctx = cl.create_some_context()
    return nothing
end

Is there a particular reason why the context constructors set retain to true? If I set it to false, the memory leak goes away. Apparently there are platforms where OpenCL memory is freed only once the reference count of every object has dropped to zero: http://bloerg.net/2013/01/15/opencl-resource-management.html
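
One way this observation could arise, sketched under stated assumptions: in OpenCL a freshly created context already carries one reference for its creator, so a wrapper that retains it again and releases only once in its finalizer leaves the count at one. The names `RawContext`, `ContextWrapper`, `wrap`, and `finalize_wrapper` are hypothetical and do not correspond to OpenCL.jl internals; the syntax assumes Julia 1.x.

```julia
# Hypothetical sketch; names do not correspond to OpenCL.jl internals.
mutable struct RawContext          # stands in for the runtime's context handle
    refcount::Int
end

struct ContextWrapper
    raw::RawContext
end

# Wrap a raw handle, optionally taking an extra reference on it.
function wrap(raw::RawContext; retain::Bool=false)
    retain && (raw.refcount += 1)
    return ContextWrapper(raw)
end

# What a finalizer would do: give back exactly one reference.
finalize_wrapper(w::ContextWrapper) = (w.raw.refcount -= 1; nothing)

# A freshly created handle starts with one reference owned by its creator.
fresh_retained = RawContext(1)
finalize_wrapper(wrap(fresh_retained; retain=true))

fresh_plain = RawContext(1)
finalize_wrapper(wrap(fresh_plain; retain=false))

println("retain=true:  refcount after finalizer = $(fresh_retained.refcount)")
println("retain=false: refcount after finalizer = $(fresh_plain.refcount)")
```

In this model the extra retain is only correct when the handle was obtained from somewhere that keeps its own reference. For a handle the wrapper itself created, it leaves one reference that nobody ever releases.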

Best,
Valentin
