Re: [julia-users] Suspending Garbage Collection for Performance...good idea or bad idea?

Stefan Karpinski

Feb 20, 2013, 4:08:29 PM
to Julia Users
If you disable GC all the time, then you'll never free anything and run out of memory.


On Wed, Feb 20, 2013 at 4:05 PM, nathan hodas <nho...@gmail.com> wrote:
I noticed that my performance was suffering due to garbage collection, so I used this macro:
macro nogc(ex)
  quote
    gc_disable()
    local val = $ex
    gc_enable()
    val
  end
end

It gives great results, speeding up my code by ~20x -- but I'm wondering if there are any side-effects.  What could go wrong? Why wouldn't I want to do this all the time?



nathan hodas

Feb 20, 2013, 4:09:55 PM
to julia...@googlegroups.com
Does the gc resume and clean up after the function in the @nogc macro? Or is this a memory leak?

Stefan Karpinski

Feb 20, 2013, 4:10:42 PM
to Julia Users
For what it's worth, PHP somewhat unofficially takes this approach, opting to just leak memory left and right, under the assumption that it's running in a short-lived process that constructs a web page and then exits, letting the OS free all of the allocated memory. Apache takes a similar approach, using memory pools that can be released all at once, e.g. at the end of serving a page. It's a good model for serving web pages, but not good for anything longer-running.
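
To make the pool model concrete, here is a toy sketch in present-day Julia syntax (the `Arena` type and the `alloc!`/`reset!` names are made up for illustration): allocations are carved out of one preallocated buffer, and everything is "freed" at once by resetting an offset, much like an Apache pool at the end of a request.

```julia
# Toy arena/pool: hand out slices of one big buffer, release them all at once.
mutable struct Arena
    buf::Vector{UInt8}
    offset::Int
end
Arena(n::Integer) = Arena(Vector{UInt8}(undef, n), 0)

function alloc!(a::Arena, n::Integer)
    a.offset + n <= length(a.buf) || error("arena exhausted")
    block = view(a.buf, a.offset+1:a.offset+n)
    a.offset += n
    return block
end

reset!(a::Arena) = (a.offset = 0; a)  # "free" everything in O(1)

arena = Arena(1024)
scratch = alloc!(arena, 128)  # use during one "request"
reset!(arena)                 # end of request: all allocations released together
```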

nathan hodas

Feb 20, 2013, 4:13:07 PM
to julia...@googlegroups.com
so anything between a gc_disable and a gc_enable is never cleaned up?

Stefan Karpinski

Feb 20, 2013, 4:14:10 PM
to Julia Users
No, that macro does the right thing, although you really want to put a try/finally around it. Something like this:

macro nogc(ex)
  quote
    try
      local val
      gc_disable()
      local val = $(esc(ex))
    finally
      gc_enable()
    end
    val
  end
end

Also note that I wrapped the expression `ex` in an `esc` call for proper macro hygiene.

Stefan Karpinski

Feb 20, 2013, 4:14:34 PM
to Julia Users
Yes, that turns GC off.

Stefan Karpinski

Feb 20, 2013, 4:15:35 PM
to Julia Users
To be clear: once GC resumes, it will find and collect all garbage, regardless of when it was created.
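
A quick sketch of what that means in practice (in current Julia these calls are spelled `GC.enable(false)`/`GC.enable(true)`; at the time of this thread they were `gc_disable()`/`gc_enable()`):

```julia
GC.enable(false)                         # collection is deferred, not skipped
tmp = [zeros(1_000) for _ in 1:10_000]   # allocate heavily; nothing is freed yet
tmp = nothing                            # drop the only reference
GC.enable(true)
GC.gc()                                  # all of the garbage above is collected now
```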

nathan hodas

Feb 20, 2013, 4:21:57 PM
to julia...@googlegroups.com
That's a great observation. It would certainly be a problem if gc_enable() wasn't called.

Thanks
Nathan

Stefan Karpinski

Feb 20, 2013, 4:29:31 PM
to Julia Users
Oops: the inner "local" keyword shouldn't be there.
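
The corrected macro would presumably look like this, with the declaration hoisted above the `try` (since `try` introduces its own scope); on current Julia the GC calls are spelled `GC.enable(false)`/`GC.enable(true)`:

```julia
macro nogc(ex)
    quote
        local val
        try
            gc_disable()
            val = $(esc(ex))
        finally
            gc_enable()   # runs even if the expression throws
        end
        val
    end
end
```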

Diego Javier Zea

Feb 20, 2013, 4:48:51 PM
to julia...@googlegroups.com
Maybe this macro could be in Base or in a package, and be explained in the Performance Tips?

Tim Holy

Feb 20, 2013, 6:44:24 PM
to julia...@googlegroups.com
The other thing you should check is whether you're allocating more than you
need to. I find that I can often reuse bits of memory, and that can
dramatically decrease the need for gc. In the long run that may help you a
_lot_ more than temporarily disabling gc, because at some point you'll need to
turn it on again.

There are examples of this kind of thing scattered all over the Julia code
tree. Just because it was rather easy for me to find :-), here's the patch I
pushed to Zip in iterator.jl today:
https://github.com/JuliaLang/julia/commit/89ece095e8ea1fa166074306927c6cc5f90060f7
It got rid of two array allocations per iteration, by reusing one array and
pre-allocating "scratch space" inside the type for the other.

You can also reuse bits of memory by writing functions with a syntax like
this:
function myfunc(out::Array, arg1, arg2)
where out is pre-allocated storage for the output. This helps if you'll be
calling the same function many times and always need an array of the same type
and dimensions. Our matrix multiplication routines can be used in this way.
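
A minimal sketch of that pattern (the function and the element-wise work here are made up; modern convention adds a `!` to the names of mutating functions):

```julia
# Writes results into caller-provided storage instead of allocating a new array.
function myfunc!(out::Array, arg1, arg2)
    for i in eachindex(out)
        out[i] = arg1[i] * arg2[i]   # arbitrary element-wise work, no temporaries
    end
    return out
end

a, b = rand(1000), rand(1000)
out = similar(a)          # allocated once
for trial in 1:100
    myfunc!(out, a, b)    # reused on every call; no per-call output allocation
end
```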

--Tim

nathan hodas

Feb 21, 2013, 1:13:45 PM
to julia...@googlegroups.com
It's true that @nogc is not a panacea.  For my particular function, it produces a robust 20x speedup, even after subsequent collection.  For other seemingly similar functions, it has no effect at all. I don't use any temporary arrays *that I'm aware of*, but it seems the iterators of a Dict are doing something in the background.

Stefan Karpinski

Feb 21, 2013, 1:35:34 PM
to Julia Users
That's good information to have. I'll look into it.

Stefan Karpinski

Feb 21, 2013, 1:37:09 PM
to Julia Users
Can you post some example code? Are you just iterating the Dict object with a for loop?

nathan hodas

Feb 21, 2013, 6:17:59 PM
to julia...@googlegroups.com
Here's the code that benefits from @nogc: 

Notice the iteration over a Dict and a Set. iscorrect() checks a field of the Attempt type. I can tell by running this particular function that garbage collection is running during the for loop.
function meantime(userdata::Dict{Int,Set{Attempt}})
    usertimes = Dict{Int,Float64}()
    sizehint(usertimes, length(userdata))
    for (uid, attempts) in collect(userdata)
        s = 0.0
        c = 0.0
        ic = 0.0
        for a in attempts
            ic = iscorrect(a)
            s += (a.tend - a.tstart) * ic
            c += ic
        end
        usertimes[uid] = s / c
    end
    usertimes
end

This code has no benefit from @nogc, regardless of the kernel function k1:

function dostuff(input1, input2)
    output = similar(input1)
    fill!(output, zero(eltype(output)))  # similar() leaves contents undefined
    len = length(input1)
    for i = 1:len
        x = input1[i]
        for j = 1:len
            y = input2[j]
            output[i] += k1(x, y)
        end
    end
    output
end

Tim Holy

Feb 22, 2013, 6:11:32 AM
to julia...@googlegroups.com
Have you played with SProfile in the Profile package? It's rather good at
highlighting which lines, in your code and in base/, trigger the gc. Note that
in my experience the gc does not necessarily seem to be triggered on big
allocations; for example, even allocating an array as
Array(Int, (3,5))
rather than
Array(Int, 3, 5)
can trigger the gc (I see lots of gc() calls coming from our LAPACK code for
this reason).

Because I don't really know how the gc works, I'm not certain that kind of
thing actually reflects a problem; perhaps it was just going to have to call gc
on the next heap-allocation event, and (3,5) just happened to be the lucky
candidate. But there's an open issue about this:
https://github.com/JuliaLang/julia/issues/1976

Docs are here: https://github.com/timholy/Profile.jl
I think they're slightly out of date, but only in very minor ways.
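
For readers on current Julia: the equivalent functionality now lives in the standard-library Profile module. A minimal sketch (SProfile's exact API at the time differed):

```julia
using Profile

Profile.clear()
@profile meantime(userdata)   # meantime/userdata from the earlier post
Profile.print()               # tree report; time spent inside gc shows up here
```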

--Tim
Eric Forgy

Dec 15, 2014, 8:43:09 PM
to julia...@googlegroups.com
Hi,

I'm new to Julia and mentioned it to a friend who is more into systems than mathematical models, and he mentioned his current "crush" is Rust, which is also built on LLVM. I may have totally missed the point, but IF I understand, Rust does away with garbage collection by "borrow checking" at compile time. The question popped into my head whether we could turn off GC in Julia and check for problems at compile time. A Google search later brought me to this thread. Is that a totally misguided idea?

Best regards,
Eric

PS: You can tell I'm coming in with almost no background knowledge about compilers (or real languages for that matter), but am having fun learning. LLVM was developed at my alma mater (PhD in ECE - Computational Electromagnetics - from UIUC 2002). Go Illini! :)

John Myles White

Dec 15, 2014, 11:15:36 PM
to julia...@googlegroups.com
This is taking the thread off-topic, but conceptually such things are possible. But Rust has a very different set of semantics for memory ownership than Julia has and is doing a lot more analysis at compile-time than Julia is doing. So Julia would need to change a lot to be more like Rust. I've come to really adore Rust, so I'd like to see us borrow some ideas, but my personal sense is that Julia and Rust simply serve different niches and shouldn't really move towards one another too much lest each language wind up forsaking what makes it useful.

 -- John

Stefan Karpinski

Dec 16, 2014, 4:24:08 PM
to Julia Users
I would love to figure out a way to bring the kind of automatic resource and memory release that Rust has to Julia, but the cost is some fairly finicky static compiler analysis that is fine for Rust's target demographic but fairly clearly unacceptable for Julia's general audience. What we'd need is a more dynamic version of something like that. One idea I've had is to indicate ownership and enforce it at run time rather than compile time, and eliminate run-time checks when we can prove that they aren't needed. This could look something like having "owned references" to values versus "borrowed references", checking that the object still exists when a borrowed reference is used, and raising an error if it doesn't. When an owned reference to the thing goes out of scope, immediately finalize it. I'm not sure how to indicate this, but it would be great to be able to write:

function foo(...)
    fh = open(file)
    # do stuff with fh
end # <= fh is automatically closed before the function returns
# it's a runtime error to access the fh object after this

Similarly, you could have something like this:

function bar(...)
    a = @local Array(Int,10)
    # do stuff with a    
end # <= a is automatically freed before the function returns
# it's a runtime error to access the a object after this

Given these semantics, it would be relatively easy to alloca the actual memory of the array, and only heap-allocate the object itself, which could then reference the stack-allocated memory. This is tough to implement, especially efficiently, but I have a bit of a suspicion that in Julia mutable objects (and this only makes sense for mutable objects that are inherently associated with a particular place in memory) are rarely performance critical.
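
In today's Julia, the file half of this has to be written out by hand. A sketch of what the proposed owned-reference semantics would automate, using only what exists now:

```julia
function foo(file)
    fh = open(file)
    try
        # do stuff with fh
    finally
        close(fh)   # done manually here; owned references would insert this
    end
end
```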

Michael Louwrens

May 9, 2015, 3:46:18 PM
to julia...@googlegroups.com
Have you had any further thoughts on this? It seems like it could be quite useful for the cases where one intentionally disables the GC for performance reasons (though you guys are incredibly busy!). I also read about having the compiler automatically insert frees where it can in Julia, and was wondering if that fits into this at all?

Stefan Karpinski

May 10, 2015, 2:20:42 PM
to julia...@googlegroups.com
No, there hasn't been any change on this. It's unclear if anything from the Rust model can actually be leveraged in a dynamic language.

Steven Sagaert

May 11, 2015, 8:09:41 AM
to julia...@googlegroups.com
Isn't that similar to smart pointers/automatic resource management in C++?

Steven Sagaert

May 11, 2015, 8:21:58 AM
to julia...@googlegroups.com

Rust isn't the only language to use such ideas. Basically it's region-based memory management: http://en.wikipedia.org/wiki/Region-based_memory_management. Real-time Java uses this. For a recent development next to Rust, check out ParaSail: https://forge.open-do.org/plugins/moinmoin/parasail/. There's even an LLVM-based version now, so it's closer to home for Julia than Rust is.

Steven Sagaert

May 11, 2015, 8:26:45 AM
to julia...@googlegroups.com
I guess you can approximate/emulate region-based memory management in a GC'd language by dividing the heap into many small regions and running GC over these regions regularly. That's what the G1 GC in Java does; see also Azul Zing for soft real-time and high-performance Java.

Michael Louwrens

May 11, 2015, 6:03:20 PM
to julia...@googlegroups.com
I am starting to read "Region-Based Memory Management for a Dynamically-Typed Language"; it proposes a second inference system, region inference.

I will read it fully in the morning, but just scanning through their results, they compare their regions to a GC. The GC library uses far more heap in most cases, though the region-based system needs optimisation to be competitive.

At one point they do state that a combination of region-based memory management and GC would be interesting.

For a prototype implementation, being 2x-3x slower while mostly using far less memory is quite successful. The "Div" benchmark from the Gabriel Scheme benchmarks was the most impressive in terms of heap usage, using 32.2 KB vs. 1219.6 KB for the GC'd version. In a memory-constrained system this would be an interesting thing to look at; the outliers are a bit of a concern, though. The "Tak" and "Destruct" benchmarks use almost 10x the heap the GC did.

If anything, it was an interesting read. The emulated region-based management sounds quite interesting, in fact. I will go read up on the two approaches Steven Sagaert mentioned. I haven't read much about G1 and nothing at all on Azul Zing!

Steven Sagaert

May 12, 2015, 3:35:11 AM
to julia...@googlegroups.com
If you take each heap region, give it its own garbage collector, and assign one to each thread/process/fiber, then you get another somewhat related approach for soft real-time performance in GC'd languages (it's for performance as well as safety and parallelism), like you have in Erlang/Elixir (dynamically typed and JITed/interpreted) or Nim (AOT, statically typed).

Jan Kybic

May 12, 2015, 6:59:31 PM
to julia...@googlegroups.com


On Tuesday, December 16, 2014 at 10:24:08 PM UTC+1, Stefan Karpinski wrote:
I would love to figure out a way to bring the kind of automatic resource and memory release that Rust has to Julia, but the cost is some fairly finicky static compiler analysis that is ok for Rust's target demographic but fairly clearly unacceptable for Julia general audience. What we'd need is a more dynamic version of something like that. One idea I've had is to indicate ownership and enforce it at run-time rather than compile time – and eliminate run-time checks 
.... 
Given these semantics, it would be relatively easy to alloca the actual memory of the array, and only heap allocate the object itself, which could then reference the stack allocated memory. This is tough to implement, especially efficiently, but I have a bit of a suspicion that in Julia mutable objects – and this only makes sense for mutable objects that are inherently associated with a particular place in memory – are rarely performance critical in Julia.



In my own area (image processing and numerical calculations), using high-level vector and array operations in the inner loop is very detrimental to performance in Julia, especially if allocation of storage space for temporary results is needed. In order to get "decent" performance, I usually need to (i) manually unroll the matrix operations (@devec works, but not always) and (ii) preallocate the temporaries and the output outside of the loop, as in the sketch below. This can be done by hand, but it is rather tedious and leads to ugly code. It would be good if the compiler could allocate the temporaries on the stack or in a region, so that they could be deallocated quickly. The compiler might need some help, such as annotating a particular object as "alias-free", which would probably lead to Rust-like annotations. But if it brings performance, I would not mind.
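
A minimal sketch of the kind of rewrite described above (the function names and the kernel are made up):

```julia
# Vectorized inner loop: a .* b allocates a fresh temporary array on every pass.
function accumulate_vectorized(a, b, n)
    s = 0.0
    for k in 1:n
        s += sum(a .* b)
    end
    return s
end

# Devectorized version: same result, no per-iteration temporaries.
function accumulate_devectorized(a, b, n)
    s = 0.0
    for k in 1:n
        for i in eachindex(a)
            s += a[i] * b[i]
        end
    end
    return s
end
```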

Yours,

Jan


cdm

May 12, 2015, 7:57:32 PM
to julia...@googlegroups.com

epic ...

Michael Louwrens

May 13, 2015, 2:15:20 AM
to julia...@googlegroups.com
Indeed! Steven has given me quite a few things to look up and read about. And then there's still the current Julia GC to read up on!

Steven Sagaert

May 13, 2015, 3:34:34 AM
to julia...@googlegroups.com
In Java, for high performance on the "regular" JVM (as opposed to a specialized JVM like Azul Zing), memory is sometimes managed manually off-heap. This is done for high-performance IO and for in-memory data grids. Google "java off-heap" if you want to know more about this.

Páll Haraldsson

May 13, 2015, 12:58:24 PM
to julia...@googlegroups.com
On Monday, May 11, 2015 at 10:03:20 PM UTC, Michael Louwrens wrote:
I am starting to read "Region-Based Memory Management for a Dynamically-Typed Language" it proposes a second inference system, region inference.

Interesting. I just scanned the paper down to Table 1. I see GC is faster than "Region"-based in all ten benchmark cases. The heap size is usually larger for GC (not always! It can be an order of magnitude larger for "Region"), so there is, or can be, a time-space trade-off.

Should I expect GC (I assume this GC is similar to Julia's) to always be faster than manual memory management (such as in C/C++)? And this "region"-based scheme is not the same as, or similar to, what C/C++ does?

GC has to do the same allocations (and deallocations, except in a degenerate case: a program that is closing) as manual memory management, at, I expect, the same speed, noting:

GC *seems* to have an overhead, as it has to scan the heap, but that is overblown as a drawback, right? With larger memories that overhead will be arbitrarily reduced? [Not taking caching effects into account. The memory itself will not cost you anything (in a single-application scenario), as RAM burns energy whether it is "used" or not.] And compared to C/C++ you would do less redundant copying. How much necessary copying is there, really?

I do not worry too much about hard real-time; GC seems to only have drawbacks with stuttering (and less so with better GC algorithms). Even for games, which are only soft real-time, wouldn't it be OK? Already as is (in 0.4)? It is not clear to me that something other than GC would be helpful there (one of the benchmarks was ray-tracing), as you could force GC on vblank, and in next-gen ray-tracing, vblank/fps isn't that important.

Besides, if you only work on data structures that are static/updated in place, there shouldn't be much GC activity?

-- 
Palli.

Steven Sagaert

May 15, 2015, 5:17:45 AM
to julia...@googlegroups.com
I'd say that manual memory management is usually going to be faster than GC unless you have really bad manual management and a very good GC. The best a good GC can hope for is to be close to manual management. That's one of the reasons the majority of systems software is still written in C/C++ (memory layout is another). Note that not all GCs are created equal: their performance can range from terrible to excellent depending on the algorithm and the tuning. The Oracle JVM alone has several GC algorithms built in, each with its advantages and disadvantages.

Michael Louwrens

Aug 1, 2015, 5:10:55 PM
to julia-users
I haven't tested the new generational GC, so this may all be moot, but:

I had a recent thought: wouldn't a manual free method, one that frees an object and tells the GC that the object no longer exists, be useful in some cases?
I can imagine it being helpful anywhere you allocate temporary objects, or when you have large arrays (or arrays of objects) which you know will never be referenced again after a certain point.

I do realize this would probably end up being abused as a premature optimization and could cause crashes if freed memory is accessed. There also won't be too many valid uses for it. I do feel that it could have some useful functionality, however: those that use gc_disable would then be able to free memory.

Just a thought! Even if it is shot down, the explanation will be here, and others with a similar thought will hopefully be able to find it. (If a similar discussion already exists, I could not find it; my Google-fu is a bit lacking though, so just link me to it if so, thanks!)

Yichao Yu

Aug 1, 2015, 5:59:25 PM
to Julia Users
Having looked at memory management in Julia recently, my impression
is that these manual memory management techniques cannot improve
performance on top of a GC.

I'll first show the evidence with numbers.

```
julia> f_gc(n) = for i in 1:n
Ref(1)
end
f_gc (generic function with 1 method)

julia> f_malloc(n) = for i in 1:n
ptr = ccall(:malloc, Ptr{Void}, (Csize_t,), 16)
ccall(:free, Void, (Ptr{Void},), ptr)
end
f_malloc (generic function with 1 method)

julia> @time f_gc(1)
0.002025 seconds (154 allocations: 10.496 KB)

julia> @time f_malloc(1)
0.002245 seconds (4 allocations: 160 bytes)

julia> gc(); @time f_malloc(100_000_000)
2.179464 seconds (4 allocations: 160 bytes)

julia> gc(); @time f_gc(100_000_000)
0.527786 seconds (100.00 M allocations: 1.490 GB, 10.38% gc time)
```

I suppose `f_malloc` is what people would want to transform `f_gc` into (the size 16 includes the tag on a 64-bit platform; changing it to a smaller number doesn't change the time by much).

The timing above clearly shows that the GC is faster than manual memory management. You can even check the code_llvm output to make sure Julia didn't generate bad code for the malloc version. (Note that if you write the malloc version in C, it will likely be much faster, but that's because the compiler can remove the calls to malloc and free if it can figure out that the result pointer doesn't escape, which it should in this simple case. This optimization is also possible in Julia and is an ongoing effort, e.g. https://github.com/JuliaLang/julia/pull/12205)

As surprising as this may seem to people coming from C/C++, there are good reasons behind it. I'm going to describe my understanding of them, but others with more experience in memory management should correct me if I make any mistakes (likely ;-p ).

To understand this issue, it is important to know that freeing memory can be as expensive as allocating it. You may expect that the garbage collector doesn't need to do anything when you free a piece of memory, but that is actually not the case. By definition, a free memory block means the program should be able to allocate in it, which means the allocator should be able to find it. Therefore, what needs to happen when you manually free a piece of memory is that the underlying system pushes it onto some data structure, and pops it back off on allocation. Unless your program is using more and more memory, none of this needs any system call, and the push and pop operations are likely similarly expensive.

With this in mind, the reason a GC is faster is that it can do these expensive operations more efficiently. Building a free list (as in Julia's GC) is expensive partly because of the memory access pattern, and since the GC can build the free list in one go, it can do so in a linear order that is more cache-friendly than what would happen with manual memory management. It usually also has more knowledge about the object layout etc., which can enable further optimization.
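
A toy free list makes the point concrete (this ignores size classes and thread safety, and every name here is made up):

```julia
# free() is not a no-op: it must publish the block somewhere the allocator
# can later find it, here by pushing onto a list that allocation pops from.
const FREE_LIST = Vector{Vector{UInt8}}()

function my_alloc(n::Integer)
    isempty(FREE_LIST) ? Vector{UInt8}(undef, n) : pop!(FREE_LIST)
end

my_free(block::Vector{UInt8}) = push!(FREE_LIST, block)

buf = my_alloc(16)
my_free(buf)          # push: real bookkeeping work, not free of cost
buf2 = my_alloc(16)   # pop: the freed block is found and reused
```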

Michael Louwrens

Aug 1, 2015, 7:45:10 PM
to julia-users
Haha!

I was expecting you or @carnival (sorry, don't know the name) to respond. I've been following your issues/PRs as they are quite interesting to read!

Those results are quite surprising! I was expecting the GC to perform far worse than that in comparison. You have successfully scratched my itch of wondering whether it would have been an improvement!

Thanks for the numbers and the explanation, it is much appreciated. The mind does tend to wander when your code has been running for 12 hours.