Tips on reducing intermediate garbage?

Andrei Zh

Jul 20, 2014, 11:41:19 AM
to julia...@googlegroups.com
Recently I found that my application spends ~65% of its time in the garbage collector. I'm looking for ways to reduce the amount of memory allocated for intermediate results.
For example, I found that "A * B" may be changed to "A_mul_B!(out, A, B)", which uses a preallocated "out" buffer and thus almost eliminates additional allocation. But my application still produces lots of garbage in operations like matrix addition/subtraction, multiplication by a scalar, etc.
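For readers following along, here is a minimal sketch of that preallocation pattern (A_mul_B! is the Julia 0.3-era name; later versions of Julia replaced it with LinearAlgebra.mul!):

```julia
A = rand(100, 100)
B = rand(100, 100)
out = similar(A)           # preallocate the result buffer once

# Allocating version: creates a fresh 100x100 matrix on every call.
C = A * B

# In-place version: writes the product into `out`, no new allocation.
# (Julia 0.3 API; in Julia >= 1.0 use LinearAlgebra.mul!(out, A, B).)
A_mul_B!(out, A, B)
```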

Are there any other tricks that help decrease memory usage?

Keith Campbell

Jul 20, 2014, 1:12:50 PM
to julia...@googlegroups.com
Dahua Lin's post at http://julialang.org/blog/2013/09/fast-numeric/
might be helpful.

Andrei

Jul 21, 2014, 7:33:30 AM
to julia...@googlegroups.com
Great write-up! After some experiments I was able to reduce GC time from 65% to only 15%, and I see opportunities to do even better. The most important things for me were:

 1. Some BLAS functions (especially "gemm!", which is pretty flexible).
 2. Manual devectorization (@devec didn't work for my case).
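A sketch of both tricks, for anyone reading along (Julia 0.3-era names; in Julia >= 1.0, gemm! lives in LinearAlgebra.BLAS):

```julia
A = rand(50, 50); B = rand(50, 50)
C = zeros(50, 50)

# 1. BLAS.gemm! computes C = alpha*A*B + beta*C entirely in place, so a
#    multiply, a scale, and an accumulate happen with zero temporaries.
Base.LinAlg.BLAS.gemm!('N', 'N', 1.0, A, B, 1.0, C)   # C += A*B

# 2. Manual devectorization: instead of y = 2.0 .* x .+ z, which builds
#    temporary arrays, loop once and write into a preallocated output.
x = rand(1000); z = rand(1000)
y = similar(x)
for i in 1:length(x)
    y[i] = 2.0 * x[i] + z[i]
end
```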

I see one disadvantage of using these tools, however - they are much harder to read. Are there any plans for automatic code optimization at the compiler level?
 

Tim Holy

Jul 21, 2014, 7:43:23 AM
to julia...@googlegroups.com
On Monday, July 21, 2014 02:33:26 PM Andrei wrote:
> I see one disadvantage of using these tools, however - they are much harder
> to read. Are there any plans for automatic code optimization at the compiler
> level?

There are already many optimizations in place, but there's always more you
could do.

--Tim

Andrei

Jul 21, 2014, 8:51:14 AM
to julia...@googlegroups.com
Could you please point me to where these optimizations take place? I see some other transformations (escape analysis, for example) happening in codegen; are there any other places I should look?

Jake Bolewski

Jul 21, 2014, 8:57:29 AM
to julia...@googlegroups.com
julia-syntax.scm (code lowering to SSA form) and type inference in Base (type propagation, data-flow analysis, inlining) are other places where Julia performs compiler optimizations.

Tim Holy

Jul 21, 2014, 9:10:34 AM
to julia...@googlegroups.com
codegen is a big one, as are inference.jl, gf.c, and cgutils.cpp. But there
are optimizations sprinkled throughout (e.g., ccall.cpp).

You might be interested in this:
https://github.com/JuliaLang/julia/issues/3440

Most of the optimizations so far are low-level; most of the higher-level stuff
tends to be macros in packages (@devec being a prime example; I'm working on
another now). The fact that @devec didn't work for you is evidence that this
is nontrivial (I bet Dahua would be interested in contributions that
improve it). In the longer run, it might be interesting to experiment with
LLVM's Polly, but I'm not very clear on how far that project has gotten in
practice.

--Tim

Alessandro "Jake" Andrioni

Jul 21, 2014, 9:26:35 AM
to julia...@googlegroups.com
InplaceOps.jl is another package that can help: it replaces some
matrix operations with their mutating, BLAS-based equivalents.
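A hedged sketch of how that looks in use; the @into! macro name is from memory of the package's early API and may differ in your version:

```julia
using InplaceOps

A = rand(10, 10); B = rand(10, 10)
C = similar(A)

# Rewrites the assignment into an in-place call, roughly equivalent
# to A_mul_B!(C, A, B): C is overwritten and nothing new is
# allocated for the product.
@into! C = A * B
```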

Stefan Karpinski

Jul 21, 2014, 4:04:10 PM
to Julia Users
Automatic, general loop fusion is something that we want to make possible, and Jeff and I have been discussing it quite a bit lately. There are a few ideas that seem promising, but they won't happen immediately. The combination of better escape analysis and loop fusion should help with these problems quite a bit and bring high-level vectorized Julia code closer to the performance of manually devectorized, loop-fused code.
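To make the trade concrete, here is a hand-written sketch of what fusion buys: the vectorized expression makes multiple passes over the data and allocates temporaries, while the fused loop makes one pass and allocates only the output.

```julia
x = rand(1000); y = rand(1000)

# Unfused: each elementwise operation allocates a temporary array
# and makes its own pass over the data.
r1 = (2.0 .* x) .+ (y .* y)

# Fused (what the compiler could emit automatically): a single pass,
# a single preallocated output, no temporaries.
r2 = similar(x)
for i in 1:length(x)
    r2[i] = 2.0 * x[i] + y[i] * y[i]
end
```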

Andrei

Jul 21, 2014, 5:54:11 PM
to julia...@googlegroups.com
Thanks a lot for all your answers! Now I need to take a break to learn all this cool stuff and get prepared for such a bright future :)

Viral Shah

Jul 22, 2014, 5:18:15 AM
to julia...@googlegroups.com
There is also the GC improvement patch that is waiting on the 0.3 release, which should improve GC performance. With better escape analysis, it should be possible to reuse the garbage from vectorized expressions in a loop in the next iteration, significantly reducing GC load. That should also give much better performance, and loop fusion should make it possible to avoid devectorization in many cases.
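Until the compiler can do that reuse automatically, the same effect is available by hand: hoist the temporary out of the loop and overwrite it on every iteration (a sketch using the Julia 0.3-era names A_mul_B! and copy!; in Julia >= 1.0 these are mul! and copyto!):

```julia
A = rand(100, 100)
x = rand(100)
buf = similar(x)          # allocated once, outside the loop

for iter in 1:20
    A_mul_B!(buf, A, x)   # writes into buf; no per-iteration garbage
    copy!(x, buf)         # reuse x as well instead of rebinding it
end
```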

-viral