Memory Optimizations for Lazy Data Structures on the JVM

18 views
Skip to first unread message

Malcolm Sharpe

unread,
Oct 26, 2007, 7:43:44 PM10/26/07
to CAL Language Discussion
Hi,

We've encountered an interesting Java memory usage problem when
running CAL programs. It is potentially an issue for any JVM language,
including Java, but appears most often in a lazy language such as CAL.
Open Quark 1.7.0 includes a partial fix and an experimental full fix.

When lazy data structures are used on the JVM, it sometimes happens
that references to parts of them are unnecessarily retained as local
variables, which can cause out-of-memory errors. The motivating CAL
library is Open Quark's Parser library, which is a port of Parsec
(http://research.microsoft.com/users/daan/download/papers/parsec-
paper.pdf). Parsec depends on GHC-specific memory behaviour for its
memory efficiency. Since CAL runs on the JVM, we cannot use the
techniques that GHC uses, so we have developed fixes that work for JVM
languages.

The idea is to null out local variables once they are no longer in
use. The difficulty is that it is tricky to null out local variables
when they are passed as arguments to methods. In fact, once a method
is compiled by Sun's Java JIT, the GC will not follow these local
variables. See Agesen et al., 1998 for details. However, the JIT is
only applied to methods that are run frequently, and it sometimes only
takes one run of a method to exhaust available memory. For this
reason, we needed fixes that would always work.

One solution we found is to use a helper method, lastRef, to aid in
nulling local variables when they are used as arguments. lastRef
simply returns its first argument, so by calling lastRef(x, x=null),
we can use the value of x and set it to null. This is a source-level
solution, so it has the advantage of keeping our Java source output
and our bytecode output in sync.

Additionally, the lastRef fix solves a related problem, which is the
necessity to null out fields after passing them as arguments but
before the execution of the called method. This problem seems likely
to occur in any lazy language implemented on the JVM.

Another solution we found is to automatically insert null assignments
directly into the Java bytecode. At the bytecode level, no trick is
needed to set local variables to null, since we can simply insert the
assignment after pushing the local variable onto the stack but before
invoking the method. This solution can also null 'this', which is
impossible at the Java source level and sometimes helps memory usage.
Since the bytecode solution is automatic, it can be applied to
arbitrary Java class files in addition to compiled CAL modules.

In Open Quark 1.7.0, we include a partial implementation of the
lastRef fix. Using lastRef to null out fields already gives speed
advantages for concurrency, making CAL even faster. As well, Open
Quark 1.7.0 includes a full, experimental implementation of the
bytecode-rewriting fix. It can be enabled by setting the
org.openquark.cal.machine.lecc.bytecode_space_optimization property.

- Malcolm Sharpe

[Agesen et al., 1998] Ole Agesen, David Detlefs, and J. Eliot B. Moss.
Garbage Collection and Local Variable Type-Precision and Liveness in
Java(tm) Virtual Machines. In Proceedings of the ACM SIGPLAN 1998
Conference on Programming Language Design and Implementation
(Montreal, Quebec, Canada, 1998), pp. 269-279.

Reply all
Reply to author
Forward
0 new messages