Clojure benchmark memory use and future improvements (Re: Clojure vs F# performance)

450 views
Skip to first unread message

John Fingerhut

unread,
Nov 22, 2010, 11:47:06 AM11/22/10
to clo...@googlegroups.com


On Mon, Nov 22, 2010 at 5:00 AM, Ralph <grku...@gmail.com> wrote:
On the Programming Languages Comparison Site (http://
shootout.alioth.debian.org/u64/benchmark.php?
test=all&lang=clojure&lang2=fsharp), if you run the Clojure vs. F#
comparison, Clojure scores about the same in speed as F# (but does use
more memory).

As a preview of things to come soon, I've been focusing on adding instrumentation to my clojure-benchmarks github repository that measure memory usage.  I realized some time ago that the reason many of the Clojure benchmark programs use so much memory is that the default JVM heap size is quite large.  Thus if a Clojure program allocates memory *at all* in the main loop, and if you run that main loop long enough, it will fill up the available heap before a garbage collection is triggered.  Such a program will be measured as using all of that memory, even though it didn't "need it".  The memory use measurements on the benchmark can in many cases be significantly reduced, with little or no increase in run time, by selecting an appropriate value for the JVM -Xmx maximum heap size parameter.

I have some Perl programs I've hacked together that on Windows + Cygwin, Mac OS X, and Linux, will run a program multiple times while varying the -Xmx heap size limit.  It first uses binary search to find the smallest heap size at which the program produces correct output, and then it does a "sweep" starting at that value X, and runs the program for several somewhat larger limits, to see if a small amount of additional heap space will reduce the GC time noticeably.

While doing that, it is saving all of the details of the runs in an XML file, and printing summaries of a few statistics on the console.  Below is an example of the console output running the k-nucleotide benchmark program on an input that is significantly smaller than the official benchmark input (I often run these programs on smaller input sizes during testing so I can get results without growing too much older in the process).  I give a pretty detailed explanation of what this output means below that, in case you are curious.

All of this is in the clojure-benchmarks github repo now, although I will be improving it some more for a while (parsing HotSpot GC log files better, parsing GC logs for JRockit, correcting the Max RSS calculations on Mac OS X 10.6.x, etc.)  I'll warn you that the command line options are pretty numerous, but mostly documented.  Testing so far has been limited to Mac OS X, Ubuntu Linux, and Windows + Cygwin.

https://github.com/jafingerhut/clojure-benchmarks

git clone git://github.com/jafingerhut/clojure-benchmarks.git

Andy Fingerhut


Example output from "tmcurve" program, with line numbers prepended for reference:

 1  % ../bin/tmcurve --verbose --jvm-gc-stats --delta 8 --sorted-summary --results-file results.xml --min 1 --precision 1 --input input/medium-input.txt --output output/medium-clj-1.2-output.txt --check 'diff --strip-trailing-cr --brief output/medium-expected-output.txt %o' java -server -Xmx%mbm -classpath /Users/andy/lein/swank-clj-1.2.0/lib/clojure-1.2.0.jar:./obj/clj-1.2 knucleotide 1
 2  Looking for smallest memory size that succeeds.
 3 
 4  Max heap=128 MB succeed  cpu=19.3 sec  Max RSS=125.4 MB  RSS-heap=-2.6 MB  GC tot space=2847.5 MB time=0.7 sec max live=63.1 MB
 5  Max heap=64 MB succeed  cpu=20.517 sec  Max RSS=103.9 MB  RSS-heap=39.9 MB  GC tot space=2860.7 MB time=1.0 sec max live=44.9 MB
 6  Max heap=32 MB succeed  cpu=62.947 sec  Max RSS=87.8 MB  RSS-heap=55.8 MB  GC tot space=2871.1 MB time=43.7 sec max live=25.0 MB
 7  Files output/medium-expected-output.txt and output/medium-clj-1.2-output.txt differ
 8  check output cmd 'diff --strip-trailing-cr --brief output/medium-expected-output.txt output/medium-clj-1.2-output.txt' exited with error status 1
 9  Copy of output file 'output/medium-clj-1.2-output.txt' (length 0 bytes) saved as 'measureproc-3JpxmO/5A7grwwKdK'
10  Max heap=16 MB fail  cpu=4.077 sec  Max RSS=58.7 MB  RSS-heap=42.7 MB  GC tot space=97.1 MB time=0.6 sec max live=11.6 MB
11  Now perform binary search on range [16,32]
12  Files output/medium-expected-output.txt and output/medium-clj-1.2-output.txt differ
13  check output cmd 'diff --strip-trailing-cr --brief output/medium-expected-output.txt output/medium-clj-1.2-output.txt' exited with error status 1
14  Copy of output file 'output/medium-clj-1.2-output.txt' (length 0 bytes) saved as 'measureproc-wzd8KX/_pSTbtcZc3'
15  Max heap=24 MB fail  cpu=4.456 sec  Max RSS=77.2 MB  RSS-heap=53.2 MB  GC tot space=97.1 MB time=0.8 sec max live=17.1 MB  new range [24,32]
16  Files output/medium-expected-output.txt and output/medium-clj-1.2-output.txt differ
17  check output cmd 'diff --strip-trailing-cr --brief output/medium-expected-output.txt output/medium-clj-1.2-output.txt' exited with error status 1
18  Copy of output file 'output/medium-clj-1.2-output.txt' (length 0 bytes) saved as 'measureproc-BNjL9y/tLMKZ_tXVq'
19  Max heap=28 MB fail  cpu=4.131 sec  Max RSS=72.2 MB  RSS-heap=44.2 MB  GC tot space=135.8 MB time=0.6 sec max live=21.9 MB  new range [28,32]
20  Files output/medium-expected-output.txt and output/medium-clj-1.2-output.txt differ
21  check output cmd 'diff --strip-trailing-cr --brief output/medium-expected-output.txt output/medium-clj-1.2-output.txt' exited with error status 1
22  Copy of output file 'output/medium-clj-1.2-output.txt' (length 0 bytes) saved as 'measureproc-sWmPCJ/S1SB7br3Hv'
23  Max heap=30 MB fail  cpu=4.063 sec  Max RSS=75.8 MB  RSS-heap=45.8 MB  GC tot space=135.8 MB time=0.5 sec max live=25.8 MB  new range [30,32]
24  Max heap=31 MB succeed  cpu=63.264 sec  Max RSS=76.4 MB  RSS-heap=45.4 MB  GC tot space=2871.9 MB time=44.5 sec max live=25.5 MB  new range [30,31]
25 
26  Succeeded at 31 MB, failed at 30 MB
27 
28  Sweep phase MB values to use: 31 39 47 55 63
29  Max heap=31 MB succeed  cpu=64.21 sec  Max RSS=80.0 MB  RSS-heap=49.0 MB  GC tot space=2868.8 MB time=45.6 sec max live=26.6 MB
30  Max heap=31 MB succeed  cpu=63.626 sec  Max RSS=79.9 MB  RSS-heap=48.9 MB  GC tot space=2875.1 MB time=44.4 sec max live=24.9 MB
31  Max heap=31 MB succeed  cpu=65.422 sec  Max RSS=80.0 MB  RSS-heap=49.0 MB  GC tot space=2869.7 MB time=46.3 sec max live=26.6 MB
32  Max heap=31 MB succeed  cpu=64.325 sec  Max RSS=75.3 MB  RSS-heap=44.3 MB  GC tot space=2875.1 MB time=45.8 sec max live=26.6 MB
33  Max heap=39 MB succeed  cpu=21.034 sec  Max RSS=83.4 MB  RSS-heap=44.4 MB  GC tot space=2861.7 MB time=1.4 sec max live=35.4 MB
34  Max heap=39 MB succeed  cpu=20.939 sec  Max RSS=82.5 MB  RSS-heap=43.5 MB  GC tot space=2858.2 MB time=1.5 sec max live=35.4 MB
35  Max heap=39 MB succeed  cpu=21.3 sec  Max RSS=90.6 MB  RSS-heap=51.6 MB  GC tot space=2863.2 MB time=1.4 sec max live=35.5 MB
36  Max heap=39 MB succeed  cpu=21.661 sec  Max RSS=91.1 MB  RSS-heap=52.1 MB  GC tot space=2864.3 MB time=1.5 sec max live=35.7 MB
37  Max heap=47 MB succeed  cpu=21.504 sec  Max RSS=98.9 MB  RSS-heap=51.9 MB  GC tot space=2854.2 MB time=1.9 sec max live=40.0 MB
38  Could not parse line 26 of file /var/folders/dO/dOizZatdFUiWd11ofxhhPU+++TI/-Tmp-/Du7UdDBzIQ/qyTKNgBLn0.txt:
39  3.506: [GC-- 42528K->46648K(46656K), 0.0422814 secs]
40  /var/folders/dO/dOizZatdFUiWd11ofxhhPU+++TI/-Tmp-/VBRRJnGxyt/P7aYOLF3di:1: parser error : Start tag expected, '<' not found
41  Exit 2

Line 1 is the invocation of one of my Perl programs, called tmcurve for "time memory curve".

Line 4 shows summary statistics for one run of the benchmark program.

 4  Max heap=128 MB succeed  cpu=19.3 sec  Max RSS=125.4 MB  RSS-heap=-2.6 MB  GC tot space=2847.5 MB time=0.7 sec max live=63.1 MB

Let's break down this summary output line into its parts:

    Max heap=128 MB - It was given a max heap (-Xmx) parameter of 128
    MB (-Xmx128m).

    succeed - The benchmark program's output was checked, and was
    correct.

    cpu=19.3 sec - It used 19.3 sec of total cpu time (the user+system
    cpu times reported by the Unix '/usr/bin/time' command added
    together).

    Max RSS=125.4 MB - The maximum memory ever resident in RAM during
    the process's run was Max RSS=125.4 MB, as reported by the
    getrusage() system call.

    RSS-heap=-2.6 MB - This is just RSS with heap subtracted out.
    This number can be positive, because the JVM has memory it needs
    to run that is not in the program's heap.

    GC - Everything after that is statistics collected from parsing
    the HotSpot logs you get with the -verbose:gc argument to the JVM.

    tot space=2847.5 MB - The total amount of memory collected by all
    GC invocations was 2847.5 MB.

    time=0.7 sec - This took a total of 0.7 sec

    max live=63.1 MB - The maximum heap size before any GC invocation
    was 63.1 MB.  The name "max live" is probably not the best name
    for that value, since objects in heap before a GC begins are often
    not all live.

Lines 4-24 show the results of multiple runs, while tmcurve is doing a
binary search to find the smallest heap size for which the program
gives correct output.

26  Succeeded at 31 MB, failed at 30 MB

Line 26 summarizes the results of the binary search.

28  Sweep phase MB values to use: 31 39 47 55 63

Line 28 shows the MB values that will be used in the "sweep" phase.
The amount that each value is larger than the previous one can be
controlled by tmcurve command line arguments.  In this case --delta 8
means that each value will be 8 MB larger than the one before.  Other
tmcurve options can be used to choose how many such values are used
(in this case 5), and how many times the benchmark program will be run
for each of these MB values (in this case 4).

29  Max heap=31 MB succeed  cpu=64.21 sec  Max RSS=80.0 MB  RSS-heap=49.0 MB  GC tot space=2868.8 MB time=45.6 sec max live=26.6 MB
30  Max heap=31 MB succeed  cpu=63.626 sec  Max RSS=79.9 MB  RSS-heap=48.9 MB  GC tot space=2875.1 MB time=44.4 sec max live=24.9 MB
31  Max heap=31 MB succeed  cpu=65.422 sec  Max RSS=80.0 MB  RSS-heap=49.0 MB  GC tot space=2869.7 MB time=46.3 sec max live=26.6 MB
32  Max heap=31 MB succeed  cpu=64.325 sec  Max RSS=75.3 MB  RSS-heap=44.3 MB  GC tot space=2875.1 MB time=45.8 sec max live=26.6 MB
33  Max heap=39 MB succeed  cpu=21.034 sec  Max RSS=83.4 MB  RSS-heap=44.4 MB  GC tot space=2861.7 MB time=1.4 sec max live=35.4 MB
34  Max heap=39 MB succeed  cpu=20.939 sec  Max RSS=82.5 MB  RSS-heap=43.5 MB  GC tot space=2858.2 MB time=1.5 sec max live=35.4 MB
35  Max heap=39 MB succeed  cpu=21.3 sec  Max RSS=90.6 MB  RSS-heap=51.6 MB  GC tot space=2863.2 MB time=1.4 sec max live=35.5 MB
36  Max heap=39 MB succeed  cpu=21.661 sec  Max RSS=91.1 MB  RSS-heap=52.1 MB  GC tot space=2864.3 MB time=1.5 sec max live=35.7 MB

Lines 29-32 show the results of 4 runs with the same maximum heap size
of 31 MB.  In all cases, we see that the GC time is about 2/3 of the
total cpu time.

Lines 33-36 show that when we increase the max heap size to 39 MB,
total GC time goes way down to 1.4 or 1.5 sec, and the total cpu time
thus goes down to about 21 sec.  Obviously for this program, you get
much lower time with just a little bit more memory than the minimum
required to succeed.

38  Could not parse line 26 of file /var/folders/dO/dOizZatdFUiWd11ofxhhPU+++TI/-Tmp-/Du7UdDBzIQ/qyTKNgBLn0.txt:
39  3.506: [GC-- 42528K->46648K(46656K), 0.0422814 secs]
40  /var/folders/dO/dOizZatdFUiWd11ofxhhPU+++TI/-Tmp-/VBRRJnGxyt/P7aYOLF3di:1: parser error : Start tag expected, '<' not found
41  Exit 2

Lines 38-41 show that I still have more work to do on parsing the
output of -verbose:gc :-) I am fairly conservative in my code for
parsing log files, spitting out an error and quitting rather than
silently guessing what the line might mean, so this is normal for me
when I have not yet seen all possible formats of log file lines.

Ken Wesson

unread,
Nov 22, 2010, 12:51:27 PM11/22/10
to clo...@googlegroups.com
On Mon, Nov 22, 2010 at 11:47 AM, John Fingerhut
<andy.fi...@gmail.com> wrote:
>     max live=63.1 MB - The maximum heap size before any GC invocation
>     was 63.1 MB.  The name "max live" is probably not the best name
>     for that value, since objects in heap before a GC begins are often
>     not all live.

You'll get a much better approximation to the concept of "max live"
from the maximum heap size right AFTER any GC invocation.

John Fingerhut

unread,
Nov 28, 2010, 6:56:52 AM11/28/10
to clo...@googlegroups.com
Thanks, Ken.  I've changed that in my program now.

I've also found that one reason why the Clojure benchmarks on the shootout web site were using so much more memory than the corresponding Java programs (e.g. up to about 350 Mbytes for the fannkuch-redux benchmark program) is a combination of several factors:

(1) the Clojure programs were generating garbage, i.e. allocating memory that later became unused.  Even though the total amount of live objects at any one time was well under 4 MBytes for the fannkuch-redux benchmark program, for example, it was not being collected soon enough to keep the amount of resident memory low.

(2) That combined with the default heap size and GC method used, when no special command line arguments were being used on the java command line, led to large measured memory usage.

By selecting an appropriate maximum heap size using the -Xmx command line option, the maximum resident memory used by several of the Clojure (and also JRuby) programs have been reduced significantly.  Several of the programs formerly had their memory use 30x the corresponding Java program.  Now the worst case ratio of (Clojure memory use / Java memory use) is about 3x, and there are still several programs for which I haven't yet run the experiments to find an appropriate max heap size to use.  After that happens, the average memory use of Clojure programs (and JRuby programs) will go down further.  Here is a link where you can see the results for the one-core 32-bit benchmark machine, for example:

http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=clojure

Does anyone know a way from within a Java/Clojure program to determine which GC algorithm is currently in use?  I'm curious what the default is when one is not specified on the command line, and accessing the one being used from inside of a program would be one good way to find out for sure.  There doesn't seem to be anything returned by Runtime.getRuntime().getProperties() to indicate that.

Andy


--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Alex Osborne

unread,
Nov 28, 2010, 8:04:41 AM11/28/10
to clo...@googlegroups.com
John Fingerhut <andy.fi...@gmail.com> writes:

> Does anyone know a way from within a Java/Clojure program to determine
> which GC algorithm is currently in use? I'm curious what the default
> is when one is not specified on the command line, and accessing the
> one being used from inside of a program would be one good way to find
> out for sure. There doesn't seem to be anything returned by
> Runtime.getRuntime().getProperties() to indicate that.

You can get at least something via JMX:

(require '[clojure.contrib.jmx :as jmx])

(map #(subs (str %) 37)
(jmx/mbean-names "java.lang:type=GarbageCollector,*"))

Default on my PC: ("PS MarkSweep" "PS Scavenge")
With -XX:-UseParallelGC: ("Copy" "MarkSweepCompact")
With -XX:+UseG1GC: ("G1 Old Generation" "G1 Young Generation")

You can also see this same data by running JConsole (shipped in the
JDK's bin directory) and attaching to a running Java process.

Reply all
Reply to author
Forward
0 new messages