As a preview of things to come soon, I've been focusing on adding instrumentation to my clojure-benchmarks github repository that measure memory usage. I realized some time ago that the reason many of the Clojure benchmark programs use so much memory is that the default JVM heap size is quite large. Thus if a Clojure program allocates memory *at all* in the main loop, and if you run that main loop long enough, it will fill up the available heap before a garbage collection is triggered. Such a program will be measured as using all of that memory, even though it didn't "need it". The memory use measurements on the benchmark can in many cases be significantly reduced, with little or no increase in run time, by selecting an appropriate value for the JVM -Xmx maximum heap size parameter.
I have some Perl programs I've hacked together that on Windows + Cygwin, Mac OS X, and Linux, will run a program multiple times while varying the -Xmx heap size limit. It first uses binary search to find the smallest heap size at which the program produces correct output, and then it does a "sweep" starting at that value X, and runs the program for several somewhat larger limits, to see if a small amount of additional heap space will reduce the GC time noticeably.
While doing that, it is saving all of the details of the runs in an XML file, and printing summaries of a few statistics on the console. Below is an example of the console output running the k-nucleotide benchmark program on an input that is significantly smaller than the official benchmark input (I often run these programs on smaller input sizes during testing so I can get results without growing too much older in the process). I give a pretty detailed explanation of what this output means below that, in case you are curious.
All of this is in the clojure-benchmarks github repo now, although I will be improving it some more for a while (parsing HotSpot GC log files better, parsing GC logs for JRockit, correcting the Max RSS calculations on Mac OS X 10.6.x, etc.) I'll warn you that the command line options are pretty numerous, but mostly documented. Testing so far has been limited to Mac OS X, Ubuntu Linux, and Windows + Cygwin.
https://github.com/jafingerhut/clojure-benchmarksgit clone git://
github.com/jafingerhut/clojure-benchmarks.git
Andy Fingerhut
Example output from "tmcurve" program, with line numbers prepended for reference:
1 % ../bin/tmcurve --verbose --jvm-gc-stats --delta 8 --sorted-summary --results-file results.xml --min 1 --precision 1 --input input/medium-input.txt --output output/medium-clj-1.2-output.txt --check 'diff --strip-trailing-cr --brief output/medium-expected-output.txt %o' java -server -Xmx%mbm -classpath /Users/andy/lein/swank-clj-1.2.0/lib/clojure-1.2.0.jar:./obj/clj-1.2 knucleotide 1
2 Looking for smallest memory size that succeeds.
3
4 Max heap=128 MB succeed cpu=19.3 sec Max RSS=125.4 MB RSS-heap=-2.6 MB GC tot space=2847.5 MB time=0.7 sec max live=63.1 MB
5 Max heap=64 MB succeed cpu=20.517 sec Max RSS=103.9 MB RSS-heap=39.9 MB GC tot space=2860.7 MB time=1.0 sec max live=44.9 MB
6 Max heap=32 MB succeed cpu=62.947 sec Max RSS=87.8 MB RSS-heap=55.8 MB GC tot space=2871.1 MB time=43.7 sec max live=25.0 MB
7 Files output/medium-expected-output.txt and output/medium-clj-1.2-output.txt differ
8 check output cmd 'diff --strip-trailing-cr --brief output/medium-expected-output.txt output/medium-clj-1.2-output.txt' exited with error status 1
9 Copy of output file 'output/medium-clj-1.2-output.txt' (length 0 bytes) saved as 'measureproc-3JpxmO/5A7grwwKdK'
10 Max heap=16 MB fail cpu=4.077 sec Max RSS=58.7 MB RSS-heap=42.7 MB GC tot space=97.1 MB time=0.6 sec max live=11.6 MB
11 Now perform binary search on range [16,32]
12 Files output/medium-expected-output.txt and output/medium-clj-1.2-output.txt differ
13 check output cmd 'diff --strip-trailing-cr --brief output/medium-expected-output.txt output/medium-clj-1.2-output.txt' exited with error status 1
14 Copy of output file 'output/medium-clj-1.2-output.txt' (length 0 bytes) saved as 'measureproc-wzd8KX/_pSTbtcZc3'
15 Max heap=24 MB fail cpu=4.456 sec Max RSS=77.2 MB RSS-heap=53.2 MB GC tot space=97.1 MB time=0.8 sec max live=17.1 MB new range [24,32]
16 Files output/medium-expected-output.txt and output/medium-clj-1.2-output.txt differ
17 check output cmd 'diff --strip-trailing-cr --brief output/medium-expected-output.txt output/medium-clj-1.2-output.txt' exited with error status 1
18 Copy of output file 'output/medium-clj-1.2-output.txt' (length 0 bytes) saved as 'measureproc-BNjL9y/tLMKZ_tXVq'
19 Max heap=28 MB fail cpu=4.131 sec Max RSS=72.2 MB RSS-heap=44.2 MB GC tot space=135.8 MB time=0.6 sec max live=21.9 MB new range [28,32]
20 Files output/medium-expected-output.txt and output/medium-clj-1.2-output.txt differ
21 check output cmd 'diff --strip-trailing-cr --brief output/medium-expected-output.txt output/medium-clj-1.2-output.txt' exited with error status 1
22 Copy of output file 'output/medium-clj-1.2-output.txt' (length 0 bytes) saved as 'measureproc-sWmPCJ/S1SB7br3Hv'
23 Max heap=30 MB fail cpu=4.063 sec Max RSS=75.8 MB RSS-heap=45.8 MB GC tot space=135.8 MB time=0.5 sec max live=25.8 MB new range [30,32]
24 Max heap=31 MB succeed cpu=63.264 sec Max RSS=76.4 MB RSS-heap=45.4 MB GC tot space=2871.9 MB time=44.5 sec max live=25.5 MB new range [30,31]
25
26 Succeeded at 31 MB, failed at 30 MB
27
28 Sweep phase MB values to use: 31 39 47 55 63
29 Max heap=31 MB succeed cpu=64.21 sec Max RSS=80.0 MB RSS-heap=49.0 MB GC tot space=2868.8 MB time=45.6 sec max live=26.6 MB
30 Max heap=31 MB succeed cpu=63.626 sec Max RSS=79.9 MB RSS-heap=48.9 MB GC tot space=2875.1 MB time=44.4 sec max live=24.9 MB
31 Max heap=31 MB succeed cpu=65.422 sec Max RSS=80.0 MB RSS-heap=49.0 MB GC tot space=2869.7 MB time=46.3 sec max live=26.6 MB
32 Max heap=31 MB succeed cpu=64.325 sec Max RSS=75.3 MB RSS-heap=44.3 MB GC tot space=2875.1 MB time=45.8 sec max live=26.6 MB
33 Max heap=39 MB succeed cpu=21.034 sec Max RSS=83.4 MB RSS-heap=44.4 MB GC tot space=2861.7 MB time=1.4 sec max live=35.4 MB
34 Max heap=39 MB succeed cpu=20.939 sec Max RSS=82.5 MB RSS-heap=43.5 MB GC tot space=2858.2 MB time=1.5 sec max live=35.4 MB
35 Max heap=39 MB succeed cpu=21.3 sec Max RSS=90.6 MB RSS-heap=51.6 MB GC tot space=2863.2 MB time=1.4 sec max live=35.5 MB
36 Max heap=39 MB succeed cpu=21.661 sec Max RSS=91.1 MB RSS-heap=52.1 MB GC tot space=2864.3 MB time=1.5 sec max live=35.7 MB
37 Max heap=47 MB succeed cpu=21.504 sec Max RSS=98.9 MB RSS-heap=51.9 MB GC tot space=2854.2 MB time=1.9 sec max live=40.0 MB
38 Could not parse line 26 of file /var/folders/dO/dOizZatdFUiWd11ofxhhPU+++TI/-Tmp-/Du7UdDBzIQ/qyTKNgBLn0.txt:
39 3.506: [GC-- 42528K->46648K(46656K), 0.0422814 secs]
40 /var/folders/dO/dOizZatdFUiWd11ofxhhPU+++TI/-Tmp-/VBRRJnGxyt/P7aYOLF3di:1: parser error : Start tag expected, '<' not found
41 Exit 2
Line 1 is the invocation of one of my Perl programs, called tmcurve for "time memory curve".
Line 4 shows summary statistics for one run of the benchmark program.
4 Max heap=128 MB succeed cpu=19.3 sec Max RSS=125.4 MB RSS-heap=-2.6 MB GC tot space=2847.5 MB time=0.7 sec max live=63.1 MB
Let's break down this summary output line into its parts:
Max heap=128 MB - It was given a max heap (-Xmx) parameter of 128
MB (-Xmx128m).
succeed - The benchmark program's output was checked, and was
correct.
cpu=19.3 sec - It used 19.3 sec of total cpu time (the user+system
cpu times reported by the Unix '/usr/bin/time' command added
together).
Max RSS=125.4 MB - The maximum memory ever resident in RAM during
the process's run was Max RSS=125.4 MB, as reported by the
getrusage() system call.
RSS-heap=-2.6 MB - This is just RSS with heap subtracted out.
This number can be positive, because the JVM has memory it needs
to run that is not in the program's heap.
GC - Everything after that is statistics collected from parsing
the HotSpot logs you get with the -verbose:gc argument to the JVM.
tot space=2847.5 MB - The total amount of memory collected by all
GC invocations was 2847.5 MB.
time=0.7 sec - This took a total of 0.7 sec
max live=63.1 MB - The maximum heap size before any GC invocation
was 63.1 MB. The name "max live" is probably not the best name
for that value, since objects in heap before a GC begins are often
not all live.
Lines 4-24 show the results of multiple runs, while tmcurve is doing a
binary search to find the smallest heap size for which the program
gives correct output.
26 Succeeded at 31 MB, failed at 30 MB
Line 26 summarizes the results of the binary search.
28 Sweep phase MB values to use: 31 39 47 55 63
Line 28 shows the MB values that will be used in the "sweep" phase.
The amount that each value is larger than the previous one can be
controlled by tmcurve command line arguments. In this case --delta 8
means that each value will be 8 MB larger than the one before. Other
tmcurve options can be used to choose how many such values are used
(in this case 5), and how many times the benchmark program will be run
for each of these MB values (in this case 4).
29 Max heap=31 MB succeed cpu=64.21 sec Max RSS=80.0 MB RSS-heap=49.0 MB GC tot space=2868.8 MB time=45.6 sec max live=26.6 MB
30 Max heap=31 MB succeed cpu=63.626 sec Max RSS=79.9 MB RSS-heap=48.9 MB GC tot space=2875.1 MB time=44.4 sec max live=24.9 MB
31 Max heap=31 MB succeed cpu=65.422 sec Max RSS=80.0 MB RSS-heap=49.0 MB GC tot space=2869.7 MB time=46.3 sec max live=26.6 MB
32 Max heap=31 MB succeed cpu=64.325 sec Max RSS=75.3 MB RSS-heap=44.3 MB GC tot space=2875.1 MB time=45.8 sec max live=26.6 MB
33 Max heap=39 MB succeed cpu=21.034 sec Max RSS=83.4 MB RSS-heap=44.4 MB GC tot space=2861.7 MB time=1.4 sec max live=35.4 MB
34 Max heap=39 MB succeed cpu=20.939 sec Max RSS=82.5 MB RSS-heap=43.5 MB GC tot space=2858.2 MB time=1.5 sec max live=35.4 MB
35 Max heap=39 MB succeed cpu=21.3 sec Max RSS=90.6 MB RSS-heap=51.6 MB GC tot space=2863.2 MB time=1.4 sec max live=35.5 MB
36 Max heap=39 MB succeed cpu=21.661 sec Max RSS=91.1 MB RSS-heap=52.1 MB GC tot space=2864.3 MB time=1.5 sec max live=35.7 MB
Lines 29-32 show the results of 4 runs with the same maximum heap size
of 31 MB. In all cases, we see that the GC time is about 2/3 of the
total cpu time.
Lines 33-36 show that when we increase the max heap size to 39 MB,
total GC time goes way down to 1.4 or 1.5 sec, and the total cpu time
thus goes down to about 21 sec. Obviously for this program, you get
much lower time with just a little bit more memory than the minimum
required to succeed.
38 Could not parse line 26 of file /var/folders/dO/dOizZatdFUiWd11ofxhhPU+++TI/-Tmp-/Du7UdDBzIQ/qyTKNgBLn0.txt:
39 3.506: [GC-- 42528K->46648K(46656K), 0.0422814 secs]
40 /var/folders/dO/dOizZatdFUiWd11ofxhhPU+++TI/-Tmp-/VBRRJnGxyt/P7aYOLF3di:1: parser error : Start tag expected, '<' not found
41 Exit 2
Lines 38-41 show that I still have more work to do on parsing the
output of -verbose:gc :-) I am fairly conservative in my code for
parsing log files, spitting out an error and quitting rather than
silently guessing what the line might mean, so this is normal for me
when I have not yet seen all possible formats of log file lines.