Stuart:
When I AOT compile your code with Clojure 1.4.0 -- after adding (:gen-class) to the ns declaration to fit in with my clojure-benchmarks environment, in case that makes a difference -- I get these reflection warnings:
Reflection warning, revcomp.clj:50 - call to invokePrim can't be resolved.
Reflection warning, revcomp.clj:54 - call to invokePrim can't be resolved.
Those lines are the two calls to the function revcomp inside of function with-each-line-rc. As a consequence of the reflection, performance is pretty bad -- much slower than the current Alioth web site code. I give some measurements below. There are more reflection warnings if I AOT-compile with Clojure 1.3.0.
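For anyone who has not run into these warnings before, here is a minimal sketch of the general mechanism. It is a plain Java interop case, not the invokePrim case from revcomp.clj, and the function names are made up for illustration. With *warn-on-reflection* set to true, the compiler prints a warning whenever it cannot resolve a call at compile time and has to fall back to reflection at run time:

```clojure
(set! *warn-on-reflection* true)

;; No type hint: the compiler cannot tell which class .length belongs to,
;; so it prints a reflection warning and looks the method up at run time.
(defn len-reflective [s]
  (.length s))

;; With a ^String hint the call is resolved at compile time and no warning
;; is printed.
(defn len-hinted [^String s]
  (.length s))
```

Both functions return the same value. The difference is only in how the call is dispatched, which is why reflection inside a hot loop tends to show up as a large slowdown rather than as a wrong answer.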
If I remove the two ^long type hints for the arguments of function revcomp, the reflection warnings go away, but the output of the program is incorrect. The goal of a reverse complement program is not to reverse each line of a DNA sequence, but each entire DNA sequence, where a DNA sequence includes all lines found after a line beginning with >, up to but not including the next such line, or the end of file.
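To make the difference concrete, here is a small sketch of the two behaviors. It is not taken from either version of the program: the function names are hypothetical, and the complement table covers only the four plain uppercase bases to keep it short, whereas the real benchmark input has more codes than that.

```clojure
;; Complement table for A, C, G, T only (a simplifying assumption).
(def complement-of {\A \T, \C \G, \G \C, \T \A})

(defn rc
  "Reverse complement of one string of bases."
  [s]
  (apply str (map complement-of (reverse s))))

(defn rc-per-line
  "Reverse-complements each line on its own. Not what the benchmark asks for."
  [lines]
  (map rc lines))

(defn rc-whole-sequence
  "Joins all lines of one sequence, reverse-complements the joined string,
  then re-wraps the result to the given line width."
  [lines width]
  (->> (rc (apply str lines))
       (partition-all width)
       (map #(apply str %))))

(rc-per-line ["AACC" "GG"])          ;; => ("GGTT" "CC")
(rc-whole-sequence ["AACC" "GG"] 4)  ;; => ("CCGG" "TT")
```

The two results differ as soon as a sequence spans more than one line, which is the case for every sequence in the benchmark input.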
I made some small changes to correct that (link below -- feel free to commit the changes if you like). Below are some measurements for several versions of the reverse complement program. All times are elapsed seconds, not total CPU time, and give only the fastest time of 3 different runs.
Stuart's program with reflection warnings left in the code: 31.3 sec
Stuart's program with reflection warnings eliminated, but output is incorrect: 13.9 sec
Stuart's program with reflection warnings eliminated and Andy's modifications to give correct output: 13.4 sec
revcomp.clojure-4.clojure is the current fastest Clojure program on the Alioth web site (link below)
revcomp.clojure-4.clojure: 4.8 sec (Alioth 64-bit quad-core elapsed time: 4.25 sec)
revcomp.java-3.java is the current fastest Java program on the Alioth web site, at least for the quad-core benchmark machines, and I am running on a quad-core machine, too (link below)
revcomp.java-3.java: 1.5 sec (Alioth 64-bit quad-core elapsed time: 1.31 sec)
More details about the performance numbers, if you are curious:
AOT compilation was done in all cases, and the compilation time is not included in what is reported.
All runs were with the same input file used by the Alioth web site for reporting performance of reverse complement programs. It uses N=25,000,000. The file is approximately 250 megabytes in size.
All numbers are using Oracle JDK 1.7.0_04 for Linux, and Clojure 1.4.0, which is what the Alioth web site is currently using. My machine is similar but not identical to the Alioth 64-bit quad-core benchmark machine, so comparing my absolute numbers with theirs gives similar but not identical results. It is best to compare the numbers from my machine to each other, which use the same machine/OS/JDK/Clojure-version, but different programs.
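In case "fastest elapsed time of 3 runs" is unclear, here is a rough in-process sketch of the idea. This is only an illustration with made-up function names, not the harness that produced the numbers above, and timing inside a running JVM will not match timing a whole program run from start to finish.

```clojure
(defn elapsed-secs
  "Runs the zero-argument function f once and returns elapsed wall-clock seconds."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (double (- (System/nanoTime) start)) 1e9)))

(defn fastest-of
  "Runs f n times and returns the smallest elapsed time in seconds."
  [n f]
  (apply min (repeatedly n #(elapsed-secs f))))

;; Example: fastest of 3 runs of some stand-in workload.
(fastest-of 3 #(reduce + (range 1000000)))
```

Taking the minimum rather than the mean filters out runs that were slowed down by other activity on the machine.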
Link to my edited version of Stuart's program:
https://github.com/jafingerhut/clojure-benchmarks/blob/master/revcomp/revcomp.clj-14-fixes.clj
Link to revcomp.clojure-4.clojure 64-bit quad-core results and source code:
http://shootout.alioth.debian.org/u64q/program.php?test=revcomp&lang=clojure&id=4
Link to revcomp.java-3.java 64-bit quad-core results and source code:
http://shootout.alioth.debian.org/u64q/program.php?test=revcomp&lang=java&id=3
Andy