Anyone want to take a crack at making the fasta shootout program faster?

87 views
Skip to first unread message

Andy Fingerhut

unread,
Feb 3, 2011, 1:22:49 AM2/3/11
to clo...@googlegroups.com
I've done a pass through most of the Clojure programs on the shootout
web site recently, making some of them faster, and choosing -Xmx
command line arguments when running them to keep the memory usage down
to a reasonable level -- not always the smallest heap size that works,
mind you -- just one that avoids exorbitantly large memory usage.

http://shootout.alioth.debian.org

The Clojure program for the "fasta" problem, with source code, AOT
compilation command, and execution command given on this web page:

http://shootout.alioth.debian.org/u32/program.php?test=fasta&lang=clojure&id=3

still takes about 6x to 8x more time than the best Java 6 -server
program here, depending upon which of the four machines it is run on:

http://shootout.alioth.debian.org/u32/program.php?test=fasta&lang=java&id=3

I'm sure the Clojure program can be made faster, e.g. by doing fewer
calls to write to the output file, with more bytes per call. Most of
the time seems to be file writing and generating random numbers in gen-
random!, at least on my systems where I've done testing and
profiling. I'm also seeing a fair amount of time spent in calls to
java.lang.Double.valueOf, according to the built-in profiler that
comes with the Hotspot JVM.

Note: The web site is Clojure 1.2 only right now, so don't expect a
tweaked-out program using things that only work in Clojure 1.3 to work
there yet. Other tips:

+ Test your program by comparing its output to the Java program with
the same command line args. The Java program isn't guaranteed to be
bug free, but it has been tested a fair amount.

+ If you decide to submit, please take some time to select a -Xmx max
heap size argument that gives a good tradeoff between low memory use
and not too much extra time spent doing GC because of it. I've got
some scripts to help automate finding such a value if you are
interested, and I'd even volunteer to run them for you if you send me
your program. Still, it is straightforward to do by hand, too -- no
rocket science involved.

You are of course free to work on any of the problems you wish, but I
just wanted to point out this one as being what I consider a bit
stubborn to squeeze down the run time.

Andy

Bill James

unread,
Feb 6, 2011, 8:01:41 PM2/6/11
to Clojure
On Feb 3, 12:22 am, Andy Fingerhut <andy.finger...@gmail.com> wrote:
> I've done a pass through most of the Clojure programs on the shootout  
> web site recently, making some of them faster, and choosing -Xmx  
> command line arguments when running them to keep the memory usage down  
> to a reasonable level -- not always the smallest heap size that works,  
> mind you -- just one that avoids exorbitantly large memory usage.
>
> http://shootout.alioth.debian.org
>
> The Clojure program for the "fasta" problem, with source code, AOT  
> compilation command, and execution command given on this web page:
>
> http://shootout.alioth.debian.org/u32/program.php?test=fasta&lang=clo...
>
> still takes about 6x to 8x more time than the best Java 6 -server  
> program here, depending upon which of the four machines it is run on:
>
> http://shootout.alioth.debian.org/u32/program.php?test=fasta&lang=jav...
>
> I'm sure the Clojure program can be made faster, e.g. by doing fewer  
> calls to write to the output file, with more bytes per call.  Most of  
> the time seems to be file writing and generating random numbers in gen-
> random!, at least on my systems where I've done testing and  
> profiling.  I'm also seeing a fair amount of time spent in calls to  
> java.lang.Double.valueOf, according to the built-in profiler that  
> comes with the Hotspot JVM.
>
> Note: The web site is Clojure 1.2 only right now, so don't expect a  
> tweaked-out program using things that only work in Clojure 1.3 to work  
> there yet.


This program is considerably faster on my computer:


(set! *warn-on-reflection* true)


(def *width* 60)
(def *lookup-size* 222000)


(def *alu* (str "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG"
"GAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGA"
"CCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAAT"
"ACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCA"
"GCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG"
"AGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCC"
"AGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA"))

(def *codes* "acgtBDHKMNRSVWY")

(def *iub* [0.27 0.12 0.12 0.27 0.02 0.02 0.02 0.02
0.02 0.02 0.02 0.02 0.02 0.02 0.02])

(def *homosapiens* [0.3029549426680 0.1979883004921
0.1975473066391 0.3015094502008])



(defn find-index [f coll]
(first (keep-indexed #(if (f %2) %1) coll)))




(def random-seed (int-array [42]))
(let [ IM (int 139968)
IA (int 3877)
IC (int 29573)
scale (double (/ *lookup-size* IM))
]
(defn gen-random-fast []
(let [ new-seed (unchecked-remainder (unchecked-add (unchecked-
multiply
(aget (ints random-seed) 0) IA) IC) IM) ]
(aset (ints random-seed) 0 new-seed)
(int (* new-seed scale)))))



;; Takes a vector of probabilities.
(defn make-cumulative [v]
(vec (map #(reduce + (subvec v 0 %)) (range 1 (inc (count v))))))

;; Takes a vector of cumulative probabilities.
(defn make-lookup-table [v]
(let [ lookup-scale (- *lookup-size* 0.0001)
tmp (map
(fn [n] (find-index #(<= (/ n lookup-scale) %) v))
(range *lookup-size*)) ]
(int-array tmp)))



(defn cycle-bytes [source source-size n ^java.io.BufferedOutputStream
ostream]
(let [ source-size (int source-size)
width (int *width*)
width+1 (int (inc width))
buffer-size (int (* width+1 4096))
buffer (byte-array buffer-size (byte 10))
next-i (fn[i]
(unchecked-remainder (unchecked-add (int i) width) source-
size))
next-j (fn[j]
(let [j (+ j width+1)]
(if (= j buffer-size)
(do (.write ostream buffer) 0)
j)))
]
(loop [i (int 0) j (int 0) n (int n)]
(System/arraycopy source i buffer j width)
(if (> n width)
(recur (int (next-i i)) (int (next-j j)) (- n width))
(do
(aset buffer (+ j n) (byte 10))
(.write ostream buffer 0 (+ j n 1)))))))



(defn fasta-repeat [n ^java.io.BufferedOutputStream ostream]
(let [ source (.getBytes (str *alu* *alu*)) ]
(cycle-bytes source (count *alu*) n ostream)))



(defn fasta-random [probs n ^java.io.BufferedOutputStream ostream]
(let [ codes (.getBytes (str *codes*))
lookup-table (ints (make-lookup-table (make-cumulative
probs)))
width (int *width*)
buffer (byte-array 222000)
seeds (int-array 222000)
first-seed (aget (ints random-seed) 0)
]
(loop [i (int 0)]
(aset seeds i (aget (ints random-seed) 0))
(aset buffer i
(aget codes
(aget lookup-table
(gen-random-fast))))
(if (= (aget (ints random-seed) 0) first-seed)
(do
(System/arraycopy buffer 0 buffer (inc i) *width*)
(cycle-bytes buffer (inc i) n ostream)
(aset (ints random-seed) 0 (aget seeds (mod n (inc i)))))
(recur (unchecked-inc i))))))



(defn write-line [s ^java.io.BufferedOutputStream stream]
(.write stream (.getBytes (str s "\n"))))



(let [ ostream (java.io.BufferedOutputStream. System/out)
arg (first *command-line-args*)
n (read-string (or arg "25000000"))

start-time (System/currentTimeMillis)
]

(write-line ">ONE Homo sapiens alu" ostream)
(fasta-repeat (* n 2) ostream)

(write-line ">TWO IUB ambiguity codes" ostream)
(fasta-random *iub* (* n 3) ostream)

(write-line ">THREE Homo sapiens frequency" ostream)
(fasta-random *homosapiens* (* n 5) ostream)

(.flush ostream)

(binding [*out* *err*]
(prn n)
(print (/ (- (System/currentTimeMillis) start-time) 1000.0))
(println " seconds"))

)

Andy Fingerhut

unread,
Feb 7, 2011, 11:52:40 AM2/7/11
to clo...@googlegroups.com
Nice. I get about 1/3 of the run time on the two systems I've tested
on, too, including one that isn't terribly far off in hardware from
the 64-bit benchmark machine used by the shootout web site.

I'll contact you off list about how this gets submitted to the web site.

Thanks!
Andy

> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient
> with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en

Bill James

unread,
Feb 7, 2011, 3:37:20 PM2/7/11
to Clojure
Since this program eliminates so much of the work, it's disappointing
that
there wasn't more of a speedup.

Can you (or anyone) speed it up even further?

Andy Fingerhut

unread,
Feb 7, 2011, 4:47:26 PM2/7/11
to clo...@googlegroups.com


I spent some time doing some small tweaks.

On one machine I have, your version runs in as little as 4.43 sec of
user + system CPU time (minimum time over about half a dozen separate
runs).

My tweaked version below runs in as little as 2.55 sec.

An AOT compiled "Hello, world!" program in Clojure on this same
machine takes 1.37 sec. (Clojure 1.2.0 for all of these results).

There might be something still left to shave off, but not a lot.

Thanks,
Andy


(ns fasta
(:gen-class))

(set! *warn-on-reflection* true)


(def *width* 60)
(def *lookup-size* 222000)


(def *alu* (str "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG"
"GAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGA"
"CCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAAT"
"ACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCA"
"GCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG"
"AGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCC"
"AGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA"))

(def *codes* "acgtBDHKMNRSVWY")

(def *iub* [0.27 0.12 0.12 0.27 0.02 0.02 0.02 0.02
0.02 0.02 0.02 0.02 0.02 0.02 0.02])

(def *homosapiens* [0.3029549426680 0.1979883004921
0.1975473066391 0.3015094502008])


(defn find-index [f coll]
(loop [i (int 0)
s (seq coll)]
(if (f (first s))
i
(recur (unchecked-inc i) (rest s)))))


(def random-seed (int-array [42]))
(let [IM (int 139968)
IA (int 3877)
IC (int 29573)
scale (double (/ *lookup-size* IM))]
(defn gen-random-fast []
(let [new-seed (unchecked-remainder
(unchecked-add (unchecked-multiply
(aget (ints random-seed) 0) IA)
IC) IM)]
(aset (ints random-seed) 0 new-seed)
(int (* new-seed scale)))))


;; Takes a vector of probabilities.
(defn make-cumulative [v]
(vec (map #(reduce + (subvec v 0 %)) (range 1 (inc (count v))))))


;; Takes a vector of cumulative probabilities.
(defn make-lookup-table [v]

(let [sz (int *lookup-size*)
lookup-scale (- sz 0.0001)
^ints a (int-array sz)]
(dotimes [i sz]
(aset a i (int (find-index #(<= (/ i lookup-scale) %) v))))
a))


(defn cycle-bytes [source source-size n
^java.io.BufferedOutputStream ostream]
(let [source-size (int source-size)
width (int *width*)
width+1 (int (inc width))
buffer-size (int (* width+1 4096))

buffer (byte-array buffer-size (byte 10))]


(loop [i (int 0)
j (int 0)
n (int n)]
(System/arraycopy source i buffer j width)
(if (> n width)

(recur (int (unchecked-remainder
(unchecked-add i width) source-size))
(int (let [j (unchecked-add j width+1)]
(if (== j buffer-size)
(do (.write ostream buffer) (int 0))
j)))
(unchecked-subtract n width))


(do
(aset buffer (+ j n) (byte 10))

(.write ostream buffer (int 0) (+ j n 1)))))))


(defn fasta-repeat [n ^java.io.BufferedOutputStream ostream]
(let [source (.getBytes (str *alu* *alu*))]
(cycle-bytes source (count *alu*) n ostream)))


(defn fasta-random [probs n ^java.io.BufferedOutputStream ostream]
(let [codes (.getBytes (str *codes*))
lookup-table (ints (make-lookup-table
(make-cumulative probs)))
width (int *width*)
buffer (byte-array 222000)
seeds (int-array 222000)
first-seed (aget (ints random-seed) 0)]
(loop [i (int 0)]
(aset seeds i (aget (ints random-seed) 0))
(aset buffer i
(aget codes
(aget lookup-table
(gen-random-fast))))

(if (== (aget (ints random-seed) 0) first-seed)


(do
(System/arraycopy buffer 0 buffer (inc i) *width*)
(cycle-bytes buffer (inc i) n ostream)
(aset (ints random-seed) 0 (aget seeds (mod n (inc i)))))
(recur (unchecked-inc i))))))


(defn write-line [s ^java.io.BufferedOutputStream stream]
(.write stream (.getBytes (str s "\n"))))


(defn -main [& args]
(let [n (Integer/parseInt (nth args 0))
ostream (java.io.BufferedOutputStream. System/out)


start-time (System/currentTimeMillis)]
(write-line ">ONE Homo sapiens alu" ostream)
(fasta-repeat (* n 2) ostream)
(write-line ">TWO IUB ambiguity codes" ostream)
(fasta-random *iub* (* n 3) ostream)
(write-line ">THREE Homo sapiens frequency" ostream)
(fasta-random *homosapiens* (* n 5) ostream)

(.flush ostream)))

Bill James

unread,
Feb 7, 2011, 8:57:21 PM2/7/11
to Clojure
On Feb 7, 3:47 pm, Andy Fingerhut <andy.finger...@gmail.com> wrote:

> (defn find-index [f coll]
>    (loop [i (int 0)
>           s (seq coll)]
>      (if (f (first s))
>        i
>        (recur (unchecked-inc i) (rest s)))))
>


I didn't realize how slow my version of this was.
This change alone shaves off about half a second
on my computer.

This function needs to be built into the language.

Bill James

unread,
Feb 10, 2011, 4:17:16 AM2/10/11
to Clojure
http://shootout.alioth.debian.org/u32/benchmark.php?test=fasta&lang=all

The fastest program shown here is in Java and runs in 1.72 seconds.
However, at the bottom of the page (under "interesting alternative"
programs) there is a C++ program that runs in 0.25 seconds; it seems
to work basically the same way that my program does.

So this Clojure program will probably be relegated to the bottom of
the page.

Isaac Gouy

unread,
Feb 10, 2011, 3:47:42 PM2/10/11
to Clojure
On Feb 10, 1:17 am, Bill James <w_a_x_...@yahoo.com> wrote:
> http://shootout.alioth.debian.org/u32/benchmark.php?test=fasta〈=all
>
> The fastest program shown here is in Java and runs in 1.72 seconds.
> However, at the bottom of the page (under "interesting alternative"
> programs) there is a C++ program that runs in 0.25 seconds; it seems
> to work basically the same way that my program does.
>
> So this Clojure program will probably be relegated to the bottom of
> the page.


Should it be relegated to the bottom of the page?

Andy Fingerhut

unread,
Feb 10, 2011, 4:02:33 PM2/10/11
to clo...@googlegroups.com
It would be easier for submitters to answer that question if it was
more obvious *why* a program is in the "interesting alternative"
program section. Even a brief note in comments at the top of such
programs explaining the reason for their alternative status would be
enlightening.

Thanks,
Andy

Bill James

unread,
Feb 10, 2011, 4:41:02 PM2/10/11
to Clojure
The C++ program was evidently deprecated because it did not go through
the process of generating a random character for each character that
it output. The author realized that the pseudo-random-number-generator
had a cycle-length less than 200_000.

Both the C++ program and the Clojure program use a much more efficient
algorithm than the other programs. That may violate the rules.

Andy Fingerhut

unread,
Feb 10, 2011, 7:49:20 PM2/10/11
to clo...@googlegroups.com

If that was the reason it was deprecated, then it makes sense. It
would be nice if the rules for that problem made it more explicit that
this was not allowed.

I made a modified version of the program that calls the random number
generator for every separate character output (for the two sequences
that should be randomly generated), and unfortunately the run time is
back up to about the same as the current fastest program on the web
site, which unfortunately is 6x to 8x slower than the best Java program.

I've published that program at the link below. It contains comments
marking two lines in the function gen-random-fast where the Hotspot
JVM profiler tells me that the CPU is spending lots of time in
java.lang.Integer.valueOf (Clojure 1.2.0). I can't seem to get rid of
these calls to Integer.valueOf even after trying about half a dozen
different variations of type-hinting. Can anyone else see a way to
change the program to avoid those calls?

http://github.com/jafingerhut/clojure-benchmarks/blob/master/fasta/fasta.clj-10.clj

Thanks,
Andy

Ken Wesson

unread,
Feb 10, 2011, 9:37:11 PM2/10/11
to clo...@googlegroups.com
On Thu, Feb 10, 2011 at 7:49 PM, Andy Fingerhut
<andy.fi...@gmail.com> wrote:
> I've published that program at the link below.  It contains comments marking
> two lines in the function gen-random-fast where the Hotspot JVM profiler
> tells me that the CPU is spending lots of time in java.lang.Integer.valueOf
> (Clojure 1.2.0).  I can't seem to get rid of these calls to Integer.valueOf
> even after trying about half a dozen different variations of type-hinting.
>  Can anyone else see a way to change the program to avoid those calls?
>
> http://github.com/jafingerhut/clojure-benchmarks/blob/master/fasta/fasta.clj-10.clj

I don't get any reflection warnings with (gen-random-fast) on 1.2. Do you?

Andy Fingerhut

unread,
Feb 10, 2011, 9:49:14 PM2/10/11
to clo...@googlegroups.com

Nope. No reflection warnings. But the profiler does show significant
amounts of CPU time spent in java.lang.Integer.valueOf. Maybe it is
mistaken or misleading output from the profiler, but I'm inclined to
believe it so far.

Andy


Ken Wesson

unread,
Feb 10, 2011, 10:23:12 PM2/10/11
to clo...@googlegroups.com
On Thu, Feb 10, 2011 at 9:49 PM, Andy Fingerhut

Then you've got boxed primitives, but the Clojure compiler knows what
types they all are...

Andy Fingerhut

unread,
Feb 11, 2011, 12:17:19 AM2/11/11
to clo...@googlegroups.com

On Feb 10, 2011, at 4:49 PM, Andy Fingerhut wrote:
>
> I've published that program at the link below. It contains comments
> marking two lines in the function gen-random-fast where the Hotspot
> JVM profiler tells me that the CPU is spending lots of time in
> java.lang.Integer.valueOf (Clojure 1.2.0). I can't seem to get rid
> of these calls to Integer.valueOf even after trying about half a
> dozen different variations of type-hinting. Can anyone else see a
> way to change the program to avoid those calls?
>
> http://github.com/jafingerhut/clojure-benchmarks/blob/master/fasta/fasta.clj-10.clj

Here is something a little odd. I've looked at the disassembly of the
Java bytecode produced for this Clojure function:

(def random-seed (int-array [42]))

(let [scale (double (/ *lookup-size* 139968))]
(defn gen-random-fast []
(let [^ints random-seed random-seed


IM (int 139968)
IA (int 3877)
IC (int 29573)

zero (int 0)
new-seed (int (unchecked-remainder
(unchecked-add
(unchecked-multiply
(aget random-seed zero) IA) IC) IM))]
(aset random-seed zero new-seed) ;; <--- this line produces
odd Java bytecode
(int (* new-seed scale)))))

The marked line produces bytecodes for pushing the args on the stack,
then calling aset, and then it calls Integer.valueOf to take the int
returned by aset (which is just the value of new-seed assigned to the
array element) and convert it to an Integer, then it pops it off the
stack, discarding that Integer value. Strange. Apparently java -
server was not able to optimize away that wasted work, but I don't yet
know why the Clojure compiler generated the call in the first place.

The second call to Integer.valueOf() from the last line of the
function makes sense to me now: the int return value must be boxed
into an Integer before being returned as the value of function gen-
random-fast, since all Clojure 1.2 function arguments and returns
values must be Objects.

Given the wasted bytecode instructions mentioned above, on a hunch I
decided to assign the return value of aset to a symbol via let -- one
that is never used again. This actually produces faster code:


(def random-seed (int-array [42]))

(let [scale (double (/ *lookup-size* 139968))]
(defn gen-random-fast []
(let [^ints random-seed random-seed


IM (int 139968)
IA (int 3877)
IC (int 29573)

zero (int 0)
new-seed (int (unchecked-remainder
(unchecked-add
(unchecked-multiply
(aget random-seed zero) IA) IC) IM))
throwaway-val (int (aset random-seed zero new-seed))]
<-- this is the only changed line
(int (* new-seed scale)))))


Hmmmm. I hope such wasted instructions are rare.

Andy

Isaac Gouy

unread,
Feb 11, 2011, 4:50:27 PM2/11/11
to Clojure
Bill James doesn't seem to have had difficulty answering that question
- "Both the C++ program and the Clojure program use a much more
efficient algorithm than the other programs. That may violate the
rules."

But what about that fasta Java 6 -server #3 program?

Andy Fingerhut

unread,
Feb 11, 2011, 5:28:32 PM2/11/11
to clo...@googlegroups.com
fasta.java-3.java calls the method next(), implementing the linear
congruential generator, 200,000,000 times, once for each randomly
generated DNA character in the output file.

Andy

Isaac Gouy

unread,
Feb 12, 2011, 4:12:42 PM2/12/11
to Clojure
Yeah but it's not too hard to see why the Lisp programmer Juho
Snellman opined on HN "the [sic program] implementations seem to have
totally dived off the deep end of complexity".

Michael Gardner

unread,
Feb 12, 2011, 4:28:55 PM2/12/11
to clo...@googlegroups.com
On Feb 12, 2011, at 3:12 PM, Isaac Gouy wrote:

> Yeah but it's not too hard to see why the Lisp programmer Juho
> Snellman opined on HN "the [sic program] implementations seem to have
> totally dived off the deep end of complexity".

That's why this kind of competition is not interesting to me. As it only compares the fastest programs, there's every incentive to submit horrifically complex, optimized-to-the-hilt solutions that would almost never get used in the real world.

Rather than ask "what's the fastest this can be done in language X?", we should ask "how fast are the idiomatic ways of doing this in language X" and possibly "how hard is it to do it faster, when those are not good enough?".

Possibly just including code size/complexity with the performance metric would do the trick, though measuring that cross-language is a challenge all its own.

Isaac Gouy

unread,
Feb 12, 2011, 4:29:25 PM2/12/11
to Clojure


On Feb 10, 1:41 pm, Bill James <w_a_x_...@yahoo.com> wrote:
-snip-
> The C++ program was evidently deprecated because it did not go through
> the process of generating a random character for each character that
> it output. The author realized that the pseudo-random-number-generator
> had a cycle-length less than 200_000.
>
> Both the C++ program and the Clojure program use a much more efficient
> algorithm than the other programs.  That may violate the rules.


Now the SBCL 1.0.45 updates and the LuaJIT 2.0.0-beta6 updates are
done, and those long long running Ruby mandelbrot and Python fannkuch-
redux programs have finished - time for some Clojure.

I agree with your assessment that the C++ program and the Clojure
program use a much more efficient algorithm than the other programs.
Showing your Clojure program would go against the use the same
algorithm spirit of the benchmarks game - so it won't be shown.

In contrast, meteor-contest is pretty much open to whatever algorithm
you can dream up (with obvious exceptions - trivial programs that just
print the hard coded result).

http://shootout.alioth.debian.org/u32q/benchmark.php?test=meteor&lang=all#about

Isaac Gouy

unread,
Feb 12, 2011, 4:38:27 PM2/12/11
to Clojure


On Feb 12, 1:28 pm, Michael Gardner <gardne...@gmail.com> wrote:
> On Feb 12, 2011, at 3:12 PM, Isaac Gouy wrote:
>
> > Yeah but it's not too hard to see why the Lisp programmer Juho
> > Snellman opined on HN "the [sic program] implementations seem to have
> > totally dived off the deep end of complexity".
>
> That's why this kind of competition is not interesting to me. As it only compares the fastest programs, there's every incentive to submit horrifically complex, optimized-to-the-hilt solutions that would almost never get used in the real world.
>
> Rather than ask "what's the fastest this can be done in language X?", we should ask "how fast are the idiomatic ways of doing this in language X" and possibly "how hard is it to do it faster, when those are not good enough?".


Looking at the benchmarks game web page showing the "fasta" programs:
- 4 different Clojure programs are shown
- 2 different Java programs are shown
- 4 different C programs are shown
etc


http://shootout.alioth.debian.org/u32/performance.php?test=fasta


> Possibly just including code size/complexity with the performance metric would do the trick, though measuring that cross-language is a challenge all its own.

There is a column showing compressed source code size for each
program.


And then there's http://shootout.alioth.debian.org/u32/code-used-time-used-shapes.php

Michael Gardner

unread,
Feb 13, 2011, 12:51:54 AM2/13/11
to clo...@googlegroups.com

Those are all good, but how many people actually look at anything beyond the fastest implementation for a given language? To some extent this is their problem, sure, but it's not helped by the prominence the shootout site gives to the fastest implementations in various places.

In the first place, I'm not even sure what the purpose of all that data is. What do people do with it? Is it purely for entertainment, or do people actually use it to decide what language to use for a project? If it's just for entertainment (as the word "game" seems to imply), won't people just skim the numbers for the fastest programs and walk away with wrong impressions?

In my opinion, it's simply not meaningful nor interesting to compare the fastest implementations across different languages. But by presenting those comparisons prominently (even when it also provides other more meaningful comparisons elsewhere), the shootout site does disservice to its users and the programming community by providing a convenient yet misleading focus point.

Perhaps each language could have one "best" implementation selected by the community or by a panel of judges; then the default or most prominent comparisons could use these "best" implementations rather than the fastest ones. Either way, it would be good to remove those fastest-vs-fastest comparisons altogether, or at least reduce their prominence significantly.

Isaac Gouy

unread,
Feb 13, 2011, 3:57:11 AM2/13/11
to Clojure
On Feb 12, 9:51 pm, Michael Gardner <gardne...@gmail.com> wrote:
-snip-

> Those are all good, but ...

Before I pointed out that code size was already shown, you thought
just including such a measure would possibly "do the trick" but now
that seems not to be good enough for you.


> how many people actually look at anything beyond the fastest implementation for a given language? To some extent this is their problem, sure, ...

Their problem? Isn't it their choice?


> but it's not helped by the prominence the shootout site gives to the fastest implementations in various places.

The site name is given prominence - The Computer Language Benchmarks
Game - but if that isn't what someone wants to see they'll use a
different name.


> In the first place, I'm not even sure what the purpose of all that data is.

There's a response to that at the very top of the Help page.

-snip-
> If it's just for entertainment (as the word "game" seems to imply),

There's a response to that misunderstanding at the foot of the Help
page.



> In my opinion, it's simply not meaningful nor interesting to compare the fastest implementations across different languages.
-snip-

You're welcome to your opinion.


> Perhaps each language could have one "best" implementation selected by the community or by a panel of judges; then the default or most prominent comparisons could use these "best" implementations rather than the fastest ones.
-snip-

I look forward to seeing you publish the results of your proposed
project.
Reply all
Reply to author
Forward
0 new messages