For performance, write it in C - Part 3, Source code now available

Peter Hickman

unread,

Aug 1, 2006, 4:48:41 AM8/1/06

to

The source code is available from
http://peterhi.dyndns.org/write_it_in_c/index.html

Chad Perrin

unread,

Aug 1, 2006, 5:01:24 AM8/1/06

to

On Tue, Aug 01, 2006 at 05:48:41PM +0900, Peter Hickman wrote:
> The source code is available from
> http://peterhi.dyndns.org/write_it_in_c/index.html

Great! Thanks.

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
Brian K. Reid: "In computer science, we stand on each other's feet."

Jeffrey Schwab

unread,

Aug 1, 2006, 9:10:59 AM8/1/06

to

Peter Hickman wrote:
> The source code is available from
> http://peterhi.dyndns.org/write_it_in_c/index.html

Thanks. I had flagged the original thread as one to go back and read
once the dust had settled, but the web page looks a lot more user-friendly.

Isaac Gouy

unread,

Aug 1, 2006, 12:17:43 PM8/1/06

to

Peter Hickman wrote:
> The source code is available from
> http://peterhi.dyndns.org/write_it_in_c/index.html

There are some details missing from the webpages
1) which C implementation?
2) which Java implementation?
3) what hardware?

For example, using the code from the webpage

1) gcc version 3.3.6 (Gentoo 3.3.6, ssp-3.3.6-1.0, pie-8.7.8)

gcc -pipe -Wall -O3 -fomit-frame-pointer -funroll-loops -march=pentium4
latin.c -o latinc
time ./latinc > /dev/null 2>&1

user 0m0.820s
sys 0m0.000s

2) /sun-jdk-1.5.0.07/bin/javac Latin.java
time java Latin > /dev/null 2>&1

user 0m3.800s
sys 0m0.644s

3) 2GHz Intel P4

Peter Hickman

unread,

Aug 1, 2006, 12:33:11 PM8/1/06

to

Isaac Gouy wrote:
> Peter Hickman wrote:
>
>> The source code is available from
>> http://peterhi.dyndns.org/write_it_in_c/index.html
>>
>
> There are some details missing from the webpages
> 1) which C implementation?
>

[peterhickman]$ gcc -v
Using built-in specs.
Target: powerpc-apple-darwin8
Configured with: /private/var/tmp/gcc/gcc-5341.obj~1/src/configure
--disable-checking -enable-werror --prefix=/usr --mandir=/share/man
--enable-languages=c,objc,c++,obj-c++
--program-transform-name=/^[cg][^.-]*$/s/$/-4.0/
--with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib
--build=powerpc-apple-darwin8 --host=powerpc-apple-darwin8
--target=powerpc-apple-darwin8
Thread model: posix
gcc version 4.0.1 (Apple Computer, Inc. build 5341)

> 2) which Java implementation?
>
[peterhickman]$ javac -version
javac 1.5.0_06

Additionally

[peterhickman]$ perl -V
Summary of my perl5 (revision 5 version 8 subversion 6) configuration:

[peterhickman]$ ruby -v
ruby 1.8.4 (2005-12-24) [powerpc-darwin]

> 3) what hardware?
>
>
Macintosh G4 with 1Gb ram

Isaac Gouy

unread,

Aug 1, 2006, 1:30:13 PM8/1/06

to

Peter Hickman wrote:
> Isaac Gouy wrote:
> > Peter Hickman wrote:
> >
> >> The source code is available from
> >> http://peterhi.dyndns.org/write_it_in_c/index.html

I recall someone stating "benchmarking without analysis is bogus".

As a first step, comment out the print statements

time ./latinc

user 0m0.492s
sys 0m0.004

time java Latin

user 0m0.992s
sys 0m0.052s

With the print statements ~5.4x
without print statements ~2.1x

iirc the Java program is shuffling around double byte unicode chars and
the C program is handling single byte chars.

csaba

unread,

Aug 1, 2006, 1:52:29 PM8/1/06

to

Peter Hickman wrote:
> The source code is available from
> http://peterhi.dyndns.org/write_it_in_c/index.html

Hmm, it would look much better for me if you had included Kristof
Bastiaensen's Curry version
into the party... Heck, if that code is not smart then nothing is, and
in a way, it's a much more
interesting question how a compiled functional language compares to
compiled imepative ones
than the thousand time discussed interpreted vs. compiled match-up.

Yes, Curry is anything but mainstream, but you can't say a decent
compiler is not at your disposal.
The Munster CC (which was used by Kristof, and AFAIK that's the most
commonly used implementation) does even have an IDE for OS X.

Regards,
Csaba

Thomas E Enebo

unread,

Aug 1, 2006, 3:32:06 PM8/1/06

to

On Wed, 02 Aug 2006, Charles O Nutter defenestrated me:
>
> First, some notes on benchmarking:
>
> - NEVER include IO when benchmarking a numeric algorithm; IO speeds vary
> greatly from system to system and can vary from run to run depending on what
> else is happening

IO can be noisy. I say avoid it for any benchmarking since it can
greatly influence timings. Usually the IO is not what you want to
measure so why add this variable into things?

> - If you're building up a large chunk of strings, write to an internal
> buffer and then write out at the end; don't write for every little tiny
> operation. At the very least, use a buffer per-line, rather than a separate
> write for every element on that line.

I just informally thought I would measure a few things involving IO.
I only changed the printing and nothing else:

Unaltered test: ~3.8s
Use of StringBuffer to print out a single row: ~2.1s
Use of StringBuffer for entire run: ~1.5s
Preallocated StringBuffer for entire run: ~1.4s

As you can see IO can have a large affect on clock time. I demonstrated
that in Java's case the IO in the benchmark accounted for over 2/3 of the
wall clock time (which is interesting because a decent chunk that is
left over is JVM startup overhead).

Some stack allocated space will likely improve the C run as well (and in
this case you can output it in a single write system call).

-Tom

Alex Young

unread,

Aug 1, 2006, 3:51:23 PM8/1/06

to

While I certainly appreciate the efforts that are going into this, I
can't help feeling it's all completely irrelevant.

We can engage in cross-implementation pissing contests until the cows
come home. None of them help make Ruby any faster.

My question to the community: is there a comprehensive benchmark suite
*for Ruby alone* that we can use to tweak compilation settings, try out
different core algorithms, and improve what is currently an improvable
situation?

If not, would a port of pybench.py be a suitable start?

--
Alex

Isaac Gouy

unread,

Aug 1, 2006, 3:57:13 PM8/1/06

to

As you're having so much fun, let me suggest you try converting the
OutputStrings to byte-arrays, and pre-allocating a byte buffer for
output like the approach taken with this program
http://shootout.alioth.debian.org/gp4/benchmark.php?test=fasta&lang=java&id=2

Ola Bini

unread,

Aug 1, 2006, 4:10:57 PM8/1/06

to

Charles O Nutter wrote:

> On 8/1/06, Alex Young <al...@blackkettle.org> wrote:
>>
>> While I certainly appreciate the efforts that are going into this, I
>> can't help feeling it's all completely irrelevant.
>
>

> My only purpose in battling these benchmarks is to help dispel the rumors
> that "Java is slow," "VMs are slow," and so on. If Ruby does move to a real
> optimizing VM, it will be a good thing...all those folks who continue to
> think that VMs are inherently bad need to join the 21st century.
>

.. Which is extremely funny, since Common Lisp have had wicked fast
virtual machines for the last 15 years (on par with C in performance).

They should catch up with the 20th century first of all. =)

--
Ola Bini (http://ola-bini.blogspot.com)
JvYAML, RbYAML, JRuby and Jatha contributor
System Developer, Karolinska Institutet (http://www.ki.se)
OLogix Consulting (http://www.ologix.com)

"Yields falsehood when quined" yields falsehood when quined.

Chad Perrin

unread,

Aug 1, 2006, 5:09:45 PM8/1/06

to

On Wed, Aug 02, 2006 at 05:04:26AM +0900, Charles O Nutter wrote:
> On 8/1/06, Alex Young <al...@blackkettle.org> wrote:
> >

> >While I certainly appreciate the efforts that are going into this, I
> >can't help feeling it's all completely irrelevant.
>
>

> My only purpose in battling these benchmarks is to help dispel the rumors
> that "Java is slow," "VMs are slow," and so on. If Ruby does move to a real
> optimizing VM, it will be a good thing...all those folks who continue to
> think that VMs are inherently bad need to join the 21st century.

. . but VMs actually are slow (to start, all else being equal).
There's a trade-off, though, and VMs tend to be faster later on in
execution for extended operations (again, all else being equal). There
are other alternatives than VMs to consider, though, and the specifics
of what one wishes to accomplish should be examined before settling on
the VM (or any other implementation style) as "the answer".

I'm kinda just babbling at this point.

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]

"The first rule of magic is simple. Don't waste your time waving your
hands and hopping when a rock or a club will do." - McCloctnick the Lucid

Peter Hickman

unread,

Aug 2, 2006, 3:45:22 AM8/2/06

to

Charles O Nutter wrote:
> Ok, so there's a bunch of problems with the Java version.
>
> - In addition to the addRow run and the Java startup time you're also
> benchmarking over 5200 array modifications to set Compared values to true

That was simple because I couldn't define the array when I declared it
as I did in C.

> - Your variable naming is entirely contrary to every Java coding
> convention
> published (not a benchmark thing, but it sets off any Java devs warning
> flags)

And this affects the performance?

> - Almost all of the time spent running is spent building and printing
> strings
>
> Benchmarking just the algorithm run itself with no gross string
> creation and
> printing, I'm getting in the neighborhood of 370ms per invocation once
> HotSpot has optimized the code. I'll have more detailed numbers shortly.
>

Ola Bini

unread,

Aug 2, 2006, 3:50:50 AM8/2/06

to

Peter Hickman wrote:

> Charles O Nutter wrote:
>
>> - Your variable naming is entirely contrary to every Java coding
>> convention
>> published (not a benchmark thing, but it sets off any Java devs warning
>> flags)
>
> And this affects the performance?
>

The point Charles made with saying "but it sets off any Java devs
warning flags" is that your Java coding conventions differ so much from
regular conventions that your Java coding capacity is put into doubt. In
plain speak; are you a good enough Java programmer to write an honest
benchmark version for Java?

Peter Hickman

unread,

Aug 2, 2006, 3:52:49 AM8/2/06

to

The output of my original C and Java versions are identical. If you
write the output to a file a straight diff -s should suffice.

Peter Hickman

unread,

Aug 2, 2006, 4:01:19 AM8/2/06

to

Ola Bini wrote:
> Peter Hickman wrote:
>> Charles O Nutter wrote:
>>
>>> - Your variable naming is entirely contrary to every Java coding
>>> convention
>>> published (not a benchmark thing, but it sets off any Java devs warning
>>> flags)
>>
>> And this affects the performance?
>>
>
> The point Charles made with saying "but it sets off any Java devs
> warning flags" is that your Java coding conventions differ so much
> from regular conventions that your Java coding capacity is put into
> doubt. In plain speak; are you a good enough Java programmer to write
> an honest benchmark version for Java?
>

The question is the code itself, the names can always be renamed. Does
the code suddenly become better if I had used the regular conventions?

Does the naming convention affect performance? NO!

So having brought up such a lame point we are now trying to deflect
attention from the code by rubbishing the author. Focus on the code, not
me or whatever fantasy you have about me.

Christian Neukirchen

unread,

Aug 2, 2006, 4:43:22 AM8/2/06

to

Ola Bini <ola....@ki.se> writes:

> Charles O Nutter wrote:
>> On 8/1/06, Alex Young <al...@blackkettle.org> wrote:
>>>
>>> While I certainly appreciate the efforts that are going into this, I
>>> can't help feeling it's all completely irrelevant.
>> My only purpose in battling these benchmarks is to help dispel the
>> rumors
>> that "Java is slow," "VMs are slow," and so on. If Ruby does move to a real
>> optimizing VM, it will be a good thing...all those folks who continue to
>> think that VMs are inherently bad need to join the 21st century.
>>
>

> ... Which is extremely funny, since Common Lisp have had wicked fast

> virtual machines for the last 15 years (on par with C in performance).
>
> They should catch up with the 20th century first of all. =)

Sorry, but which CL has a "wicked fast" *virtual machine* and is on
par with C? AFAICS, they either are one or the other...

--
Christian Neukirchen <chneuk...@gmail.com> http://chneukirchen.org

M. Edward (Ed) Borasky

unread,

Aug 2, 2006, 9:43:51 AM8/2/06

to

Christian Neukirchen wrote:
>>> ... Which is extremely funny, since Common Lisp have had wicked fast
>>> virtual machines for the last 15 years (on par with C in performance).
>>>
>>> They should catch up with the 20th century first of all. =)
>>>
>
> Sorry, but which CL has a "wicked fast" *virtual machine* and is on
> par with C? AFAICS, they either are one or the other...
>

I don't recall which of the "big four" open source CLs have virtual
machines, if any. IIRC the "standard Lisp benchmarks" are pretty much a
dead heat on GCL vs. CMUCL, with Clisp coming in third. When SBCL "grows
up", it will probably be as fast as GCL or CMUCL. My recollection is
that the CMUCL *compiler* is on a par with C in terms of generating x86
machine code.

"scheme48" has a virtual machine, IIRC, and it's regarded highly by the
folks who did the "vmgen" and "gforth" virtual machines, which are bred
for speed using some non-ANSI features of GCC. But that's not Common
Lisp; Scheme is in some ways easier to make "wicked fast".

Kristof Bastiaensen

unread,

Aug 2, 2006, 12:09:32 PM8/2/06

to

On Tue, 01 Aug 2006 17:48:41 +0900, Peter Hickman wrote:

> The source code is available from
> http://peterhi.dyndns.org/write_it_in_c/index.html

Here is another version in Ruby. It uses a more clever algorithm.
It makes use of the fact that permuting the rows or the columns of
a latin square still gives a latin square. It also passes a constraint
on the columns that the given number can be in, to reduce the backtracking
even more.

It runs in 15 seconds on my machine, which I find quite acceptable.

I am sure that this algorithm coded in a compiled functional or logic
language would be about as fast, or faster than your C code, but with the
advantages of a higher level language.

I still think you benchmark is flawed, because
1) the C program works only for squares of size 5
2) It shows totally no relevance to the kind of problems that you
would use Ruby for. If you want raw speed for numerical applications,
then I would suggest to use a functional language (i.e. ocaml).
3) If your application runs slowly, it is probably better to think first
about why it is slow, if you can improve the code with faster
algorithms or removing bottlenecks.

Kristof Bastiaensen

---------------- latin2.rb --------------------
require 'permutation'

$size = (ARGV.shift || 5).to_i
MaxVal = $size-1

RowPermutations = Permutation.new($size).map{|p| p.value}
BaseStr = (2..$size).to_a.join
StrPermutations = Permutation.new($size-1).map{|p| p.project(BaseStr)}

StartColumns = (1..MaxVal).to_a
def init_columns(el)
a = StartColumns.dup
a.delete_at(el-1)
return a
end

def insert(sqr, num, row, columns)
insert(sqr, num, row+1, columns) if (row == num)
columns.each do |col|
next if sqr[row][col] != ?.
sqr[row][col] = num + ?1
if (row == MaxVal)
insert(sqr, num+1, 1, init_columns(num+1))
elsif (num == MaxVal && row == MaxVal - 1)
print_solution(sqr)
else
insert(sqr, num, row+1, columns - [col])
end
sqr[row][col] = ?.
end
end

def print_solution(sqr)
RowPermutations.each do |rp|
StrPermutations.each do |sp|
0.upto(MaxVal) do |r|
print sqr[rp[r]].tr(BaseStr, sp)
print ":"
end
puts
end
end
end

$square = [("1" + BaseStr)] +
Array.new($size-1) { |i| (i+2).to_s + "." * ($size - 1) }

insert($square, 0, 1, StartColumns)

Peter Hickman

unread,

Aug 2, 2006, 12:08:21 PM8/2/06

to

Here's how it goes on my box:

[Latin]$ time ruby latin2.rb > r5

real 0m16.283s
user 0m15.380s
sys 0m0.498s

Which means that your computer is about as crap as mine :) I will add it
to the pages and make it a download. I think I need a table summarising
the timings.

William James

unread,

Aug 2, 2006, 12:24:11 PM8/2/06

to

Kristof Bastiaensen wrote:
> On Tue, 01 Aug 2006 17:48:41 +0900, Peter Hickman wrote:
>
> > The source code is available from
> > http://peterhi.dyndns.org/write_it_in_c/index.html
>
> Here is another version in Ruby. It uses a more clever algorithm.
> It makes use of the fact that permuting the rows or the columns of
> a latin square still gives a latin square. It also passes a constraint
> on the columns that the given number can be in, to reduce the backtracking
> even more.
>
> It runs in 15 seconds on my machine, which I find quite acceptable.
>
> I am sure that this algorithm coded in a compiled functional or logic
> language would be about as fast, or faster than your C code, but with the
> advantages of a higher level language.
>
> I still think you benchmark is flawed, because
> 1) the C program works only for squares of size 5
> 2) It shows totally no relevance to the kind of problems that you
> would use Ruby for. If you want raw speed for numerical applications,
> then I would suggest to use a functional language (i.e. ocaml).

Here's an OCaml version that runs in about 1.5 seconds when
output is redirected to a file on my faster computer (3.2 GHz).
It uses the same algorithm as the C program.
The C version takes 1.961 seconds when writing to /dev/null on
the o.p.'s computer. Since this is my first OCaml program, I'm
certain an expert could improve it.

(* compile with:
ocamlopt -unsafe -inline 100 latin-squares.ml -o latin-squares.exe
*)

(* permutation code by Eric C. Cooper *)
let rec distribute elt = function
| (hd :: tl) as list -> (elt :: list) ::
(List.map (fun x -> hd :: x) (distribute elt tl))
| [] -> [ [elt] ]
let rec permute = function
| x :: rest -> List.flatten (List.map (distribute x) (permute rest))
| [] -> [ [] ] ;;

let list = [ 1; 2; 3; 4; 5 ]
let size = List.length list
let perms = permute list

let n = List.length perms
let incompat = Array.make_matrix n n true ;;

for x = 0 to n - 1 do
for y = 0 to n - 1 do
incompat.(x).(y) <-
List.exists2 ( = ) (List.nth perms x) (List.nth perms y)
done
done;;

let join list = String.concat "" (List.map string_of_int list)
let output_strings = List.map join perms ;;
let board = ( Array.make size 0 ) ;;

let rec add_a_row row =
if row = size then
print_endline
( String.concat ":"
(List.map (fun i -> List.nth output_strings i)
(Array.to_list board)) )
else
for latest = 0 to n - 1 do
let prev_row = ref 0 in
while (!prev_row < row) &&
(not incompat.(latest).(board.(!prev_row))) do
incr prev_row
done;
if !prev_row = row then
( board.(row) <- latest ; add_a_row (row + 1) )
done
;;

add_a_row 0 ;;

Isaac Gouy

unread,

Aug 2, 2006, 1:57:56 PM8/2/06

to

Peter Hickman wrote:
> Charles O Nutter wrote:
> > Ok, so there's a bunch of problems with the Java version.
> >
> > - In addition to the addRow run and the Java startup time you're also
> > benchmarking over 5200 array modifications to set Compared values to true
>
> That was simple because I couldn't define the array when I declared it
> as I did in C.

1) I made some changes to the jen.pl script you used to generate the
Java program, it now splits apart the array initialization into many
different methods - and that will let you generate a Java program for
6x6. I emailed it to you.

2) By "benchmarking without analysis is bogus" I understand that
without analysis the programs might not be doing what we think they are
doing - what we think is an apples to apples comparison might not be an
apples to apples comparison.

In this case, I understand Charles O Nutter to be saying that in C all
those boolean arrays are initialized at compile time, but in Java all
those boolean arrays are initialized at runtime - so although we're
writing a similar thing in the code, we aren't doing the same thing
when the code is run.

In the same way, we have similar print statements in the C and Java
code, but they don't do the same thing when the code is run - one deals
with bytes the other deals with Unicode double bytes.

So it depends which part of the computation we're interested in - if
we're interested in the addARow recursive calls and loops, that part of
the computation might be swamped by the time taken to to initialize the
arrays at runtime and naively print strings.

3) There seems to be a bigger problem with using this latin squares
algorithm for benchmarking - the computation explodes!

0.496s gcc 5x5 (without print statements)
30055.098s gcc 6x6 (without print statements)

William James

unread,

Aug 2, 2006, 3:34:47 PM8/2/06

to

An expert did suggest replacing a couple of lists with arrays.
This should be a bit faster.

(* compile with:
ocamlopt -unsafe -inline 100 latin-squares.ml -o latin-squares.exe *)

(* permutation code by Eric C. Cooper *)
let rec distribute elt = function
| (hd :: tl) as list -> (elt :: list) ::
(List.map (fun x -> hd :: x) (distribute elt tl))
| [] -> [ [elt] ]
let rec permute = function
| x :: rest -> List.flatten (List.map (distribute x) (permute rest))
| [] -> [ [] ] ;;

let list = [ 1; 2; 3; 4; 5 ]
let size = List.length list

let perms = Array.of_list (permute list)
let n = Array.length perms

let incompat = Array.make_matrix n n true ;;

for x = 0 to n - 1 do
for y = 0 to n - 1 do
incompat.(x).(y) <-

List.exists2 ( = ) perms.(x) perms.(y)
done
done;;

let join list = String.concat "" (List.map string_of_int list)

let output_strings = Array.map join perms ;;

let board = ( Array.make size 0 ) ;;

(* recursive function *)

let rec add_a_row row =
if row = size then
print_endline
( String.concat ":"

(List.map (fun i -> output_strings.(i) )

sross

unread,

Aug 2, 2006, 5:12:17 PM8/2/06

to

M. Edward (Ed) Borasky wrote:
> Christian Neukirchen wrote:
> >>> ... Which is extremely funny, since Common Lisp have had wicked fast
> >>> virtual machines for the last 15 years (on par with C in performance).
> >>>
> >>> They should catch up with the 20th century first of all. =)
> >>>
> >
> > Sorry, but which CL has a "wicked fast" *virtual machine* and is on
> > par with C? AFAICS, they either are one or the other...
> >
> I don't recall which of the "big four" open source CLs have virtual
> machines, if any. IIRC the "standard Lisp benchmarks" are pretty much a
> dead heat on GCL vs. CMUCL, with Clisp coming in third. When SBCL "grows
> up", it will probably be as fast as GCL or CMUCL. My recollection is
> that the CMUCL *compiler* is on a par with C in terms of generating x86
> machine code.

Actually SBCL is a fork from CMUCL (with a focus on maintanability) and
as such uses the same compiler.
Compiler improvements are often ported between the two implementations
and when it comes to actual performance there is not much between
the two.

M. Edward (Ed) Borasky

unread,

Aug 2, 2006, 9:42:11 PM8/2/06

to

William James wrote:
> Here's an OCaml version that runs in about 1.5 seconds when
> output is redirected to a file on my faster computer (3.2 GHz).
> It uses the same algorithm as the C program.
> The C version takes 1.961 seconds when writing to /dev/null on
> the o.p.'s computer. Since this is my first OCaml program, I'm
> certain an expert could improve it.
>
>
> An expert did suggest replacing a couple of lists with arrays.
> This should be a bit faster.
>

*This* expert suggests comparing performance between the C version and
the Ocaml version on the *same* machine! :)

Jon Harrop

unread,

Aug 3, 2006, 12:22:23 AM8/3/06

to

M. Edward (Ed) Borasky wrote:

> *This* expert suggests comparing performance between the C version and
> the Ocaml version on the *same* machine! :)

On my machine (Athlon X2 4400+, Debian Linux, gcc 4.0.4, OCaml 3.09.2, Sun
JDK 1.5):

Compile Run
C 0.137s 0.386s gcc -O3 -Wall
OCaml 0.171s 0.668s ocamlopt
OCaml 0.161s 0.626s ocamlopt -inline 100 -unsafe
OCaml2 0.165s 0.415s ocamlopt
OCaml2 0.165s 0.401s ocamlopt -unsafe
Java 1.565s 1.688s javac

Some niggles:

1. A lot of precalculation has been done in the C and Java, to the extent
that we're breaking Java compilers (always a bad sign).

2. Java is nothing like as slow as the OP claimed.

3. OCaml is only 4% slower than C and only 10% slower with bounds checking.

4. Whether Ruby is fast enough or not, you should be programming in OCaml
and not in Ruby or C. ;-)

Here's my OCaml2 (based upon William's):

let rec fact n = if n=0 then 1 else n*fact(n-1)

let size = 5
let n = fact size

(* Permutation generator *)
let p = Array.make size 0
let xx = Array.init size (fun i -> if i<size-1 then 0 else -1)
let gen_perm() =
let i = ref (size - 1) in
while !i > 0 && xx.(!i) = !i do
xx.(!i) <- 0;
decr i
done;
if !i = 0 then 1 else begin
xx.(!i) <- xx.(!i) + 1;
p.(0) <- 1;
for i=0 to size - 1 do
p.(i) <- p.(i - xx.(i));
p.(i - xx.(i)) <- i+1
done;
0
end

(* Permutations *)
let perms = Array.init n (fun _ -> ignore(gen_perm()); Array.copy p)

let incompat =
let aux x y =
Array.iteri (fun i px -> if px = perms.(y).(i) then raise Exit) perms
(x);
false in
Array.init n (fun x -> Array.init n (fun y -> try aux x y with Exit ->
true))

let join list = String.concat "" (Array.to_list (Array.map string_of_int
list))

let output_strings = Array.map join perms

let board = Array.make size 0

let op = String.make (size*(size+1)-1) ':'

let rec add_a_row row =

if row = size then begin
for i=0 to size-1 do
String.blit output_strings.(board.(i)) 0 op (i*(size+1)) size
done;
print_string op;
print_string "\n"
end else

for latest = 0 to n - 1 do
let prev_row = ref 0 in

let incompat = incompat.(latest) in
while !prev_row < row && not incompat.(board.(!prev_row)) do
incr prev_row
done;
if !prev_row = row then begin

board.(row) <- latest;
add_a_row (row + 1)

end
done

let () = add_a_row 0

--
Dr Jon D Harrop, Flying Frog Consultancy
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists

Peter Hickman

unread,

Aug 3, 2006, 4:04:57 AM8/3/06

to

I have run the first Ocaml version on the same box as all the other
timings have come from and it sits between the C version with default
compiler settings and the C version optimised for speed. The web page
has been updated to include the Ocaml source and Kristof Bastiaensen's
faster Ruby version. Plus a simple table so that you can view all the
results.

I will take the revised Ocaml version and time it and check the output
tonight.

Peter Hickman

unread,

Aug 3, 2006, 4:08:11 AM8/3/06

to

I only have this on my box.

[peterhickman]$ ocaml -version
The Objective Caml toplevel, version 3.09.2

Is this good enough for the version you wrote?

Peter Hickman

unread,

Aug 3, 2006, 4:30:49 AM8/3/06

to

Charles O Nutter wrote:
> I'd be interested in seeing all of these run without IO as well...that
> was
> the big bottleneck for Java.

Can we just clear something up here. The Latin squares mini project was
not plucked out of the air just to be a poster child for my first post.
It was part of a project that I am working on and as such the program
has to produce some output! It does not make any sense to optimise the
program in any language if we are not going to produce any output.
Otherwise I would go with this heavily optimised C version:

[peterhickman]$ cat latin.c
int
main(int argc, char *argv[])
{
return 0;
}

This is an application, it has to produce output. Cease this willy
waving otherwise you are saying "For performance, remove the output".

Chad Perrin

unread,

Aug 3, 2006, 6:22:19 AM8/3/06

to

On Thu, Aug 03, 2006 at 01:30:04PM +0900, Jon Harrop wrote:
>
> 4. Whether Ruby is fast enough or not, you should be programming in OCaml
> and not in Ruby or C. ;-)

Actually, OCaml is one of the languages in which I've been meaning to
get competent at some point in the hopefully-near future (for some
definition of "near"). I find that I pick up a language faster if
there's a good community for it whose means of communication suits the
way I think, though. What sort of online OCaml community resources are
available? Perl's biggest strength, in terms of interactive community,
seems (for my tastes) to be the PerlMonks site, and this list fills the
same need for Ruby. What comparable community interaction mechanism
could I use for OCaml?

I'm often disappointed by the options for a given language. For
instance, I recently (earlier this year) added another language to my
list of loves: the UCBLogo implementation of the Logo language (more a
"Lisp without parentheses" than any other Logo of which I'm aware, as it
provides macro support). It has no online community at all, except for
a bunch of half-baked educators who want to use Turtle Graphics to teach
elementary and middle school kids that computers aren't scary, which is
definitely not the community I want for my own purposes. My issue with
the Python community is far less drastic -- it actually has one of the
better programming language online communities, all things considered,
of course -- but the fact that the cultural rigidity of the Python
community seems so pervasive drives me away (as does that
fork-in-the-eye sensation I get when reading Python source). Et cetera.
Both Ruby and Perl are stand-out examples of excellence for online
communities, and while I don't need the perfection of either
necessarily, I hope OCaml provides something close.

So. Does it?

Wow, that was a lot of rambling for that question. Just to keep it
on-topic:

Thanks for being a good community, guys.

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]

"A script is what you give the actors. A program
is what you give the audience." - Larry Wall

Jon Harrop

unread,

Aug 3, 2006, 8:29:01 AM8/3/06

to

You probably also have:

$ ocamlopt -v
The Objective Caml native-code compiler, version 3.09.2
Standard library directory: /usr/lib/ocaml/3.09.2

In which case you just do:

$ ocamlopt latin.ml -o latin
$ time ./latin >/dev/null

Jon Harrop

unread,

Aug 3, 2006, 8:29:39 AM8/3/06

to

Peter Hickman wrote:
> I will take the revised Ocaml version and time it and check the output
> tonight.

Yeah. I didn't bother doing that. ;-)

Peter Hickman

unread,

Aug 3, 2006, 8:37:59 AM8/3/06

to

Yup, got that. Will time and test it tonight.

Peter Hickman

unread,

Aug 3, 2006, 8:59:40 AM8/3/06

to

Jon Harrop wrote:
> Peter Hickman wrote:
>
>> I will take the revised Ocaml version and time it and check the output
>> tonight.
>>
>
> Yeah. I didn't bother doing that. ;-)
>
>

Not wanting to name and shame but there was one submission (off list)
that was faster for completely the wrong reasons. Besides I get a warm
fuzzy feeling when the results match because the pain of checking which
program is actually wrong is too great. As my Perl, Java and C versions
all use the same Perl to pre compute their values they all produced the
same results, consistent yes but are they correct? I'm pretty confident
now, especially now that I have independently written programs produce
the same results.

The next stage, which is to stop it reporting rotational and flipped
versions of a grid, is going to be quite a grind to test.

Any views on Practical Ocaml by Joshua B. Smith, does it look like it
might be a good book to learn from?

Kristof Bastiaensen

unread,

Aug 3, 2006, 9:38:11 AM8/3/06

to

On Thu, 03 Aug 2006 01:08:21 +0900, Peter Hickman wrote:

> Here's how it goes on my box:
>
> [Latin]$ time ruby latin2.rb > r5
>
> real 0m16.283s
> user 0m15.380s
> sys 0m0.498s
>
> Which means that your computer is about as crap as mine :)

Yes. I was really suprised to see that your timings where longer than
mine :) But I am happy with the speed of my computer, though emacs
startup is a bit slow... :)

> I will add it
> to the pages and make it a download. I think I need a table summarising
> the timings.

I just realised that it is better to move the permutations on the
row string (the ones that use String#tr) outside the row permutations,
to avoid recalculating the rows each time. The only method I changed is
print_solution. The program runs almost twice as fast now! (0m7.8s on my
computer).

Regards,
Kristof

----------------- latin.rb, version 2 -----------------------

require 'permutation'

def print_solution(sqr)
StrPermutations.each do |sp|
newsqr = sqr.map { |r| r.tr(BaseStr, sp) }
RowPermutations.each do |rp|
rp.each do |r|
print newsqr[r]

Peter Hickman

unread,

Aug 3, 2006, 9:27:56 AM8/3/06

to

I will add it to the page.

Kristof Bastiaensen

unread,

Aug 3, 2006, 10:40:02 AM8/3/06

to

On Thu, 03 Aug 2006 21:59:40 +0900, Peter Hickman wrote:

> Jon Harrop wrote:
>> Peter Hickman wrote:
>>
>>> I will take the revised Ocaml version and time it and check the output
>>> tonight.
>>>
>>>
>> Yeah. I didn't bother doing that. ;-)
>>
>>
> Not wanting to name and shame but there was one submission (off list)
> that was faster for completely the wrong reasons. Besides I get a warm
> fuzzy feeling when the results match because the pain of checking which
> program is actually wrong is too great. As my Perl, Java and C versions
> all use the same Perl to pre compute their values they all produced the
> same results, consistent yes but are they correct? I'm pretty confident
> now, especially now that I have independently written programs produce
> the same results.
>

> <snip>

I used the following tests:

"ruby latin.rb | wc -l" to test if the number of squares matches the
number that is written in the wikipedia article.
Then "ruby latin.rb | sort | uniq | wc -l" shows if all the squares that
are output by the program are different.
All that remains to be done is to check that each latin square is
actually a latin square. That wouldn't be so much work, but I just looked
at some of the output, and trusted that the rest of the squares are
correct too :)

Regards,
Kristof

Peter Hickman

unread,

Aug 3, 2006, 10:49:17 AM8/3/06

to

The thing that was bothering me was "have I missed any?" Sure the
results I had were correct but I was unsure if they were complete. For
example the Perl permutation module and the Ruby one could perhaps share
some libpermute.so that I was unaware of and therefore harbour the same
error - this is not the case however. Or even be both developed from the
same source. Paranoia I know but many libraries and modules are based on
existing code rather than developed from scratch so bring subtle bugs
with them. Like the one recently found in the binary search, see
http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html.

But given that there is no relationship between the Perl, Ruby and Ocaml
versions I am sure that there will be no surprises.

wneu...@gmail.com

unread,

Aug 3, 2006, 11:35:19 AM8/3/06

to

Chad Perrin wrote:

> I find that I pick up a language faster if
> there's a good community for it whose means of communication suits the
> way I think, though. What sort of online OCaml community resources are
> available?

Primarily, the online resources come from one of the two mailing lists,
OCaml-Beginners[1] for introductory to intermediate questions, and the
main OCaml list[2] for intermediate to advanced questions, or questions
on the OCaml internals and whatnot. There are a few other places where
info moves, like Richard Jones' cocan wiki [3], but at the moment
things are primarily handled through the lists (though it might be
about time for something like perlmonks to arrive as the community is
growing).

[1] http://groups.yahoo.com/group/ocaml_beginners/
[2] http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
[3] http://wiki.cocan.org/

wneu...@gmail.com

unread,

Aug 3, 2006, 11:49:58 AM8/3/06

to

Peter Hickman wrote:

> Any views on Practical Ocaml by Joshua B. Smith, does it look like it
> might be a good book to learn from?

Hard to say, as it won't be available for another three months, and I
can't even find a sample table of contents...

However, in the meantime, some of the best resources are freely
available. In addition to parts I, III, and IV of the OCaml manual, I
recommend anyone starting with OCaml to begin with Jason Hickey's book
[1], and then look at the translation of the older French O'Reilly book
[2] (note however, that the book is a bit out of date, but this only
crops up as a real issue a few times -- feel free to hit the
ocaml-beginners mailing list [3] with questions). Richard Jones also
has a pretty nice tutorial available online [4]. Once you get going,
there are more specific resources for tools like OCamllex, ocamlyacc,
and camlp4.

[1] http://files.metaprl.org/doc/ocaml-book.pdf
[2] http://caml.inria.fr/pub/docs/oreilly-book/
[3] http://groups.yahoo.com/group/ocaml_beginners/
[4] http://www.ocaml-tutorial.org/

Isaac Gouy

unread,

Aug 3, 2006, 12:20:12 PM8/3/06

to

1) Over the past 3 or 4 days, people have said what's wrong with the
way you're producing output in Java, and suggested what you should do
instead. Is it that you don't know enough Java to follow their
suggestions and fix the program?

A couple of days ago single measurements on my machine gave
0.820s user+sys, gcc -pipe -Wall -O3 -fomit-frame-pointer
-funroll-loops -march=pentium4
4.444s user+sys, sun-jdk-1.5.0.07

Simply writing Java that converts the double-byte Unicode Java
outputStrings to single-byte /once/ instead of everytime there's a
print statement, and following the usual Java pattern of wrapping
output with a BufferedOutputStream, reduces that to

1.396s user+sys, sun-jdk-1.5.0.07

Yesterday I emailed you another version of your Java generating Perl
script that fixes those print problems, the output matches the C
program output.

2) If this is an application, and it has to produce output, and it has
to produce output for something larger than 6x6, then I think you need
a better algorithm.

You've mentioned the 5x5 C program took 2.473s, and the 6x6 took 9900s
- at that rate might we expect 7x7 in 15 months?

Peter Hickman

unread,

Aug 3, 2006, 12:53:41 PM8/3/06

to

Isaac Gouy wrote:
> 1) Over the past 3 or 4 days, people have said what's wrong with the
> way you're producing output in Java, and suggested what you should do
> instead. Is it that you don't know enough Java to follow their
> suggestions and fix the program?
>

So far the only language that has come close to running as fast as the C
is the Ocaml version and since then I have found other ways to speed up
the C version. Just as I have posted the Ruby and Ocaml version if any
certified Java expert cares to show me how it should be done (*cough*
Charles O Nutter *cough*) then I will be happy to time it and check the
output and report the results. That way we can get away from all this
stupidity like the fact that I didn't use the correct naming convention.
Which we all know hugely affects performance. But remember that the
timings will be based on a program that produces output.

> A couple of days ago single measurements on my machine gave
> 0.820s user+sys, gcc -pipe -Wall -O3 -fomit-frame-pointer
> -funroll-loops -march=pentium4
> 4.444s user+sys, sun-jdk-1.5.0.07
>
> Simply writing Java that converts the double-byte Unicode Java
> outputStrings to single-byte /once/ instead of everytime there's a
> print statement, and following the usual Java pattern of wrapping
> output with a BufferedOutputStream, reduces that to
>
> 1.396s user+sys, sun-jdk-1.5.0.07
>
> Yesterday I emailed you another version of your Java generating Perl
> script that fixes those print problems, the output matches the C
> program output.
>

If it was the email I replied to yesterday then reason that it got close
to the speed of the C version was because it didn't do any output. Once
I put the output back in it was only around 5 seconds faster than my own
Java version and around 15 seconds slower than the C version. Hardly
encouragement to drop C and pursue Java. Remember I am running the
program to get the output.

>
> 2) If this is an application, and it has to produce output, and it has
> to produce output for something larger than 6x6, then I think you need
> a better algorithm.
>
> You've mentioned the 5x5 C program took 2.473s, and the 6x6 took 9900s
> - at that rate might we expect 7x7 in 15 months?
>

Actually the real problem is the amount of disk space I need to store
the output. 6 x 6 resulted in 32Gb of output. The 7 x 7 will probably
eat my hard disk well before it completes. I'm not sure that the 1Tb I
have will be enough so I am looking at removing the rotational and
mirrored versions of the grids from the output.

Isaac Gouy

unread,

Aug 3, 2006, 1:15:24 PM8/3/06

to

Peter Hickman wrote:
> Isaac Gouy wrote:
> > 1) Over the past 3 or 4 days, people have said what's wrong with the
> > way you're producing output in Java, and suggested what you should do
> > instead. Is it that you don't know enough Java to follow their
> > suggestions and fix the program?
> >
> So far the only language that has come close to running as fast as the C
> is the Ocaml version and since then I have found other ways to speed up
> the C version. Just as I have posted the Ruby and Ocaml version if any
> certified Java expert cares to show me how it should be done (*cough*
> Charles O Nutter *cough*) then I will be happy to time it and check the
> output and report the results.

So you don't know enough Java?

> That way we can get away from all this
> stupidity like the fact that I didn't use the correct naming convention.
> Which we all know hugely affects performance. But remember that the
> timings will be based on a program that produces output.
> > A couple of days ago single measurements on my machine gave
> > 0.820s user+sys, gcc -pipe -Wall -O3 -fomit-frame-pointer
> > -funroll-loops -march=pentium4
> > 4.444s user+sys, sun-jdk-1.5.0.07
> >
> > Simply writing Java that converts the double-byte Unicode Java
> > outputStrings to single-byte /once/ instead of everytime there's a
> > print statement, and following the usual Java pattern of wrapping
> > output with a BufferedOutputStream, reduces that to
> >
> > 1.396s user+sys, sun-jdk-1.5.0.07
> >
> > Yesterday I emailed you another version of your Java generating Perl
> > script that fixes those print problems, the output matches the C
> > program output.
> >
> If it was the email I replied to yesterday then reason that it got close
> to the speed of the C version was because it didn't do any output. Once
> I put the output back in it was only around 5 seconds faster than my own
> Java version and around 15 seconds slower than the C version. Hardly
> encouragement to drop C and pursue Java. Remember I am running the
> program to get the output.

If it didn't produce any output I would have mentioned that, as I have
before.

Check your email - the subject line reads " For performance use ascii
output"

William James

unread,

Aug 3, 2006, 2:07:23 PM8/3/06

to

A small point.
print_endline op
is shorter. Is it faster?

wneu...@gmail.com

unread,

Aug 3, 2006, 2:42:15 PM8/3/06

to

Jon Harrop wrote:

> Here's my OCaml2 (based upon William's):

Just a small note here:
Depending on your goals, you can do even better by stuffing the output
in a buffer then dumping the buffer at the end of computation (see
example code below). Of course this consumes more memory, and isn't
possible on a larger board, but it's yet another tweak to exploit,
since we're benchmarking output in addition to benchmarking work (note
that on my G5 output to stdout is a bit slower than your version, but
this isn't the case on the AMD machines I have access to -- redirecting
to a file is always faster).

let rec add_a_row row =

let buf = Buffer.create 1_000_000 in
let rec addh row =
match row with
| x when x=size ->

for i=0 to size-1 do
String.blit output_strings.(board.(i)) 0 op (i*(size+1)) size
done;

Buffer.add_string buf op; Buffer.add_char buf '\n'
| _ ->

for latest = 0 to n - 1 do
let prev_row = ref 0 in
let incompat = incompat.(latest) in
while !prev_row < row && not incompat.(board.(!prev_row)) do
incr prev_row
done;

if !prev_row = row then (board.(row) <- latest; addh (row + 1))
done in
addh row;
Buffer.output_buffer stdout buf

let () =
add_a_row 0;

Chad Perrin

unread,

Aug 3, 2006, 2:46:34 PM8/3/06

to

Thanks. I'll look into it.

You mention it needs something like PerlMonks -- which is interesting,
considering I've noticed there's nothing like that for Ruby, either.
I've found myself wishing a couple of times that there was something
like a RubyMonks kinda thing (though in a more Rubyish idiom than *Monks
would be), but every time I consider the notion I find myself wishing
that PerlMonks were just a bit more multi-language, focusing primarily
on languages I personally find interesting. Bah.

I've considered putting something vaguely like that together myself,
actually, but I'm lazy and too much of a perfectionist. Since I'd be
working toward my own project spec to create such a website, I'd
probably just end up spending months working on what I want the site to
do and never get around to writing a line of code.

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]

"The first rule of magic is simple. Don't waste your time waving your
hands and hopping when a rock or a club will do." - McCloctnick the Lucid

William James

unread,

Aug 3, 2006, 2:48:42 PM8/3/06

to

Jon Harrop wrote:

> let incompat =
> let aux x y =
> Array.iteri (fun i px -> if px = perms.(y).(i) then raise Exit) perms
> (x);

Perhaps google has mangled you code; at this point ocaml reports

File "latin-squares-H.ml", line 30, characters 4-15:
This function is applied to too many arguments,
maybe you forgot a `;'

Neumann

unread,

Aug 3, 2006, 3:52:53 PM8/3/06

to

William James wrote:

> File "latin-squares-H.ml", line 30, characters 4-15:
> This function is applied to too many arguments,
> maybe you forgot a `;'

It lost a . for array access. It should be (slightly reformatted
trying to lose nothing):

let incompat =
let aux x y =
Array.iteri
(fun i px -> if px = perms.(y).(i) then raise Exit)

perms.(x);

William James

unread,

Aug 3, 2006, 4:26:21 PM8/3/06

to

William James wrote:
> Jon Harrop wrote:
> ....

> > let rec add_a_row row =
> > if row = size then begin
> > for i=0 to size-1 do
> > String.blit output_strings.(board.(i)) 0 op (i*(size+1)) size
> > done;
> > print_string op;
> > print_string "\n"
>
> A small point.
> print_endline op
> is shorter. Is it faster?

Not with the interpreter; it's slower. Haven't tested the compiler yet.

Neumann

unread,

Aug 3, 2006, 4:42:01 PM8/3/06

to

William James wrote:

> > A small point.
> > print_endline op
> > is shorter. Is it faster?
>
> Not with the interpreter; it's slower. Haven't tested the compiler yet.

If anything, it'll be slower, as print_endline flushes the buffer every
call, whereas print_string and print_char do not.

Jon Harrop

unread,

Aug 3, 2006, 7:32:48 PM8/3/06

to

Peter Hickman wrote:
> Any views on Practical Ocaml by Joshua B. Smith, does it look like it
> might be a good book to learn from?

It looks very good but not as good as my book. ;-)

http://www.ffconsultancy.com/products/ocaml_for_scientists/chapter1.html

Jon Harrop

unread,

Aug 3, 2006, 7:40:37 PM8/3/06

to

Isaac Gouy wrote:
> Simply writing Java that converts the double-byte Unicode Java
> outputStrings to single-byte /once/ instead of everytime there's a
> print statement, and following the usual Java pattern of wrapping
> output with a BufferedOutputStream, reduces that to
>
> 1.396s user+sys, sun-jdk-1.5.0.07

Post your code here.

Jon Harrop

unread,

Aug 3, 2006, 7:41:15 PM8/3/06

to

Jon Harrop wrote:
> M. Edward (Ed) Borasky wrote:
>> *This* expert suggests comparing performance between the C version and
>> the Ocaml version on the *same* machine! :)
>
> On my machine (Athlon X2 4400+, Debian Linux, gcc 4.0.4, OCaml 3.09.2, Sun
> JDK 1.5):
>
> Compile Run
> C 0.137s 0.386s gcc -O3 -Wall
> OCaml 0.171s 0.668s ocamlopt
> OCaml 0.161s 0.626s ocamlopt -inline 100 -unsafe
> OCaml2 0.165s 0.415s ocamlopt
> OCaml2 0.165s 0.401s ocamlopt -unsafe
> Java 1.565s 1.688s javac

That was 64-bit. Here are my 32-bit timings (language versions are the
same):

Compile Run
C 0.126s 0.386s gcc -O3 -Wall
OCaml2 0.089s 0.391s ocamlopt
OCaml2 0.010s 10.111s ocamlc
Java 1.615s 12.745s javac

Again, OCaml is basically as fast as C. Note that C and OCaml get slower
moving to 64-bit but Java gets >7x faster. Amazingly, OCaml's interpreted
bytecode is faster than Java on 32-bit, whilst being >160x faster to
compile!

Here's my latest OCaml (I think we could revert to the simpler list-based
permuter because no significant time is spent generating the permutations):

let rec fact n = if n=0 then 1 else n*fact(n-1)

let size = 5

(* Permutation generator *)

let p = Array.make size 0
let xx = Array.init size (fun i -> if i<size-1 then 0 else -1)
let gen_perm() =
let i = ref (size - 1) in
while !i > 0 && xx.(!i) = !i do
xx.(!i) <- 0;
decr i
done;
if !i = 0 then 1 else begin
xx.(!i) <- xx.(!i) + 1;
p.(0) <- 1;
for i=0 to size - 1 do
p.(i) <- p.(i - xx.(i));
p.(i - xx.(i)) <- i+1
done;
0
end

let n = fact size

(* Permutations *)
let perms = Array.init n (fun _ -> ignore(gen_perm()); Array.copy p)

let incompat =
let rec aux (px : int array) py i =
i<size && (px.(i) = py.(i) || aux px py (i+1)) in
Array.init n (fun x -> Array.init n (fun y -> aux perms.(x) perms.(y) 0))

let join list = String.concat "" (Array.to_list (Array.map string_of_int
list))
let output_strings = Array.map join perms
let board = Array.make size 0

let op = String.make (size*(size+1)) ':'

let rec add_a_row row =

if row=size then begin

for i=0 to size-1 do
String.blit output_strings.(board.(i)) 0 op (i*(size+1)) size
done;
print_string op

end else

for latest = 0 to n - 1 do
let prev_row = ref 0 in
let incompat = incompat.(latest) in
while !prev_row < row && not incompat.(board.(!prev_row)) do
incr prev_row;
done;

if !prev_row = row then begin
board.(row) <- latest;
add_a_row (row + 1)
end
done

let () =
op.[size*(size+1)-1] <- '\n';
add_a_row 0

Kristof Bastiaensen

unread,

Aug 3, 2006, 8:28:28 PM8/3/06

to

On Fri, 04 Aug 2006 01:53:41 +0900, Peter Hickman wrote:

>> 2) If this is an application, and it has to produce output, and it has
>> to produce output for something larger than 6x6, then I think you need
>> a better algorithm.
>>
>> You've mentioned the 5x5 C program took 2.473s, and the 6x6 took 9900s
>> - at that rate might we expect 7x7 in 15 months?
>>
> Actually the real problem is the amount of disk space I need to store
> the output. 6 x 6 resulted in 32Gb of output. The 7 x 7 will probably
> eat my hard disk well before it completes. I'm not sure that the 1Tb I
> have will be enough so I am looking at removing the rotational and
> mirrored versions of the grids from the output.

Why would you want to store all this data? If you want to store the 7x7
version, you will need 3187.2 TB to store all the squares. To store
only the reduced squares you still need 920 MB. To store the reduced
squares for 8x8 you will need 35.6TB. (I used the wikipedia article
on http://en.wikipedia.org/wiki/Latin_square to compute these numbers).

My program can easily show only the reduced squares by removing the
permutations from the solutions method, and it will be a lot faster too.
If you want to do this for large squares it would be better to rewrite my
algorithm in C. But I am interested how this data could ever be useful.

Regards,
Kristof

Isaac Gouy

unread,

Aug 3, 2006, 8:23:32 PM8/3/06

to

No.

That version is on Peter Hickman's website, perhaps he'll replace it
with the newer version on Friday.

William James

unread,

Aug 3, 2006, 8:29:10 PM8/3/06

to

Jon Harrop wrote:

Here's a faster version of my program.

Eliminated a "not" in a loop by replacing the array of booleans
that shows incompatibility of two rows with an array that shows
compatibility.

Borrowed you idea of creating an output line ahead of time and
modifying it in place.

Timings:
1.11 yours
1.04 mine

(* compile with:
ocamlopt -unsafe -inline 100 latin-squares.ml -o latin-squares.exe
*)

(* permutation code by Eric C. Cooper *)
let rec distribute elt = function
| (hd :: tl) as list -> (elt :: list) ::
(List.map (fun x -> hd :: x) (distribute elt tl))
| [] -> [ [elt] ]
let rec permute = function
| x :: rest -> List.flatten (List.map (distribute x) (permute rest))
| [] -> [ [] ] ;;

let list = [ 1; 2; 3; 4; 5 ]
let size = List.length list

let perms = Array.of_list (permute list)
let n = Array.length perms

(* Used to determine if one row is compatible with another. *)
let compatible = Array.make_matrix n n true ;;
for x = 0 to n - 1 do
for y = 0 to n - 1 do
compatible.(x).(y) <-
List.for_all2 ( <> ) perms.(x) perms.(y)
done
done

let join list = String.concat "" (List.map string_of_int list)

let output_strings = Array.map join perms

(* For speed, create a string that's the length of the lines
that we'll print; the :'s that aren't needed as separators
will later be overwritten. *)
let output_line = String.make (size*(size+1)-1) ':' ^ "\n"

let board = Array.make size 0

let rec add_a_row row =

if row = size then

( for i=0 to size-1 do
String.blit
output_strings.(board.(i)) 0 (* source *)
output_line (i*(size+1)) (* dest *)
size
done;
print_string output_line
)

else
for latest = 0 to n - 1 do

(* Create a changeable thing (variable). *)

let prev_row = ref 0 in

(* The ! below fetches the variable's value. *)
while (!prev_row < row) &&
(compatible.(latest).(board.(!prev_row))) do

incr prev_row
done;
if !prev_row = row then

( board.(row) <- latest ; add_a_row (row + 1) )
done
;;

add_a_row 0

Jon Harrop

unread,

Aug 3, 2006, 8:50:29 PM8/3/06

to

Isaac Gouy wrote:
> That version is on Peter Hickman's website, perhaps he'll replace it
> with the newer version on Friday.

Curiously the two Java implementations on Peter's site perform the same on
my machine in x86 and they are >32x slower than OCaml. I know Java is slow
but that is ridiculous...

Isaac Gouy

unread,

Aug 3, 2006, 9:45:44 PM8/3/06

to

You're probably doing something wrong.

Jon Harrop

unread,

Aug 3, 2006, 9:51:37 PM8/3/06

to

William James wrote:
> Here's a faster version of my program.
>
> Eliminated a "not" in a loop by replacing the array of booleans
> that shows incompatibility of two rows with an array that shows
> compatibility.

Heh, should've thought of that. :-)

> Borrowed you idea of creating an output line ahead of time and
> modifying it in place.
>
> Timings:
> 1.11 yours
> 1.04 mine

I get the opposite order:

0.394s yours
0.390s mine

but your timings are probably more accurate because your computer is so
slow. ;-)

You can make it slightly faster by factoring out the CSE "compatible
(latest)", giving:

0.382s

That's actually faster than the C here. Woohoo! :-)

Using bitvectors may also improve things, depending if skipping a bunch of
comparisons in that loop is significant. However, I think it is time we
searched for a new task to optimise. Ray tracer anyone? ;-)

William James

unread,

Aug 3, 2006, 10:28:51 PM8/3/06

to

Jon Harrop wrote:
> William James wrote:
> > Here's a faster version of my program.
> >
> > Eliminated a "not" in a loop by replacing the array of booleans
> > that shows incompatibility of two rows with an array that shows
> > compatibility.
>
> Heh, should've thought of that. :-)
>
> > Borrowed you idea of creating an output line ahead of time and
> > modifying it in place.
> >
> > Timings:
> > 1.11 yours
> > 1.04 mine
>
> I get the opposite order:
>
> 0.394s yours
> 0.390s mine
>
> but your timings are probably more accurate because your computer is so
> slow. ;-)

It's 800MHz. I ran each program 4 times and took the average.

>
> You can make it slightly faster by factoring out the CSE "compatible
> (latest)", giving:
>
> 0.382s
>
> That's actually faster than the C here. Woohoo! :-)

Good point. That cuts my time to 1.01 seconds.
Also cleaned up the code a bit.
Here's the final version.

(* compile with:
ocamlopt -unsafe -inline 100 latin-squares.ml -o latin-squares.exe
*)

(* permutation code by Eric C. Cooper *)
let rec distribute elt = function
(hd :: tl) as list -> (elt :: list) ::
(List.map (fun x -> hd :: x) (distribute elt tl))
| [] -> [ [elt] ]
let rec permute = function
x :: rest -> List.flatten (List.map (distribute x) (permute rest))
| [] -> [ [] ]

let list = [ 1; 2; 3; 4; 5 ]
let size = List.length list

let perms = Array.of_list (permute list)
let n = Array.length perms

(* Boolean array used to determine if one row is

compatible with another. *)
let compatible = Array.make_matrix n n true ;;

Array.iteri (fun x ex ->
Array.iteri (fun y ey ->
compatible.(x).(y) <- List.for_all2 (<>) ex ey) perms ) perms

let join list = String.concat "" (List.map string_of_int list)
let output_strings = Array.map join perms

(* For speed, create a string that's the length of the lines
that we'll print; the :'s that aren't needed as separators
will later be overwritten. *)
let output_line = String.make (size*(size+1)-1) ':' ^ "\n"
let board = Array.make size 0

(* A recursive function. *)

let rec add_a_row row =
if row = size then
( for i=0 to size-1 do
String.blit
output_strings.(board.(i)) 0 (* source *)
output_line (i*(size+1)) (* dest *)
size
done;
print_string output_line
)
else
for latest = 0 to n - 1 do

let compatible_slice = compatible.(latest) in

(* Create a changeable thing (variable). *)
let prev_row = ref 0 in
(* The ! below fetches the variable's value. *)

while !prev_row < row &&
compatible_slice.(board.(!prev_row)) do

incr prev_row
done;
if !prev_row = row then
( board.(row) <- latest ; add_a_row (row + 1) )
done
;;

add_a_row 0

>

Jon Harrop

unread,

Aug 3, 2006, 10:28:10 PM8/3/06

to

Jon Harrop wrote:
> Here are my 32-bit timings (language versions are the same):
>
> Compile Run
> C 0.126s 0.386s gcc -O3 -Wall
> OCaml2 0.089s 0.391s ocamlopt
> OCaml2 0.010s 10.111s ocamlc
> Java 1.615s 12.745s javac

Here's a C++ implementation:

C++ 0.652s 0.608s g++ -O3 -Wall

(95% of time is spent in addRow())

#include <iostream>
#include <algorithm>
#include <vector>

using namespace std;

int fact(int n) { return n == 0 ? 1 : n*fact(n-1); }

int size=5;
vector<int> list(size);
int n = fact(size);
vector< vector<int> > perms(n);
vector< vector<bool> > compat(n);
vector<int> board(size);
string out(size*(size+1), ':');

void print() {
for (int i=0; i<size; ++i)
for (int j=0; j<size; ++j)
out[i*(size+1) + j] = '1' + perms[board[i]][j];
cout << out;
}

int next(vector<bool> &compat, int row) {
int prev_row = 0;
while (prev_row < row && compat[board[prev_row]])
++prev_row;
return prev_row;
}

void addRow(int row) {
if (row == size) print(); else
for (int latest=0; latest<n; ++latest) {
if (next(compat[latest], row) == row) {
board[row] = latest;
addRow(row+1);
}
}
}

int main() {
for (int i=0; i<size; ++i) list[i] = i;

for (int i=0; i<n; ++i) {
perms[i] = vector<int>(list.begin(), list.end());
next_permutation(list.begin(), list.end());
}

for (int i=0; i<n; ++i) {
compat[i] = vector<bool>(n);
for (int j=0; j<n; ++j) {
compat[i][j] = true;
for (int k=0; k<size; ++k)
if (perms[i][k] == perms[j][k]) {
compat[i][j] = false;
break;
}
}
}

out[size*(size+1)-1] = '\n';

addRow(0);

Jon Harrop

unread,

Aug 3, 2006, 10:31:04 PM8/3/06

to

Isaac Gouy wrote:
> You're probably doing something wrong.

Other people have posted not entirely dissimilar timings for Java and I'm
doing exactly the same thing on x86_64 where is runs much faster.

Has anyone managed to get Java close to OCaml or C?

Isaac Gouy

unread,

Aug 3, 2006, 11:09:12 PM8/3/06

to

Peter Hickman's measurements were displayed on his website before you
asked for the Java source code - they show the current Java program
taking ~2.15x the time of the current OCaml program, on his machine.

Jon Harrop

unread,

Aug 4, 2006, 12:04:12 AM8/4/06

to

Jon Harrop wrote:
>> Here are my 32-bit timings (language versions are the same):
>>
>> Compile Run
>> C 0.126s 0.386s gcc -O3 -Wall
>> OCaml2 0.089s 0.391s ocamlopt
>> OCaml2 0.010s 10.111s ocamlc
>> Java 1.615s 12.745s javac
>
> Here's a C++ implementation:
>
> C++ 0.652s 0.608s g++ -O3 -Wall

For Java2 from the site and using Sun's Java 1.5 instead of GIJ 1.4.2
(d'oh!), I get:

Java 0.543s 0.711s javac

That's a much more respectable running time although the compile time is
comparatively poor.

Chad Perrin

unread,

Aug 4, 2006, 2:44:57 AM8/4/06

to

On Fri, Aug 04, 2006 at 11:30:11AM +0900, William James wrote:
> Jon Harrop wrote:
> >

> > but your timings are probably more accurate because your computer is so
> > slow. ;-)
>
> It's 800MHz. I ran each program 4 times and took the average.

I guess that means we should be testing everything on the laptop I'm
currently using as a remote terminal for my server. It's a 366MHz
dinosaur.

. . except I'm doing real work right now, and don't need to be bogging
stuff down.

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]

print substr("Just another Perl hacker", 0, -2);

Peter Hickman

unread,

Aug 4, 2006, 4:19:50 AM8/4/06

to

Time for another update.

Isaac Gouy provided a Java implementation based on mine (ie still pre
computes the tables in Perl) that brought the times down to sub 9 seconds.

real 0m8.966s
user 0m5.815s
sys 0m1.488s

But the big news is that William James' revision of his previous Ocaml
version is now the fastest.

real 0m3.660s
user 0m1.958s
sys 0m1.421s

The source code for both are available on the web site for you to examine.

Peter Hickman

unread,

Aug 4, 2006, 4:43:59 AM8/4/06

to

Thanks for that fix. We now have an even faster Ocaml version

real 0m2.260s
user 0m1.867s
sys 0m0.221s

Peter Hickman

unread,

Aug 4, 2006, 4:50:52 AM8/4/06

to

Mmmm. Your code convinces me, you have a sale.

William James

unread,

Aug 4, 2006, 5:53:40 AM8/4/06

to

I stole code and ideas from Jon Harrop, so this should be called
the James/Harrop version.

William James

unread,

Aug 4, 2006, 6:06:47 AM8/4/06

to

I just noticed that your site doesn't have my last version (the 4th).
Here it is again:

(* Thanks to Jon Harrup for code and ideas. *)

Peter Hickman

unread,

Aug 4, 2006, 6:16:47 AM8/4/06

to

When I get home I'll time it and add it.

I'm beginning to wonder if my computer is slow enough to give meaningful
timings as the Ocaml versions seem to be pushing for sub second timings :)

William James

unread,

Aug 4, 2006, 6:56:33 AM8/4/06

to

In order to be fair to gcc, I'd like to point out one thing.
Although I used your algorithm, I tweaked it in an attempt to make
it faster. So I think that you should take my add_a_row function
and translate it back to C.

Peter Hickman

unread,

Aug 4, 2006, 8:50:07 AM8/4/06

to

I changed the use of the bool data type (which is really an int) to char
and shrunk the size of the program and sped it up.

real 0m1.885s
user 0m1.649s
sys 0m0.168s

I think this is because the top most loop, when processing the top row,
basically just zooms down a given row of the compared array. With four
chars being packed in where one bool would be more of the tests will be
using data that is available in the CPUs cache.

Peter Hickman

unread,

Aug 4, 2006, 10:12:03 AM8/4/06

to

Sorry but I have to retract these timings. They were from the tests to
find the best gcc compiler settings and do not include the Perl pre
calculation for each run so they cannot be compared to any of the other
timings. Sorry about that.

Isaac Gouy

unread,

Aug 4, 2006, 10:30:41 AM8/4/06

to

Peter Hickman wrote:
> Time for another update.
>
> Isaac Gouy provided a Java implementation based on mine (ie still pre
> computes the tables in Perl) that brought the times down to sub 9 seconds.
>
> real 0m8.966s
> user 0m5.815s
> sys 0m1.488s

sub 9 seconds?
7.3s
"real" is elapsed time, which includes all those other processes that
grabbed CPU after you gave the time command.

Tim Bray

unread,

Aug 4, 2006, 11:32:18 AM8/4/06

to

Peter Hickman wrote:
> Time for another update.
>

..

>
> The source code for both are available on the web site for you to examine.

That URL again? -Tim

>
>
>

Peter Hickman

unread,

Aug 4, 2006, 11:32:32 AM8/4/06

to

This is a good point but in all the posts where I have mentioned the
time taken I have used the real time (which is the best case from 10
runs - except for the first Perl version) all the way back to the first
and fourth versions in Perl (473 and 12 minutes). However it does not
affect the ordering except to push my improved C version ahead of
William James' revised Ocaml version by just 0.029 of a second.

This late in the proceedings it would only confuse the issue to start
saying that the first Perl version ran in 251 minutes when 473 minutes
has been mentioned several times. Besides now that we are getting into
such short times for the Ocaml and C versions the difference between
real and user + sys is sub second and it is becoming harder to say that
one is significantly faster then the other. One more run of William
James' program and it might make up that 0.029 of a second difference.

Perhaps stepping up to a 6 x 6 grid would allow more meaningful timings?

ara.t....@noaa.gov

unread,

Aug 4, 2006, 11:41:01 AM8/4/06

to

On Sat, 5 Aug 2006, Peter Hickman wrote:

> Perhaps stepping up to a 6 x 6 grid would allow more meaningful timings?

showing how it performs on a 2, 4, and 8 grid would show how it scaled...

-a
--
happiness is not something ready-made. it comes from your own actions.
- h.h. the 14th dali lama

Peter Hickman

unread,

Aug 4, 2006, 11:41:39 AM8/4/06

to

http://peterhi.dyndns.org/write_it_in_c/index.html

Peter Hickman

unread,

Aug 4, 2006, 11:54:54 AM8/4/06

to

ara.t....@noaa.gov wrote:
> On Sat, 5 Aug 2006, Peter Hickman wrote:
>
>> Perhaps stepping up to a 6 x 6 grid would allow more meaningful timings?
>
> showing how it performs on a 2, 4, and 8 grid would show how it scaled...
>
> -a

I'll test the 6 x 6 in Ocaml, C and Java but I'm not sure that I will be
going for the best of 10 runs on this. 2 and 4 are simply too fast to
measure accurately except for the slowest code. Someone calculated that
some values higher than 6 would take months to run and 8 might be a
little too far into that ballpark.

I do have a very noisy and hot AMD 64 4400 with 2 Gb ram running Ubuntu.
Perhaps I will set this as the new benchmark machine.

William James

unread,

Aug 4, 2006, 2:00:43 PM8/4/06

to

I wish that you would dump Perl and do everything in C.

Isaac Gouy

unread,

Aug 4, 2006, 3:23:42 PM8/4/06

to

Peter Hickman wrote:
-snip-
> if any certified Java expert cares to show me how it should be done (*cough*
> Charles O Nutter *cough*) then I will be happy to time it and check the
> output and report the results. That way we can get away from all this
> stupidity ...
-snip

Now do you understand that when you wrote "C is still faster by an
order of magnitude" it had nothing to do with Java and had everything
to do with your "bogus" program.

In other words - Charles O Nutter was right, and you were wrong.

Depending on the machine hardware, we should expect a vanilla Java
implementation to take 1.5x - 2.5x longer than C for this problem.

Jon Harrop

unread,

Aug 4, 2006, 4:31:40 PM8/4/06

to

Isaac Gouy wrote:
> Depending on the machine hardware, we should expect a vanilla Java
> implementation to take 1.5x - 2.5x longer than C for this problem.

Only if by "vanilla" you mean optimised Java.

Jon Harrop

unread,

Aug 4, 2006, 5:39:04 PM8/4/06

to

William James wrote:
> I wish that you would dump Perl and do everything in C.

I posted an OCaml implementation that used an imperative, array-based
permuter. We could just port that back to C (or find the site with the C
code that I translated).

Thomas E Enebo

unread,

Aug 4, 2006, 6:22:57 PM8/4/06

to

On Sat, 05 Aug 2006, Jon Harrop defenestrated me:

> Isaac Gouy wrote:
> > Depending on the machine hardware, we should expect a vanilla Java
> > implementation to take 1.5x - 2.5x longer than C for this problem.
>
> Only if by "vanilla" you mean optimised Java.

Did you look at Isaac's implementation? It is almost exactly the same it
was before, but he is not printing out the latin squares as unicode (which
none of the other impls are doing to my knowledge) and it is writing to
a BufferedReader (which any Java programmer with a couple months of
experience would/should do).

I think of things like manual loop unrolling and other similiar techniques
as optimization techniques. The stuff he did was common sense and I think
closer to the C impl in action (these languages are not the same -- some
consideration should be taken in this).

My only question on this thread about Java is what Java is being used
for the test (used on the webpage -- a good addition to the page)? I ran
Java 1.4, Java 5 and Java 6 (beta but what the hell). The speedup was
surprising and exciting (obviously I do Java programming). I went from
1.1s to 0.84s between the versions.

-Tom

Isaac Gouy

unread,

Aug 4, 2006, 6:58:57 PM8/4/06

to

What things in that program do you consider optimizations?

I also provided a Java program with the same inner loop code as the
OCaml program but it hasn't appeared on Peter Hickman's website yet.

Isaac Gouy

unread,

Aug 4, 2006, 7:10:19 PM8/4/06

to

Peter Hickman wrote:
> Isaac Gouy wrote:
> > Peter Hickman wrote:
> >
> >> Time for another update.
> >>
> >> Isaac Gouy provided a Java implementation based on mine (ie still pre
> >> computes the tables in Perl) that brought the times down to sub 9 seconds.
> >>
> >> real 0m8.966s
> >> user 0m5.815s
> >> sys 0m1.488s
> >>
> >
> > sub 9 seconds?
> > 7.3s
> > "real" is elapsed time, which includes all those other processes that
> > grabbed CPU after you gave the time command.
> >
> >
> >
> >
> This is a good point but in all the posts where I have mentioned the
> time taken I have used the real time (which is the best case from 10
> runs - except for the first Perl version) all the way back to the first
> and fourth versions in Perl (473 and 12 minutes). However it does not
> affect the ordering except to push my improved C version ahead of
> William James' revised Ocaml version by just 0.029 of a second.
>
> This late in the proceedings it would only confuse the issue to start
> saying that the first Perl version ran in 251 minutes when 473 minutes
> has been mentioned several times.

Having repeatedly made a mistake in the past is no reason to continue
making the same mistake in the future!

(You're timing for Perl includes 3 or 4 hours of you surfing the web or
installing OCaml or whatever!)

Message has been deleted

Isaac Gouy

unread,

Aug 4, 2006, 7:17:26 PM8/4/06

to

It scales badly

0.496s gcc 5x5 (without print statements)
30055.098s gcc 6x6 (without print statements)

imo If Peter has figured out what he's trying to do then it would be
smart to find a better way of doing it rather than scrabbling for small
percentage improvements.

ara.t....@noaa.gov

unread,

Aug 4, 2006, 7:31:30 PM8/4/06

to

indeed - that's my point exactly.

with that kind scaling a better ruby version could kill the c one!

Isaac Gouy

unread,

Aug 4, 2006, 7:39:24 PM8/4/06

to

ara.t.how...@noaa.gov wrote:
> On Sat, 5 Aug 2006, Isaac Gouy wrote:
>
> >
> > ara.t....@noaa.gov wrote:
> >> On Sat, 5 Aug 2006, Peter Hickman wrote:
> >>
> >>> Perhaps stepping up to a 6 x 6 grid would allow more meaningful timings?
> >>
> >> showing how it performs on a 2, 4, and 8 grid would show how it scaled...
> >>
> >> -a
> >> --
> >> happiness is not something ready-made. it comes from your own actions.
> >> - h.h. the 14th dali lama
> >
> > It scales badly
> >
> > 0.496s gcc 5x5 (without print statements)
> > 30055.098s gcc 6x6 (without print statements)
> >
> > imo If Peter has figured out what he's trying to do then it would be
> > smart to find a better way of doing it rather than scrabbling for small
> > percentage improvements.
>
> indeed - that's my point exactly.

At last, something we can agree about - that makes all this silliness
worthwhile :-)

Jon Harrop

unread,

Aug 4, 2006, 9:20:19 PM8/4/06

to

Thomas E Enebo wrote:
>> Only if by "vanilla" you mean optimised Java.
>
> Did you look at Isaac's implementation? It is almost exactly the same
> it
> was before, but he is not printing out the latin squares as unicode (which
> none of the other impls are doing to my knowledge) and it is writing to
> a BufferedReader (which any Java programmer with a couple months of
> experience would/should do).

Yes.

> I think of things like manual loop unrolling and other similiar
> techniques
> as optimization techniques.

They are all optimisations.

> The stuff he did was common sense and I think
> closer to the C impl in action (these languages are not the same -- some
> consideration should be taken in this).

These languages are not the same. However, using "ordinary" IO via
System.out.println or cout or print_endline is the "vanilla" way of
outputting in all of these languages. Of these languages, only Java gives
appalling performance unless you switch to buffers.

Even when you do use buffers in Java, it is still slower than unbuffered
C++, OCaml etc. on my machine.

> My only question on this thread about Java is what Java is being used
> for the test (used on the webpage -- a good addition to the page)?

I was using Sun's J2SE 1.5. I also tried GNU's GIJ, which is even slower
than OCaml's bytecode.

Isaac Gouy

unread,

Aug 4, 2006, 10:59:29 PM8/4/06

to

Jon Harrop wrote:
-snip-

> These languages are not the same. However, using "ordinary" IO via
> System.out.println or cout or print_endline is the "vanilla" way of
> outputting in all of these languages. Of these languages, only Java gives
> appalling performance unless you switch to buffers.

I guess you'd fail a Java exam :-)

(I'm not interested enough to find out, but I suspect stdout is at
minimum line-buffered.)

> Even when you do use buffers in Java, it is still slower than unbuffered
> C++, OCaml etc. on my machine.

No where near an order of magnitude slower - iirc you reported 1.85x C

> I was using Sun's J2SE 1.5. I also tried GNU's GIJ, which is even slower
> than OCaml's bytecode.

You failed the Java exam again :-)
For Java without JIT use the -Xint command line option with the Sun JVM
like this

time java -Xint Latin > /dev/null

And if the program ran for any amount of time we of course be using

time java -server Latin > /dev/null

Jon Harrop

unread,

Aug 5, 2006, 12:28:34 AM8/5/06

to

Isaac Gouy wrote:
> (I'm not interested enough to find out, but I suspect stdout is at
> minimum line-buffered.)

Sure, the naive implementations in other languages are doing silly things as
well (like the OCaml flushing after every line) but they are still faster
than the most optimised Java to date.

>> Even when you do use buffers in Java, it is still slower than unbuffered
>> C++, OCaml etc. on my machine.
>
> No where near an order of magnitude slower - iirc you reported 1.85x C

I get 0.604s unoptimised OCaml vs 1.740s unoptimised Java on x86_64. That's
2.9x slower.

>> I was using Sun's J2SE 1.5. I also tried GNU's GIJ, which is even slower
>> than OCaml's bytecode.
>
> You failed the Java exam again :-)
>
> For Java without JIT use the -Xint command line option with the Sun JVM
> like this
>
> time java -Xint Latin > /dev/null

That's as slow as ocamlc.

> And if the program ran for any amount of time we of course be using
>
> time java -server Latin > /dev/null

I already tried and it improved performance slightly.

Chad Perrin

unread,

Aug 5, 2006, 2:05:46 AM8/5/06

to

On Sat, Aug 05, 2006 at 07:22:57AM +0900, Thomas E Enebo wrote:
> On Sat, 05 Aug 2006, Jon Harrop defenestrated me:
>

> > Only if by "vanilla" you mean optimised Java.
>
> Did you look at Isaac's implementation? It is almost exactly the same it
> was before, but he is not printing out the latin squares as unicode (which
> none of the other impls are doing to my knowledge) and it is writing to
> a BufferedReader (which any Java programmer with a couple months of
> experience would/should do).

Not to get embroiled in a nascent flamewar, or anything, but . . .

I don't think the fact that everyone in one language's community does
something that people in other languages' communities don't so they can
get better performance makes what they're doing any less an
optimization. It just makes it an exceedingly common optimization (and
might indicate that language implementation defaults should be changed).

Then again, maybe not.

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
"A script is what you give the actors. A program
is what you give the audience." - Larry Wall

Isaac Gouy

unread,

Aug 5, 2006, 2:39:54 AM8/5/06

to

Jon Harrop wrote:
> Isaac Gouy wrote:
> > (I'm not interested enough to find out, but I suspect stdout is at
> > minimum line-buffered.)
>
> Sure, the naive implementations in other languages are doing silly things as
> well (like the OCaml flushing after every line) but they are still faster
> than the most optimised Java to date.

afaict none of the OCaml implementations work with the data created by
that Perl script - like the C and Java implementations - why is that?

>
> >> Even when you do use buffers in Java, it is still slower than unbuffered
> >> C++, OCaml etc. on my machine.
> >
> > No where near an order of magnitude slower - iirc you reported 1.85x C
>
> I get 0.604s unoptimised OCaml vs 1.740s unoptimised Java on x86_64. That's
> 2.9x slower.
>
> >> I was using Sun's J2SE 1.5. I also tried GNU's GIJ, which is even slower
> >> than OCaml's bytecode.
> >
> > You failed the Java exam again :-)
> >
> > For Java without JIT use the -Xint command line option with the Sun JVM
> > like this
> >
> > time java -Xint Latin > /dev/null
>
> That's as slow as ocamlc.

as slow as? does that mean faster :-)

Kristof Bastiaensen

unread,

Aug 5, 2006, 5:47:02 AM8/5/06

to

On Sat, 05 Aug 2006 08:31:30 +0900, ara.t.howard wrote:

> On Sat, 5 Aug 2006, Isaac Gouy wrote:
>
>>
>> ara.t....@noaa.gov wrote:
>>> On Sat, 5 Aug 2006, Peter Hickman wrote:
>>>
>>>> Perhaps stepping up to a 6 x 6 grid would allow more meaningful timings?
>>>
>>> showing how it performs on a 2, 4, and 8 grid would show how it scaled...
>>>
>>> -a
>>> --
>>> happiness is not something ready-made. it comes from your own actions.
>>> - h.h. the 14th dali lama
>>
>> It scales badly
>>
>> 0.496s gcc 5x5 (without print statements)
>> 30055.098s gcc 6x6 (without print statements)
>>
>> imo If Peter has figured out what he's trying to do then it would be
>> smart to find a better way of doing it rather than scrabbling for small
>> percentage improvements.
>
> indeed - that's my point exactly.
>
> with that kind scaling a better ruby version could kill the c one!

Did you actually run the 6x6 version? I think my Ruby version might
actually be faster than the C one for large squares, since it only
computes normalised squares. A reasonable estimate indicates that
for 6x6 squares it will take about 23500 seconds on my computer. But I
didn't test it, because that's still longer than I am prepared to wait for
it :)

Kristof

Jon Harrop

unread,

Aug 5, 2006, 10:59:56 AM8/5/06

to

Isaac Gouy wrote:
> Jon Harrop wrote:
>> Isaac Gouy wrote:
>> > (I'm not interested enough to find out, but I suspect stdout is at
>> > minimum line-buffered.)
>>
>> Sure, the naive implementations in other languages are doing silly things
>> as well (like the OCaml flushing after every line) but they are still
>> faster than the most optimised Java to date.
>
> afaict none of the OCaml implementations work with the data created by
> that Perl script - like the C and Java implementations - why is that?

The authors couldn't be bothered because it is easier to generate that data
in OCaml.

>> > For Java without JIT use the -Xint command line option with the Sun JVM
>> > like this
>> >
>> > time java -Xint Latin > /dev/null
>>
>> That's as slow as ocamlc.
>
> as slow as? does that mean faster :-)

No, that means even slower.

Peter Hickman

unread,

Aug 7, 2006, 4:41:18 AM8/7/06

to

Thomas E Enebo wrote:
> My only question on this thread about Java is what Java is being used
> for the test (used on the webpage -- a good addition to the page)? I ran
> Java 1.4, Java 5 and Java 6 (beta but what the hell). The speedup was
> surprising and exciting (obviously I do Java programming). I went from
> 1.1s to 0.84s between the versions.
>
> -Tom
>
>

[peterhickman]$ javac -version
javac 1.5.0_06

I think that's the current best version for the Mac. It does tend to lag
behind the rest of the world when it comes to releases.

I will stick a language versions page up tonight.

Peter Hickman

unread,

Aug 7, 2006, 5:04:14 AM8/7/06

to

Actually this is a miss reading. The whole point of the first post was
to point out that the speed up from going from Perl (or any scripting
language) to C. I used the same Perl code to pre compute values to
create the C header file because I already had code at this point that
did the work and the impact of doing the pre compute was pretty much
insignificant to the overall run time. I then used to same technique to
get the Java implementation. I have never claimed that the C version was
the best possible implementations only that it delivered a massive
performance boost. I think that I have proven that.

There are optimisations that could be applied to the C version but they
are only going to get percentage improvements and are getting much
harder to reliably measure.

As to Ocaml versions, well they were submitted and proved to be faster
than my C version. C is quite a fragile language to develop in and so
any high level language that can deliver that performance is very
interesting. To me at least.

And all this Java stuff, well that comes from the post by Charles O Nutter:

> - Write it in C is as valid as write it in Java (as someone else
mentioned).
> Java is at least as fast as C for most algorithms.

It was the "at least as fast as" bit that I disputed and the whole Java
sub thread comes from that.

The only requirement is that I can get an implementation of the language
in question on my Macintosh.