[Caml-list] HLVM ray tracer performance

Jon Harrop

Jan 8, 2010, 8:39:25 AM

to caml...@inria.fr

I just published results for the ray tracer benchmark written in HLVM and
compared to other languages including OCaml:

http://flyingfrogblog.blogspot.com/2010/01/hlvm-on-ray-tracer-language-comparison.html

Note that these results were obtained with HLVM's multicore garbage collector
enabled.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

shaw...@msu.edu

Jan 10, 2010, 1:29:54 PM

to caml...@inria.fr

Jon,
I wanted to run the raytracing benchmark myself to see if Haskell really was that slow. I'm using ghc 6.10 because that's what ubuntu comes with. I don't know if ghc 6.12 generates slower executables than 6.10 or what else might be going on. I ran each several times and the numbers I pasted are typical (+/- 0.2 seconds, say).

jeff@ubuntu:~/Desktop$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.10.4
jeff@ubuntu:~/Desktop$ g++ --version
g++ (Ubuntu 4.4.1-4ubuntu8) 4.4.1
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

jeff@ubuntu:~/Desktop$ ocamlopt -v
The Objective Caml native-code compiler, version 3.11.1
Standard library directory: /usr/lib/ocaml

I compiled the raytracers for c++, haskell and ocaml from

http://www.ffconsultancy.com/languages/ray_tracer/code/5

and used the compile instructions at

http://www.ffconsultancy.com/languages/ray_tracer/benchmark.html

though I had to change the haskell one to use just ghc instead of specifying a version. I also ran the ocaml and haskell code in the 1/ directory, and they completed within 0.1 seconds of each other.

c++
jeff@ubuntu:~/Desktop$ time ./ray 9 512 > /dev/null

real    0m3.515s
user    0m3.440s
sys     0m0.016s

haskell
jeff@ubuntu:~/Desktop$ time ./ray 9 512 > /dev/null

real    0m5.811s
user    0m5.752s
sys     0m0.032s

ocaml
jeff@ubuntu:~/Desktop$ time ./ray 9 512 > /dev/null

real    0m6.572s
user    0m6.544s
sys     0m0.016s

Jeff

Jon Harrop

Jan 10, 2010, 2:00:19 PM

to caml...@yquem.inria.fr, shaw...@msu.edu

On Sunday 10 January 2010 18:29:42 shaw...@msu.edu wrote:
> Jon,
>
> I wanted to run the raytracing benchmark myself to see if Haskell really
> was that slow. I'm using ghc 6.10 because that's what ubuntu comes with.
> I don't know if ghc 6.12 generates slower executables than 6.10 or what
> else might be going on.

I used GHC 6.12 with --make -O2 to get the results from the recent article
because it generated results faster than GHC 6.10. However, I failed to
notice that the Haskell, and only the Haskell, was generating garbage output.
Rerunning the benchmark with GHC 6.10 here, the Haskell does give the correct
answer, but the times are even worse than those I quoted.

> I ran each several times and the numbers I pasted
> are typical (+/- 0.2 seconds, say).
>
> jeff@ubuntu:~/Desktop$ ghc --version
> The Glorious Glasgow Haskell Compilation System, version 6.10.4
> jeff@ubuntu:~/Desktop$ g++ --version
> g++ (Ubuntu 4.4.1-4ubuntu8) 4.4.1
> Copyright (C) 2009 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> jeff@ubuntu:~/Desktop$ ocamlopt -v
> The Objective Caml native-code compiler, version 3.11.1
> Standard library directory: /usr/lib/ocaml

I used g++ 4.3.3 and OCaml 3.11.1 on a 64-bit Linux kernel running 32-bit
userland. The machine is an 8-core with two Quad-Core AMD Opteron(tm) 2352
Processors running at 2.1GHz. AFAICT they have 512 kB L2 caches each and 2 MB
L3 caches per quadcore CPU.

> I compiled the raytracers for c++, haskell and ocaml from
>
> http://www.ffconsultancy.com/languages/ray_tracer/code/5
>
> and used the compile instructions at
>
> http://www.ffconsultancy.com/languages/ray_tracer/benchmark.html
>
> though I had to change the haskell one to use just ghc instead of
> specifying a version. I also ran the ocaml and haskell code in the 1/
> directory, and they completed within 0.1 seconds of each other.
>
> c++
> jeff@ubuntu:~/Desktop$ time ./ray 9 512 > /dev/null
>
> real    0m3.515s
> user    0m3.440s
> sys    0m0.016s
>
> haskell
> jeff@ubuntu:~/Desktop$ time ./ray 9 512 > /dev/null
>
> real    0m5.811s
> user    0m5.752s
> sys    0m0.032s
>
> ocaml
> jeff@ubuntu:~/Desktop$ time ./ray 9 512 > /dev/null
>
> real    0m6.572s
> user    0m6.544s
> sys    0m0.016s

Are you running x64 or on Intel hardware? What results do you get for 12, 13
or 14 instead of 9?

Richard Jones

Jan 10, 2010, 3:37:24 PM

to Jon Harrop, caml...@yquem.inria.fr, shaw...@msu.edu

On Sun, Jan 10, 2010 at 08:14:29PM +0000, Jon Harrop wrote:
> on a 64-bit Linux kernel running 32-bit userland

I'm assuming you mean x86 (not e.g. ppc64), in which case that's a very
unusual choice. Any reason for this?

Rich.

--
Richard Jones
Red Hat

Jeff Shaw

Jan 10, 2010, 7:47:47 PM

to Jon Harrop, caml...@yquem.inria.fr

> Are you running x64 or on Intel hardware? What results do you get for 12, 13
> or 14 instead of 9?

I am running an AMD Phenom 9950, but the Ubuntu I'm using is just
32-bit. I tried 5/ray.hs with level=12 instead of 9 but it ran into a
stack overflow problem. When I increased the stack size it completed but
it also took more time than 1/ray.hs, which required no stack size
increase. I made sure that the other arguments I fed it were the same. I
think there is some problem that needs to be worked out in the 5/ray.hs.
Maybe the problem is in ghc, I'm not sure. Below, ./ray5 is 5/ray.hs
and ./ray is 1/ray.hs.

jeff@ubuntu:~/Desktop$ time ./ray 12 512 > /dev/null

real 0m21.479s
user 0m21.093s
sys 0m0.180s
jeff@ubuntu:~/Desktop$ time ./ray5 12 512 +RTS -K2000000000 > /dev/null

real 0m28.366s
user 0m25.674s
sys 0m2.608s
jeff@ubuntu:~/Desktop$ time ./ray 14 512 > /dev/null

real 0m23.544s
user 0m23.021s
sys 0m0.500s

I tried level=14 but I ran out of memory for 5/ray.ml and 5/ray.hs.

I considered that maybe I had saved the files from your website wrong,
or mixed them up during compilation. So I ran the timer again with
level=9 and level=12 and got all the same results. That is, level=9 is
faster on 5/ray.hs but level=12 is faster with 1/ray.hs. So I don't
think I'm making a simple manual error.

It seems that 5/ray.ml and 5/ray.hs aren't quite equivalent in some
important way since 1/ray.ml is faster than 5/ray.ml for both level=9
and level=12. Whether it's a code problem or compiler problem, I cannot say.

The stack size problem does not go away when I remove all the extra
optimization arguments to ghc.

--Jeff

Jon Harrop

Jan 11, 2010, 4:33:27 AM

to caml...@yquem.inria.fr, Jeff Shaw

On Monday 11 January 2010 00:47:26 Jeff Shaw wrote:
> > Are you running x64 or on Intel hardware? What results do you get for 12,
> > 13 or 14 instead of 9?
>
> I am running an AMD Phenom 9950, but the Ubuntu I'm using is just
> 32-bit.

Then I'm even more surprised that you would see significantly different
results to mine, given that we're running the same architecture.

> I tried 5/ray.hs with level=12 instead of 9 but it ran into a
> stack overflow problem.

Yes. Many of the Haskell versions regularly die with stack overflows. They are
not predictable.

> When I increased the stack size it completed but
> it also took more time than 1/ray.hs, which required no stack size
> increase.

This is an interesting result. I hadn't noticed that the most optimized
Haskell implementation is not necessarily the fastest. However, I think I can
explain the phenomenon: with a huge number of spheres, some groups of spheres
(branches of the scene tree) are always occluded and never need to be
explicitly generated, but only the Haskell generates the scene tree lazily.
In fact, it may be the case that as level -> infinity only the Haskell
requires bounded space.

For example, at level=13 the 1/ray.hs Haskell takes 25.8s, 2/ray.hs takes 93s
and the 5/ray.ml OCaml takes 118s. Presumably Lennart made the more optimized
Haskell implementations eager in order to improve performance at level=9 but,
in doing so, he degraded performance for level>9.
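
To illustrate the laziness argument, here is a minimal Haskell sketch. It is
not taken from any of the ray.hs versions: the Scene type and the functions
build, visitFirst and size are hypothetical names, and "occlusion" is modelled
crudely by a traversal that only ever looks at the first child of each group.
The only assumption is that child subtrees are ordinary lazy values, so a
branch that is never inspected is never constructed.

```haskell
-- Hypothetical, simplified scene tree; NOT the data type used in ray.hs.
data Scene = Leaf Int | Group Int [Scene]

-- Build a tree with branching factor 4 down to the given level.
-- The children are ordinary lazy list elements, so a subtree is only
-- constructed when something pattern-matches on it.
build :: Int -> Int -> Scene
build 0 ident = Leaf ident
build level ident =
  Group ident [build (level - 1) (ident * 4 + k) | k <- [0 .. 3]]

-- Stand-in for a ray that only ever reaches the first child of each group
-- (all other children are treated as occluded). Counts the nodes visited.
visitFirst :: Scene -> Int
visitFirst (Leaf _) = 1
visitFirst (Group _ (c : _)) = 1 + visitFirst c
visitFirst (Group _ []) = 1

-- Counts every node, which forces the whole tree to be built.
size :: Scene -> Int
size (Leaf _) = 1
size (Group _ cs) = 1 + sum (map size cs)

main :: IO ()
main = do
  -- A level-30 tree has 4^30 leaves and could never be fully built,
  -- but only the 31 nodes on the visited path are ever constructed.
  print (visitFirst (build 30 0)) -- prints 31
  -- Forcing everything is only feasible for a small tree.
  print (size (build 6 0)) -- prints 5461
```

An eager version would have to build all of the children before the first
traversal could start, which fits the observation that the eager
implementations run out of memory at high levels while 1/ray.hs does not.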

Unpredictable...

> I made sure that the other arguments I fed it were the same. I
> think there is some problem that needs to be worked out in the 5/ray.hs.

There is no easy solution to this because the performance is a non-trivial
function of "level" and "n".

> I tried level=14 but I ran out of memory for 5/ray.ml and 5/ray.hs.

But 1/ray.hs can handle level=14 and 15:

$ time ./ray 14 512 >image.pgm

real 0m27.581s
user 0m26.790s
sys 0m0.764s

$ time ./ray 15 512 >image.pgm

real 0m29.532s
user 0m28.982s
sys 0m0.552s

In fact, that is faster than any other version.

> It seems that 5/ray.ml and 5/ray.hs aren't quite equivalent in some
> important way since 1/ray.ml is faster than 5/ray.ml for both level=9
> and level=12.

Did you mean .hs instead of .ml here?

> Whether it's a code problem or compiler problem, I cannot
> say.

The relative performance of the Haskell implementations also varies with
compiler versions, of course. I cannot tell when it will run out of memory or
even out of stack space. You just have to try it and, when Haskell dies with
a stack overflow after several minutes, you just have to tweak the
command-line parameters to try again until it happens to work.

Finally, I'd add that this "benefit" of the Haskell will almost certainly
destroy its scalability in the parallel case because you'll have threads
competing to force the evaluation of thunks in the shared scene tree which
incurs global synchronization in wholly unpredictable ways (it even depends
upon the layout of the scene!). So, while this is academically interesting,
I'd argue that it is practically useless.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
