Port my ray tracer

Jon Harrop

unread,

Jun 1, 2005, 9:25:45 PM6/1/05

to

I've written a mini ray tracer for the computer language shootout. The
original version was a 66-line OCaml program which I ported to C++:

http://www.ffconsultancy.com/free/ray_tracer/comparison.html

I have since ported the program to C (icc and gcc), SML (sml-nj and mlton)
and Java (j2se and gcj) and Simon Geard kindly ported it to Fortran 90.

I'd like to see it ported to other languages as well, particularly Haskell
and I'm keen to hear any constructive criticisms of my existing
implementations.

It is interesting to note that the SML is 50% longer than the OCaml. When
compiled with Mlton it is also significantly faster. Here are my timings
(1.2GHz Athlon t-bird) for 128x128 resolution, 4x4 oversampling and 6
levels of spheres:

Mlton 1.316s mlton ray.sml
IFC 1.361s ifort -O3 -u -static-libcxa -o raytracer raytracer.f90
C++ 1.605s g++-3.4 -O3 -funroll-all-loops -ffast-math ray.cpp -o ray
ocamlopt 1.932s ocamlopt -inline 100 ray.ml -o ray
SML-NJ 2.245s sml ray.sml
G95 3.351s g95 -O3 -ffast-math ray.f90 -o ray
C 5.971s gcc-3.4 -lm -std=c99 -O3 -ffast-math ray.c -o ray
Java 6.492s javac ray.java
GCJ 20.316s gcj-3.4 --main=ray -Wall -O3 -ffast-math ray.java -o ray
ocamlc 41.047s ocamlc ray.ml -o ray

Note: the OCaml and C++ implementations timed here are not those on the site
- they are more optimised and longer (keeping within 100 LOC).

The compile times are also interesting:

ocamlc 0.099s
IFC 0.333s
SML-NJ 0.625s
C 0.655s
ocamlopt 0.714s
GCJ 0.723s
G95 0.941s
Java 1.362s
Mlton 8.267s
C++ 8.676s

Here is the SML (for Mlton):

(* The Great Computer Language Shootout
http://shootout.alioth.debian.org/
contributed by Jon Harrop, 2005
Compile: mlton ray.sml *)

fun real n = Real.fromInt n
(* Mlton does not provide a "for loop" construct. *)
fun for (s, e, f) = if s=e then () else (f (real s); for (s+1, e, f))

val delta = 0.00000001 (* FIXME: Where is mach eps in the std libs? *)
val infinity = 1.0 / 0.0 (* FIXME: Where is infinity? *)
val pi = 4.0 * Real.Math.atan 1.0 (* FIXME: Where is pi? *)

(* 3D vector
SML typically uses tuples rather than records (as in the OCaml
implementation). *)
type vec = real * real * real
(* SML allows operator precedences to be declared when defining infix
operators. *)
infix 2 *| fun s *| (x, y, z) = (s*x+0.0, s*y, s*z)
infix 1 +| fun (x1, y1, z1) +| (x2, y2, z2) =
(x1+x2+0.0, y1+y2+0.0, z1+z2+0.0)
infix 1 -| fun (x1, y1, z1) -| (x2, y2, z2) =
(x1-x2+0.0, y1-y2+0.0, z1-z2+0.0)
fun dot (x1, y1, z1) (x2, y2, z2) = x1*x2 + y1*y2 + z1*z2+0.0
fun unitise r = (1.0 / Real.Math.sqrt (dot r r)) *| r

(* Node in the scene tree *)
datatype scene =
Sphere of vec * real
| Group of vec * real * scene list

(* Find the first intersection of the given ray with this sphere *)
fun ray_sphere (orig, dir) center radius =
let
val v = center -| orig
val b = dot v dir
val disc = b * b - dot v v + radius * radius
in
if disc < 0.0 then infinity else
let val disc = Real.Math.sqrt disc in
let val t2 = b + disc in
if t2 < 0.0 then infinity else
(fn t1 => if t1 > 0.0 then t1 else t2) (b - disc)
end end end

(* Accumulate the first intersection of the given ray with this sphere *)
(* This function is used both for primary and shadow rays. *)
fun intersect (orig, dir) scene =
let fun of_scene (scene, (l, n)) =
case scene of
Sphere (center, radius) =>
let val l' = ray_sphere (orig, dir) center radius in
if l' >= l then (l, n) else
(l', unitise (orig +| l' *| dir -| center))
end
| Group (center, radius, scenes) =>
let val l' = ray_sphere (orig, dir) center radius in
if l' >= l then (l, n) else
foldl of_scene (l, n) scenes
end in
of_scene (scene, (infinity, (0.0, 0.0, 0.0)))
end

(* Trace a single ray into the scene *)
fun ray_trace light (orig, dir) scene =
let val (lambda, n) = intersect (orig, dir) scene in
if lambda >= infinity then 0.0 else
let val g = 0.0 - dot n light in
(* If we are on the shadowed side of a sphere then don't
bother casting a shadow ray as we know it will intersect
the same sphere. *)
if g <= 0.0 then 0.0 else
let
val orig = orig +| lambda *| dir +| delta *| n
val dir = (0.0, 0.0, 0.0) -| light
in
let val (l, _) = intersect (orig, dir) scene in
if l >= infinity then g else 0.0
end end end end

(* Recursively build the scene tree *)
fun create level r (x, y, z) =
let val obj = Sphere ((x, y, z), r) in
if level = 1 then obj else
let val r' = 3.0 * r / Real.Math.sqrt 12.0 in
let fun aux x' z' =
create (level-1) (0.5 * r) (x-x', y+r', z+z') in
let val objs = [aux (~r') (~r'), aux r' (~r'),
aux (~r') r', aux r' r', obj] in
Group ((x, y, z), 3.0 * r, objs)
end end end end

(* Build a scene and trace many rays into it, outputting a PGM image *)
val () =
let
val level = 6 (* Number of levels of spheres *)
val ss = 4 (* Oversampling *)
(* Resolution *)
val n = case CommandLine.arguments () of
[s] => (case Int.fromString s of
SOME n => n | _ => 256)
| _ => 256
val scene = create level 1.0 (0.0, ~1.0, 0.0) (* Scene tree *)
in
(fn s => print ("P5\n"^s^" "^s^"\n255\n")) (Int.toString n);
for (0, n, fn y => for (0, n, fn x =>
let val g = ref 0.0 in
for (0, ss, fn dx => for (0, ss, fn dy =>
let val n = real n
val x = x + dx / real ss - n / 2.0
val y = n - 1.0 - y + dy / real ss - n / 2.0 in
let val eye = unitise (~1.0, ~3.0, 2.0)
val ray = ((0.0, 0.0, ~4.0),
unitise (x, y, n)) in
g := !g + ray_trace eye ray scene
end
end));
let val g = 0.5 + 255.0 * !g / real (ss*ss) in
print (String.str(Char.chr(Real.trunc g)))
end
end))
end

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com

Matthias Blume

unread,

Jun 1, 2005, 11:12:20 PM6/1/05

to

Jon Harrop <use...@jdh30.plus.com> writes:

Is the SML/NJ code mostly identical to the MLton code? If not,
can I see the SML/NJ code you used?

Matthias

alex goldman

unread,

Jun 1, 2005, 11:15:09 PM6/1/05

to

Jon Harrop wrote:

> C++ 1.605s g++-3.4 -O3 -funroll-all-loops -ffast-math ray.cpp -o ray

> C 5.971s gcc-3.4 -lm -std=c99 -O3 -ffast-math ray.c -o ray

What explains such big difference here? Where is the code actually used for
benchmarking, BTW?

Jon Harrop

unread,

Jun 2, 2005, 3:58:31 AM6/2/05

to

The code is virtually identical.

Jon Harrop

unread,

Jun 2, 2005, 4:02:22 AM6/2/05

to

alex goldman wrote:
> Jon Harrop wrote:
>
>> C++ 1.605s g++-3.4 -O3 -funroll-all-loops -ffast-math ray.cpp -o
>> ray
>> C 5.971s gcc-3.4 -lm -std=c99 -O3 -ffast-math ray.c -o ray
>
> What explains such big difference here?

C does a lot better on AMD64 but I believe the difference is due to the
efficiency of inlined const reference passing of vectors in C++ compared to
the naive approach used by the C code.

Also, the C++ code includes some optimisations not in the C code because it
already exceeded the 100 LOC limit of the shootout. Specifically,
specialised calls for shadow rays. This shaves off another 30% or so.

> Where is the code actually used for benchmarking, BTW?

Here's the OCaml:

(* The Great Computer Language Shootout
http://shootout.alioth.debian.org/

Contributed by Jon Harrop, 2005
Compile: ocamlopt -inline 100 ray.ml -o ray *)

(* This implementation differs from the original in several ways:

Uses an implicit scene, generated as a ray is traced rather than being
precalculated and stored explicitly in a tree.

Specialized shadow-ray intersection functions. *)

let delta = sqrt epsilon_float and pi = 4. *. atan 1.

(* 3D vector and associated functions *)
type vec = {x:float; y:float; z:float}
let ( *| ) s r = {x = s *. r.x; y = s *. r.y; z = s *. r.z}
let ( +| ) a b = {x = a.x +. b.x; y = a.y +. b.y; z = a.z +. b.z}
let ( -| ) a b = {x = a.x -. b.x; y = a.y -. b.y; z = a.z -. b.z}
let dot a b = a.x *. b.x +. a.y *. b.y +. a.z *. b.z
let unitise r = (1. /. sqrt (dot r r)) *| r

(* A semi-infinite ray starting at "orig" and with direction "dir". *)
type ray = { orig: vec; dir: vec }

(* Calculate the parametric intersection of the given ray with the given
sphere. *)
let ray_sphere orig dir center radius =
let v = center -| orig in
let b = dot v dir in
let disc = b *. b -. dot v v +. radius *. radius in
if disc < 0. then infinity else
let disc = sqrt disc in
(fun t2 -> if t2 < 0. then infinity else
((fun t1 -> if t1 > 0. then t1 else t2) (b -. disc))) (b +. disc)

(* Calculate whether or not the given ray intersects the given sphere. *)
let ray_sphere' orig dir center radius =
let v = center -| orig in
let b = dot v dir in
let disc = b *. b -. dot v v +. radius *. radius in
if disc < 0. then false else b +. sqrt disc >= 0.

(* Ratio of the radii of one level of spheres to the next. *)
let s = 6. /. sqrt 12.

(* Find the first intersection point of the given ray with the scene. *)
let intersect level orig dir =
let rec of_scene center radius lambda normal level =

if level = 1 then

let lambda' = ray_sphere orig dir center radius in
if lambda' >= lambda then lambda, normal else
lambda', unitise (orig +| lambda' *| dir -| center)
else
if ray_sphere orig dir center (3. *. radius) >= lambda
then lambda, normal else
let accu = of_scene center radius lambda normal 1 in
let r = 0.5 *. radius and l = level - 1 in let r' = s *. r in
let aux dx dz (lambda, normal) =
of_scene (center +| {x=dx; y=r'; z=dz}) r lambda normal l in
let mr' = -.r' in
aux r' mr' (aux r' r' (aux mr' r' (aux mr' mr' accu))) in
of_scene {x=0.; y= -1.; z=0.} 1. infinity {x=0.; y=0.; z=0.} level

(* Find if the given ray intersects the scene. *)
let intersect' level orig dir =
let rec of_scene center radius level =
if level = 1 then ray_sphere' orig dir center radius else
(* Exploit short-circuit evaluation of boolean comparisons to
terminate
this function early. *)
ray_sphere' orig dir center (3. *. radius) &&
(of_scene center radius 1 ||
let r = 0.5 *. radius and l = level - 1 in let r' = s *. r in
of_scene (center +| {x= -.r'; y=r'; z= -.r'}) r l ||
of_scene (center +| {x= r'; y=r'; z= -.r'}) r l ||
of_scene (center +| {x= -.r'; y=r'; z= r'}) r l ||
of_scene (center +| {x= r'; y=r'; z= r'}) r l) in
of_scene {x=0.; y= -1.; z=0.} 1. level

(* Trace a single ray by casting it into the scene and, if it intersects
anything, casting a second ray toward the light to determine occlusion.
*)
let rec ray_trace l light orig dir =
let lambda, n = intersect l orig dir in
if lambda = infinity then 0. else
let g = -. dot n light in

(* If we are on the shadowed side of a sphere then don't bother casting
a
shadow ray as we know it will intersect the same sphere. *)

if g <= 0. then 0. else
let p = orig +| lambda *| dir +| delta *| n in
if intersect' l p ({x=0.; y=0.; z=0.} -| light) then 0. else g

(* Ray trace the scene at the given resolution. *)
let () =
(* Resolution *)
let n = match Sys.argv with [| _; l |] -> int_of_string l | _ -> 256 in
(* Light direction *)
let light = unitise {x= -1.; y= -3.; z=2.} in
(* Number of levels of spheres, and oversampling. *)
let level = 6 and ss = 4 in

Printf.printf "P5\n%d %d\n255\n" n n;
for y = n - 1 downto 0 do
for x = 0 to n - 1 do
(* Average each pixel over ss*ss separate rays. *)
let g = ref 0. in
for dx = 0 to ss - 1 do
for dy = 0 to ss - 1 do
(* Calculate the origin and direction of this ray. *)
let orig = {x=0.; y=0.; z= -4.} in
let dir = unitise {x = float (x - n / 2) +. float dx /. float ss;
y = float (y - n / 2) +. float dy /. float ss;
z = float n} in
g := !g +. ray_trace level light orig dir
done
done;
let g = int_of_float (0.5 +. 255. *. !g /. float (ss*ss)) in
Printf.printf "%c" (char_of_int g)
done
done

Here's the C++:

// The Great Computer Language Shootout
// http://shootout.alioth.debian.org/
// Contributed by Jon Harrop, 2005
// Compile: g++ -Wall -O3 -ffast-math ray.cpp -o ray

#include <vector>
#include <iostream>
#include <limits>
#include <cmath>

using namespace std;

numeric_limits<double> real;
double delta = sqrt(real.epsilon()), infinity = real.infinity(), pi = M_PI;

// 3D vector
struct Vec {
double x, y, z;
Vec(double x2, double y2, double z2) : x(x2), y(y2), z(z2) {}
};
Vec operator+(const Vec &a, const Vec &b)
{ return Vec(a.x + b.x, a.y + b.y, a.z + b.z); }
Vec operator-(const Vec &a, const Vec &b)
{ return Vec(a.x - b.x, a.y - b.y, a.z - b.z); }
Vec operator*(double a, const Vec &b) { return Vec(a * b.x, a * b.y, a *
b.z); }
double dot(const Vec &a, const Vec &b) { return a.x*b.x + a.y*b.y +
a.z*b.z; }
Vec unitise(const Vec &a) { return (1 / sqrt(dot(a, a))) * a; }

// Semi-infinite ray
struct Ray { Vec orig, dir; Ray(Vec o, Vec d) : orig(o), dir(d) {} };

// Scene tree
// In this implementation, a node in the scene tree is represented by a
single
// struct which is either a group of scene trees with a spherical bound or,
// implicitly, a single sphere if the group has no children.
// This is not equivalent to the variant type used to represent a node in
the
// original OCaml implementation because this representation cannot
associate
// data with only leaf nodes (such as color, reflectivity etc.) but requires
// significantly less C++ code and gives room to implement a specialized
// shadow-ray intersection algorithm.
struct Scene {
vector<Scene> child; // Child nodes in the scene tree
Vec center; // Center of the sphere or spherical bound
double radius; // Radius of the sphere or spherical bound

Scene(Vec c, double r) : child(), center(c), radius(r) {}
};

// Find the first intersection of the given ray with this sphere
double ray_sphere(const Ray &ray, const Scene &s) {
Vec v = s.center - ray.orig;
double b = dot(v, ray.dir), disc = b*b - dot(v, v) + s.radius*s.radius;
if (disc < 0) return infinity;
double d = sqrt(disc), t2 = b + d;
if (t2 < 0) return infinity;
double t1 = b - d;
return (t1 > 0 ? t1 : t2);
}

// Accumulate the first intersection of the given ray with the given scene
// The accumulated parameter (lambda) and normal vector (normal) are passed
by
// reference to avoid having to define a struct to represent the real return
// type of this function.
void intersect(double &lambda, Vec &normal, const Ray &ray, const Scene &s)
{
double l = ray_sphere(ray, s);
// If there is no intersection with this node or if the intersection point
is
// farther than the current intersection then return as no closer
// intersection is to be found here.
if (l >= lambda) return;
if (s.child.size() == 0) {
// Intersect with a single sphere
lambda = l;
normal = unitise(ray.orig + l * ray.dir - s.center);
} else
// Intersect with a group
for (std::vector<Scene>::const_iterator it=s.child.begin();
it!=s.child.end(); ++it)
intersect(lambda, normal, ray, *it);
}

// Find any intersection of the given ray with the given scene
// This function is significantly faster than the above function because it
can
// terminate as soon as any intersection is found.
// This function is distinguished from the above function by its arguments
// (function overloading).
bool intersect(const Ray &ray, const Scene &s) {
if (ray_sphere(ray, s) == infinity) return false;
if (s.child.size() == 0) return true; else
for (std::vector<Scene>::const_iterator it=s.child.begin();
it != s.child.end(); ++it)
if (intersect(ray, *it)) return true;
return false;
}

// Trace a single ray into the scene
double ray_trace(const double weight, const Vec light, const Ray ray,
const Scene &s) {
// As the accumulator is passed to the "intersect" function by reference,
// they cannot be given inline so they are declared as local variables
here.
double lambda = infinity;
Vec normal(0, 0, 0);
intersect(lambda, normal, ray, s);
if (lambda == infinity) return 0;
Vec o = ray.orig + lambda * ray.dir + delta * normal;
double g = -dot(normal, light);
// If we are on the shadowed side of a sphere then don't bother casting a
// shadow ray as we know it will intersect the same sphere.
if (g <= 0) return 0.;
return (intersect(Ray(o, Vec(0, 0, 0) - light), s) ? 0 : g);
}

// Recursively build the scene tree
Scene create(int level, double r, double x, double y, double z) {
if (level == 1) return Scene(Vec(x, y, z), r);
Scene group = Scene(Vec(x, y, z), 3*r);
group.child.push_back(Scene(Vec(x, y, z), r));
double rn = 3*r/sqrt(12.);
for (int dz=-1; dz<=1; dz+=2)
for (int dx=-1; dx<=1; dx+=2)
group.child.push_back(create(level-1, r/2, x-dx*rn, y+rn, z-dz*rn));
return group;
}

// Build a scene and trace many rays into it, outputting a PGM image
int main(int argc, char *argv[]) {
// Number of levels of spheres, resolution and oversampling
int level = 6, n = (argc==2 ? atoi(argv[1]) : 256), ss = 4;
// Direction of the light
Vec light = unitise(Vec(-1, -3, 2));
// Build the scene
Scene s = create(level, 1, 0, -1, 0);

std::cout << "P5\n" << n << " " << n << "\n255\n";
for (int y=n-1; y>=0; --y) {
for (int x=0; x<n; ++x) {
double g=0;
for (int dx=0; dx<ss; ++dx)
for (int dy=0; dy<ss; ++dy) {
// We use "dx*1." instead of "double(dx)" to save space
Vec d(x+dx*1./ss-n/2., y+dy*1./ss-n/2., n);
g += ray_trace(1, light, Ray(Vec(0, 0, -4), unitise(d)), s);
}
std::cout << char(.5 + 255. * g / (ss*ss));
}
}

// In this implementation the Scene destructor recursively deallocates the
// entire scene for us, obviating the need for manually-defined
destructors.

return 0;

Matthew Fluet

unread,

Jun 2, 2005, 8:35:12 AM6/2/05

to

> It is interesting to note that the SML is 50% longer than the OCaml.

It may be 50% longer than the original 66-line OCaml program, but it
seems to be exactly the same size at the optimized OCaml program you posted.

> Here is the SML (for Mlton):

Just a couple of SML notes:

> fun real n = Real.fromInt n

A function "real" (of identical semantics) is available in the top-level
environment of the Basis Library, so there is no need to define your own
version.

> val delta = 0.00000001 (* FIXME: Where is mach eps in the std libs? *)

The Standard ML Basis Library does have Real.minNormalPos which
corresponds to OCaml's min_float.
You can compute OCaml's epsilon_float with
val epsilon = Real.nextAfter(1.0,2.0) - 1.0

> val infinity = 1.0 / 0.0 (* FIXME: Where is infinity? *)

val infinity = Real.posInf

> val pi = 4.0 * Real.Math.atan 1.0 (* FIXME: Where is pi? *)

val pi = Real.Math.pi

> (* 3D vector
> SML typically uses tuples rather than records (as in the OCaml implementation). *)
> type vec = real * real * real

I don't know if it is typical, but records will have no impact on the
peformance of the code compiled under MLton (and presumably under other
SML compilers).

> (* SML allows operator precedences to be declared when defining infix operators. *)
> infix 2 *| fun s *| (x, y, z) = (s*x+0.0, s*y, s*z)
> infix 1 +| fun (x1, y1, z1) +| (x2, y2, z2) =
> (x1+x2+0.0, y1+y2+0.0, z1+z2+0.0)
> infix 1 -| fun (x1, y1, z1) -| (x2, y2, z2) =
> (x1-x2+0.0, y1-y2+0.0, z1-z2+0.0)
> fun dot (x1, y1, z1) (x2, y2, z2) = x1*x2 + y1*y2 + z1*z2+0.0

Rather than using "+0.0" to force the overloaded operators to resolve to
Real.real, you could use a simple type annotation on the functions:

fun s *| (x, y, z) : vec = (s*x, s*y, s*z)

MLton doesn't actually perform constant folding on floating point
operations, so you are paying for some extra operations in the final
executable.

Marcin 'Qrczak' Kowalczyk

unread,

Jun 2, 2005, 9:34:17 AM6/2/05

to

Matthew Fluet <mfl...@acm.org> writes:

>> infix 2 *| fun s *| (x, y, z) = (s*x+0.0, s*y, s*z)

> fun s *| (x, y, z) : vec = (s*x, s*y, s*z)

>
> MLton doesn't actually perform constant folding on floating point
> operations, so you are paying for some extra operations in the final
> executable.

Actually x+0.0 is not the same as x when x is -0.0 (according to IEEE
floating point semantics - I don't know how SML treats it).

--
__("< Marcin Kowalczyk
\__/ qrc...@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/

Matthias Blume

unread,

Jun 2, 2005, 10:41:47 AM6/2/05

to

Marcin 'Qrczak' Kowalczyk <qrc...@knm.org.pl> writes:

> Matthew Fluet <mfl...@acm.org> writes:
>
>>> infix 2 *| fun s *| (x, y, z) = (s*x+0.0, s*y, s*z)
>
>> fun s *| (x, y, z) : vec = (s*x, s*y, s*z)
>>
>> MLton doesn't actually perform constant folding on floating point
>> operations, so you are paying for some extra operations in the final
>> executable.
>
> Actually x+0.0 is not the same as x when x is -0.0 (according to IEEE
> floating point semantics - I don't know how SML treats it).

Right. And that's why it's a bad idea to use +0.0 just for the sake
of indicating a type constraint.

Vesa Karvonen

unread,

Jun 2, 2005, 11:05:33 AM6/2/05

to

Jon Harrop <use...@jdh30.plus.com> wrote:
[...]

> I'm keen to hear any constructive criticisms of my existing
> implementations.

[...]

> let val r' = 3.0 * r / Real.Math.sqrt 12.0 in
> let fun aux x' z' =
> create (level-1) (0.5 * r) (x-x', y+r', z+z') in
> let val objs = [aux (~r') (~r'), aux r' (~r'),
> aux (~r') r', aux r' r', obj] in
> Group ((x, y, z), 3.0 * r, objs)
> end end end end

[...]

This won't have an effect on performance, but the above style of
let-nesting (used throughout the code), which is probably a result of
a conservative translating from OCaml, is not necessary in SML. In
SML, you can flatten the nested let-expressions. In other words, you
can flatten an expression of the form

let val ... in
let val ... in
...
let val in ... end ... end end

into

let val ...
val ...
...
in ... end

-Vesa Karvonen

Vesa Karvonen

unread,

Jun 2, 2005, 11:54:38 AM6/2/05

to

Matthew Fluet <mfl...@acm.org> wrote:

> > It is interesting to note that the SML is 50% longer than the OCaml.

> It may be 50% longer than the original 66-line OCaml program, but it
> seems to be exactly the same size at the optimized OCaml program you posted.

The SML code does indeed have more lines (count them), but it is
structurally almost the same. In my experience, formatted SML code
tends to "naturally" spread out over more lines than equivalent
OCaml code. (The idea of comparing just the number of lines of code
is flawed IMO.)

> val pi = Real.Math.pi

While one is measuring code length, it might be better to use

val pi = Math.pi

Of course, this is a moot point as pi is declared but never used in
the program.

> > infix 1 -| fun (x1, y1, z1) -| (x2, y2, z2) =
> > (x1-x2+0.0, y1-y2+0.0, z1-z2+0.0)
> > fun dot (x1, y1, z1) (x2, y2, z2) = x1*x2 + y1*y2 + z1*z2+0.0

[...]

> Rather than using "+0.0" to force the overloaded operators to resolve to
> Real.real, you could use a simple type annotation on the functions:

> fun s *| (x, y, z) : vec = (s*x, s*y, s*z)

> MLton doesn't actually perform constant folding on floating point
> operations, so you are paying for some extra operations in the final
> executable.

Indeed, a couple of the vector operations (unsurprisingly) seem take
quite a bit of time (the below data is for an optimized version using
type annotations rather than +0.0 (and a couple of other minor
changes)):

function cur
-------------------------------- -----
ray_sphere 31.6%
intersect.of_scene 17.7%
dot 17.1%
Sequence.Slice.foldli.loop 14.8%
-| 9.8%
unitise 2.4%
[...]

-Vesa Karvonen

Vesa Karvonen

unread,

Jun 2, 2005, 12:29:16 PM6/2/05

to

Vesa Karvonen <vesa.k...@cs.helsinki.fi> wrote:
> Matthew Fluet <mfl...@acm.org> wrote:

> > > It is interesting to note that the SML is 50% longer than the OCaml.

> > It may be 50% longer than the original 66-line OCaml program, but it
> > seems to be exactly the same size at the optimized OCaml program you posted.

> The SML code does indeed have more lines (count them) [...]

Oops, I was actually comparing to another OCaml version (I copied it from
a shootout page earlier --- can't seem to find the page anymore) that was
structurally identical to the SML version and had ~80 lines (versus many
more in the SML version). Upon closer inspection, the latest OCaml version
posted to the group is considerably different from the SML version posted
earlier.

-Vesa Karvonen

Vesa Karvonen

unread,

Jun 2, 2005, 2:11:32 PM6/2/05

to

Matthew Fluet <mfl...@acm.org> wrote:
[...]

> > (* 3D vector
> > SML typically uses tuples rather than records (as in the OCaml implementation). *)
> > type vec = real * real * real

> I don't know if it is typical, but records will have no impact on the
> peformance of the code compiled under MLton (and presumably under other
> SML compilers).

SML and O'Caml do handle records in somewhat different ways. It has
been a couple of years since I programmed daily in OCaml, so my
recollection of the experience may not be perfect, but as I have lately
been programming mostly in SML, it feels like the use of records in SML
tends to require more type annotations. So, I think that there may
be some truth in the above comment (more annotations -> less convenient
-> less use).

Here is an example transcript in OCaml (3.08.3):

# type vec = {x:int ; y:int} ;;
type vec = { x : int; y : int; }
# let getX vec = vec.x ;;
val getX : vec -> int = <fun>

Here is a failing attempt at a "similar" code in SML (SML/NJ 110.42):

- type vec = {x:int, y:int} ;
type vec = {x:int, y:int}
- fun getX vec = #x vec ;
stdIn:2.1-2.22 Error: unresolved flex record
(can't tell what fields there are besides #x)

It seems fairly common practise in SML to use datatypes to
eliminate the need to use *type* annotations:

- datatype vec = VEC of {x:int, y:int} ;
datatype vec = VEC of {x:int, y:int}
- fun getX (VEC vec) = #x vec ;
val getX = fn : vec -> int

(One reason why someone might prefer to use tuples instead of records is
that tuple patterns are slightly simpler.)

-Vesa Karvonen

Matthias Blume

unread,

Jun 2, 2005, 4:12:54 PM6/2/05

to

Vesa Karvonen <vesa.k...@cs.helsinki.fi> writes:

> Matthew Fluet <mfl...@acm.org> wrote:
> [...]
>> > (* 3D vector
>> > SML typically uses tuples rather than records (as in the OCaml implementation). *)
>> > type vec = real * real * real
>
>> I don't know if it is typical, but records will have no impact on the
>> peformance of the code compiled under MLton (and presumably under other
>> SML compilers).
>
> SML and O'Caml do handle records in somewhat different ways. It has
> been a couple of years since I programmed daily in OCaml, so my
> recollection of the experience may not be perfect, but as I have lately
> been programming mostly in SML, it feels like the use of records in SML
> tends to require more type annotations. So, I think that there may
> be some truth in the above comment (more annotations -> less convenient
> -> less use).

I actually try to use records frequently in SML.

> Here is an example transcript in OCaml (3.08.3):
>
> # type vec = {x:int ; y:int} ;;
> type vec = { x : int; y : int; }
> # let getX vec = vec.x ;;
> val getX : vec -> int = <fun>
>
> Here is a failing attempt at a "similar" code in SML (SML/NJ 110.42):
>
> - type vec = {x:int, y:int} ;
> type vec = {x:int, y:int}
> - fun getX vec = #x vec ;
> stdIn:2.1-2.22 Error: unresolved flex record
> (can't tell what fields there are besides #x)

Just write it as

fun getX { x, y } = x

If there aren't too many other fields, this is not too inconvenient.
Otherwise just use a type constraint.

fun getX (v: vec) = #x v

Of course, that requires actually declaring the record type --
something that is otherwise not necessary.

> It seems fairly common practise in SML to use datatypes to
> eliminate the need to use *type* annotations:
>
> - datatype vec = VEC of {x:int, y:int} ;
> datatype vec = VEC of {x:int, y:int}
> - fun getX (VEC vec) = #x vec ;
> val getX = fn : vec -> int
>
> (One reason why someone might prefer to use tuples instead of records is
> that tuple patterns are slightly simpler.)

Actually, record patters are just as simple. Record construction is a
bit wordier because one has to write all those label names.

Matthias

Vesa Karvonen

unread,

Jun 2, 2005, 5:34:14 PM6/2/05

to

Matthias Blume <fi...@my.address.elsewhere> wrote:
> Vesa Karvonen <vesa.k...@cs.helsinki.fi> writes:
[...]
> Just write it as

> fun getX { x, y } = x

> If there aren't too many other fields, this is not too inconvenient.

Fair enough, but this approach becomes inconvenient when you need to
add/remove a field to/from a record. You then need to modify every
function that takes such records as parameters even if the functions
make no use of the field.

> Otherwise just use a type constraint.

> fun getX (v: vec) = #x v

> Of course, that requires actually declaring the record type --
> something that is otherwise not necessary.

This approach becomes inconvenient when you want to write
polymorphic functions that manipulate records. The type can become
somewhat complex (think ('a, 'b, ..., 'z) my_record).

The use of datatypes seems to eliminate both problems. When you
define record datatypes like this

datatype ('a, 'b, ..., 'z) my_record
= MY_REC of {foo: 'a, bar: 'b, ..., foobar: 'z}

and write functions like this

fun usingFooBar (MY_REC myRec) = ... #foobar myRec ...

the function definitions are more resilient to changes like adding
or removing fields. The formal parameter pattern is also simpler
than with a type annotation. You can also use wildcards without
problems:

fun usingFooAndBar (MY_REC {foo, bar, ...}) = ... foo ... bar ...

A good compiler should be able to eliminate any run-time overhead of
using datatypes.

> > (One reason why someone might prefer to use tuples instead of records is
> > that tuple patterns are slightly simpler.)

> Actually, record patters are just as simple. Record construction is a
> bit wordier because one has to write all those label names.

Records become less convenient in patterns when a function takes two
or more records (with the same labels). You will then need to
specify the names of the labels as well as the names of the
bindings, e.g.

fun {x=x0, y=y0} +| {x=x1, y=y1} = {x=x0+x1, y=y0+y1}

With tuples you can say

fun (x0, y0) +| (x1, y1) = (x0+x1, y0+y1)

-Vesa Karvonen

Matthias Blume

unread,

Jun 2, 2005, 6:12:37 PM6/2/05

to

Vesa Karvonen <vesa.k...@cs.helsinki.fi> writes:

> Matthias Blume <fi...@my.address.elsewhere> wrote:
>> Vesa Karvonen <vesa.k...@cs.helsinki.fi> writes:
> [...]
>> Just write it as
>
>> fun getX { x, y } = x
>
>> If there aren't too many other fields, this is not too inconvenient.
>
> Fair enough, but this approach becomes inconvenient when you need to
> add/remove a field to/from a record. You then need to modify every
> function that takes such records as parameters even if the functions
> make no use of the field.

Sure, but that's no different with a tuple.

Anyway, I also think that a solution such as SML# is really the answer
to all this.

>> Otherwise just use a type constraint.
>
>> fun getX (v: vec) = #x v
>
>> Of course, that requires actually declaring the record type --
>> something that is otherwise not necessary.
>
> This approach becomes inconvenient when you want to write
> polymorphic functions that manipulate records. The type can become
> somewhat complex (think ('a, 'b, ..., 'z) my_record).

Indeed. Again, SML# would be the answer (but, unfortunately, SML# is
not SML).

> The use of datatypes seems to eliminate both problems. When you
> define record datatypes like this
>
> datatype ('a, 'b, ..., 'z) my_record
> = MY_REC of {foo: 'a, bar: 'b, ..., foobar: 'z}
>
> and write functions like this
>
> fun usingFooBar (MY_REC myRec) = ... #foobar myRec ...
>
> the function definitions are more resilient to changes like adding
> or removing fields. The formal parameter pattern is also simpler
> than with a type annotation. You can also use wildcards without
> problems:
>
> fun usingFooAndBar (MY_REC {foo, bar, ...}) = ... foo ... bar ...
>
> A good compiler should be able to eliminate any run-time overhead of
> using datatypes.
>
>> > (One reason why someone might prefer to use tuples instead of records is
>> > that tuple patterns are slightly simpler.)
>
>> Actually, record patters are just as simple. Record construction is a
>> bit wordier because one has to write all those label names.
>
> Records become less convenient in patterns when a function takes two
> or more records (with the same labels). You will then need to
> specify the names of the labels as well as the names of the
> bindings, e.g.
>
> fun {x=x0, y=y0} +| {x=x1, y=y1} = {x=x0+x1, y=y0+y1}
>
> With tuples you can say
>
> fun (x0, y0) +| (x1, y1) = (x0+x1, y0+y1)

You are right on all these, of course.

Matthias

Jon Harrop

unread,

Jun 3, 2005, 12:35:24 PM6/3/05

to

Vesa Karvonen wrote:

> Vesa Karvonen <vesa.k...@cs.helsinki.fi> wrote:
>> The SML code does indeed have more lines (count them) [...]
>
> Oops, I was actually comparing to another OCaml version (I copied it from
> a shootout page earlier --- can't seem to find the page anymore) that was
> structurally identical to the SML version and had ~80 lines (versus many
> more in the SML version). Upon closer inspection, the latest OCaml version
> posted to the group is considerably different from the SML version posted
> earlier.

The original 66-line OCaml is here:

http://www.ffconsultancy.com/free/ray_tracer/comparison.html

The current version on the shootout is here:

http://shootout.alioth.debian.org/sandbox/benchmark.php?test=raytracer&lang=ocaml&id=0&sort=fullcpu

The version I posted to this newsgroup is optimised. I have not written an
SML equivalent as it would exceed the 100 LOC limit of the shootout.

Jon Harrop

unread,

Jun 3, 2005, 12:44:57 PM6/3/05

to

Matthew Fluet wrote:
>> It is interesting to note that the SML is 50% longer than the OCaml.
>
> It may be 50% longer than the original 66-line OCaml program, but it
> seems to be exactly the same size at the optimized OCaml program you
> posted.

The optimised OCaml is not equivalent to the optimised SML, of course. If
you were to add the optimisations (e.g. specialised shadow intersection
algorithms) then the SML would probably remain 50% longer than the OCaml.

From my point of view, this is because SML requires lots of "end" and "val"
tokens and uses a much more verbose representation of local variables:

let a=1 and b=2 in
...

vs

let
val a = 1
val b = 2
in
...
end

In real SML programming, do people typically stack the "end"s on one line as
I have done, or do they nest them on many lines?

Thanks for the help. :-)

Mlton-compiled SML is now significantly faster than all of the other
implementations.

Matthias Blume

unread,

Jun 3, 2005, 1:02:01 PM6/3/05

to

Jon Harrop <use...@jdh30.plus.com> writes:

> Matthew Fluet wrote:
>>> It is interesting to note that the SML is 50% longer than the OCaml.
>>
>> It may be 50% longer than the original 66-line OCaml program, but it
>> seems to be exactly the same size at the optimized OCaml program you
>> posted.
>
> The optimised OCaml is not equivalent to the optimised SML, of course. If
> you were to add the optimisations (e.g. specialised shadow intersection
> algorithms) then the SML would probably remain 50% longer than the OCaml.
>
> From my point of view, this is because SML requires lots of "end" and "val"
> tokens and uses a much more verbose representation of local variables:
>
> let a=1 and b=2 in
> ...
>
> vs
>
> let
> val a = 1
> val b = 2
> in
> ...
> end

Actually, you can write this as

let val a = 1 and b = 2
in ...
end

or

let val a = 1 val b = 2
in ...
end

which is not so bad.

>
> In real SML programming, do people typically stack the "end"s on one line as
> I have done, or do they nest them on many lines?

I don't stack them on one line as doing so is ugly (IMO). (That is,
unless I am purposely squeezing for space because of someone's
bean^H^H^Hline-counting. :-) )

Cheers,
Matthias

Benjamin Ylvisaker

unread,

Jun 3, 2005, 1:39:52 PM6/3/05

to

On Fri, 03 Jun 2005 17:44:57 +0100
Jon Harrop <use...@jdh30.plus.com> wrote:

> From my point of view, this is because SML requires lots of "end" and "val"
> tokens and uses a much more verbose representation of local variables:
>
> let a=1 and b=2 in
> ...
>
> vs
>
> let
> val a = 1
> val b = 2
> in
> ...
> end
>
> In real SML programming, do people typically stack the "end"s on one line as
> I have done, or do they nest them on many lines?

I don't typically stack "end"s, but I do usually use a slightly more
compact let-in-end style:

let val a = 1
val b = 2
in ...
end

Benjamin

Jon Harrop

unread,

Jun 3, 2005, 2:25:09 PM6/3/05

to

Vesa Karvonen wrote:
> The SML code does indeed have more lines (count them), but it is
> structurally almost the same. In my experience, formatted SML code
> tends to "naturally" spread out over more lines than equivalent
> OCaml code. (The idea of comparing just the number of lines of code
> is flawed IMO.)

LOC is definitely flawed, but can we improve upon it?

> Of course, this is a moot point as pi is declared but never used in
> the program.

Thanks, that's shortened several implementations. :-)

Alan Morgan

unread,

Jun 3, 2005, 3:03:46 PM6/3/05

to

In article <42a0a0ba$0$3878$ed26...@ptn-nntp-reader03.plus.net>,

Jon Harrop <use...@jdh30.plus.com> wrote:
>Vesa Karvonen wrote:
>> The SML code does indeed have more lines (count them), but it is
>> structurally almost the same. In my experience, formatted SML code
>> tends to "naturally" spread out over more lines than equivalent
>> OCaml code. (The idea of comparing just the number of lines of code
>> is flawed IMO.)
>
>LOC is definitely flawed, but can we improve upon it?

How about comparing # of tokens (lexemes? Not sure what the technical
word would be here).

Alan
--
Defendit numerus

Marcin 'Qrczak' Kowalczyk

unread,

Jun 3, 2005, 3:37:46 PM6/3/05

to

amo...@xenon.Stanford.EDU (Alan Morgan) writes:

>>LOC is definitely flawed, but can we improve upon it?
>
> How about comparing # of tokens (lexemes? Not sure what the technical
> word would be here).

In some cases it's hard to define. For example how many tokens this
Perl snippet has?

s/(foo*)bar/$1 abc def/

Mark Carroll

unread,

Jun 3, 2005, 3:57:41 PM6/3/05

to

In article <42a0a0ba$0$3878$ed26...@ptn-nntp-reader03.plus.net>,
Jon Harrop <use...@jdh30.plus.com> wrote:
>Vesa Karvonen wrote:

(snip)

>> OCaml code. (The idea of comparing just the number of lines of code
>> is flawed IMO.)
>
>LOC is definitely flawed, but can we improve upon it?

(snip)

Maybe see how large the programme is after gzip or some other
compression? That'll help discount for repetitious scaffolding.

Mark.

--
Haskell vacancies in Columbus, Ohio, USA: see http://www.aetion.com/jobs.html

Alan Morgan

unread,

Jun 3, 2005, 4:33:15 PM6/3/05

to

In article <87k6lbu...@qrnik.zagroda>,

Marcin 'Qrczak' Kowalczyk <qrc...@knm.org.pl> wrote:
>amo...@xenon.Stanford.EDU (Alan Morgan) writes:
>
>>>LOC is definitely flawed, but can we improve upon it?
>>
>> How about comparing # of tokens (lexemes? Not sure what the technical
>> word would be here).
>
>In some cases it's hard to define. For example how many tokens this
>Perl snippet has?
>
> s/(foo*)bar/$1 abc def/

The fact that it is hard to define in perl is more of a strike against
perl than a strike against my suggestion :-)

Yes, it's hard to define. However, it's probably a lot easier to define for
O'Caml/SML and for this sort of rule-of-thumb comparison of the sizes of the
two languages it should work pretty well.

Alan
--
Defendit numerus

Marcin 'Qrczak' Kowalczyk

unread,

Jun 3, 2005, 5:29:56 PM6/3/05

to

amo...@xenon.Stanford.EDU (Alan Morgan) writes:

> Yes, it's hard to define. However, it's probably a lot easier to
> define for O'Caml/SML and for this sort of rule-of-thumb comparison
> of the sizes of the two languages it should work pretty well.

I agree. Besides Perl it should work quite well.

Mark Carroll <ma...@chiark.greenend.org.uk> writes:

> Maybe see how large the programme is after gzip or some other
> compression? That'll help discount for repetitious scaffolding.

This gives unfair advantage to languages which require a lot of
repetition (assuming that the shorter programs can be written,
the better the language is).

Jon Harrop

unread,

Jun 3, 2005, 7:29:36 PM6/3/05

to

Mark Carroll wrote:
> Maybe see how large the programme is after gzip or some other
> compression? That'll help discount for repetitious scaffolding.

That sounds like a measure of the information content, which should be the
same for all languages. We want to measure the exact opposite - the amount
of unnecessary code that has to be written.

Matthias Blume

unread,

Jun 4, 2005, 11:51:18 PM6/4/05

to

Jon Harrop <use...@jdh30.plus.com> writes:

>> Where is the code actually used for benchmarking, BTW?
>
> Here's the OCaml:

[... big snip ... ]

Ok, I took the trouble and translated your Ocaml code to SML for use
under SML/NJ. (MLton needs a minor adjustment.) The code is only a
few lines longer than the original and should come in under 100loc
when ignoring comments and blank lines.

On an old 400 MHz Celeron, the program runs almost 30% faster than the
original SML code that you posted earlier. I'd be interested to hear
how it fares on the Athlon...

Matthias

------------------

(* This is a straight translation of Jon Harrop's OCaml ray tracer code
* for the "Great Computer Language Shootout", with very minor modifications.
* Transliteration for SML/NJ done by Matthias Blume. *)

(* The following comment was in the original Ocaml code. It does not
* refer to differences between the Ocaml and SML/NJ versions: *)

(* This implementation differs from the original in several ways:

Uses an implicit scene, generated as a ray is traced rather than being
precalculated and stored explicitly in a tree.

Specialized shadow-ray intersection functions. *)

structure Ray:sig val main:string*string list->OS.Process.status end = struct

val delta = Math.sqrt 2.22044604925031e~16

(* 3D vector and associated functions *)

type vec = real * real * real

infix 7 *| fun s *| (x, y, z) = (s*x, s*y, s*z) : vec
infix 6 |+| fun (x, y, z) |+| (x', y', z') = (x+x', y+y', z+z') : vec
infix 6 |-| fun (x, y, z) |-| (x', y', z') = (x-x', y-y', z-z') : vec
infix 7 |*| fun (x, y, z) |*| (x', y', z') = x*x'+y*y'+z*z' : real
fun unitise r = (1.0 / Math.sqrt (r |*| r)) *| r
fun ~|(x,y,z) = (~x,~y,~z) : vec

(* Calculate the parametric intersection of the given ray with the given
sphere. *)

fun ray_sphere (orig, dir, center, radius) =
let val v = center |-| orig
val b = v |*| dir
val disc = b * b - v |*| v + radius * radius
in if disc < 0.0 then Real.posInf
else let val disc = Math.sqrt disc
val t2 = b + disc
in if t2 < 0.0 then Real.posInf
else let val t1 = b - disc in if t1 > 0.0 then t1 else t2 end
end
end

(* Calculate whether or not the given ray intersects the given sphere. *)

fun ray_sphere' (orig, dir, center, radius) =
let val v = center |-| orig
val b = v |*| dir
val disc = b * b - v |*| v + radius * radius
in if disc < 0.0 then false else b + Math.sqrt disc >= 0.0 end

(* Ratio of the radii of one level of spheres to the next. *)

val s = 6.0 / Math.sqrt 12.0

(* Find the first intersection point of the given ray with the scene. *)

fun intersect (level, orig, dir) =
let fun of_scene (center, radius, lambda, normal, level) =

if level = 1 then

let val lambda' = ray_sphere (orig, dir, center, radius)
in if lambda' >= lambda then (lambda, normal)
else (lambda', unitise (orig |+| lambda' *| dir |-| center))
end
else if ray_sphere (orig, dir, center, 3.0 * radius) >= lambda
then (lambda, normal)
else let val accu = of_scene (center, radius, lambda, normal, 1)
val r = 0.5 * radius val l = level - 1 val r' = s * r
fun h (dx, dz, (lam, norm)) =
of_scene (center |+| (dx, r', dz), r, lam, norm, l)
val mr' = ~r'
in h (r', mr', h (r', r', h (mr', r', h (mr', mr', accu))))
end
in of_scene ((0.0, ~1.0, 0.0), 1.0, Real.posInf, (0.0, 0.0, 0.0), level) end

(* Find if the given ray intersects the scene. *)

fun intersect' (level, orig, dir) =
let fun of_scene (center, radius, level) =
if level = 1 then ray_sphere' (orig, dir, center, radius) else

(* Exploit short-circuit evaluation of boolean comparisons to

* terminate this function early. *)
ray_sphere' (orig, dir, center, (3.0 * radius)) andalso
(of_scene (center, radius, 1) orelse
let val r = 0.5 * radius val l = level-1 val r' = s * r
in of_scene (center |+| (~r', r', ~r'), r, l) orelse
of_scene (center |+| (r', r', ~r'), r, l) orelse
of_scene (center |+| (~r', r', r'), r, l) orelse
of_scene (center |+| (~r', r', r'), r, l)
end)
in of_scene ((0.0, ~1.0, 0.0), 1.0, level) end

(* Trace a single ray by casting it into the scene and, if it intersects
anything, casting a second ray toward the light to determine occlusion.
*)

fun ray_trace (l, light, orig, dir) =
let val (lam, n) = intersect (l, orig, dir)
in if lam >= Real.posInf then 0.0
else let val g = ~ (n |*| light)

(* If we are on the shadowed side of a sphere then don't bother casting
a
shadow ray as we know it will intersect the same sphere. *)

in if g<=0.0 orelse intersect' (l, orig|+|lam*|dir|+|delta*|n, ~|light)
then 0.0 else g
end
end

(* Ray trace the scene at the given resolution. *)

fun doit n (* Resolution *) =
let val n2 = n div 2 val rn = real n val ns = Int.toString n
val light = unitise (~1.0, ~3.0, 2.0) (* Light direction *)
val level = 6 val ss = 4 (* ss: oversampling *)
val rss' = 1.0 / real ss val rss'sq255 = rss' * rss' * 255.0
fun yl y = if y < 0 then OS.Process.success else xl (0, y)
and xl (x, y) = if x >= n then yl (y-1) else dxl (0.0, 0, x, y)
and dxl (g, dx, x, y) =
if dx < ss then dyl (g, 0, dx, x, y)
else (print (String.str (chr (Real.trunc (0.5 + g * rss'sq255))));
xl (x+1, y))
and dyl (g, dy, dx, x, y) =
if dy >= ss then dxl (g, dx+1, x, y)
else let val orig = (0.0, 0.0, ~4.0)
fun cvt (w, dw) = real (w-n2) + real dw * rss'
val dir = unitise (cvt (x, dx), cvt (y, dy), rn)
in dyl (g+ray_trace (level, light, orig, dir), dy+1, dx, x, y)
end
in print (concat ["P5\n", ns, " ", ns, "\n255\n"]); yl (n-1) end

fun main (_, arg :: _) = doit (getOpt (Int.fromString arg, 256))
| main _ = doit 256
end

Dr Jon D Harrop

unread,

Jun 5, 2005, 1:05:48 PM6/5/05

to

Matthias Blume wrote:
> Ok, I took the trouble and translated your Ocaml code to SML for use
> under SML/NJ. (MLton needs a minor adjustment.) The code is only a
> few lines longer than the original and should come in under 100loc
> when ignoring comments and blank lines.

That's great, thank you. :-)

> On an old 400 MHz Celeron, the program runs almost 30% faster than the
> original SML code that you posted earlier. I'd be interested to hear
> how it fares on the Athlon...

Using AMD64 OCaml but x86 Mlton and SML/NJ (are there AMD64 versions?)
I get:

Compiler LOC Time
Mlton 80 2.950
Mlton 88 2.571
Mlton 91 3.569
SML/NJ 80 7.487
SML/NJ 88 6.738
SML/NJ 91 3.738
ocamlopt 61 4.078
ocamlopt 67 3.845
ocamlopt 78 3.184

The tree versions are:

1. Minimal
2. Specialised "intersect" function for shadow rays
3. Implicit scene

Interestingly, the "implicit scene" Mlton implementation is much
slower. Both Mlton and SML/NJ will probably do a lot better if I can
get native AMD64 code gens working.

Cheers,
Jon.

Matthew Fluet

unread,

Jun 5, 2005, 2:20:55 PM6/5/05

to

> Using AMD64 OCaml but x86 Mlton and SML/NJ (are there AMD64 versions?)

MLton does not have a native AMD64 port at the current time.
I do not believe that SML/NJ has a native AMD64 port either.

Jon Harrop

unread,

Jun 7, 2005, 5:16:52 AM6/7/05

to

That's a great shame. ocamlopt is relatively much faster on AMD64 than on
x86. I suspect SML/NJ and Mlton could be similarly much faster on AMD64.

Here are some results for comparison between 32- and 64-bit processors:

ss=4 level=9 n=512 Athlon T-bird

Compiler LOC Run Time
smlnj 79 83.470
smlnj 84 57.112
smlnj 90 47.253
smlnj 94 45.499
ocamlopt 60 81.846
ocamlopt 65 52.961
ocamlopt 72 52.578
ocamlopt 76 47.331
g++ 106 62.597
g++ 121 42.399
g++ 130 36.419
g++ 135 35.268
mlton 79 30.679
mlton 84 23.539
mlton 90 21.647
mlton 94 18.966

ss=4 level=9 n=512 Athlon 64

Compiler LOC Run Time
smlnj 79 39.092
smlnj 84 26.444
smlnj 90 22.246
smlnj 94 21.530
ocamlopt 60 26.474
ocamlopt 65 17.950
ocamlopt 72 16.657
ocamlopt 76 16.270
g++ 106 24.395
g++ 121 16.441
g++ 130 14.841
g++ 135 14.733
mlton 79 14.827
mlton 84 10.546
mlton 90 8.682
mlton 94 8.390

I'm curious as to why x86 ocamlopt shows no improvement moving from version
2 to version 3 (specialised shadow rays), whereas all other languages on
both platforms show significant improvement.