Ray tracer update

Jon Harrop

unread,

Jun 12, 2005, 4:51:50 PM6/12/05

to

I've updated my C++ vs OCaml comparison of the cut-down ray tracer:

http://www.ffconsultancy.com/free/ray_tracer/comparison.html

1. The C++ implementation is optimised to use pass by reference and now
slightly outperforms ocamlopt.
2. I've shrunk the OCaml implementation down to 62 LOC.

I've also added a C++ vs SML comparison which shows Mlton-compiled SML
greatly outperforming C++:

http://www.ffconsultancy.com/free/ray_tracer/comparison_cpp_vs_sml.html

This application appears to be a real strong point of Mlton - x86 Mlton even
outperforms AMD64 g++ (3.4)! I guess this is because the most important
automatable optimisation is altering the representation of the scene tree.
However, the SML uses more memory and, therefore, runs out significantly
sooner. I'll try it against g++ 4.0 ASAP.

I think this ray tracer is a great benchmark because it uses so many
different features (ints, floats, tuples, records, variants, trees, lambda
functions, nested functions, purely functional recursive functions,
imperative looping functions etc.) in such a small space whilst being
performance intensive, arguably useful and even pretty!

I must say that I find SML significantly harder to work with than OCaml
though. I don't like having to type annotate things because of "+" being
overloaded. Although having "end" may be syntactically more elegant, it
adds lots of LOC which detract from the individual character of each
function. My main gripe is the unfriendliness of the compilers compared to
OCaml's compilers.

Having said that, I do like the ability to define operator associativities
and precedences and I appreciate the extra performance (although the
compile time is pretty awful!).

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com

Vesa Karvonen

unread,

Jun 13, 2005, 6:41:40 AM6/13/05

to

Jon Harrop <use...@jdh30.plus.com> wrote:
[...]

> I must say that I find SML significantly harder to work with than OCaml
> though. I don't like having to type annotate things because of "+" being
> overloaded.

To avoid having to use type annotations, you can use the same
technique in SML as is being used in OCaml. Namely, you can define
separate monomorphic operators for each type. This won't help with
shootouts, but might be useful in more serious work.

> Although having "end" may be syntactically more elegant, it
> adds lots of LOC which detract from the individual character of each
> function.

Yes, especially if you nest let-constructs redundantly like in

fun create level r (c as (x, y, z)) =
let val obj = Sphere { center = c, radius = r } in
if level = 1 then obj else
let val r' = 3.0 * r / Real.Math.sqrt 12.0 in
let fun aux x' z' =
create (level-1) (0.5 * r) (x-x', y+r', z+z') in
let val objs = [aux (~r') (~r'), aux r' (~r'),
aux (~r') r', aux r' r', obj] in
Group ({ center = c, radius = 3.0 * r }, objs)
end
end
end
end

The above can be rewritten to

fun create level r (c as (x, y, z)) =
let val obj = Sphere { center = c, radius = r } in
if level = 1 then obj else
let val r' = 3.0 * r / Real.Math.sqrt 12.0
fun aux x' z' =
create (level-1) (0.5 * r) (x-x', y+r', z+z')
val objs = [aux (~r') (~r'), aux r' (~r'),
aux (~r') r', aux r' r', obj] in
Group ({ center = c, radius = 3.0 * r }, objs)
end
end

The semantics is the same. (And I've noted about this earlier.)

I can see that you have also noticed that let-bindings are usually
longer than equivalent function calls as in

(fn t1 => if t1 > 0.0 then t1 else t2) (b - disc)

In more general form, as long as let-polymorphism is not needed, you
can rewrite

let val x = y in z end

as

(fn x => z) y

A direct function application may be more difficult to read than
an equivalent let-construct due to the "backwards" order of the
subexpressions. I sometimes use a "pipe" operator

infix >| fun x >| f = f x

to make nested unary function applications easier to read (though
opinions might vary). In general,

(fN ... (f2 (f1 x)) ...)

can be rewritten to

f1 x >| f2 >| ... >| fN

which can sometimes be more natural.

Using the above pipe operator you can rewrite

let val x = y in z end

as

y >| (fn x => z)

as long as let-polymorphism is not needed.

> My main gripe is the unfriendliness of the compilers compared to
> OCaml's compilers.

It might help compiler developers if you would describe the problems
in more detail.

-Vesa Karvonen

alex goldman

unread,

Jun 13, 2005, 6:49:08 AM6/13/05

to

Vesa Karvonen wrote:

[...]

I'm bothered by the absence of (1) 'for' and (2) macros in SML.

With macros, I could at least define my own 'for'. But the absence of both
of these requires some really unnatural-looking code, especially where in
OCaml one would use nested for's. Do you have any advice for how to make
SML more palatable for my taste?

Vesa Karvonen

unread,

Jun 13, 2005, 8:49:09 AM6/13/05

to

The obvious technique is to define a higher-order for-function.
Whether the result is sweet enough is a matter of taste. Yes, I also
miss macros. Higher-order functions can not completely replace
compile-time expanded macros.

At any rate, the following is a for-function variation that tries to
mimic the OCaml for-loop.

datatype for_direction = to of int * int
| downto of int * int

infix to downto

val for =
fn a to b =>
(fn f => let fun lp a = if a>b then () else (f a; lp (a+1))
in lp a end)
| a downto b =>
(fn f => let fun lp a = if a<b then () else (f a; lp (a-1))
in lp a end)

Now

for (1 to 9) (fn i => print (Int.toString i))

prints

123456789

and

for (9 downto 1) (fn i => print (Int.toString i))

prints

987654321

(I purposely didn't use (print o Int.toString) above.)

Nested uses could be formatted like this

for (a to b)
(fn i =>
for (c to d)
(fn j =>
...))

which arguably isn't completely unreadable.

Of course, there is an infinite number of variations. The above is
just one way to write a for-function.

In general, I would probably recommend going for the simplest looping
function that fits the bill, but the following is the beginnings of a
for-loop combinator library.

First a signature to show the types of the combinators:

signature FOR =
sig
type 'a for = ('a -> unit) -> unit

val to : int * int -> int for
val downto : int * int -> int for

val && : 'a for * 'b for -> ('a * 'b) for

val for : 'a for -> ('a -> unit) -> unit
end

Note the type of for above. The for-function is trivial and only used as
syntactic sugar.

Here is an implementation of the above signature

structure For :> FOR =
struct
infix 1 to downto
infixr 0 &&

type 'a for = ('a -> unit) -> unit

fun a to b =
(fn f => let fun lp a = if a>b then () else (f a; lp (a+1))
in lp a end)

fun a downto b =
(fn f => let fun lp a = if a<b then () else (f a; lp (a-1))
in lp a end)

fun ga && gb = fn fab => ga (fn a => gb (fn b => fab (a, b)))

fun for g f = g f
end

To use the constructs one needs to open the For-structure

open For

and declare the infix status of the operators

infix 1 to downto
infixr 0 &&

Here is a three dimensional loop

for (1 to 3 && 9 downto 8 && 4 to 5)
(fn (i, (j, k)) =>
print ("("^Int.toString i^","^Int.toString j^","^Int.toString k^")"))

that prints

(1,9,4)(1,9,5)(1,8,4)(1,8,5)(2,9,4)(2,9,5)(2,8,4)(2,8,5)(3,9,4)(3,9,5)(3,8,4)(3,8,5)

The above loop shows why I used a right-associative operator &&.
Namely, the pattern (i, (j, k)) is arguably easier to write than
((i, j), k). (Some monadic techniques might help to avoid the ugly
nesting.)

DISCLAIMER: I just wrote the above for this post and I'm not
suggesting that this would be a particularly good way to go about
for-loops in SML. I don't use a lof of for-loops in my SML
programming.

-Vesa Karvonen

Vesa Karvonen

unread,

Jun 13, 2005, 9:36:19 AM6/13/05

to

Vesa Karvonen <vesa.k...@cs.helsinki.fi> wrote:
[...]

> Here is a three dimensional loop

> for (1 to 3 && 9 downto 8 && 4 to 5)
> (fn (i, (j, k)) =>
> print ("("^Int.toString i^","^Int.toString j^","^Int.toString k^")"))

> that prints

> (1,9,4)(1,9,5)(1,8,4)(1,8,5)(2,9,4)(2,9,5)(2,8,4)(2,8,5)(3,9,4)(3,9,5)(3,8,4)(3,8,5)

> The above loop shows why I used a right-associative operator &&.
> Namely, the pattern (i, (j, k)) is arguably easier to write than
> ((i, j), k). (Some monadic techniques might help to avoid the ugly
> nesting.)

Silly me. I finally realized that it is easy to avoid the ugly
syntactic nesting with a home brewn product datatype and an infix
constructor. Here is an updated for-loop example:

signature FOR =
sig
type 'a for

datatype ('a, 'b) product = & of 'a * 'b

val to : int * int -> int for
val downto : int * int -> int for

val && : 'a for * 'b for -> ('a, 'b) product for

val for : 'a for -> ('a -> unit) -> unit
end

structure For :> FOR =

struct
infix 1 to downto

infix 0 && &

type 'a for = ('a -> unit) -> unit

datatype ('a, 'b) product = & of 'a * 'b

fun a to b =
(fn f => let fun lp a = if a>b then () else (f a; lp (a+1))
in lp a end)

fun a downto b =
(fn f => let fun lp a = if a<b then () else (f a; lp (a-1))
in lp a end)

fun ga && gb = fn fab => ga (fn a => gb (fn b => fab (a & b)))

fun for g f = g f
end

open For

infix 1 to downto
infix 0 && &

val () =

for (1 to 3 && 9 downto 8 && 4 to 5)

(fn i & j & k =>

print ("("^Int.toString i^","^Int.toString j^","^Int.toString k^")"))

-Vesa Karvonen

Matthew Fluet

unread,

Jun 13, 2005, 9:45:28 AM6/13/05

to

Vesa Karvonen wrote:
> Jon Harrop <use...@jdh30.plus.com> wrote:
> [...]
>

>>Although having "end" may be syntactically more elegant, it
>>adds lots of LOC which detract from the individual character of each
>>function.
>
> Yes, especially if you nest let-constructs redundantly like in

I think that Jon's current code on the website does much better in this
regard than his earlier posted code. The fact that "create" function
still has this redundancy is probably an oversight.

<personal opinion>
I actually find that the "let .. in .. end" increases the readability of
SML code. I tend to format the construct as

let
val x1 = ...
...
val xn = ...
in
e
end

which I find helpful in distinguishing the declarations from the expression.

In OCaml, in the absense of syntax highlighting, I find it difficult to
navigate through

let x1 = ... in
...
let xn = ... in
e

particularly when e is a non-trivial (multi-line) expression.
</personal opinion>

The above opinion is certainly not shared by everyone, and many SML
programmers don't format "let .. in .. end" as I do. I give it only to
suggest that there are other benefits to the construct, particularly if
it is used on it's own, rather than to mimic OCaml's "let .. in e".

>>My main gripe is the unfriendliness of the compilers compared to
>>OCaml's compilers.
>
> It might help compiler developers if you would describe the problems
> in more detail.

Seconded. We have attempted to make MLton's command-line interface
straight-forward. I don't see how one can improve up on "mlton ray.sml"
to get an executable named "ray".

Jon Harrop wrote:
> Having said that, I do like the ability to define operator
> associativities and precedences

This is indeed a useful feature, but it's actually not as convenient as
it initially appears. The major limitation is that infixity directives
are associated with a syntactic scope, not with an identifier. This
means that when you modularize you program:

structure Vec3D = struct

type vec = real * real * real

infix 7 *|
infix 6 +| -|

fun s *| (x, y, z) : vec = (s*x, s*y, s*z)
fun (x1, y1, z1) +| (x2, y2, z2) : vec = (x1+x2, y1+y2, z1+z2)
fun (x1, y1, z1) -| (x2, y2, z2) : vec = (x1-x2, y1-y2, z1-z2)
end

The scope of the fixity directives is the Vec3D structure. Furthermore,
the fixity is not caught up in the environment of the structure, so
every client must reassert the fixity:

structure Client = struct
open Vec3D
infix 7 *|
infix 6 +| -|

fun foo a b c = ... a *| b +| c
end

AliceML (http://www.ps.uni-sb.de/alice/) extends SML in numerous ways,
including allowing fixity to be associated with identifiers in a module
(http://www.ps.uni-sb.de/alice/manual/modules.html#fixity).

> and I appreciate the extra performance
> (although the compile time is pretty awful!).

You get what you pay for!
Seriously, though, it depends on what your usage scenario looks like.

For example, compiling MLton from clean sources with MLton is
significantly faster than compiling MLton from clean sources with
SML/NJ. Also, depending on where in the dependency graph you make a
change, it can be significantly faster to type-check the MLton sources
with MLton than with SML/NJ.

So, I think MLton makes perfect sense for deliverables, whether binary
or source. Most of the software I download (for Linux) is in source
form, where I compile from clean sources, never intending to modify the
sources.

Now, I will readily admit that if you are in the middle of the
edit-compile-debug loop, then MLton's compile times can be daunting.
Then it makes sense to develop with another compiler.

Matthew Fluet

unread,

Jun 13, 2005, 9:52:40 AM6/13/05

to

Jon Harrop wrote:
> I've also added a C++ vs SML comparison which shows Mlton-compiled SML
> greatly outperforming C++:
>
> http://www.ffconsultancy.com/free/ray_tracer/comparison_cpp_vs_sml.html

Please note that the name of the whole-program, optimizing SML compiler
is "MLton" (capital-M, capital-L), and not "Mlton".

Vesa Karvonen

unread,

Jun 13, 2005, 11:28:35 AM6/13/05

to

Matthew Fluet <mfl...@acm.org> wrote:
[...]
> <personal opinion> [...] I tend to format the construct as

> let
> val x1 = ...
> ...
> val xn = ...
> in
> e
> end

Me too. (Because the above form is arguably the easiest to edit and
edits lead to easy-to-read diffs.)

> In OCaml, in the absense of syntax highlighting, I find it difficult to
> navigate through

> let x1 = ... in
> ...
> let xn = ... in
> e

The first ML dialect I used was Ocaml and when I started using SML
I missed the clarity of the Ocaml let-constructs. Namely,

let a = b in
let c = d in
... in
z

makes both the order of execution and the scope of bindings
particularly clear. It took me a couple of days of SML programming
to unlearn the Ocaml idiom and understand the general form of the
SML let-construct. The syntax of Ocaml let-constructs feels a bit
more "combinatory" than the SML let-construct and may perhaps (at
least in my experience) be easier to understand (one form at a
time). I particularly like the Ocaml syntax for non-recursive
function bindings

let name args = body in expr

In SML one either writes

let fun name args = body in expr end

and the reader needs to examine the body to see if the function is
recursive (it can sometimes be useful to immediately see that a
function is not recursive) or one must write

let val name = fn arg1 => fn arg2 => ... fn argN => body in expr end

in case the function definition is layered on top of a previous
binding by the name name (in other words, name appears free in
body).

The above also illustrates the lack of a convenient built-in syntax
to declare anonymous curried functions in SML.

At any rate, I don't find these syntactic trade-offs extremely
important. It would be easy to improve upon the syntax of both Ocaml
and SML (along a suitable metric). I particularly envy the
conciseness of the Haskell syntax for lambdas.

-Vesa Karvonen

Andreas Rossberg

unread,

Jun 13, 2005, 11:29:38 AM6/13/05

to

Matthew Fluet wrote:
>
> AliceML (http://www.ps.uni-sb.de/alice/) extends SML in numerous ways,
> including allowing fixity to be associated with identifiers in a module
> (http://www.ps.uni-sb.de/alice/manual/modules.html#fixity).

Early versions of SML/NJ already had this feature, but it was removed
after 0.93 (probably because it has pretty ugly implications on the
compiler internals, and moreover leads to a minor incompatibility with
the Standard).

--
Andreas Rossberg, ross...@ps.uni-sb.de

Let's get rid of those possible thingies! -- TB

Matthew Fluet

unread,

Jun 13, 2005, 11:58:46 AM6/13/05

to

Andreas Rossberg wrote:
> Matthew Fluet wrote:
>
>>
>> AliceML (http://www.ps.uni-sb.de/alice/) extends SML in numerous ways,
>> including allowing fixity to be associated with identifiers in a module
>> (http://www.ps.uni-sb.de/alice/manual/modules.html#fixity).
>
>
> Early versions of SML/NJ already had this feature, but it was removed
> after 0.93 (probably because it has pretty ugly implications on the
> compiler internals, and moreover leads to a minor incompatibility with
> the Standard).

And I would be remiss not to point out that MLton's facility for
programming in the very large, the ML Basis system
(http://mlton.org/MLBasis) allows top-level fixity directives to be
caught up in the denotation of a .mlb file, adding the directives to any
file elaborated in the denotation of the .mlb file. See
http://mlton.org/pages/MLBasis/attachments/mlb-formal.pdf for the formal
details and http://mlton.org/MLBasisExamples for an informal example.

Jon Harrop

unread,

Jun 13, 2005, 1:27:18 PM6/13/05

to

Vesa Karvonen wrote:
> To avoid having to use type annotations, you can use the same
> technique in SML as is being used in OCaml. Namely, you can define
> separate monomorphic operators for each type. This won't help with
> shootouts, but might be useful in more serious work.

Yes, good idea. If I can find the time then I'll have a more serious go at
writing some SML code.

Note that I've basically given up on the shootout - I'm going to do this
separately and, IMHO, "properly". So such things are perfectly acceptable.

[let ... end]

> Yes, especially if you nest let-constructs redundantly like in

> ...

> The semantics is the same. (And I've noted about this earlier.)

Yes indeed, thank you. I'll change the code ASAP.

> A direct function application may be more difficult to read than
> an equivalent let-construct due to the "backwards" order of the
> subexpressions. I sometimes use a "pipe" operator
>
> infix >| fun x >| f = f x
>
> to make nested unary function applications easier to read (though
> opinions might vary). In general,
>
> (fN ... (f2 (f1 x)) ...)
>
> can be rewritten to
>
> f1 x >| f2 >| ... >| fN
>
> which can sometimes be more natural.

I like that. Mathematica has "//" which does the same thing. I've tried
similar things in OCaml but never settled on any of them for production
code.

>> My main gripe is the unfriendliness of the compilers compared to
>> OCaml's compilers.
>
> It might help compiler developers if you would describe the problems
> in more detail.

Yes, I shall try to make a compendium of example programs and errors.

In the mean time I can list some points about SML itself (not
Mlton-specific):

1. Not having capitalised constructors worries me, although I've never had a
problem (probably because I capitalise them anyway). Could a compiler flag
be added to warn of non-capitalised constructors? That would essentially
solve that problem.

2. I prefer explicit "rec", not implicit as in SML's "fun". I think this
only really serves to heighten the "activation barrier" for newbies.

3. No standard way to compile a Standard ML program - MLton- and
SML-compiled programs must be different. I was also surprised to find no
mach_eps in SML/NJ (perhaps more surprisingly, pre 1.4 Java also lacks
this).

4. I currently prefer separate "+" and "+." (but I like OCaml's overloaded
printf), although this might change as I grow more accustomed to SML.
However, I have been bitten by different inferred types from SML/NJ and
Mlton due to overloaded "+".

5. I miss "for" loops. Although these can be implemented in the style of
SML/NJ, they seem to result in awful indentation. (Update: I'll read the
latest post on this)

6. I think a single precision float type for storage in general data
structures is a good idea (this makes the x86 C++ version of my ray tracer
twice as fast with only trivial changes). There may already be one in SML
that I've missed. OCaml only has big arrays.

7. After all the time and effort spent making SML itself very robust,
comparatively little effort seems to have been placed on making it
interface safely with non-SML libraries. I can see why, historically, but
it is a significant thorn in the idea of teaching SML as a general purpose
programming language on an undergrad course.

As well as the problems listed in Appel's critique of SML.

Of course, SML is still one of the best languages that I've come across and
I'm hugely impressed with its definition, implementations and practical
application. So my overall feeling for SML is extremely positive, which
might not come across in these criticisms.

I'll have a go at translating the examples from my OCaml book when I find
the time.

Jon Harrop

unread,

Jun 13, 2005, 1:40:26 PM6/13/05

to

Matthew Fluet wrote:
> I think that Jon's current code on the website does much better in this
> regard than his earlier posted code. The fact that "create" function
> still has this redundancy is probably an oversight.

Thank you. I am trying. :-)

> <personal opinion>
> ...

> In OCaml, in the absense of syntax highlighting, I find it difficult to
> navigate through
>
> let x1 = ... in
> ...
> let xn = ... in
> e
>
> particularly when e is a non-trivial (multi-line) expression.
> </personal opinion>

I am obviously very biased here, so I shan't bother offering my contrary
opinion. ;-)

However, I think colour syntax highlighting should be considered a
requirement. Indeed, I think a GUI IDE should be a requirement. For
example, if you want to overload "+" then I think the editor should colour
the symbol according to the inferred type.

>> It might help compiler developers if you would describe the problems
>> in more detail.
>
> Seconded. We have attempted to make MLton's command-line interface
> straight-forward. I don't see how one can improve up on "mlton ray.sml"
> to get an executable named "ray".

I shall endeavour to describe my gripe in more detail at some point in the
future. For now, I certainly wasn't referring to the "mlton ray.sml" which
is a flawless way to invoke a compiler!

> Jon Harrop wrote:
> > Having said that, I do like the ability to define operator
> > associativities and precedences
>
> This is indeed a useful feature, but it's actually not as convenient as
> it initially appears. The major limitation is that infixity directives
> are associated with a syntactic scope, not with an identifier. This
> means that when you modularize you program:

> ...

Yes, I've read about this. It seems like a very severe shortcoming to me. I
would argue that, although this "spoils modularity", it could addressed in
the future by programming in a more friendly IDE which provides a trivial
way to view related definitions. Indeed, that would completely change the
notion of "local" code. I've always thought that it is silly to code
vertically and computer scientists always jabber on about the asymptotic
benefits of trees... ;-)

> ...

> You get what you pay for!

LOL. :-)

> ...

> Now, I will readily admit that if you are in the middle of the
> edit-compile-debug loop, then MLton's compile times can be daunting.
> Then it makes sense to develop with another compiler.

I would be more than happy to do that if the SML/NJ and MLton versions of my
ray tracer were identical. Of course, they are not and I am using a bash
script to generate them from the same core code. That is for a <100 LOC
program. I assume it only gets worse for bigger programs.

Vesa Karvonen

unread,

Jun 13, 2005, 9:39:29 PM6/13/05

to

Jon Harrop <use...@jdh30.plus.com> wrote:
[...]

> I've also added a C++ vs SML comparison which shows Mlton-compiled SML
> greatly outperforming C++:

> http://www.ffconsultancy.com/free/ray_tracer/comparison_cpp_vs_sml.html
[...]

Noting that the SML code snippets are not syntax highlighted, I
suggest taking a look here

http://mlton.org/Enscript

for scripts for syntax highlighting SML code.

> I must say that I find SML significantly harder to work with than

> OCaml though. [...]

And on the C++ vs SML page it is said that:

"SML is related to the OCaml language which, whilst not as fast,
is significantly more concise and easier to learn."

As I've expressed earlier, I'm not sure how significant the
difference really is. Looking at both the Ocaml and SML versions, it
seems that the extra lines of the SML version are essentially due to
the end-keyword (I'm counting 11 lines). If those lines are ignored,
the difference is just 6 lines (68 - 62) or less than 10%. A couple
of lines could easily be trimmed from the SML version.

Extracting a value from an option, as in

(case Int.fromString s of
SOME n => n | _ => 6)

is so common that the SML Basis Library contains the function getOpt
to do it, as in

getOpt (Int.fromString s, 6)

This would eliminate one line from the SML version.

The SML code contains

val n = 512
val ss = 4
...
val light = unitise (~1.0, ~3.0, 2.0)

while the Ocaml code contains

let n = 512 and ss = 4 and light = unitise {x= -1.; y= -3.; z=2.} in

You could just as well write

val n = 512 val ss = 4 val light = unitise (~1.0, ~3.0, 2.0)

in SML. This would eliminate 2 lines from the SML version.

One could also argue that it doesn't really help to define

val infinity = Real.posInf

and one could just refer to Real.posInf directly.

The main differences between the SML and Ocaml versions are in the
main program. After parsing the arguments, the SML version uses a
loop-function, while the Ocaml version uses the built-in
for-construct. With a simple reusable combinator library for looping
(as shown in one my earlier post on this thread), one could reduce
the number of lines required for the nested loops in both languages.

It seems to me that to make language shootouts more accurate, it
might make sense to allow different shootout entries for a language
to share libraries as long as each shared library provides a
generally useful facility (somewhat subjective) and is used in at
least two separate entries. This would reduce the advantage of
languages that have large "standard" libraries and/or lots of
syntactic sugar. I think that it might also lead to more natural
looking programs.

-Vesa Karvonen

Daniel C. Wang

unread,

Jun 13, 2005, 10:54:52 PM6/13/05

to Jon Harrop

Jon Harrop wrote:
{stuff deleted}

>>>My main gripe is the unfriendliness of the compilers compared to
>>>OCaml's compilers.
>>
>>It might help compiler developers if you would describe the problems
>>in more detail.
>
>
> Yes, I shall try to make a compendium of example programs and errors.
>
> In the mean time I can list some points about SML itself (not
> Mlton-specific):
>
> 1. Not having capitalised constructors worries me, although I've never had a
> problem (probably because I capitalise them anyway). Could a compiler flag
> be added to warn of non-capitalised constructors? That would essentially
> solve that problem.

I think this is a fair and common gripe. However, the SML match compiler
is good at spotting errors when you confusing constructors. I find it
unfortunate that the OCaml pattern match semantics aren't as strong as
SML. In particular if I'm not mistaken it's impossible for Ocaml's match
compiler to catch redundant or non-exhaustive matches.

> 2. I prefer explicit "rec", not implicit as in SML's "fun". I think this
> only really serves to heighten the "activation barrier" for newbies.

SML does have "val rec"

val rec fib = (fn 0 => 0 | 1 => 0 | n => fib(n-1) + fib(n-2))

not the most pretty I agree.

> 3. No standard way to compile a Standard ML program - MLton- and
> SML-compiled programs must be different. I was also surprised to find no
> mach_eps in SML/NJ (perhaps more surprisingly, pre 1.4 Java also lacks
> this).

Well this is not exactly fair, since there's only one OCaml system
around. :)

> 4. I currently prefer separate "+" and "+." (but I like OCaml's overloaded
> printf), although this might change as I grow more accustomed to SML.
> However, I have been bitten by different inferred types from SML/NJ and
> Mlton due to overloaded "+".

You can define your own set of +. constants and define the appropriate
infix operators if you like that style.

> 5. I miss "for" loops. Although these can be implemented in the style of
> SML/NJ, they seem to result in awful indentation. (Update: I'll read the
> latest post on this)

> 6. I think a single precision float type for storage in general data
> structures is a good idea (this makes the x86 C++ version of my ray tracer
> twice as fast with only trivial changes). There may already be one in SML
> that I've missed. OCaml only has big arrays.

MLton supports single precision float types. Single precision float
types are an "optional" part of the Standard Basis. (I have a gripe that
the Standard Basis has too many optional parts...)

> 7. After all the time and effort spent making SML itself very robust,
> comparatively little effort seems to have been placed on making it
> interface safely with non-SML libraries. I can see why, historically, but
> it is a significant thorn in the idea of teaching SML as a general purpose
> programming language on an undergrad course.

There's been quite a bit of work in this area. MLTon's FFI is pretty
straightforward. The new FFI of SML/NJ is pretty sophisticated. Other
can chime in... what I know about the OCaml FFI makes me feel like it's
not soo much better.

> As well as the problems listed in Appel's critique of SML.
>
> Of course, SML is still one of the best languages that I've come across and
> I'm hugely impressed with its definition, implementations and practical
> application. So my overall feeling for SML is extremely positive, which
> might not come across in these criticisms.

I've finally found my self in a position that makes me have to program
in C++. I dearly miss SML! SML has it's warts.. but most of the warts
are erring on the side of clean simple semantics. Some times the clean
and simple thing is a bit verbose and at times clunky. C++ seems to err
on the side of making things very concise, but god knows what going on!

Jon Harrop

unread,

Jun 13, 2005, 11:17:54 PM6/13/05

to

Daniel C. Wang wrote:
> Jon Harrop wrote:
> {stuff deleted}

>> 1. Not having capitalised constructors worries me, although I've never
>> had a problem (probably because I capitalise them anyway). Could a
>> compiler flag be added to warn of non-capitalised constructors? That
>> would essentially solve that problem.
>
> I think this is a fair and common gripe. However, the SML match compiler
> is good at spotting errors when you confusing constructors. I find it
> unfortunate that the OCaml pattern match semantics aren't as strong as
> SML. In particular if I'm not mistaken it's impossible for Ocaml's match
> compiler to catch redundant or non-exhaustive matches.

You mean non-exhaustive matches like this:

# function Some _ -> 1;;
Warning: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
None
- : 'a option -> int = <fun>
#

and redundant matches like this:

# function _ -> 1 | _ -> 2;;
Warning: this match case is unused.
- : 'a -> int = <fun>
#

>> 2. I prefer explicit "rec", not implicit as in SML's "fun". I think this
>> only really serves to heighten the "activation barrier" for newbies.
>
> SML does have "val rec"
>
> val rec fib = (fn 0 => 0 | 1 => 0 | n => fib(n-1) + fib(n-2))
>
> not the most pretty I agree.

I think the implicitly rec "fun" is more confusing, for a newbie.

>> 3. No standard way to compile a Standard ML program - MLton- and
>> SML-compiled programs must be different. I was also surprised to find no
>> mach_eps in SML/NJ (perhaps more surprisingly, pre 1.4 Java also lacks
>> this).
>
> Well this is not exactly fair, since there's only one OCaml system
> around. :)

Well, I compile to bytecode for fast tests and native code for production
output. Mlton produces very fast code but is an order of magnitude slower
to compile to native code than ocamlopt so I want to have a fast compiler.
SML/NJ would fit the bill, only it requires programs to be executed in a
different way.

>> 6. I think a single precision float type for storage in general data
>> structures is a good idea (this makes the x86 C++ version of my ray
>> tracer twice as fast with only trivial changes). There may already be one
>> in SML that I've missed. OCaml only has big arrays.
>
> MLton supports single precision float types. Single precision float
> types are an "optional" part of the Standard Basis. (I have a gripe that
> the Standard Basis has too many optional parts...)

I see. How do I create a single precision float? Is it Real<32>.real or
something?

>> 7. After all the time and effort spent making SML itself very robust,
>> comparatively little effort seems to have been placed on making it
>> interface safely with non-SML libraries. I can see why, historically, but
>> it is a significant thorn in the idea of teaching SML as a general
>> purpose programming language on an undergrad course.
>
> There's been quite a bit of work in this area. MLTon's FFI is pretty
> straightforward. The new FFI of SML/NJ is pretty sophisticated. Other
> can chime in... what I know about the OCaml FFI makes me feel like it's
> not soo much better.

Oh no, OCaml's FFI is not any better. That was a generic ML gripe. :-)

Seriously though, and I know I keep saying this, I think great OpenGL
bindings would be incredibly beneficial.

>> Of course, SML is still one of the best languages that I've come across
>> and I'm hugely impressed with its definition, implementations and
>> practical application. So my overall feeling for SML is extremely
>> positive, which might not come across in these criticisms.
>
> I've finally found my self in a position that makes me have to program
> in C++. I dearly miss SML! SML has it's warts.. but most of the warts
> are erring on the side of clean simple semantics. Some times the clean
> and simple thing is a bit verbose and at times clunky. C++ seems to err
> on the side of making things very concise, but god knows what going on!

Absolutely. My tip is to write ML code to generate C++. ;-)

Jon Harrop

unread,

Jun 13, 2005, 11:18:39 PM6/13/05

to

Vesa Karvonen wrote:
> Noting that the SML code snippets are not syntax highlighted, I
> suggest taking a look here
>
> http://mlton.org/Enscript
>
> for scripts for syntax highlighting SML code.

Will do, thanks.

> "SML is related to the OCaml language which, whilst not as fast,
> is significantly more concise and easier to learn."
>
> As I've expressed earlier, I'm not sure how significant the
> difference really is. Looking at both the Ocaml and SML versions, it
> seems that the extra lines of the SML version are essentially due to
> the end-keyword (I'm counting 11 lines).

Yes, the "end" keyword is the main cause of the verbosity. In previous
versions, working around the lack of a for loop was taking up space but my
latest incarnation is actually shorter (albeit significantly obfuscated).

Following other people's posts, it seems that SML programmers typically use
a much more verbose style than OCaml programmers. Consequently, I believe
my current implementations use an unusually concise SML style.

> Extracting a value from an option, as in
>
> (case Int.fromString s of
> SOME n => n | _ => 6)
>
> is so common that the SML Basis Library contains the function getOpt
> to do it, as in
>
> getOpt (Int.fromString s, 6)
>
> This would eliminate one line from the SML version.

Great.

> The SML code contains
>
> val n = 512
> val ss = 4
> ...
> val light = unitise (~1.0, ~3.0, 2.0)
>
> while the Ocaml code contains
>
> let n = 512 and ss = 4 and light = unitise {x= -1.; y= -3.; z=2.} in
>
> You could just as well write
>
> val n = 512 val ss = 4 val light = unitise (~1.0, ~3.0, 2.0)
>
> in SML. This would eliminate 2 lines from the SML version.

I didn't realise you should do "val ... val ...". I tried "and", as in
OCaml, but it didn't work. Is this typical coding style in SML though? I
get the impression it isn't. I'd say it is in OCaml.

Incidentally, is there something wrong with my use of "infix", as the
default emacs mode indents all subsequent lines enormously?

> One could also argue that it doesn't really help to define
>
> val infinity = Real.posInf
>
> and one could just refer to Real.posInf directly.

Yes, I only did that to make the different languages easier to compare.

> The main differences between the SML and Ocaml versions are in the
> main program. After parsing the arguments, the SML version uses a
> loop-function, while the Ocaml version uses the built-in
> for-construct. With a simple reusable combinator library for looping
> (as shown in one my earlier post on this thread), one could reduce
> the number of lines required for the nested loops in both languages.

Are you suggesting that the implementation of a "for" loop should not be
counted? I'd expect the justification for that to be "for loops are common"
in which case their omission from a language is likely to be of interest to
people.

Note that this is effectively taken into account by the language comparison
if you look at the increase in line count between versions.

> It seems to me that to make language shootouts more accurate, it
> might make sense to allow different shootout entries for a language
> to share libraries as long as each shared library provides a
> generally useful facility (somewhat subjective) and is used in at
> least two separate entries. This would reduce the advantage of
> languages that have large "standard" libraries and/or lots of
> syntactic sugar. I think that it might also lead to more natural
> looking programs.

Yes, perhaps. I think that will require too much work for me to do it on
these pages though.

Matthias Blume

unread,

Jun 14, 2005, 12:42:19 AM6/14/05

to

"Daniel C. Wang" <danw...@gmail.com> writes:

>> 3. No standard way to compile a Standard ML program - MLton- and
>> SML-compiled programs must be different. I was also surprised to find no
>> mach_eps in SML/NJ (perhaps more surprisingly, pre 1.4 Java also lacks
>> this).
>
> Well this is not exactly fair, since there's only one OCaml system
> around. :)

The Standard ML Basis library has Real.nextAfter, which is more
general than mach_eps. SML/NJ, unfortunately, has not yet implemented
this part of the spec. At best, this is a "quality of implementation"
issue. It has nothing to do with language comparisons.
(MLton does implement nextAfter.)

>> 6. I think a single precision float type for storage in general data
>> structures is a good idea (this makes the x86 C++ version of my ray tracer
>> twice as fast with only trivial changes). There may already be one in SML
>> that I've missed.

Again, this has nothing to do with SML (the language). There is
nothing (other than lack of developer time) that would prevent an
implementation from providing 32-bit floats. In fact, there are plans
to add these to SML/NJ.

Matthias

Matthias Blume

unread,

Jun 14, 2005, 12:56:54 AM6/14/05

to

Jon Harrop <use...@jdh30.plus.com> writes:

> I think the implicitly rec "fun" is more confusing, for a newbie.

Why? I have never ever had a problem with this. In fact, I'd expect
that most "newbie"s would have a problem with a non-recursive
version. After all, in nearly every other language out there, a
function definition is by default recursive (and often not just by
default but simply always).

>>> 3. No standard way to compile a Standard ML program - MLton- and
>>> SML-compiled programs must be different. I was also surprised to find no
>>> mach_eps in SML/NJ (perhaps more surprisingly, pre 1.4 Java also lacks
>>> this).
>>
>> Well this is not exactly fair, since there's only one OCaml system
>> around. :)
>
> Well, I compile to bytecode for fast tests and native code for production
> output. Mlton produces very fast code but is an order of magnitude slower
> to compile to native code than ocamlopt so I want to have a fast compiler.
> SML/NJ would fit the bill, only it requires programs to be executed in a
> different way.

I think it is trivial to take any SML/NJ standalone program and turn
it into an MLton standalone with by just adding the same boilerplate
(and changing maybe one name -- the name of the "main" function).

Is this really such a big deal?

>>> 6. I think a single precision float type for storage in general data
>>> structures is a good idea (this makes the x86 C++ version of my ray
>>> tracer twice as fast with only trivial changes). There may already be one
>>> in SML that I've missed. OCaml only has big arrays.
>>
>> MLton supports single precision float types. Single precision float
>> types are an "optional" part of the Standard Basis. (I have a gripe that
>> the Standard Basis has too many optional parts...)
>
> I see. How do I create a single precision float? Is it Real<32>.real or
> something?

As a literal? You just write it down (and, if necessary, add a type
constraint somewhere to Real32.real). If you want to convert from
int, use Real32.fromInt. If you want to convert from 64-bit real, use
Real32.fromLarge <roundingmode> o Real64.toLarge where <roundingmode>
is of type IEEEReal.rounding_mode (i.e., one of TO_NEAREST, TO_NEGINF,
TO_POSINF, TO_ZERO).

All this and more here:

http://www.standardml.org/Basis
http://www.standardml.org/Basis/real.html

or, even better, in the Gansner/Reppy book (ISBN 0521794781).

Matthias

Matthias Blume

unread,

Jun 14, 2005, 1:09:25 AM6/14/05

to

Jon Harrop <use...@jdh30.plus.com> writes:

> Yes, the "end" keyword is the main cause of the verbosity.

I don't quite understand why you are so hung up on line counts. Of
course, I am biased, but I really like the "end" keyword. Last
quarter I have written a compiler for a dialect of ML (and made my
students write such a compiler, too) which uses an OCaml-inspired
style. The bottom line: I hated it because it lacks a clearly visible
scope delimiter.

> In previous versions, working around the lack of a for loop was
> taking up space but my latest incarnation is actually shorter
> (albeit significantly obfuscated).

I personally never seem to need "for" -- except when I transliterate
your code into SML. :-) But that has probably more to do with the fact
that I normally don't write code that does heavy array processing.

>> The SML code contains
>>
>> val n = 512
>> val ss = 4
>> ...
>> val light = unitise (~1.0, ~3.0, 2.0)
>>
>> while the Ocaml code contains
>>
>> let n = 512 and ss = 4 and light = unitise {x= -1.; y= -3.; z=2.} in
>>
>> You could just as well write
>>
>> val n = 512 val ss = 4 val light = unitise (~1.0, ~3.0, 2.0)
>>
>> in SML. This would eliminate 2 lines from the SML version.
>
> I didn't realise you should do "val ... val ...". I tried "and", as in
> OCaml, but it didn't work.

Why did "and" not work? It should (but it means something slightly
different).

> Is this typical coding style in SML though? I
> get the impression it isn't. I'd say it is in OCaml.

The "val ... val ..." thing is definitely typical SML style -- except
one normally would put each "val ..." on its own line (unless one is
squeezing to beat a loc limit :-) ).

> Are you suggesting that the implementation of a "for" loop should not be
> counted? I'd expect the justification for that to be "for loops are common"
> in which case their omission from a language is likely to be of interest to
> people.

But "for" loops aren't common. :-) (see above)

> Note that this is effectively taken into account by the language comparison
> if you look at the increase in line count between versions.
>
>> It seems to me that to make language shootouts more accurate, it
>> might make sense to allow different shootout entries for a language
>> to share libraries as long as each shared library provides a
>> generally useful facility (somewhat subjective) and is used in at
>> least two separate entries. This would reduce the advantage of
>> languages that have large "standard" libraries and/or lots of
>> syntactic sugar. I think that it might also lead to more natural
>> looking programs.

I think that some shootout programs have already done that. I have
seen the moral equivalent of

open ShootoutLib

in certain programs that I reviewed at the shootout site.

Matthias

William Lovas

unread,

Jun 14, 2005, 4:07:48 AM6/14/05

to

On 2005-06-14, Jon Harrop <use...@jdh30.plus.com> wrote:

> Daniel C. Wang wrote:
>> I think this is a fair and common gripe. However, the SML match compiler
>> is good at spotting errors when you confusing constructors. I find it
>> unfortunate that the OCaml pattern match semantics aren't as strong as
>> SML. In particular if I'm not mistaken it's impossible for Ocaml's match
>> compiler to catch redundant or non-exhaustive matches.
>
> You mean non-exhaustive matches like this:

> [...]

Yeah -- where did this myth come from? I'm pretty sure this isn't the
first time i've heard it stated that O'Caml can't catch redundant or
non-exhaustive patterns, but it's done both for as long as i've used the
language (~5 years). Were there earlier versions that famously couldn't?

In fact, while we're on the subject, notice how O'Caml even gives you an
example of a pattern you've missed! Oh, how i've wished SML/NJ would do
that, instead of tell me all the ones i remembered *nudge*Matthias*nudge* ;)

William

Daniel C. Wang

unread,

Jun 14, 2005, 5:29:53 AM6/14/05

to

Now, I'm curious about this too.. because I do remember once stumbling
upon this wart in Ocaml in the past...perhaps I'm confusing Caml with
OCaml? Any Ocaml lurkers want to enlighten us. From what I remember
there was some interaction with pattern guards that made the original
match compiler not so useful in reporting non-exhaustive matches, was
this fixed?

Anyway, I'm glad it was improved. :)

Remi Vanicat

unread,

Jun 14, 2005, 5:57:34 AM6/14/05

to

"Daniel C. Wang" <danw...@gmail.com> writes:

> Now, I'm curious about this too.. because I do remember once stumbling
> upon this wart in Ocaml in the past...perhaps I'm confusing Caml with
> OCaml? Any Ocaml lurkers want to enlighten us. From what I remember
> there was some interaction with pattern guards that made the original
> match compiler not so useful in reporting non-exhaustive matches, was
> this fixed?

Well, when there is guard, as one can do any computation into guard,
ocaml can't determine everything, for example :

match (2,4) with
| (a,b) when a < b -> print_endline "a < b"
| (a,b) when a > b -> print_endline "a > b"

Here the only warning is:
Warning: Bad style, all clauses in this pattern-matching are guarded.
Even if there is a case where there will be a match failure.

another example is :

# match Some 4 with
| Some x when x > 3 -> 3
| Some x when x <= 3 -> x
| None -> 0;;

Warning: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:

Some _
(However, some guarded clause may match this value.)

--
Rémi Vanicat

Vesa Karvonen

unread,

Jun 14, 2005, 9:13:03 AM6/14/05

to

Jon Harrop <use...@jdh30.plus.com> wrote:
[...]

> Yes, the "end" keyword is the main cause of the verbosity. [...]

> Following other people's posts, it seems that SML programmers typically use
> a much more verbose style than OCaml programmers. Consequently, I believe
> my current implementations use an unusually concise SML style.

Merriam-Webster describes "verbose" as "containing more words than
necessary":

http://www.m-w.com/cgi-bin/dictionary?book=Dictionary&va=verbose

The way I see it, SML is not significantly more verbose than OCaml, but
SML code tends to be formatted over more lines.

Consider this paragraph, for instance. Do you
think that this paragraph is more verbose
than the other paragraph that contains the
same words?

Consider this paragraph,
for instance. Do you think
that this paragraph is
more verbose than the
other paragraph that
contains the same words?

In other words, I don't think that cramming the same code in fewer lines
makes it more concise (M-W describes "concise" as "free from all
elaboration and superfluous detail").

> [...] In previous versions, working around the lack of a for loop was

> taking up space but my latest incarnation is actually shorter (albeit

> significantly obfuscated). [...]

Yes, I think that the loop-function is somewhat obfuscated. In a previous
version there was a for-function that I understood immediately. Looking at
the new version I immediately understood that the loop-function must be
performing a 2D iteration, but skipped deciphering it on the first
reading (well, it wasn't that difficult to understand on the second
reading).

> Are you suggesting that the implementation of a "for" loop should not be
> counted?

Partially, yes. I think that it should be counted separately as part of a
separate library.

> I'd expect the justification for that to be "for loops are common" in
> which case their omission from a language is likely to be of interest to
> people.

It is interesting. (I would personally prefer a language that has powerful
abstraction facilities to a language that has a lot of built-in syntactic
sugar.)

What I mean is that productive programmers build domain specific libraries
as they build applications. If one would be working on algorithms that are
naturally expressed using for-loops in SML, a productive programmer would
definitely write a small combinator library for for-loops. In this
particular case, the library might take 10-100 lines of code, which is
insignificant in real-world programming, but can be prohibitive in a
language shootout.

Without the use of libraries a language shootout probably (unfairly, IMO)
favours languages that come with a huge library or lots of syntactic sugar
rather than languages that have powerful abstraction facilities. Without
the use of libraries a language shootout probably also favours languages
that are not standardized (as that tends to reduce the size of "standard"
libraries (with the only exception, that I'm aware of, being Common Lisp))
compared to "evolving" (to put it nicely) languages. I recall seeing many
"language comparisons" that really were library comparisons.

> I didn't realise you should do "val ... val ...". I tried "and", as in
> OCaml, but it didn't work. Is this typical coding style in SML though?

> [...]

Probably not. The point I was trying to make is that the difference in the
number of lines does not tell the whole story. Aside from the for-loops
the SML and OCaml versions are practically identical. With a few small
domain specific utility libraries that a productive programmer would
write (or use an existing library), both version could be made more
concise.

> Incidentally, is there something wrong with my use of "infix", as the
> default emacs mode indents all subsequent lines enormously?

It does not seem very common in SML to pack multiple declarations on one
line. However, the indentation algorithm of SML mode does not seem
extremely clever or natural. I have several gripes with it (e.g.
indentation of comments, indentation (or lack of indentation) of
signatures and structures, indentation of local-in-end, indentation of
curried function calls that take more than one line). The syntax
highlighting of SML mode is also not particularly pretty, IMO. (Maybe I
should do something about it rather than complain.)

-Vesa Karvonen

Jon Harrop

unread,

Jun 14, 2005, 9:56:21 AM6/14/05

to

Matthias Blume wrote:
> The Standard ML Basis library has Real.nextAfter, which is more
> general than mach_eps. SML/NJ, unfortunately, has not yet implemented
> this part of the spec. At best, this is a "quality of implementation"
> issue. It has nothing to do with language comparisons.
> (MLton does implement nextAfter.)

I think it is a triumph of hope over reality to think that quality of
language implementations has nothing to do with language comparisons.

> Again, this has nothing to do with SML (the language). There is
> nothing (other than lack of developer time) that would prevent an
> implementation from providing 32-bit floats. In fact, there are plans
> to add these to SML/NJ.

From my point of view, I would use 32-bit floats to improve performance.
Therefore, if the slow SML/NJ compiler can use them but the fast MLton
compiler can't, what would be the point?

Jon Harrop

unread,

Jun 14, 2005, 10:02:00 AM6/14/05

to

Matthias Blume wrote:
> Jon Harrop <use...@jdh30.plus.com> writes:
>> I think the implicitly rec "fun" is more confusing, for a newbie.
> Why?

Because the unnecessary special-casing "val isn't rec but fun is rec" is
harder to remember.

> I think it is trivial to take any SML/NJ standalone program and turn
> it into an MLton standalone with by just adding the same boilerplate
> (and changing maybe one name -- the name of the "main" function).
>
> Is this really such a big deal?

If my 80 line program has to be split across three files and combined by a
script before feeding it to one of the compilers, yes, I think this is a
big deal. For one thing, the line numbers reported by the compilers are
then wrong. That will surely put people who want to do serious work off
using SML.

Can you do conditional compilation in SML? If so, at least the code for the
ray tracer could reside in a single file.

Jon Harrop

unread,

Jun 14, 2005, 10:13:26 AM6/14/05

to

Matthias Blume wrote:
> Jon Harrop <use...@jdh30.plus.com> writes:
>> Yes, the "end" keyword is the main cause of the verbosity.
>
> I don't quite understand why you are so hung up on line counts.

Firstly, note that my comment has nothing to do with line counts. I'm using
LOC because I believe it is the best measure of brevity.

> Of
> course, I am biased, but I really like the "end" keyword. Last
> quarter I have written a compiler for a dialect of ML (and made my
> students write such a compiler, too) which uses an OCaml-inspired
> style. The bottom line: I hated it because it lacks a clearly visible
> scope delimiter.

With indenting?

> I personally never seem to need "for" -- except when I transliterate
> your code into SML. :-) But that has probably more to do with the fact
> that I normally don't write code that does heavy array processing.

Note that my ray tracer makes no use of arrays.

>> I didn't realise you should do "val ... val ...". I tried "and", as in
>> OCaml, but it didn't work.
>
> Why did "and" not work? It should (but it means something slightly
> different).

The indentor does this:

val a=1 and b=2
val c=3

It does something similar with infix.

>> Is this typical coding style in SML though? I
>> get the impression it isn't. I'd say it is in OCaml.
>
> The "val ... val ..." thing is definitely typical SML style -- except
> one normally would put each "val ..." on its own line (unless one is
> squeezing to beat a loc limit :-) ).

Right. In contrast, it is not uncommon to write "let a=1 and b=3 in" in
OCaml.

> But "for" loops aren't common. :-) (see above)

In the OCaml stdlib, for example, "for" loops are used 46 times in total, in
array, internal, digest, filename, format, hashtbl, lexing, random, scanf,
sort, string and weak.

Matthias Blume

unread,

Jun 14, 2005, 10:23:30 AM6/14/05

to

William Lovas <wlo...@force.stwing.upenn.edu> writes:

We are aware of this, and it is on "the list".

Matthias

Matthias Blume

unread,

Jun 14, 2005, 10:22:41 AM6/14/05

to

Jon Harrop <use...@jdh30.plus.com> writes:

> Matthias Blume wrote:
>> Jon Harrop <use...@jdh30.plus.com> writes:
>>> I think the implicitly rec "fun" is more confusing, for a newbie.
>> Why?
>
> Because the unnecessary special-casing "val isn't rec but fun is rec" is
> harder to remember.
>
>> I think it is trivial to take any SML/NJ standalone program and turn
>> it into an MLton standalone with by just adding the same boilerplate
>> (and changing maybe one name -- the name of the "main" function).
>>
>> Is this really such a big deal?
>
> If my 80 line program has to be split across three files and combined by a
> script before feeding it to one of the compilers, yes, I think this is a
> big deal. For one thing, the line numbers reported by the compilers are
> then wrong. That will surely put people who want to do serious work off
> using SML.

What are you talking about? Have you ever had a look at CM?

> Can you do conditional compilation in SML?

What do you need conditional compilation for? Can you explain?

> If so, at least the code for the ray tracer could reside in a single
> file.

I really don't understand what the problem is that you are trying to
solve.

Matthias

Matthias Blume

unread,

Jun 14, 2005, 10:25:25 AM6/14/05

to

Jon Harrop <use...@jdh30.plus.com> writes:

Come on! SML/NJ isn't *that* slow!

Jon Harrop

unread,

Jun 14, 2005, 10:27:51 AM6/14/05

to

Vesa Karvonen wrote:
> Jon Harrop <use...@jdh30.plus.com> wrote:
> [...]
>> Yes, the "end" keyword is the main cause of the verbosity. [...]
>> Following other people's posts, it seems that SML programmers typically
>> use a much more verbose style than OCaml programmers. Consequently, I
>> believe my current implementations use an unusually concise SML style.

> ...

> The way I see it, SML is not significantly more verbose than OCaml, but
> SML code tends to be formatted over more lines.

So, although the SML requires more lines, more words and more bytes, you
don't consider it to be more verbose than the OCaml?

> Consider this paragraph, for instance. Do you
> think that this paragraph is more verbose
> than the other paragraph that contains the
> same words?
>
> Consider this paragraph,
> for instance. Do you think
> that this paragraph is
> more verbose than the
> other paragraph that
> contains the same words?
>
> In other words, I don't think that cramming the same code in fewer lines
> makes it more concise (M-W describes "concise" as "free from all
> elaboration and superfluous detail").

Now back to the code, which of these is more verbose:

""

" end
end
end
end
end
end
end
end
end
end"

>> Are you suggesting that the implementation of a "for" loop should not be
>> counted?
>
> Partially, yes. I think that it should be counted separately as part of a
> separate library.

But not in the standard library?

> ...

> Without the use of libraries a language shootout probably (unfairly, IMO)
> favours languages that come with a huge library or lots of syntactic sugar
> rather than languages that have powerful abstraction facilities.

OCaml has a tiny standard library compared to Java, yet it thrashes Java in
my tests. I'd have said that my ray tracer makes virtually no use of
libraries.

>> Incidentally, is there something wrong with my use of "infix", as the
>> default emacs mode indents all subsequent lines enormously?
>
> It does not seem very common in SML to pack multiple declarations on one
> line. However, the indentation algorithm of SML mode does not seem
> extremely clever or natural. I have several gripes with it (e.g.
> indentation of comments, indentation (or lack of indentation) of
> signatures and structures, indentation of local-in-end, indentation of
> curried function calls that take more than one line). The syntax
> highlighting of SML mode is also not particularly pretty, IMO. (Maybe I
> should do something about it rather than complain.)

Right, lack of a reliable indenter is also something people will want to
know if they're considering coding in SML.

Matthias Blume

unread,

Jun 14, 2005, 10:29:41 AM6/14/05

to

Jon Harrop <use...@jdh30.plus.com> writes:

> Matthias Blume wrote:
>> Jon Harrop <use...@jdh30.plus.com> writes:
>>> Yes, the "end" keyword is the main cause of the verbosity.
>>
>> I don't quite understand why you are so hung up on line counts.
>
> Firstly, note that my comment has nothing to do with line counts. I'm using
> LOC because I believe it is the best measure of brevity.

Then you should not ever mention the "end" keyword.

>
>> Of
>> course, I am biased, but I really like the "end" keyword. Last
>> quarter I have written a compiler for a dialect of ML (and made my
>> students write such a compiler, too) which uses an OCaml-inspired
>> style. The bottom line: I hated it because it lacks a clearly visible
>> scope delimiter.
>
> With indenting?

Yes. Indentation can easily be gotten wrong. I also didn't have an
emacs-mode for that language (since it was brand-new) to help me out.

>> I personally never seem to need "for" -- except when I transliterate
>> your code into SML. :-) But that has probably more to do with the fact
>> that I normally don't write code that does heavy array processing.
>
> Note that my ray tracer makes no use of arrays.
>
>>> I didn't realise you should do "val ... val ...". I tried "and", as in
>>> OCaml, but it didn't work.
>>
>> Why did "and" not work? It should (but it means something slightly
>> different).
>
> The indentor does this:
>
> val a=1 and b=2
> val c=3
>
> It does something similar with infix.

Well, that's an emacs problem, not a problem with the language or the
compiler. So what exactly "didn't work"?

>>> Is this typical coding style in SML though? I
>>> get the impression it isn't. I'd say it is in OCaml.
>>
>> The "val ... val ..." thing is definitely typical SML style -- except
>> one normally would put each "val ..." on its own line (unless one is
>> squeezing to beat a loc limit :-) ).
>
> Right. In contrast, it is not uncommon to write "let a=1 and b=3 in" in
> OCaml.

And? Your point is?

>> But "for" loops aren't common. :-) (see above)
>
> In the OCaml stdlib, for example, "for" loops are used 46 times in total, in
> array, internal, digest, filename, format, hashtbl, lexing, random, scanf,
> sort, string and weak.

In our SML libraries, it is never used. :-)

Mattihas

Matthias Blume

unread,

Jun 14, 2005, 10:45:07 AM6/14/05

to

Jon Harrop <use...@jdh30.plus.com> writes:

> Now back to the code, which of these is more verbose:
>
> ""
>
> " end
> end
> end
> end
> end
> end
> end
> end
> end
> end"

To me, the latter is *clearer*. Conciseness at the expense of clarity
is, IMO, not a good idea.

Jon Harrop

unread,

Jun 14, 2005, 11:12:10 AM6/14/05

to

Matthias Blume wrote:
> To me, the latter is *clearer*.

I disagree.

> Conciseness at the expense of clarity is, IMO, not a good idea.

I agree.

Jon Harrop

unread,

Jun 14, 2005, 11:11:20 AM6/14/05

to

Matthias Blume wrote:
> Well, that's an emacs problem, not a problem with the language or the
> compiler. So what exactly "didn't work"?

I'm trying to consider this from a practical point of view. So the
"peripherals" are important. To me, having a syntax highlighter and
indenter is very important. You are right that it is more important in
OCaml because manual indentation is more error-prone but that is irrelevant
because most people get emacs to indent for them. Perhaps people indent
their SML code manually because the emacs indenter is broken? If so, I
think that is a very valuable piece of information.

>> Right. In contrast, it is not uncommon to write "let a=1 and b=3 in" in
>> OCaml.
>
> And? Your point is?

Typical OCaml is significantly more concise than typical SML. I'm trying to
reflect that by using "typical" style in my benchmark programs.

I'm not saying that SML is as verbose as Java or Fortran. Indeed, I'm not
even pushing the opinion that extreme brevity is good. I'm trying to be
objective and fair.

Vesa Karvonen

unread,

Jun 14, 2005, 11:17:00 AM6/14/05

to

Jon Harrop <use...@jdh30.plus.com> wrote:
> Vesa Karvonen wrote:
[...]

> > The way I see it, SML is not significantly more verbose than OCaml, but

^^^^^^^^^^^^^

> > SML code tends to be formatted over more lines.

> So, although the SML requires more lines, more words and more bytes, you
> don't consider it to be more verbose than the OCaml?

I would agree that SML is insignificantly more verbose than OCaml. ;)

> > In other words, I don't think that cramming the same code in fewer lines
> > makes it more concise (M-W describes "concise" as "free from all
> > elaboration and superfluous detail").

> Now back to the code, which of these is more verbose:

> ""

> " end
> end
> end
> end
> end
> end
> end
> end
> end
> end"

Hey, I agree here. The end-keywords do increase the amount of words.
(Whether the end-keyword is "good" or "bad" is a subject for another
(religious) debate --- one that I try to avoid.) However, 11 words
in a ~600 word program is not significant, IMO. (The word count of
600 is not from the latest version of the ray-tracer.) Besides, many
of the let-expressions could be rewritten to use function
applications that are arguably even more concise (in both OCaml and
SML).

Which of the following is least verbose?

let val x = y in z end
let x = y in z
(fun x -> z) y
(fn x => z) y

And what about the following?

let val a = b val c = d in x end
let a = b in let c = d in x
let a = b and c = d in x
(fn (a, b) => x) (b, d)
(fun a c -> x) b d

> >> Are you suggesting that the implementation of a "for" loop should not be
> >> counted?
> >
> > Partially, yes. I think that it should be counted separately as part of a
> > separate library.

> But not in the standard library?

I'm not entirely sure what you mean here. However, the last time I
checked, the shootout line metrics did not include the standard
library.

> > Without the use of libraries a language shootout probably (unfairly, IMO)
> > favours languages that come with a huge library or lots of syntactic sugar
> > rather than languages that have powerful abstraction facilities.

> OCaml has a tiny standard library compared to Java, yet it thrashes Java in
> my tests. I'd have said that my ray tracer makes virtually no use of
> libraries.

Well, that says a lot about Java.

> Right, lack of a reliable indenter is also something people will want to
> know if they're considering coding in SML.

I also recall using extra parentheses to avoid indentation bugs
while programming in OCaml using Tuareg mode. However, an Emacs mode
does not define the language.

-Vesa Karvonen

Matthias Blume

unread,

Jun 14, 2005, 11:28:01 AM6/14/05

to

Jon Harrop <use...@jdh30.plus.com> writes:

> Matthias Blume wrote:
>> Well, that's an emacs problem, not a problem with the language or the
>> compiler. So what exactly "didn't work"?
>
> I'm trying to consider this from a practical point of view. So the
> "peripherals" are important. To me, having a syntax highlighter and
> indenter is very important. You are right that it is more important in
> OCaml because manual indentation is more error-prone but that is irrelevant
> because most people get emacs to indent for them. Perhaps people indent
> their SML code manually because the emacs indenter is broken? If so, I
> think that is a very valuable piece of information.

FYI, I use emacs mode all the time. Yes, it has its quirks, and I
currently work around them, but it is quite usable.

>>> Right. In contrast, it is not uncommon to write "let a=1 and b=3 in" in
>>> OCaml.
>>
>> And? Your point is?
>
> Typical OCaml is significantly more concise than typical SML. I'm trying to
> reflect that by using "typical" style in my benchmark programs.

Ok, so the above line in SML would have two more keywords. Big deal,
I'd say. (Again, the extra "end" is a good thing in my view.) Things
get amortized if there are more declarations between "let" and "in"
(which is quite common -- at least in my code):

let val a = ...
val b = ...
val c = ...
val d = ... and e = ...
in ...
end

Basically, the number of keywords is N+3 for N declarations. In OCaml it
is N+1 if you use "and" all the time, and 2N if you never use "and".

The above sketch of an example would become

let a = ... in
let b = ... in
let c = ... in
let d = ... and e = ... in
...

which has 9 keywords vs. 8 in the SML case.

> I'm not saying that SML is as verbose as Java or Fortran. Indeed, I'm not
> even pushing the opinion that extreme brevity is good.

Ok, so then, what /is/ the point of this discussion?

> I'm trying to be objective and fair.

... about what?

Matthias

Matthias Blume

unread,

Jun 14, 2005, 11:30:05 AM6/14/05

to

Jon Harrop <use...@jdh30.plus.com> writes:

> Matthias Blume wrote:
>> To me, the latter is *clearer*.
>
> I disagree.

How can you disagree with the above statement? Are you claiming that
*to me* (Matthias Blume), the latter is not clearer?

Obviously, what is clear and what isn't is very much a function of
what one is used to.

Matthew Fluet

unread,

Jun 14, 2005, 11:27:13 AM6/14/05

to

Jon Harrop wrote:
> Matthew Fluet wrote:
>>Now, I will readily admit that if you are in the middle of the
>>edit-compile-debug loop, then MLton's compile times can be daunting.
>>Then it makes sense to develop with another compiler.
>
> I would be more than happy to do that if the SML/NJ and MLton versions of my
> ray tracer were identical. Of course, they are not and I am using a bash
> script to generate them from the same core code. That is for a <100 LOC
> program. I assume it only gets worse for bigger programs.

Actually, I think it only gets better for bigger programs. Or, at
least, the boilerplate that handles the differences in top-level
interface to get a standalone executable becomes proportionally smaller
to the rest of the program. Also, larger programs tend to be better
structured, in the sense that there are not a lot of top-level effects
scattered around.

I will admit that for very small (i.e., single-file, no module level
constructs, lots of top-level effects) programs, then trying to maintain
versions compatible with both SML/NJ and MLton is a little difficult.

I've toyed with the idea of adding an option to MLton to call a
distinguished main function:
http://mlton.org/pipermail/mlton/2004-September/025951.html

Meaning no disrespect, you could argue the other direction as well: that
SML/NJ should have a compilation option so that a single file yields a
single executable, and any top-level effects of the file are performed
when the executable is run, rather than when it is compiled. As far as
I know, one always needs to suspend all the effects under a lambda, then
hand a distinguished lambda to the ml-build script. Even then, you only
get a heap file, though I understand that Matthias is working towards a
heap2exec facility.

Matthew Fluet

unread,

Jun 14, 2005, 11:08:28 AM6/14/05

to

Jon Harrop wrote:
>>>My main gripe is the unfriendliness of the compilers compared to
>>>OCaml's compilers.
>

> Yes, I shall try to make a compendium of example programs and errors.
>
> In the mean time I can list some points about SML itself (not
> Mlton-specific):
>

> 6. I think a single precision float type for storage in general data
> structures is a good idea (this makes the x86 C++ version of my ray tracer
> twice as fast with only trivial changes). There may already be one in SML
> that I've missed. OCaml only has big arrays.

As has been pointed out elsewhere in the thread, the Standard ML Basis
Library has Real<N> optional structures. MLton implements both Real32
and Real64. The default Real structure is equivalent to Real64.

A complete description of MLton's implementation of the Standard ML
Basis Library, including type equivalences, is available at
http://mlton.org/BasisLibrary.

Vesa Karvonen

unread,

Jun 14, 2005, 11:38:43 AM6/14/05

to

Jon Harrop <use...@jdh30.plus.com> wrote:
[...]

> Typical OCaml is significantly more concise than typical SML. I'm trying to
> reflect that by using "typical" style in my benchmark programs.

From what I have been reading between the lines, you have not been
programming regularly in SML. So, I'm somewhat curious about where
do you get the notion of typical SML?

Most of the SML shootout programs are written in a significantly
more wordy style than what I would use. For example, here is the
current MLton version of Ackermann:

<---->
fun ack(0,n) = n+1
| ack(m,0) = ack(m-1,1)
| ack(m,n) = ack(m-1,ack(m,(n-1)));

fun atoi s = case Int.fromString s of SOME num => num | NONE => 0;
fun printl [] = print "\n" | printl(h::t) = ( print h ; printl t );

fun main(name, args) =
let
val arg = hd(args @ ["1"]);
val num = atoi arg;
val ack = ack(3, num);
val result = Int.toString ack;
in (
printl ["Ack(3,", arg, "): ", result];
OS.Process.success
) end;

val _ = main( CommandLine.name(), CommandLine.arguments() );
<--->

I'm counting 16 non-empty lines.

I would write the above as (WARNING: untested):

<--->
val rec ack = fn (0, n) => n+1
| (m, 0) => ack (m-1, 1)
| (m, n) => ack (m-1, ack (m, n-1))

val n = valOf (Int.fromString (hd (CommandLine.arguments ()))) handle _ => 0

val () = print ("Ack(3,"^Int.toString n^"): "^Int.toString (ack (3, n))^"\n")
<--->

I'm counting 5 non-empty lines, which beats the current OCaml entry that
has 6 lines. *So, SML is clearly less verbose than OCaml.* :) :) :)

> I'm not saying that SML is as verbose as Java or Fortran. Indeed, I'm not
> even pushing the opinion that extreme brevity is good. I'm trying to be
> objective and fair.

Ok. I'll take your word for it. However, I'd advice not to make
"significant" conclusions based on the current SML programs in the
shootout.

-Vesa Karvonen

Jon Harrop

unread,

Jun 14, 2005, 11:43:39 AM6/14/05

to

Matthias Blume wrote:
> Jon Harrop <use...@jdh30.plus.com> writes:
>> Matthias Blume wrote:
>>> To me, the latter is *clearer*.
>>
>> I disagree.
>
> How can you disagree with the above statement? Are you claiming that
> *to me* (Matthias Blume), the latter is not clearer?

I meant "to me, the latter isn't clearer".

> Obviously, what is clear and what isn't is very much a function of
> what one is used to.

Yes, I'm happy that we've reduced this to a subjective difference of
opinion.

Daniel C. Wang

unread,

Jun 14, 2005, 11:52:33 AM6/14/05

to

Oh, lets also not forget that there are several high-quality SML
implementations.

SML/NJ
MLton
Moscow ML
Poly/ML
SML.NET
ML Kit

It be interesting to port your benchmarks to the other platforms.
You will unfortunately, discover lots build environment quirks, but the
source code should be pretty portable.

SML.NET and The ML Kit are interesting targets to try.
OCaml has an .NET variant called F# however lasted I check F# does not
attempt to implement all of OCaml just some subset of it.

Also, lets not forget that the difference between OCaml and SML is a lot
smaller than either ML to something like C++, Java, and C#. I think,
there's lots of great evidence that you want expressive and easy to
optimize language. i.e. a language that's easy for programmers and
compiler writers, pick your favoriate ML. If you want some evil... you
know where to get it.

Speaking of evil. I'd encourage some one to hack up an Ocaml or F#
frontend to build an OCaml to SML translator, so you can run OCaml code
on MLton. :)

This is like a week or so hacking for someone familar with the
languages, and internals of the OCaml or F# compiler.

Jon Harrop

unread,

Jun 14, 2005, 11:54:36 AM6/14/05

to

Vesa Karvonen wrote:
> Jon Harrop <use...@jdh30.plus.com> wrote:
>> Vesa Karvonen wrote:
> [...]
>> > The way I see it, SML is not significantly more verbose than OCaml, but
> ^^^^^^^^^^^^^
>> > SML code tends to be formatted over more lines.
>> So, although the SML requires more lines, more words and more bytes, you
>> don't consider it to be more verbose than the OCaml?
>
> I would agree that SML is insignificantly more verbose than OCaml. ;)

Now I'm confused. Are you agreeing with yourself? I'm sure there's a medical
term for that... ;-)

> Hey, I agree here. The end-keywords do increase the amount of words.

And because typical coding style is apparently to put them all on separate
lines, they increase the LOC.

> (Whether the end-keyword is "good" or "bad" is a subject for another
> (religious) debate --- one that I try to avoid.) However, 11 words
> in a ~600 word program is not significant, IMO. (The word count of
> 600 is not from the latest version of the ray-tracer.)

But 10 lines out of 60 is significant. I'm saying that I believe
(subjectively) that LOC is better than words as a metric of development
time. I have no evidence to back that up with though - it's just a
feeling. :-)

> Besides, many
> of the let-expressions could be rewritten to use function
> applications that are arguably even more concise (in both OCaml and
> SML).

Yes, that's a subjective choice and I'm not sure which I prefer.

>> But not in the standard library?
>
> I'm not entirely sure what you mean here. However, the last time I
> checked, the shootout line metrics did not include the standard
> library.

Note that my discussion here has absolutely nothing to do with the great
computer language shootout.

>> OCaml has a tiny standard library compared to Java, yet it thrashes Java
>> in my tests. I'd have said that my ray tracer makes virtually no use of
>> libraries.
>
> Well, that says a lot about Java.

Yes, it seems to be an awful language - slowest, most verbose and lacking
some basic features (e.g. mach_eps).

>> Right, lack of a reliable indenter is also something people will want to
>> know if they're considering coding in SML.
>
> I also recall using extra parentheses to avoid indentation bugs
> while programming in OCaml using Tuareg mode.

Ok, I haven't found a single bug in Tuareg in two years but I seem to have
found several in the SML mode having written 100 LOC.

> However, an Emacs mode
> does not define the language.

It does make the language useful though. Unless there is a better SML editor
that I should be using?

Jon Harrop

unread,

Jun 14, 2005, 11:57:57 AM6/14/05

to

Daniel C. Wang wrote:
> Oh, lets also not forget that there are several high-quality SML
> implementations.
>
> SML/NJ
> MLton
> Moscow ML
> Poly/ML
> SML.NET
> ML Kit
>
> It be interesting to port your benchmarks to the other platforms.
> You will unfortunately, discover lots build environment quirks, but the
> source code should be pretty portable.

Absolutely. I'll do this ASAP. Thanks for the links.

Apart from MLton and SML/NJ, which do you think produces the fastest code?

> SML.NET and The ML Kit are interesting targets to try.
> OCaml has an .NET variant called F# however lasted I check F# does not
> attempt to implement all of OCaml just some subset of it.

Yes, in particular it omits modules and objects.

> Also, lets not forget that the difference between OCaml and SML is a lot
> smaller than either ML to something like C++, Java, and C#. I think,
> there's lots of great evidence that you want expressive and easy to
> optimize language. i.e. a language that's easy for programmers and
> compiler writers, pick your favoriate ML. If you want some evil... you
> know where to get it.

Yes, I think my page displays that quite nicely.

> Speaking of evil. I'd encourage some one to hack up an Ocaml or F#
> frontend to build an OCaml to SML translator, so you can run OCaml code
> on MLton. :)

I've considered doing this but there are some difficulties, like objects.
Even records might be troublesome, I'm not sure...

> This is like a week or so hacking for someone familar with the
> languages, and internals of the OCaml or F# compiler.

That isn't me. :-)

Jon Harrop

unread,

Jun 14, 2005, 12:05:34 PM6/14/05

to

Vesa Karvonen wrote:
> Jon Harrop <use...@jdh30.plus.com> wrote:

>> Typical OCaml is significantly more concise than typical SML. I'm trying
>> to reflect that by using "typical" style in my benchmark programs.
>
> From what I have been reading between the lines, you have not been
> programming regularly in SML. So, I'm somewhat curious about where
> do you get the notion of typical SML?

I asked here and have been told that "end" and "val" typically go on new
lines.

> Most of the SML shootout programs are written in a significantly
> more wordy style than what I would use.

Yes, I am deliberately not using the shootout because I disagree with some
of the ways they're doing things:

1. Not counting lines with "}" in Java and C++ but counting lines with
"done;" in OCaml and "end" in SML.
2. Not using well-formed tasks for benchmarks (e.g. computing digits of pi).
3. Not striving to use typical coding style in different languages.
4. Too much emphasis on low-level performance.
and so on.

>> I'm not saying that SML is as verbose as Java or Fortran. Indeed, I'm not
>> even pushing the opinion that extreme brevity is good. I'm trying to be
>> objective and fair.
>
> Ok. I'll take your word for it. However, I'd advice not to make
> "significant" conclusions based on the current SML programs in the
> shootout.

I think people can draw significant conclusions from my ray tracer
comparisons.

Daniel C. Wang

unread,

Jun 14, 2005, 12:12:47 PM6/14/05

to

Jon Harrop wrote:
{stuff deleted}

>
> I've considered doing this but there are some difficulties, like objects.
> Even records might be troublesome, I'm not sure...

Well, fortunately, lots of OCaml programmers don't use objects. :) I
think this is why F# was created. There's lot of OCaml code that fits
in the Caml subset... your raytracer for example. :)

Jon Harrop

unread,

Jun 14, 2005, 12:16:41 PM6/14/05

to

Daniel C. Wang wrote:

> Jon Harrop wrote:
>> I've considered doing this but there are some difficulties, like objects.
>> Even records might be troublesome, I'm not sure...
>
> Well, fortunately, lots of OCaml programmers don't use objects. :) I
> think this is why F# was created. There's lot of OCaml code that fits
> in the Caml subset... your raytracer for example. :)

I don't think that's true. My code certainly makes use of objects and I very
much doubt that most OCaml code can be compiled by F#. For a start, F#
can't implement most of the OCaml stdlib because it lacks functors.

Jon Harrop

unread,

Jun 14, 2005, 10:36:47 AM6/14/05

to

Matthias Blume wrote:
> Jon Harrop <use...@jdh30.plus.com> writes:
>> If my 80 line program has to be split across three files and combined by
>> a script before feeding it to one of the compilers, yes, I think this is
>> a big deal. For one thing, the line numbers reported by the compilers are
>> then wrong. That will surely put people who want to do serious work off
>> using SML.
>
> What are you talking about?

In my SML implementation, the SML/NJ must look like:

val delta = Math.sqrt 2.22044604925031e~16
...
val _ = SMLofNJ.exportFn ("ray", main)

whereas the MLton looks like:

val delta = Math.sqrt (Real.nextAfter(1.0, 2.0) - 1.0)
...
val _ = main ("", CommandLine.arguments ())

> Have you ever had a look at CM?

No, what is it?

>> Can you do conditional compilation in SML?
>
> What do you need conditional compilation for? Can you explain?

I could do:

#ifdef SMLNJ
val delta = Math.sqrt 2.22044604925031e~16
#else
val delta = Math.sqrt (Real.nextAfter(1.0, 2.0) - 1.0)
#endif
...
#ifdef SMLNJ
val _ = SMLofNJ.exportFn ("ray", main)
#else
val _ = main ("", CommandLine.arguments ())
#endif

in one ".sml" file.

Jon Harrop

unread,

Jun 14, 2005, 12:16:24 PM6/14/05

to

Jon Harrop wrote:
>> SML.NET and The ML Kit are interesting targets to try.
>> OCaml has an .NET variant called F# however lasted I check F# does not
>> attempt to implement all of OCaml just some subset of it.
>
> Yes, in particular it omits modules and objects.

Sorry, I meant functors and objects.

Jon Harrop

unread,

Jun 14, 2005, 12:20:55 PM6/14/05

to

Matthias Blume wrote:
> Jon Harrop <use...@jdh30.plus.com> writes:

>> I'm not saying that SML is as verbose as Java or Fortran. Indeed, I'm not
>> even pushing the opinion that extreme brevity is good.
>
> Ok, so then, what /is/ the point of this discussion?

I want to hear what language advocates have to say about the conclusions I'm
drawing based on my ray tracer implementations. So far, the comments in
comp.lang.functional are very interesting and I've learned a lot about
OCaml vs SML. I think some other people have too. Contrast this with the
comments from comp.lang.java.programmer where I was just repeatedly told
that startup time is to blame for Java's appauling performance...

>> I'm trying to be objective and fair.
>
> ... about what?

Comparing the languages. When people ask me for advise on which language to
use for a given task, I'd like to be able to back my view up with as much
objective evidence as possible rather than just making stuff up.

Having seen the incredible performance of MLton-compiled SML, I was also
considering porting my book on OCaml for Scientists to SML. However, having
tried SML, I can't yet advise scientists to consider it because it is too
difficult to use.

For me, SML would need to have the following:

1. Friendlier error messages from the compiler.
2. Different compilers which can compile the same code into an executable.
3. A more reliable emacs mode or a better editor.
4. A fast compiler and an optimising compiler which adhere to the standard
as much as possible (the equivalent of ocamlc and ocamlopt).

I'm not sure what the state of bindings to OpenGL, BLAS, LAPACK, FFTW and so
on are but they may also be important.

Matthias Kretschmer

unread,

Jun 14, 2005, 12:33:21 PM6/14/05

to

On 2005-06-14, Jon Harrop <use...@jdh30.plus.com> wrote:

> Matthias Blume wrote:
>> Jon Harrop <use...@jdh30.plus.com> writes:
>>> If my 80 line program has to be split across three files and combined by
>>> a script before feeding it to one of the compilers, yes, I think this is
>>> a big deal. For one thing, the line numbers reported by the compilers are
>>> then wrong. That will surely put people who want to do serious work off
>>> using SML.
>>

>> Have you ever had a look at CM?
>
> No, what is it?

it is the compilation manager. A very nice tool. You put all your stuff
into modules (anyway a good idea IMHO) and then ask the compilation
manager to compile your stuff. Nice about it is that it includes make
like facilities - is extensible (you can add you're own tools and so
on).

If you do everything the way you need it to be you could put your delta
outside into two different files one for mlton and one for smlnj. Then
have a main acceptable by smlnj and an extra file for starting the
process by mlton. One .cm and one .mlb and then using mlton ray.mlb and
ml-build ray.cm Ray.smlnj_main ray to build your system.

It is of course overhead, but that is something you have to pay for when
you want to use different implementations. In bigger projects the
overhead is of course smaller.

.mlb and .cm don't require much contents.

e.g.

ray.cm:
GROUP
structure Ray
IS
ray.sml
$/basis.cm

or even if you use noweb:
GROUP
structure Ray
IS
ray.nw (ray.sml)
$/basis.cm

> I could do:
>
> #ifdef SMLNJ
> val delta = Math.sqrt 2.22044604925031e~16
> #else
> val delta = Math.sqrt (Real.nextAfter(1.0, 2.0) - 1.0)
> #endif
> ...
> #ifdef SMLNJ
> val _ = SMLofNJ.exportFn ("ray", main)
> #else
> val _ = main ("", CommandLine.arguments ())
> #endif
>
> in one ".sml" file.
>

look at the paragraph above, you don't need this.

--
Matthias Kretschmer

Vesa Karvonen

unread,

Jun 14, 2005, 12:54:41 PM6/14/05

to

Jon Harrop <use...@jdh30.plus.com> wrote:
> Vesa Karvonen wrote:
> > Jon Harrop <use...@jdh30.plus.com> wrote:
> >> Typical OCaml is significantly more concise than typical SML. I'm trying
> >> to reflect that by using "typical" style in my benchmark programs.
> >
> > From what I have been reading between the lines, you have not been
> > programming regularly in SML. So, I'm somewhat curious about where
> > do you get the notion of typical SML?

> I asked here and have been told that "end" and "val" typically go on new
> lines.

Well, I guess we have to agree to disagree about the validity of the
LOC metric. I consider it fundamentally flawed, because the LOC
metric depends highly on individual code formatting style as well as
the presense of closing keywords (e.g. "end" and "done"), which
typically go on separate lines, but have no effect on the
semantic characteristics of programs.

A token (or word) count would be significantly more relevant in my
opinion. A token count metric would completely eliminate dependence
on formatting style and would only give a small benefit to languages
that do not have closing keywords, which better matches my personal
programming experience: the lack of closing keywords does not make a
language significantly more productive or the code significantly
easier to read/write.

I have written enough code in both OCaml and SML to confidently
argue that OCaml (excluding objects) is not significantly less
verbose than SML. Concisely written SML and OCaml (excluding
objects) programs are essentially isomorphic. The difference in LOC
metrics is insignificant (in regards to programmer productivity) and
essentially due to the "end"-keyword in SML.

-Vesa Karvonen

Joachim Durchholz

unread,

Jun 14, 2005, 1:10:11 PM6/14/05

to

Matthias Blume wrote:
> Jon Harrop <use...@jdh30.plus.com> writes:
>
> Yes. Indentation can easily be gotten wrong.

Only if you have a brain-dead editor or editor configuration that
silently converts sequences of blanks to tabs or vice versa.

It's a problem only if you insert a new nesting level and don't
immediately see the end of what should now be more indented. OTOH
anybody who writes nested code that spans more than a page gets what he
deserves ;-)

Regards,
Jo

Matthias Blume

unread,

Jun 14, 2005, 1:34:18 PM6/14/05

to

Matthias Kretschmer <mccr...@gmx.net> writes:

Moreover, once you set up the .cm file (and structure Ray contains
your "main" function), you build the stand-alone simply by saying
(from the shell prompt):

ml-build ray.cm Ray.main ray

and the rest is automatic. During development/debugging you run the
interactive system and (repeatedly, if necessary):

CM.make "ray.cm";

My emacs mode is set up so that this is bound to a keyboard shortcut.

Information on CM can be found here:

http://www.smlnj.org/doc/CM/index.html

Matthias

Matthew Fluet

unread,

Jun 14, 2005, 1:47:05 PM6/14/05

to

Matthias Blume wrote:

> Matthias Kretschmer <mccr...@gmx.net> writes:
>>ray.cm:
>>GROUP
>> structure Ray
>>IS
>> ray.sml
>> $/basis.cm
>

> Moreover, once you set up the .cm file (and structure Ray contains
> your "main" function), you build the stand-alone simply by saying
> (from the shell prompt):
>
> ml-build ray.cm Ray.main ray
>
> and the rest is automatic.

But the "standalone" is really just a heap image, which needs a special
startup script to launch SML/NJ (and, furthermore, any end user of the
program needs to have the same version of SML/NJ installed to run).

> During development/debugging you run the
> interactive system and (repeatedly, if necessary):
>
> CM.make "ray.cm";

For debugging (the whole program) in the interactive system, one needs
to then simulate the command line arguments

Ray.main ("ray", ["100"]);

Matthew Fluet

unread,

Jun 14, 2005, 1:35:58 PM6/14/05

to

Jon Harrop wrote:
>
> Having seen the incredible performance of MLton-compiled SML, I was also
> considering porting my book on OCaml for Scientists to SML. However, having
> tried SML, I can't yet advise scientists to consider it because it is too
> difficult to use.
>
> For me, SML would need to have the following:
>
> 1. Friendlier error messages from the compiler.

Again, eagerly awaiting specific examples of this criticism.

> 2. Different compilers which can compile the same code into an executable.

> 4. A fast compiler and an optimising compiler which adhere to the standard
> as much as possible (the equivalent of ocamlc and ocamlopt).

If you consider point 2 distinct from point 4, then I don't understand
how you advocate OCaml.

I do believe that the Standard ML Basis Library leaves somewhat more
room for implementation dependent choices than would be desirable to
achieve completely portable code. But, again, I think that the
comparison with OCaml is somewhat unfair here, because there is but a
single OCaml implementation. Any bits of the OCaml library
specification that are implementation dependent are benign, since there
are no competing implementations to make a different decision.

Jon Harrop

unread,

Jun 14, 2005, 3:03:28 PM6/14/05

to

Matthew Fluet wrote:
> Jon Harrop wrote:
>> Having seen the incredible performance of MLton-compiled SML, I was also
>> considering porting my book on OCaml for Scientists to SML. However,
>> having tried SML, I can't yet advise scientists to consider it because it
>> is too difficult to use.
>>
>> For me, SML would need to have the following:
>>
>> 1. Friendlier error messages from the compiler.
>
> Again, eagerly awaiting specific examples of this criticism.

Here are some examples, an extra ";" gives:

- fun f a =
= let val b=a in
= print "";
= end
= val a=2;;
stdIn:11.10-12.9 Error: syntax error: deleting SEMICOLON END SEMICOLON
stdIn:13.1-13.6 Error: syntax error: deleting VAL ID

With MLton we get 5 lines which essentially only tell us that there is an
error somewhere before the end:

Error: a.sml 4.5.
Syntax error found at END.
Error: a.sml 5.0.
Parse error.
compilation aborted: parseAndElaborate reported errors

The equivalent OCaml is valid so there's no error to compare with.

Here, SMLNJ gives two cryptic errors before a third which is heading in the
right direction but isn't quite "English" enough:

- fun () =
= print ""
= print "";;
stdIn:1.5-1.7 Error: can't find function arguments in clause
stdIn:1.5-1.7 Error: illegal function symbol in clause
stdIn:21.3-22.11 Error: operator is not a function [tycon mismatch]
operator: unit
in expression:
(print "") print

MLton skips to the appropriate error message but doesn't exactly put it
clearly:

Error: a.sml 2.5.
Function not of arrow type.
function: [unit]
in: (print "") print
compilation aborted: parseAndElaborate reported errors

Should "Function not of arrow type" be something like "attempt to apply a
non-function value" or "non-function value used as a function"?

In contrast, the natively-French-speaking OCaml reports the message clearly,
succinctly and with some (correct) advice, all in one line:

# let f () =
print_string ""
print_string "";;
This function is applied to too many arguments, maybe you forgot a `;'

In this example, SMLNJ at least hints at a type mismatch but the "?"s don't
help:

- datatype a = A
= val b = A
= datatype a = A
= val c = A
= val x = b=c;;
stdIn:9.5-9.12 Error: operator and operand don't agree [tycon mismatch]
operator domain: ?.a * ?.a
operand: ?.a * a
in expression:
b = c

I assume "?.a" is referring to a different type in the same namespace and
with the same name, but that isn't exactly clear.

MLton uses the phrase "incorrect argument" where "wrong type" would be
better, IMHO. The type descriptions are equally wierd:

Error: a.sml 5.10.
Function applied to incorrect argument.
expects: _ * [?.a]
but got: _ * [a]
in: = (b, c)
compilation aborted: parseAndElaborate reported errors

What the hell is that "in: = (b, c)"?!

The last time I tried this on OCaml I got a confusing type error along the
lines of "a used where a was expected". Now you get a decent error it
seems:

# type a = A
let b=A
type a = A
let c=A
let x = b=c;;
Multiple definition of the type name a.
Names must be unique in a given structure or signature.

My main objection to OCaml's error messages are the lack of quotation marks
around names, like "a" in this case. I've been bitten before for using
names in my program which produce grammatically correct error messages from
OCaml... :-)

Note that the ocaml top-level even underlines the bits where the error
occurred.

These are all the kinds of errors that a newbie would make. IIRC, this is
one of the main things that put me off SML when I first saw it in my first
year as an undergrad. I'm surprised things haven't improved over the past
decade.

>> 2. Different compilers which can compile the same code into an
>> executable. 4. A fast compiler and an optimising compiler which adhere to
>> the standard as much as possible (the equivalent of ocamlc and ocamlopt).
>
> If you consider point 2 distinct from point 4, then I don't understand
> how you advocate OCaml.

Note that my criticisms of SML are not related to OCaml. I have criticisms
of OCaml too, of course.

But I still don't understand what you've said. OCaml has ocamlc and
ocamlopt, both of which can produce executables. ocamlc is fast to compile,
ocamlopt is slow to compile but produces fast code. They are much more
compatible than SML/NJ and MLton, although there are differences.

Also, I'm not sure that SML/NJ would be as fast to compile as ocamlc. I
should probably be comparing with another implementation.

> I do believe that the Standard ML Basis Library leaves somewhat more
> room for implementation dependent choices than would be desirable to
> achieve completely portable code. But, again, I think that the
> comparison with OCaml is somewhat unfair here, because there is but a
> single OCaml implementation. Any bits of the OCaml library
> specification that are implementation dependent are benign, since there
> are no competing implementations to make a different decision.

I appreciate your point but my concern here is purely practical. I agree
that, in theory, a standard like SML should be better than an ad-hoc
developed language like OCaml. Indeed, there are parts of OCaml (e.g.
Hashtbl) that are clearly specialised for the OCaml compiler and not for
general use.

However, I am finding OCaml to be better in practice because the two
compilers are so coupled and, therefore, using ocamlc as a quick
alternative is easy and reliable. In contrast, the SML standard appears to
have omitted any discussion of actually running code, so implementors are
free to choose weird and wonderful ways to invoke programs (like the
"@SMLload=..." obscurity).

On the other hand, I appreciate that you only invoke the program once so the
difference between SML/NJ and MLton is less significant for bigger programs
and I also appreciate that I may have accidentally struck an unusual
weakpoint of SML/NJ by looking for mach_eps. If people honestly believe
that my ray tracer is representing SML in an unfairly poor light then I'll
mention this on the web page. But, for now, I shan't promote SML too
strongly due to its practical shortcomings.

Thant Tessman

unread,

Jun 14, 2005, 3:33:54 PM6/14/05

to

Matthew Fluet wrote:

[...]

> But the "standalone" is really just a heap image, which needs a special
> startup script to launch SML/NJ (and, furthermore, any end user of the
> program needs to have the same version of SML/NJ installed to run).

I have a utility I wrote in SML. I distribute the smlnj heap executer
thingy right along with the heap image and a script file to run
everything. Here's the windows version of the batch file:

@echo off
"%~dp0\run.x86-win32.exe" "@SMLload=%~dp0\myprog.x86-win32" %1 %2 %3 %4
%5 %6 %7 %8 %9

Yeah, I'd prefer one file instead of three, but it's not *that*
inconvenient.

-thant

--
We have now sunk to a depth at which restatement of the obvious is the
first duty of intelligent men. -- George Orwell (attributed)

Matthias Blume

unread,

Jun 14, 2005, 3:38:00 PM6/14/05

to

Jon Harrop <use...@jdh30.plus.com> writes:

Maybe these error messages can't be called "beautiful", but reading
them (and looking at the code) I would *immediately* know what's
wrong.

> Here, SMLNJ gives two cryptic errors before a third which is heading in the
> right direction but isn't quite "English" enough:
>
> - fun () =
> = print ""
> = print "";;
> stdIn:1.5-1.7 Error: can't find function arguments in clause

What's "cryptic" about this?

> stdIn:1.5-1.7 Error: illegal function symbol in clause

Again, what's "cryptic" about this one?

> stdIn:21.3-22.11 Error: operator is not a function [tycon mismatch]
> operator: unit
> in expression:
> (print "") print

Completely appropriate error message, I'd say.

> MLton skips to the appropriate error message but doesn't exactly put it
> clearly:
>
> Error: a.sml 2.5.
> Function not of arrow type.
> function: [unit]
> in: (print "") print
> compilation aborted: parseAndElaborate reported errors

Same here. Where is the problem?

> Should "Function not of arrow type" be something like "attempt to apply a
> non-function value" or "non-function value used as a function"?

Yes. The two formulations are equivalent, I'd say.

> In contrast, the natively-French-speaking OCaml reports the message clearly,
> succinctly and with some (correct) advice, all in one line:
>
> # let f () =
> print_string ""
> print_string "";;
> This function is applied to too many arguments, maybe you forgot a `;'

I don't like such guesswork by a compiler. It worked out fine here,
but what if print_string was supposed to take three arguments but
doesn't?

> In this example, SMLNJ at least hints at a type mismatch but the "?"s don't
> help:
>
> - datatype a = A
> = val b = A
> = datatype a = A
> = val c = A
> = val x = b=c;;
> stdIn:9.5-9.12 Error: operator and operand don't agree [tycon mismatch]
> operator domain: ?.a * ?.a
> operand: ?.a * a
> in expression:
> b = c
>
> I assume "?.a" is referring to a different type in the same namespace and
> with the same name, but that isn't exactly clear.

The ? refers to a type that is no longer accessible by name.

> MLton uses the phrase "incorrect argument" where "wrong type" would be
> better, IMHO. The type descriptions are equally wierd:

What is the difference between "incorrect argument" and "argument of
wrong type" if that message comes from the type checker?

> Error: a.sml 5.10.
> Function applied to incorrect argument.
> expects: _ * [?.a]
> but got: _ * [a]
> in: = (b, c)
> compilation aborted: parseAndElaborate reported errors
>
> What the hell is that "in: = (b, c)"?!

The application of a function named = to two arguments named b and c.

> The last time I tried this on OCaml I got a confusing type error along the
> lines of "a used where a was expected". Now you get a decent error it
> seems:
>
> # type a = A
> let b=A
> type a = A
> let c=A
> let x = b=c;;
> Multiple definition of the type name a.
> Names must be unique in a given structure or signature.

Well, that's because the rules are different in OCaml. Up to the
offending comparison between b and c, the code was legal SML but not
legal OCaml. That's why you get different error messages.

> Also, I'm not sure that SML/NJ would be as fast to compile as ocamlc.

I am sure it is not. In fact, SML/NJ is pretty slow as far as
compilation speed goes. (But machines have gotten so fast that it is
almost tolerable now.)

> However, I am finding OCaml to be better in practice because the two
> compilers are so coupled and, therefore, using ocamlc as a quick
> alternative is easy and reliable. In contrast, the SML standard appears to
> have omitted any discussion of actually running code, so implementors are
> free to choose weird and wonderful ways to invoke programs (like the
> "@SMLload=..." obscurity).

I don't think the C standard specifies how to run programs either.
(Could be wrong, though.)

> On the other hand, I appreciate that you only invoke the program once so the
> difference between SML/NJ and MLton is less significant for bigger programs
> and I also appreciate that I may have accidentally struck an unusual
> weakpoint of SML/NJ by looking for mach_eps. If people honestly believe
> that my ray tracer is representing SML in an unfairly poor light then I'll
> mention this on the web page. But, for now, I shan't promote SML too
> strongly due to its practical shortcomings.

Actually, I don't quite understand how your ray tracer would point so
strongly to OCaml. Most of the criticisms you list seem to be rooted
in your infamiliarity with the language. In the end, MLton beats
everyone speed-wise (and it even comes with 32-bit floats that you
could try).

Matthias

Matthias Blume

unread,

Jun 14, 2005, 3:40:52 PM6/14/05

to

Matthew Fluet <mfl...@acm.org> writes:

> Matthias Blume wrote:
>> Matthias Kretschmer <mccr...@gmx.net> writes:
>>>ray.cm:
>>>GROUP
>>> structure Ray
>>>IS
>>> ray.sml
>>> $/basis.cm
>> Moreover, once you set up the .cm file (and structure Ray contains
>> your "main" function), you build the stand-alone simply by saying
>> (from the shell prompt):
>> ml-build ray.cm Ray.main ray
>> and the rest is automatic.
>
> But the "standalone" is really just a heap image, which needs a
> special startup script to launch SML/NJ (and, furthermore, any end
> user of the program needs to have the same version of SML/NJ installed
> to run).

Yes, can't deny that. But since other ways of producing "stand alone"
programs in SML/NJ also just give you a heap image, I'd still say
ml-build is nice.

> > During development/debugging you run the
>> interactive system and (repeatedly, if necessary):
>> CM.make "ray.cm";
>
> For debugging (the whole program) in the interactive system, one needs
> to then simulate the command line arguments
>
> Ray.main ("ray", ["100"]);

Right. In my code, the "main" function is often just a wrapper, and I
have at least one other entry point that I could use for testing.
Unless I want to test the wrapper itself, I don't have to invoke it by
hand with string arguments.

Matthias

Marcin 'Qrczak' Kowalczyk

unread,

Jun 14, 2005, 3:51:30 PM6/14/05

to

Vesa Karvonen <vesa.k...@cs.helsinki.fi> writes:

> Besides, many of the let-expressions could be rewritten to use
> function applications that are arguably even more concise (in both
> OCaml and SML).

More concise but less clear. Conciseness usually increases clarity but
only if other things are equal.

> Which of the following is least verbose?
>
> let val x = y in z end
> let x = y in z
> (fun x -> z) y
> (fn x => z) y

Those with "let" have a better order of subexpressions, consistent
with the order of evaluation. I consider an immediately applied
anonymous function a hack which should be avoided for clarity.

It's a pity that in SML the common kind of expression, "let", is quite
verbose. IMHO "end" would be more useful at the end of "case" instead.

--
__("< Marcin Kowalczyk
\__/ qrc...@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/

Matthew Fluet

unread,

Jun 14, 2005, 3:54:28 PM6/14/05

to

Jon Harrop wrote:
> Matthew Fluet wrote:

>>>2. Different compilers which can compile the same code into an
>>>executable. 4. A fast compiler and an optimising compiler which adhere to
>>>the standard as much as possible (the equivalent of ocamlc and ocamlopt).
>>
>>If you consider point 2 distinct from point 4, then I don't understand
>>how you advocate OCaml.
>
> Note that my criticisms of SML are not related to OCaml. I have criticisms
> of OCaml too, of course.
>
> But I still don't understand what you've said. OCaml has ocamlc and
> ocamlopt, both of which can produce executables. ocamlc is fast to compile,
> ocamlopt is slow to compile but produces fast code. They are much more
> compatible than SML/NJ and MLton, although there are differences.

When I saw you list point 2 separately from point 4 (moreover, with an
intervening point about editors), I assumed that the two points were
rooted in separate underlying principles. As the root principle that
gave rise to point 4 is clear (namely, the edit-compile-debug loop
should be short, without compromising the speed of the final
deliverable), I made a guess at the root principle that gave rise to
point 2.

I concluded that the principle that gave rise to point 2 is that
multiple, independent implementations of a language strengthen the
confidence of a user in both the design of the language and in both
implementations, as there is a means to investigate bugs or otherwise
inexplicable behavior in one implementation by recourse to the other.

Hence, it seems to me that OCaml falls somewhat short here, as it
appears that ocamlc and ocamlopt share much of the same infrastructure.

I suspect that I've misinterpreted your point 2, and that the principle
you really were coming from is what I quote below: namely, that there is
no standard means of deriving a standalone executable from a program.

> In contrast, the SML standard appears to
> have omitted any discussion of actually running code, so implementors are
> free to choose weird and wonderful ways to invoke programs (like the
> "@SMLload=..." obscurity).

This is certainly true.

> I also appreciate that I may have accidentally struck an unusual
> weakpoint of SML/NJ by looking for mach_eps.

This is probably true, though I appreciate that you are coming from an
'OCaml for Scientists' point of view, where such values are of
importance. To be honest, this is the first time I've ever heard of
someone looking for such a value in SML.

Jon Harrop

unread,

Jun 14, 2005, 4:29:59 PM6/14/05

to

Matthias Blume wrote:
>> - fun f a =
>> = let val b=a in
>> = print "";
>> = end
>> = val a=2;;
>> stdIn:11.10-12.9 Error: syntax error: deleting SEMICOLON END SEMICOLON
>> stdIn:13.1-13.6 Error: syntax error: deleting VAL ID
>>

>> Error: a.sml 4.5.
>> Syntax error found at END.
>> Error: a.sml 5.0.
>> Parse error.
>> compilation aborted: parseAndElaborate reported errors
>
> Maybe these error messages can't be called "beautiful", but reading
> them (and looking at the code) I would *immediately* know what's
> wrong.

Sure, but someone new to ML wouldn't have the foggiest idea what the
compilers are talking about.

>> - fun () =
>> = print ""
>> = print "";;
>> stdIn:1.5-1.7 Error: can't find function arguments in clause
>
> What's "cryptic" about this?

What is "clause"?

>> stdIn:1.5-1.7 Error: illegal function symbol in clause
>
> Again, what's "cryptic" about this one?

What is a "function symbol", which "functional symbol", why is it "illegal"
and, again, what is "clause"?

>> stdIn:21.3-22.11 Error: operator is not a function [tycon mismatch]
>> operator: unit
>> in expression:
>> (print "") print
>
> Completely appropriate error message, I'd say.

What is "tycon" and what does "operator: unit" mean?

>> Error: a.sml 2.5.
>> Function not of arrow type.
>> function: [unit]
>> in: (print "") print
>> compilation aborted: parseAndElaborate reported errors
>
> Same here. Where is the problem?

Use of the word "arrow" instead of "function" and reference to what I can
only assume are the internals of the compiler: "parseAndElaborate".

>> Should "Function not of arrow type" be something like "attempt to apply a
>> non-function value" or "non-function value used as a function"?
>
> Yes. The two formulations are equivalent, I'd say.

I believe a newbie will understand the last two but not the first, unless
they happen to understand ML's type system, typeset representation and know
that "arrow" is written "->" in the context of ML. So they need to know a
lot of irrelevant definitions before they can even begin to decipher this
message.

>> In contrast, the natively-French-speaking OCaml reports the message
>> clearly, succinctly and with some (correct) advice, all in one line:
>>
>> # let f () =
>> print_string ""
>> print_string "";;
>> This function is applied to too many arguments, maybe you forgot a `;'
>
> I don't like such guesswork by a compiler. It worked out fine here,
> but what if print_string was supposed to take three arguments but
> doesn't?

Then the first part of the error is correct and the second part is an
incorrect guess. I'd say that was better than an error 5x as long with no
comprehensible information in it.

>> In this example, SMLNJ at least hints at a type mismatch but the "?"s
>> don't help:
>>
>> - datatype a = A
>> = val b = A
>> = datatype a = A
>> = val c = A
>> = val x = b=c;;
>> stdIn:9.5-9.12 Error: operator and operand don't agree [tycon mismatch]
>> operator domain: ?.a * ?.a
>> operand: ?.a * a
>> in expression:
>> b = c
>>
>> I assume "?.a" is referring to a different type in the same namespace and
>> with the same name, but that isn't exactly clear.
>
> The ? refers to a type that is no longer accessible by name.

Great. Then why not have the error message:

stdIn:9.5-9.12 Error: operator and operand don't agree [tycon mismatch]
operator domain: ?.a * ?.a
operand: ?.a * a

where "?" refers to a type that is no longer accessible by name.

in expression:
b = c

>> MLton uses the phrase "incorrect argument" where "wrong type" would be

>> better, IMHO. The type descriptions are equally wierd:
>
> What is the difference between "incorrect argument" and "argument of
> wrong type" if that message comes from the type checker?

You're assuming that the user knows what a type checker is, that this error
is coming from a type checker and that the argument must therefore be
"incorrect" because it has the wrong type. It would be much better to just
say "it has the wrong type", i.e. to minimise the knowledge dependencies
that the user must first satisfy before trying to comprehend your error
message.

>> Error: a.sml 5.10.
>> Function applied to incorrect argument.
>> expects: _ * [?.a]
>> but got: _ * [a]
>> in: = (b, c)
>> compilation aborted: parseAndElaborate reported errors
>>
>> What the hell is that "in: = (b, c)"?!
>
> The application of a function named = to two arguments named b and c.

So that is just a bug in the pretty printer. It should be:

in: b = c

when "=" is infix.

>> On the other hand, I appreciate that you only invoke the program once so
>> the difference between SML/NJ and MLton is less significant for bigger
>> programs and I also appreciate that I may have accidentally struck an
>> unusual weakpoint of SML/NJ by looking for mach_eps. If people honestly
>> believe that my ray tracer is representing SML in an unfairly poor light
>> then I'll mention this on the web page. But, for now, I shan't promote
>> SML too strongly due to its practical shortcomings.
>
> Actually, I don't quite understand how your ray tracer would point so
> strongly to OCaml.

I don't think my ray tracer does point strongly to OCaml. Certainly not the
way the web page is written at the moment. The only advantage of OCaml as
seen from my page appears to be that it is slightly more concise than SML.
In reality, OCaml seems to have lots of practical advantages, which is why
I'm promoting it.

> Most of the criticisms you list seem to be rooted
> in your infamiliarity with the language. In the end, MLton beats
> everyone speed-wise (and it even comes with 32-bit floats that you
> could try).

Yes, but the MLton version was harder and slower to write than the OCaml.
I'm sure that is partly because I am much more familiar with OCaml but I've
listed several objective reasons why I believe this will also be true on
other projects and even when I am equally familiar with SML.

Objectively, a lot more people seem to be learning OCaml than SML, even
though SML is taught in many universities. Indeed, I can think of 7 friends
who were also taught SML as undergrads. Of them, two have continued to use
SML but five have converted to OCaml. Among the natural scientists I know,
none use SML but about half are now using OCaml.

I'd like to convert some OpenGL programs next, to learn about SML's
bindings.

Vesa Karvonen

unread,

Jun 14, 2005, 5:17:22 PM6/14/05

to

Matthias Blume <fi...@my.address.elsewhere> wrote:
> Jon Harrop <use...@jdh30.plus.com> writes:

> > [...] If people honestly believe that my ray tracer is

> > representing SML in an unfairly poor light then I'll mention
> > this on the web page. But, for now, I shan't promote SML too
> > strongly due to its practical shortcomings.

> Actually, I don't quite understand how your ray tracer would point

> so strongly to OCaml. [...]

Looking at the page

http://www.ffconsultancy.com/free/ray_tracer/comparison_cpp_vs_sml.html

the only sentence comparing the languages is

"SML is related to the OCaml language which, whilst not as fast,
is significantly more concise and easier to learn."

So, based on the text on the page, it seems to me that (one of) the
main issue(s) is the conciseness / verbosity argument.

Comparing the SML code to the OCaml code, the SML code needs about 9
more lines due to the "end"-keywords (two of the 11 "end"-keywords
on separate lines could easily be trimmed away). A couple of extra
lines can easily be explained in terms of individual formatting
preferences. A further 4 lines could easily be trimmed by idiomatic
use of [the] SML [Basis Library]. The biggest structural difference
between the programs seems to be the use of the for-expression in
OCaml vs the use of a loop-function in SML.

In my opinion, none of the differences apparent in the ray tracer
point particularly strongly to OCaml. In particular, I don't think
that cramming code in fewer lines makes it more concise or less
verbose.

For the for-loop issue I would recommend writing a simple
(top-level) ~15-line combinator library (if for-loops really are
that important). Once the library is done, for-loops would be at
least as clear in SML as they are in OCaml. In fact, it would be
easy to make the combinator library considerably more flexible and
concise (at least if you count lines) that the OCaml for-loop.

One annoying detail in the the OCaml for-loop

for i = a to b do ... done

is that it executes the loop body for the closed range from a to b,
which, in practise, means that almost every loop is written as

for i = 0 to n-1 do ... done
^^

or as

for i = n-1 to 0 do ... done
^^

IOW, the upper bound is excluded almost always. The upper bound
could be excluded by default in a combinator library leading to
slightly more concise code.

Nested OCaml for-loops take up quite a few lines

for i = 0 to l-1 do
for j = 0 to m-1 do
for k = 0 to n-1 do
...
done
done
done

[30 tokens, 25 words, 7 lines]

but a combinator library could easily support nested iteration
concisely:

for (0 to l && 0 to m && 0 to n)
(fn i & j & k => ...)

[23 tokens, 15 words, 2 lines]

The main inflexibility in the OCaml for-loop is that it only works
on integers. A combinator library for for-loops could easily allow
mixing both integer range iteration as well as iteration through
aggregates and other kinds of value generators (e.g. random numbers,
characters, characters extracted from a stream, etc...). For
example, one could iterate through the cartesian product of some
range of numbers a to b and the elements of a list l like this

for (a to b && list l)
(fn x & y => ...)

Note that such a for-loop (or iteration) combinator library is not
an inherent advantage for SML. A similar library could be written
for both SML and OCaml. Of course, probably only MLton would
currently completely optimize away the overhead of the library, so
maybe it is a practical advantage for SML. :)

-Vesa Karvonen

Jon Harrop

unread,

Jun 14, 2005, 5:17:14 PM6/14/05

to

Matthew Fluet wrote:
> I concluded that the principle that gave rise to point 2 is that
> multiple, independent implementations of a language strengthen the
> confidence of a user in both the design of the language and in both
> implementations, as there is a means to investigate bugs or otherwise
> inexplicable behavior in one implementation by recourse to the other.
>
> Hence, it seems to me that OCaml falls somewhat short here, as it
> appears that ocamlc and ocamlopt share much of the same infrastructure.

I see. That is not what I meant. I literally meant that I'd like to be able
to compile the exact same code either quickly or aggresively.

I appreciate your theoretical point about multiple implementations being
"stronger". However, I'm not sure that this (or many other SML-related
arguments) actually hold any water in practice.

Specifically, it seems to me that MLton has the overriding goal of high
performance, which it unquestionably achieves. But at what cost? Compile
times are an order of magnitude slower, which would make MLton
substantially less practical for my work, and its error messages are (to
me) obfuscated. I had hoped that MLton's bytecode gen would satisfy my
desire for fast compilation but, alas, it is just as slow as native.

"But there are many other ML implementations" I hear you say. Well, yes, but
I am under the impression that the majority of other implementations are
"toys", i.e. not for widespread use. SML/NJ is the exception, which has the
reputation of being bad at optimising. I think my ray tracer has shown than
SML/NJ certainly competes with ocamlopt for performance on x86, but sadly
not on AMD64 (yet!).

From my point of view (and I know a little bit about ML compiler internals),
it seems that writing an SML compiler capable of converting people from
conventional languages to ML is such an enormous undertaking that it would
require a worldwide collaboration. So I'd rather see a single, much better
SML compiler than lots of separate compilers each with their own killer
deficiencies.

Note that the OCaml compilers are reliable despite not having multiple
different implementations of a single standard.

>> I also appreciate that I may have accidentally struck an unusual
>> weakpoint of SML/NJ by looking for mach_eps.
>
> This is probably true, though I appreciate that you are coming from an
> 'OCaml for Scientists' point of view, where such values are of
> importance. To be honest, this is the first time I've ever heard of
> someone looking for such a value in SML.

Yes. Although I would expect a computer scientist to be aware that mach_eps
appears all over the place in numerical analysis.

Vesa Karvonen

unread,

Jun 14, 2005, 5:35:48 PM6/14/05

to

Jon Harrop <use...@jdh30.plus.com> wrote:
[...]

> Objectively, a lot more people seem to be learning OCaml than SML,
> even though SML is taught in many universities. Indeed, I can
> think of 7 friends who were also taught SML as undergrads. Of
> them, two have continued to use SML but five have converted to
> OCaml. Among the natural scientists I know, none use SML but about
> half are now using OCaml.

Have you asked for reasons from your friends as to why they are
using OCaml rather than SML?

My *guess* is that OCaml's considerably larger standard library is
seen as a significant advantage over SML when people choose between
the two languages.

-Vesa Karvonen

Matthias Blume

unread,

Jun 14, 2005, 5:45:24 PM6/14/05

to

Jon Harrop <use...@jdh30.plus.com> writes:

> Yes. Although I would expect a computer scientist to be aware that mach_eps
> appears all over the place in numerical analysis.

Many (most?) computer scientists don't do numerical analysis.

Alan Morgan

unread,

Jun 14, 2005, 5:58:05 PM6/14/05

to

In article <42af3e87$0$11131$ed26...@ptn-nntp-reader01.plus.net>,

Jon Harrop <use...@jdh30.plus.com> wrote:
>Matthias Blume wrote:
>

>>> Should "Function not of arrow type" be something like "attempt to apply a
>>> non-function value" or "non-function value used as a function"?
>>
>> Yes. The two formulations are equivalent, I'd say.
>
>I believe a newbie will understand the last two but not the first, unless
>they happen to understand ML's type system, typeset representation and know
>that "arrow" is written "->" in the context of ML. So they need to know a
>lot of irrelevant definitions before they can even begin to decipher this
>message.

Speaking as a (more or less) newbie, I can confirm. I did learn pretty quickly
that when the SML compiler said certain things that I should not bother understanding
what it was talking about but instead look for the missing semi-colon, a place where
I wrote a::b where I should have written (a::b), or a place where I wrote [a::b]
where I should have written a::b.

I tended to run MLton and SML/NJ on the same file and use the combination of the
error messages to give me a hint.

>>> In contrast, the natively-French-speaking OCaml reports the message
>>> clearly, succinctly and with some (correct) advice, all in one line:
>>>
>>> # let f () =
>>> print_string ""
>>> print_string "";;
>>> This function is applied to too many arguments, maybe you forgot a `;'
>>
>> I don't like such guesswork by a compiler. It worked out fine here,
>> but what if print_string was supposed to take three arguments but
>> doesn't?
>
>Then the first part of the error is correct and the second part is an
>incorrect guess. I'd say that was better than an error 5x as long with no
>comprehensible information in it.

Any fundamental reason why we can't have both? sml --newbie-error-messages?

I recall a Pascal compiler that would point out places where you had missed
a semi-colon, insert them for you if you asked, and then continue compiling
the rest of the code (it didn't fix it in the source file, alas). I don't
miss Pascal, but I do miss that behavior every time I get 200 cascading
errors in C because of a similar mistake.

>Objectively, a lot more people seem to be learning OCaml than SML, even
>though SML is taught in many universities. Indeed, I can think of 7 friends
>who were also taught SML as undergrads. Of them, two have continued to use
>SML but five have converted to OCaml. Among the natural scientists I know,
>none use SML but about half are now using OCaml.

I'm the opposite. I tried playing with O'Caml, but I had remembered enough of
the SML from college that I found O'Caml's syntax to be distinctly icky.

Alan
--
Defendit numerus

ross...@ps.uni-sb.de

unread,

Jun 14, 2005, 6:07:10 PM6/14/05

to

Jon Harrop wrote:
>
> Here are some examples, an extra ";" gives:
>
> - fun f a =
> = let val b=a in
> = print "";
> = end
> = val a=2

> The equivalent OCaml is valid so there's no error to compare with.

It's not. Watch this:

let f a =
let b=a in
print_string "";
let a=2

File "test.ml", line 5, characters 0-0:
Syntax error

Rather worse.

> In this example, SMLNJ at least hints at a type mismatch but the "?"s don't
> help:
>
> - datatype a = A
> = val b = A
> = datatype a = A
> = val c = A
> = val x = b=c;;
> stdIn:9.5-9.12 Error: operator and operand don't agree [tycon mismatch]
> operator domain: ?.a * ?.a
> operand: ?.a * a
> in expression:
> b = c

Here is the equivalent situation with OCaml:

type t = A
module M =
struct
let b = A
type t = A
let c = A
let d = b=c
end

File "test.ml", line 7, characters 14-15:
This expression has type t but is here used with type t

Well...

> These are all the kinds of errors that a newbie would make. IIRC, this is
> one of the main things that put me off SML when I first saw it in my first
> year as an undergrad. I'm surprised things haven't improved over the past
> decade.

Error messages are a tricky issue, and I have yet to see a compiler
that really is perfect at that. In any case, from my experience there
is no significant difference in quality between OCaml and the main SML
implementations. If anything, OCaml tends to be worse because of the
presence of more tricky features like labelled argumetns, objects, and
polymorphic variants, which make for much more obscure errors.

- Andreas

Jon Harrop

unread,

Jun 14, 2005, 6:07:56 PM6/14/05

to

Vesa Karvonen wrote:
> Have you asked for reasons from your friends as to why they are
> using OCaml rather than SML?

I'll ask, but my guess is ease of use and awareness due to the shootout.

> My *guess* is that OCaml's considerably larger standard library is
> seen as a significant advantage over SML when people choose between
> the two languages.

I was under the impression that OCaml had a much smaller standard library.
Looking at it quickly, SML seems to add:

Array: slices, more fold* functions, mono arrays, ...
Math: defs for pi and e.
Real: multiple precision, isFinite, isNan, isNormal, nextAfter, ...
SubString: *
Time: Many functions
Vector: *

but where are polymorphic hashing, stacks, queues, hashtables, maps, sets
etc.?

Matthias Blume

unread,

Jun 14, 2005, 6:16:16 PM6/14/05

to

Jon Harrop <use...@jdh30.plus.com> writes:

Those are, by design, not part of the Basis library. But, e.g., the
SML/NJ library has most of them.

Jon Harrop

unread,

Jun 14, 2005, 6:41:10 PM6/14/05

to

ross...@ps.uni-sb.de wrote:

> Jon Harrop wrote:
>> The equivalent OCaml is valid so there's no error to compare with.
>
> It's not. Watch this:
>
> let f a =
> let b=a in
> print_string "";
> let a=2
>
> File "test.ml", line 5, characters 0-0:
> Syntax error
>
> Rather worse.

Indenting would highlight that error before you even tried to compile the
code.

> Here is the equivalent situation with OCaml:
>
> type t = A
> module M =
> struct
> let b = A
> type t = A
> let c = A
> let d = b=c
> end
>
> File "test.ml", line 7, characters 14-15:
> This expression has type t but is here used with type t
>
> Well...

Yes, that's the error I was trying to get. Note that this is harder to
reproduce (requiring a nested module) than the SML equivalent.

> Error messages are a tricky issue, and I have yet to see a compiler
> that really is perfect at that.

Yes, absolutely. In fact, I'd be hard pressed to recommend any compiler as
an good example of error reporting.

> In any case, from my experience there
> is no significant difference in quality between OCaml and the main SML
> implementations. If anything, OCaml tends to be worse because of the
> presence of more tricky features like labelled argumetns, objects, and
> polymorphic variants, which make for much more obscure errors.

The only aspect of OCaml's error reporting which causes me grief is objects,
where the errors remind me of the STL and C++. I'm surprised that you think
MLton and SMLNJ errors are as comprehensible as OCaml's though. I'll have
to put more effort in.

Johan

unread,

Jun 15, 2005, 3:39:13 AM6/15/05

to

Slightly off-topic in your discussion, but on-topic for what Jon sens to
be considering, I hope.

>
> I'm not sure what the state of bindings to OpenGL, BLAS, LAPACK, FFTW and so
> on are but they may also be important.
>

If you are thinking of what to advocate for natural science, I believe
that libraries are VERY important to mention. All libraries are
certainly not important to everyone, but lacaml and gsl-bindings are the
ONLY reason I decided to switch from c++ to ocaml instead of SML,
haskell or clean. Knowing about what libraries exist is of great
importance IMO.

/ johan

alex goldman

unread,

Jun 15, 2005, 7:21:34 AM6/15/05

to

Jon Harrop wrote:

> Matthew Fluet wrote:
>> I concluded that the principle that gave rise to point 2 is that
>> multiple, independent implementations of a language strengthen the
>> confidence of a user in both the design of the language and in both
>> implementations, as there is a means to investigate bugs or otherwise
>> inexplicable behavior in one implementation by recourse to the other.
>>
>> Hence, it seems to me that OCaml falls somewhat short here, as it
>> appears that ocamlc and ocamlopt share much of the same infrastructure.
>
> I see. That is not what I meant. I literally meant that I'd like to be
> able to compile the exact same code either quickly or aggresively.

Matthew, I have a question for you. What SML compiler do you use to develop
MLton? If MLton itself, do you wait 10+ minutes every time you change
something?

Matthew Fluet

unread,

Jun 15, 2005, 9:26:05 AM6/15/05

to

As I've said elsewhere in this thread, compiling MLton from clean
sources with MLton is significantly faster than compiling MLton from
clean sources with SML/NJ. Also, depending on where in the dependency
graph you make a change, it can be significantly faster to type-check
the MLton sources with MLton than with SML/NJ. But, it is certainly
faster to re-compile MLton with a minor change using SML/NJ.

So, I use the best tool for the immediate job at hand:

I type check changes to the sources with MLton;
When that quiesces, I re-compile the changed sources with SML/NJ.
I test the resulting program on small examples of interest,
if bugs are evident, GOTO Edit.
When I am satisfied that there are no obvious bugs, I compile the
changed sources with MLton.
I test the resulting program on our regression suite and a self-compile.

I admit that it is all about speed -- but, you can't underestimate how
slow the SML/NJ compiled MLton is. It has been infeasible for quite
some time to use a SML/NJ compiled MLton to bootstrap. It is also a
fact that a SML/NJ compiled MLton takes much longer to run through the
regression suite -- it is faster to build a MLton compiled MLton and run
that one through the regression suite, since both the compile and test
time is shorter than the SML/NJ compiled MLton test time.

I don't think that I or any other MLton developer has claimed that there
are no drawbacks to MLton's compile times. In fact, I think that is one
of the main reasons that we have been very conservative of implementing
extensions of Standard ML, it makes compatibility with other compilers
that much harder to maintain.

Personally, for a program of the size of Jon's raytracer, I probably
wouldn't think of trying to develop the program dually under SML/NJ and
MLton -- I just don't find the compile time of that small a program
particularly hard to live with. And, it certainly cuts down the
frustration that Jon has articulated in trying to maintain a program
that is suitable for both compilers. On the other hand, I am much more
likely to pop open the SML/NJ top-level loop to check the type of a
Basis Library function than I am to run the MLton type-checker on the
program (though, it is also problematic that at the time, I probably
don't have a syntactically valid program).

So, again, like pretty much every choice that one faces in developing a
program, it is all about choosing the right tool for the job.

Matthew Fluet

unread,

Jun 15, 2005, 9:55:51 AM6/15/05

to

Jon Harrop wrote:
> ross...@ps.uni-sb.de wrote:
>
>>Jon Harrop wrote:
>>
>>>The equivalent OCaml is valid so there's no error to compare with.
>>
>>It's not. Watch this:
>>
>>let f a =
>> let b=a in
>> print_string "";
>>let a=2
>>
>>File "test.ml", line 5, characters 0-0:
>>Syntax error
>>
>>Rather worse.
>
> Indenting would highlight that error before you even tried to compile the
> code.

Perhaps, but the argument "Well, I wouldn't write that in the first
place, so the error message is irrelevant" could equally be made by SML
programmers for your examples. Likewise, the "newbie" argument says
that there is nothing to make a first time user believe that they should
be suspicious of

let f a =
let b = a in
print_string "";
let a = 2

(which is what the default caml-mode does for me).

This isn't meant to start a flame war, but rather just to point out how
easy it is for experts in one language to empathize with newbies in
another language, while forgetting what it was like to be a newbie in
their own language and how much idiosyncratic knowledge an expert has
internalized to the point that it is hard to understand how anyone else,
no matter how inexperienced, can fail to have this intuition.

>>Error messages are a tricky issue, and I have yet to see a compiler
>>that really is perfect at that.
>
> Yes, absolutely. In fact, I'd be hard pressed to recommend any compiler as
> an good example of error reporting.

I agree completely.

Jon Harrop

unread,

Jun 15, 2005, 10:10:58 AM6/15/05

to

Matthew Fluet wrote:
> Perhaps, but the argument "Well, I wouldn't write that in the first
> place, so the error message is irrelevant" could equally be made by SML
> programmers for your examples.

Could it? I tried to give examples where that wasn't the case. This is the
only one of my SML examples where the indenter would help the user:

- fun () =
= print ""
= print "";;

> This isn't meant to start a flame war, but rather just to point out how

> easy it is for experts in one language to empathize with newbies in
> another language, while forgetting what it was like to be a newbie in
> their own language and how much idiosyncratic knowledge an expert has
> internalized to the point that it is hard to understand how anyone else,
> no matter how inexperienced, can fail to have this intuition.

Yes. Objectively, I did mention the migration of people from SML to OCaml
which I do think is evidence supporting what I am saying here.

Jon Harrop

unread,

Jun 15, 2005, 10:17:07 AM6/15/05

to

Matthew Fluet wrote:
> I don't think that I or any other MLton developer has claimed that there
> are no drawbacks to MLton's compile times. In fact, I think that is one
> of the main reasons that we have been very conservative of implementing
> extensions of Standard ML, it makes compatibility with other compilers
> that much harder to maintain.

This begs the obvious (stupid) question: how hard would it be to make MLton
able to compile very quickly by omitting the "whole program" bit?

> Personally, for a program of the size of Jon's raytracer, I probably
> wouldn't think of trying to develop the program dually under SML/NJ and
> MLton -- I just don't find the compile time of that small a program
> particularly hard to live with. And, it certainly cuts down the
> frustration that Jon has articulated in trying to maintain a program
> that is suitable for both compilers. On the other hand, I am much more
> likely to pop open the SML/NJ top-level loop to check the type of a
> Basis Library function than I am to run the MLton type-checker on the
> program (though, it is also problematic that at the time, I probably
> don't have a syntactically valid program).

Now that I think of it, OCaml spents most of its time type checking my main
programs and often takes much more than 2 secs to compile, so maybe MLton
would be faster on my application...

Matthias Blume

unread,

Jun 15, 2005, 10:36:24 AM6/15/05

to

Jon Harrop <use...@jdh30.plus.com> writes:

> Matthew Fluet wrote:
>> Perhaps, but the argument "Well, I wouldn't write that in the first
>> place, so the error message is irrelevant" could equally be made by SML
>> programmers for your examples.
>
> Could it? I tried to give examples where that wasn't the case. This is the
> only one of my SML examples where the indenter would help the user:
>
> - fun () =
> = print ""
> = print "";;

Here for example, my SML indenter (that's emacs sml-mode) does this:

fun () =
print ""
print "";;

Rather obvious, isn't it?

Bottom line: Don't use your editor's indentation algorithm as an
excuse for poor compiler error messages.

Matthew Fluet

unread,

Jun 15, 2005, 11:01:37 AM6/15/05

to

Jon Harrop wrote:
> Matthew Fluet wrote:
>
>>I don't think that I or any other MLton developer has claimed that there
>>are no drawbacks to MLton's compile times. In fact, I think that is one
>>of the main reasons that we have been very conservative of implementing
>>extensions of Standard ML, it makes compatibility with other compilers
>>that much harder to maintain.
>
> This begs the obvious (stupid) question: how hard would it be to make MLton
> able to compile very quickly by omitting the "whole program" bit?

Impossible. MLton's entire compilation strategy (defunctorization,
monomorphisation, and defunctionalization) requires having access to the
whole-program.

A better question is how many of the intervening optimization passes
could be eliminated. The resulting code would be much less efficient,
but (hopefully) semantically equivalent, and useful for debugging.

This is certainly a direction to be explored. And there are all sorts
of tradeoffs to be navigated -- insufficient dead-code elimination
leaves a bigger program that slows down the necessary compiler passes;
insufficient inlining that leaves arithmetic operations outlined would
lead to unacceptable performance.

However, even under the best estimates, it isn't clear that this would
beat a smart cutoff recompilation system.

Stephen Weeks

unread,

Jun 15, 2005, 12:22:53 PM6/15/05

to

> notice how O'Caml even gives you an example of a pattern you've
> missed! Oh, how i've wished SML/NJ would do that

MLton gives examples for nonexhaustive matches.

Stephen Weeks

unread,

Jun 15, 2005, 12:26:08 PM6/15/05

to

If you're interested in a performance comparison of the various SML
compilers, have a look at

http://mlton.org/Performance

Short summary: MLton is almost always produces the fastest code.

Jon Harrop

unread,

Jun 15, 2005, 2:13:37 PM6/15/05

to

Stephen Weeks wrote:
> If you're interested in a performance comparison of the various SML
> compilers, have a look at
>
> http://mlton.org/Performance

Wow! Fantastic page! I notice the OCaml ray tracer was ported by a certain
"Stephen Weeks". :-)

Looks like either ML-kit or PolyML is my next best bet. I think I'll try
PolyML.

> Short summary: MLton is almost always produces the fastest code.

I hope its grammar it better than that. ;-)

Jon Harrop

unread,

Jun 15, 2005, 2:19:51 PM6/15/05

to

Matthew Fluet wrote:
> A better question is how many of the intervening optimization passes
> could be eliminated. The resulting code would be much less efficient,
> but (hopefully) semantically equivalent, and useful for debugging.

I don't understand why memoization can't give whole program compilation the
speed of partial recompilation when necessary?

It won't be asymptotically as good, of course, because you still need to
read the input. But reading the input won't be the bottleneck...

> This is certainly a direction to be explored. And there are all sorts
> of tradeoffs to be navigated -- insufficient dead-code elimination
> leaves a bigger program that slows down the necessary compiler passes;
> insufficient inlining that leaves arithmetic operations outlined would
> lead to unacceptable performance.

I guess the question is where is MLton spending its time. I've heard that
it's recompiling the entire Basis library but why can't previous results be
reused?

> However, even under the best estimates, it isn't clear that this would
> beat a smart cutoff recompilation system.

I don't mind whether or not it beats another system. I just want it 10x
faster. :-)

William Lovas

unread,

Jun 15, 2005, 3:03:38 PM6/15/05

to

Yes, i know -- in general, i've found that MLton tends to produce better
error messages than SML/NJ (where good =~ clear + concise). At the end
of the day, my code has to work in SML/NJ 110.0.7 -- what we standardize
on at CMU -- and with MLton's long compile times, i usually find it easier
to develop at NJ's toplevel.

But i will say that whenever i get stuck reading an SML/NJ error message
for more than about 25 seconds, i turn to MLton to figure out what's really
going on :)

William

Matthew Fluet

unread,

Jun 15, 2005, 3:17:30 PM6/15/05

to

Jon Harrop wrote:
> Matthew Fluet wrote:
>
>>A better question is how many of the intervening optimization passes
>>could be eliminated. The resulting code would be much less efficient,
>>but (hopefully) semantically equivalent, and useful for debugging.
>
> I don't understand why memoization can't give whole program compilation the
> speed of partial recompilation when necessary?

I don't understand what you think is suceptible to memoization. The
first two major analyses/transformations that mlton performs are
monomophisation and defunctionalization. Each of these requires the
whole program to be performed, and neither yields information that is
likely to be useful for compiling other programs.

> It won't be asymptotically as good, of course, because you still need to
> read the input. But reading the input won't be the bottleneck...

Again, I don't follow your point.

>>This is certainly a direction to be explored. And there are all sorts
>>of tradeoffs to be navigated -- insufficient dead-code elimination
>>leaves a bigger program that slows down the necessary compiler passes;
>>insufficient inlining that leaves arithmetic operations outlined would
>>lead to unacceptable performance.
>
> I guess the question is where is MLton spending its time. I've heard that
> it's recompiling the entire Basis library but why can't previous results be
> reused?

See above.

As to where MLton is spending time, compile your program with -verbose 2
or -verbose 3 to see timing statistics.

Benedikt Schmidt

unread,

Jun 15, 2005, 4:23:38 PM6/15/05

to

ross...@ps.uni-sb.de writes:

> Jon Harrop wrote:
>>
>> Here are some examples, an extra ";" gives:
>>
>> - fun f a =
>> = let val b=a in
>> = print "";
>> = end
>> = val a=2
>
>> The equivalent OCaml is valid so there's no error to compare with.
>
> It's not. Watch this:
>
> let f a =
> let b=a in
> print_string "";
> let a=2
>
> File "test.ml", line 5, characters 0-0:
> Syntax error
>
> Rather worse.

Just for the record, the camlp4 parser (recursive descent) produces
better error messages than the standard one most of the time. But
it doesn't really help here:

$ ocamlc -pp camlp4o a.ml
File "a.ml", line 5, characters 0-1:
Parse error: 'and' or 'in' expected (in [expr])
Uncaught exception: Stream.Error("'and' or 'in' expected (in [expr])")
Preprocessor error

Benedikt

David Hopwood

unread,

Jun 15, 2005, 4:16:10 PM6/15/05

to

To me the most striking thing about those tables, is how many of the
benchmarks failed to compile or run on some of the implementations.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Jon Harrop

unread,

Jun 15, 2005, 4:23:58 PM6/15/05

to

David Hopwood wrote:
> To me the most striking thing about those tables, is how many of the
> benchmarks failed to compile or run on some of the implementations.

Indeed. Does anyone know why they failed? Were they mostly just too slow or
something more sinister?

Jon Harrop

unread,

Jun 15, 2005, 4:21:30 PM6/15/05

to

Matthew Fluet wrote:
> I don't understand what you think is suceptible to memoization.

I know very little about the internals of MLton so my ideas are likely to be
completely wrong. However, I can't help but think that during the 4s it
takes to recompile a 100LOC program, it must be redoing an awful lot of
computations (or parts thereof) that it has already done many times before.

> The
> first two major analyses/transformations that mlton performs are
> monomophisation and defunctionalization. Each of these requires the
> whole program to be performed, and neither yields information that is
> likely to be useful for compiling other programs.

But do they not yield information useful for compiling very similar
programs, or a completely different 100LOC program because most of the time
is spent recompiling the basis library?

> As to where MLton is spending time, compile your program with -verbose 2
> or -verbose 3 to see timing statistics.

2.7s in parseAndElaborate
3.93s in pre codegen
4.36s in compile SML
4.86s total

I should probably study MLton's internals before suggesting anything. :-)

Joachim Durchholz

unread,

Jun 15, 2005, 4:52:55 PM6/15/05

to

Jon Harrop wrote:

> David Hopwood wrote:
>
>>To me the most striking thing about those tables, is how many of the
>>benchmarks failed to compile or run on some of the implementations.
>
> Indeed. Does anyone know why they failed? Were they mostly just too slow or
> something more sinister?

Given that one of the entries is explicitly of the "took too long to
complete" flavor, I guess it's something more sinister :-(

Regards,
Jo

Matthew Fluet

unread,

Jun 15, 2005, 5:00:57 PM6/15/05

to

The "smith-normal-form" benchmark is infinite precision integer
intensive. SML/NJ has a very naive implementation of the IntInf
structure, while MLton uses the GNU MP library.

Matthew Fluet

unread,

Jun 15, 2005, 4:56:22 PM6/15/05

to

Jon Harrop wrote:
> David Hopwood wrote:
>
>>To me the most striking thing about those tables, is how many of the
>>benchmarks failed to compile or run on some of the implementations.
>
> Indeed. Does anyone know why they failed? Were they mostly just too slow or
> something more sinister?

I think most of it arises from implementations that haven't implemented
the final version of the Standard ML Basis Library.

Message has been deleted

Oleg Trott

unread,

Jun 15, 2005, 6:44:16 PM6/15/05

to

(repost)

Jon,

There is a rather subtle, but grave, error in your benchmark, making
SML programs appear 70% faster than they should. I am surprised no one
else caught it, considering the interest you benchmark generated.

The order in which the objects are grouped is very important for the
performance of the subsequent ray-tracing: witness the fact that
replacing "push_front" with "push_back" in the C++ version makes it run
150% faster.

While your Java, C++ and OCaml programs group the objects in the same
order, the SML program does not. When you manually unrolled the loops
inside the SML "create" function, you probably mixed up the order or
thought it was unimportant.

The fragment

let val objs = [aux (~r') (~r'), aux r' (~r'),
aux (~r') r', aux r' r', obj]

should be changed to

let val objs = [aux r' r', aux (~r') r',
aux r' (~r'), aux (~r') (~r'), obj]

Cheers,

Oleg Trott, Ph.D.

Jon Harrop

unread,

Jun 15, 2005, 6:47:04 PM6/15/05

to

Oleg Trott wrote:
>> > Jon, There is a rather subtle, but grave, error in your benchmark,
>> > making SML programs appear 70% faster than they should. I am

>> > surprised no one else caught it, considering the interest your

>> > benchmark generated. The order in which the objects are grouped is
>> > very important for the performance of the subsequent ray-tracing:
>> > witness the fact that replacing "push_front" with "push_back" in the
>> > C++ version makes it run 150% faster. While your Java, C++ and
>> > OCaml programs group the objects in the same order, the SML program
>> > does not. When you manually unrolled the loops inside the SML
>> > "create" function, you probably mixed up the order or thought it was
>> > unimportant. The fragment let val objs = [aux (~r') (~r'), aux r'

>> > (~r'), aux (~r') r', aux r' r', obj] in should be changed to let

>> > val objs = [aux r' r', aux (~r') r', aux r' (~r'), aux (~r') (~r'),

>> > obj] in Cheers, Oleg Trott, Ph.D.

Yes indeed. Good work that man! I'll correct the results and upload them...

Sam Lindley

unread,

Jun 15, 2005, 6:53:10 PM6/15/05

to

>> However, an Emacs mode
>> does not define the language.
>
> It does make the language useful though. Unless there is a better SML
> editor
> that I should be using?

The SML.NET plugin for Visual Studio has some nice features... (not sure if
anyone actually uses it though :-))

Sam

Jon Harrop

unread,

Jun 16, 2005, 12:03:48 AM6/16/05

to

Oleg Trott wrote:
> There is a rather subtle, but grave, error in your benchmark, making

> SML programs appear 70% faster than they should...

Thank you. I have just finished updating the pages with new results and
conclusions. Needless to say, SML looks much less appetising now...

Matthias Blume

unread,

Jun 16, 2005, 12:14:42 AM6/16/05

to

William Lovas <wlo...@force.stwing.upenn.edu> writes:

> On 2005-06-15, Stephen Weeks <swe...@sweeks.com> wrote:
>>> notice how O'Caml even gives you an example of a pattern you've
>>> missed! Oh, how i've wished SML/NJ would do that
>>
>> MLton gives examples for nonexhaustive matches.
>
> Yes, i know -- in general, i've found that MLton tends to produce better
> error messages than SML/NJ (where good =~ clear + concise). At the end
> of the day, my code has to work in SML/NJ 110.0.7 -- what we standardize
> on at CMU -- and with MLton's long compile times, i usually find it easier
> to develop at NJ's toplevel.

I *VERY STRONGLY* recommend not using 110.0.x. You should move on to
110.54. (That's not because of error messages, though.)

Matthias Blume

unread,

Jun 16, 2005, 1:08:03 AM6/16/05

to

Jon Harrop <use...@jdh30.plus.com> writes:

> Oleg Trott wrote:
>> There is a rather subtle, but grave, error in your benchmark, making
>> SML programs appear 70% faster than they should...
>
> Thank you. I have just finished updating the pages with new results and
> conclusions. Needless to say, SML looks much less appetising now...

I must say that I am growing irritated by this.

Let me quote from the "comparison" table on your web site and make some
comments:

> OCaml: Two compatible compilers (ocamlc and ocamlopt)
> SML: Many incompatible compilers (MLton, SML/NJ, ML-kit, PolyML, MoscowML, CML)

This should be: 2 versions of 1 OCaml compiler vs. many independently
developed compilers of a common language (SML). By the way, CML is
not a compiler but a library.

> OCaml: Fast 0.4s compile time
> SML: Slow 4s compile time with MLton...

Ok, I give you this one. OCaml is impressively fast to compile. But
then, even 4 seconds isn't the end of the world.

> or slow run time with any other compiler

The version that I translated for you pretty much matched OCaml's
performance on my machine. Of course, you don't seem to show that
version. (At least I couldn't find it anywhere on your website.)

> OCaml: Typically concise coding style
> SML: Typically verbose coding style

Bullshit! We've been through this one enough by now. Just because
there are some extra "end"s and some extra lines does not mean the
style is "verbose". It might take a few more tokens here and there
(depending on coding style), but that is irrelevant. To many of us it
is more readable, and to me that's what counts. By the way, your SML
coding style is far from "typical" SML coding style, so you have no
leg to stand on when you make claims like the above.

> OCaml: x86 and AMD64 versions
> Only x86

While true if you restrict your focus on Intel and AMD chips only,
this is more than just a bit misleading. A more honest statement
would be: "currently no AMD64 backend". Your version of the
statements sounds like OCaml supports twice as many architectures as
SML. (Yes, we need to get our act together wrt. AMD and 64-bit in
general. And we intend to do so.)

> OCaml: For and while loops
> SML: No imperative looping constructs

While true, it is not like there is a "feature" that is missing.
While- and for-loops are trivial to implement using tail recursion.
Tail recursion has the advantage that it does not force the programmer
to use imperative language features where this is not needed or
appropriate. As I said before, I never seem to write code that
screems for for- or while-loops -- except when I am transliterating
your code.

> OCaml: Labelled and optional function arguments
> SML: Records can be (ab)used to get the same effect

Why the "(ab)"? You claim to try and draw objective comparisons, but
this is clearly expressing personal bias. With apologies to Jacques,
one could be nasty and say that labeled arguments are compensating for
the lack of true records. :-)

> OCaml: Object orientation
> SML: Records can be (ab)used to get some of the effects

In my opinion, not having object orientation in the language is a
good thing.

> OCaml: Stacks, queues, sets, maps, hash tables
> SML: Vectors, slices

You forgot:
SML: stacks, queues, sets, maps, hash tables, priority queues, atoms (and more)

I'm sure OCaml has most if not all what SML has and then some, but the
things you list on your page are present in either case. Not fair.

> Finally, OCaml and SML should clearly be more popular than Java!

That one you got absolutely right!

Matthias